In my previous blog post, I discussed the somewhat unique expectations of high availability as they exist within a healthcare IT environment. It was no surprise to hear about the budget approval challenges that my peers in the industry are facing regarding technology solutions. It also came as no surprise to hear that I’m not alone in working with businesses that demand extreme levels of service availability. I intentionally asked some loaded questions and made some loaded statements to inspire creative dialogue, and I’m thrilled with the results!
In this post, I’m going to talk about another area in healthcare IT that I think is going to hit home for a lot of people involved in this industry: continuity of operations. Call it what you want: disaster recovery, backup and recovery, or business continuity. It all revolves around the key concept of getting the business back up and running after something unexpected happens, and then sustaining it into the future. Hurricane Irma just ripped through Florida, and you can bet the folks supporting healthcare IT (and IT and business in general) in those areas are taking specific courses of action right now. Let’s hope they’ve planned and are ready to execute.
If your experiences with continuity of operations planning are anything like mine, they evolved in a sequence. In my organization (healthcare on the insurance side of the house), the first thing we thought about was disaster recovery. We made plans to rebuild from the ashes in the event of a catastrophic business impact. We mainly focused on getting back up and running. We spent time looking at solutions like tape backup and offline file storage. We spent most of our time talking about factors such as recovery-point objective (to what point in time are you going to recover) and recovery-time objective (how quickly can you recover back to this predetermined state). We wrote processes to rebuild business systems, and we drilled and practiced every couple of months to make sure we were prepared to execute the plan successfully. It worked. We learned a lot about our business systems in the process, and ultimately developed the skills to bring them back online in a fairly short period of time. In the end, while this approach might work for some IT organizations, we realized pretty quickly that it wasn’t going to cut it long term as the business continued to scale. So, we decided to pivot.
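To make those two objectives concrete, here is a minimal Python sketch of how a drill result could be scored against them. The function names, timestamps, and the 24-hour and 4-hour targets are all hypothetical, chosen only for illustration; they are not figures from our environment.

```python
from datetime import datetime, timedelta


def meets_rpo(last_backup: datetime, failure_time: datetime, rpo: timedelta) -> bool:
    """Data written after the last good backup is lost; that window must fit within the RPO."""
    return failure_time - last_backup <= rpo


def meets_rto(failure_time: datetime, restored_time: datetime, rto: timedelta) -> bool:
    """Downtime runs from the failure until service is restored; it must fit within the RTO."""
    return restored_time - failure_time <= rto


# Hypothetical drill: all values below are assumptions for illustration.
failure = datetime(2017, 9, 10, 14, 0)
last_backup = datetime(2017, 9, 10, 2, 0)   # 12 hours of data at risk
restored = datetime(2017, 9, 10, 20, 30)    # 6.5 hours of downtime

rpo_ok = meets_rpo(last_backup, failure, rpo=timedelta(hours=24))
rto_ok = meets_rto(failure, restored, rto=timedelta(hours=4))

print(f"RPO met: {rpo_ok}, RTO met: {rto_ok}")
```

In this made-up drill, the backup schedule is good enough for the recovery point target, but the rebuild takes too long for the recovery time target. That is roughly the kind of gap regular drills are meant to expose.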
Next, we started talking about the following evolution in our IT operational plan: business continuity. So, what’s the difference, you ask? Well, in short, everything. With business continuity planning, we’re not so much focused on how to get back to some point in time within a given window. Instead, we’re focused on keeping the systems running at all costs, through any event. It’s going to cost a whole lot more to have a business continuity strategy, but it can be done. Rather than spending our time learning how to reinstall and reconfigure software applications, we spent our time analyzing single points of failure in our systems. Those included software applications, processes, and the infrastructure itself. As those single points of failure were identified, we started to design around them. We figured out how to travel a second path in the event the first path failed, to the extreme of even building a completely redundant secondary data center a few states away so that localized events would never impact both sites at once. We looked at leveraging telecommuting to put certain staff offsite, so that in the event a site became uninhabitable, we had people who could keep the business running. As a result, we largely stopped having to do our drills because we were no longer restoring systems. We just kept the business survivable.
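As a rough illustration of that single-point-of-failure analysis, the Python sketch below lists each capability the business depends on alongside the components that can provide it, and flags any capability with only one provider. The inventory and every name in it are hypothetical; a real analysis would also need to cover processes and people, not just infrastructure.

```python
from typing import Dict, List


def single_points_of_failure(capabilities: Dict[str, List[str]]) -> Dict[str, str]:
    """Return each capability that has exactly one provider, mapped to that provider."""
    return {
        capability: providers[0]
        for capability, providers in capabilities.items()
        if len(providers) == 1
    }


# Hypothetical inventory: capability -> components able to provide it.
inventory = {
    "claims database": ["db-primary", "db-replica-remote"],
    "internet uplink": ["isp-a"],
    "authentication": ["auth-1", "auth-2"],
    "data center": ["site-east"],
}

spofs = single_points_of_failure(inventory)
for capability, provider in spofs.items():
    print(f"{capability}: depends solely on {provider}")
```

Each flagged entry is a candidate for a second path, for example a second uplink or a secondary site.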
While some of what we did in that situation was specific to our environment, many of these concepts can be applied to the greater IT community. I’d love to hear what disaster recovery or business continuity conversations are taking place within your organization. Are you rebuilding systems when they fail, or are you building the business to survive? (There is certainly a place for both, I think.)
What other approaches have you taken to address the topic of continuity of operations that I haven’t mentioned here? I can’t wait to see the commentary and dialogue in the forum!