When organizations first set out to build a disaster recovery plan, it’s almost always based on the premise that a complete failure will occur. With that in mind, we plan for a complete recovery: we replicate our services and VMs to some sort of secondary site and document how to bring them all up again. While this may be the basis of the technical recovery portion of a DR plan, it’s important to take a step back before assuming we will have to recover from a complete failure. Disasters come in all shapes, forms, and sizes, and a great DR plan will accommodate as many types of disasters as possible. For example, we wouldn’t use the same “runbook” to recover from simple data loss that we would use to recover from the total devastation of a hurricane. So even before beginning the recovery portions of our disaster recovery plans, we really should focus on the disaster portion.
As mentioned above, the human mind seems to jump straight to the worst-case scenario when hearing the words “Disaster Recovery”: a building burning down, flooding, etc. What we fail to plan for are the smaller, less dramatic disasters, such as a temporary loss of power or loss of building access due to quarantine. So, with that said, let’s begin to classify disasters. For the most part, we can lump a disaster into one of two main categories:
Natural Disasters – These are the most recognized types of disasters. Think of events such as a hurricane, flooding, fire, earthquake, lightning, water damage, etc. When planning for a natural disaster we can normally assume that we will be performing a complete recovery, or an avoidance scenario, to a secondary location.
Man-Made Disasters – These are the types of disasters that organizations consider less often when looking at DR. Think about things such as temporary loss of power, cyber-attacks, ransomware, protests, etc. While these intentional and unintentional acts are not planned for as commonly, a good disaster recovery plan will address some of them, as the recovery is often much different from that of a natural disaster.
Once we have classified our disaster into one of these two categories, we can drill down further. Performing a risk and impact assessment of the disaster scenarios themselves is a great next step. Answers to questions like the ones listed below should be considered when performing our risk assessment, as they allow us to further classify our disasters and, in turn, define expectations and appropriate responses.
- Do we still have access to our main premises?
- Have we lost any data?
- Has any IT function been depleted or lost?
- Have we lost any skill sets?
How these questions are answered for a given disaster can completely change our recovery scenarios. For example, if we have had a fire in the datacenter and lost data, we would most likely be failing over to another building within a designated amount of time. However, if we had also lost employees in that fire, more specifically IT employees, then the time to recover will certainly be extended, as we most likely would have lost the skill sets and talent needed to execute the DR plan. Another great example comes in the form of ransomware. While we would still have physical access to our main premises, the data loss could be much greater due to widespread encryption from the ransomware itself. If our backups were not air-gapped or separate from our infrastructure, then we may also have encrypted backups, meaning we have lost an IT function, which could provoke a failover scenario even with physical access to the building. On the flip side, our risks may not even be technical in nature. What is the impact of losing physical access to our building as a result of protests or chemical spills? Some disasters like this may not even require a recovery process at all, but still pose a threat due to the loss of access to the hardware.
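To make the idea a bit more concrete, here is a minimal Python sketch of how the answers to those four questions could drive a suggested response. Everything in it is an assumption for illustration: the `Assessment` class, the `suggest_response` function, and the wording of each response are hypothetical, and the decision rules are only one possible reading of the examples above, not a standard. Your own plan would likely weigh these answers differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Assessment:
    """Answers to the four risk questions (hypothetical structure)."""
    premises_accessible: bool
    data_lost: bool
    it_function_lost: bool
    skillset_lost: bool


def suggest_response(a: Assessment) -> List[str]:
    """Map an assessment to rough response notes (illustrative rules only)."""
    notes = []
    if a.it_function_lost or (a.data_lost and not a.premises_accessible):
        # e.g. a datacenter fire, or ransomware that also encrypted the backups
        notes.append("fail over to the secondary site")
    elif a.data_lost:
        # e.g. simple data loss with usable backups and building access
        notes.append("restore affected data from backups")
    elif not a.premises_accessible:
        # e.g. protests or a chemical spill: nothing to recover yet
        notes.append("no recovery needed yet; track the loss of hardware access")
    else:
        notes.append("no recovery action indicated")
    if a.skillset_lost:
        # Fewer people able to run the plan means a longer recovery
        notes.append("expect a longer recovery: key skill sets are unavailable")
    return notes


if __name__ == "__main__":
    ransomware = Assessment(
        premises_accessible=True,
        data_lost=True,
        it_function_lost=True,   # assumed: backups were encrypted too
        skillset_lost=False,
    )
    for note in suggest_response(ransomware):
        print(note)
```

Even a rough table like this makes it obvious that the same category of disaster can land on very different runbooks depending on how the questions are answered.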
Disaster Recovery is a major undertaking no matter the size of the company or IT infrastructure, and it can take copious amounts of time and resources to get off the ground. With that said, don’t make the mistake of only planning for those big natural disasters. While they may be a great starting point, it’s best to also list out some of the more common, more probable types of disasters and document their risks and recovery steps in turn. In the end, you are more likely to be battling cyber attacks, power loss, and data corruption than you are to be fighting off a hurricane. The key takeaway: classify many different disaster types, document them, and you will have a more robust, more holistic plan to use when the time comes. I would love to hear about your journeys with DR. How do you begin to classify disasters or construct a DR plan? Have you experienced any "uncommon" scenarios which your DR plan has or hasn't addressed? Leave some comments below and let's keep this conversation going.