Thus far, we have gone over how to classify our disasters and how to have some of those difficult conversations with our organization regarding Disaster Recovery (DR). We've also briefly touched on Business Continuity, an important piece of disaster recovery. Now the time has come to gather all our information and put together a formal Disaster Recovery plan. As easy as it sounds, it can be quite a daunting task once you begin.
DR plans, just like the disasters they address, come in all forms, and you can go as broad or as detailed as you like. There is no real “set in stone” template or set of instructions for DR plan creation. For example, some DR plans may just cover how to get services back up and going at the 100-foot level, focusing mostly on the server level. Others may contain application-specific instructions for restoring services, while others cover how to recover from yet another disaster at your secondary site. The point is that it’s your organization's DR plan, so you can do as you like. Just remember that it might not be you, or even your IT department, executing the failover, so the more details the better. That said, creating a DR plan can quickly become overwhelming, which is why I always recommend starting at that 100-foot level and circling back to input details later.
So, with all that said, we can conclude that our DR plans can be structured however we wish, and that’s true. A quick Google search will yield hundreds of different templates for DR plans, each unique in its own way. However, a legible, solid, successful DR plan needs to contain five sections.
Introduction
The introduction of a DR plan is as important as one found in a textbook. Basically, this is where you summarize both the objectives and the scope of the plan. A good introduction will include all the IT services and locations that are protected, as well as the RTOs and RPOs associated with each. Aside from the technical aspect, the introduction should also contain the testing schedule and maintenance scope for the plan, as well as a history of revisions that have been made to the plan.
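As a rough illustration of the scope summary, the protected services and their objectives could be captured in a small structure like the one below. This is only a sketch: the service names, locations, and RTO/RPO values are hypothetical placeholders, and `ProtectedService` and `recovery_order` are names made up for this example.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ProtectedService:
    name: str
    location: str
    rto: timedelta  # Recovery Time Objective: how long the service may be down
    rpo: timedelta  # Recovery Point Objective: how much data loss is tolerable


# Hypothetical entries, for illustration only.
SCOPE = [
    ProtectedService("File shares", "HQ data center", timedelta(hours=24), timedelta(hours=4)),
    ProtectedService("Accounting", "HQ data center", timedelta(hours=4), timedelta(minutes=15)),
]


def recovery_order(services):
    """Return the services sorted so the tightest RTO comes first."""
    return sorted(services, key=lambda s: s.rto)


for svc in recovery_order(SCOPE):
    print(f"{svc.name} @ {svc.location}: RTO {svc.rto}, RPO {svc.rpo}")
```

Sorting by RTO is just one possible way to read the table; your organization may prioritize differently.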
Roles and Responsibilities
We have talked a lot in this series about including stakeholders and application owners outside of the IT department in our primary discussions. This is the section of the plan where you will formally list all the internal and external departments and personnel who are key to each DR process covered in the plan. Remember, this plan is normally executed during a disaster, so names are not enough. You need brief descriptions of their duties, contact information, and even alternate contact information to ensure that no one is left in the dark.
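One way to keep this section honest is to store the roster in a structured form and check it for gaps. The sketch below assumes a simple `Contact` record and a `missing_alternates` helper, both invented for this example; the people and phone numbers are fictional.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    name: str
    role: str
    duties: str
    phone: str
    alternate_phone: str = ""  # left empty when no alternate is on file


# Fictional roster, for illustration only.
CONTACTS = [
    Contact("A. Rivera", "IT Director", "Declares IT disasters, leads failover", "555-0100", "555-0101"),
    Contact("J. Chen", "Accounting Application Owner", "Validates accounting services after recovery", "555-0110"),
]


def missing_alternates(contacts):
    """Return the names of contacts that have no alternate contact information."""
    return [c.name for c in contacts if not c.alternate_phone.strip()]


print(missing_alternates(CONTACTS))
```

Running a check like this whenever the roster changes is a cheap way to catch entries that would leave someone unreachable.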
Incident Response
This is where you will describe how a disaster event is declared, who has the power to declare it, and the chain of communication that immediately follows. Remember, we can have many different types of disasters, so we can also have many different types of disaster declarations and incident responses. For instance, a major fire will call for a different incident response than an attempted ransomware attack. We need to know who is making the declaration, how they are doing so, who will be contacted, and so on down the chain of command.
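To make the idea of per-disaster responses concrete, here is a minimal sketch that maps a disaster type to the role that declares it and the roles notified afterward. The disaster types, roles, and the `notification_chain` function are assumptions chosen for illustration, not a prescribed chain of command.

```python
# Hypothetical mapping: disaster type -> who declares it and who is notified, in order.
INCIDENT_RESPONSES = {
    "fire": {
        "declared_by": "Facilities Manager",
        "notify": ["Executive Sponsor", "IT Director", "Insurance Contact"],
    },
    "ransomware": {
        "declared_by": "IT Director",
        "notify": ["Security Lead", "Executive Sponsor", "Legal Counsel"],
    },
}


def notification_chain(disaster_type):
    """Return the declaring role followed by the roles to notify, in order."""
    try:
        response = INCIDENT_RESPONSES[disaster_type]
    except KeyError:
        raise ValueError(f"No incident response defined for {disaster_type!r}") from None
    return [response["declared_by"], *response["notify"]]


print(" -> ".join(notification_chain("ransomware")))
```

Failing loudly on an unknown disaster type is a deliberate choice here: a missing entry is a gap in the plan that is better found during a test than during a real event.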
DR Procedures
Once your disaster has been declared, those outlined within the Roles and Responsibilities section can begin to act on the steps that bring the production environment back up at your secondary location. This is where those procedures and instructions are laid out, step by step, for each service identified within the plan’s scope. A lot of IT departments will jump right into this step, and this is where our plan creation can tend to get out of control. A good rule of thumb is to start broad with your process, define any prerequisites, and then dive into details. Once you are done with that, you can circle back for yet another round of details.
For example, “Recover Accounting Services” may be a good place to start. You then can dive into the individual servers that support the service as a whole, listing out all the servers (names, IPs, etc.) you need to have available. You can then get into finer details about how to get each server up and running to support the service as a whole. Even further, you may need to make changes to the application for it to run at your secondary location (maybe you have a different IP scheme, different networks, etc.), or have support for external hardware, such as a fax server to send out purchase orders.
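The broad-to-detailed approach described above can be sketched as a nested structure: a service at the top, then its prerequisites, then each server with its own ordered steps. Everything in this example is hypothetical, including the server names, the documentation-range IP addresses, the steps, and the `Server`, `ServiceProcedure`, and `render` names.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    ip: str
    steps: list[str] = field(default_factory=list)


@dataclass
class ServiceProcedure:
    title: str
    prerequisites: list[str] = field(default_factory=list)
    servers: list[Server] = field(default_factory=list)


def render(proc):
    """Render a procedure as an indented outline, broadest level first."""
    lines = [proc.title]
    lines += [f"  Prerequisite: {p}" for p in proc.prerequisites]
    for server in proc.servers:
        lines.append(f"  Server: {server.name} ({server.ip})")
        lines += [f"    {n}. {step}" for n, step in enumerate(server.steps, start=1)]
    return "\n".join(lines)


# Hypothetical procedure, for illustration only.
ACCOUNTING = ServiceProcedure(
    title="Recover Accounting Services",
    prerequisites=["Secondary site network is reachable"],
    servers=[
        Server("acct-db01", "192.0.2.10", ["Restore from latest backup", "Verify database consistency"]),
        Server("acct-app01", "192.0.2.11", ["Restore from latest backup", "Update IP settings for the secondary site"]),
    ],
)

print(render(ACCOUNTING))
```

Starting with only titles and server lists, then filling in `steps` on a later pass, mirrors the "start broad, circle back" approach.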
Appendix
This is where you place a collection of any other documents that may be of value to your organization in the event of a disaster. Vendor contacts, insurance policies, and support contracts can all go into an appendix. If there is a common procedure for recovering a server (for example, you use the same piece of software to protect all services) and you've already written an exhaustive list of instructions for it, you can place those instructions here and simply reference them from within the DR Procedures section.
With these five sections filled out, you can be far more confident that your organization is covered in the event of a disaster. A challenge, however, may be keeping your document up to date as your production environment changes. Today’s data centers are far from the static providers they once were. We are always spinning up new services, retiring old ones, and moving things to and from the cloud. Every time that happens, we need to reassess that service within our DR plan if we want to be successful in DR. It needs to be a living document, right from its creation, and must always be kept up to date! And remember, it’s your DR plan, so include any other documents or sections that you or your organization want. At the end of the day, it’s better to have more information available than not enough, especially if you aren’t the person responsible for executing it! Also, please store a copy of this at your secondary location and/or in the cloud. I’ve heard too many stories of organizations losing their DR plan along with their production site.
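On the point about keeping the plan current as services come and go, a simple comparison between what runs in production and what the plan covers can flag drift. This is a minimal sketch: the service names are made up, and how you obtain each list (inventory export, CMDB, a spreadsheet) is left as an assumption.

```python
def plan_drift(production_services, plan_services):
    """Compare production against the DR plan and report what is out of sync."""
    production, plan = set(production_services), set(plan_services)
    return {
        "missing_from_plan": sorted(production - plan),  # running, but not covered
        "stale_in_plan": sorted(plan - production),      # covered, but retired
    }


# Hypothetical inventories, for illustration only.
in_production = ["Accounting", "File shares", "Customer portal"]
in_dr_plan = ["Accounting", "File shares", "Legacy fax gateway"]

print(plan_drift(in_production, in_dr_plan))
```

Anything listed under either key is a prompt to reassess that service in the plan.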
I’d love to hear your thoughts about all this! How do you structure your DR plans? Are you more detailed or broader in terms of laying out the instructions to recover? Have you ever had to execute a DR plan you weren’t a part of? If so, how did that change your views on creating these types of procedures and documents? Thanks for reading!