
Business Continuity - DR is really just a piece of a much bigger cake!

Level 11

All too often, especially when disaster recovery (DR) is driven and pushed by the IT department, organizations fall into the common mistake of assuming that they are “good to go” if disaster hits. While IT departments can certainly handle the technical side of things, ensuring services are up and running if production goes down, they are not necessarily the key stakeholders in ensuring that business processes and services can also be maintained. These business processes and activities can be summed up in one key term that goes hand in hand with DR: business continuity (BC). Essentially, business continuity covers the processes and procedures carried out in the event of a disaster to help ensure that business functions continue to operate as normal – the key here being business functions. Sure, following the procedures in our disaster recovery plan is a very big piece of our business continuity plan (BCP), but a true BCP encompasses much more in terms of dealing with a disaster.

BCP: Just a bunch of little DR plans!

When organizations embark on tackling business continuity, it's sometimes easier to break it all down into a bunch of little disaster recovery plans – think DR for IT, DR for accounting, DR for human resources, DR for payroll, etc. The whole point of business continuity is to keep the business running. Sometimes, if it is IT pushing for this, we fall into the trap of just looking at the technical aspects, when really it needs to involve the whole organization! So, with that said, what should really be included in a BCP? Below, we will look at what I feel are four major components that a solid BCP should consider.

Where to go?

Our DR plan does a great job of ensuring that our data and services are up and running if disaster hits. However, what we often don’t consider is how employees will access that data. Our employees are used to coming in, sitting down, and logging into a secure internal network. Once we have restored operations, does a secondary location offer our end-users that same access? Are there enough seats, DHCP addresses, and switch ports to handle all of them? Or, if we have used some sort of DRaaS, does the provider offer seats or labs in the event we need them? Furthermore, depending on the type of disaster (say it was a flood), will our employees even be able to travel to alternate locations at all?

Essential Equipment

We know we need to get our servers back up and running. That’s a no-brainer! But what about everything else our organization uses to carry out its day-to-day business? It’s the items we take for granted that tend to be forgotten: photocopiers, fax machines, desks, chairs, etc. Can ALL essential departments maintain “business as usual” at our secondary site, either fully or in some limited fashion? And aside from equipment, we need to think about the infrastructure within our secondary site as well. Are there phone lines installed? Can they be expanded if we end up using the facility long-term? Even if these items are not readily available, having a plan for how to obtain them will save valuable time in the restoration process. Have a look at all the things on your desk and ask yourself whether the same is available at your designated DR facility.

Communication

Here’s the reality: your building is gone, along with everything that was inside it! Do you have a plan for keeping in touch with key stakeholders during this time? A good BCP will have lists upon lists of key employees with their contact information, both current and emergency. It can be as simple as listing employees’ home and cell phone numbers and, if you host your own email servers, alternate email addresses that are checked on a regular basis. The last thing you want is a delay in executing your BCP because you can’t reach the person who has to give the go-ahead.
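To make the contact-list idea a bit more concrete, here is a minimal Python sketch, assuming the roster is kept as structured data rather than a loose spreadsheet. Everything in it is hypothetical: the `Contact` fields, the rule that a work phone counts as office-dependent, and the sample names. It simply flags anyone who could only be reached through channels that would disappear along with the building.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Contact:
    """One entry in a hypothetical BCP contact roster."""
    name: str
    role: str
    work_phone: Optional[str] = None  # assumed to depend on the (possibly destroyed) office
    cell_phone: Optional[str] = None
    home_phone: Optional[str] = None
    alt_email: Optional[str] = None   # mailbox NOT hosted on the company's own servers

    def out_of_band_channels(self) -> List[str]:
        """Return the channels that do not depend on the office or internal systems."""
        channels = []
        if self.cell_phone:
            channels.append("cell")
        if self.home_phone:
            channels.append("home")
        if self.alt_email:
            channels.append("alt_email")
        return channels


def unreachable_in_disaster(roster: List[Contact]) -> List[str]:
    """Names of people who could only be reached through office-dependent channels."""
    return [c.name for c in roster if not c.out_of_band_channels()]


# Hypothetical sample roster.
roster = [
    Contact("A. Rivera", "CIO", work_phone="x100", cell_phone="555-0101"),
    Contact("B. Chen", "Payroll lead", work_phone="x200"),
    Contact("C. Okafor", "Facilities", home_phone="555-0199",
            alt_email="c.okafor@example.org"),
]

print(unreachable_in_disaster(roster))  # ['B. Chen']
```

Running a check like this whenever the roster changes is one way to catch gaps before a disaster does.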

Updated Organizational Charts

While having an up-to-date org chart is great to include within a BCP, it is equally, or perhaps even more, important to have alternate versions of these charts in case someone is not available. We may not want to think about it, but the possibility of losing someone in the disaster itself is not far-fetched. And since the key function of the BCP is to maintain business processes, we need to know exactly whom to contact if someone is unavailable. The last thing we need at times like these is staff arguing over, or worse, not knowing, who will make certain key decisions. Having alternate org charts prepared and ready is critical to ensuring that recovery personnel have the information they need to proceed.
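An alternate org chart boils down to an ordered list of who can make each key decision. The Python sketch below is a hypothetical illustration; the decision names, roles, and the `decision_maker` helper are assumptions, not part of any particular BCP. It walks each succession list and returns the first person who is available.

```python
from typing import Dict, List, Optional, Set

# Hypothetical "alternate org chart": for each key decision, the ordered list
# of roles authorized to make it (primary first, then alternates).
SUCCESSION: Dict[str, List[str]] = {
    "declare_disaster": ["CEO", "COO", "CFO"],
    "approve_emergency_spend": ["CFO", "Controller", "COO"],
    "activate_dr_site": ["CIO", "IT Director", "Infrastructure Manager"],
}


def decision_maker(decision: str, unavailable: Set[str]) -> Optional[str]:
    """Return the first available person in the succession list, or None if nobody is."""
    for person in SUCCESSION.get(decision, []):
        if person not in unavailable:
            return person
    return None


# Example: the CEO and CIO cannot be reached.
missing = {"CEO", "CIO"}
print(decision_maker("declare_disaster", missing))  # COO
print(decision_maker("activate_dr_site", missing))  # IT Director
```

A `None` result is useful in its own right: it means every listed alternate is unavailable and the chart needs another name.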

These four items are just the tip of the iceberg when it comes to properly crafting a BCP. Paper records, backup locations, insurance contacts, emergency contacts, vendor contacts, payroll, banking: essentially every single aspect of our business needs a Plan B to ensure that you have an effective, holistic, and, most importantly, successful Business Continuity Plan in place. While we as IT professionals might not find these things as “sexy” as implementing SAN replication and metro clusters, the fact is that we are often called upon when businesses begin their planning around BC and DR. That’s not to say that BC is an IT function, because it most certainly is not. But because of our major role in the technical portion of it, we need to be able to push BC back onto other departments and organizations to ensure not only that the lights are on, but that there are people working under them as well.

I’d love to hear from those of you who do have a successful BCP in place. Was it driven by IT to begin with, or was IT just called upon for a portion of it? How detailed (or not) is your plan? Is it simply, “Employees shall report to a certain location,” or does it go as far as prioritizing which employees gain access? What else might you have in your plan that isn’t covered here? If you don’t have a plan, why not? Budget? Time? Resources?

Thank you so much for all of the recent comments on the first two articles. Let's keep this conversation going!

23 Comments

DR communication is luckily one of our strongest areas at a large pediatric trauma center. We have IS plans that use internal systems with HA as well as externally hosted methods (cell and Slack). These roll up to organization-wide plans, which include Everbridge and are updated on a regular basis, and the organization plan rolls over to regional DR and even Homeland Security. Drills are run at all levels on a regular basis, some on paper, some mock-scenario style. On multiple occasions I have been informed that I was to report to a DR site that morning, or to stay home and not reply to anyone.

Here, too, we're networking in health care, and you never know what might cause a change in networking needs.  For example:

  • Some years ago a train with many cars carrying benzene derailed into a local river. You could see the clouds of gas rising from ten miles away. The two towns of Superior, Wisconsin and Duluth, Minnesota were told to evacuate. 50,000 people headed out on a workday morning, and those who could worked from home.
  • We have a small international airport nearby, with commercial flights and an Air National Guard base sharing the skies. It's not impossible to imagine a disaster involving one or more jets in trouble right over town. I design redundant data centers and above-ground and below-ground legs between Access, Distribution, and Core systems to help ensure that potential fiber "jet-crash-fade" can't take our hospitals out of production at any single point.
  • Last night we had a good bit of snow drop on Duluth/Superior, and Management gave the OK for some folks to work from home.  Others had to stay home and work from there when their daycare providers closed due to the weather.  Still more are working from home dealing with sickness.

In these cases, we can quickly ramp up Citrix remote sessions from our typical 12,000 to 21,250.  Our DIA has overflow burst capability, and our AnyConnect DHCP scopes can easily be built out to scale to whatever the need is.  And we saw that need today when we filled our DHCP capabilities and had to roll out a /22 to accept more unexpected remote sessions from home.
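For anyone sizing remote-access pools, the arithmetic behind moving to a /22 is simple: an IPv4 /n subnet has 2^(32 - n) addresses, two of which (network and broadcast) can't be handed out. Below is a minimal Python sketch using the standard `ipaddress` module; the 10.20.0.0 prefixes and the `scopes_needed` helper are hypothetical, and a real DHCP pool usually gives up a few more addresses to the gateway and any exclusions.

```python
import ipaddress
import math


def usable_hosts(cidr: str) -> int:
    """Assignable IPv4 addresses in a subnet: total minus network and broadcast."""
    net = ipaddress.ip_network(cidr)
    if net.prefixlen > 30:
        raise ValueError("/31 and /32 are special cases; not handled in this sketch")
    return net.num_addresses - 2


def scopes_needed(extra_clients: int, prefixlen: int = 22) -> int:
    """How many additional scopes of the given prefix length would cover extra_clients."""
    per_scope = 2 ** (32 - prefixlen) - 2
    return math.ceil(extra_clients / per_scope)


# Hypothetical prefixes, just to compare sizes.
for cidr in ("10.20.0.0/24", "10.20.0.0/23", "10.20.0.0/22"):
    print(cidr, usable_hosts(cidr))
# 10.20.0.0/24 254
# 10.20.0.0/23 510
# 10.20.0.0/22 1022

print(scopes_needed(3000))  # 3 (3000 / 1022, rounded up)
```

In other words, each /22 added to a pool buys roughly a thousand more concurrent leases, four times what a /24 provides.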

Knowing what the future MIGHT hold is key to helping Management understand what's possible and what's not.

When Management keeps IT in the loop for decisions, and doesn't make assumptions about IT's capacity, that helps make the whole organization work the way Management expects.

MVP

Good article

Level 14

A few years ago I briefly joined a major financial org as an IT contractor. Their DR plan was excellent (as you would expect). They even had arrangements to hire PCs at a moment's notice so staff could keep working. Unfortunately, they had forgotten that these PCs and staff had to sit somewhere. They didn't have any alternative office space arranged.

Level 11

Very cool, and good for your org! We will have a post soon about testing, but in all honesty I never thought of testing the complete BCP. It's good to see orgs actually carrying out those tasks!

Level 11

Great points! Having clear communication between management and IT certainly mitigates any surprises that may pop up at the worst of times. As for the snow comment, man, I'm sick of it. Ready for summer!

Level 11

Yeah, for some strange reason that is one thing that tends to be forgotten! Were they in a disaster situation when this was discovered, or was it something you caught beforehand?

Level 13

Some great points about things not thought about until a DR event.

Level 14

I realised it about 5 minutes into their presentation about the great DR solution they had.  The noise of the bubble bursting was quite loud.  

Level 13

Nice Article

Level 20

At least it wasn't a real disaster!

Level 20

DR/BC isn't easy; it's pretty hard, and testing it is even harder. When it comes down to it, it's a question of how much you are willing to invest to have HA.

Level 14

You could just ignore it and collect your P45 (get fired, for our non-UK readers) when it all goes wrong. I know some people who are doing this.

"BCP: Just a bunch of little DR plans!"

Not sure I agree with this. BCP is more than IT. It's a company-wide preparedness strategy that covers more than just the technical side: staffing, communication, public relations, operations & facilities, and so much more. While IT plays a major role in DR/BCP, your non-IT business units have to have their procedures documented. Otherwise, the IT department will have a bunch of running servers with no users or transactions engaging them.

Level 14

It is just a bunch of DR plans.  It's just that some of them have nothing to do with IT itself.  You are still planning what to do in the event of a disaster.  It is just that the IT department is where everyone turns to when sh1t happens no matter what that sh1t is.  IT people tend to be more logical (usually) and capable of thinking things through.  I guess it has to do with the mindset you need to survive in IT.  Most 'business' people tend to wave their arms in the air and run around in panic when bad stuff happens.  Probably comes with all that 'strategic' thinking and not enough of the 'tactical' thinking.

MVP

Plans are one thing... but can you carry them out beyond the theory?

You have to practice them to get the bugs out. Yes, Murphy will still throw a wrench, or even a whole toolbox, into the works, but without practice and validation that the plan is workable, it is nothing but a check in the box.

Do you need all the servers up in a DR environment? Probably not. Thus the environment may not look the same.

What about monitoring? Will everything look the same for monitoring? Depends...

A bunch of little plans is a good start, but you have to fit them all together and take into account all the different remote sites as well. Without practice you will never know whether the puzzle pieces are all there and whether they all fit.

What Jfrazier said is "the word." If you don't recognize it, or don't trust a "strange" Network Engineer's word, then trust the experience of a fireman and Emergency Medical Responder; he's all of that, and more.

Building your DR plans is a good first step, but that step is of limited worth until you have everyone practice DR and get their input to refine it, to make it better.

Now document it with the improvements you discovered on your practice runs.

Don't quit there; use the information in your DR plan to reduce or eliminate events you can logically predict or infer from the DR practices.

And keep your DR information where people can reach it if their network/switch/PC/TC/WAN/LAN/server/power is down or unavailable. Print it out if necessary, and keep it where everyone knows where it is.

And then practice it.  Again.  And again.

MVP

rschroeder, not an EMT yet... I'm taking the class to upgrade from EMR (Emergency Medical Responder).

Level 14

Not sure I would trust a firefighter. I used to work for the London Fire Brigade. Most of them couldn't be trusted to tie their own shoelaces. Still, I wouldn't trade places with any of them. Bloody heroes.

Thanks for the clarification; I modified the comment to accommodate the updated info.

When it comes to knowing, and experiencing, disasters and their recoveries, I put my life in the hands of our firefighters. I've yet to find one who wasn't serious about saving a life.

And there's no better way of saving someone's life than being prepared, and ensuring others are prepared and can avoid bad situations, or knowing what to do once the bad situation has been experienced.

"The Word" always reminds me of... (edited)

Two notes of the chord, that's our full scope

But to reach the chord is our life's hope

And to name the chord is important to some

So they give a word and the word is Om

Level 21

I think one of the unfortunate realities of BC/DR is that companies often put the burden of it on IT and expect IT to put it in place, and that will always fail. As has been pointed out here, BC/DR is much larger than just IT, and it needs to be treated as such, with all necessary stakeholders identified and equally involved. If BC/DR is owned by IT, it's doomed to fail.