
Having and Testing Backups Is Key to Running Systems

Level 9

The title of this post says it all. In IT, we’ve said for years you need to take and test backups regularly to ensure your systems are being backed up correctly. There’s an adage in IT: “The only good backup is a tested backup.”

Why Do You Need Backups?

I bring this up because people come to our consulting firm after some catastrophic event has occurred, and the first thing I ask for is the backups. Then the conversation usually gets awkward as they try to explain why there are no backups available. The reasons run the gamut from “we forgot to set up backups” to “the backup system ran out of space” to “the backup system failed months ago, and no one fixed it.”

Whatever the reason, the result is the same: there are no backups, there’s a critical failure, and no one wants to explain to the boss why they can’t recover the system. The system is down, and the normal recovery options aren’t available for one reason or another. In these cases, when there’s no backup available, the question becomes, “How critical is this to your business staying operational?” If it’s a truly critical part of your infrastructure, then the backups should be just as critical to the infrastructure as the system is. Those backups need to be taken and then tested to ensure the backup solution meets the needs of the company (and that the backups are being taken).

When planning the backups for a key system, the business owners need to be involved in setting the backup and recovery policies; after all, they’re responsible for the data. In IT terms, these policies are expressed as the Recovery Point Objective (RPO) and the Recovery Time Objective (RTO). In layman’s terms, these are how much data the organization can afford to lose and how long it can afford to wait for the system to come back online. These numbers can be anything the business needs them to be, but the smaller they are, the higher the cost of meeting them. If the business wants an RPO and RTO of 0, and is willing to pay for it, then it’s IT’s job to complete the request even if we don’t agree with it. And that means running test restores of those backups frequently, perhaps very frequently, as we need to ensure the backups we’re taking of the system are working.
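To make the two objectives concrete, here is a minimal sketch in Python of how they could be checked. The function names `meets_rpo` and `meets_rto` are hypothetical, and the sketch assumes you already know when the last good backup finished and how long your most recent test restore took.

```python
from datetime import datetime, timedelta


def meets_rpo(last_backup: datetime, now: datetime, rpo: timedelta) -> bool:
    """True if a failure right now would lose no more than `rpo` worth of data.

    The data at risk is everything written since the last good backup.
    """
    return now - last_backup <= rpo


def meets_rto(restore_duration: timedelta, rto: timedelta) -> bool:
    """True if the measured restore time fits inside the agreed RTO."""
    return restore_duration <= rto


if __name__ == "__main__":
    # Illustrative numbers only: a 1-hour RPO and a 4-hour RTO.
    last_backup = datetime(2024, 1, 1, 12, 0)
    now = datetime(2024, 1, 1, 12, 30)
    print(meets_rpo(last_backup, now, timedelta(hours=1)))
    print(meets_rto(timedelta(hours=6), timedelta(hours=4)))
```

Note that the RTO check only means something if the restore duration comes from an actual test restore, not from an estimate.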

Why Is It Important to Test Backups?

Testing backups should be done no matter what kind of backups are being taken. If you think you can skip test restores because the backup platform reports the backup was successful, then you’re failing at taking backups. One of the mantras of IT is “trust but verify.” We trust that the backup software did the backup it said it was going to do, but we verify it by testing the restore. If there’s a problem where the backup can’t be restored, it’s much better to find out during a test restore than when you need to restore the production system. If you wait until the production system needs to be restored to find out the backup failed, there’s going to be a lot of explaining to do as to why the system can’t be restored and what sort of impact that might have on the business, up to and including the company going out of business.
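As a rough illustration of “trust but verify,” the sketch below restores an archive into a scratch directory and compares every restored file against checksums recorded when the backup was taken. It assumes the backup is a plain tar archive and that you saved SHA-256 checksums at backup time; the helper names (`sha256_of`, `checksums`, `verify_restore`) are made up for this example, and a real backup product will have its own restore tooling.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksums(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(archive: Path, expected: dict[str, str]) -> list[str]:
    """Restore the archive to a scratch directory and return the names
    of files that are missing or whose contents differ from `expected`.
    An empty list means the test restore passed."""
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive) as tar:
            tar.extractall(scratch)
        restored = checksums(Path(scratch))
    return sorted(
        name for name, digest in expected.items()
        if restored.get(name) != digest
    )
```

Restoring to a scratch location rather than over the live system is the point of the exercise: the check proves the backup can be read back without putting production at risk.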

16 Comments
Level 13

Thanks for the article.

Level 14

Thanks for the article!  Good thing to point out about actually testing the backups instead of blindly trusting them. 

Level 15

Great posting. It is strange how many times in my IT career the question "When was the last time the backups were tested?" has been met with an awkward stare at the ceiling. I was taught eons ago that the first thing you do when you start the day is test restore some files from random points in the backup set. Then, take the tape offsite. Of course, modern disk-to-disk backups make offsite tape storage obsolete, but the mantra is still the same. Test restore files from the backup set, and then at least monthly, test restore an entire server. This has become a much easier process with virtual machines, but it is nonetheless a critical step.

Level 13

So True.

Level 16

Thanks for the write up.

Level 9

Trust, but verify. It's annoying but worth doing.

Level 9

Thankfully we have VMs now. I remember the days of needing physical servers to do test restores to. Those were always interesting budget meetings. We need to buy a new server to make sure the restores work?

Level 11

Having backups that are not tested is like having a DR plan that is never tested. Both are great to have but useless unless tested.

Level 15

Yes, and I always used the argument: how much will it cost the business to not ensure working backups? Then I usually got my budget. If anyone has had the joys of being audited (SOC, etc.), backup and restore is a crucial item.

Level 13

Great post.  Over the years (nearly 30 in my case) this has been the hardest lesson to bake in to IT professionals.  Backups are critical, and must be tested regularly (I insist on at least weekly from a variety of sources), and you have to constantly monitor those processes and procedures without growing bored with the process or complacent.  If you don't get this right at some point it will bite you and it may cost you your job, and might even bring the company/enterprise down depending on how bad it is.

We have audits that seem like practically every month and they all want to see how we track and log backups.  I have a monthly task that is generated via our helpdesk system and I test and log local restores (on the appliance), hot copy restores (copied offsite), and cold copy restores (hot copies copied off to disk) and log them on a monthly basis.  I have surprised quite a few auditors that I have all the bases covered.  Always test restores!!!

Level 16

Your backup is only as good as your last restore.

Level 12

The strange thing is that we are still having these conversations.  There have been multiple news stories about companies that suffered because they didn't have backups, and major events - storms, earthquakes, human actions - that wrecked large areas.  One would think that this was enough to get it through to the planners that backups are absolutely essential to business!

Ah well, don't stop believin', hold on to that feelin'.  Maybe someday it'll get through...

Level 12

A site that had only one IT person for a long time grew to the point where they needed a second, and I was hired. They acquired a new server and I told the guy I was working with that I was going to do a test restore before setting up the server. Nobody here will be surprised, but he was utterly shocked when the restore failed. He couldn't understand that it would fail or why it failed. And the backup jobs were run under someone's domain account, so if that person left or their account was suspended the jobs would not execute.

The company we worked for had a disaster recovery plan and a warm backup site 550 miles away. They had never properly tested the DR plan and labeled what they had as good enough, so when they had a mission-critical system go down and needed to fail over to the warm site, nothing worked. Even more fun: the useless boss flew out that afternoon but the admins who could actually do something about it were told to drive seven hours to the backup site! It took them nearly 48 hours to restore functionality, and by then the primary site was mostly functional.

Level 20

Backups are one of my least favorite technologies but it is important.

Level 11

Thanks for the post, very much agree.

About the Author
I now have responsibility for all networking and telephony at Purdue University. Previously I have worked at University of Washington, Alcatel-Lucent, Fiserv, ETI, and University of Salford.