
World Backup Day 2020: Breaking Out of Backup Traps and Being a Recovery Hero

Community Manager

That special day of the year is almost here—the one when we remind you (and the world) how important it is to keep up with your backup strategy and make sure it’s up to speed and effective (yay!). 

 

We all know the heartbreaking feeling of losing valuable data and information (and sometimes it hurts more than a breakup). Backing up your data is critical to your business, but a backup can also quickly get complicated. Hardware failures, cyberattacks, natural disasters, and human error are just a few of the things that can get in your way when trying to keep up.

 

In honor of World Backup Day (March 31), we want to hear about your backup headaches and success stories—tell us about those moments of slow data transfers and hardware breaks, and times when a successful backup was a hero in a recovery situation. 

 

We know you’re as excited as we are to celebrate and emphasize the importance of a backup strategy, so share your thoughts by March 11 and we’ll put 250 THWACK points in your account!

14 Comments

Perhaps the best backup support experience I've ever had was back in the 1990s when I was first exposed to a small HP DAT cartridge array. I purchased it to back up a firewall on a daily basis, and it would automatically rotate through a small number (6? 8?) of DAT tapes, and it worked well.

Until the day it didn't.

Fortunately I wasn't relying on it for a restore, but I could see it had errors, and I had no technology documentation or training for it--one of those public school environments where there was a need to do the job right, but no budget for the equipment or training or additional support contract.

I called HP's support number that was printed on a label on the DAT library and was immediately in touch with an engineer who knew this technology intimately. I don't mean a Help Desk person who had experience with it; I'm talking about the actual engineer who designed it and who'd written its operating code. Talk about competence on a land line! I was in heaven.

This fellow was not only an uber-geek, but he had social communication skills that would make a hostage negotiator humble.  

It's a tech support experience I don't think I'll ever forget, and he was able to remotely use my eyes and hands to diagnose the issues and correct them quickly and efficiently.  I was grateful!

It's a lesson I remember each time I pick up the phone to answer a call from someone with a challenge.  I wish there were more magic phone numbers that put me in touch with the root wizard for any system I'm working on.  This guy was the best!

Level 14

I am sure I shared this a while ago, but it bears repeating. The euro conversion, which took place over the year end 1998-1999 (who planned this before Y2K?), required us to recalculate the mutual fund portfolios managed at the firm where I worked. The software vendor for the book of record system gave us a set of steps to follow. We were at the next to last step (waiting for the final country-to-euro conversion values) before we could continue. On the off chance we needed to fix something, I had my team snap another backup of the system "just in case".

A few hours later, the data came in and we were clear to apply the conversion rates and do the recalculation. 45 minutes later we got a panic call that there was something wrong with those rates and we would have to start again!!! From scratch (picture 8-10 hours). All except for our firm.... We restored the backup, applied the new "validated" rates, and 2-3 hours later it was adult beverage time!

I heard horror stories from my peers at other firms that used the software package about recovering. When asked why I did the backup,  my comment was "stuff happens" and "Murphy lives!". 

Backups are the life preserver of IT!

Level 10

@rschroeder Totally agree, HP Hardware and Software Support kept the bar very high. I often compare our daily support tickets with them.

Level 20

We switched to a new backup software called Commvault a few years ago. It has more knobs, buttons, and switches on it than any software I've EVER seen! It's like the developers are programmers out of control and keep adding features until there's 4 ways to do the same thing. It's like a labyrinth of different ways to use it! Another thing... trying to get it to let go of data so you can actually really delete it took a support call. It's one of those applications that sits high in the Magic Quadrant, but it's not simple at all.

B

Level 14

We've got Commvault too. I really really hate it and am looking for a suitable replacement. Zerto provides a great system for VM server replication (and backups in v7.5). We already use Zerto for DR and BC so I am looking at the new backup features.

Level 7

Best backup/recovery day was when one of our new shiny IBM host servers (other manufacturers also go wrong) decided to melt down overnight. I was first in at 6 am to do the morning checks, and Nagios had so much red on it, and enough dead systems, to look like the last battle in a John Wick movie.

Fortunately the backup for that server completed about 20 minutes before it failed, so with some creative space saving we were able to restore everything.

 

My worst experience was a week later when the second new IBM server bit the dust, this time partway through the backup, and after we had used all the easy spare space on the previous week's issue.

Around six years ago, when I was an IT consultant, I received a call from a client asking if I could drop what I was doing and stop over because they were having some server issues. When I arrived, I found out that the night before someone had accidentally left the water running in the janitor's closet that was right above the server room (this is a backup discussion, not a best practices discussion), and the water had made a pretty waterfall down the server rack and over the servers. After shutting down the waterfall and figuring out what was water damaged, I asked about their disaster recovery planning... to which the answer was "Call Dave"... Oh, and the server that was hosed: the company's Exchange server. So what in the heck did I do?

We needed hardware - if you have any kind of disaster recovery or have done any kind of planning to successfully come back from a disaster you need something to either recover to OR have spun up someplace and syncing data.  They had neither of these because their disaster recovery plan was to Call Dave.  At the time I was using Ingram Micro as a vendor and made some calls and luckily they had the hardware in their depot in Chicago so I was able to send someone to pick up the new servers there.  Hardware complete after a few phone calls.  

Next up: complete Exchange recovery from backup, which at the time freaked me out, but from this experience I learned one thing: Exchange is really rock solid and Exchange recovery was amazingly easy. Once the hardware was set up, it took a few commands and switches to run the Exchange restoration process. I was actually amazed at how easy it was.

The whole process from start to finish was probably 36 hours or so. They also had a Barracuda SPAM firewall at the time so all the email was queued up there and once the server came back up all the email was delivered. The only gotcha that happened was the company's network admin failed to tell me that he had removed Public Folders from the backup. So those did not come back and eventually was the scapegoat as to why those were not backed up. I was the hero for a little bit.

It was quite the learning experience and after this happened I ended up meeting with the rest of my clients to make sure they understood that a disaster recovery plan is not "call your support guy" 

If you need more stories, I also had a client that got hit by ransomware, and I was able to get them back up and running by recovering 2 servers from the previous night's backup.  😀

 

Level 12

Nothing jazzy or exciting for my story. Just a recovery job for our email system, in the 2000s. It was a restore gone bad.

One of our teammates was not familiar with restoring to a different location. The files were for the VP, so imagine our surprise when emails were missing because files were being overwritten.

Not the best way to learn how to use your backup system. Lost a member of our team that week.

MVP

I have a good one..... We have a NetApp filer managing our storage. I wanted to find that special "air gap" solution (an off-site backup copy of the data) and then mount it with an antivirus solution to add another layer to the mix. I had vendors out there attempting to design a solution for me. I finally asked: is there anyone else trying to do what I am trying to do???? NO... What am I going to do with another copy of my data? I have almost 4 copies of my data: 1 at a DR site, the live data with snapshots, a monthly SnapVault, and a 3rd that the SQL administrator insists on running.

I need to work on a better offense; I have a great defense for my systems. I have great data that can only be written to by NetApp itself, and I use SnapVault, SnapMirror, and Veeam. I need to work on protecting my systems before they get hit rather than archiving another copy of the data.

Some steps I am working on:

2FA for all the servers

Palo Alto Traps

SolarWinds ARM to complement SEM

NetApp vServer install w/ TrendMicro or Sophos

MVP

I guess I was a little off topic.. the burn is still really fresh!!

Your backups are only as good as your last restore!

Level 10

If backup solutions are extremely time consuming and a chore to use, you're not doing it right.

Ideally backups should only be used in case of an emergency; however, sometimes due to user error, someone will accidentally delete an important document or something that is crucial. Telling the customer that he should be more careful, but we're not backing up the data because it would take too long, seems kind of unacceptable... If recovering data is an issue, it's not the customer's fault... it's just exposing a problem with the existing recovery solution.

No real story or moral, just kind of venting, since I've seen system admins taking an unsympathetic attitude towards the users.

It worries me how people still are not backing up cloud services like OneDrive and other O365 items like email. I am also surprised that so many companies stick with the cheapest solution when they want to protect their most valuable asset, their data. Backups are required for everything, and just because you have never had an issue doesn't mean you won't. The way most management perceives backups is appalling.

 

I celebrate a day which reminds us all to back up by testing my backups. I take Commvault, Veeam, and Nakivo and put them to the test. No, I do not trust a single solution. You test backups by how well you can recover. So we test recovery and look at the speed and accuracy of the recovery. Ah, what fun it is.
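For anyone who wants to script that kind of restore test, here is a minimal sketch in Python. It is not tied to Commvault, Veeam, or Nakivo in any way: `restore_fn` is a hypothetical stand-in for whatever actually performs the restore, and `timed_restore` and `compare_trees` are names made up for this example. The idea is simply to time the restore (speed) and then compare SHA-256 checksums of the original and restored files (accuracy).

```python
import hashlib
import time
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def timed_restore(restore_fn, *args) -> float:
    """Run a restore callable (hypothetical stand-in) and return elapsed seconds."""
    start = time.perf_counter()
    restore_fn(*args)
    return time.perf_counter() - start


def compare_trees(source: Path, restored: Path) -> dict:
    """Compare two directory trees file by file using checksums."""
    src = {p.relative_to(source): p for p in source.rglob("*") if p.is_file()}
    dst = {p.relative_to(restored): p for p in restored.rglob("*") if p.is_file()}
    common = src.keys() & dst.keys()
    return {
        # Files in the source that never came back from the restore.
        "missing": sorted(r.as_posix() for r in src.keys() - dst.keys()),
        # Files in the restore that were not in the source.
        "extra": sorted(r.as_posix() for r in dst.keys() - src.keys()),
        # Files present on both sides whose contents differ.
        "mismatched": sorted(
            r.as_posix() for r in common
            if file_digest(src[r]) != file_digest(dst[r])
        ),
    }
```

A test run would call `timed_restore` with your own restore routine, then `compare_trees(source_dir, restored_dir)`; three empty lists mean the restored copy matches the source. This only works when the source has not changed since the backup was taken, so it fits best with a static test data set.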

 

Level 9

We use Veritas Backup Exec.

Level 9

In the early 90s, I was a CNE working for a Fortune 500 company in Silicon Valley. We'd recently taken over the support contract and hadn't fully explored/comprehended the scope of the infrastructure yet. The company relied on cc:Mail, and one of the Gateway/Router systems failed, causing a frantic call for action because business processes were interrupted.

It was one of those situations where I didn't know what I didn't know. But, the system had to be returned to service quickly. I poked around the computer and noticed that the disk was reported "missing" during POST and it seemed that a rebuild (no backups or documentation available) was imminent. In desperation I removed the hard drive and gave it a good shaking, installed it, and crossed my fingers. The computer "saw" the disk during start-up and mail began to flow again.

Now I think more about business processes and how service interruptions should be evaluated for their impact and mitigated. Can the business pause while the staff scrambles to react? Is a backup/restore the only option for a system? Maybe the application/system support staff has some insights into how a recovery can meet RPO/RTO? I'm sure I'm opening some old wounds for people on this thread and not really pitching in a lighthearted backup/restore story.

But I think backups are a useful tool and they should be one resource in the SYSADMIN tool kit. Backup wise, it's all fun and games until a restore is needed.
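To make the RPO/RTO question concrete, here is a small Python sketch, assuming the usual definitions: the recovery point objective (RPO) is the maximum acceptable window of lost data, and the recovery time objective (RTO) is the maximum acceptable downtime. The function name `evaluate_recovery` and the timestamps in the usage note are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryResult:
    data_loss: timedelta   # time between the last good backup and the failure
    downtime: timedelta    # time between the failure and service being restored
    meets_rpo: bool
    meets_rto: bool


def evaluate_recovery(last_backup: datetime, failure: datetime,
                      service_restored: datetime,
                      rpo: timedelta, rto: timedelta) -> RecoveryResult:
    """Check one incident (real or rehearsed) against RPO and RTO targets."""
    data_loss = failure - last_backup
    downtime = service_restored - failure
    return RecoveryResult(
        data_loss=data_loss,
        downtime=downtime,
        meets_rpo=data_loss <= rpo,
        meets_rto=downtime <= rto,
    )
```

As an illustration with made-up numbers: a backup that finished at 05:40, a failure at 06:00, and service back at 09:00 means 20 minutes of lost data and 3 hours of downtime. Against a 1-hour RPO and a 2-hour RTO, that recovery meets the RPO but misses the RTO, which is the kind of gap worth discussing with the application owners before a real outage.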