Database Backups (& Restores) Best Practices

As an accidental DBA, I've always been a fan of doing regular backups, but that's because I like to avoid "Resume Generating Events."

I've been less good about doing regular restore tests.
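
For context, the kind of restore test I mean is roughly the T-SQL below - the paths, database name, and logical file names are just placeholders you'd swap for your own.

```sql
-- Rough restore-test sketch; all names and paths below are placeholders.
-- VERIFYONLY confirms the backup media is readable and its checksums are intact...
RESTORE VERIFYONLY
FROM DISK = N'\\backupshare\sql\SalesDB_FULL.bak'
WITH CHECKSUM;

-- ...but the only real test is an actual restore, e.g. to a scratch copy on a test instance.
RESTORE DATABASE [SalesDB_RestoreTest]
FROM DISK = N'\\backupshare\sql\SalesDB_FULL.bak'
WITH MOVE N'SalesDB'     TO N'D:\RestoreTest\SalesDB.mdf',
     MOVE N'SalesDB_log' TO N'D:\RestoreTest\SalesDB_log.ldf',
     REPLACE, STATS = 10;

-- Then confirm the restored copy is actually usable.
DBCC CHECKDB (N'SalesDB_RestoreTest') WITH NO_INFOMSGS;
```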

Here's a series of open questions for the Database Divas out there:

  • What would be your ideal backup/restore strategy?
  • What's your organization's backup/restore strategy?
  • Is there a delta between your ideal strategy and your org's actual one?  If so, why?
  • How do you choose a backup/restore policy for a high-transaction (think OLTP) database compared to a "normal" or "low-use" database?

I'm anxious to hear from everyone.

  • This is SQL Server specific.  

    Type

    Native backups rather than VSS snapshot-based backups, unless the database is too large for native backups to be practical.

    Schedule

    Weekly FULL, Daily DIFF on the other 6 days, HOURLY Transaction Logs

    Target

    S3/Azure Blob if in the public cloud.  If on-prem, written to a local file share (usually off-server) and synced to S3/Azure Blob ASAP.  I like that target because you can then use policies to manage retention, hot vs. cold tiers, etc.  And it's CHEAP compared to disk.
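
    Roughly, that schedule and target translate into the T-SQL below.  This is just a sketch - the storage account, container, and database names are placeholders, and it assumes a SAS-based credential named after the container URL already exists for backup-to-URL.

    ```sql
    -- Sketch only; <account>, the container, and the database name are placeholders.
    -- Assumes: CREATE CREDENTIAL [https://<account>.blob.core.windows.net/sqlbackups]
    --          WITH IDENTITY = 'SHARED ACCESS SIGNATURE', SECRET = '<sas-token>';

    -- Weekly FULL
    BACKUP DATABASE [SalesDB]
    TO URL = N'https://<account>.blob.core.windows.net/sqlbackups/SalesDB_FULL.bak'
    WITH COMPRESSION, CHECKSUM, STATS = 10;

    -- Daily DIFF on the other six days
    BACKUP DATABASE [SalesDB]
    TO URL = N'https://<account>.blob.core.windows.net/sqlbackups/SalesDB_DIFF.bak'
    WITH DIFFERENTIAL, COMPRESSION, CHECKSUM;

    -- Hourly transaction log
    BACKUP LOG [SalesDB]
    TO URL = N'https://<account>.blob.core.windows.net/sqlbackups/SalesDB_LOG.trn'
    WITH COMPRESSION, CHECKSUM;
    ```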

  • I'm a huge fan of native backups; they just make things easier all around.  But... we use Managed Backups on our Azure IaaS servers, which does use BACKUP DATABASE under the hood, just with automatic scheduling.  Aside from it making SQL Server Agent crash after 10 days if you have too many databases, and the lack of a differential backup option, it works pretty well (enabling it per database is sketched at the end of this post).  We run 200+ TB per week using it, on databases ranging from 6 GB to 10+ TB.

    I have the SQL Server VSS Writer disabled, just so no one sneaks in any backups that way =)

    And the strategy for the company matches mine 100% since I wrote it!
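
    For reference, enabling Managed Backup for a database looks roughly like this - a sketch only, where the container URL, database name, and retention value are placeholders rather than our actual settings, and the SAS credential for the container is assumed to exist already.

    ```sql
    -- Sketch only: container URL, database name, and retention are placeholders.
    USE msdb;
    GO
    EXEC managed_backup.sp_backup_config_basic
         @enable_backup  = 1,
         @database_name  = N'SalesDB',
         @container_url  = N'https://<account>.blob.core.windows.net/sqlbackups',
         @retention_days = 30;
    ```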

  • An ideal backup strategy is kind of tricky...  This is one of those 'It Depends' topics - budget, sensitivity, traffic, etc. all play into what's ideal for a given system.  My first step is to try to get RTO/RPO numbers from management for a system before I start making recommendations.  Most of the time I end up with something very close to what RandyKnight recommended.  After the general strategy is outlined, I start to tweak based on the amount of traffic a system sees.  For instance, I have a couple of prod DBs that technically have a 1-2 hour RPO, but are active enough that we're doing log backups every 15 min.  Keeping the log backups that frequent has a few benefits - minimizing per-backup time & network traffic, keeping log file size down, increasing the chance that we'll catch backup errors early, etc.  (A quick msdb check for databases drifting past their RPO is sketched at the end of this post.)

    There is something to be said for 3rd-party backup solutions as well.  Keeping track of backups instance by instance can be daunting when you've got lots of instances; a single pane of glass from something like CommVault/Veeam/Cohesity can simplify a lot while also tying in with system-level backups.
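
    One cheap way to spot backups drifting past an RPO, whatever tool actually takes them, is to query the backup history in msdb.  A rough sketch - the 2-hour threshold is just an example value:

    ```sql
    -- Sketch: flag databases whose newest backup (of any type) is older than an example 2-hour threshold.
    SELECT d.name                    AS database_name,
           MAX(b.backup_finish_date) AS last_backup_finish
    FROM sys.databases AS d
    LEFT JOIN msdb.dbo.backupset AS b
           ON b.database_name = d.name
    WHERE d.name <> N'tempdb'
    GROUP BY d.name
    HAVING MAX(b.backup_finish_date) IS NULL
        OR MAX(b.backup_finish_date) < DATEADD(HOUR, -2, GETDATE());
    ```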

  • And dang, sometimes it is hard to get those RTO/RPO numbers out of management.

    Also, the number of databases on a server factors into how often we do backups.  15-minute intervals when you have 500+ databases on a server (total size a bit shy of 20 TB) get taxing on the system.  It Depends indeed.
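
    For context, a quick way to size up that kind of sprawl on an instance (sys.master_files reports file sizes in 8 KB pages):

    ```sql
    -- Count databases and total allocated file size across the instance.
    SELECT COUNT(DISTINCT database_id)                          AS database_count,
           CAST(SUM(size) * 8.0 / 1024 / 1024 AS DECIMAL(12,1)) AS total_size_gb
    FROM sys.master_files;
    ```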

  • What do you mean?  The RTO/RPO targets are always "Now/absolutely zero data loss," aren't they?  Money is no object... until it is.