There's been a long-standing "discussion" in the world of storage regarding snapshots and backups. Some people say that snapshots can replace backups, while others say that just can't be true. I side with the latter, but the latest industry developments are making me reconsider that stance.
What's a Backup?
A backup isn't just a copy of data. A backup has to be recoverable and reliable, and most snapshots just don't meet those criteria.
What does "recoverable" mean? Backups have to be indexed and searchable by common criteria like date, file name, location, file type, and so on. Ideally, you could also search by less-common criteria like owner, content, or department. But at the very least there should be a file-level index, and most snapshot tools don't even have this. It's hard to expect a block snapshot to include a file index, but most NAS systems don't have one either! That's just not a backup.
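To make the "file-level index" idea concrete, here is a minimal sketch of what such a catalog could look like. It assumes the snapshot is mounted read-only as an ordinary directory tree; the names `IndexEntry`, `build_index`, and `search` are made up for illustration and don't come from any particular backup product.

```python
import os
import tempfile
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass(frozen=True)
class IndexEntry:
    path: str           # path relative to the snapshot root, with "/" separators
    name: str           # file name only
    suffix: str         # lower-cased extension, a rough stand-in for "file type"
    size: int           # size in bytes
    modified: datetime  # last-modified time in UTC


def build_index(snapshot_root):
    """Walk a mounted snapshot and record one entry per file."""
    root = Path(snapshot_root)
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = Path(dirpath) / filename
            info = full.lstat()  # lstat so a dangling symlink doesn't abort the walk
            entries.append(IndexEntry(
                path=full.relative_to(root).as_posix(),
                name=filename,
                suffix=full.suffix.lower(),
                size=info.st_size,
                modified=datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
            ))
    return entries


def search(entries, name_contains=None, suffix=None, modified_after=None):
    """Return entries matching every criterion that was supplied."""
    results = []
    for entry in entries:
        if name_contains is not None and name_contains.lower() not in entry.name.lower():
            continue
        if suffix is not None and entry.suffix != suffix.lower():
            continue
        if modified_after is not None and entry.modified <= modified_after:
            continue
        results.append(entry)
    return results


if __name__ == "__main__":
    # Stand-in for a mounted snapshot: a throwaway directory with two files.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "docs").mkdir()
        (Path(tmp) / "docs" / "report.txt").write_text("quarterly numbers")
        (Path(tmp) / "photo.JPG").write_bytes(b"\xff\xd8")
        index = build_index(tmp)
        for hit in search(index, suffix=".txt"):
            print(hit.path, hit.size, hit.modified.isoformat())
```

A real backup catalog would persist this in a database and keep one index per backup pass so you can search across dates, but even this much is more than most snapshot tools give you.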
Then we have to think about reliability. The whole point of a backup is to protect your data. Snapshots can protect against deletion and corruption, but they don't do much if the datacenter catches fire or a bug corrupts your storage array. And many snapshot systems don't "snap" frequently enough or keep enough copies long enough to protect against corruption for very long. This is why storage nerds like me say "your backup should be on a different codebase and your archive in a different zip code."
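The "frequently enough, long enough" point is really just arithmetic, and a rough sketch makes it easier to reason about. This assumes a simple rotation that takes a snapshot at a fixed interval and keeps only the newest N copies; the function names and the example schedules are hypothetical.

```python
from datetime import timedelta


def guaranteed_window(interval, copies_kept):
    """How far back you can always reach with a fixed-interval rotation.

    Right after a new snapshot is taken, the newest copy has age zero and
    the oldest retained copy is (copies_kept - 1) intervals old, so that
    is the worst-case reach.
    """
    if copies_kept < 1:
        raise ValueError("copies_kept must be at least 1")
    return interval * (copies_kept - 1)


def is_recoverable(time_to_detect, interval, copies_kept):
    """True if a clean snapshot is certain to exist once corruption is noticed."""
    return time_to_detect <= guaranteed_window(interval, copies_kept)


if __name__ == "__main__":
    hourly = timedelta(hours=1)
    daily = timedelta(days=1)
    three_days = timedelta(days=3)

    # 24 hourly snapshots only reach back 23 hours in the worst case.
    print(guaranteed_window(hourly, 24))
    print(is_recoverable(three_days, hourly, 24))
    # 30 daily snapshots comfortably cover corruption noticed after 3 days.
    print(is_recoverable(three_days, daily, 30))
```

The catch is that corruption often goes unnoticed for longer than the rotation covers, and none of this helps if the array holding the snapshots is the thing that fails.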
Then there's the question of management. Most backup systems have "friendly" interfaces to schedule regular backup passes, set retention options, and execute restores. Many years ago, NetApp showed just how friendly a snapshot restore can be, but options for what to back up and when remain pretty scarce. Although backup software isn't known for having the friendliest interface, you usually have lots more options.
But array snapshots can be an important part of a backup environment, and many companies are headed in that direction.
Most of today's best backup products use snapshots as a data source, giving a consistent data set from which to read. And most of these products sport wide-reaching snapshot support, from storage array vendors to logical volume managers. This is one source of irritation when people claim that snapshots have nothing to do with backups - of course they do!
Some snapshot systems also work in concert with data replication solutions, moving data off-site automatically. I've enjoyed the speed boost of ZFS Send/Receive, for example, and have come to rely on it as part of my data protection strategy. This alleviates my "different zip code" concern, but I would prefer a "different codebase" as well. That's one thing I liked at this week's NetApp Insight show: a glimpse of Amazon S3 as a replication target.
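For readers who haven't used it, snapshot-based replication with ZFS boils down to piping `zfs send` into `zfs receive` on another machine. The sketch below only builds the command line rather than running it. The dataset names, snapshot names, and host are placeholders I made up, and it assumes SSH access to a remote system that also runs ZFS.

```python
import shlex


def incremental_send_command(dataset, previous_snap, current_snap):
    """zfs send -i streams only the blocks that changed between two snapshots."""
    return ["zfs", "send", "-i",
            f"{dataset}@{previous_snap}", f"{dataset}@{current_snap}"]


def remote_receive_command(host, target_dataset):
    """zfs receive on the remote side applies the stream to the target dataset."""
    return ["ssh", host, "zfs", "receive", target_dataset]


def replication_pipeline(dataset, previous_snap, current_snap, host, target_dataset):
    """Join both halves into a single shell pipeline string."""
    send = incremental_send_command(dataset, previous_snap, current_snap)
    receive = remote_receive_command(host, target_dataset)
    return f"{shlex.join(send)} | {shlex.join(receive)}"


if __name__ == "__main__":
    # All names here are hypothetical examples.
    print(replication_pipeline(
        dataset="tank/home",
        previous_snap="monday",
        current_snap="tuesday",
        host="backup.example.com",
        target_dataset="vault/home",
    ))
```

Because only the changed blocks cross the wire, this is where the speed boost comes from. Note that both ends are still ZFS, which is exactly why it solves the zip code problem but not the codebase one.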
Then there are the snapshot-integrated "copy data management" products from Catalogic, Actifio, and (soon) NetApp. These index and manage the data, not just the snapshot. And they can do some very cool things besides backup, including test and development support.
Snapshots aren't backups, but they can be a critical part of the backup environment. And, increasingly, companies are leveraging snapshot technology to make better backups.