Backing Up

A while ago I wrote about the scenario where your app crashes. If you live long enough, this will happen to you. It’s almost a guarantee. If it can happen to Google Drive, like it did a few weeks ago, it can and will happen to you.

In that previous post, I briefly mentioned backing up your database as a key component of any rudimentary disaster recovery plan. There are different kinds of database disasters, and it’s important to understand each kind as you evaluate your risk.

Kinds of Disasters

There are two distinct kinds of database disasters: corruption of data and loss of data. While they may sound like the same thing, they’re very different.

Corruption of Data

Corruption of data occurs when data becomes unreadable. Think of it like a page of written words where some of those words have been transformed into illegible characters. Corruption can be caused by failing disks, power loss in the middle of a write, or software bugs, but it’s not nearly as common today as it was even 5-10 years ago. Hardware is far more reliable now, and the software that stores your data is much better at making sure that data is written successfully.

Corruption can also be the result of deliberate sabotage, as happens with malware.

Loss of Data

Loss of data is far more common these days, and it’s also harder to protect against. Data loss might be caused by bad programming (a bug that lets a user delete more than they should), a mistake by your system administrator, or myriad other reasons, including deliberate sabotage or attack.

Backup Strategies

First, you should have a really good reason not to use a managed service for your database. Amazon’s Relational Database Service (RDS), or services like Compose for MongoDB, take nearly all of the headache out of managing a production database, and they provide backup and restore features that are far easier to use than anything you manage yourself.

If, however, you’re running your own database, here are the basics you’ll need to have in place.

Real-time Replication

If your primary database server has a hardware problem, real-time replication can save your bacon. Every time you change data on your “live” server, that same change is replicated to your “standby” server almost immediately. If your live server crashes, you promote the standby server to be your live server and carry on. Every major database platform (MySQL, Postgres, SQL Server) supports real-time replication.
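To make the live/standby idea concrete, here’s a toy, in-memory sketch in Python. The `Server` class and `fail_over` function are hypothetical stand-ins, not any database’s API; real platforms replicate through their own mechanisms, and replication is often asynchronous, so a standby can trail the live server by a moment.

```python
# Toy model of live/standby replication (hypothetical; not a real database API).


class Server:
    """A pretend database server that stores rows as key/value pairs."""

    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.standby = None  # replica that receives a copy of every change
        self.up = True

    def write(self, key, value):
        if not self.up:
            raise RuntimeError(f"{self.name} is down")
        self.rows[key] = value
        # Apply the same change to the standby as soon as it happens.
        if self.standby is not None:
            self.standby.rows[key] = value


def fail_over(live):
    """Promote the standby to be the new live server."""
    promoted = live.standby
    promoted.standby = None  # the new live server has no replica until you add one
    return promoted


live = Server("db-primary")
live.standby = Server("db-standby")
live.write("order:1", "paid")

live.up = False          # simulate a hardware failure on the primary
live = fail_over(live)   # the standby already has order:1
live.write("order:2", "shipped")
print(live.name, sorted(live.rows))  # db-standby ['order:1', 'order:2']
```

Note that after a failover you’re running without a standby until you build a new one, so that step belongs in your recovery plan too.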

Periodic Backups

You’ll also need to run periodic, full backups of your entire database. These are typically done when your system is least used, such as in the middle of the night. There are a number of ways to schedule these backups, depending on the database technology you’re using. If you can’t afford to lose a day’s worth of data, you can run backups more frequently (for example, every 4 hours).
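One way to reason about backup frequency: in the worst case, you lose everything written since the last backup that finished before the failure. The sketch below assumes, for simplicity, that each backup completes instantly at its scheduled time; the function names and dates are made up for illustration.

```python
from datetime import datetime, timedelta


def backup_times(start, interval, count):
    """Return `count` scheduled backup times, spaced `interval` apart."""
    return [start + i * interval for i in range(count)]


def data_at_risk(backups, failure_at):
    """Return (last backup before the failure, span of changes that would be lost)."""
    completed = [b for b in backups if b <= failure_at]
    if not completed:
        raise ValueError("no backup completed before the failure")
    last = max(completed)
    return last, failure_at - last


start = datetime(2024, 1, 1, 0, 0)
nightly = backup_times(start, timedelta(hours=24), 7)
every_4h = backup_times(start, timedelta(hours=4), 42)

failure = datetime(2024, 1, 3, 23, 30)     # a crash late in the evening
print(data_at_risk(nightly, failure)[1])   # 23:30:00
print(data_at_risk(every_4h, failure)[1])  # 3:30:00
```

With nightly backups, a failure late in the day can cost almost a full day of changes; moving to every 4 hours caps the loss at roughly 4 hours.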

Offsite Storage

Every time you do a full backup of your database, you need to ship that backup to another location, typically in a different geographic region. If something happens to your primary environment (say, an entire data center fails), you can be up and running in a different data center by restoring your offsite backup. Offsite backups also let you recover from a catastrophic hardware failure that takes down your entire environment. While rare, this can still happen.
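Shipping a backup is only useful if the copy that arrives is intact. Here’s a minimal sketch that copies a backup file and compares SHA-256 checksums before trusting the copy. It assumes the offsite location is reachable as a directory (for example, a mounted network share); if you use a cloud storage service instead, the copy step would use that service’s own tooling. `ship_offsite` and `sha256_of` are hypothetical helper names, and the temporary directories in the demo stand in for real local and offsite locations.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large backups don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ship_offsite(backup_file, offsite_dir):
    """Copy a backup to the offsite directory and confirm the copy is intact."""
    backup_file = Path(backup_file)
    offsite_dir = Path(offsite_dir)
    offsite_dir.mkdir(parents=True, exist_ok=True)
    target = offsite_dir / backup_file.name

    expected = sha256_of(backup_file)
    shutil.copy2(backup_file, target)
    if sha256_of(target) != expected:
        target.unlink()  # don't keep a copy we can't trust
        raise OSError(f"checksum mismatch for {target}")

    # Record the checksum next to the copy so a future restore can re-verify it.
    checksum_file = target.with_name(target.name + ".sha256")
    checksum_file.write_text(expected + "\n")
    return target


# Demo: temporary directories stand in for the local and offsite locations.
with tempfile.TemporaryDirectory() as local, tempfile.TemporaryDirectory() as remote:
    dump = Path(local) / "app-backup.dump"
    dump.write_bytes(b"example backup contents\n" * 1000)
    copied = ship_offsite(dump, remote)
    verified = sha256_of(copied) == sha256_of(dump)
    saved_checksum = (Path(remote) / "app-backup.dump.sha256").read_text().strip()

print(verified)  # True
```

Saving the checksum alongside the copy means a later restore can re-check the file before you rely on it.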

What Fixes What?

  • Real-time replication will generally protect you against data corruption caused by minor (or even major) hardware failures. It will NOT protect you against data loss, because any data deleted on the live server will also be deleted on the standby.
  • Periodic backups and offsite storage protect you against catastrophic data corruption and data loss. If you have to restore from a backup, you’ll lose any changes made since your last backup, but you won’t lose everything.
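The difference between replication and backups fits in a few lines. In this toy sketch (plain Python dictionaries with hypothetical data, nothing database-specific), a mistaken delete is replicated to the standby, while a point-in-time backup taken earlier still holds the row.

```python
# Toy comparison of replication vs. backup (plain dictionaries, hypothetical data).
live = {"customer:1": "Ada", "customer:2": "Grace"}
standby = dict(live)         # kept in sync with every change to `live`
nightly_backup = dict(live)  # a point-in-time copy taken last night


def delete(key):
    """A mistaken delete on the live server, which replication copies to the standby."""
    live.pop(key, None)
    standby.pop(key, None)


delete("customer:2")
print("customer:2" in standby)         # False: the standby lost it too
print("customer:2" in nightly_backup)  # True: restore it from the backup
```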

Finally…

No backup strategy is complete without testing and verification. Once you’ve decided how you’re going to protect yourself against a data disaster, test your plan periodically (at least once a quarter) to make sure the systems you’ve put in place actually work.

There’s nothing worse than thinking you’ve got reliable backups, restoring them, and realizing that they’re no good.
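As one way to automate that check, here’s a sketch that uses SQLite, through Python’s built-in `sqlite3` module, purely as a stand-in for a production database. It takes a full backup, opens the backup file as a restore would, runs SQLite’s `PRAGMA integrity_check`, and compares row counts against the live data. The `orders` table and `verify_restore` helper are hypothetical; with MySQL, Postgres, or SQL Server the idea is the same, but you’d use that platform’s own backup and restore tools.

```python
import os
import sqlite3
import tempfile


def verify_restore(backup_path, expected_counts):
    """Open a backup file as a restore would and check it is intact and complete."""
    if not os.path.exists(backup_path):
        return False  # a missing backup is a failed backup
    restored = sqlite3.connect(backup_path)
    try:
        status = restored.execute("PRAGMA integrity_check").fetchone()[0]
        if status != "ok":
            return False
        for table, expected in expected_counts.items():
            # Table names come from our own trusted list, not from user input.
            (count,) = restored.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
            if count != expected:
                return False
        return True
    except sqlite3.DatabaseError:
        return False  # unreadable file or missing table
    finally:
        restored.close()


# An in-memory database stands in for the live server (hypothetical data).
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
live.executemany("INSERT INTO orders (status) VALUES (?)", [("paid",), ("shipped",)])
live.commit()

# Take a full backup into a file with SQLite's online backup API.
backup_dir = tempfile.mkdtemp()
backup_path = os.path.join(backup_dir, "nightly.sqlite3")
target = sqlite3.connect(backup_path)
live.backup(target)
target.close()

expected = {"orders": live.execute("SELECT COUNT(*) FROM orders").fetchone()[0]}
print(verify_restore(backup_path, expected))  # True
```

Row counts are a deliberately simple check; a real verification might also compare checksums of key tables or run a few known queries against the restored copy.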

Just Remember

There are multiple kinds of data disasters (corruption and loss) and multiple strategies to protect against each type. A matrix-style defense that includes real-time replication, periodic backups, and offsite storage is required to adequately protect your business from failure.

What you implement depends solely on what your business can afford to lose in both time and information. Today, the answer to both is nearly always “none”, so make sure you’re not setting yourself up to fail.

Your Assignment

Meet with your tech team and ask the following questions:

  • What is our current database backup strategy?
  • How would we recover if our database became corrupt?
  • How would we recover if we accidentally lost data?
  • How would we recover if we were hacked and data was deleted?

Walk through each scenario and make sure you’re adequately planning for each kind of outage. Then test your recovery plan regularly (at least once a quarter) to make sure it all works as intended.