
Knowledge Level: Intermediate

Between Redis' replication options and some custom scripting, you can ensure the data you store in Redis is kept in multiple places. This tutorial will step you through a method for backing up your dump file to remote locations.

Tutorial Goal

The goal of this tutorial is to demonstrate a few ways you can back up your Redis data to remote servers. By the end of the tutorial you will know the process of doing so as well as the reasons it works.

Redis Backup Options

While Redis has native replication support, some prefer having their data dumped to an external repository. The native solution is to run a backup slave which is configured to persist data to disk via the Dump method, the AoF method, or both. One disadvantage of this method is that it doesn’t provide protection against a stray “flushall”.

A common means to provide this type of protection is to periodically copy out the dump file to provide point-in-time recovery (PITR). The granularity of this approach depends on the frequency of backups as well as the size of your data.
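For instance, a minimal sketch of the idea (the paths here are illustrative):

# copy the dump file out as a timestamped point-in-time backup
cp /var/lib/redis/dump.rdb /backups/redis/dump-$(date +%Y%m%d%H%M).rdb

Each copy becomes a restore point; your recovery granularity is the interval between copies.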

How Often Do You Need to Backup?

This is the primary question to ask before deciding on how to schedule the backup. There are two aspects to consider in the decision: business requirements and technical limitations.

Limits of Technology and Science

Despite our desires and whims, technology has its limits. It takes time to transfer data, whether from memory to disk or from one system to another. The larger the data set, the longer the time to SAVE. The longer the SAVE takes, the longer your interval between SAVEs must and will be.

The tighter your window for remote backups is, the more weight should be given to filesystem monitoring. Conversely, the larger your dump file, the more preference should be given to scheduling backups via cron. This latter aspect has to do with transfer times. If you are trying to copy out a 9GB file every few minutes, you’re going to have a bad day.

Triggering The Backup

There are two primary ways to trigger the backup, determined by your system requirements and available resources. These two methods are:

  1. Scheduled Backups (cron)

  2. Filesystem Monitoring (inotify)

Backing up to a Cloud Storage Provider

Installing RDB Uploader

I’ve written some code to simplify this process. I started out writing it as tutorial code, but realized it would be easier to make it a utility and write the tutorial around the larger aspects of using it. It is relatively early code, so feel free to submit issues - and of course pull requests when appropriate as well. I call this tool “rdbuploader” because, well, I wasn’t feeling the creative-naming juices.

It is available at rdbuploader source. To install, simply run go get github.com/TheRealBill/rdbuploader/go/rdbuploader and configure your credentials, maximum file size, and location of your dump file per rdbuploader config’s directions. You will, of course, need $GOPATH configured and have $GOPATH/bin in your PATH.

Once it is installed and configured, run it to make sure it works.
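For example, an install and smoke test might look like this (assuming Go is installed and $GOPATH is set; the import path is as given above):

# fetch and build the tool, then make sure it is on your PATH
go get github.com/TheRealBill/rdbuploader/go/rdbuploader
export PATH=$PATH:$GOPATH/bin
# with the config in place, a successful run should upload your dump file
rdbuploader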

Cron Scheduling

This is by far the simplest option. Determine how often you want the backup to occur and create a cron entry for it.

For example, to run it every five minutes, add this entry to your crontab (via crontab -e):

*/5 * * * * $GOPATH/bin/rdbuploader
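Keep in mind that cron does not inherit your shell environment, so $GOPATH will be empty unless you define it in the crontab or use the absolute path to the binary. An illustrative crontab, with hypothetical paths:

# define GOPATH for cron, or just use the absolute path to the binary
GOPATH=/home/redis/go
*/5 * * * * /home/redis/go/bin/rdbuploader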

Filesystem Change Notification

Using a tool such as inotify, you can catch changes to the dump file as they happen. As tempting as it may sound, this scenario requires additional thought and effort. Redis can potentially modify the dump file every second. It is doubtful you want to initiate a copy process that often.

Thus, to use this method you need two parts to the process. The first is the inotify piece, which essentially sets a flag of some sort (such as touching a file in a specific directory). The second is the upload script, which checks for that flag: if the trigger file is there, it uploads the RDB and removes the trigger file, otherwise it skips the run. A sketch of both parts follows below.
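Here is a minimal sketch of those two parts, assuming inotify-tools is installed; the paths and trigger-file location are illustrative. Note that a background save writes a temporary file and renames it over the dump file, so watching the directory is more reliable than watching the file itself:

#!/bin/bash
# watcher: touch a trigger file whenever dump.rdb is rewritten or replaced
TRIGGER=/var/run/redis-backup.pending
inotifywait -m -e close_write -e moved_to /var/lib/redis |
while read -r dir events file; do
    [ "$file" = "dump.rdb" ] && touch "$TRIGGER"
done

#!/bin/bash
# uploader: run from cron; upload only when the watcher has set the flag
TRIGGER=/var/run/redis-backup.pending
if [ -f "$TRIGGER" ]; then
    $GOPATH/bin/rdbuploader && rm -f "$TRIGGER"
fi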

This process can save resources by not keeping multiple copies of the same data file in your backup location. However, it also increases the difficulty of detecting failure in the system, as having no changes looks identical to not running at all.

Copying Dump File To Remote Systems

Copy via SCP

If you are simply looking to upload the dump file to a remote system rather than a cloud storage service such as Cloud Files, scp is a fairly reliable method. This method is also useful when persistence runs on a slave and you want a copy of the dump file on the master.
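A minimal example, assuming key-based SSH authentication is set up and using illustrative paths and hostnames:

# copy the dump file to a remote host as a timestamped backup
scp /var/lib/redis/dump.rdb backup@remote-host:/backups/redis/dump-$(date +%Y%m%d%H%M).rdb

Run this from cron, or from the trigger-file check described above, to get timestamped copies on the remote system.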

Putting It All Together

By combining a detection mechanism with an upload system, you can achieve remote “offsite” copies of your Redis database at the frequency you need. You should consider adding a check which verifies there is not a save operation in progress as a condition to running the remote copy action. This can be as simple as redis-cli info persistence | grep rdb_bgsave_in_progress:1, which will return success if there is a background save operation running. Sleeping for a short time (such as the value of rdb_last_bgsave_time_sec from redis-cli info persistence, plus a few seconds) before retrying can boost your data durability.
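A minimal sketch of that guard wrapped around the upload; the three-second cushion is my own assumption:

#!/bin/bash
# if a background save is running, wait roughly one previous save duration plus a cushion
if redis-cli info persistence | grep -q rdb_bgsave_in_progress:1; then
    last=$(redis-cli info persistence | grep rdb_last_bgsave_time_sec | cut -d: -f2 | tr -d '\r')
    sleep $(( last + 3 ))
fi
$GOPATH/bin/rdbuploader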

An alternative to this method is to run backups on a replication slave which has RDB persistence disabled via the save directive. In this scenario your backup script calls redis-cli save to do a foreground save, following it with an upload. This will cause a slight delay on the slave, as Redis will complete the SAVE operation before handling additional requests. If your data set is large or your disks are slow, this would not be a reliable option.
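A sketch of that script, assuming the slave has snapshotting disabled (save "" in redis.conf):

#!/bin/bash
# blocking foreground save on the slave, then upload the fresh dump file
redis-cli save && $GOPATH/bin/rdbuploader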

Extending The Process

Fundamentally, one could use this process to back up an AoF file as well. By configuring rdbuploader to point to the AoF file in the dumpfile variable, it would work for AoF just as well. If using both, set up two config files (one for each) and two cron/inotify jobs. Perhaps if there is enough interest those features could go into the rdbuploader tool.

Tags: cloud backup persistence