Knowledge Level: Intermediate
Between Redis' replication options and some custom scripting you can ensure the data you store in Redis is kept in multiple places. This tutorial will step you through a method for backing your dump file up to remote locations.
The goal of this tutorial is to demonstrate a few ways you can back up your Redis data to remote servers. By the end of the tutorial you will know the process of doing so as well as the reasons why it works.
While Redis has native replication support, some prefer having their data dumped to an external repository. The native solution is to run a backup slave which is configured to persist data to disk via the Dump method, the AoF method, or both. One disadvantage of this method is that it doesn't provide protection against a stray FLUSHALL.
A common means to provide this type of protection is to periodically copy out the dump file to provide point-in-time recovery (PITR). The granularity of this approach depends on the frequency of backups as well as the size of your data.
How often the backup needs to run is the primary question to ask before deciding on how to schedule it. There are two aspects to consider in the decision: business requirements and technical limitations.
Despite our desires and whims, technology has its limits. It takes time to transfer data, whether from memory to disk or from one system to another. The larger the data set, the longer the time to SAVE. The longer the time to SAVE, the longer your interval between SAVEs must and will be.
The tighter your window for remote backups is, the more weight should be given to filesystem monitoring. Conversely, the larger your dump file, the more preference should be given to scheduling backups via cron. This latter aspect has to do with transfer times. If you are trying to copy out a 9GB file every few minutes you're going to have a bad day.
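To put a rough number on that: a 9GB dump pushed over a 100 Mbps link (purely an illustrative figure) needs about (9 × 8) / 0.1 ≈ 720 seconds, or roughly 12 minutes per copy, before you even count the time the SAVE itself takes.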
There are two primary ways to trigger the backup, determined by your system requirements and available resources. These two methods are:
Scheduled Backups (cron)
Filesystem Monitoring (inotify)
I've written some code to simplify this process. I started out writing it as tutorial code, but realized it might be easier to make it a utility and focus the tutorial on the larger aspects of using it. It is relatively early code, so feel free to submit issues - and of course pull requests when appropriate as well. I call this tool "rdbuploader" because, well, I wasn't feeling the creative-naming juices.
It is available at the rdbuploader source repository. To install, simply run go get github.com/TheRealBill/rdbuploader/go/rdbuploader and configure your credentials, maximum file size, and location of your dump file per the rdbuploader config directions. You will, of course, need $GOPATH configured and have $GOPATH/bin in your PATH.
Once it is installed and configured, run it to make sure it works.
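A quick smoke test is simply to run it and check the exit status (assuming $GOPATH/bin is already on your PATH):

rdbuploader && echo "upload OK" || echo "upload failed"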
Scheduled backups via cron are by far the simplest option. Determine how often you want the backup to occur and create a cron entry for it.
For example, to run it every five minutes, add the following entry to your crontab (via crontab -e):

*/5 * * * * $GOPATH/bin/rdbuploader

Keep in mind that cron does not inherit your interactive shell environment, so either define GOPATH in the crontab or use the full path to the binary.
Using a tool such as inotify you can catch when changes are made to the dump file. As tempting as it may sound this scenario requires additional thought and effort. Redis can potentially modify the dump file every second. It is doubtful you want to initiate a process to copy it that often.
Thus, to use this method you need two parts to the process. The first is the inotify piece, which essentially sets a flag of some sort (such as touching a file in a specific directory) which the upload script can check for to handle uploads. If the flag file is there, the script uploads the RDB and removes the trigger file; otherwise it skips the upload.
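A minimal sketch of the two pieces might look like the following. It assumes the inotify-tools package is installed; the data directory, flag file path, and schedule are placeholders to adjust for your system.

The watcher runs continuously and marks that a fresh dump has landed. Redis writes the dump to a temporary file and renames it into place, so watching the directory for a moved_to event on dump.rdb is more reliable than watching the file itself:

#!/bin/sh
# Watch the Redis data directory and set a trigger flag on each new dump.
DIR=/var/lib/redis
FLAG=/var/run/redis-backup.pending
inotifywait -m -e moved_to --format '%f' "$DIR" | while read -r name; do
    [ "$name" = "dump.rdb" ] && touch "$FLAG"
done

The upload wrapper runs frequently from cron (every minute, say) but only uploads when the flag is present:

#!/bin/sh
# Upload only if the trigger flag exists, then clear it on success.
FLAG=/var/run/redis-backup.pending
if [ -f "$FLAG" ]; then
    "$GOPATH/bin/rdbuploader" && rm -f "$FLAG"
fi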
This process can save resources by not having multiple copies of the same data file in your backup location. However, it also increases the difficulty of detecting failure in the system, as having no changes looks identical to the process not running at all.
If you are simply looking to upload to a remote system rather than a cloud storage system such as Cloud Files, scp is a fairly reliable method. This method is also useful when you do persistence from a slave but want the dump file available on the master.
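For example, something along these lines (the host and paths are placeholders; the timestamp in the destination name gives you cheap point-in-time copies):

scp /var/lib/redis/dump.rdb backups@backup-host:/backups/redis/dump-$(date +%Y%m%d%H%M).rdb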
By combining a detection mechanism with an upload system you can achieve
remote “offsite” copies of your Redis database at a frequency you need.
You should consider adding a check against Redis which verifies there is not a save operation in progress as a condition to running the remote copy action. This can be as simple as redis-cli info persistence | grep rdb_bgsave_in_progress:1, which will return success if there is a background save operation running. Sleeping for a short time (such as the value found in redis-cli info persistence | grep rdb_last_bgsave_time_sec, plus 3 seconds) before retrying can boost your data durability.
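A small wrapper along these lines (a sketch only; the three second cushion mirrors the suggestion above) captures that logic:

#!/bin/sh
# Wait until no background save is in progress, then upload the dump.
while redis-cli info persistence | grep -q rdb_bgsave_in_progress:1; do
    # Sleep roughly as long as the last background save took, plus a cushion.
    last=$(redis-cli info persistence | grep rdb_last_bgsave_time_sec | cut -d: -f2 | tr -d '\r')
    sleep $(( ${last:-0} + 3 ))
done
"$GOPATH/bin/rdbuploader"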
An alternative to this method would be to run backups on a replication slave which has persistence disabled via the save directive. In this scenario your backup script calls redis-cli save to do a foreground save, following it with an upload. This will cause a slight delay on the slave, as Redis will complete the SAVE operation before handling additional requests. If your data set is large or your disks are slow, this would not be a reliable option.
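Run on the slave itself, that can be as small as this (a sketch, assuming rdbuploader on the slave is configured to read that slave's dump file):

redis-cli save && "$GOPATH/bin/rdbuploader"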
Fundamentally, one could use this to back up an AoF file as well. By configuring rdbuploader to point to the AoF file in the dumpfile variable, it will work for the AoF just the same. If using both, set up two config files (one for each) and two cron/inotify jobs. Perhaps, if there is enough interest, those features could make their way into the rdbuploader tool itself.
Tags: cloud backup persistence