Encrypted and unencrypted backups using rsync and rsyncrypto
Posted: 02 Mar 2010, 18:28
Note, I am in no way a bash master. All these scripts work as I want them to but if anyone can see improvements (and there are sure to be plenty) please point them out.
Introduction (totally skippable if you want):
To me, offsite backups are a must.
This is just a short list of reasons I want an offsite backup:
1. Family photos and videos - I would be heartbroken if I lost these.
2. Documents etc. that I need to keep a record of.
3. Software and OS settings because it's a PITA to reinstall everything when a hard drive dies/you mess up your system.
I don't imagine this is a much different situation to anyone else.
I'd been happily using Mozy on my old Windows server. I had to stop when I bought my Bubba.
I've found ftp based backups over the Internet in general to be too flaky. I tried several online ftp storage providers and never received what I consider to be a decent service considering I was going to be paying them £15 per month.
I didn't want to invest in a second Bubba and host it at a friend's / family member's. I couldn't afford the initial outlay for another one.
I've found an online storage provider that meets my needs.
The pros and cons of this particular company are:
Pros:
Very cheap - $15 per YEAR for "unlimited" storage (I've seen a post by the CEO saying they think using much more than 500GB is a bit unreasonable for the price, I won't argue with that as the price is a bargain).
Lots of ways to access your data - rsync, ftp, webdav, web-based file browser.
Cons:
A fair bit of downtime (it's got better lately but 24 hour outages aren't uncommon). However, I've never actually lost any data with them. Also, to be fair, they're in the US and I'm in the UK. I imagine not all outages can be directly attributed to this company given the amount of cable between us.
Woeful support - emails never get replied to.
Because of this, I won't post the name here because I DO NOT recommend them but if you're interested, PM me.
For me, backups have to meet three main criteria.
1. I had to be able to encrypt some of my backups - I don't mind my photos and mp3s being sent in the clear but I don't want my letters to the bank manager being read by anyone else.
2. I had to be able to keep deleted and old versions of my files.
3. As I was going to be backing up a lot of data, if a backup did drop out I'd need it to be able to pick up from where it left off.
As rsync does number 3 excellently, I decided to explore the options surrounding numbers 1 and 2, and over a couple of months I've ended up writing some scripts based around rsync and rsyncrypto.
The set up:
I wanted quite fine control over what gets backed up and when, so first I've created two directories called weekly and daily. I can write a script to back up a certain folder and can easily move it between the daily and weekly folders as my needs change.
To run the scripts in these folders I've created a new cron job in /etc/cron.d. Cron.d is a directory where you can create separate crontab files that each have a specific purpose. Unlike a personal crontab, each entry also names the user the command runs as. My file is called run_backups and looks like this:
Code:
# m h dom mon dow user command
#Weekly backups - every Friday at 02:00
0 2 * * 5 root /home/admin/backup_scripts/run_backups_weekly
#Daily backups - every day at 01:00
0 1 * * * root /home/admin/backup_scripts/run_backups_daily
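The scripts these jobs call look like this:
run_backups_weekly:
Code:
#!/bin/bash
LOCKFILE=/var/lock/backup_weekly.lockfile
# With noclobber set, the redirect fails if the lockfile already exists,
# so testing for the lock and taking it happen in a single step.
if ( set -o noclobber; echo "$$" > "$LOCKFILE" ) 2> /dev/null
then
    # Remove the lock even if the job is interrupted.
    trap 'rm -f "$LOCKFILE"; exit' INT TERM EXIT
    for SCRIPT in /usr/local/bin/backup_scripts/weekly/*backup
    do
        # Skip the unexpanded pattern if the directory is empty.
        [ -x "$SCRIPT" ] || continue
        "$SCRIPT"
    done
    rm -f "$LOCKFILE"
    trap - INT TERM EXIT
else
    # Another instance is still running.
    exit
fi
# Purge backup logs older than 30 days.
find /home/admin/backup_logs/ -type f -mtime +30 -exec rm {} \;
exit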
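run_backups_daily:
Code:
#!/bin/bash
LOCKFILE=/var/lock/backup_daily.lockfile
# Same locking as the weekly job, with its own lockfile.
if ( set -o noclobber; echo "$$" > "$LOCKFILE" ) 2> /dev/null
then
    trap 'rm -f "$LOCKFILE"; exit' INT TERM EXIT
    for SCRIPT in /usr/local/bin/backup_scripts/daily/*backup
    do
        [ -x "$SCRIPT" ] || continue
        "$SCRIPT"
    done
    rm -f "$LOCKFILE"
    trap - INT TERM EXIT
else
    exit
fi
exit
Each one does the same thing - runs any scripts called *backup in the respective weekly and daily directories. All the "lockfile" stuff is just to make sure only one instance ever tries to run at a time. The find line at the end of the weekly job also purges old log files generated by the actual backups.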
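If the lockfile trick looks like magic, here is a minimal sketch of it on its own. The function name acquire_lock and the scratch directory are my own inventions for the demonstration; they are not part of the scripts above. It only shows that a second attempt to take the same lock fails while the first lockfile still exists.

```shell
#!/bin/bash
# Hypothetical helper: try to take a lock; returns non-zero if it is already held.
acquire_lock() {
    local lockfile=$1
    # With noclobber set, ">" refuses to overwrite an existing file,
    # so creating the file and checking for it are one step.
    ( set -o noclobber; echo "$$" > "$lockfile" ) 2> /dev/null
}

LOCKDIR=$(mktemp -d)               # scratch directory, for the demonstration only
DEMO_LOCK="$LOCKDIR/demo.lockfile"

if acquire_lock "$DEMO_LOCK"; then first=acquired; else first=busy; fi
if acquire_lock "$DEMO_LOCK"; then second=acquired; else second=busy; fi
echo "first attempt: $first, second attempt: $second"

rm -rf "$LOCKDIR"
```

In the real jobs the lockfile is removed by the trap or at the end of the loop, so the next cron run can take the lock again.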
During my initial backups I had these jobs called from cron.hourly so that I could leave them unattended, and if a backup failed it would restart within 60 minutes. It took six weeks to complete my initial set of backups.
So onto the actual backing up. I've got two templates set up, one for unencrypted backups and one for encrypted backups. The encrypted backups use rsyncrypto.
Notes on rsyncrypto:
Rsyncrypto encrypts files in a "rsync friendly way". Normally, a small change in a file will cause the encrypted version of that file to be totally different to any previous encrypted versions. This would remove any efficiency gains from using rsync. Rsyncrypto sacrifices "a small amount" of security, but the encrypted version of a file changes only slightly when the original file changes slightly. This makes it possible to use rsync on these encrypted files.
Using rsyncrypto means you have to keep an encrypted copy of all your files on disk for rsync to sync. For me, encrypting all ~400 GB would have been a waste of disk space and time, so I only encrypt what needs it. YMMV.
Example of an unencrypted backup with annotations. With my set up, I only have to edit the first two variables and drop it into the daily or weekly directory, making sure it's called [something]backup. I add a number to the beginning of the file names as well to control the order they're backed up in.
Code:
#!/bin/bash
SOURCE=/home/storage/example/ # <-- Be sure to keep the trailing /
TARGET=example # <-- Keep this the same as the directory "example" above.
#########################################################################
DATE=`date +%d-%m-%y_%H.%M.%S`
LOG=/home/admin/backup_logs/"$TARGET"_"$DATE"_backup.log
#############################################################################
echo "Started at `date +%d-%m-%y_%H.%M.%S`" >> "$LOG"
# If the script is interrupted, clear the password and say so in the log.
trap 'unset RSYNC_PASSWORD; echo "`date +%d-%m-%y_%H.%M.%S` Process killed" >> "$LOG"; exit $?' INT TERM EXIT
echo "Beginning rsync" >> "$LOG"
export RSYNC_PASSWORD=[password] #see note 1
rsync -avzbh --fuzzy --backup-dir=/Backups/"$TARGET"/"$DATE" --delete-after --timeout=3600 --partial-dir=/partial_transfers/"$TARGET" "$SOURCE" [rsync server]/"$TARGET" >> "$LOG" #see note 2
unset RSYNC_PASSWORD #see note 1
trap - INT TERM EXIT
echo "`date +%d-%m-%y_%H.%M.%S` Complete." >> "$LOG"
exit
Note 1. You need these lines if you're backing up to a server that runs an rsync daemon. If you're using SSH you need to set up a key pair for passwordless connections. Lots of stuff on the interwebs about that if needed.
Note 2. Explanation of some of the options:
a - archive mode: syncs all files recursively and preserves permissions, timestamps, ownership and symlinks.
z - compresses data during the transfer.
b / --backup-dir=/Backups/$TARGET/$DATE - creates a directory on the server named after the date and time of the run. Any files that are about to be removed or updated get moved into this directory first. Hence old versions and deleted files are kept indefinitely.
--fuzzy - good if you rename a file. Rsync looks for a similar file already on the server in the same directory and uses it as a starting point, so it won't have to upload the whole file again.
--delete-after - works in conjunction with --fuzzy. No files are deleted until the end of the transfer (the default is during), so the old copy is still there for --fuzzy to match against.
[rsync server]/$TARGET - obviously, replace [rsync server] with the name of the server you are using.
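That rsync line is long enough to be hard to read, so here is one way it could be laid out with a bash array and a comment per option. This is only a sketch: the server URL is a made-up placeholder, and the script just prints the command rather than running it.

```shell
#!/bin/bash
# Placeholder values for illustration; substitute your own.
SOURCE=/home/storage/example/
TARGET=example
DATE=$(date +%d-%m-%y_%H.%M.%S)
REMOTE="rsync://backup.example.invalid/module"   # hypothetical rsync daemon URL

RSYNC_ARGS=(
    -a                                           # archive mode
    -v                                           # verbose, so the log is useful
    -h                                           # human-readable sizes
    -z                                           # compress during transfer
    -b "--backup-dir=/Backups/$TARGET/$DATE"     # keep old and deleted files
    --fuzzy                                      # reuse similar files on the server
    --delete-after                               # delete only at the end of the run
    --timeout=3600
    "--partial-dir=/partial_transfers/$TARGET"   # resume interrupted files
    "$SOURCE"
    "$REMOTE/$TARGET"
)

# Print the command instead of running it; in a real script this
# would be: rsync "${RSYNC_ARGS[@]}" >> "$LOG"
printf '%s\n' "rsync ${RSYNC_ARGS[*]}"
```

Quoting the array as "${RSYNC_ARGS[@]}" when you actually call rsync keeps paths with spaces in one piece.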
Example of an encrypted backup. There is now an extra line to edit:
Code:
#!/bin/bash
CRYPTOSOURCE=/home/example/ # <-- Files to be encrypted
SOURCE=/tmp/encrypted/example/ # <-- The location of the encrypted files. Be sure to keep the trailing / #See note 1
TARGET=example # <-- Keep this the same as the directory "example" above.
#########################################################################
DATE=`date +%d-%m-%y_%H.%M.%S`
LOG=/home/admin/backup_logs/"$TARGET"_"$DATE"_backup.log
#############################################################################
echo "Started at `date +%d-%m-%y_%H.%M.%S`" >> "$LOG"
# If the script is interrupted, clear the password and say so in the log.
trap 'unset RSYNC_PASSWORD; echo "`date +%d-%m-%y_%H.%M.%S` Process killed" >> "$LOG"; exit $?' INT TERM EXIT
#
echo "`date +%d-%m-%y_%H.%M.%S` Beginning encryption of $CRYPTOSOURCE." >> "$LOG"
rsyncrypto -rvc "$CRYPTOSOURCE" /tmp/encrypted/"$TARGET" /home/admin/keys/keyfiles/"$TARGET"/ /home/admin/keys/backupspublic.crt --trim=2 --delete-keys >> "$LOG" #See note 1
#
echo "`date +%d-%m-%y_%H.%M.%S` Beginning rsync" >> "$LOG"
export RSYNC_PASSWORD=[rsync password]
rsync -avzbh --fuzzy --backup-dir=/Backups/"$TARGET"/"$DATE" --delete-after --timeout=3600 --partial-dir=/partial_transfers/"$TARGET" "$SOURCE" [rsync server]/"$TARGET" >> "$LOG" #See note 2
unset RSYNC_PASSWORD
trap - INT TERM EXIT
echo "`date +%d-%m-%y_%H.%M.%S` Complete." >> "$LOG"
exit
Note 1. Rsyncrypto is set to create encrypted files in /tmp/encrypted. The option -c only encrypts changed files (otherwise it encrypts the whole directory tree every time it's run). /tmp is cleared down if the Bubba is rebooted but the files will just be recreated on the next run. This way, if you are low on disk space you can purge the encrypted files at the end of the backup but your backups will take longer.
Keys: A key is created for each file and is saved to /home/admin/keys/keyfiles/$TARGET. These are needed to decrypt files, along with the public key (/home/admin/keys/backupspublic.crt). However, when you set up rsyncrypto you will also create a private key. You can use this key to decrypt any file even if you don't have the individual keyfiles, so make sure this private key is backed up and kept safe.
--trim=2: The trim option needs a bit of trial and error. Without it, encrypted files would be copied to /tmp/encrypted/home/example. Instead they are copied to /tmp/encrypted/example. Basically, adjust this option to suit your needs.
Note 2. You're now rsyncing from /tmp/encrypted rather than the original directory. This is kind of important!
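Both templates repeat the same timestamped echo over and over. A small helper function would tidy that up. This is just a sketch of the idea, not what my scripts currently do; the function name log is my own choice and the temp file stands in for the real log path.

```shell
#!/bin/bash
LOG=$(mktemp)    # stand-in for /home/admin/backup_logs/..._backup.log

# Hypothetical helper: append a message to $LOG, prefixed with the
# same date format the templates use.
log() {
    echo "$(date +%d-%m-%y_%H.%M.%S) $*" >> "$LOG"
}

log "Beginning rsync"
log "Complete."
cat "$LOG"
```

Each call adds one line to the log, so the templates could swap every echo ... >> $LOG for a plain log "message".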
Excluding files:
There are times you don't want to back up everything in a tree. Excluding files is easy when running an unencrypted backup as rsync has the --exclude option:
Code:
--exclude 'directory_to_exclude'
It's a bit harder with rsyncrypto because it doesn't have an exclude option. Instead, you need to use the find command to list all the files you want to back up and then feed this list into rsyncrypto.
E.g. When backing up /home/admin I want to exclude my keyfiles directory (I have no idea what would happen if I tried to encrypt them - recursion hell I'd imagine).
So you have to do this (this would replace the single rsyncrypto line in the previous example):
Code:
# CRYPTOSOURCE1, TARGET1 and FILELISTHOME are assumed to be set earlier in the script.
echo "`date +%d-%m-%y_%H.%M.%S` Listing files in $CRYPTOSOURCE1." >> "$LOG"
find /home/admin/ -not \( -iwholename "/home/admin/keys*" -or -iwholename "/home/admin/" \) > "$FILELISTHOME"
echo "`date +%d-%m-%y_%H.%M.%S` Beginning encryption of $CRYPTOSOURCE1." >> "$LOG"
rsyncrypto -cv --filelist "$FILELISTHOME" /tmp/encrypted/"$TARGET1" /home/admin/keys/keyfiles/"$TARGET1"/ /home/admin/keys/backupspublic.crt --trim=3 --delete-keys >> "$LOG"
rm "$FILELISTHOME"
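If you want to check what a find exclusion like this will produce before pointing rsyncrypto at it, you can try it on a throwaway tree first. The sketch below builds a small fake home directory in a temp location (the file names are made up) and lists everything except the keys directory and the top of the tree. I've used the portable ! / -o / -path spellings and left off the trailing slash on the starting directory, so it is not character-for-character the command above.

```shell
#!/bin/bash
# Build a throwaway tree that mimics /home/admin with a keys directory.
ROOT=$(mktemp -d)/admin
mkdir -p "$ROOT/keys/keyfiles" "$ROOT/docs"
touch "$ROOT/keys/keyfiles/secret.key" "$ROOT/docs/letter.txt" "$ROOT/notes.txt"

# List everything except the keys directory and the top of the tree,
# then strip the temp prefix so the result is easy to read.
FILELIST=$(
    find "$ROOT" ! \( -path "$ROOT/keys*" -o -path "$ROOT" \) |
        sed "s|^$ROOT/||" | LC_ALL=C sort
)
echo "$FILELIST"

rm -rf "$(dirname "$ROOT")"
```

The listing should contain docs, docs/letter.txt and notes.txt, and nothing from keys.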
What to back up:
It's not wise to back up your entire drive. That said, every forum post about backing up Linux differs as to what needs backing up and what doesn't. This is what I back up:
Settings (encrypted):
/etc/
/usr/local/
/var/ excluding /var/tmp/ and /var/cache/
/root/
Docs etc. (encrypted):
/home/admin/
/home/web/
/home/user1/
/home/user2/ etc.
Storage (not encrypted):
/home/storage/ excluding /home/storage/extern
I've also written a monthly script that purges old backups. As well as that, I back up my client PCs to my Bubba and then back them up online from the Bubba. I'll fill in the details of that tomorrow because this is a long post and now it's my bed time.
Hope some of it helps!