My journey to a good backup system

Translations: Português (pt-BR)
Publication date: Sep 27, 2021
Tags: backup, borg

I'm a bit of a data hoarder. I still have some of the first programs I've ever written, photos I've taken on trips and drawings I did many years ago, to name a few. And since I don't trust some company to store all of this data for me, it was pretty clear that I needed to do my own backups.

One lesson I learned early on from reading online is that backups should be as easy to do as possible, preferably completely automatic. Otherwise you end up not doing them at all.

In this post I want to recall my previous backup setup and then talk about the current one. Hopefully it will be clear by the end that the current one is much simpler and better.

Ye Olden Days

I think it was in 2018 that I started thinking seriously about making backups. For that I needed both hardware and software. The hardware to house the backups, and the software to make them happen.

Hardware

I had a desktop computer that no longer worked due to a problem in the motherboard, but everything else worked fine so it was perfect to use for my backup system. The hard drive could be used to store the backups, the power supply to power everything, and the computer case would house all the components.

I started by removing the malfunctioning motherboard and noticed that the power supply didn't give out any voltage by default. Searching online for its datasheet, I discovered which wires needed to be shorted for it to turn on. The original power button on the case was a push button connected to the motherboard, which in turn shorted these wires, so I removed it and put a toggle switch in its place. I soldered the switch to the power supply wires and fastened it to the case with some tape and thin metal plates I had lying around to give it some rigidity. This is what it looks like:

{image}/button.jpg

As for the main component, the computer that would run the backups, I had a Raspberry Pi that wasn't being used, so it was the obvious choice. I took a micro USB connector from an old cable, opened the cable up, and soldered its wires to the 5V and ground wires from the power supply. That way the power supply could power everything, since the hard drive was already powered by it.

Finally, since the hard drive was an internal one, which uses a SATA connection to transfer data, I bought a cheap SATA to USB converter cable and used it to connect the hard drive to the Raspberry Pi. This is the whole setup:

{image}/internals.jpg

(This is a current picture; it didn't have the Ethernet cable at the time.)

That was it for the hardware setup. Next came the software to make the backups happen.

Software

As I was thinking about which software to use for my backups, I stumbled upon this article. It presented to me the idea of incremental backups (only storing the differences in files from the previous backup), and backup rotation (only storing the N latest backups and deleting the older ones) which I found genius. It also relied mostly on rsync for the copies, which I was already familiar with. In the post they write a shell script to do the backup and retention, but I saw a comment mentioning that there was a tool that did exactly that, called rsnapshot. So I went for rsync + rsnapshot as my backup solution.
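To make those two ideas concrete, here is a minimal Python sketch. It is not what the article's script or rsnapshot actually does (as far as I know, rsync-based snapshots typically hard-link unchanged files rather than compare hashes); the function names, the path-to-hash dictionaries and the date-named snapshots are all assumptions made up for illustration.

```python
from datetime import date, timedelta


def changed(previous, current):
    """Incremental step: paths that are new or differ from the previous backup.

    Both arguments are hypothetical mappings of path -> content fingerprint.
    """
    return sorted(p for p, h in current.items() if previous.get(p) != h)


def rotate(snapshots, keep):
    """Rotation step: split snapshot names into (kept, deleted).

    Assumes names sort chronologically, e.g. ISO dates like "2021-09-27".
    """
    ordered = sorted(snapshots, reverse=True)  # newest first
    return ordered[:keep], ordered[keep:]


# Only the modified and the new file need to be stored again.
previous = {"notes.txt": "a1", "photo.jpg": "b2"}
current = {"notes.txt": "a1", "photo.jpg": "c3", "drawing.png": "d4"}
to_copy = changed(previous, current)
print("copy:  ", to_copy)

# Ten hypothetical daily snapshots, of which only the 7 newest are kept.
start = date(2021, 9, 18)
snapshots = [(start + timedelta(days=i)).isoformat() for i in range(10)]
kept, deleted = rotate(snapshots, keep=7)
print("keep:  ", kept)
print("delete:", deleted)
```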

Very soon I faced some issues. The biggest one was that rsnapshot could only back up to local storage. This meant that it needed to run on the Raspberry Pi, which was connected to the hard drive where the backups would be stored, and pull the files from my laptop.

Since the raspberry needed to connect to my computer, the normal thing to do would be to set a fixed IP for my laptop. But I carried it with me to other networks, so I thought it would be best to keep it on a dynamic IP. I ended up with an ugly solution where my computer had the cron job for the backup: it connected to the raspberry, updated its own IP there and started rsnapshot there. rsnapshot then used that updated IP to connect back to my computer and make the backup.

To make the data safe, the disk was encrypted using dm-crypt. This means that before rsnapshot began the backup, the disk was unlocked using the passphrase, which was saved as a file in the raspberry's system (very unsafe, I know!). After the backup it was re-locked and powered off.

One other issue is that rsync doesn't deal well with file/directory renaming or moving. It thinks that the previous one was deleted and a new one created. This was a big problem since I was always trying to find the best hierarchy for my files, which meant renaming or moving around the base directories and then suddenly all files had to be copied again...

That issue was further aggravated by the fact that the transfer between my computer and the raspberry was done wirelessly, and so was rather slow. To work around this, I added some logic to the backup scripts so that if the total size of the transfer was too big, it would just fail and notify me. I would then look at the diff, remember that I had renamed some directories and copy them over manually to the backup through a temporary Ethernet cable.
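I no longer have those scripts at hand, so the following is only a rough Python sketch of that kind of guard rather than the original logic. The names `TransferTooLarge` and `check_transfer_size`, the 2 GiB limit and the file sizes are all invented for the example.

```python
class TransferTooLarge(Exception):
    """Raised when a backup run would move more data than allowed."""


def check_transfer_size(pending, limit_bytes):
    """Return the total pending size, or raise if it exceeds the limit.

    `pending` is a hypothetical mapping of path -> size in bytes for the
    files the backup tool reports it would transfer.
    """
    total = sum(pending.values())
    if total > limit_bytes:
        raise TransferTooLarge(f"{total} bytes pending, limit is {limit_bytes}")
    return total


LIMIT = 2 * 1024**3  # 2 GiB, an arbitrary threshold for a slow Wi-Fi link

# A renamed directory shows up as if every file in it were new.
pending = {
    "photos-renamed/trip1.raw": 1_500_000_000,
    "photos-renamed/trip2.raw": 1_500_000_000,
}

try:
    check_transfer_size(pending, LIMIT)
    outcome = "backup can proceed"
except TransferTooLarge as err:
    outcome = f"backup aborted: {err}"

print(outcome)
```

In a real script the `except` branch is where the notification would be sent.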

Lastly, since the power supply was very noisy and I also didn't want to keep it constantly on, I'd have a daily alarm on my phone to remind myself to flip the little switch to turn it on just before the daily backup cronjob. And I'd turn it off later before going to sleep.

Nowadays

Some time has passed since then. That setup worked, but it was far from good, so I did some more research on the backup tools out there to see if there was anything better. Everything changed when I found borg.

Borg, or BorgBackup, is a program to make backups. It can back up locally or remotely. The backups can (and should!) be encrypted. And deduplicated! I'm not sure if renaming/moving files makes the backup take longer, but at least it won't take more space, since all files are chunked and deduplicated independently of their path. Oh, and there's great documentation and an active community!
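To show why a rename doesn't cost extra space when chunks are identified by their content, here is a toy Python sketch. It is not borg's implementation: borg uses content-defined, variable-size chunks plus compression and encryption, while this example uses tiny fixed-size chunks, plain SHA-256 and in-memory dictionaries. The `store` function and the file names are made up.

```python
import hashlib

CHUNK_SIZE = 4  # absurdly small, just to get several chunks from a short input


def store(repo, index, path, data):
    """Split data into chunks and store each chunk under its content hash.

    `repo` maps chunk id -> chunk bytes; `index` maps path -> list of chunk ids.
    """
    chunk_ids = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        chunk_id = hashlib.sha256(chunk).hexdigest()
        repo.setdefault(chunk_id, chunk)  # no-op if this chunk is already stored
        chunk_ids.append(chunk_id)
    index[path] = chunk_ids


repo = {}
first_backup, second_backup = {}, {}

# First backup: the file lives under its original path.
store(repo, first_backup, "photos/trip.jpg", b"abcdefghij")
chunks_before = len(repo)

# Second backup: same content, but the file was moved and renamed.
store(repo, second_backup, "archive/2019/holiday.jpg", b"abcdefghij")
chunks_after = len(repo)

print(chunks_before, chunks_after)
```

The chunk count is the same after the second backup: only the small path-to-chunks index grows, not the stored data.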

On top of that, there's another project called borgmatic. Borgmatic provides a single configuration file to configure borg's usage, like the repositories to use, retention policy and files to ignore for backups. So basically while borg provides the commands to do backups, borgmatic allows you to configure it all in a single file and it takes care of everything else.

In the end, borg + borgmatic make it incredibly easy to have secure and reliable backups. I now have a single cron entry to do my daily backups: 0 21 * * * cd ~/ && borgmatic --no-color --verbosity 1 --files >> ~/ark/etc/bkplogs. It's important to note that borg always uses relative paths in backups, which is why I first cd to where the folders I want to back up are (my home directory).

Also, since all the backup configuration lives in a single borgmatic configuration file, it's easy to back up that file as well. So if I ever lose my files and need to recover from a backup, I'll immediately have the backup setup ready to go again too.

So yeah, borg is pretty much great and a whole lot better than any other solution I tried before. There's a single catch though: if you're doing a backup to a remote location, borg also needs to be running there. In these cases you have borg running on the local machine, doing the chunking and encrypting (so the remote already receives the data encrypted and it's safe!), and another borg instance on the remote machine receiving the chunks and storing them in the repository.

Given that restriction, you can't back up to a remote that is just plain storage in the cloud. There are, however, very good alternatives focused on hosting borg backups, like BorgBase.

Now, I still use my raspberry setup, although I've installed borg there instead of rsnapshot and removed all my weird scripts. It's now just an additional entry in my borgmatic config. Also, instead of leaving the disk password lying around on the raspberry's system, I had the idea to send it as part of the ssh command, so I can store it securely on my own encrypted machine. Oh, and I now rely on a remote server for daily backups and only do local backups once a week, so I no longer have the trouble of flipping that little switch every day 😉.

It's worth mentioning that recovery of backups is also very easy with borg. There are two commands (offered by both borg and borgmatic): extract and mount. extract recovers the file(s) from a given archive. Note that the files are written to the same paths that were used in the archive, potentially overwriting your local files, so it's better to run it inside a temporary folder first. mount mounts the archive on a local folder as if it were a filesystem, allowing you to browse the files.
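As a sketch of the "run it inside a temporary folder" advice, the Python snippet below builds a borg extract command and, if asked, runs it with a fresh temporary directory as the working directory. The repository path, archive name and helper names are hypothetical; the only thing taken from borg itself is the `borg extract REPOSITORY::ARCHIVE [PATH...]` form, which extracts into the current directory.

```python
import subprocess
import tempfile


def extract_command(repo, archive, paths=()):
    """Build the argument list for `borg extract REPO::ARCHIVE [PATH...]`."""
    return ["borg", "extract", f"{repo}::{archive}", *paths]


def extract_to_temp(repo, archive, paths=()):
    """Run the extraction inside a new temporary directory and return its path.

    Requires borg to be installed and the repository to be reachable.
    """
    target = tempfile.mkdtemp(prefix="borg-restore-")
    subprocess.run(extract_command(repo, archive, paths), cwd=target, check=True)
    return target


# Hypothetical repository and archive names; this only prints the command.
command = extract_command("/mnt/backup/repo", "laptop-2021-09-27", ["ark/etc"])
print(" ".join(command))
```

Calling `extract_to_temp(...)` with a real repository leaves the restored files under the returned directory, where they can be inspected before being copied over anything.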

Conclusion

As you can see, I've significantly improved my backup setup. And it's better precisely because it is less interesting, not more. As much as I have fond memories of building my own system, which relied on multiple scripts to work around the limitations of what I had, it was a hassle to maintain. It's great to have everything just work out of the box with borg and borgmatic: I can even forget I'm actually doing backups now, which is refreshing.