Posted on | October 14, 2007 | 1 Comment
In all honesty, this is just a recipe for my enchilada; I'm still in the prep and planning stage.
My goal: make a really efficient, idiot-proof backup program that (could) use the features of an ext3cow file system if one happened to be present.
I don’t spare a penny when it comes to computers that I lease to others, all SAS or SCSI drives, nice RAID cards (some with SODIMM caching), plenty of RAM (4GB minimum ECC), all the trimmings.
My desktop computer is a puny little P4 HT 3.0 with dual slow IDE drives and 1 GB of RAM. My forky ad-hoc backup scripts cause this particular machine to slow to a crawl for two hours a day. I need something better; I bet a lot of people do.
I’ve been using inotify() to alert me to file activity in hostile places like /tmp on some of my production machines. I see no reason why I can’t use inotify watches to do smarter incremental backups.
My ideal system has:
- Super easy configuration with an optional Python GTK front end for desktop users (could do this with php, too, easily).
- bz2/gzip libs built in so that there is no need to fork tar / gzip (allowing the backup program to better watch and control its resource usage).
- Crypto built in and made simple.
- Built in hooks for the extended ext3cow ioctl API (for snapshots and epoch retrieval).
- Multi-user friendly, every user has a backup configuration that they manage in their home/ directory.
- Simple client to do restores, with the ability to create rescue disks. Users should be able to grab what they need from any available past archive.
- Simplified retention management without relying on native quotas.
- Maybe a web based management console, too.
All of these features exist in available free programs; however, from what I’ve seen, no single program has them all. This is my thought on how it’s going to work:
inotify() watches whatever the users’ configuration files put into scope. It talks to a sqlite database to keep track of files that have changed (no need to keep track of how many times they have changed; one change means it needs backing up). Obviously new files go into scope, and deleted files get ignored unless they are later recreated.
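The sqlite side of that can be tiny. A sketch of the schema and the two operations described above (the table and column names are my invention); "one change means it needs backing up" maps directly onto INSERT OR IGNORE with the path as primary key:

```python
import sqlite3

def open_changedb(path=":memory:"):
    """Open (or create) the dirty-file tracking database."""
    db = sqlite3.connect(path)
    # One row per dirty path; a second change to the same file is a
    # no-op -- we only care *that* it changed, not how many times.
    db.execute("""CREATE TABLE IF NOT EXISTS dirty (
                      path       TEXT PRIMARY KEY,
                      first_seen INTEGER DEFAULT (strftime('%s','now'))
                  )""")
    return db

def mark_changed(db, path):
    db.execute("INSERT OR IGNORE INTO dirty (path) VALUES (?)", (path,))

def mark_deleted(db, path):
    # Deleted files drop out of scope; re-creation marks them again.
    db.execute("DELETE FROM dirty WHERE path = ?", (path,))

def backup_list(db):
    """The 'one query away' list of files to back up."""
    return [row[0] for row in db.execute("SELECT path FROM dirty ORDER BY path")]
```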
A daemon runs (sort of like cron) that checks for new backup jobs, looks at load averages and dirty page usage, and ‘trickles’ them. This is possible because it does not rely on an external tar / bzip2 / gzip helper. No need for things like ‘find’; the list to back up would just be one query away.
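The trickle loop might look something like this (the 1-minute load-average threshold and the function shape are mine; dirty-page pressure could be read from /proc/meminfo the same way, load average is just the simple case):

```python
import os
import time

def trickle(jobs, max_load=1.5, pause=0.5):
    """Run backup jobs one at a time, backing off while the box is busy.

    jobs is a list of callables. Before each one, sleep in `pause`-second
    steps until the 1-minute load average drops below max_load, so the
    backup never competes with real work for the disk and CPU.
    """
    done = []
    for job in jobs:
        while os.getloadavg()[0] > max_load:
            time.sleep(pause)  # back off until the machine quiets down
        job()
        done.append(job)
    return done
```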
If the source or destination directory is an ext3cow file system, this daemon could (and should) trigger a snapshot on files that have changed since its last encounter with the file system.
Backups get deposited in a local spot, or at a remote destination through various means (SFTP, FTP, etc.). NFS would be rather slow for such things; AoE or iSCSI mounts would be treated just like a local landing spot.
This should significantly reduce the resources needed to do a proper backup. We’re always doing incremental backups, touching the disk far less, and keeping an enormous amount of control over malloc()-happy compression utilities. Of course, root would control how ‘aggressive’ to be with backups on a per-user schema, sort of like a credit scheduler.
The idea is not yet fully baked, but I should have some rough code available in my repository in the next few weeks. Last week I lost /home on my desktop and had only a stale backup; I don’t wish to repeat the experience.
This would also be a great system for web hosts to use, so I’m pretty eager to get it out the door.