Disk space may be cheap, but off-site backups are often slow, so daily full backups aren’t always the best option. GNU tar supports a highly efficient “incremental” backup scheme, and it’s simple to use.
Take a look at the example below:
tar -czf backup-day-$(date +%w).tar.gz --listed-incremental=/root/tar.snapshot /etc
The first time you run the command above, the “/root/tar.snapshot” file doesn’t exist yet, so tar creates a full backup containing every file and directory under “/etc”. Once it has finished, tar writes “/root/tar.snapshot”, recording the time of the backup and the contents of each directory it archived.
The second time you run the command (on a different day from the first run), a new archive is created containing only the files and directories that have been added or changed since the previous run. How does tar know what has changed? It compares the files against the information stored in “/root/tar.snapshot”. While creating this archive, tar also updates “/root/tar.snapshot” with information about the changed files.
The third time you run the command (again, on a different day from the previous run), a new archive is created containing only the files and directories that have changed since that previous run. Again, tar uses “/root/tar.snapshot” to decide which files need to go into the new archive and which don’t.
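To see this behaviour without touching “/etc”, the sketch below runs the same kind of command twice against a scratch directory created with mktemp. The directory layout, file names and snapshot location are made up for the demonstration, and it assumes GNU tar, since --listed-incremental is a GNU tar option.

```shell
#!/bin/sh
# Demonstrates GNU tar's --listed-incremental behaviour in a scratch
# directory. Everything under $WORK is hypothetical and stands in for
# /etc and /root/tar.snapshot.
set -eu

WORK=$(mktemp -d)
SNAP="$WORK/tar.snapshot"

mkdir "$WORK/data"
echo "original" > "$WORK/data/unchanged.txt"
echo "original" > "$WORK/data/changed.txt"
sleep 1  # keep these timestamps clearly older than the first run

# First run: the snapshot file doesn't exist, so this is a full backup
# and tar creates the snapshot file.
tar -czf "$WORK/backup-1.tar.gz" --listed-incremental="$SNAP" -C "$WORK" data

sleep 1  # keep the edits below clearly newer than the first run
echo "edited" >> "$WORK/data/changed.txt"
echo "new" > "$WORK/data/added.txt"

# Second run: the snapshot exists, so only new or changed files are
# archived, and tar updates the snapshot for the next run.
tar -czf "$WORK/backup-2.tar.gz" --listed-incremental="$SNAP" -C "$WORK" data

echo "Full backup:"
tar -tzf "$WORK/backup-1.tar.gz"
echo "Incremental backup:"
tar -tzf "$WORK/backup-2.tar.gz"
```

If tar behaves as described above, the second listing should show data/changed.txt and data/added.txt alongside the data/ directory entry, but not data/unchanged.txt. The sleep calls are only there so the timestamps tar compares are clearly separated.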
On the first day of the week (perhaps Sunday or Monday), the “/root/tar.snapshot” file should be removed before the tar command is run. That way, a new full backup archive is created.
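One way to automate this is a small function that a daily cron job could call. This is a sketch rather than a tested production script: the function name daily_backup and its argument order are hypothetical, Monday (day 1 in `date +%w` numbering) is assumed to be the full-backup day, and GNU tar is assumed.

```shell
#!/bin/sh
# Sketch of a daily backup job: a full backup on Mondays, incremental
# backups on the other days. Assumes GNU tar; daily_backup and its
# argument order are hypothetical.
set -eu

# Usage: daily_backup SOURCE_DIR SNAPSHOT_FILE DEST_DIR [DAY]
# DAY is the day of the week as printed by `date +%w` (0 = Sunday,
# 1 = Monday); it defaults to today.
daily_backup() {
    src=$1
    snapshot=$2
    dest=$3
    day=${4:-$(date +%w)}

    if [ "$day" -eq 1 ]; then
        # No snapshot file means tar creates a new full backup.
        rm -f "$snapshot"
    fi

    # On other days the existing snapshot makes this an incremental
    # backup, and tar updates the snapshot for the next run.
    tar -czf "$dest/backup-day-$day.tar.gz" \
        --listed-incremental="$snapshot" "$src"
}
```

For example, `daily_backup /etc /root/tar.snapshot /backups` would run the command from the top of this post, writing backup-day-N.tar.gz into a hypothetical /backups directory. Passing the day explicitly, as in `daily_backup /etc /root/tar.snapshot /backups 1`, is mainly useful for testing.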
Consider the effect of overwriting the previous week’s full backup archive. The previous week’s incremental archives still exist and hold the files that changed over that week, while the new full backup contains everything as it stood when it was taken. Any file changed over the past week will therefore be in one of the incremental archives.
Here’s a walk-through, starting with no backups at all:
- On Monday, a full backup is made. There are no incremental backups yet. All files, regardless of when they were created, are included in this backup.
- On Tuesday, an incremental backup is made containing the files and directories that have been added or changed since the full backup.
- From Wednesday through Sunday, the same process repeats: each day’s archive contains the files and directories added or changed since the previous day’s backup.
- On Monday, a new full backup is made overwriting the previous full backup but not overwriting the existing incremental backups. At this point there is a new full backup and 6 daily incremental backups.
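Restoring the whole tree as it was on a given day means extracting that week’s full backup first and then each incremental archive in order, up to the day you want. A sketch follows, assuming the backup-day-N.tar.gz naming from the command above, Monday as the full-backup day, GNU tar, and that all the archives belong to the same weekly cycle; restore_to_day is a hypothetical helper name. Passing --listed-incremental=/dev/null on extraction is one of the ways the GNU tar manual describes for extracting incremental archives. In that mode tar also deletes files that did not exist in a directory when the archive was created, so restore into an empty directory rather than over live data.

```shell
#!/bin/sh
# Sketch: rebuild the backed-up tree as it was on a given day by
# extracting Monday's full backup followed by each incremental archive
# in order. Assumes GNU tar and the backup-day-N.tar.gz naming, with all
# archives from the same weekly cycle; restore_to_day is hypothetical.
set -eu

# Usage: restore_to_day ARCHIVE_DIR DEST_DIR TARGET_DAY
# TARGET_DAY uses `date +%w` numbering (0 = Sunday, 1 = Monday).
restore_to_day() {
    archive_dir=$1
    dest=$2
    target=$3

    # The weekly cycle starts with Monday's full backup (1) and ends
    # with Sunday's incremental backup (0).
    for day in 1 2 3 4 5 6 0; do
        archive="$archive_dir/backup-day-$day.tar.gz"
        if [ ! -f "$archive" ]; then
            echo "restore_to_day: missing $archive" >&2
            return 1
        fi
        # Incremental extraction: tar also removes files that did not
        # exist in a directory when this archive was created.
        tar -xzf "$archive" --listed-incremental=/dev/null -C "$dest"
        if [ "$day" -eq "$target" ]; then
            return 0
        fi
    done

    echo "restore_to_day: day must be between 0 and 6" >&2
    return 1
}
```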
It’s now Monday and the new full backup has been created.
- If a file needs to be restored from the previous Wednesday’s backup, the incremental archive created that Wednesday still exists, so the file can be restored, provided it was added or changed on that Wednesday. (If it last changed on the Tuesday, Tuesday’s incremental archive, which also still exists, has it instead.)
- If a file is needed from the previous Wednesday’s backup (or any day of the previous week) but wasn’t added or changed during the previous week, it cannot be restored, because the previous week’s full backup has been removed and replaced with the new full backup archive.
- To work around the previous point, you can: a) never remove the full backup and keep creating incremental backups, b) keep the previous week’s full backup for an additional week before removing it, or c) accept that some files can’t be restored once the next full backup has been created.
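Both cases above come down to whether the file is actually a member of the incremental archive you still have. The sketch below checks that first and only then extracts; it assumes GNU tar and the backup-day-N.tar.gz naming, and restore_file is a hypothetical helper. Note that tar strips the leading “/” when creating the archive, so “/etc/hosts” is stored as “etc/hosts”.

```shell
#!/bin/sh
# Sketch: restore a single file from one day's archive, but only if that
# archive actually contains it. Assumes GNU tar; restore_file is a
# hypothetical helper.
set -eu

# Usage: restore_file ARCHIVE MEMBER DEST_DIR
# MEMBER is the name stored in the archive, without the leading /
# (for example etc/hosts). DEST_DIR must already exist.
restore_file() {
    archive=$1
    member=$2
    dest=$3

    # List the archive's members and look for an exact match.
    if tar -tzf "$archive" | grep -qxF "$member"; then
        tar -xzf "$archive" -C "$dest" "$member"
    else
        echo "restore_file: $member is not in $archive" >&2
        return 1
    fi
}
```

For example, `restore_file /backups/backup-day-3.tar.gz etc/hosts /tmp/restore` would extract the previous Wednesday’s copy of “/etc/hosts” into /tmp/restore, or report that the file isn’t in that archive (the second case above). The /backups and /tmp/restore paths are placeholders.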
Of course, if you’re storing backups on tape or external removable media, overwriting the previous full backup may not be a concern. But if you’re writing to a network location, this conundrum might be a pain to solve. Given the option, I’d keep two consecutive weeks of full backups.
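Option b) can be approximated by moving the old full backup aside before Monday’s run overwrites it. The sketch below assumes GNU tar and the same backup-day-N.tar.gz naming as the command at the top; weekly_full_backup and the backup-day-1.prev.tar.gz name are hypothetical. It only preserves the previous full backup: the previous week’s incremental archives are still overwritten one day at a time as the new week progresses.

```shell
#!/bin/sh
# Sketch of option b): keep last week's full backup for one more week.
# Assumes GNU tar and the backup-day-N.tar.gz naming; weekly_full_backup
# and the backup-day-1.prev.tar.gz name are hypothetical.
set -eu

# Usage: weekly_full_backup SOURCE_DIR SNAPSHOT_FILE DEST_DIR
weekly_full_backup() {
    src=$1
    snapshot=$2
    dest=$3

    # Move last week's full backup aside instead of overwriting it.
    # This replaces the full backup from two weeks ago, if there is one.
    if [ -f "$dest/backup-day-1.tar.gz" ]; then
        mv -f "$dest/backup-day-1.tar.gz" "$dest/backup-day-1.prev.tar.gz"
    fi

    # Removing the snapshot forces a new full backup.
    rm -f "$snapshot"
    tar -czf "$dest/backup-day-1.tar.gz" \
        --listed-incremental="$snapshot" "$src"
}
```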