Moving to GitHub, slowly

The software on these pages will slowly be moved to GitHub: https://github.com/hilbix/. The CVS repository will be migrated to Git as well, so the history will be preserved, a bit. See FAQ.

Scylla and Charybdis, md5backup - Tools

The tools are developed under Linux with ESR's paradigm "release early, release often" in mind.
So you can consider this beta software, or alpha, or pre-alpha, or even worse ;)

Have a look in the download directory for all downloads.
As always here, all you get is the source. No binaries here.

md5backup 0.3.15-20061003-231645

Interim backup tool (latest version: 0.3.15-20061003-231645)

md5backup is an interim filesystem-to-filesystem backup tool, a step on my way to building a backup utility which suits all my needs. To use it, you need a second hard drive which serves as the backup medium. Optionally you can do a networked backup too; however, this feature is not completely developed yet.

Currently no metadata is backed up. md5backup only protects the data inside files. It currently cannot back up sparse files (most often databases, which must be backed up by other means, or cache files, which can safely be ignored).

Please note that there is no real restore function yet! There now is bin/md5restore.sh, which can be used to restore a file interactively, but it is painfully slow and braindead to use, and you must be root to use it. So you definitely don't want to do a complete restore with it yet. Have a look at doc/restore.txt as well.

Sorry, there is no Wiki/FAQ/etc. yet. If I ever find some time I will prepare one.

md5backup is usable today: I back up all my production Internet servers with it, which run RedHat 9, SuSE 7.2, SuSE 9.0 and Debian Sarge. Just run bin/dobackup.sh; it should set up everything for you, too. Networked backup is possible as well using scylla+charybdis; please look at the announcement of 0.3.10 below.

The main feature of md5backup is that it stores each file under the MD5 sum of its content, so you can easily check the integrity of the files on the backup volume.
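To illustrate the idea (this is not md5backup's actual code), here is a minimal Python sketch of a store which names each file after the MD5 sum of its content and can therefore check itself. The flat directory layout and the names store, verify and demo are assumptions made up for this example; the real on-disk layout of md5backup may well differ.

```python
import hashlib
import os
import shutil
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file's content, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store(source, store_dir):
    """Copy source into store_dir under its MD5 sum (hypothetical flat layout)."""
    name = md5_of_file(source)
    target = os.path.join(store_dir, name)
    if not os.path.exists(target):  # identical content is stored only once
        shutil.copyfile(source, target)
    return name


def verify(store_dir):
    """Return the names of stored files whose content no longer matches the name."""
    return [name for name in sorted(os.listdir(store_dir))
            if md5_of_file(os.path.join(store_dir, name)) != name]


def demo():
    """Store a file, verify the store, damage the copy, verify again."""
    with tempfile.TemporaryDirectory() as work:
        store_dir = os.path.join(work, "out")
        os.mkdir(store_dir)
        source = os.path.join(work, "hello.txt")
        with open(source, "wb") as handle:
            handle.write(b"hello\n")
        name = store(source, store_dir)
        clean = verify(store_dir)
        with open(os.path.join(store_dir, name), "ab") as handle:
            handle.write(b"bit rot")  # simulate corruption on the backup volume
        damaged = verify(store_dir)
        return name, clean, damaged
```

Because the file name is the checksum, the integrity check needs no separate catalogue: re-hashing each stored file and comparing the result with its name is enough.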

Please note that md5backup was written such that it should work reliably under any circumstances. However, I cannot give you any guarantee that it protects your valuable data! Still, I trust it. All scripts I use can be found in the bin/ directory. For more information, have a look into the doc/ directory and read sc-backup.txt.

New for the upcoming 0.4.x:

You need SQLite to compile md5backup (local copy of SQLite source code).

History:

version 0.3.15-20061003-231645

download (212859 bytes)

This version now has two new preliminary scripts:

1. There now is a restore script bin/md5restore.sh

2. To setup networking there now is bin/sc-setup.sh

Little else changed. The scripts are not completely ready yet!

The restore script cannot restore a file's metadata, and it is not meant for restoring directories. Also be aware that sparse files are not backed up yet.

You need scylla-charybdis compiled to use sc-setup.sh

version 0.3.14-20050306-002847

download (132389 bytes)

Some medium restructuring of some central routines. Multiple file stores added (my backup archive became full).

This new feature works, but is nearly untested (as always).

Additional read-only directories named outN, where N is a number starting from 0, are now searched for existing backed up files too, just like the out directory. This way you can (manually) move old data from out/ into another directory to extend the hard drive space for backup.

Files which are considered new (active data) are copied back into the main file store.
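The lookup over several file stores could look roughly like the following Python sketch. It only shows the search order, not the copying back into the main store. That files sit directly in each directory under their MD5 name, and that numbering stops at the first missing outN, are assumptions of this example; the function names are made up.

```python
import os
import tempfile


def store_dirs(base):
    """Return out/ followed by out0, out1, ... for as long as they exist."""
    dirs = [os.path.join(base, "out")]
    number = 0
    while os.path.isdir(os.path.join(base, "out%d" % number)):
        dirs.append(os.path.join(base, "out%d" % number))
        number += 1  # assumption: the numbering has no gaps
    return dirs


def find_stored(base, md5_name):
    """Return the path of an already backed up file, or None if it is new."""
    for directory in store_dirs(base):
        candidate = os.path.join(directory, md5_name)
        if os.path.isfile(candidate):
            return candidate
    return None


def demo():
    """Put one file into out1 and look it up, together with an unknown one."""
    with tempfile.TemporaryDirectory() as base:
        for name in ("out", "out0", "out1"):
            os.mkdir(os.path.join(base, name))
        old = "0123456789abcdef0123456789abcdef"
        with open(os.path.join(base, "out1", old), "wb") as handle:
            handle.write(b"old data")
        found = find_stored(base, old)
        missing = find_stored(base, "f" * 32)
        return os.path.relpath(found, base), missing
```

A file found in any of the directories does not have to be stored again, which is what makes moving old data out of out/ safe.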

version 0.3.13-20050220-164446

download (124530 bytes)

Wildcard ignores added. As always, this new feature is not much tested.

Ignores which are listed in a file can now start with a ? (the ? is skipped), which enables wildcard matching. Wildcards are:

Match-all (*), match-one (?) and character sets ([...]):
First character ^ inverts the set
The first character of a set can be anything and is taken literally, so []] matches ] and [[] matches [
a-b matches a to b including a and b (a<=b)
b-a matches a to b excluding a and b (a<b)
Example: []-]-] matches ] or - (this is the range ]-] followed by a literal -)
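The rules above can be sketched in Python as follows. This is an illustration only, not the matcher from md5backup: that the pattern must match the whole name, that * also matches /, that the character after a leading ^ counts as the first character, and that an unterminated set is an error are all assumptions of this example. The simple backtracking used for * is also not meant to be fast.

```python
def parse_set(pattern, pos):
    """Parse a [...] set; pos points just behind the opening '['.

    Returns (invert, items, next_pos), items being (low, high, inclusive).
    """
    invert = False
    if pos < len(pattern) and pattern[pos] == "^":
        invert = True
        pos += 1
    items = []
    first = True
    while pos < len(pattern):
        char = pattern[pos]
        if char == "]" and not first:
            return invert, items, pos + 1
        first = False
        if pos + 2 < len(pattern) and pattern[pos + 1] == "-":
            end = pattern[pos + 2]
            if char <= end:
                items.append((char, end, True))    # a-b: bounds included
            else:
                items.append((end, char, False))   # b-a: bounds excluded
            pos += 3
        else:
            items.append((char, char, True))       # single literal character
            pos += 1
    raise ValueError("unterminated [...] in pattern")


def in_set(char, items):
    """Tell whether char is covered by one of the parsed items."""
    return any(low <= char <= high if inclusive else low < char < high
               for low, high, inclusive in items)


def match(pattern, name, p=0, n=0):
    """Return True if the whole of name matches the wildcard pattern."""
    while p < len(pattern):
        char = pattern[p]
        if char == "*":
            # Let the star swallow 0, 1, 2, ... characters and retry the rest.
            return any(match(pattern, name, p + 1, k)
                       for k in range(n, len(name) + 1))
        if n >= len(name):
            return False
        if char == "[":
            invert, items, p = parse_set(pattern, p + 1)
            if in_set(name[n], items) == invert:
                return False
            n += 1
            continue
        if char != "?" and char != name[n]:
            return False
        p += 1
        n += 1
    return n == len(name)


def wildcard_ignored(ignore_lines, name):
    """Apply only the ignore lines which start with '?' (the '?' is skipped)."""
    return any(match(line[1:], name)
               for line in ignore_lines if line.startswith("?"))
```

For example, wildcard_ignored(["?*.tmp"], "cache/x.tmp") is true under these assumptions, while a line without the leading ? is not treated as a wildcard at all.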

Wildcard ignores are matched at backup time. Until I manage to create a better O(n) regexp parser which suits all my needs, this eats O(n*m) CPU, as all m ignores are run over each of the n file names found.

This new type of ignore also solves a problem with the "normal" ignores (which are much faster), which are processed at startup time. The "normal" ignores sometimes produce "amazing" results on frequently changing files: because inode numbers are matched, the ignore can "hit" the wrong file (whichever file holds that inode by the time the backup process reaches it).

version 0.3.12-20041005-050410

download (99040 bytes)

"nice", security hole fixed, new sparse file handling, bin/compare.sh

md5backup now automatically nices itself and uses file flushes. Also, the backed up files are no longer readable by others. It ignores sparse files, and the source has been reorganized internally; some routines have moved into tinolib.

There is bin/compare.sh, which can be used as a template for a restore script! Just copy the script, replace the MODE=compare line by MODE=restore, and be sure you understand what you are doing (else you will miss the second safety belt). bin/compare.sh can also check whether the backup really worked ;)

Sparse files are now skipped if they are too big (over 1 MB) and too sparse (75%). The problem with those is that they contain too little data. The drawback is that they are no longer backed up until I have added more efficient sparse file support. Think of the following: Create a 200 TB file on a 64 bit filesystem. Add one block of data somewhere in it. Now do the backup. Even if you can process 1 GB/s (which is extremely fast), it still takes over 2 days (200,000 seconds) just to hunt for this silly block of data.
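As a rough Python sketch of such a check (not the code md5backup uses): a file counts as "too sparse" here when at least 75% of its apparent size is not backed by allocated blocks. This reading of the 75%, taking 1 MB as 2**20 bytes, and deriving the allocated size from st_blocks (counted in 512 byte units on Unix-like systems) are assumptions of the example. The last function just redoes the arithmetic for the 200 TB file.

```python
import os

SIZE_LIMIT = 1 << 20   # "over 1 MB" (assumed to mean 2**20 bytes)
SPARSE_LIMIT = 0.75    # "too sparse (75%)" (assumed: share of unallocated bytes)


def sparseness(size, allocated):
    """Fraction of the apparent size which is not backed by allocated blocks."""
    if size <= 0:
        return 0.0
    return max(0.0, 1.0 - allocated / size)


def is_skipped(size, allocated):
    """Apply both limits: big enough and sparse enough to be skipped."""
    return size > SIZE_LIMIT and sparseness(size, allocated) >= SPARSE_LIMIT


def file_is_skipped(path):
    """Same check for a real file; st_blocks exists only on Unix-like systems."""
    info = os.stat(path)
    return is_skipped(info.st_size, info.st_blocks * 512)


def scan_days(size_bytes, bytes_per_second):
    """Days needed to read size_bytes sequentially at the given speed."""
    return size_bytes / bytes_per_second / 86400.0
```

With decimal units, scan_days(200e12, 1e9) comes out at a bit more than 2.3 days, which is where the estimate above comes from.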

The security thingie is that the files in the out/ directory were readable by everybody. That really makes no sense, but I am usually alone on my machines, so this did not harm me. Be sure to run
chmod -R o-rwx /backup/md5backup

The nice is a step in my continuing effort to make md5backup less invasive for the system. A backup system shall run in the background and shall not use up a lot of resources when it runs. md5backup is still far from this goal.

The nice seems to help when the filesystem is mounted without the noatime option and the hard drive is somewhat slow (as hard drives are, YKWIM). The frequent directory inode flushes of the backup process can keep other processes from doing IO. Without the nice, md5backup gets too high a scheduling priority, as it always runs as root.

What I would like is to scan the directory tree (Posixly correct) without 'accessing' it, and to use only "background IO" (IO done while the hard drive is otherwise idle). I have not found a method for this yet. AFAICS (if it is not already present) there should be a process capability for it: the process asks the kernel whether it may scan the directories without leaving a trace (which the kernel grants or not) and whether it may become a "nulltask" for a resource, that is, run only while the resource is not used by other processes.

(This text should go into a Wiki, but currently I do not have one.)

version 0.3.11-20040930-013306

download (81692 bytes)

Bugfix for sc-loop.sh: It simply did not work, oops ;)

sc-backup.sh is broken anyway. The following three scripts should run independently of each other:

The backup process backing up MySQL (sc-mysql.sh)
The backup process backing up files (dobackup.sh)
The network process, transporting files (sc-move.sh)

However, sc-backup.sh (and therefore sc-loop.sh) calls them one after another. This way network starvation slows down the backup cycle extremely. It is a bad design; keep that in mind. I have now "improved" sc-backup.sh a little bit, such that the loop does not completely stop when the network is down (but it can take ages), so the leftover data is hopefully transferred at the next cycle. Improvements of sc-backup.sh and/or sc-loop.sh are left for the future.

However you can always invent your own scripts or run the three scripts noted above from cron, of course.


License and Disclaimer

All you can see here is free software according to the GNU GPL.
Copyright (C)2000-2011 by Valentin Hilbig
Note that the software comes with absolutely no warranty of any kind.
You use the software at your own risk.
Valentin Hilbig cannot be held responsible for any unintended damage,
lost data or malfunction of the software you can find here.

Last modified: 2011-09-12 by Valentin Hilbig