[plug] Backups

Craig Ringer craig at postnewspapers.com.au
Tue Jul 27 17:42:58 WST 2004


Marc Wiriadisastra wrote:
> Sorry to not clarify since I've "never" backed up network data under 
> linux before.

No worries. That's why I asked ;-)

> I would like full whole harddrive back up once a week for the sake of if 
> the system all goes to the preverbial out house.

[Note: reading my own reply, it becomes clear I'm not good at clear and 
simple answers. Sorry. I lack the time to edit this down right now, so I 
hope it's of some use to you as-is. Please forgive all the assumptions.]

OK, so we're talking a full system snapshot there. Any idea what media 
you'll be using for that - do you want to try to fit it on a DVD, or 
will you be using tapes or perhaps hard disks?

My personal inclination with those backups is toward using something 
like LVM (the Linux Logical Volume Manager) to take a filesystem 
snapshot, then copying the snapshot onto the backup media as individual 
files. This won't work so well for a DVD or tape backup, though; I'm 
used to using SATA disks for my system snapshots.
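To make that concrete, the LVM sequence is: snapshot, mount read-only, 
copy off, tear down. The sketch below only defines the steps as a 
function rather than running them - the volume group, volume, and mount 
point names (`vg0`, `root`, `/mnt/snap`, `/backup`) are invented for 
illustration, and the real thing needs root on an LVM-managed system:

```shell
#!/bin/sh
# Sketch of an LVM snapshot backup. Volume and mount names are examples
# only; this requires root and an LVM-managed filesystem, so it is
# defined here but not invoked.
snapshot_backup() {
    # 1. Create a 1GB copy-on-write snapshot of the 'root' logical
    #    volume. The snapshot is a frozen-in-time view; the live
    #    filesystem keeps running.
    lvcreate --size 1G --snapshot --name rootsnap /dev/vg0/root

    # 2. Mount the frozen view read-only and copy it off as ordinary
    #    files.
    mkdir -p /mnt/snap
    mount -o ro /dev/vg0/rootsnap /mnt/snap
    rsync -aHx /mnt/snap/ /backup/system-snapshot/

    # 3. Tear the snapshot down; the live filesystem was never
    #    interrupted.
    umount /mnt/snap
    lvremove -f /dev/vg0/rootsnap
}
```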

For DVD or tape (i.e. non-random-access media), my inclination would be to 
use an uncompressed (if possible) `tar` or `star` archive. It's slow for 
partial restores and searches, but pretty quick for a full restore, and 
it's going to retain almost all info on the system. James' suggestion of 
`dump` also bears thinking about here. James: is `dump` capable of 
dumping a consistent point-in-time copy of a filesystem that's in 
read/write use, or must measures like read-only remounts or LVM 
snapshots be used?
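For what it's worth, a `tar`-based full backup and restore boils down 
to two commands. The sketch below runs against a throwaway directory 
tree standing in for the real filesystem root - a real run would 
archive `/` as root, with services quiesced, writing to tape or a DVD 
image:

```shell
#!/bin/sh
# Sketch: archive a tree with tar, preserving permissions, then restore
# it. A throwaway directory stands in for the real filesystem root.
set -e

work=$(mktemp -d)
mkdir -p "$work/src/etc" "$work/restore"
echo "hostname=example" > "$work/src/etc/hostname"
chmod 600 "$work/src/etc/hostname"

# Create an uncompressed archive; -p records exact permission bits.
tar -C "$work/src" -cpf "$work/full-backup.tar" .

# A full restore is just unpacking the archive into the target root.
tar -C "$work/restore" -xpf "$work/full-backup.tar"
```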

Note that unless you're using something like LVM that permits you to 
make a 'frozen in time' view of a filesystem while continuing to work on 
it, you'll probably need to bring most services down to make a 
consistent system snapshot.

> I would also like nightly incremental backups with one weekly backup of 
> the following folders being the "usual" ones and then rotate those over 
> a weekly basis e.g. keep for 1 week or maybe longer I don't know.
> 
> /etc/ /home /workdir  /var

... and some periodic data and configuration backups. This is a classic 
case for `tar`, `pax`, or `cpio` (with `tar` being the most common among 
Linux users it seems) and I'd use one of those unless I had a good 
reason to do otherwise. There should be no problem with creating DVD 
images containing tar files and using those for your backups. You also 
have the option of just storing individual files on your DVD images, but 
you may run into "fun" with permissions, deep directory trees, and long 
file names with this approach.
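A weekly run of that might look like the sketch below. Stand-in 
directories replace the real /etc, /home, /workdir, and /var (so it 
doesn't need root), and the ISO step is guarded because mkisofs or 
genisoimage may not be installed - the tar file alone is already the 
backup:

```shell
#!/bin/sh
# Sketch: weekly tar of a few directories, then (optionally) an ISO9660
# image wrapped around it for burning to DVD. Stand-in directories
# replace /etc, /home, /workdir, /var.
set -e

work=$(mktemp -d)
mkdir -p "$work/etc" "$work/home"
echo "config" > "$work/etc/app.conf"

stamp=$(date +%Y-%m-%d)
# On the real system this would be something like:
#   tar -cpzf weekly.tar.gz /etc /home /workdir /var
tar -C "$work" -cpzf "$work/weekly-$stamp.tar.gz" etc home

# Wrap the archive in an ISO image if an imaging tool is available.
iso_tool=$(command -v genisoimage || command -v mkisofs || true)
if [ -n "$iso_tool" ]; then
    "$iso_tool" -quiet -J -r -o "$work/weekly-$stamp.iso" \
        "$work/weekly-$stamp.tar.gz"
fi
```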

If you're thinking about differential or incremental backups instead, 
then `dump` might again be a reasonable option, but I don't know enough 
about it to say.
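GNU `tar` can also do incrementals itself, via `--listed-incremental` - 
a snapshot metadata file records what has been backed up, and the next 
run archives only what changed. A rough sketch (GNU tar specific; the 
paths are made up):

```shell
#!/bin/sh
# Sketch: incremental backups with GNU tar's --listed-incremental.
# The snapshot file (snap.meta) records what each run saw, so the next
# run only archives what changed since.
set -e

work=$(mktemp -d)
mkdir -p "$work/data"
echo "monday" > "$work/data/a.txt"

# Level 0 (full) backup; tar creates the snapshot metadata file.
tar -C "$work" --listed-incremental="$work/snap.meta" \
    -cf "$work/level0.tar" data

# A day later: one new file appears.
echo "tuesday" > "$work/data/b.txt"

# The next run picks up only the changes recorded against snap.meta.
tar -C "$work" --listed-incremental="$work/snap.meta" \
    -cf "$work/level1.tar" data

# Restore = unpack level 0, then each incremental in order.
# --listed-incremental=/dev/null tells tar to honour the incremental
# member handling on extraction.
mkdir "$work/restore"
tar -C "$work/restore" --listed-incremental=/dev/null \
    -xf "$work/level0.tar"
tar -C "$work/restore" --listed-incremental=/dev/null \
    -xf "$work/level1.tar"
```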


I've realised that one aspect of this that might need clarifying is what 
all these different backup methods we're talking about /are/. We've 
covered what the differences in functionality are, but not so much 
anything else.

The main classes of backups we've discussed are:

- File-by-file copies
- Archives
- Filesystem images
- Filesystem dumps

Don't worry if not all of this makes sense. (a) I'm rather bad at 
explaining things clearly, and (b) you probably don't need to know it 
all in detail anyway. If you're interested, though, I hope this info 
helps clarify what we've been talking about with `dump`, `dd`, `tar`, etc.

A file-by-file copy is the simplest. It's a backup where you simply copy 
your files to some other volume, usually a normal filesystem on 
random-I/O media like another hard disk, but it can also be an ISO9660 
or UDF image for a CD or DVD. You might, for example, plug in a second 
hard disk, partition it the same as your current hard disk, make 
filesystems on the partitions, and copy all the files across. Just to 
confuse things even more, archiver programs are often used to copy the 
files between source and destination because they're usually more 
efficient than `cp` - but despite the use of an archiver program, it's 
still just a file-by-file backup.
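That archiver-as-copier trick is usually a `tar` pipe: one tar writes 
the tree to stdout, the other unpacks it at the destination, and 
permissions and ownership travel with the stream. A small sketch with 
made-up paths:

```shell
#!/bin/sh
# Sketch: a file-by-file copy done with a tar pipe rather than 'cp'.
# The archive only ever exists in the pipe; what lands on the
# destination is ordinary files, so this is still a file-by-file backup.
set -e

work=$(mktemp -d)
mkdir -p "$work/src/project" "$work/dst"
echo "hello" > "$work/src/project/readme"

# Left side packs the tree to stdout; right side unpacks from stdin.
( cd "$work/src" && tar -cpf - . ) | ( cd "$work/dst" && tar -xpf - )
```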

Archives involve collecting all the files to be backed up into one large 
archive file (a Zip file is a type of archive). This file contains not 
only the files' data but also their metadata - file name, last-accessed 
time, permissions, ownership, and so on. Archives are also often 
compressed. Common archiving tools include `tar`, `pax`, `cpio`, and 
`star`, with `tar` being by far the most commonly used on Linux. 
Archives are probably the most portable approach, in that it's often 
possible to restore them to almost any kind of computer, but it can be 
slow and painful to restore only a few files out of a large archive.
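The reason partial restores are painful is that a tar archive has no 
index: tar reads the archive sequentially until it reaches the member 
you asked for. Fine for a small archive, slow for a multi-gigabyte one. 
A sketch with an invented file layout:

```shell
#!/bin/sh
# Sketch: pulling a single file back out of a tar archive. tar scans
# the archive sequentially to find the member, so this gets slower as
# the archive grows.
set -e

work=$(mktemp -d)
mkdir -p "$work/tree/etc" "$work/out"
echo "setting=1" > "$work/tree/etc/app.conf"
tar -C "$work/tree" -cf "$work/backup.tar" .

# Restore just the one file into a scratch directory, by member name.
tar -C "$work/out" -xf "$work/backup.tar" ./etc/app.conf
```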

A filesystem image is usually a byte-for-byte clone of the entire 
filesystem data, including internal filesystem structures, free disk 
space contents, etc. `dd` is often used to create these images. This is 
NOT the same as programs like Norton Ghost and Partimage, which are 
somewhere between raw filesystem images and filesystem dumps.
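The `dd` invocation itself is trivial - the catch is just that it 
copies everything, free space included. In the sketch below an ordinary 
file stands in for the block device (something like /dev/sda1), since 
reading a real one needs root:

```shell
#!/bin/sh
# Sketch: a byte-for-byte image with dd. A disk file stands in for the
# real block device, which would need root to read.
set -e

work=$(mktemp -d)
# Fake "partition": 1 MB of pseudo-random data.
dd if=/dev/urandom of="$work/fake-partition" bs=1024 count=1024 \
    2>/dev/null

# The image is an exact copy of every byte, free space and all.
dd if="$work/fake-partition" of="$work/partition.img" bs=64k 2>/dev/null
```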

Some filesystem imaging tools - like the aforementioned Partimage and 
Ghost - understand the filesystem to some extent and can do things like 
leave out free space. These are halfway between dumpers and imaging 
tools, really.

Dumpers ... well, James knows lots more than I do about these, but I'll 
give you a quick summary. In general, they're programs that understand a 
particular filesystem and can copy all the important information from it 
into a form that can later be restored, without copying the parts that 
aren't useful like the contents of empty space.
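From what I understand of `dump` and `restore` on ext2/ext3, usage 
looks roughly like the sketch below. The device and paths are invented, 
and both commands need root (and the dump package installed), so the 
steps are only defined as functions here, not run - James can correct 
the details:

```shell
#!/bin/sh
# Sketch of dump/restore usage for an ext2/ext3 filesystem. Device and
# paths are examples only; dump needs root, so these functions are
# defined but not invoked.
dump_home() {
    # Level 0 (full) dump of /home to a file; -u records the run in
    # /etc/dumpdates.
    dump -0u -f /backup/home.0.dump /dev/vg0/home
    # A later level 1 dump captures only what changed since level 0.
    dump -1u -f /backup/home.1.dump /dev/vg0/home
}

restore_home() {
    # Restore runs from inside the (empty) target filesystem, lowest
    # dump level first.
    cd /home && restore -rf /backup/home.0.dump
    cd /home && restore -rf /backup/home.1.dump
}
```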

Neither dumpers nor filesystem imagers will do you any good if you want 
to back up a remote filesystem mounted over NFS, SMB, etc.

I think we've covered the upsides and downsides of these various 
approaches reasonably well before, so I won't go into that now.


Nick Bannon wrote:
 > On Tue, Jul 27, 2004 at 04:04:26PM +0800, Marc Wiriadisastra wrote:
 >> granted its not used constantly and I'm not worried about but its
 >> just a situation where I don't want to lose data and the just in
 >> case a murphy's stupid laws is what concerns me.
 >
 > The only way you can have some peace of mind there is, every so often,
 > to pretend that you've just lost your hard disc and to test a restore
 > using only your backup copy.

Yep. Alas, just doing a restore isn't good enough. You really need to 
then /use/ the restored machine for your live services*. Fail to do 
that, and I guarantee you'll discover you've missed something critical 
as soon as you really need it.

* I haven't yet followed my own advice here. A _full_ restore here will 
take 12+ hours (though it's under an hour from bare metal to working 
critical services), and I lack a spare machine with sufficient grunt to 
handle the main server role. Machines with 2GB or more of RAM and 750GB 
RAID arrays are not something I have just lying around. Both of these 
issues mean that my own backups are not up to scratch - mostly due to 
lack of available funds to do the proper testing. *sigh*.

--
Craig Ringer



