[plug] Backups
Craig Ringer
craig at postnewspapers.com.au
Tue Jul 27 17:42:58 WST 2004
Marc Wiriadisastra wrote:
> Sorry to not clarify since I've "never" backed up network data under
> linux before.
No worries. That's why I asked ;-)
> I would like a full whole-hard-drive backup once a week, for if the
> system all goes to the proverbial outhouse.
[Note: reading my own reply, it's clear I'm not good at clear and
simple answers. Sorry. I lack the time to edit this down right now, so I
hope it's of some use to you as-is. Please forgive all the assumptions.]
OK, so we're talking a full system snapshot there. Any idea what media
you'll be using for that - do you want to try to fit it on a DVD, or
will you be using tapes or perhaps hard disks?
My personal inclination with those backups is toward using something
like LVM (the Logical Volume Manager) to take a filesystem snapshot,
then copying the snapshot onto the backup media as individual files.
This won't work so well for a DVD or tape backup, though; I'm used to
using SATA disks for my system snapshots.
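As a rough sketch of that approach (the volume group, logical volume,
and mount point names here are made up for illustration):

    # Take a point-in-time snapshot of the 'root' logical volume
    lvcreate --snapshot --size 2G --name rootsnap /dev/vg0/root
    mount -o ro /dev/vg0/rootsnap /mnt/snap
    # Copy the frozen view onto the backup disk, file by file
    rsync -aHx /mnt/snap/ /mnt/backupdisk/rootsnap/
    umount /mnt/snap
    lvremove -f /dev/vg0/rootsnap

The snapshot only has to hold blocks that change while the copy runs,
so it can be much smaller than the origin volume.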
For DVD or tape (i.e. non-random-access media), my inclination would be
to use an uncompressed (if possible) `tar` or `star` archive; there's a
quick sketch below. It's slow for partial restores and searches, but
pretty quick for a full restore, and it retains almost all the
information on the system. James' suggestion of
`dump` also bears thinking about here. James: is `dump` capable of
dumping a consistent point-in-time copy of a filesystem that's in
read/write use, or must measures like read-only remounts or LVM
snapshots be used?
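The tar-to-tape sketch I mentioned, assuming a typical first SCSI tape
drive on /dev/st0 (adjust the device name to suit):

    # Write an uncompressed tar archive straight to the tape
    tar --one-file-system -cf /dev/st0 /
    # Full restore later, onto a freshly made filesystem:
    tar -xpf /dev/st0 -C /mnt/newroot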
Note that unless you're using something like LVM that permits you to
make a 'frozen in time' view of a filesystem while continuing to work on
it, you'll probably need to bring most services down to make a
consistent system snapshot.
> I would also like nightly incremental backups, with one weekly backup
> of the following folders (the "usual" ones), and then rotate those on
> a weekly basis, e.g. keep for 1 week or maybe longer, I don't know.
>
> /etc/ /home /workdir /var
... and some periodic data and configuration backups. This is a classic
case for `tar`, `pax`, or `cpio` (with `tar` being the most common among
Linux users it seems) and I'd use one of those unless I had a good
reason to do otherwise. There should be no problem with creating DVD
images containing tar files and using those for your backups. You also
have the option of just storing individual files on your DVD images, but
you may run into "fun" with permissions, deep directory trees, and long
file names with this approach.
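For the tar-on-DVD route, something like this might do (the file names
are only examples, and growisofs comes from the dvd+rw-tools package;
your burner device may differ):

    # Archive the directories of interest into one compressed tarball
    tar -czf /tmp/weekly.tar.gz /etc /home /workdir /var
    # Wrap the tarball in an ISO9660 image and burn it to DVD
    mkisofs -r -o /tmp/backup.iso /tmp/weekly.tar.gz
    growisofs -Z /dev/dvd=/tmp/backup.iso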
If you're thinking about differential or incremental backups instead,
then `dump` might again be a reasonable option, but I don't know enough
about it to say.
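For what it's worth, the usual pattern with `dump` (on ext2/ext3, and
with made-up output paths) looks roughly like:

    # Level 0 (full) dump of /home; -u records it in /etc/dumpdates
    dump -0u -f /backup/home.0.dump /home
    # A nightly level 1 dump then captures only what changed since
    # the last level 0
    dump -1u -f /backup/home.1.dump /home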
I've realised that one aspect of this that might need clarifying is
what all these different backup methods we're talking about /are/.
We've covered how they differ in functionality, but not much else.
The main classes of backups we've discussed are:
- File-by-file copies
- Archives
- Filesystem images
- Filesystem dumps
Don't worry if not all of this makes sense. (a) I'm rather bad at
explaining things clearly, and (b) you probably don't need to know it
all in detail anyway. If you're interested, though, I hope this info
helps clarify what we've been talking about with `dump`, `dd`, `tar`, etc.
A file-by-file copy is the simplest. It's a backup where you simply copy
your files to some other volume, usually a normal filesystem on
random-I/O media like another hard disk, but it can also be an ISO9660
or UDF image for a CD or DVD. You might, for example, plug in a second
hard disk, partition it to be the same as your current hard disk, make
filesystems on the partitions, and copy all the files across. Just to
confuse things even more, archiver programs are often used to copy files
between the source and destination because they're usually more
efficient than using `cp` - but despite the use of an archiver program,
it's still just a file-by-file backup.
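The usual trick here is the classic 'tar pipe' (the directory names are
just examples):

    # Copy a tree file by file, preserving ownership, permissions,
    # and special files
    (cd /srcdir && tar -cf - .) | (cd /destdir && tar -xpf -)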
Archives involve collecting all the files to be backed up into a large
archive file (a Zip file is a type of archive). This file contains not
only the file data, but their metadata - file name, last accessed time,
permissions, ownership, etc. Archives are also often compressed. Common
archiving tools include `tar`, `pax`, `cpio`, and `star` with `tar`
being by far the most commonly used on Linux. Archives are probably the
most portable approach, in that it's often possible to restore them to
almost any kind of computer, but it can be slow and painful to do a
restore of only a few files out of a large archive.
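For instance, pulling one file back out of a big compressed archive
means reading through the whole thing (the path must match what the
listing shows; GNU tar usually strips the leading slash):

    # Find the file, then extract just that one entry
    tar -tzf backup.tar.gz | grep smb.conf
    tar -xzpf backup.tar.gz etc/samba/smb.conf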
A filesystem image is usually a byte-for-byte clone of the entire
filesystem data, including internal filesystem structures, free disk
space contents, etc. `dd` is often used to create these images. This is
NOT the same as programs like Norton Ghost and Partimage, which are
somewhere between raw filesystem images and filesystem dumps.
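A raw image with `dd` is about as simple as it gets; the filesystem
should be unmounted (or at least mounted read-only) while it runs:

    # Byte-for-byte image of a partition, and the restore in reverse
    dd if=/dev/hda1 of=/mnt/backup/hda1.img bs=1M
    dd if=/mnt/backup/hda1.img of=/dev/hda1 bs=1M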
Some filesystem imaging tools - like the aforementioned Partimage and
Ghost - understand the filesystem to some extent and can do things like
leave out free space. These are halfway between dumpers and imaging
tools, really.
Dumpers ... well, James knows lots more than I do about these, but I'll
give you a quick summary. In general, they're programs that understand a
particular filesystem and can copy all the important information from it
into a form that can later be restored, without copying the parts that
aren't useful, such as the contents of free space.
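To continue the earlier `dump` example, a selective restore is done
interactively:

    # Browse the dumped filesystem and mark individual files
    # for extraction
    restore -i -f /backup/home.0.dump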
Neither dumpers nor filesystem imagers will do you any good if you want
to back up a remote filesystem mounted over NFS, SMB, etc.
I think we've covered the upsides and downsides of these various
approaches reasonably well before, so I won't go into that now.
Nick Bannon wrote:
> On Tue, Jul 27, 2004 at 04:04:26PM +0800, Marc Wiriadisastra wrote:
>> granted it's not used constantly and I'm not worried about it, but
>> it's just a situation where I don't want to lose data, and the
>> just-in-case of Murphy's stupid laws is what concerns me.
>
> The only way you can have some peace of mind there is, every so often,
> to pretend that you've just lost your hard disc and to test a restore
> using only your backup copy.
Yep. Alas, just doing a restore isn't good enough. You really need to
then /use/ the restored machine for your live services*. Fail to do
that, and I guarantee you'll discover you've missed something critical
as soon as you really need it.
* I haven't yet followed my own advice here. A _full_ restore here will
take 12+ hours (though it's < 1 hour from bare metal to working
critical services), and I lack a spare machine with sufficient grunt to
handle the main server role. Machines with 2GB or more of RAM and 750GB
RAID arrays are not something I have just lying around. Both of these
issues mean that my own backups are not up to scratch - mostly due to
lack of available funds to do the proper testing. *sigh*.
--
Craig Ringer