[plug] ZFS and deduplication?
Andrew Furey
andrew.furey at gmail.com
Mon Dec 23 09:06:40 UTC 2013
Looks like dirvish does it by hard-linking identical files and relying on most
of them not changing (which is what I'm already doing successfully [with
hand-rolled scripts] for other aspects of the server backup).
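
(For context, what I'm doing by hand is roughly the rsync --link-dest
pattern sketched below, with invented paths; as far as I can tell dirvish
automates the same idea.)

# Rough sketch of the hand-rolled hard-link rotation (paths are made up):
PREV=/backup/clients/admin/2013-12-22
DEST=/backup/clients/admin/2013-12-23
mkdir -p "$DEST"
# Files identical to yesterday's become hard links into $PREV; anything
# that changed at all (like the 25GB L0 dumps) is stored in full again.
rsync -a --link-dest="$PREV" admin:/var/backups/ "$DEST/"
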
Unfortunately these 25GB database files are GUARANTEED to change from one to
the next (even 5 minutes apart they'd have internal log pointers etc. that
would have changed; they're Informix IDS L0 backup files). Given that a
difference of even 1 byte means it needs a different copy of the file...

I'm relying on the fact that while SOME of each file will have changed, MUCH
of it won't have at the block level. I just seem to be doing something wrong
with ZFS, compared to the deduplication opendedup obtained (which is what I
would have expected for the data in question).
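
(To put a number on that, here's a rough way to measure the block-level
overlap between two consecutive L0 files -- the filenames below are just
examples -- by hashing fixed 128 KiB blocks and counting how many turn up
in both days:)

# Print one MD5 per 128 KiB block of a file
hash_blocks() {
    perl -MDigest::MD5=md5_hex -ne 'BEGIN { $/ = \131072 } print md5_hex($_), "\n"' "$1"
}
hash_blocks admin_L0.day1 | sort > day1.sums
hash_blocks admin_L0.day2 | sort > day2.sums
# Blocks the two days have in common; multiply by 128K for the bytes that
# could dedupe at that block size and alignment
comm -12 day1.sums day2.sums | wc -l
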
Further, running "zdb -S backup" to simulate the deduplication on this data
returned all the same numbers, so it looks like it thinks it IS deduping.
Might the two systems be using different block sizes for the comparison, or
something?
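
(One thing I still mean to try, going by the zfs and zdb man pages -- so
treat this as untested: ZFS dedups whole records against identical
checksums, and the record size is visible and tunable per dataset:)

# Show the record size dedup is matching on (128K by default)
zfs get recordsize backup/admin
# Try a smaller record size; it only applies to data written after the
# change, so the test set has to be rsynced in again before re-checking
zfs set recordsize=8K backup/admin
zdb -S backup      # re-simulate dedup over the re-copied data
zdb -DD backup     # histogram of the real dedup table

(Presumably a smaller recordsize also means a much larger dedup table in
RAM, so it may just trade one problem for another.)
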
Andrew
On 23 December 2013 16:25, William Kenworthy <billk at iinet.net.au> wrote:
> Rather than dedupe after, is this something dirvish may be better at?
>
> http://www.dirvish.org/
>
> BillK
>
>
>
>
>
> On 23/12/13 15:59, Andrew Furey wrote:
> > Hi all,
> >
> > I'm testing different deduplicating filesystems on Wheezy for storing
> > database backups (somewhat-compressed database dumps, averaging about 25GB
> > times 12 clients, ideally 30 days' worth, so 9 terabytes raw). To test I
> > have a set of 4 days' worth from the same server, at 21GB each day.
> >
> > I first played with opendedup (aka sdfs), which is Java-based and so loads
> > up the system a bit when reading and writing (not nearly as bad on physical
> > hardware as on a VM, though). With that, the first file is the full 21GB or
> > close to it, while the subsequent ones are a bit smaller - one of them is
> > down to 5.4GB, as reported by a simple du.
> >
> > Next I'm trying ZFS, as something a bit more native would be preferred. I
> > have a 1.06TB raw LVM logical volume, so I run:
> >
> > zpool create -O dedup=on backup /dev/VolGroup00/LogVol01
> >
> > zpool list gives:
> >
> > NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
> > backup 1.05T 183K 1.05T 0% 1.00x ONLINE -
> >
> > I then create a filesystem (dataset) under it (I've also tried without this
> > step; it made no difference to what follows):
> >
> > zfs create -o dedup=on backup/admin
> >
> > Now zfs list gives:
> >
> > NAME USED AVAIL REFER MOUNTPOINT
> > backup 104K 1.04T 21K /backup
> > backup/admin 21K 1.04T 21K /backup/admin
> >
> > Looks OK so far.
> >
> > Trouble is, when I copy my 80GB-odd set to it with plain rsync (same as
> > before), I only get a dedup ratio of 1.01x (i.e. nothing at all):
> >
> > NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
> > backup 1.05T 78.5G 1001G 7% 1.01x ONLINE -
> >
> > I also found "zdb backup | grep plain", which indicates that no deduping is
> > being done on any of the files on the disk, including the schema files
> > (column 7 should be something less than 100):
> >
> > 107 2 16K 128K 2.75M 2.75M 100.00 ZFS plain file
> > 108 2 16K 128K 2.13M 2.12M 100.00 ZFS plain file
> > 109 1 16K 8K 8K 8K 100.00 ZFS plain file
> > 110 1 16K 9.5K 9.5K 9.5K 100.00 ZFS plain file
> > 111 1 16K 9.5K 9.5K 9.5K 100.00 ZFS plain file
> > 112 1 16K 12.0K 12.0K 12.0K 100.00 ZFS plain file
> > 113 1 16K 9.5K 9.5K 9.5K 100.00 ZFS plain file
> > 114 4 16K 128K 19.9G 19.9G 100.00 ZFS plain file
> > 115 1 16K 512 512 512 100.00 ZFS plain file
> > 116 1 16K 8K 8K 8K 100.00 ZFS plain file
> > 117 1 16K 9.5K 9.5K 9.5K 100.00 ZFS plain file
> > 118 1 16K 9.5K 9.5K 9.5K 100.00 ZFS plain file
> > 119 1 16K 14.5K 14.5K 14.5K 100.00 ZFS plain file
> > 120 1 16K 14.5K 14.5K 14.5K 100.00 ZFS plain file
> > 121 1 16K 3.50K 3.50K 3.50K 100.00 ZFS plain file
> >
> > 95% of those schema files are in fact identical, so filesystem hard links
> > would dedupe them perfectly...
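> >
> > (A quick way to confirm that -- the path and filename pattern below are
> > hypothetical -- is to group the schema dumps by content hash; any hash
> > appearing more than once is a set of byte-identical files that hard
> > links would collapse:)
> >
> > # Count duplicate schema files by MD5 (first 32 chars of each line)
> > md5sum /backup/admin/*/schema*.sql | sort | uniq -w32 -cd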
> >
> >
> > I must be missing something, surely? Or should I just go ahead with
> > opendedup and be done with it? Are there any others I should know about
> > (btrfs didn't sound terribly stable from what I've been reading)?
> >
> > TIA and Merry Christmas,
> > Andrew
> >
--
Linux supports the notion of a command line or a shell for the same
reason that only children read books with only pictures in them.
Language, be it English or something else, is the only tool flexible
enough to accomplish a sufficiently broad range of tasks.
-- Bill Garrett