[plug] ZFS and deduplication?
Andrew Furey
andrew.furey at gmail.com
Mon Dec 23 07:59:22 UTC 2013
Hi all,
I'm testing different deduplicating filesystems on Wheezy for storing
database backups (somewhat-compressed database dumps, averaging about 25GB
times 12 clients, ideally 30 days' worth, so 9 terabytes raw). To test, I
have a set of 4 days' worth from the same server, about 21GB each day.
I first played with opendedup (aka SDFS), which is Java-based and so loads
up the system a bit when reading and writing (not nearly as bad on physical
hardware as on a VM, though). With that, the first file takes the full 21GB
or close to it, while the subsequent ones are a bit smaller - one of them is
down to 5.4GB, as reported by a simple du.
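In case it matters, that du figure is the allocated size; GNU du can report
the logical size as well for comparison (the path here is just illustrative):

du -h /mnt/sdfs/day2.dump                    # allocated (post-dedup) size
du -h --apparent-size /mnt/sdfs/day2.dump    # logical size as written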
Next I'm trying ZFS, as something a bit more native would be preferable. I
have a 1.06TB raw LVM logical volume, so I run:
zpool create -O dedup=on backup /dev/VolGroup00/LogVol01
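(For what it's worth, the -O there sets the property on the pool's root
dataset; it can be double-checked with

zfs get dedup backup

which should report the value as "on" with source "local".)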
zpool list gives:
NAME      SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
backup   1.05T   183K  1.05T   0%  1.00x  ONLINE  -
I then create a filesystem (dataset) under it (I tried without it first; it
made no difference to what follows):
zfs create -o dedup=on backup/admin
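As I understand it, ZFS dedups per block, so the dataset's recordsize (128K
by default) sets the dedup granularity; it can be checked with:

zfs get recordsize backup/admin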
Now zfs list gives:
NAME           USED  AVAIL  REFER  MOUNTPOINT
backup         104K  1.04T    21K  /backup
backup/admin    21K  1.04T    21K  /backup/admin
Looks OK so far.
Trouble is, when I copy my 80GB-odd set to it with plain rsync (same as
before), I only get a dedup ratio of 1.01x (i.e. nothing at all):
NAME      SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
backup   1.05T  78.5G  1001G   7%  1.01x  ONLINE  -
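Out of interest, the zdb man page says -S will simulate dedup across the
pool's existing data and print a DDT histogram with an estimated ratio,
without dedup needing to be enabled (it reads everything, so it's slow and
memory-hungry):

zdb -S backup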
I also found "zdb backup | grep plain", which indicates that no deduping is
being done on any files on the disk, including the schema files that are
also in the set (column 7 should be something less than 100, as I
understand it):
   107    2    16K   128K   2.75M   2.75M  100.00  ZFS plain file
   108    2    16K   128K   2.13M   2.12M  100.00  ZFS plain file
   109    1    16K     8K      8K      8K  100.00  ZFS plain file
   110    1    16K   9.5K    9.5K    9.5K  100.00  ZFS plain file
   111    1    16K   9.5K    9.5K    9.5K  100.00  ZFS plain file
   112    1    16K  12.0K   12.0K   12.0K  100.00  ZFS plain file
   113    1    16K   9.5K    9.5K    9.5K  100.00  ZFS plain file
   114    4    16K   128K   19.9G   19.9G  100.00  ZFS plain file
   115    1    16K    512     512     512  100.00  ZFS plain file
   116    1    16K     8K      8K      8K  100.00  ZFS plain file
   117    1    16K   9.5K    9.5K    9.5K  100.00  ZFS plain file
   118    1    16K   9.5K    9.5K    9.5K  100.00  ZFS plain file
   119    1    16K  14.5K   14.5K   14.5K  100.00  ZFS plain file
   120    1    16K  14.5K   14.5K   14.5K  100.00  ZFS plain file
   121    1    16K  3.50K   3.50K   3.50K  100.00  ZFS plain file
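(The dedup table itself can apparently be dumped as well; if I'm reading the
man page right,

zdb -DD backup

should print the DDT statistics and a histogram.)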
95% of those schema files are in fact identical, so filesystem hard links
would dedupe them perfectly...
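As a crude point of comparison (file names made up), hard-linking the
byte-identical copies against a reference would be something like:

ref=/backup/admin/day1/schema.sql          # reference copy
for f in /backup/admin/day*/schema.sql; do
    [ "$f" = "$ref" ] && continue          # skip the reference itself
    # replace byte-identical copies with a hard link
    cmp -s "$ref" "$f" && ln -f "$ref" "$f"
done

That only helps with whole identical files, though, not the big dumps that
differ day to day.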
I must be missing something, surely? Or should I just go ahead with
opendedup and be done with it? Are there any others I should know about
(btrfs didn't sound terribly stable, from what I've been reading)?
TIA and Merry Christmas,
Andrew
--
Linux supports the notion of a command line or a shell for the same
reason that only children read books with only pictures in them.
Language, be it English or something else, is the only tool flexible
enough to accomplish a sufficiently broad range of tasks.
-- Bill Garrett