Hi all,

I'm testing different deduplicating filesystems on Wheezy for storing database backups (somewhat-compressed database dumps, averaging about 25GB from each of 12 clients, ideally 30 days' worth, so roughly 9 terabytes raw). To test, I have a set of 4 days' worth from the same server, at 21GB each day.

I first played with opendedup (aka sdfs), which is Java-based and so loads up the system a bit when reading and writing (not nearly as bad on physical hardware as in a VM, though). With that, the first file stays at or near the full 21GB, while the subsequent ones come out smaller - one of them is down to 5.4GB, as reported by a simple du.
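
For reference, I'm just comparing sizes on the sdfs mount with du - the mountpoint below is only an example, adjust for wherever the volume is mounted; the plain du is where the 5.4GB figure comes from:

  du -sh /media/sdfs/dumps/*                  # blocks actually used
  du -sh --apparent-size /media/sdfs/dumps/*  # logical file sizes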

Next I'm trying ZFS, as something a bit more native would be preferred. I have a 1.06TB raw LVM logical volume, so I run:

zpool create -O dedup=on backup /dev/VolGroup00/LogVol01

zpool list gives:

NAME     SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
backup  1.05T   183K  1.05T   0%  1.00x  ONLINE  -

I then create a filesystem (dataset) under it (I've tried without one first; it made no difference to what's coming):

zfs create -o dedup=on backup/admin

Now zfs list gives:

NAME           USED  AVAIL  REFER  MOUNTPOINT
backup         104K  1.04T    21K  /backup
backup/admin    21K  1.04T    21K  /backup/admin
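
As a sanity check at this point, the relevant properties can be listed for both datasets - dedup should show as "on" for the pool's root dataset (from the -O at creation) and for backup/admin (from the -o):

  zfs get dedup,checksum,recordsize backup backup/admin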

Looks OK so far.

Trouble is, when I copy my 80GB-odd set to it with plain rsync (same as before), I only get a dedup ratio of 1.01x (i.e. nothing at all):

NAME     SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
backup  1.05T  78.5G  1001G   7%  1.01x  ONLINE  -
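
For completeness, the same figure is available as a pool property, and zdb can dump or simulate the dedup table if those numbers would help anyone - happy to post the output:

  zpool get dedupratio backup
  zdb -DD backup   # histogram of the on-disk dedup table (DDT)
  zdb -S backup    # simulate dedup over the existing data (slow, walks the whole pool)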

I also found "zdb backup | grep plain", which indicates that no deduping is being done on any of the files on disk, including the schema files that are also part of the set (column 7 should be something less than 100):
   107    2    16K    128K   2.75M   2.75M  100.00  ZFS plain file
   108    2    16K    128K   2.13M   2.12M  100.00  ZFS plain file
   109    1    16K      8K      8K      8K  100.00  ZFS plain file
   110    1    16K    9.5K    9.5K    9.5K  100.00  ZFS plain file
   111    1    16K    9.5K    9.5K    9.5K  100.00  ZFS plain file
   112    1    16K   12.0K   12.0K   12.0K  100.00  ZFS plain file
   113    1    16K    9.5K    9.5K    9.5K  100.00  ZFS plain file
   114    4    16K    128K   19.9G   19.9G  100.00  ZFS plain file
   115    1    16K     512     512     512  100.00  ZFS plain file
   116    1    16K      8K      8K      8K  100.00  ZFS plain file
   117    1    16K    9.5K    9.5K    9.5K  100.00  ZFS plain file
   118    1    16K    9.5K    9.5K    9.5K  100.00  ZFS plain file
   119    1    16K   14.5K   14.5K   14.5K  100.00  ZFS plain file
   120    1    16K   14.5K   14.5K   14.5K  100.00  ZFS plain file
   121    1    16K   3.50K   3.50K   3.50K  100.00  ZFS plain file
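
For the really gory detail, zdb can also dump the block pointers of an individual object - e.g. object 114 above, the 19.9G dump file - though I haven't waded through that output yet:

  zdb -ddddd backup/admin 114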

95% of those schema files are in fact identical, so filesystem hard links would dedupe them perfectly...
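
A rough sketch of what I mean, with made-up paths - rsync'ing each day's dumps against the previous day's copy turns every unchanged file into a hard link rather than a fresh copy:

  rsync -a --link-dest=/backup/admin/day3 \
      client:/var/backups/dumps/ /backup/admin/day4/

That would sort out the identical schema files, though obviously not the big dumps, which aren't byte-identical from day to day.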

I must be missing something, surely? Or should I just go ahead with opendedup and be done with it? Are there any others I should know about (btrfs didn't sound terribly stable from what I've been reading)?

TIA and Merry Christmas,
Andrew

-- 
Linux supports the notion of a command line or a shell for the same
reason that only children read books with only pictures in them.
Language, be it English or something else, is the only tool flexible
enough to accomplish a sufficiently broad range of tasks.
  -- Bill Garrett