<div dir="ltr">Edit: err gave up after 3 hours, not 30. Wasn't QUITE that bad.<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On 24 December 2013 13:40, Andrew Furey <span dir="ltr"><<a href="mailto:andrew.furey@gmail.com" target="_blank">andrew.furey@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div>Thanks Leon,<br><br></div>Yes, interesting read; I tested with ONLY the schema files, not the L0 files, and I get dedupe of 4 as I would have expected.<br>
<br>The block size gave me an idea. ZFS uses 128k by default whereas SDFS defaults to 4k; presumably this sort of data requires the lower size for that dedupe ratio.<br>
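<br>To illustrate the block-size point, here is a minimal sketch (not how ZFS or SDFS is implemented) that hashes fixed-size, aligned blocks and reports total blocks divided by unique blocks. The helper names dedupe_ratio and make_backups, the 1 MiB file size and the 64 scattered single-byte changes per copy are all assumptions made up for the example; real dump files may behave differently, and any inserted or removed bytes would shift the alignment and defeat fixed-block matching at either size.<br>

```python
import hashlib
import random


def dedupe_ratio(files, block_size):
    """Return total blocks / unique blocks for fixed-size, aligned blocks."""
    total = 0
    unique = set()
    for data in files:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            unique.add(hashlib.sha256(block).digest())
            total += 1
    return total / len(unique)


def make_backups(size, copies, changes_per_copy, seed=0):
    """Hypothetical stand-in for nightly dumps: each copy is the previous one
    with a few single bytes overwritten in place at random offsets."""
    rng = random.Random(seed)
    current = bytearray(rng.getrandbits(8) for _ in range(size))
    backups = [bytes(current)]
    for _ in range(copies - 1):
        for _ in range(changes_per_copy):
            pos = rng.randrange(size)
            current[pos] = (current[pos] + 1) % 256  # always changes the byte
        backups.append(bytes(current))
    return backups


# Four 1 MiB "backups", compared at the two block sizes mentioned above.
backups = make_backups(size=1024 * 1024, copies=4, changes_per_copy=64)
for block_size in (128 * 1024, 4 * 1024):
    ratio = dedupe_ratio(backups, block_size)
    print(f"{block_size // 1024}k blocks: dedupe ratio {ratio:.2f}x")
```

With scattered small changes, nearly every 128k block ends up containing at least one changed byte, while most 4k blocks stay identical between copies, so the smaller block size should report the higher ratio.<br>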
<br></div>I eventually managed to get a test 100Gb ZFS filesystem with 4k blocks ("zfs set recordsize=4k backup/admin", once I eventually worked that out). The copying process of the 80Gb is far far slower than either of the previous tests (I gave up after nearly 30 hours, it having only finished 30Gb; it was much slower once it hit that second 20Gb file, as I'd expect).<br>
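<br>One plausible reason for the slowdown is simply the number of blocks the dedup table has to track at 4k. A rough back-of-the-envelope sketch follows; the 320 bytes per in-core table entry is a commonly quoted rule of thumb rather than a measured value for this pool, the function name ddt_estimate is made up for the example, the 80Gb is treated as 80 GiB, and every block is assumed unique (the worst case).<br>

```python
def ddt_estimate(data_bytes, recordsize, bytes_per_entry=320):
    """Return (block_count, estimated_table_bytes), assuming every block is unique.

    bytes_per_entry is an assumed rule-of-thumb figure, not a measurement.
    """
    blocks = -(-data_bytes // recordsize)  # ceiling division
    return blocks, blocks * bytes_per_entry


GIB = 1024 ** 3
for recordsize in (128 * 1024, 4 * 1024):
    blocks, table = ddt_estimate(80 * GIB, recordsize)
    print(f"recordsize={recordsize // 1024}k: {blocks:,} blocks, "
          f"~{table / GIB:.2f} GiB of dedup table")
```

On those assumptions the 4k case has 32 times as many entries as the 128k case, which would go some way to explaining why the copy crawls once the table no longer fits comfortably in RAM.<br>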
<br></div>A "zpool list" of that then gives<div class="im"><br><br>NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT<br></div>backup 99.5G 31.7G 67.8G 31% 1.12x ONLINE -<br><br></div>
So it DID make some difference, but it's so much slower and impractical that it looks like I'll stick with my initially-tested sdfs. (At least there is a .deb for it, in GoogleCode, which works well.)<span class="HOEnZb"><font color="#888888"><br>
<br>Andrew<br><div><br></div></font></span></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><br><div class="gmail_quote">On 23 December 2013 17:39, Leon Wright <span dir="ltr"><<a href="mailto:techman83@gmail.com" target="_blank">techman83@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div>I've been using ZFS for a while and the deduplication pretty much "Just works" from what I can tell.<br>
<br>root@kitten:/home/leon# zfs list<div><br>NAME USED AVAIL REFER MOUNTPOINT<br></div>
zfs 506G 133G 30K /zfs<br>zfs/data 505G 133G 505G /data<br><br>root@kitten:/home/leon# zpool list<div><br>NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT<br></div>zfs 496G 353G 143G 71% 1.56x ONLINE -<br>
<br>Filesystem Size Used Avail Use% Mounted on<br>zfs/data 639G 506G 134G 80% /data<br><br></div>I'm using more than the disk size and have 134G free :-)<br><br></div>Though it may depend on the size of the files and the block sizes. This site had some interesting info:<br>
<br><a href="https://blogs.oracle.com/scottdickson/entry/sillyt_zfs_dedup_experiment" target="_blank">https://blogs.oracle.com/scottdickson/entry/sillyt_zfs_dedup_experiment</a><br><br></div>Leon<br></div><div class="gmail_extra">
<br clear="all">
<div>--<br>DRM 'manages access' in the same way that jail 'manages freedom.'<br><br># cat /dev/mem | strings | grep -i cats<br>Damn, my RAM is full of cats... MEOW!!</div><div><div>
<br><br><div class="gmail_quote">On Mon, Dec 23, 2013 at 5:06 PM, Andrew Furey <span dir="ltr"><<a href="mailto:andrew.furey@gmail.com" target="_blank">andrew.furey@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr"><div><div>Looks like it does it with hard-linking identical files and relying on most of them not changing (which is what I'm already doing successfully [with scripts by hand] for other aspects of the server backup).<br>
<br>Unfortunately these 25Gb database files are GUARANTEED to change one to another (even 5 minutes apart, they'd have internal log pointers etc that would have changed; they're Informix IDS L0 backup files). Given that a difference of even 1 byte means it needs a different copy of the file...<br>
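<br>For reference, the file-level approach described above can be sketched roughly as below: group files by a hash of their whole content and treat any group of two or more as hard-link candidates. This is only an illustration, not how dirvish or my own scripts actually work; the function names file_digest and hardlink_candidates are made up for the example. It also shows why a single changed byte is fatal at file level: the hash differs, so the file lands in a group of its own.<br>

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1024 * 1024):
    """SHA-256 of a file, read in chunks so large dumps need not fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hardlink_candidates(root):
    """Group regular files under root by content hash; keep groups of 2+."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                groups[file_digest(path)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


if __name__ == "__main__":
    import sys

    # Usage: python find_dupes.py /some/directory  (script name is hypothetical)
    for group in hardlink_candidates(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(" == ".join(group))
```

Identical schema files would show up as one group each, whereas the L0 files would never match each other this way.<br>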
<br></div>I'm relying on the fact that while SOME of the file will have changed, MUCH of it won't at block level. I just seem to be doing it wrong for ZFS when compared to the compression opendedup obtained (which I would have expected for the data in question).<br>
<br>Further, running "zdb -S backup" to simulate deduplication on the data returned all the same numbers, so it looks like it thinks it IS deduping. Might the two systems use different block sizes for comparison, or something?<span><font color="#888888"><br>
<br></font></span></div><span><font color="#888888">Andrew<br></font></span></div><div class="gmail_extra"><div><div><br><br><div class="gmail_quote">On 23 December 2013 16:25, William Kenworthy <span dir="ltr"><<a href="mailto:billk@iinet.net.au" target="_blank">billk@iinet.net.au</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Rather than dedupe after, is this something dirvish may be better at?<br>
<br>
<a href="http://www.dirvish.org/" target="_blank">http://www.dirvish.org/</a><br>
<br>
BillK<br>
<div><div><br>
<br>
<br>
<br>
<br>
On 23/12/13 15:59, Andrew Furey wrote:<br>
> Hi all,<br>
><br>
> I'm testing different deduplicating filesystems on Wheezy for storing<br>
> database backups (somewhat-compressed database dumps, average of about 25Gb<br>
> times 12 clients, ideally 30 days worth, so 9 terabytes raw). To test I<br>
> have a set of 4 days' worth from the same server, of 21Gb each day.<br>
><br>
> I first played with opendedup (aka sdfs) which is Java-based so loads up<br>
> the system a bit when reading and writing (not near as bad on physical as<br>
> on a VM, though). With that, the first file is the full 21Gb or near to,<br>
> while the subsequent ones are a bit smaller - one of them is down to 5.4Gb,<br>
> as reported by a simple du.<br>
><br>
> Next I'm trying ZFS, as something a bit more native would be preferred. I<br>
> have a 1.06Tb raw LVM logical volume, so I run<br>
><br>
> zpool create -O dedup=on backup /dev/VolGroup00/LogVol01<br>
><br>
> zpool list gives:<br>
><br>
> NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT<br>
> backup 1.05T 183K 1.05T 0% 1.00x ONLINE -<br>
><br>
> I then create a filesystem device under it (I've tried without it first,<br>
> made no difference to what's coming):<br>
><br>
> zfs create -o dedup=on backup/admin<br>
><br>
> Now zfs list gives:<br>
><br>
> NAME USED AVAIL REFER MOUNTPOINT<br>
> backup 104K 1.04T 21K /backup<br>
> backup/admin 21K 1.04T 21K /backup/admin<br>
><br>
> Looks OK so far.<br>
><br>
> Trouble is, when I copy my 80Gb-odd set to it with plain rsync (same as<br>
> before), I only get a dedupe ratio of 1.01x (i.e. nothing at all):<br>
><br>
> NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT<br>
> backup 1.05T 78.5G 1001G 7% 1.01x ONLINE -<br>
><br>
> I also found "zdb backup | grep plain", which indicates that there is no<br>
> deduping being done on any files on the disk, including the schema files<br>
> also included (column 7 should be something less than 100):<br>
><br>
> 107 2 16K 128K 2.75M 2.75M 100.00 ZFS plain file<br>
> 108 2 16K 128K 2.13M 2.12M 100.00 ZFS plain file<br>
> 109 1 16K 8K 8K 8K 100.00 ZFS plain file<br>
> 110 1 16K 9.5K 9.5K 9.5K 100.00 ZFS plain file<br>
> 111 1 16K 9.5K 9.5K 9.5K 100.00 ZFS plain file<br>
> 112 1 16K 12.0K 12.0K 12.0K 100.00 ZFS plain file<br>
> 113 1 16K 9.5K 9.5K 9.5K 100.00 ZFS plain file<br>
> 114 4 16K 128K 19.9G 19.9G 100.00 ZFS plain file<br>
> 115 1 16K 512 512 512 100.00 ZFS plain file<br>
> 116 1 16K 8K 8K 8K 100.00 ZFS plain file<br>
> 117 1 16K 9.5K 9.5K 9.5K 100.00 ZFS plain file<br>
> 118 1 16K 9.5K 9.5K 9.5K 100.00 ZFS plain file<br>
> 119 1 16K 14.5K 14.5K 14.5K 100.00 ZFS plain file<br>
> 120 1 16K 14.5K 14.5K 14.5K 100.00 ZFS plain file<br>
> 121 1 16K 3.50K 3.50K 3.50K 100.00 ZFS plain file<br>
><br>
> 95% of those schema files are in fact identical, so filesystem hard links<br>
> would dedupe them perfectly...<br>
><br>
><br>
> I must be missing something, surely? Or should I just go ahead with<br>
> opendedup and be done with? Any others I should know about (btrfs didn't<br>
> sound terribly stable from what I've been reading)?<br>
><br>
> TIA and Merry Christmas,<br>
> Andrew<br>
><br>
><br>
><br>
</div></div>> _______________________________________________<br>
> PLUG discussion list: <a href="mailto:plug@plug.org.au" target="_blank">plug@plug.org.au</a><br>
> <a href="http://lists.plug.org.au/mailman/listinfo/plug" target="_blank">http://lists.plug.org.au/mailman/listinfo/plug</a><br>
> Committee e-mail: <a href="mailto:committee@plug.org.au" target="_blank">committee@plug.org.au</a><br>
> PLUG Membership: <a href="http://www.plug.org.au/membership" target="_blank">http://www.plug.org.au/membership</a><br>
><br>
<br>
</blockquote></div><br><br clear="all"><br></div></div><div>-- <br>Linux supports the notion of a command line or a shell for the same<br>reason that only children read books with only pictures in them.<br>Language, be it English or something else, is the only tool flexible<br>
enough to accomplish a sufficiently broad range of tasks.<br> -- Bill Garrett
</div></div>
</blockquote></div><br></div></div></div>
</blockquote></div>
</div>
</div></div></blockquote></div>
</div>