<div dir="ltr"><div><div><div>I've been using ZFS for a while and the deduplication pretty much "Just works" from what I can tell.<br><br>root@kitten:/home/leon# zfs list<br>NAME       USED  AVAIL  REFER  MOUNTPOINT<br>

zfs        506G   133G    30K  /zfs<br>zfs/data   505G   133G   505G  /data<br><br>root@kitten:/home/leon# zpool list<br>NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT<br>zfs    496G   353G   143G    71%  1.56x  ONLINE  -<br>

<br>Filesystem            Size Used Avail Use% Mounted on<br>zfs/data              639G  506G  134G  80% /data<br><br></div>I'm using more than the disk size and have 134G free :-)<br><br></div>Though it may depend on the size of the files and the block sizes. This site had some interesting info:<br>

<br><a href="https://blogs.oracle.com/scottdickson/entry/sillyt_zfs_dedup_experiment">https://blogs.oracle.com/scottdickson/entry/sillyt_zfs_dedup_experiment</a><br><br></div>Leon<br></div><div class="gmail_extra"><br clear="all">

<div>--<br>DRM 'manages access' in the same way that jail 'manages freedom.'<br><br># cat /dev/mem | strings | grep -i cats<br>Damn, my RAM is full of cats... MEOW!!</div>

<br><br><div class="gmail_quote">On Mon, Dec 23, 2013 at 5:06 PM, Andrew Furey <span dir="ltr"><<a href="mailto:andrew.furey@gmail.com" target="_blank">andrew.furey@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div dir="ltr"><div><div>Looks like it does it with hard-linking identical files and relying on most of them not changing (which is what I'm already doing successfully [with scripts by hand] for other aspects of the server backup).<br>


<br>Unfortunately these 25 GB database files are GUARANTEED to change from one to the next (even 5 minutes apart, they'd have internal log pointers etc. that would have changed; they're Informix IDS L0 backup files). Given that a difference of even 1 byte means it needs a different copy of the file...<br>


<br></div>I'm relying on the fact that while SOME of the file will have changed, MUCH of it won't at block level. I just seem to be doing it wrong for ZFS when compared to the compression opendedup obtained (which I would have expected for the data in question).<br>
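<br>A rough way to see why mostly unchanged data can still fail to deduplicate at block level: ZFS compares whole records cut at fixed offsets (128K by default), so bytes inserted or removed early in a file shift every later record. The Python sketch below only illustrates that idea; it is not how ZFS or opendedup are implemented, and the 4 KiB block size, the random stand-in "dump" data and the helper name dedup_ratio are all assumptions made for the example.<br>

```python
import hashlib
import random


def dedup_ratio(files, block_size):
    """Logical blocks divided by unique blocks, cutting each file at fixed offsets."""
    total = 0
    unique = set()
    for data in files:
        for off in range(0, len(data), block_size):
            block = data[off:off + block_size]
            unique.add(hashlib.sha256(block).digest())
            total += 1
    return total / len(unique)


BLOCK = 4096  # hypothetical block size, chosen small to keep the demo quick
rng = random.Random(0)

# Stand-in for "day 1" of a backup: 64 blocks of random bytes.
day1 = bytes(rng.getrandbits(8) for _ in range(64 * BLOCK))

# "Day 2", case A: 16 bytes overwritten in place, file length unchanged.
day2_inplace = day1[:1000] + b"X" * 16 + day1[1016:]

# "Day 2", case B: 16 bytes inserted, so everything after them is shifted.
day2_shifted = day1[:1000] + b"X" * 16 + day1[1000:]

ratio_inplace = dedup_ratio([day1, day2_inplace], BLOCK)
ratio_shifted = dedup_ratio([day1, day2_shifted], BLOCK)

print(f"in-place change: {ratio_inplace:.2f}x")
print(f"shifted data:    {ratio_shifted:.2f}x")
```

In this toy setup the in-place copy shares all but one block with the original, while the shifted copy shares none, even though nearly all of its bytes are the same. If the dumps are compressed or their contents move between runs, a similar effect might explain a ratio near 1.00x, but that is a guess about this data rather than something the thread confirms.<br>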


<br>Further, running "zdb -S backup" to simulate the deduplication with the data returned all the same numbers, so it looks like it thinks it IS deduping. Might the two systems use differing block sizes for comparison, or something?<span class="HOEnZb"><font color="#888888"><br>


<br></font></span></div><span class="HOEnZb"><font color="#888888">Andrew<br></font></span></div><div class="gmail_extra"><div><div class="h5"><br><br><div class="gmail_quote">On 23 December 2013 16:25, William Kenworthy <span dir="ltr"><<a href="mailto:billk@iinet.net.au" target="_blank">billk@iinet.net.au</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Rather than dedupe after, is this something dirvish may be better at?<br>

<br>

<a href="http://www.dirvish.org/" target="_blank">http://www.dirvish.org/</a><br>

<br>

BillK<br>

<div><div><br>


On 23/12/13 15:59, Andrew Furey wrote:<br>

> Hi all,<br>

><br>

> I'm testing different deduplicating filesystems on Wheezy for storing<br>

> database backups (somewhat-compressed database dumps, average of about 25 GB<br>

> times 12 clients, ideally 30 days worth, so 9 terabytes raw). To test I<br>

> have a set of 4 days' worth from the same server, of 21 GB each day.<br>

><br>

> I first played with opendedup (aka sdfs) which is Java-based so loads up<br>

> the system a bit when reading and writing (not near as bad on physical as<br>

> on a VM, though). With that, the first file is the full 21 GB or near to,<br>

> while the subsequent ones are a bit smaller - one of them is down to 5.4 GB,<br>

> as reported by a simple du.<br>

><br>

> Next I'm trying ZFS, as something a bit more native would be preferred. I<br>

> have a 1.06 TB raw LVM logical volume, so I run<br>

><br>

> zpool create -O dedup=on backup /dev/VolGroup00/LogVol01<br>

><br>

> zpool list gives:<br>

><br>

> NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT<br>

> backup  1.05T   183K  1.05T     0%  1.00x  ONLINE  -<br>

><br>

> I then create a filesystem device under it (I've tried without it first,<br>

> made no difference to what's coming):<br>

><br>

> zfs create -o dedup=on backup/admin<br>

><br>

> Now zfs list gives:<br>

><br>

> NAME           USED  AVAIL  REFER  MOUNTPOINT<br>

> backup         104K  1.04T    21K  /backup<br>

> backup/admin    21K  1.04T    21K  /backup/admin<br>

><br>

> Looks OK so far.<br>

><br>

> Trouble is, when I copy my 80-odd GB set to it with plain rsync (same as<br>

> before), I only get a dedupe ratio of 1.01x (ie nothing at all):<br>

><br>

> NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT<br>

> backup  1.05T  78.5G  1001G     7%  1.01x  ONLINE  -<br>

><br>

> I also found "zdb backup | grep plain", which indicates that there is no<br>

> deduping being done on any files on the disk, including the schema files<br>

> also included (column 7 should be something less than 100):<br>

><br>

>        107    2    16K   128K  2.75M  2.75M  100.00  ZFS plain file<br>

>        108    2    16K   128K  2.13M  2.12M  100.00  ZFS plain file<br>

>        109    1    16K     8K     8K     8K  100.00  ZFS plain file<br>

>        110    1    16K   9.5K   9.5K   9.5K  100.00  ZFS plain file<br>

>        111    1    16K   9.5K   9.5K   9.5K  100.00  ZFS plain file<br>

>        112    1    16K  12.0K  12.0K  12.0K  100.00  ZFS plain file<br>

>        113    1    16K   9.5K   9.5K   9.5K  100.00  ZFS plain file<br>

>        114    4    16K   128K  19.9G  19.9G  100.00  ZFS plain file<br>

>        115    1    16K    512    512    512  100.00  ZFS plain file<br>

>        116    1    16K     8K     8K     8K  100.00  ZFS plain file<br>

>        117    1    16K   9.5K   9.5K   9.5K  100.00  ZFS plain file<br>

>        118    1    16K   9.5K   9.5K   9.5K  100.00  ZFS plain file<br>

>        119    1    16K  14.5K  14.5K  14.5K  100.00  ZFS plain file<br>

>        120    1    16K  14.5K  14.5K  14.5K  100.00  ZFS plain file<br>

>        121    1    16K  3.50K  3.50K  3.50K  100.00  ZFS plain file<br>

><br>

> 95% of those schema files are in fact identical, so filesystem hard links<br>

> would dedupe them perfectly...<br>

><br>

><br>

> I must be missing something, surely? Or should I just go ahead with<br>

> opendedup and be done with? Any others I should know about (btrfs didn't<br>

> sound terribly stable from what I've been reading)?<br>

><br>

> TIA and Merry Christmas,<br>

> Andrew<br>

><br>

><br>

><br>

</div></div>> _______________________________________________<br>

> PLUG discussion list: <a href="mailto:plug@plug.org.au" target="_blank">plug@plug.org.au</a><br>

> <a href="http://lists.plug.org.au/mailman/listinfo/plug" target="_blank">http://lists.plug.org.au/mailman/listinfo/plug</a><br>

> Committee e-mail: <a href="mailto:committee@plug.org.au" target="_blank">committee@plug.org.au</a><br>

> PLUG Membership: <a href="http://www.plug.org.au/membership" target="_blank">http://www.plug.org.au/membership</a><br>

><br>

<br>


</blockquote></div><br><br clear="all"><br></div></div><div class="im">-- <br>Linux supports the notion of a command line or a shell for the same<br>reason that only children read books with only pictures in them.<br>Language, be it English or something else, is the only tool flexible<br>


enough to accomplish a sufficiently broad range of tasks.<br>                          -- Bill Garrett

</div></div>

</blockquote></div><br></div>