[plug] ZFS and deduplicaton?

Mon Dec 23 09:39:51 UTC 2013

I've been using ZFS for a while and the deduplication pretty much "Just
works" from what I can tell.

root at kitten:/home/leon# zfs list
NAME       USED  AVAIL  REFER  MOUNTPOINT
zfs        506G   133G    30K  /zfs
zfs/data   505G   133G   505G  /data

root at kitten:/home/leon# zpool list
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
zfs    496G   353G   143G    71%  1.56x  ONLINE  -

Filesystem            Size Used Avail Use% Mounted on
zfs/data              639G  506G  134G  80% /data

I'm using more than the disk size and have 134G free :-)

Though It may depend on the size of the files and the block sizes. This
site had some interesting info:

https://blogs.oracle.com/scottdickson/entry/sillyt_zfs_dedup_experiment

Leon

--
DRM 'manages access' in the same way that jail 'manages freedom.'

# cat /dev/mem | strings | grep -i cats
Damn, my RAM is full of cats... MEOW!!

On Mon, Dec 23, 2013 at 5:06 PM, Andrew Furey <andrew.furey at gmail.com>wrote:

> Looks like it does it with hard-linking identical files and relying on
> most of them not changing (which is what I'm already doing successfully
> [with scripts by hand] for other aspects of the server backup).
>
> Unfortunately these 25Gb database files are GUARANTEED to change one to
> another (even 5 minutes apart, they'd have internal log pointers etc that
> would have changed; they're Informix IDS L0 backup files). Given that a
> difference of even 1 byte means it needs a different copy of the file...
>
> I'm relying on the fact that while SOME of the file will have changed,
> MUCH of it won't at block level. I just seem to be doing it wrong for ZFS
> when compared to the compression opendedup obtained (which I would have
> expected for the data in question).
>
> Further; running "zdb -S backup" to simulate the deduplication with the
> data, returned all the same numbers; so it looks like it thinks it IS
> deduping. Might the two systems use differing block sizes for comparison,
> or something?
>
> Andrew
>
>
> On 23 December 2013 16:25, William Kenworthy <billk at iinet.net.au> wrote:
>
>> Rather than dedupe after, is this something dirvish may be better at?
>>
>> http://www.dirvish.org/
>>
>> BillK
>>
>>
>>
>>
>>
>> On 23/12/13 15:59, Andrew Furey wrote:
>> > Hi all,
>> >
>> > I'm testing different deduplicating filesystems on Wheezy for storing
>> > database backups (somewhat-compressed database dumps, average of about
>> 25Gb
>> > times 12 clients, ideally 30 days worth, so 9 terabytes raw). To test I
>> > have a set of 4 days' worth from the same server, of 21Gb each day.
>> >
>> > I first played with opendedup (aka sdfs) which is Java-based so loads up
>> > the system a bit when reading and writing (not near as bad on physical
>> as
>> > on a VM, though). With that, the first file is the full 21Gb or near to,
>> > while the subsequent ones are a bit smaller - one of them is down to
>> 5.4Gb,
>> > as reported by a simple du.
>> >
>> > Next I'm trying ZFS, as something a bit more native would be preferred.
>> I
>> > have a 1.06Tb raw LVM logical volume, so I run
>> >
>> > zpool create -O dedup=on backup /dev/VolGroup00/LogVol01
>> >
>> > zpool list gives:
>> >
>> > NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
>> > backup  1.05T   183K  1.05T     0%  1.00x  ONLINE  -
>> >
>> > I then create a filesystem device under it (I've tried without it first,
>> > made no difference to what's coming):
>> >
>> > zfs create -o dedup=on backup/admin
>> >
>> > Now zfs list gives:
>> >
>> > NAME           USED  AVAIL  REFER  MOUNTPOINT
>> > backup         104K  1.04T    21K  /backup
>> > backup/admin    21K  1.04T    21K  /backup/admin
>> >
>> > Looks OK so far.
>> >
>> > Trouble is, when I copy my 80Gb-odd set to it with plain rsync (same as
>> > before), I only get a dedupe ratio of 1.01x (ie nothing at all):
>> >
>> > NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
>> > backup  1.05T  78.5G  1001G     7%  1.01x  ONLINE  -
>> >
>> > I also found "zdb backup | grep plain", which indicates that there is no
>> > deduping being done on any files on the disk, including the schema files
>> > also included (column 7 should be something less than 100):
>> >
>> >        107    2    16K   128K  2.75M  2.75M  100.00  ZFS plain file
>> >        108    2    16K   128K  2.13M  2.12M  100.00  ZFS plain file
>> >        109    1    16K     8K     8K     8K  100.00  ZFS plain file
>> >        110    1    16K   9.5K   9.5K   9.5K  100.00  ZFS plain file
>> >        111    1    16K   9.5K   9.5K   9.5K  100.00  ZFS plain file
>> >        112    1    16K  12.0K  12.0K  12.0K  100.00  ZFS plain file
>> >        113    1    16K   9.5K   9.5K   9.5K  100.00  ZFS plain file
>> >        114    4    16K   128K  19.9G  19.9G  100.00  ZFS plain file
>> >        115    1    16K    512    512    512  100.00  ZFS plain file
>> >        116    1    16K     8K     8K     8K  100.00  ZFS plain file
>> >        117    1    16K   9.5K   9.5K   9.5K  100.00  ZFS plain file
>> >        118    1    16K   9.5K   9.5K   9.5K  100.00  ZFS plain file
>> >        119    1    16K  14.5K  14.5K  14.5K  100.00  ZFS plain file
>> >        120    1    16K  14.5K  14.5K  14.5K  100.00  ZFS plain file
>> >        121    1    16K  3.50K  3.50K  3.50K  100.00  ZFS plain file
>> >
>> > 95% of those schema files are in fact identical, so filesystem hard
>> links
>> > would dedupe them perfectly...
>> >
>> >
>> > I must be missing something, surely? Or should I just go ahead with
>> > opendedup and be done with? Any others I should know about (btrfs didn't
>> > sound terribly stable from what I've been reading)?
>> >
>> > TIA and Merry Christmas,
>> > Andrew
>> >
>> >
>> >
>> > _______________________________________________
>> > PLUG discussion list: plug at plug.org.au
>> > http://lists.plug.org.au/mailman/listinfo/plug
>> > Committee e-mail: committee at plug.org.au
>> > PLUG Membership: http://www.plug.org.au/membership
>> >
>>
>> _______________________________________________
>> PLUG discussion list: plug at plug.org.au
>> http://lists.plug.org.au/mailman/listinfo/plug
>> Committee e-mail: committee at plug.org.au
>> PLUG Membership: http://www.plug.org.au/membership
>>
>
>
>
> --
> Linux supports the notion of a command line or a shell for the same
> reason that only children read books with only pictures in them.
> Language, be it English or something else, is the only tool flexible
> enough to accomplish a sufficiently broad range of tasks.
>                           -- Bill Garrett
>
> _______________________________________________
> PLUG discussion list: plug at plug.org.au
> http://lists.plug.org.au/mailman/listinfo/plug
> Committee e-mail: committee at plug.org.au
> PLUG Membership: http://www.plug.org.au/membership
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.plug.org.au/pipermail/plug/attachments/20131223/03577539/attachment.html>