<div dir="auto">Thanks for that Brad<div dir="auto"><br></div><div dir="auto">I did some testing a number of years ago when I got my raid controller.</div><div dir="auto">I was benchmarking xfs and ext4 also.</div><div dir="auto"><br></div><div dir="auto">From memory xfs came out on top also in those cases.</div><div dir="auto">It also seemed to have the benefit of a dedicated filesystem backup tool in one of the xfs packages. I think it's called xfsdump now. It might be a good idea to look at using that.</div><div dir="auto"><br><br><div data-smartmail="gmail_signature" dir="auto">from my Tablet</div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, 2 Feb. 2020, 2:42 pm Brad Campbell, <<a href="mailto:brad@fnarfbargle.com">brad@fnarfbargle.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 15/1/20 10:35 pm, Brad Campbell wrote:<br>

> On 9/1/20 2:12 pm, Brad Campbell wrote:<br>

>> On 8/1/20 13:39, Byron Hammond wrote:<br>

>>> I'm keeping my eye on this thread with great interest.<br>

>>><br>

>>> I'm really curious to see what your findings are and how you got there.<br>

>>><br>

>><br>

>> It will be interesting. I'll say in response to "how you got there", <br>

>> the current answer is _slowly_.<br>

> <br>

> Untarring the backup file onto a clean ext4 filesystem on a 5 drive <br>

> RAID5 took 176 hours for the bulk restore, and then tar seems to do <br>

> another pass removing symlinks, creating a new symlink and then <br>

> hardlinking that. That took an additional 7 hours.<br>

> <br>

> So ~183 hours to restore the tar file onto a clean ext4 filesystem.<br>

> <br>

> At least I have a reproducible test case. That averaged 5.3MB/s.<br>

> <br>

> Correcting my mistake, this filesystem has 50.2 million inodes and 448 <br>

> million files/directories.<br>

> <br>

> root@test:/server# tar -tvzf backups.tgz | wc -l<br>

> 448763241<br>

> <br>

> Just the tar | wc -l took the best part of a day. This might take a while.<br>

> <br>

> <br>

<br>

So from here on this E-mail has just been built up in sequence as things<br>

are tried. Lots of big gaps (like 183 hours to build the ext4<br>

filesystem), so it's all kinda "stream of consciousness". I'll put a<br>

summary at the end.<br>

<br>

I needed some way of speeding this process up and write caching the<br>

drives seemed like the sanest way to do it. I ran up a new qemu(kvm)<br>

devuan instance, passed it the raw block device and set the caching<br>

method to "unsafe".  That basically ignores all data safety requests<br>

(sync/fsync/flush) and allows the machine to act as a huge cache.<br>
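
<br>

For reference, a minimal sketch of what such a VM invocation might look like.<br>
The device path, memory size and virtio bus are assumptions, not details from<br>
this test setup; only cache=unsafe comes from the description above. The<br>
function prints the command rather than running it.<br>

```shell
#!/bin/sh
# Hypothetical values; the real device and VM sizing are not given in the mail.
DISK=${DISK:-/dev/sdX}
MEM_MB=${MEM_MB:-8192}

# cache=unsafe makes QEMU ignore guest flush requests, so the host page cache
# absorbs writes. A host crash can lose data, so this is for test rigs only.
qemu_cmd() {
    printf '%s\n' "qemu-system-x86_64 -enable-kvm -m $MEM_MB -drive file=$DISK,format=raw,if=virtio,cache=unsafe"
}

qemu_cmd   # print the command line for inspection
```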

<br>

So, the filesystem had already been created and populated (183 hours<br>

worth). This is a simple find on the filesystem from inside the VM.<br>

<br>

ext4:<br>

root@devfstest:/mnt/source# time find . | wc -l<br>

448763242<br>

<br>

real    1300m56.182s<br>

user    3m18.904s<br>

sys     12m56.012s<br>

<br>

I've created a new xfs filesystem and restored the one big tarball onto it :<br>

real    10130m14.072s<br>

user    9631m11.388s<br>

sys     325m38.168s<br>

<br>

So 168 hours for xfs.<br>

<br>

I've noticed an inordinate amount of time being spent inside tar, so I<br>

took the time to create the archive again, this time with a separate<br>

tarball for each backed up directory.<br>

<br>

So, let's repeat that test with xfs :  A bit complexish, but let's see<br>

what happens. Surely can't be slower!<br>

<br>

root@devfstest:/mnt# time for i in `ssh test ls /server/fred/*.tgz` ; do<br>

echo $i ; ssh test cat $i | pigz -d | tar -x ; done<br>

real    730m25.915s<br>

user    496m28.968s<br>

sys     209m7.068s<br>

<br>

12.1 hours using separate tarballs vs one big tarball.<br>

<br>

So, in this instance tar was/is the bottleneck! All future tests will be<br>

done using the multiple tarball archive.<br>
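
<br>

The archive-creation step is not shown in this mail. A minimal sketch of one<br>
way to produce a tarball per top-level directory follows; the function name<br>
and both paths are hypothetical, and plain gzip via tar -z is assumed.<br>

```shell
#!/bin/sh
# make_tarballs SRC DEST
# Write DEST/<name>.tgz for every top-level directory <name> inside SRC.
# Use absolute paths for SRC and DEST to avoid surprises with tar -C.
make_tarballs() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    for d in "$src"/*/ ; do
        [ -d "$d" ] || continue          # glob matched nothing
        name=$(basename "$d")
        # -C keeps archive member names relative to SRC
        tar -C "$src" -czf "$dest/$name.tgz" "$name"
    done
}
```

Usage would be along the lines of: make_tarballs /server/backups /server/fred<br>
(the first path is a placeholder).<br>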

<br>

Right, so now create a new ext4 filesystem on there and repeat the test<br>

<br>

real    1312m53.829s<br>

user    481m3.272s<br>

sys     194m49.744s<br>

<br>

Summary :<br>

<br>

xfs  : 12.1 hours<br>

ext4 : 21.8 hours<br>

<br>

Filesystem population test win : XFS by a long margin.<br>

<br>

Now, I wasn't clever enough to do a find test on xfs before doing the<br>

ext4 creation test, so let's run the find on ext4, then re-create the<br>

xfs and do it again.<br>

<br>

This should be interesting to see how it compares to the initial find<br>

test on the fs created on the bare metal and then the block device<br>

passed through to the VM (first result in this mail, some 1300 minutes).<br>

Not entirely a fair test as the filesystems differ in content. The "one<br>

big tarball" was about 10 days before the "multiple smaller tarballs",<br>

but still ~45-50 million inodes.<br>

<br>

Lesson learned, make sure filesystem is mounted noatime before the test.<br>

Several restarts before I figured out what was writing to the disk.<br>
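
<br>

A small pre-flight check along these lines could catch that before a long run.<br>
This is only a sketch: the helper name is made up, and it assumes the usual<br>
/proc/mounts layout (device, mount point, type, options).<br>

```shell
#!/bin/sh
# has_noatime MOUNTPOINT [MOUNTS_FILE]
# Succeeds if MOUNTPOINT is listed with the noatime option.
# MOUNTS_FILE defaults to /proc/mounts; the override exists for testing.
has_noatime() {
    awk -v mp="$1" '
        $2 == mp && $4 ~ /(^|,)noatime(,|$)/ { found = 1 }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}
```

Typical use before starting a benchmark:<br>
has_noatime /mnt || mount -o remount,noatime /mnt<br>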

<br>

Find test on ext4 :<br>

cd /mnt ; time find . | wc -l<br>

<br>

ext4 :<br>

real    1983m45.609s<br>

user    3m32.184s<br>

sys     14m2.420s<br>

<br>

Not so pretty. So 50% longer than last time. Still, different filesystem<br>

contents so not directly comparable. Right, let's build up a new xfs<br>

filesystem and repeat the test :<br>

<br>

root@devfstest:/mnt# time for i in `ssh test ls /server/fred/*.tgz` ; do<br>

echo $i ; ssh test cat $i | pigz -d | tar -x ; done<br>

real    711m17.118s<br>

user    498m12.424s<br>

sys     210m50.748s<br>

<br>

So create was 730 mins last time and 711 mins this time. ~3% variance.<br>

Close enough.<br>

<br>

root@devfstest:/mnt# time find . | wc -l<br>

497716103<br>

<br>

real    43m13.998s<br>

user    2m49.624s<br>

sys     6m33.148s<br>

<br>

xfs ftw! 43 mins vs 1983 mins.<br>

<br>

So, summary.<br>

xfs create : 12.1 hours<br>

ext4 create : 21.8 hours<br>

<br>

xfs find : 43 min<br>

ext4 find : 33.1 hours<br>

<br>

Let's do a tar test and see how long it takes to read the entire<br>

filesystem. This would be a good indicator of time to replicate. Again,<br>

because I wasn't clever enough to have this stuff thought up before<br>

hand, I'll have to do it on xfs, then recreate the ext4 and run it again.<br>

<br>

root@devfstest:/mnt# time for i in * ; do echo $i ; tar -cp $i ><br>

/dev/null ; done<br>

real    108m59.595s<br>

user    20m14.032s<br>

sys     50m48.216s<br>

<br>

Seriously?!? 108 minutes for 3.5TB of data. I've done something wrong<br>

obviously. Let's retest that with pipebench to make sure it's actually<br>

archiving data :<br>

<br>

root@devfstest:/mnt# time for i in * ; do echo $i ; tar -cp $i |<br>

pipebench -b 32768 > /dev/null ; done<br>

real    308m44.940s<br>

user    31m58.108s<br>

sys     98m8.844s<br>

<br>

Better. Just over 5 hours.<br>

<br>

Let's do a du -hcs *<br>

root@devfstest:/mnt# time du -hcs *<br>

real    73m20.487s<br>

user    2m53.884s<br>

sys     29m49.184s<br>

<br>

xfs tar test : 5.1 hours<br>

xfs du -hcs test : 73 minutes<br>

<br>

Right, now to re-populate the filesystem ext4 and re-test.<br>

Hrm. Just realised that all previous ext4 creation tests were at the<br>

mercy of lazy_init, so create the new one with no lazy init on block<br>

tables or journal.<br>
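
<br>

The exact mkfs command is not shown here. "No lazy init" presumably maps to<br>
the two mke2fs extended options below; the device path is a placeholder and<br>
the function only prints the command.<br>

```shell
#!/bin/sh
# mkfs_ext4_cmd DEVICE
# Print an mkfs.ext4 command that initialises inode tables and the journal
# up front instead of lazily in the background after the first mount.
mkfs_ext4_cmd() {
    printf '%s\n' "mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 $1"
}

mkfs_ext4_cmd /dev/sdX   # hypothetical device
```

With lazy init left on, the kernel zeroes inode tables in the background<br>
after mounting, which competes with the benchmark for disk time.<br>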

<br>

real    1361m53.562s<br>

user    499m20.168s<br>

sys     212m6.524s<br>

<br>

So ext4 create : 22.6 hours. Still about right.<br>

<br>

Time for the tar create test :<br>

root@devfstest:/mnt# time for i in * ; do echo $i ; sleep 5 ; tar -cp $i<br>

| pipebench -b 32768 > /dev/null ; done<br>

real    2248m18.299s<br>

user    35m6.968s<br>

sys     98m57.936s<br>

<br>

Right. That wasn't really a surprise, but the magnitude of the <br>

difference was.<br>

xfs : 5.1 hours<br>

ext4 : 37.4 hours<br>

<br>

Now the du -hcs * test :<br>

real    1714m21.503s<br>

user    3m40.596s<br>

sys     37m24.928s<br>

<br>

xfs : 73 minutes<br>

ext4 : 28.5 hours<br>

<br>

<br>

Summary<br>

<br>

Populate fresh & empty fs from tar files :<br>

xfs  : 12.1 hours<br>

ext4 : 21.8 hours<br>

<br>

Find :<br>

xfs  : 43 min<br>

ext4 : 33.1 hours<br>

<br>

du -hcs * :<br>

xfs  : 73 minutes<br>

ext4 : 28.5 hours<br>

<br>

tar create :<br>

xfs  : 5.1 hours<br>

ext4 : 37.4 hours<br>

<br>

I think there's a pattern there.<br>
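
<br>

To put numbers on that pattern, the "real" lines from time can be converted<br>
to minutes and divided. A rough sketch; the helper names are made up and the<br>
inputs are the raw timings quoted earlier in this mail.<br>

```shell
#!/bin/sh
# to_minutes 1312m53.829s  ->  minutes as a decimal, one place
to_minutes() {
    echo "$1" | awk -F'[ms]' '{ printf "%.1f\n", $1 + $2 / 60 }'
}

# ratio SLOW FAST  ->  how many times longer SLOW took than FAST
ratio() {
    awk -v a="$(to_minutes "$1")" -v b="$(to_minutes "$2")" \
        'BEGIN { printf "%.1f\n", a / b }'
}

# ext4 time first, xfs time second
echo "populate   : $(ratio 1312m53.829s 730m25.915s)x"
echo "find       : $(ratio 1983m45.609s 43m13.998s)x"
echo "du -hcs    : $(ratio 1714m21.503s 73m20.487s)x"
echo "tar create : $(ratio 2248m18.299s 308m44.940s)x"
```

The populate ratio comes out below 2x, while the three read-side tests are<br>
several times to tens of times slower on ext4.<br>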

<br>

So, using one VM config and hardware set. One set of source tar files.<br>

<br>

Tests were performed sequentially, so there were likely workload <br>

variations on the host server, but nothing significant and certainly not <br>

enough to make more than a couple of percent difference either way.<br>

<br>

So I still need to go back and figure out what happened with the first <br>

xfs tar test and how it possibly exceeded the available throughput for <br>

the disks. Everything else was pretty sane.<br>

<br>

It would appear xfs destroys ext4 for this perverse use case.<br>

<br>

I suppose my next step is migrating the system across to xfs. Either I <br>

take the time to copy the whole thing across, probably forgoing a <br>

couple of nights' backups, or I just start a new drive from scratch and put <br>

the current ext4 drive in the safe for a couple of months.<br>

<br>

Regards,<br>

Brad<br>

-- <br>

An expert is a person who has found out by his own painful<br>

experience all the mistakes that one can make in a very<br>

narrow field. - Niels Bohr<br>

_______________________________________________<br>

PLUG discussion list: <a href="mailto:plug@plug.org.au" target="_blank" rel="noreferrer">plug@plug.org.au</a><br>

<a href="http://lists.plug.org.au/mailman/listinfo/plug" rel="noreferrer noreferrer" target="_blank">http://lists.plug.org.au/mailman/listinfo/plug</a><br>

Committee e-mail: <a href="mailto:committee@plug.org.au" target="_blank" rel="noreferrer">committee@plug.org.au</a><br>

PLUG Membership: <a href="http://www.plug.org.au/membership" rel="noreferrer noreferrer" target="_blank">http://www.plug.org.au/membership</a><br>

</blockquote></div>