<div dir="auto"><div>At a guess, I'd say the excessive metadata syscalls are due to your -type predicate and maybe the format string (find has a number of other format directives that reference stat info). It sounds like you have lots of directories too; limiting the number of directories will reduce the rate of dentry and metadata reads. Squid does something similar, grouping objects together with its L1/L2 cache_dir hierarchy.</div><div dir="auto"><br></div><div dir="auto">Also, do you need to hash the whole file? Seeing as you already have the metadata in cache, you could probably get a quick performance win by comparing the metadata to a previous value, or by hashing only the metadata.<br><br><div class="gmail_quote" dir="auto"><div dir="ltr" class="gmail_attr">On Thu, 28 July 2022, 5:22 pm Brad Campbell, &lt;<a href="mailto:brad@fnarfbargle.com">brad@fnarfbargle.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">G'day all,<br>
<br>
An observation while I'm still playing with my sizeable set of backup directories.<br>
I've been adding a bit that creates a file of crc32s of the updated files, and then toying around with a script to crawl the drive and check them all.<br>
<br>
I started using find to give me a list of dirs that contain the files. It was spending a *lot* of time just creating the list. In fact it spent more time looking for the files than the subsequent iteration and check of each one.<br>
I must qualify that with the fact, I'm about 10 days into creating the crcs and most directories already have ~800 days worth of backups.<br>
<br>
The script run with :<br>
for j in `find . -maxdepth 2 -type f -name bkb.rhash.crc32 -printf "%h\n"` ; do<br>
<br>
Checked 170 directories with 0 errors in 0:00:34:58<br>
<br>
stracing find, it's dropping into each directory and performing a stat on every file. Some dirs have a *lot* of files.<br>
<br>
I thought about trying a bit of globbing with ls instead, and blow me down if it wasn't "a bit faster".<br>
<br>
The script run with :<br>
for j in `ls ??????-????/bkb.rhash.crc32 2>/dev/null` ; do j=$(dirname $j)<br>
<br>
Checked 170 directories with 0 errors in 0:00:09:49<br>
<br>
I know premature optimisation is the root of all evil, but this one might have been a case of "using the right tool".<br>
<br>
Regards,<br>
Brad<br>
-- <br>
An expert is a person who has found out by his own painful<br>
experience all the mistakes that one can make in a very<br>
narrow field. - Niels Bohr<br>
_______________________________________________<br>
PLUG discussion list: <a href="mailto:plug@plug.org.au" target="_blank" rel="noreferrer">plug@plug.org.au</a><br>
<a href="http://lists.plug.org.au/mailman/listinfo/plug" rel="noreferrer noreferrer" target="_blank">http://lists.plug.org.au/mailman/listinfo/plug</a><br>
Committee e-mail: <a href="mailto:committee@plug.org.au" target="_blank" rel="noreferrer">committee@plug.org.au</a><br>
PLUG Membership: <a href="http://www.plug.org.au/membership" rel="noreferrer noreferrer" target="_blank">http://www.plug.org.au/membership</a><br>
</blockquote></div></div></div>
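<div dir="auto"><br></div><div dir="auto">To make the "compare the metadata to a previous value" idea at the top of this reply a bit more concrete, here is a rough sketch, not a tested drop-in. It assumes GNU find and GNU stat (for -maxdepth and stat -c), and the function names and the .meta.snapshot file name are made up for illustration. It records size and mtime for each file in a directory and reports whether anything differs from the last saved snapshot. Note that it only notices changes that alter size or mtime, so it will not catch silent corruption the way a full content hash does.</div>
<pre>
```shell
#!/bin/sh
# Hypothetical helpers: the names and the .meta.snapshot file are assumptions.

# Print one "size mtime name" line per regular file directly inside $1.
# GNU stat format: %s = size in bytes, %Y = mtime in epoch seconds, %n = name.
snapshot_dir() {
    find "$1" -maxdepth 1 -type f ! -name .meta.snapshot \
        -exec stat -c '%s %Y %n' {} + | sort
}

# Succeed (exit 0) if the directory has no snapshot yet, or if its current
# metadata differs from the saved snapshot.
dir_changed() {
    snap="$1/.meta.snapshot"
    [ -f "$snap" ] || return 0
    [ "$(snapshot_dir "$1")" != "$(cat "$snap")" ]
}

# Save the current metadata so the next run can compare against it.
save_snapshot() {
    out=$(snapshot_dir "$1") || return 1
    printf '%s\n' "$out" > "$1/.meta.snapshot"
}

# Walk the backup directories with a shell glob (no find, no ls parsing)
# and print only the ones whose metadata changed since the last snapshot.
list_changed_dirs() {
    for f in "$1"/*/bkb.rhash.crc32; do
        [ -e "$f" ] || continue      # glob matched nothing
        dir=${f%/*}
        if dir_changed "$dir"; then
            printf '%s\n' "$dir"
        fi
    done
}
```
</pre>
<div dir="auto">Usage would be something like running list_changed_dirs . to get the directories worth a full crc32 check, then calling save_snapshot on each one after it verifies cleanly.</div>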