[plug] speed: find vs ls

Thomas Cuthbert tcuthbert90 at gmail.com
Fri Jul 29 02:19:22 AWST 2022


Also https://github.com/sharkdp/fd is a find clone with an emphasis on
performance which might be useful!
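
A roughly equivalent invocation (untested; flags from memory, with dirname
standing in for find's %h) would be something like:

fd --max-depth 2 --type f --glob bkb.rhash.crc32 . | xargs -r dirname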

On Fri, 29 July 2022, 2:15 am Thomas Cuthbert, <tcuthbert90 at gmail.com>
wrote:

> As a guess I'd say the excessive metadata syscalls are due to your -type
> predicate and maybe the format string (find has a number of other fmt
> parameters that reference stat info). It sounds like you have lots of
> directories too; limiting the number of directories will reduce the rate
> of dentry and metadata reads. squid does something similar to group
> objects together with its L1/L2 cache_dir hierarchy.
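>
> As a rough illustration (a guess, untested against your tree), dropping the
> -type predicate might save a stat per entry, since -name can be matched
> from the directory entry alone:
>
> find . -maxdepth 2 -name bkb.rhash.crc32 -printf "%h\n"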
>
> Also, do you need to hash the whole file? Seeing as you already have the
> metadata in cache, you could probably get a quick performance win by
> comparing the metadata to a previous value, or by hashing only the
> metadata.
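>
> A minimal sketch of that idea ($f, the .meta sidecar file and the rhash
> flag are illustrative, not your actual layout):
>
> # re-hash only when size or mtime have changed since the last run
> new=$(stat -c '%s %Y' "$f")
> [ "$new" = "$(cat "$f.meta" 2>/dev/null)" ] || { rhash --crc32 "$f"; echo "$new" > "$f.meta"; }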
>
> On Thu, 28 July 2022, 5:22 pm Brad Campbell, <brad at fnarfbargle.com> wrote:
>
>> G'day all,
>>
>> An observation while I'm still playing with my sizeable set of backup
>> directories.
>> I've been adding a bit that creates a file of crc32s of the updated
>> files, and then toying around with a script to crawl the drive and check
>> them all.
>>
>> I started using find to give me a list of dirs that contain the files. It
>> was spending a *lot* of time just creating the list. In fact it spent more
>> time looking for the files than the subsequent iteration and check of each
>> one.
>> I must qualify that with the fact that I'm about 10 days into creating
>> the crcs and most directories already have ~800 days' worth of backups.
>>
>> The script run with:
>> for j in `find . -maxdepth 2 -type f -name bkb.rhash.crc32 -printf "%h\n"` ; do
>>
>> Checked 170 directories with 0 errors in 0:00:34:58
>>
>> Stracing find shows it dropping into each directory and performing a stat on
>> every file. Some dirs have a *lot* of files.
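>>
>> A quick way to see those per-file stats is something along these lines
>> (syscall names vary by kernel and architecture):
>>
>> strace -f -c -e trace=newfstatat,statx,lstat find . -maxdepth 2 -type f -name bkb.rhash.crc32 > /dev/null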
>>
>> I thought about trying a bit of globbing with ls instead, and blow me
>> down if it wasn't "a bit faster".
>>
>> The script run with:
>> for j in `ls ??????-????/bkb.rhash.crc32 2>/dev/null` ; do j=$(dirname $j)
>>
>> Checked 170 directories with 0 errors in 0:00:09:49
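>>
>> (The same glob also works without spawning ls at all; a rough sketch,
>> assuming bash with nullglob set and the rest of the loop unchanged:
>>
>> shopt -s nullglob
>> for j in ??????-????/bkb.rhash.crc32 ; do j=${j%/*}
>>
>> though ls was clearly quick enough here.)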
>>
>> I know premature optimisation is the root of all evil, but this one might
>> have been a case of "using the right tool".
>>
>> Regards,
>> Brad
>> --
>> An expert is a person who has found out by his own painful
>> experience all the mistakes that one can make in a very
>> narrow field. - Niels Bohr
>> _______________________________________________
>> PLUG discussion list: plug at plug.org.au
>> http://lists.plug.org.au/mailman/listinfo/plug
>> Committee e-mail: committee at plug.org.au
>> PLUG Membership: http://www.plug.org.au/membership
>>
>