[plug] speed: find vs ls

Brad Campbell brad at fnarfbargle.com
Fri Jul 29 12:39:20 AWST 2022


On 29/7/22 11:29, Jeremy Kerr wrote:
> Hi Brad,
> 
>> root@rpi31:/mnt/backup/work# ls -d ??????-???? | wc -l
>> 439
>>
>> root@rpi31:/mnt/backup/work# time ls ??????-????/bkb.rhash.crc32
>>
>> real    0m13.161s
>> user    0m0.004s
>> sys     0m0.162s
> 
> The 'time' in this case isn't measuring the dentry scan - your shell is
> expanding the '??????-????/' glob, then passing all of those expanded
> filenames to the ls invocation. That operation is happening outside of
> the 'time' measurement.
> 
> Since 'ls' is passed those expanded files, all it needs to do is stat
> each one, without the dentry scan.
> 
> To compare more directly against the find:
> 
>     time /bin/sh -c 'ls ??????-????/bkb.rhash.crc32'

Indeed, right you are.

root@rpi31:/mnt/backup/work# time for i in `/bin/sh -c 'ls ??????-????/bkb.rhash.crc32'` ; do j=$(dirname $i) ; echo $j ; done

real	0m13.062s
user	0m0.017s
sys	0m0.207s
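
For anyone following along, a minimal sketch of where the expansion happens. The scratch directory, the names 220701-0001 and 220702-0001, and the count_args helper are all made up for the demo; only the glob pattern and the file name come from this thread. Unquoted, the calling shell expands the pattern before the command runs; quoted, the pattern arrives as one literal argument and the child /bin/sh does the expanding, which is what puts the glob inside the timed region.

```shell
#!/bin/sh
# Hypothetical scratch layout standing in for the real backup tree.
demo=$(mktemp -d)
mkdir "$demo/220701-0001" "$demo/220702-0001"
touch "$demo/220701-0001/bkb.rhash.crc32" "$demo/220702-0001/bkb.rhash.crc32"
cd "$demo" || exit 1

# Hypothetical helper: report how many arguments the command received.
count_args() { echo "$#"; }

# Unquoted: this shell expands the glob before count_args is called,
# so the helper sees one argument per matching file.
unquoted=$(count_args ??????-????/bkb.rhash.crc32)

# Quoted: the pattern is handed over untouched as a single argument.
quoted=$(count_args '??????-????/bkb.rhash.crc32')

# Quoted and passed to a child shell: the child performs the expansion,
# so anything timing the child also times the glob.
child=$(/bin/sh -c 'ls ??????-????/bkb.rhash.crc32' | wc -l | tr -d ' ')

echo "unquoted args: $unquoted, quoted args: $quoted, files listed by child: $child"

# Tidy up the scratch directory.
cd / && rm -rf "$demo"
```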

> the quotes there will prevent the interactive shell (which isn't being
> timed) from performing the glob expansion, and instead we're timing the
> glob in the subshell.
> 
> While we're on the topic though:
> 
>  From your example of:
> 
>    for j in `ls ??????-????/bkb.rhash.crc32 2>/dev/null` ; do j=($dirname $j)
> 
> Comparing to your `find ... -printf %h`, I assume this might be a typo of:
> 
>    for j in `ls ??????-????/bkb.rhash.crc32 2>/dev/null` ; do j=$(dirname $j)

Yes, it was. I picked that up early but figured it wasn't worth mentioning.

> The ls is a bit useless here; it will just stat each .crc32 file
> (twice, after the shell has already done so!) and print out the names
> already provided to it as separate arguments.
> 
> If we're trying to avoid filesystem interactions, you could just use
> the result of the shell glob directly:
> 
>     for j in ??????-????/bkb.rhash.crc32 ; do j=$(dirname $j)

root@rpi31:/mnt/backup/work# time for j in ??????-????/bkb.rhash.crc32 ; do j=$(dirname $j); echo $j ; done

real	0m13.072s
user	0m0.015s
sys	0m0.192s
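
One difference worth noting when dropping the ls: with 2>/dev/null, the ls version stays silent when nothing matches, whereas a bare glob that matches nothing is left as the literal pattern and the loop body runs once with it. A small sketch of guarding against that, assuming a POSIX sh with default settings; the scratch directories and the list_dirs function name are invented for illustration.

```shell
#!/bin/sh
# Hypothetical scratch layout: two backup dirs, only one holds the file.
demo=$(mktemp -d)
mkdir "$demo/220701-0001" "$demo/220702-0001"
touch "$demo/220701-0001/bkb.rhash.crc32"

# Hypothetical helper: print the directory of every matching file
# under the current directory, using the shell glob directly.
list_dirs() {
    for j in ??????-????/bkb.rhash.crc32; do
        # An unmatched glob is left as the literal pattern, so skip
        # anything that does not actually exist.
        [ -e "$j" ] || continue
        dirname "$j"
    done
}

# One directory contains the file, so one name is printed.
found=$(cd "$demo" && list_dirs)

# In an empty directory the guard keeps the output empty.
empty=$(mktemp -d)
none=$(cd "$empty" && list_dirs)

echo "found: [$found] none: [$none]"

# Tidy up the scratch directories.
rm -rf "$demo" "$empty"
```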

> We can also avoid spawning a dirname process for each dir:
> 
>     for j in ??????-????/bkb.rhash.crc32 ; do j=${j%/*}

root@rpi31:/mnt/backup/work# time for j in ??????-????/bkb.rhash.crc32 ; do j=${j%/*} ; echo $j ; done

real	0m13.163s
user	0m0.012s
sys	0m0.142s
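
For reference, a short sketch of what the ${j%/*} form does next to dirname. The paths are made-up examples in the same shape as the ones above. The expansion strips the shortest suffix matching "/*" inside the shell itself, so no process is spawned; for paths containing a slash the two agree, but they differ when there is no slash at all.

```shell
#!/bin/sh
# Hypothetical example path in the same shape as the backup tree.
j='220701-0001/bkb.rhash.crc32'

# External command: forks a dirname process for every path.
with_dirname=$(dirname "$j")

# Parameter expansion: removes the shortest trailing match of "/*"
# without leaving the shell.
with_expansion=${j%/*}

echo "dirname: $with_dirname  expansion: $with_expansion"

# Caveat: with no slash in the path, dirname reports ".", while the
# expansion finds nothing to strip and returns the name unchanged.
bare='bkb.rhash.crc32'
bare_dirname=$(dirname "$bare")
bare_expansion=${bare%/*}

echo "dirname: $bare_dirname  expansion: $bare_expansion"
```

Since the glob here always includes a directory component, the caveat does not bite in this loop.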

Much of a muchness, and all well within error margins. Still over an order of magnitude faster than find, plus I learned some new tricks.
Thanks!

Regards,
Brad
-- 
An expert is a person who has found out by his own painful
experience all the mistakes that one can make in a very
narrow field. - Niels Bohr

