[plug] speed: find vs ls
Brad Campbell
brad at fnarfbargle.com
Fri Jul 29 12:39:20 AWST 2022
On 29/7/22 11:29, Jeremy Kerr wrote:
> Hi Brad,
>
>> root@rpi31:/mnt/backup/work# ls -d ??????-???? | wc -l
>> 439
>>
>> root@rpi31:/mnt/backup/work# time ls ??????-????/bkb.rhash.crc32
>>
>> real 0m13.161s
>> user 0m0.004s
>> sys 0m0.162s
>
> The 'time' in this case isn't measuring the dentry scan - your shell is
> expanding the '??????-????/' glob, then passing all of those expanded
> filenames to the ls invocation. That operation is happening outside of
> the 'time' measurement.
>
> Since 'ls' is passed those expanded files, all it needs to do is stat
> each one, without the dentry scan.
>
> To compare more directly against the find:
>
> time /bin/sh -c 'ls ??????-????/bkb.rhash.crc32'
Indeed, right you are.
root@rpi31:/mnt/backup/work# time for i in `/bin/sh -c 'ls ??????-????/bkb.rhash.crc32'` ; do j=$(dirname $i) ; echo $j ; done
real 0m13.062s
user 0m0.017s
sys 0m0.207s
> the quotes there will prevent the interactive shell (which isn't being
> timed) from performing the glob expansion, and instead we're timing the
> glob in the subshell.
>
> While we're on the topic though:
>
> From your example of:
>
> for j in `ls ??????-????/bkb.rhash.crc32 2>/dev/null` ; do j=($dirname $j)
>
> Comparing to your `find ... -printf %h`, I assume this might be a typo of:
>
> for j in `ls ??????-????/bkb.rhash.crc32 2>/dev/null` ; do j=$(dirname $j)
Yes, it was. I picked that up early but figured it wasn't worth mentioning.
> The ls is a bit useless here; it will just stat each .crc32 file
> (twice, after the shell has already done so!) and print out the names
> already provided to it as separate arguments.
>
> If we're trying to avoid filesystem interactions, you could just use
> the result of the shell glob directly:
>
> for j in ??????-????/bkb.rhash.crc32 ; do j=$(dirname $j)
root@rpi31:/mnt/backup/work# time for j in ??????-????/bkb.rhash.crc32 ; do j=$(dirname $j); echo $j ; done
real 0m13.072s
user 0m0.015s
sys 0m0.192s
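
One thing to watch when dropping the ls: the original loop relied on `2>/dev/null` to stay quiet when nothing matched, whereas an unmatched glob is handed to the loop as the literal pattern string. A minimal sketch of guarding against that, assuming a throwaway directory tree (the directory names below are made up to fit the pattern, not taken from the real backup set):

```shell
#!/bin/sh
# Hypothetical scratch layout: three dated directories, only two of which
# contain the checksum file.
work=$(mktemp -d)
mkdir "$work/220701-0100" "$work/220702-0100" "$work/220703-0100"
touch "$work/220701-0100/bkb.rhash.crc32" "$work/220703-0100/bkb.rhash.crc32"
cd "$work" || exit 1

found=""
for j in ??????-????/bkb.rhash.crc32 ; do
    # With no match at all, $j would be the unexpanded pattern itself,
    # so skip anything that does not actually exist.
    [ -e "$j" ] || continue
    j=$(dirname "$j")
    found="${found:+$found }$j"
done
echo "$found"

# Clean up the scratch tree.
cd / && rm -rf "$work"
```

The `[ -e "$j" ]` test costs one extra stat per match, so it is only worth having if an empty result is possible.
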
> We can also avoid spawning a dirname process for each dir:
>
> for j in ??????-????/bkb.rhash.crc32 ; do j=${j%/*}
root@rpi31:/mnt/backup/work# time for j in ??????-????/bkb.rhash.crc32 ; do j=${j%/*} ; echo $j ; done
real 0m13.163s
user 0m0.012s
sys 0m0.142s
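
For reference, a small sketch of how `${j%/*}` compares with `dirname`. The sample path is made up to fit the pattern. The two agree whenever the name contains a slash, which is always the case for this glob, and differ only for a bare filename:

```shell
#!/bin/sh
# Made-up sample path in the same shape as the glob results.
j=220701-0100/bkb.rhash.crc32

via_dirname=$(dirname "$j")    # runs an external dirname process
via_expansion=${j%/*}          # strips the shortest trailing "/*" inside the shell

echo "$via_dirname $via_expansion"

# Edge case: with no slash in the name, dirname prints "." while the
# expansion finds nothing to strip and leaves the value unchanged.
k=bkb.rhash.crc32
echo "$(dirname "$k") ${k%/*}"
```
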
Much of a muchness and all well within error margins. Still over an order of magnitude faster than find, plus I learned some new tricks.
Thanks!
Regards,
Brad
--
An expert is a person who has found out by his own painful
experience all the mistakes that one can make in a very
narrow field. - Niels Bohr