[plug] speed: find vs ls
Jeremy Kerr
jk at ozlabs.org
Fri Jul 29 11:29:24 AWST 2022
Hi Brad,
> root@rpi31:/mnt/backup/work# ls -d ??????-???? | wc -l
> 439
>
> root@rpi31:/mnt/backup/work# time ls ??????-????/bkb.rhash.crc32
>
> real 0m13.161s
> user 0m0.004s
> sys 0m0.162s
The 'time' in this case isn't measuring the dentry scan. Your shell
expands the '??????-????/' glob first, then passes all of the expanded
filenames to the ls invocation. That expansion happens outside of
the 'time' measurement.
Since 'ls' is passed those expanded files, all it needs to do is stat
each one, without the dentry scan.
To compare more directly against the find:
    time /bin/sh -c 'ls ??????-????/bkb.rhash.crc32'
The quotes there prevent the interactive shell (which isn't being
timed) from performing the glob expansion, so we're instead timing the
glob in the subshell.
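To make the difference visible without timing anything, here is a rough sketch that counts how many words the glob becomes in each case. The directory names are made up for the demonstration, and it works in a throwaway temp directory rather than your backup tree:

```shell
# Sketch only: the directory names below are invented for illustration.
orig_dir=$PWD
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir 220701-0001 220702-0002 220703-0003
touch 220701-0001/bkb.rhash.crc32 \
      220702-0002/bkb.rhash.crc32 \
      220703-0003/bkb.rhash.crc32

# Unquoted: the current shell expands the glob before any command runs,
# so a command would receive one argument per matching file.
set -- ??????-????/bkb.rhash.crc32
unquoted_count=$#

# Quoted: the pattern stays a single literal word in the current shell.
set -- '??????-????/bkb.rhash.crc32'
quoted_count=$#

# The quoted script is expanded by the inner sh instead.
inner_count=$(/bin/sh -c 'set -- ??????-????/bkb.rhash.crc32; echo $#')

echo "unquoted: $unquoted_count, quoted: $quoted_count, inner sh: $inner_count"
cd "$orig_dir" && rm -rf "$tmp"
```

The quoted pattern stays a single word until the inner sh sees it, which is why wrapping the command in sh -c moves the expansion inside the timed process.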
While we're on the topic though:
From your example of:
    for j in `ls ??????-????/bkb.rhash.crc32 2>/dev/null` ; do j=($dirname $j)
Comparing to your `find ... -printf %h`, I assume this might be a typo of:
    for j in `ls ??????-????/bkb.rhash.crc32 2>/dev/null` ; do j=$(dirname $j)
The ls is a bit useless here; it will just stat each .crc32 file
(twice, after the shell has already done so!) and print out the names
already provided to it as separate arguments.
If we're trying to avoid filesystem interactions, you could just use
the result of the shell glob directly:
    for j in ??????-????/bkb.rhash.crc32 ; do j=$(dirname $j)
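One thing to watch when dropping the ls: with default shell settings, a glob that matches nothing is left as the literal pattern, so the loop body runs once with that pattern as $j (the 2>/dev/null on the ls was hiding that case). A minimal sketch of a guard, again using made-up directory names in a temp directory:

```shell
# Sketch only: directory names are invented; assumes default glob
# settings (no nullglob/failglob).
orig_dir=$PWD
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir 220701-0001 220702-0002
touch 220701-0001/bkb.rhash.crc32     # only one directory has the file

found=""
for j in ??????-????/bkb.rhash.crc32 ; do
    [ -e "$j" ] || continue           # skip the unexpanded pattern
    found="$found $(dirname "$j")"
done
echo "dirs with a crc32 file:$found"

# A pattern with no matches still yields one iteration, which the
# guard catches.
unmatched=0
for j in ??????-????/no-such-file ; do
    [ -e "$j" ] || { unmatched=$((unmatched + 1)); continue; }
done
echo "iterations on the literal pattern: $unmatched"
cd "$orig_dir" && rm -rf "$tmp"
```

The [ -e ] test does cost one more stat per file, so if every directory is known to contain the file, the guard can be dropped.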
We can also avoid spawning a dirname process for each dir:
    for j in ??????-????/bkb.rhash.crc32 ; do j=${j%/*}
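For paths of this shape the two forms give the same answer; a quick sketch to check that, using a made-up path. The one difference I'd keep in mind is a name with no slash at all, which can't happen with this glob but matters if the pattern is reused elsewhere:

```shell
# Sketch only: the path is invented for illustration.
j=220701-0001/bkb.rhash.crc32

via_dirname=$(dirname "$j")     # spawns a process
via_expansion=${j%/*}           # done by the shell itself: strips the
                                # shortest trailing match of "/*"
echo "dirname: $via_dirname, expansion: $via_expansion"

# With no slash in the name, the two differ: dirname prints ".",
# while the expansion leaves the name unchanged.
k=bkb.rhash.crc32
k_dirname=$(dirname "$k")
k_expansion=${k%/*}
echo "no slash: dirname gives $k_dirname, expansion gives $k_expansion"
```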
Cheers,
Jeremy