[plug] speed: find vs ls

Jeremy Kerr jk at ozlabs.org
Fri Jul 29 11:29:24 AWST 2022


Hi Brad,

> root@rpi31:/mnt/backup/work# ls -d ??????-???? | wc -l
> 439
> 
> root@rpi31:/mnt/backup/work# time ls ??????-????/bkb.rhash.crc32
> 
> real    0m13.161s
> user    0m0.004s
> sys     0m0.162s

The 'time' in this case isn't measuring the dentry scan - your shell is
expanding the '??????-????/' glob, then passing all of those expanded
filenames to the ls invocation. That operation is happening outside of
the 'time' measurement.

Since 'ls' is passed those expanded files, all it needs to do is stat
each one, without the dentry scan.

To compare more directly against the find:

   time /bin/sh -c 'ls ??????-????/bkb.rhash.crc32'

The quotes there prevent the interactive shell (which isn't being
timed) from performing the glob expansion, so we're instead timing the
glob in the child shell.
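
If you want to see which shell is doing the expansion (leaving timing
aside), here's a rough sketch that counts the arguments a child sh
receives. The scratch directory and the two directory names are made
up for illustration; only the glob pattern comes from your example.

```shell
# Scratch area with two hypothetical backup dirs (names are invented).
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
mkdir 220701-0001 220702-0002
touch 220701-0001/bkb.rhash.crc32 220702-0002/bkb.rhash.crc32

# Unquoted: the calling shell expands the glob, so the child sh
# receives one argument per matching file.
outer=$(sh -c 'echo $#' sh ??????-????/bkb.rhash.crc32)

# Quoted: the pattern reaches the child untouched, as a single argument.
literal=$(sh -c 'echo $#' sh '??????-????/bkb.rhash.crc32')

# Pattern inside the -c script: the child sh performs the expansion.
inner=$(sh -c 'set -- ??????-????/bkb.rhash.crc32; echo $#')

echo "outer=$outer literal=$literal inner=$inner"

cd / && rm -rf "$tmp"
```

With two matching files, outer and inner both come out as 2, and
literal as 1; the last form is the one that corresponds to the
sh -c invocation above.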

While we're on the topic though:

From your example of:

  for j in `ls ??????-????/bkb.rhash.crc32 2>/dev/null` ; do j=($dirname $j)

Comparing to your `find ... -printf %h`, I assume this might be a typo of:

  for j in `ls ??????-????/bkb.rhash.crc32 2>/dev/null` ; do j=$(dirname $j)

The ls is a bit useless here; it will just stat each .crc32 file
(twice, after the shell has already done so!) and print out the names
already provided to it as separate arguments.
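
As a quick check of that, the sketch below compares the names the shell
produces from the glob with what comes back after routing them through
ls. Again, the scratch directory and directory names are hypothetical.

```shell
# Scratch area with two hypothetical backup dirs (names are invented).
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
mkdir 220701-0001 220702-0002
touch 220701-0001/bkb.rhash.crc32 220702-0002/bkb.rhash.crc32

# Names as expanded by the shell, one per line.
from_glob=$(printf '%s\n' ??????-????/bkb.rhash.crc32)

# The same names passed through ls (one per line when not on a terminal).
from_ls=$(ls ??????-????/bkb.rhash.crc32 2>/dev/null)

if [ "$from_glob" = "$from_ls" ]; then
    echo "ls returned exactly what the glob already gave us"
fi

cd / && rm -rf "$tmp"
```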

If we're trying to avoid filesystem interactions, you could just use
the result of the shell glob directly:

   for j in ??????-????/bkb.rhash.crc32 ; do j=$(dirname $j)

We can also avoid spawning a dirname process for each dir:

   for j in ??????-????/bkb.rhash.crc32 ; do j=${j%/*}
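
For completeness, a minimal sketch of the whole loop. One thing the
2>/dev/null on the ls was quietly handling: in sh (and bash with its
default settings), a glob that matches nothing is left as the literal
pattern, so I've added an existence check. The directory names and the
"dirs" variable are made up for the example.

```shell
# Scratch area; the third hypothetical dir has no crc32 file.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
mkdir 220701-0001 220702-0002 220703-0003
touch 220701-0001/bkb.rhash.crc32 220702-0002/bkb.rhash.crc32

dirs=""
for j in ??????-????/bkb.rhash.crc32 ; do
    # If nothing matched, $j is the unexpanded pattern; skip it.
    [ -e "$j" ] || continue
    # Strip the shortest trailing "/..." component, as dirname would here.
    j=${j%/*}
    dirs="$dirs $j"
done
echo "dirs:$dirs"

cd / && rm -rf "$tmp"
```

The ${j%/*} form only matches dirname for paths like these, which
contain a slash and don't end in one; that's always the case for
this glob.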

Cheers,


Jeremy