[plug] speed: find vs ls

Brad Campbell brad at fnarfbargle.com
Fri Jul 29 20:55:19 AWST 2022


On 29/7/22 14:53, Onno Benschop wrote:
> One thing to point out which might not be obvious to everyone reading along with this exploration is that none of this matters unless you're doing it often.
> 
> What I mean by that is in the vast majority of cases a computer spends its time waiting for your input and every now and then it will spend a little more effort running a process like we're discussing here. Shaving off a few CPU cycles here and there on a job that runs once a day won't make any difference in the scheme of things and readability, robustness and maintenance are much more important.
> 
> Not that this conversation is useless at all, I've had the fun of processing data in the TB range with thousands of files and spending some time thinking about these kinds of issues can make the difference between waiting an hour, or waiting a day for the same answer.
> 
> Context is important!

This is true. This particular task is a weekly verification process. If we were arguing about seconds I'd not even bother, but when we're talking about over an order of magnitude it does start to make a significant difference.

Additionally, I find it a valuable learning exercise. Bad day when you don't learn something.
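
For anyone reading along, here is a minimal sketch of the loop the quoted thread below ends up with: let the shell expand the glob itself, then strip the filename with ${j%/*} instead of spawning dirname for every match. The scratch directory and the three date-style directory names are made up for illustration; only the ??????-????/bkb.rhash.crc32 pattern comes from the thread.

```shell
#!/bin/sh
# Hypothetical sandbox: these directory names are invented for the example.
work=$(mktemp -d) || exit 1
mkdir "$work/220701-0100" "$work/220708-0100" "$work/220715-0100"
# Only two of the three directories contain the checksum file.
touch "$work/220701-0100/bkb.rhash.crc32" "$work/220715-0100/bkb.rhash.crc32"
cd "$work" || exit 1

dirs=""
for j in ??????-????/bkb.rhash.crc32 ; do
    # If nothing matches, the pattern is left unexpanded; skip it.
    [ -e "$j" ] || continue
    # Remove the shortest trailing "/*", leaving the directory part.
    # For these paths this matches what dirname prints, without a fork.
    dirs="${dirs:+$dirs }${j%/*}"
done
echo "$dirs"

# Tidy up the scratch directory.
cd / && rm -rf "$work"
```

The [ -e "$j" ] guard stands in for the 2>/dev/null on the original ls: with no matches, a POSIX shell passes the pattern through unexpanded rather than producing an empty list.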

> 
> --
> finger painting on glass is an inexact art - apologies for any errors in this scra^Hibble
> 
> ()/)/)() ..ASCII for Onno..
> 
> On Fri, 29 July 2022, 12:39 Brad Campbell, <brad at fnarfbargle.com> wrote:
> 
>     On 29/7/22 11:29, Jeremy Kerr wrote:
>     > Hi Brad,
>     >
>     >> root at rpi31:/mnt/backup/work# ls -d ??????-???? | wc -l
>     >> 439
>     >>
>     >> root at rpi31:/mnt/backup/work# time ls ??????-????/bkb.rhash.crc32
>     >>
>     >> real    0m13.161s
>     >> user    0m0.004s
>     >> sys     0m0.162s
>     >
>     > The 'time' in this case isn't measuring the dentry scan - your shell is
>     > expanding the '??????-????/' glob, then passing all of those expanded
>     > filenames to the ls invocation. That operation is happening outside of
>     > the 'time' measurement.
>     >
>     > Since 'ls' is passed those expanded files, all it needs to do is stat
>     > each one, without the dentry scan.
>     >
>     > To compare more directly against the find:
>     >
>     >     time /bin/sh -c 'ls ??????-????/bkb.rhash.crc32'
> 
>     Indeed, right you are.
> 
>     root at rpi31:/mnt/backup/work# time for i in `/bin/sh -c 'ls ??????-????/bkb.rhash.crc32'` ; do j=$(dirname $i) ; echo $j ; done
> 
>     real    0m13.062s
>     user    0m0.017s
>     sys     0m0.207s
> 
>     > the quotes there will prevent the interactive shell (which isn't being
>     > timed) from performing the glob expansion, and instead we're timing the
>     > glob in the subshell.
>     >
>     > While we're on the topic though:
>     >
>     >  From your example of:
>     >
>     >    for j in `ls ??????-????/bkb.rhash.crc32 2>/dev/null` ; do j=($dirname $j)
>     >
>     > Comparing to your `find ... -printf %h`, I assume this might be a typo of:
>     >
>     >    for j in `ls ??????-????/bkb.rhash.crc32 2>/dev/null` ; do j=$(dirname $j)
> 
>     Yes, it was. I picked that up early but figured it wasn't worth mentioning.
> 
>     > The ls is a bit useless here; it will just stat each .crc32 file
>     > (twice, after the shell has already done so!) and print out the names
>     > already provided to it as separate arguments.
>     >
>     > If we're trying to avoid filesystem interactions, you could just use
>     > the result of the shell glob directly:
>     >
>     >     for j in ??????-????/bkb.rhash.crc32 ; do j=$(dirname $j)
> 
>     root at rpi31:/mnt/backup/work# time for j in ??????-????/bkb.rhash.crc32 ; do j=$(dirname $j); echo $j ; done
> 
>     real    0m13.072s
>     user    0m0.015s
>     sys     0m0.192s
> 
>     > We can also avoid spawning a dirname process for each dir:
>     >
>     >     for j in ??????-????/bkb.rhash.crc32 ; do j=${j%/*}
> 
>     root at rpi31:/mnt/backup/work# time for j in ??????-????/bkb.rhash.crc32 ; do j=${j%/*} ; echo $j ; done
> 
>     real    0m13.163s
>     user    0m0.012s
>     sys     0m0.142s
> 
>     Much of a muchness and all well within error margins. Still over an order of magnitude faster than find, plus I learned some new tricks.
>     Thanks!
> 
>     Regards,
>     Brad
>     -- 
>     An expert is a person who has found out by his own painful
>     experience all the mistakes that one can make in a very
>     narrow field. - Niels Bohr
>     _______________________________________________
>     PLUG discussion list: plug at plug.org.au
>     http://lists.plug.org.au/mailman/listinfo/plug
>     Committee e-mail: committee at plug.org.au
>     PLUG Membership: http://www.plug.org.au/membership
> 
> 


