[plug] speed: find vs ls
Onno Benschop
onno at itmaze.com.au
Fri Jul 29 14:53:40 AWST 2022
One thing to point out which might not be obvious to everyone reading along
with this exploration is that none of this matters unless you're doing it
often.
What I mean by that is that in the vast majority of cases a computer spends
its time waiting for your input, and every now and then it will spend a
little more effort running a process like the one we're discussing here.
Shaving a few CPU cycles here and there off a job that runs once a day won't
make any difference in the scheme of things; readability, robustness and
maintainability are much more important.
Not that this conversation is useless at all. I've had the fun of
processing data in the TB range with thousands of files, and spending some
time thinking about these kinds of issues can make the difference between
waiting an hour and waiting a day for the same answer.
Context is important!
--
finger painting on glass is an inexact art - apologies for any errors in
this scra^Hibble
()/)/)() ..ASCII for Onno..
On Fri, 29 July 2022, 12:39 Brad Campbell, <brad at fnarfbargle.com> wrote:
> On 29/7/22 11:29, Jeremy Kerr wrote:
> > Hi Brad,
> >
> >> root@rpi31:/mnt/backup/work# ls -d ??????-???? | wc -l
> >> 439
> >>
> >> root@rpi31:/mnt/backup/work# time ls ??????-????/bkb.rhash.crc32
> >>
> >> real 0m13.161s
> >> user 0m0.004s
> >> sys 0m0.162s
> >
> > The 'time' in this case isn't measuring the dentry scan - your shell is
> > expanding the '??????-????/' glob, then passing all of those expanded
> > filenames to the ls invocation. That operation is happening outside of
> > the 'time' measurement.
> >
> > Since 'ls' is passed those expanded files, all it needs to do is stat
> > each one, without the dentry scan.
> >
> > To compare more directly against the find:
> >
> > time /bin/sh -c 'ls ??????-????/bkb.rhash.crc32'
>
> Indeed, right you are.
>
> root@rpi31:/mnt/backup/work# time for i in `/bin/sh -c 'ls ??????-????/bkb.rhash.crc32'` ; do j=$(dirname $i) ; echo $j ; done
>
> real 0m13.062s
> user 0m0.017s
> sys 0m0.207s
>
> > the quotes there will prevent the interactive shell (which isn't being
> > timed) from performing the glob expansion, and instead we're timing the
> > glob in the subshell.
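
For anyone following along, here is a minimal sketch of which shell does
the glob expansion. The scratch tree and its directory names are made up for
illustration; only the pattern comes from the thread. It shows that an
unquoted pattern is expanded by the shell that reads it, and that a quoted
pattern is handed on untouched for the inner sh to expand. It says nothing
about timings.

```shell
# Hypothetical scratch tree standing in for /mnt/backup/work.
demo=$(mktemp -d)
mkdir "$demo/220729-0001" "$demo/220729-0002" "$demo/notes"
touch "$demo/220729-0001/bkb.rhash.crc32" "$demo/220729-0002/bkb.rhash.crc32"
cd "$demo" || exit 1

# Unquoted: this shell replaces the pattern with the matching paths, so any
# command given the pattern would only ever see the expanded list.
set -- ??????-????/bkb.rhash.crc32
expanded_count=$#
first_match=$1

# Quoted: the pattern is passed through as-is and the inner sh expands it.
inner_count=$(/bin/sh -c 'set -- ??????-????/bkb.rhash.crc32; echo $#')

echo "outer shell expanded to $expanded_count paths, first: $first_match"
echo "inner shell expanded to $inner_count paths"
```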
> >
> > While we're on the topic though:
> >
> > From your example of:
> >
> > for j in `ls ??????-????/bkb.rhash.crc32 2>/dev/null` ; do j=($dirname $j)
> >
> > Comparing to your `find ... -printf %h`, I assume this might be a typo of:
> >
> > for j in `ls ??????-????/bkb.rhash.crc32 2>/dev/null` ; do j=$(dirname $j)
>
> Yes, it was. I picked that up early but figured it wasn't worth mentioning.
>
> > The ls is a bit useless here; it will just stat each .crc32 file
> > (twice, after the shell has already done so!) and print out the names
> > already provided to it as separate arguments.
> >
> > If we're trying to avoid filesystem interactions, you could just use
> > the result of the shell glob directly:
> >
> > for j in ??????-????/bkb.rhash.crc32 ; do j=$(dirname $j)
>
> root@rpi31:/mnt/backup/work# time for j in ??????-????/bkb.rhash.crc32 ; do j=$(dirname $j); echo $j ; done
>
> real 0m13.072s
> user 0m0.015s
> sys 0m0.192s
>
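
One thing worth keeping in mind when dropping the ls: in a POSIX sh, or bash
with default options (no nullglob or failglob, which is an assumption here),
a pattern that matches nothing is left as the literal string, so the loop
body runs once with the pattern itself. A rough sketch of a guard, again
using a made-up scratch tree:

```shell
# Hypothetical scratch tree: two dated directories, only one has the file.
demo=$(mktemp -d)
mkdir "$demo/220729-0001" "$demo/220729-0002"
touch "$demo/220729-0001/bkb.rhash.crc32"
cd "$demo" || exit 1

# Collect the directories that actually contain the file.
dirs=''
for j in ??????-????/bkb.rhash.crc32 ; do
    [ -e "$j" ] || continue      # skip the literal pattern if nothing matched
    dirs="$dirs${j%/*} "
done

# Without the guard, an unmatched pattern is what the loop variable holds.
literal_seen=''
for j in ??????-????/no-such-file ; do
    literal_seen=$j
done

echo "dirs: $dirs"
echo "unguarded loop saw: $literal_seen"
```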
> > We can also avoid spawning a dirname process for each dir:
> >
> > for j in ??????-????/bkb.rhash.crc32 ; do j=${j%/*}
>
> root@rpi31:/mnt/backup/work# time for j in ??????-????/bkb.rhash.crc32 ; do j=${j%/*} ; echo $j ; done
>
> real 0m13.163s
> user 0m0.012s
> sys 0m0.142s
>
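
A small side-by-side of the two ways of getting the directory part, as a
sketch only. The sample path is made up. For paths of the dir/file shape
used in this thread the two agree; they differ when the value contains no
slash at all.

```shell
path='220729-0001/bkb.rhash.crc32'

# External command: one extra process per call.
via_dirname=$(dirname "$path")

# Parameter expansion: strips the shortest suffix matching '/*' inside the
# shell itself, with no process spawned.
via_expansion=${path%/*}

# Caveat: with no slash in the value there is nothing to strip, so the
# expansion returns it unchanged, whereas dirname prints '.'.
bare='bkb.rhash.crc32'
bare_dirname=$(dirname "$bare")
bare_expansion=${bare%/*}

echo "$via_dirname $via_expansion $bare_dirname $bare_expansion"
```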
> Much of a muchness and all well within error margins. Still over an order
> of magnitude faster than find, plus I learned some new tricks.
> Thanks!
>
> Regards,
> Brad
> --
> An expert is a person who has found out by his own painful
> experience all the mistakes that one can make in a very
> narrow field. - Niels Bohr