[plug] Find similarly named files in directories

Timothy White weirdit at gmail.com
Fri Jan 12 22:55:19 WST 2007


OK, so two simple seds and the space problem is solved!! Not sure if
there are any other "bugs".

find | sed 's/\ /:::/g' | sed -r 's/.*\/(.*)/\0 \1/' \
  | sort -i -k 2 | uniq -i --all-repeated=separate -f 1 \
  | sed 's/[^ ]*$//' | sed 's/:::/\ /g'

Rather simple. First check that no file name has three colons in a
row; if one does, find another "unique" sequence to replace the spaces
with.
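For comparison, the same grouping can be done in a single awk pass, which sidesteps the space problem entirely because awk never re-splits the paths. This is just a sketch: filenames containing newlines would still break it, and the `-type f` is my addition (the bare find above also matches directories).

```shell
# Group files whose basenames match case-insensitively, printing each
# group of duplicates separated by a blank line (like
# uniq --all-repeated=separate). Spaces in paths are handled as-is.
find . -type f | awk -F/ '
  { key = tolower($NF)                    # basename, lowercased
    paths[key] = paths[key] $0 "\n"       # collect full paths per name
    count[key]++ }
  END { for (k in count)
          if (count[k] > 1)               # only names seen more than once
            printf "%s\n", paths[k] }
'
```

Group order is whatever awk's hash iteration gives you, so pipe through another sort if that matters.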

On 1/12/07, Lyndon Maydwell <maydwell at gmail.com> wrote:
> well, the space-ignorance is a bit of a show stopper, so I might just
> stick to mine for now, but I will profile the two to see how they
> stack up performance wise.

I'd be interested to see how they stack up. The slowest part should be
the find, at least in my tests. Once the find was cached, the rest
flew along!

I tested it on a drive with 135 GB of files, totalling 109790 files
and directories.

First run, so the find wasn't cached:
real 27.058     user 13.825     sys 1.204       pcpu 55.54

Then with the find cached:
real 20.405     user 13.657     sys 0.768       pcpu 70.69

Then I realised I should be redirecting stdout to /dev/null so the
terminal didn't affect it:
real 15.601     user 13.725     sys 0.368       pcpu 90.33
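Something like this reproduces the cached/uncached runs (a sketch only, not the exact commands I used; the drop_caches write needs root and a 2.6.16-or-later kernel, and /path/to/drive is a placeholder):

```shell
# Cold-cache run: flush dirty pages, then ask the kernel to drop its
# page/dentry/inode caches (root only, Linux 2.6.16+).
sync
echo 3 > /proc/sys/vm/drop_caches

# Time the find with stdout discarded, so the terminal drawing the
# output doesn't skew the wall-clock numbers. Run it again without the
# drop_caches step for the warm-cache figure.
time find /path/to/drive > /dev/null
```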

I couldn't get your Ruby script to run; probably a missing Ruby module
or something.

It was rather interesting seeing what "duplicate" files I have!! Of
course, running fdupes on the drive will give a rather different
result, and a much longer running time :p

Enjoy!

Tim
-- 
Linux Counter user #273956
Don't email joeblogs at scouts.org.au


