[plug] Find similarly named files in directories
Timothy White
weirdit at gmail.com
Fri Jan 12 22:55:19 WST 2007
OK, so two simple seds and I've solved the space problem! Not sure if
there are any other "bugs".
find | sed 's/ /:::/g' \
    | sed -r 's/.*\/(.*)/\0 \1/' \
    | sort -i -k 2 \
    | uniq -i --all-repeated=separate -f 1 \
    | sed 's/[^ ]*$//' \
    | sed 's/:::/ /g'
Rather simple: first check that no file name has three colons in a row;
if one does, find another "uniq" sequence to replace them with.
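The steps above can be tried out on a throwaway tree first. This is a hypothetical demo, not my actual test run: the directory layout and file names below are made up for illustration, and the pipeline is the one from above run over a fresh mktemp directory.

```shell
#!/bin/sh
# Hypothetical demo tree: two files sharing a (space-containing) name,
# plus one unique file that should be filtered out.
tmp=$(mktemp -d)
mkdir -p "$tmp/a" "$tmp/b"
touch "$tmp/a/same name.txt" "$tmp/b/same name.txt" "$tmp/a/only-here.txt"

# The pipeline from the post: hide spaces behind :::, append the
# basename as a sort/compare key, keep only repeated basenames, then
# strip the key and restore the spaces.
out=$(find "$tmp" \
    | sed 's/ /:::/g' \
    | sed -r 's/.*\/(.*)/\0 \1/' \
    | sort -i -k 2 \
    | uniq -i --all-repeated=separate -f 1 \
    | sed 's/[^ ]*$//' \
    | sed 's/:::/ /g')

# Prints only the two paths that share "same name.txt".
printf '%s\n' "$out"
rm -rf "$tmp"
```

Note the restored paths come out with a trailing space, since the basename key is stripped with `sed 's/[^ ]*$//'` before the ::: substitution is undone.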
On 1/12/07, Lyndon Maydwell <maydwell at gmail.com> wrote:
> well, the space-ignorance is a bit of a show stopper, so I might just
> stick to mine for now, but I will profile the two to see how they
> stack up performance wise.
I'd be interested to see how they stack up. I know the slowest part
should be the find, at least in my tests. Once the find was cached, the
rest flew along!
I tested it on a drive with 135 GB of files, totalling 109,790 files
and directories.
First run, so the find wasn't cached:
real 27.058 user 13.825 sys 1.204 pcpu 55.54
Then with the find cached:
real 20.405 user 13.657 sys 0.768 pcpu 70.69
Then I realised I should be redirecting stdout to /dev/null so the
terminal didn't affect it:
real 15.601 user 13.725 sys 0.368 pcpu 90.33
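For anyone wanting to reproduce that last measurement, a minimal harness might look like the following. This is a sketch, not the original test setup: the path searched is an example stand-in for the real pipeline, and it just uses date(1) for wall-clock seconds rather than the shell's time builtin (which printed the real/user/sys/pcpu fields above).

```shell
#!/bin/sh
# Hypothetical timing harness: discard stdout so terminal rendering
# doesn't affect the numbers, and report elapsed wall-clock seconds.
# Replace the find with the full duplicate-name pipeline to time it.
start=$(date +%s)
find /usr/share > /dev/null 2>&1
end=$(date +%s)
elapsed=$((end - start))
echo "elapsed: ${elapsed}s"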
I couldn't get your ruby script to run, probably missing a ruby module
or something.
It was rather interesting seeing what "duplicate" files I have!! Of
course, running fdupes on the drive will give a rather different
result, and a much longer running time :p
Enjoy!
Tim
--
Linux Counter user #273956
Don't email joeblogs at scouts.org.au
More information about the plug
mailing list