[plug] ridiculous UNIX one-liner

Mike Holland myk at plug.linux.org.au
Thu Jan 30 11:58:55 WST 2003


On Thu, 30 Jan 2003, Craig Ringer wrote:

> Task - identify sets of identical files in a collection and print out 
> groups of identical files.

> Note that the first cut -d '     ' must contain a TAB, enter it using 
> CTRL-V then hit the tab key. (is there a better way to do this?).

Get ready to kick yourself. Tab is the default delimiter. :-)

 
> for SUM in `find -type f -exec md5sum "{}" \; | tee /tmp/proglog | sort 
> | uniq -c -w 32 | sort -n | egrep -v '^[ ]+1' | cut -d '        ' -f 2 | 
> cut -d ' ' -f 1` ; do grep $SUM /tmp/proglog ; echo ; done | cut -d ' ' 
> -f 2 | tee /tmp/nonunique_file_groups

Not bad at all, considering the handicap of using bash. Of course by 
posting here you are making a challenge :-)

In a "real scripting language" you could load all the filenames into an 
associative array, indexed by a hash of each file's contents:

foreach $f ( find filenames ) { $mylist[hash($f)] append $f }; \
   print values( $mylist );
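
For what it's worth, here is a minimal Python sketch of the pseudocode 
above. It is an illustration only: the names file_digest and 
duplicate_groups are made up for this example, and MD5 is used simply to 
mirror the md5sum call in the quoted one-liner.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the MD5 hex digest of a file's contents, read in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def duplicate_groups(root="."):
    """Group regular files under root by content digest.

    Returns a list of sorted path lists, one per digest that is shared
    by at least two files. Symlinks are skipped, roughly like
    `find -type f`.
    """
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_digest[file_digest(path)].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Calling duplicate_groups(".") and printing each returned group followed 
by a blank line should give output shaped like the one-liner's groups. 
Identical digests are treated as identical files here, the same 
assumption the md5sum pipeline makes.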


> Cool eh?

Cool yes, in a unix-hacker way. But lacks the elegance of 
perl/php/python/...    Some may disagree.


-- 
 "I do not think we can hope for any better things now. We shall stick it out
to the end, but we are getting weaker, of course, and the end cannot be far.
It seems a pity, but I do not think I can write more." - RF Scott




