[plug] ridiculous UNIX one-liner
Mike Holland
myk at plug.linux.org.au
Thu Jan 30 11:58:55 WST 2003
On Thu, 30 Jan 2003, Craig Ringer wrote:
> Task - identify sets of identical files in a collection and print out
> groups of identical files.
> Note that the first cut -d ' ' must contain a TAB, enter it using
> CTRL-V then hit the tab key. (is there a better way to do this?).
Get ready to kick yourself. Tab is the default delimiter for cut, so you
can simply leave out the -d option there. :-)
> for SUM in `find -type f -exec md5sum "{}" \; | tee /tmp/proglog | sort
> | uniq -c -w 32 | sort -n | egrep -v '^[ ]+1' | cut -d ' ' -f 2 |
> cut -d ' ' -f 1` ; do grep $SUM /tmp/proglog ; echo ; done | cut -d ' '
> -f 2 | tee /tmp/nonunique_file_groups
Not bad at all, considering the handicap of using bash. Of course by
posting here you are making a challenge :-)
In a "real scripting language" you could load all the filenames into an
associative array, keyed by a hash of each file's contents:
foreach $f ( find filenames ) { $mylist[hash($f)] append $f }; \
print values( $mylist );
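
As a rough illustration only, the pseudocode above might be sketched in
Python as follows. The function names file_md5 and find_duplicate_groups are
made up for this example, and MD5 is used simply to match the md5sum call in
the quoted one-liner. Like find -type f, the sketch skips symlinks.

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of a file's contents, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_groups(root="."):
    """Group regular files under root by content hash.

    Returns a list of groups; each group is a sorted list of paths whose
    contents share the same MD5 digest. Files with no twin are dropped.
    """
    by_hash = defaultdict(list)  # digest -> list of paths with that digest
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Mirror "find -type f": regular files only, no symlinks.
            if os.path.isfile(path) and not os.path.islink(path):
                by_hash[file_md5(path)].append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]


if __name__ == "__main__":
    # Print each group of identical files, separated by a blank line.
    for group in find_duplicate_groups("."):
        print("\n".join(group))
        print()
```

Each file is read once and no temporary file is needed, since the
dictionary does the grouping that the sort/uniq/grep stages do in the shell
version.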
> Cool eh?
Cool, yes, in a Unix-hacker way. But it lacks the elegance of
Perl/PHP/Python/... Some may disagree.
--
"I do not think we can hope for any better things now. We shall stick it out
to the end, but we are getting weaker, of course, and the end cannot be far.
It seems a pity, but I do not think I can write more." - RF Scott