[plug] ridiculous UNIX one-liner
Craig Ringer
craig at postnewspapers.com.au
Thu Jan 30 00:25:22 WST 2003
This is why I love unix:
Task - find sets of identical files in a collection and print each set
as a group.
Command (one line *grin*):
Note that the first cut -d ' ' must contain a TAB; enter it by pressing
CTRL-V and then the Tab key. (Is there a better way to do this?)
for SUM in `find -type f -exec md5sum "{}" \; | tee /tmp/proglog | sort | uniq -c -w 32 | sort -n | egrep -v '^ *1[[:space:]]' | cut -d ' ' -f 2 | cut -d ' ' -f 1` ; do grep "$SUM" /tmp/proglog ; echo ; done | cut -d ' ' -f 3- | tee /tmp/nonunique_file_groups
Neatened up for easy reading but (probably)
for SUM in `find -type f -exec md5sum "{}" \; \
        | tee /tmp/proglog \
        | sort \
        | uniq -c -w 32 \
        | sort -n \
        | egrep -v '^ *1[[:space:]]' \
        | cut -d ' ' -f 2 \
        | cut -d ' ' -f 1`
do
    grep "$SUM" /tmp/proglog
    echo
done | cut -d ' ' -f 3- | tee /tmp/nonunique_file_groups
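On the TAB question: cut splits on TAB by default, so for a TAB-separated
stage the -d option can simply be dropped and no CTRL-V is needed. A
minimal sketch on a made-up sample line (the sample is an assumption for
illustration, not real output from the pipeline above):

```shell
# Made-up sample: a count and an md5sum-style line separated by one TAB.
sample="$(printf '      2\td41d8cd98f00b204e9800998ecf8427e  ./a.txt')"

# cut uses TAB as its default delimiter, so -d can be omitted.
printf '%s\n' "$sample" | cut -f 2

# If an explicit delimiter is wanted, let printf produce the TAB.
printf '%s\n' "$sample" | cut -d "$(printf '\t')" -f 2

# In bash, cut -d $'\t' -f 2 should do the same (ANSI-C quoting).
```

Both cut commands print the part after the TAB, i.e. the checksum, two
spaces, and the file name.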
Cool eh?
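For comparison, here is a rough sketch of the same task without the
temporary file and without relying on how uniq -c separates the count
from the line (the pipeline above expects a TAB there; current GNU uniq
prints a space, as far as I know). The function name dupe_groups is
made up for this example, and the sketch assumes md5sum's usual output
of a 32-character digest, a two-character separator, then the path. It
will not cope with file names containing newlines or backslashes.

```shell
#!/bin/sh
# Hypothetical helper: print groups of identical files under directory $1
# (default: current directory), one path per line, blank line after each group.
dupe_groups() {
    find "${1:-.}" -type f -exec md5sum {} + |
    sort |
    awk '
        {
            sum  = substr($0, 1, 32)   # MD5 hex digest
            name = substr($0, 35)      # path, after the two-character separator
            if (sum == prev_sum) {
                # Second or later file with this digest: print the held-back
                # first member once, then the current file.
                if (pending != "") { print pending; pending = "" }
                print name
                in_group = 1
            } else {
                if (in_group) print "" # close the previous group
                in_group = 0
                prev_sum = sum
                pending  = name        # hold back until a duplicate shows up
            }
        }
        END { if (in_group) print "" }
    '
}
```

Usage would be something like
dupe_groups . | tee /tmp/nonunique_file_groups
Because the sorted md5sum output already puts identical digests next to
each other, a single awk pass can do the grouping, so each file is
hashed once and nothing is re-read with grep.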