[plug] ridiculous UNIX one-liner

Craig Ringer craig at postnewspapers.com.au
Thu Jan 30 00:25:22 WST 2003


This is why I love unix:

Task - find the sets of identical files in a collection and print each 
set out as a group.

Command (one line *grin*):
Note that the first cut -d '<TAB>' must contain a literal TAB character; 
enter it by pressing CTRL-V and then the Tab key. (Is there a better way 
to do this?)

for SUM in `find -type f -exec md5sum "{}" \; | tee /tmp/proglog | sort 
| uniq -c -w 32 | sort -n | egrep -v '^[ ]+1' | cut -d '        ' -f 2 | 
cut -d ' ' -f 1` ; do grep $SUM /tmp/proglog ; echo ; done | cut -d ' ' 
-f 2 | tee /tmp/nonunique_file_groups
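
On the TAB question: two possible ways around typing a literal TAB, as a small sketch rather than a definitive answer. It assumes a standard cut, where TAB is already the default field delimiter, and a POSIX printf. The sample input and the TAB variable are made up for illustration.

```shell
# TAB is cut's default field delimiter, so -d can simply be left out.
printf 'count\tpayload\n' | cut -f 2

# Alternatively, let printf produce the TAB and pass it in a variable.
# (Command substitution only strips trailing newlines, so the TAB survives.)
TAB=$(printf '\t')
printf 'count\tpayload\n' | cut -d "$TAB" -f 2
```

Both commands print the second TAB-separated field of the sample line.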

Neatened up for easy reading:

for SUM in `find -type f -exec md5sum "{}" \; \
	| tee /tmp/proglog \
	| sort \
	| uniq -c -w 32 \
	| sort -n \
	| egrep -v '^[ ]+1' \
	| cut -d '        ' -f 2 \
	| cut -d ' ' -f 1`
do
	grep $SUM /tmp/proglog
	echo
done | cut -d ' ' -f 2 | tee /tmp/nonunique_file_groups
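
For comparison, here is a shorter sketch of the same idea (checksum every file, sort, keep only lines whose first 32 characters repeat). It is not the command above, just one possible variant. It assumes GNU uniq (for --all-repeated=separate) and the usual md5sum line layout: a 32-character checksum, two separator characters, then the file name. The function name find_duplicate_groups is hypothetical. Cutting by character position instead of by spaces keeps file names that contain spaces intact.

```shell
# Hypothetical helper: print groups of identical files under a directory
# (default "."), with a blank line between groups.
find_duplicate_groups() {
    # Checksum every regular file; "{} +" batches many files per md5sum call.
    find "${1:-.}" -type f -exec md5sum {} + \
        | LC_ALL=C sort \
        | uniq -w 32 --all-repeated=separate \
        | cut -c 35-
    # sort:  brings identical checksums next to each other
    # uniq:  compares only the first 32 characters (the checksum) and keeps
    #        every line of a repeated checksum, groups split by blank lines
    # cut:   drops the checksum and the two separator characters
}
```

It could be called as "find_duplicate_groups ." and the result piped to tee as in the original. No temporary file is needed because uniq prints the full duplicate lines itself.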

Cool eh?


