[plug] Find similarly named files in directories
Bernard Blackham
bernard at blackham.com.au
Sat Jan 13 12:50:58 WST 2007
Timothy White wrote:
>> Gah, I'll bite. 59 chars of perl, 74 chars altogether.
>>
>> find|perl
>> -ne'm#.*(/.*)#;push@{$a{$1}},$_}foreach(%a){$#$_>0&&print"@$_\n"'
>>
>> And the only thing it'll break on is files with new-lines in them (yes,
>> it's possible! The only things you can be guaranteed not to find in a
>> filename are / and the NULL byte).
>
> If you have a filename with a newline, you deserve to have our scripts
> break :P
You could use find's -print0 and some extra perl magic to make it
bullet-proof if you really desired :)
> Seeing as it's compact perl, how are you using the / to prevent spaces
> from breaking it? I was using the last / to find the filename of the
> file. Or don't spaces affect it, because it's perl and not using
> fields? Hmmm, I think it's probably the latter from what I can read of
> that perl.
Okay, here's what it expands to, with comments:
while (<>) {             # this comes from -n: read lines into $_ one at a time
    m#.*(/.*)#;          # match everything after (and including) the last /
    push @{$a{$1}},$_    # store it into a hash of arrayrefs.
                         # you could use lc $1 here for case insensitivity
}
for(%a) {                # iterate over everything in the hash
                         # this actually includes the hash keys as well as the elements
                         # but thankfully the hash keys don't pass the next check:
    $#$_ > 0 &&          # if the array index of the last element in this array is
                         # 1 or greater (thus at least 2 elements), then
    print"@$_\n"         # print the array, implicitly joined with spaces.
                         # the new lines after each string come from the input.
}                        # this comes from -n too
> Btw, I like the way it prints it out, with the all "extra" occurrences
> of a file being indented.
Unintentional bonus ;) You could get rid of it in 6 characters, or
customise it, by modifying $".
> A quick test shows...
>
> tim@linjeni:/data$ find|perl
> -ne'm#.*(/.*)#;push@{$a{$1}},$_}foreach(%a){$#$_>0&&print"@$_\n"'|sort|uniq|wc
>
> $ find| sed 's/\ /:::/g' |sed -r 's/.*\/(.*)/\0 \1/'|sort -k 2|uniq
> --all-repeated=separate -f 1| sed 's/[^ ]*$//' | sed 's/:::/\ /g'|
> sort|uniq|wc -l
I don't really think either is that clear though ;)
> And the differences in the files? This char >>�<< (not sure if it'll
> email or not) is some strange escape code. My script seemed to ignore
> all the files with that in the path.
Sounds like you need a fsck ...
> Anyone want to try and beat 74 chars? :p
Well, 70.
Bernard.