[plug] Find similarly named files in directories

Bernard Blackham bernard at blackham.com.au
Sat Jan 13 12:50:58 WST 2007


Timothy White wrote:
>> Gah, I'll bite. 59 chars of perl, 74 chars altogether.
>>
>> find|perl -ne'm#.*(/.*)#;push@{$a{$1}},$_}foreach(%a){$#$_>0&&print"@$_\n"'
>>
>> And the only thing it'll break on is files with newlines in their names
>> (yes, it's possible! The only things you can be guaranteed not to find
>> in a filename are / and the NUL byte).
> 
> If you have a filename with a newline, you deserve to have our scripts 
> break :P

You could use find's -print0 and some extra perl magic to make it
bullet-proof if you really desired :)
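For the -print0 route, something along these lines should do it. This is a sketch, not a tested drop-in: `group_nul_paths` is a made-up name, the demo uses an in-memory filehandle (perl 5.8 or later), and unlike the one-liner it groups on the bare filename without the leading /. The two key tricks are setting $/ to "\0" so each NUL-terminated path is one record, and matching the name with [^/]* anchored by \z so an embedded newline stays part of it.

```perl
use strict;
use warnings;

# Hypothetical helper: read NUL-terminated paths (the format `find -print0`
# writes) from $fh and group them by their last path component.
# Returns a hashref: basename => arrayref of full paths.
sub group_nul_paths {
    my ($fh) = @_;
    my %groups;
    local $/ = "\0";    # a record now ends at NUL instead of "\n"
    while (my $path = <$fh>) {
        chomp $path;    # chomp strips the current $/, i.e. the trailing NUL
        # [^/]* also matches "\n", and \z anchors at the very end of the
        # string, so a newline inside a name is kept as part of it.
        my ($base) = $path =~ m{([^/]*)\z};
        push @{ $groups{$base} }, $path;
    }
    return \%groups;
}

# Demo on an in-memory stream; two of the names contain a newline.
my $data = join '', map { "$_\0" }
    './a/x', "./a/odd\nname", './b/x', "./b/odd\nname", './c/y';
open my $fh, '<', \$data or die "cannot open in-memory stream: $!";
my $groups = group_nul_paths($fh);
close $fh;

for my $base (sort keys %$groups) {
    my $paths = $groups->{$base};
    next unless @$paths > 1;             # only names found in 2+ places
    (my $shown = $base) =~ s/\n/\\n/g;   # make embedded newlines visible
    printf "%s: %d paths\n", $shown, scalar @$paths;
}
```

To read real `find . -print0` output you'd pass `\*STDIN` instead of the in-memory handle.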

> Seeing as it's compact perl, how are you using the / to prevent spaces
> from breaking it? I was using the last / to find the filename of the
> file. Or don't spaces affect it, because it's perl and not using
> fields? Hmmm, I think it's probably the latter from what I can read of
> that perl.

Okay, here's what it expands to, with comments:

while (<>) { # this comes from -n: read lines into $_ one at a time
   m#.*(/.*)#;  # match everything after (and including) the last /
   push @{$a{$1}},$_  # store it into a hash of arrayrefs.
                      # you could use lc $1 here for case insensitivity
}
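for(%a) { # iterate over everything in the hash
           # %a in list context yields the keys as well as the values
           # (the arrayrefs), but thankfully the keys don't pass the next
           # check: without `use strict` each key is treated as the name
           # of an empty global array, whose last index is -1.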

   $#$_ > 0 &&  # if the index of the last element in this array is
                # 1 or greater (i.e. 2 or more elements), then
     print"@$_\n" # print the array, implicitly joined with spaces.
                  # the newlines after each string come from the input.
} # this comes from -n too
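The same logic as a strict-clean sketch (the `duplicate_groups` name is hypothetical, not part of the one-liner). It walks `values %groups` so the keys never get dereferenced, and the sample input has spaces in the filenames to show they don't matter: nothing ever splits on whitespace, the regex only looks for the last /.

```perl
use strict;
use warnings;

# Hypothetical helper: the one-liner's grouping, written so it passes
# `use strict`. It returns only the groups with two or more entries.
sub duplicate_groups {
    my %groups;
    for my $line (@_) {
        # As in m#.*(/.*)#, the greedy .* backtracks to the LAST '/', so
        # spaces elsewhere in the path don't matter. Unlike the one-liner,
        # lines with no '/' (such as find's leading ".") are skipped.
        next unless $line =~ m{.*(/.*)};
        push @{ $groups{$1} }, $line;
    }
    # values() yields only the arrayrefs, so no key is ever dereferenced.
    # @$_ > 1 is the same test as the one-liner's $#$_ > 0.
    return grep { @$_ > 1 } values %groups;
}

my @found = ("./a/my file.txt\n", "./b/my file.txt\n", "./c/other\n");
print "@$_\n" for duplicate_groups(@found);
```

With real data you'd call it as `duplicate_groups(<STDIN>)` and pipe find into the script.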

> Btw, I like the way it prints it out, with all the "extra" occurrences
> of a file being indented.

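Unintentional bonus ;) You could get rid of it in 6 characters, or
customise it, by modifying $" (the separator perl puts between array
elements when interpolating "@array" into a string; it defaults to a
single space).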
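A small sketch of what $" does to the output; the sample paths are made up. In the one-liner itself, prepending `$"="";` (six characters) should be enough to drop the indent.

```perl
use strict;
use warnings;

# Each element still carries the newline it was read with, as in the one-liner.
my @group = ("./a/foo\n", "./b/foo\n");

my $default = "@group\n";                          # $" is " ": 2nd line gets a leading space
my $plain   = do { local $" = "";   "@group\n" };  # the $"=""; fix: no indent
my $tabbed  = do { local $" = "\t"; "@group\n" };  # or customise the indent

print $default, $plain, $tabbed;
```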

> A quick test shows...
> 
> tim at linjeni:/data$ find|perl -ne'm#.*(/.*)#;push@{$a{$1}},$_}foreach(%a){$#$_>0&&print"@$_\n"'|sort|uniq|wc
> 
> $ find| sed 's/\ /:::/g' |sed -r 's/.*\/(.*)/\0 \1/'|sort -k 2|uniq --all-repeated=separate -f 1| sed 's/[^ ]*$//' | sed 's/:::/\ /g'|sort|uniq|wc -l

I don't really think either is that clear though ;)

> And the differences in the files? This char >>�<< (not sure if it'll
> email or not) is some strange escape code. My script seemed to ignore
> all the files with that in the path.

Sounds like you need a fsck ...

> Anyone want to try and beat 74 chars? :p

Well, 70.

Bernard.


