[plug] Perl: Intersection of two arrays?

Wed Jun 25 14:13:47 WST 2003

On Wednesday 25 June 2003 12:01, Trevor Phillips wrote:
> On Tuesday 24 June 2003 17:46, Simon Newton wrote:
> > Use a hash instead of the nested for loop:
> > http://www.perldoc.com/perl5.6/pod/perlfaq4.html#How-do-I-compute-the-dif
> >fe rence-of-two-arrays---How-do-I-compute-the-intersection-of-two-arrays-
>
[snip]

> Although I'm dealing with relatively small sets of data, I might just go
> with the Hash method - at least it's a consistent performance hit, even if
> other methods are faster under certain circumstances...

Here's the routine I ended up going with:

sub SetIntersection($$)
{
   my ($set1,$set2) = @_;
   my %count = ();
   my $element;
   foreach $element (@{$set1}) { $count{$element} |= 1; };
   foreach $element (@{$set2}) { $count{$element} |= 2; };
   return grep { $count{$_} == 3 } keys %count;
}

It's a bit more robust, since the example given will give false positives if 
one of the sets contains the same value twice. (Yes, I know a traditional 
"set" wouldn't; but in my situation, relying on external data, it is 
possible). The grep is also faster than the foreach/if/push loop...

This method could also be used to get, for example, the values only in Set 1, 
etc... ($count{$_} == 1)

I still wonder if a custom function written in C could do things a chunk 
faster. ^_^

-- 
. Trevor Phillips             -           http://jurai.murdoch.edu.au/ . 
: Web Technical Administrator     -          T.Phillips at murdoch.edu.au : 
| IT Services                        -              Murdoch University | 
 >--------------------------------------------------------------------<
| On nights such as this, evil deeds are done. And good deeds, of     /
| course. But mostly evil, on the whole.                             /
 \      -- (Terry Pratchett, Wyrd Sisters)                          /