[plug] Perl OO/Data Optimisation

Matthew Robinson matt at zensunni.org
Wed Nov 27 20:09:18 WST 2002


Trevor Phillips wrote:
> Any Perl Guts Gurus here? ^_^
> 
> I have an OO app which does nasty things to a large multi-depth hash of data 
> (of which some of it is structures of objects), and I'd really like to 
> optimise it, but I'd rather not spend time doing optimisations that Perl may 
> do internally.

I am assuming that you want to optimise for execution speed, rather than 
memory usage or readability.

> 
> So, in general: Is it worth me compiling my hash into arrays with integer 
> references? ie; Instead of $data->{prop}, convert it to $data->[num], and 
> things that reference prop to reference it as num?

Using array lookups rather than hash lookups is always quicker, as
perl doesn't have to perform the hashing function and then walk the
buckets.  If you define constants for your 'hash' keys, the code will
look almost identical.  Example:

use strict;
use warnings;

use constant foo => 1;

my $array = [];
my $hash = {};

$array->[foo] = 'bar';	# bareword foo is the constant: element 1
$hash->{foo} = 'bar';	# bareword is autoquoted here: key 'foo', not 1

__END__
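If you want to put a rough number on that claim, a minimal Benchmark
sketch along these lines will measure it on your machine (the names
are illustrative and the figures will vary with your perl build and
hardware):

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark;

use constant idx => 0;	# index constant for the array version

my $array = ['bar'];
my $hash  = { key => 'bar' };

# Run each lookup style for a minimum of 1 CPU second
timethese(-1, {
	Array => sub { my $x = $array->[idx] },
	Hash  => sub { my $x = $hash->{key} },
});

__END__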

> 
> Is it worth dumping the object structure (HTML::Element), and using a 
> simplified hash or array structure, and walking it manually?

You'll almost certainly be able to get a speed increase by doing this
(although it depends on how HTML::Element is written).  However, it
will come at the expense of readability and maintainability.
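For illustration only (this is a made-up list-of-lists layout, not
HTML::Element's actual internals), a hand-rolled walk over a
simplified tree can be very short:

#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical node layout: [ $tag, \%attr, @children ], with plain
# strings standing in for text nodes.
my $tree = [ html => {},
	[ body => {},
		[ p => { class => 'intro' }, 'Hello, world' ],
	],
];

sub walk {
	my ($node, $depth) = @_;
	unless (ref $node) {	# text node
		print '  ' x $depth, $node, "\n";
		return;
	}
	my ($tag, $attr, @children) = @$node;
	print '  ' x $depth, "<$tag>\n";
	walk($_, $depth + 1) for @children;
}

walk($tree, 0);

__END__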


> If I reference a hash several times in a row, is it faster to assign it to a 
> temporary variable, or does Perl cache/optimise that internally?

This would be a good time to pull out the Benchmark module[0] and test 
this for ourselves.  Example:

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark;

our %data;
my $data_ref = \$data{prop};	# A scalar reference to the hash element:
				#	$$data_ref = 'foo';
				# is equivalent to
				#	$data{prop} = 'foo';

# Run each closure (Lookup and Reference) for a minimum of 1 CPU second
timethese(-1, {
	Lookup => sub {
		$data{prop} = rand;
	},
	Reference => sub {
		$$data_ref = rand;
	},
});

__END__

Which produces the output:

Benchmark: running Lookup, Reference, each for at least 1 CPU seconds...
     Lookup:  2 wallclock secs ( 1.08 usr +  0.01 sys =  1.09 CPU) @ 876818.35/s (n=955732)
  Reference:  3 wallclock secs ( 1.10 usr +  0.00 sys =  1.10 CPU) @ 1429875.45/s (n=1572863)


Showing that it is more efficient to use the reference to the data
rather than performing the lookup each time: Lookup managed 876,818
iterations per second, whereas Reference managed 1,429,875 iterations
per second.

However, it should be remembered that it took over a million
iterations to produce a noticeable difference.


> When building text by shuffling chunks of text around, is there a more 
> efficient way than appending the text as you go, and storing the chunks in a 
> hash?

I think I would want to have a better idea of what you are trying to do 
with this one.  However, I doubt there is much to be saved either 
speed-wise or memory-wise here.
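That said, if you do want to test it, a throwaway Benchmark comparing
appending as you go against collecting the chunks and joining them at
the end (purely illustrative) might look like:

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark;

my @chunks = ('x' x 100) x 100;	# 100 chunks of 100 characters

timethese(-1, {
	Append => sub {		# build by appending as you go
		my $out = '';
		$out .= $_ for @chunks;
	},
	Join => sub {		# collect, then join in one pass
		my $out = join '', @chunks;
	},
});

__END__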


> Any other hints on how I can get performance increases by the way data is 
> handled?
> 

I am of the opinion that you probably want to keep the code readable
and maintainable rather than trying to squeeze every optimisation out
of perl.  As you can see with the hash keys and references, it takes
a large number of iterations to produce a noticeable difference, and
even then we are only talking half a second or so.

If you have a serious speed issue I would look at your algorithms
first and make sure they are optimised for the problem, rather than
trying to optimise the implementation of the algorithm.
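As a generic example of what I mean (nothing specific to your code):
replacing a repeated linear search with a one-off lookup hash is the
kind of algorithmic change that dwarfs any micro-optimisation:

#!/usr/bin/perl
use strict;
use warnings;

my @wanted = map { $_ * 3 } 1 .. 1000;

# O(n) per query: grep scans the whole list every time
sub slow_has { my ($x) = @_; return scalar grep { $_ == $x } @wanted }

# O(1) per query: build the lookup hash once, then each query is a
# single hash lookup
my %is_wanted;
$is_wanted{$_} = 1 for @wanted;
sub fast_has { my ($x) = @_; return $is_wanted{$x} }

print slow_has(300), ' ', fast_has(300), "\n";	# prints "1 1"

__END__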

Anyway, hope this helps,

Matt


[0] perldoc Benchmark

-- 
print map{s&^(.)&\U\1&&&$_}split$,=$",(`$^Xdoc$,-qj`=~m&"(.*)"&)[0].$/


