[Chicago-talk] appending hashes

Steven Lembark lembark at jeeves.wrkhors.com
Tue Nov 4 11:19:44 CST 2003



-- Andy_Bach at wiwb.uscourts.gov

> Interestingly (???) the map method:
>  'mapappend' => sub
>          {
>                  %a = %b;
>                  %a = map {$_ => $c{$_} }  keys(%c);
>          },
>
> is slower and gets worse as the hashes get bigger. I bumped up c to:
> %c = map(($_, $_), 0 .. 6750);
> and:
> Benchmark: timing 1000 iterations of append, baseline, mapappend...
>     append: 36 wallclock secs (33.94 usr +  0.22 sys = 34.16 CPU) @
> 29.27/s (n=1000)
>   baseline:  2 wallclock secs ( 2.08 usr +  0.02 sys =  2.10 CPU) @
> 476.19/s (n=1000)
>  mapappend: 65 wallclock secs (61.32 usr +  0.58 sys = 61.90 CPU) @
> 16.16/s (n=1000)
>
> (left off unroll as it was really slow).  Looks like slices are the way
> to  go.

Makes sense: map has to alias each element of the unrolled
hash into $_ one at a time and build the output list via,
essentially, a push per element as it goes along. That
element-by-element serialization is most of what causes the pain.
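
For anyone who wants to repeat the comparison, here is a rough sketch using the core Benchmark module. The sizes and sub names are made up for the example, and the map version appends to %a via ( %a, map ... ), which is my assumption about what the quoted code was meant to do (as quoted, the second assignment replaces %a). Timings will vary by machine and perl version.

```perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# sizes are arbitrary, picked only to make any gap visible
my %b = map { ( "b$_" => $_ ) } 0 .. 99;
my %c = map { ( $_ => $_ ) } 0 .. 6750;

sub by_map
{
	my %a = %b;

	# map builds a flat key/value list one pair at a time
	%a = ( %a, map { ( $_ => $c{$_} ) } keys %c );

	\%a
}

sub by_slice
{
	my %a = %b;

	# the slice hands both lists to a single assignment
	@a{ keys %c } = values %c;

	\%a
}

# run each sub for about one CPU second, then print a comparison table
cmpthese( -1, { map_append => \&by_map, slice => \&by_slice } );
```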

So far as I know:

	@foo{keys %bar} = values %bar

is the fastest, lowest overhead way to merge the hashes.
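
A minimal sketch of that slice assignment, with sample data invented for the example. It leans on perl's documented behavior that keys and values return their elements in the same order as long as the hash is not modified in between.

```perl
use strict;
use warnings;

# sample data, invented for the example
my %foo = ( a => 1,  b => 2  );
my %bar = ( b => 20, c => 30 );

# keys and values walk %bar in the same order, so each
# key lines up with its own value in the slice
@foo{ keys %bar } = values %bar;

# keys already in %foo are overwritten, new ones are added
print join( ' ', map { "$_=$foo{$_}" } sort keys %foo ), "\n";

# prints: a=1 b=20 c=30
```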

To merge multiple hashes, pass hash references to a sub (e.g.,
from a job I use to manage the environment):

	#!/blah/perl

	...

	sub merge
	{
		my %bucket = ();

		@bucket{ keys %$_ } = values %$_ for @_;

		\%bucket
	}

	# don't wanna lose these either way...

	my @inherit = qw( TERM HOME USER MAIL DISPLAY );

	# configured environment

	my %default = qw( ... ); # read from config files, whatever
	my %host    = qw( ... ); # point is they go from least specific
	my %user    = qw( ... ); # to most specific as you go down
	my %job     = qw( ... ); # the list

	my $newenv = merge \%ENV, \%default, \%host, \%user, \%job;

	@{$newenv}{ @inherit } = @ENV{ @inherit };

	$newenv->{ENV_SETUP_SOURCES} = join ':', @sourcefiles;

	# at this point the environment is hard-wired from
	# the config files -- no need to worry about env
	# vars from a working shell polluting dot-scripts.

	%ENV = %$newenv;

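	# note: "@ARGV || ..." would put @ARGV in scalar context
	# and exec its element count, hence the ternary.

	exec @ARGV ? @ARGV : $ENV{SHELL};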

	die "Roadkill: $!";


This gets stuffed into a #! script and called via something like:


	[ "$ENV_SETUP_SOURCES" = "" ] && exec env_setup "$0" "$@";

at the top of shell scripts. If the environment has not yet
been set up then the chain of execs leaves the PID alone
(nothing forks, so the parent never gets a SIGCHLD) and the
job ends up running with a fully configured environment. The
only trick is to make sure the env var used to flag the
passthrough doesn't collide with anything else.
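
To check the precedence of the merge sub on its own, here is a small sketch. The sub is the one from the script; the two hashes and their contents are hypothetical stand-ins for the config data.

```perl
use strict;
use warnings;

# same sub as in the script above
sub merge
{
	my %bucket = ();

	@bucket{ keys %$_ } = values %$_ for @_;

	\%bucket
}

# hypothetical settings, least specific first
my %default = ( EDITOR => 'vi',    PAGER => 'more' );
my %user    = ( EDITOR => 'emacs', LANG  => 'C'    );

my $merged = merge \%default, \%user;

# %user was merged last, so its EDITOR replaces the default
print "$_=$merged->{$_}\n" for sort keys %$merged;

# prints:
# EDITOR=emacs
# LANG=C
# PAGER=more
```

The inputs are copied into a fresh hash, so %default and %user are left untouched.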

--
Steven Lembark                               2930 W. Palmer
Workhorse Computing                       Chicago, IL 60647
                                            +1 888 359 3508


