[Melbourne-pm] I <3 map & grep too :)

Sam Watkins sam at nipl.net
Wed Oct 26 23:12:25 PDT 2011


On Thu, Oct 27, 2011 at 04:07:45PM +1100, Toby Corkindale wrote:
> But perhaps a little too much..
> I just found myself writing this:
> 
> 
>     use Config::General qw(ParseConfig);
>     ...
>     opendir(my $dir, $self->sites_dir) or die..;
> 
>     my %sites = map  { $_->{name} => Streuth::Site->new($_) }
>                 map  { { ParseConfig($_) } }
>                 grep { -f $_ }
>                 map  { File::Spec->catfile($self->sites_dir, $_) }
>                 grep { /^[-_a-zA-Z\d\.]+$/ }
>                 readdir($dir);
> 
>     closedir $dir;
>     return \%sites;

Looks good to me :)

I usually read map/grep chains in Perl from the bottom up,
maybe because I'm accustomed to the shell's way of doing it with pipes.
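
For what it's worth, here's roughly the same thing unrolled as a foreach loop
that reads top-down instead (a sketch only, reusing the ParseConfig,
Streuth::Site and $self->sites_dir bits from your snippet):

    # Roughly equivalent foreach version, reading top to bottom.
    my %sites;
    foreach my $entry (readdir($dir)) {
        next unless $entry =~ /^[-_a-zA-Z\d\.]+$/;
        my $path = File::Spec->catfile($self->sites_dir, $entry);
        next unless -f $path;
        my $conf = { ParseConfig($path) };
        $sites{ $conf->{name} } = Streuth::Site->new($conf);
    }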

I've been reading about 'flow-based programming'; your snippet would fit
nicely in that data-flow style of programming (or the functional style).

An advantage of map / grep / reduce, compared to 'for', is that a program like
this could be run in parallel in two different ways: it could operate on
multiple data items in parallel (the shell doesn't normally do that), and it
could use 'pipelining', with all the steps running concurrently (the shell
does do that).
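
As a sketch of the first kind, the ParseConfig step could be farmed out to
worker processes with Parallel::ForkManager, something like this (assuming
@files holds the paths that survived the grep stages; the 4-worker limit is
arbitrary):

    use Parallel::ForkManager;

    # Sketch: parse config files across 4 worker processes,
    # collecting the parsed hashrefs back in the parent.
    my $pm = Parallel::ForkManager->new(4);
    my @configs;
    $pm->run_on_finish(sub {
        my ($pid, $exit, $ident, $signal, $core, $data) = @_;
        push @configs, $data if defined $data;
    });
    foreach my $file (@files) {
        $pm->start and next;           # parent: move on to next file
        my $conf = { ParseConfig($file) };
        $pm->finish(0, $conf);         # child: hand result to parent
    }
    $pm->wait_all_children;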

I guess that's why Google is keen on 'map-reduce' programming: such a program
can take good advantage of pretty much however many processors you care to
throw at it.

Given enough 'workers', a data-flow system can process whole arrays at the
same throughput / bandwidth at which the slowest 'bottleneck' worker
processes a single item.

For example, if it takes 1ns to do a binary arithmetic addition, we could
find the sum of 1024 numbers in 10ns using a 'tree' with 10 levels of adding
units; it would have 1023 adders in total.
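
Here's the same tree shape as a quick Perl sketch, reducing pairwise one
level at a time (it only shows the shape; real hardware would do each level's
additions simultaneously):

    # Sum 1024 numbers by pairwise reduction: each pass halves
    # the list, so 1024 values take log2(1024) = 10 levels.
    my @values = (1 .. 1024);
    my $levels = 0;
    while (@values > 1) {
        my @next;
        while (@values) {
            my $a = shift @values;
            my $b = shift @values;
            push @next, defined $b ? $a + $b : $a;
        }
        @values = @next;
        $levels++;
    }
    print "sum = $values[0] after $levels levels\n";
    # prints: sum = 524800 after 10 levels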

If we have a stream of float[1024] arrays, from a robot eye or ear or whatever,
we can process them at the rate of 1 billion arrays per second.  At any one
time we have 10 arrays going through the system, one at each level.  The
latency is 10ns, but the throughput is 1 billion arrays per second: it would
input 1024 billion numbers per second and output 1 billion totals per second.
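
A quick back-of-envelope check of those numbers, taking the 1ns addition
time from above as given:

    # Back-of-envelope check: 1ns per addition, 10 pipeline levels.
    my $add_ns  = 1;                    # time per addition (assumed)
    my $levels  = 10;                   # log2(1024)
    my $latency = $add_ns * $levels;    # 10 ns per array, end to end
    my $arrays  = 1e9 / $add_ns;        # one total completes every 1 ns
    my $numbers = 1024 * $arrays;       # input rate in numbers/s
    printf "latency %dns, %g arrays/s, %g numbers/s in\n",
        $latency, $arrays, $numbers;
    # prints: latency 10ns, 1e+09 arrays/s, 1.024e+12 numbers/s in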

I don't know a faster way to do it!  :)
