[Pdx-pm] hash from array question

Sat May 26 15:44:28 PDT 2007

Tom, I did some quickie benchmarking and I think I found a faster
way. Before we get to that, though, I want to make sure that I
understand the task at hand.

- you have a src file that contains many lines of data.
- each line of data is a single unit (i.e. a [single item of data] doesn't
spill onto another line, and there is never more than one [single item of
data] per line)
- you have a second list of 'names'
- the goal is to get every line from src that has an occurrence of any
of the possible names.

If that's the case, then I found that this worked:

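# join all the names into one alternation (each name is used as a regex)
my $names_join = join '|', @names;
my @interesting_lines;
foreach (@lines) {
   # keep the line if it matches any one of the names
   push @interesting_lines, $_ if $_ =~ m/($names_join)/;
}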
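
A variation on that loop, as a rough sketch that I have not benchmarked: if the names are plain strings rather than patterns (an assumption on my part), you can escape each one with quotemeta and compile the alternation once with qr//. The sample @names and @lines here are made up for illustration.

```perl
use strict;
use warnings;

# Made-up sample data, for illustration only.
my @names = ('Alice', 'Bob', 'C++');
my @lines = (
    "Alice went home\n",
    "nobody here\n",
    "written in C++\n",
    "Bobby and Bob\n",
);

# Escape each name so metacharacters (like the + in C++) match literally,
# then compile the alternation once instead of on every iteration.
my $alternation = join '|', map { quotemeta($_) } @names;
my $names_re    = qr/$alternation/;

# grep keeps every line for which the match is true.
my @interesting_lines = grep { $_ =~ $names_re } @lines;

print @interesting_lines;
```

Without the quotemeta, a name like 'C++' would not compile as a regex at all, so this only makes sense if you never want a name to act as a pattern.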

My test case was ftp://gutenberg.readingroo.ms/gutenberg/1/0/6/0/10607/10607.txt

In my test case, your method with the two greps took 0.294 sec and the
foreach/push loop came in at 0.067 sec.
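
If you want to try a comparison like this yourself, here is a minimal sketch using the core Benchmark module. It is not the script I used for the numbers above: the data is synthetic, the sub names two_greps and foreach_push are just labels I picked, and the timings you get will depend on your data and machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic data standing in for the real src file and name list.
my @names = qw(whale ship captain harpoon);
my @lines = map {
    $_ % 7 == 0 ? "line $_ mentions the captain\n" : "line $_ is filler\n"
} 1 .. 1000;

# The two-grep approach: outer grep over indices, inner grep over names.
sub two_greps {
    my @index = grep {
        my $c = $_;
        grep { $lines[$c] =~ m/$_/ } @names;
    } 0 .. $#lines;
    return @lines[@index];
}

# The single-regex approach: one alternation, one pass over the lines.
sub foreach_push {
    my $names_join = join '|', @names;
    my @interesting_lines;
    foreach (@lines) {
        push @interesting_lines, $_ if $_ =~ m/($names_join)/;
    }
    return @interesting_lines;
}

# Run each sub 100 times and print a comparison table.
cmpthese(100, {
    two_greps    => \&two_greps,
    foreach_push => \&foreach_push,
});
```

Both subs should return the same lines, so it is worth checking that before trusting any timing difference.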

Hope that helps.

On 5/26/07, Eric Wilhelm <scratchcomputing at gmail.com> wrote:
> # from Thomas Keller
> # on Friday 25 May 2007 10:55 pm:
>
> >my @index = grep {
> >my $c = $_;
> >if ($c > $#lines or # always false
> >( grep { $lines[$c] =~ m/$_/ } @names ) > 0 )
> > { 1; #yes, select it
> >} else {
> >0;# no, skip it
> >}
> >} 0..$#lines;
>
> This was kind of bugging me, so I looked it up.  Note the original
> snippet is already suffering from a sort of illustrative clarity:
>
>   my @bigger_indices = grep {
>     if($_ > $#y or $x[$_] > $y[$_]) {
>       1;
>     } else {
>       0;
>     }
>   } 0..$#x;
>
> This reduces to:
>
>   my @bigger_indices = grep({$_ > $#y or $x[$_] > $y[$_]} 0..$#x);
>
> (And the book actually does that just down the page, though it then goes
> on to suggest using map but reverts back to the if/else rather than a
> ternary.)
>
> So, firstly, always be wary of "if($expr) {1} else {0}" code.  (This
> is somewhat awkwardly hiding a ternary ($expr ? 1 : 0), which is much
> more obviously just ($expr).)  The important point being that grep only
> cares whether the return value of the block handed to it is true or false.
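
To make the point about grep's block concrete, here is a small sketch with made-up @x and @y arrays (they are not from the book or from Tom's data). The if/else form and the bare expression should select the same indices, since grep only tests the block's value for truth.

```perl
use strict;
use warnings;

# Made-up sample arrays, for illustration only.
my @x = (5, 1, 9, 4);
my @y = (3, 2, 7);

# Explicit if/else returning 1 or 0 from the block.
my @verbose = grep {
    if ($_ > $#y or $x[$_] > $y[$_]) { 1 } else { 0 }
} 0 .. $#x;

# The bare expression: grep only checks whether the block's value is true.
my @terse = grep { $_ > $#y or $x[$_] > $y[$_] } 0 .. $#x;

print "verbose: @verbose\n";
print "terse:   @terse\n";
```

Here the "$_ > $#y" test does real work: index 3 is past the end of @y, so it short-circuits before $y[3] is read.
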
>
> Secondly, whenever a book introduces a function, read perldoc to really
> grok how it works outside the context of the book's example.
>
> Finally, use of if/else for implicit return values is just bad.
>
> I think the "always false" comment must be there because it is a
> vestigial bit left over from the original example.
>
> Speaking of comments, it's important to realize that books have to be
> more commenty than production code.  They're trying to teach.  You just
> need to remind yourself and other programmers why the lines need to be
> whittled out from the file.
>
> --Eric
> --
> "If you only know how to use a hammer, every problem begins to look like
> a nail."
> --Richard B. Johnson
> ---------------------------------------------------
>     http://scratchcomputing.com
> ---------------------------------------------------

-- 
benh~