[boulder.pm] unsubscribe

Thu Jun 27 17:38:07 CDT 2002

Here's a little more complete example:

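use strict;
use warnings;
use Time::HiRes qw(time);    # high-resolution time() for the timings

# Read a whole file into one scalar.
sub slurp
{
  my ($fn) = @_;
  local $/;                  # undef $/ so <$in> returns the entire file
  open(my $in, '<', $fn) or die "Can't open $fn: $!";
  my $buff = <$in>;
  close $in;
  return $buff;
}

my $value = slurp("E:\\52350\\source\\52350_.html");  # 3.8 Meg file with about 69.5K lines
my (@array, $i);

# Trial 1: split the buffer into lines, then walk the array.
my $timest = time();
  @array = split(/\n/,$value);
  my $mypos = 0;
  foreach my $line (@array)
  {
    $mypos += length($line);
  }
print "Trial 1 took: ".(time() - $timest)."\n";

print "Source is ".length($value)." bytes and ".scalar(@array)." lines long.\n";

# Trial 2: step through the buffer one newline at a time with m//g.
# NB: /\n/ has no capture group, so $1 is undef and this loop only
# overwrites @array's elements with undef. @array is not reset first,
# so the line count printed below is essentially left over from Trial 1.
$i = 0;
$timest = time();
  while ($value =~ /\n/g)
  {
    my $mypos = pos($value);
    $array[$i] = $1;
    $i++;
  }
print "Trial 2 took: ".(time() - $timest)."\n";

print "Source is ".length($value)." bytes and ".scalar(@array)." lines long.\n";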

Produces this output:

Trial 1 took: 0.220000028610229
Source is 3783482 bytes and 69559 lines long.
Trial 2 took: 0.490999937057495
Source is 3783482 bytes and 69559 lines long.

(Note that reversing the order of the trials doesn't make the split any
faster or the while loop any slower.)
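Trial 2 above never actually collects the lines it finds, so the two trials aren't doing the same work. A fairer comparison is sketched below, assuming a synthetic buffer in place of the original HTML file; the helper names `lines_by_split` and `lines_by_match` are hypothetical. Both return the same list of lines (the match version drops trailing empty lines to mimic `split`), the script checks that before timing anything, and the core Benchmark module's `cmpthese` does the timing. No numbers are claimed here; results will depend on your Perl and your data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Split the buffer into lines with the split builtin
# (split discards trailing empty fields).
sub lines_by_split {
    my ($text) = @_;
    return split /\n/, $text;
}

# Walk the buffer with m//g, using pos() to cut out each line by hand.
sub lines_by_match {
    my ($text) = @_;
    my @lines;
    my $start = 0;
    while ($text =~ /\n/g) {
        my $end = pos($text);               # offset just past this newline
        push @lines, substr($text, $start, $end - $start - 1);
        $start = $end;
    }
    # Keep a last line that has no terminating newline.
    push @lines, substr($text, $start) if $start < length $text;
    # Mimic split, which discards trailing empty fields.
    pop @lines while @lines && $lines[-1] eq '';
    return @lines;
}

# Hypothetical stand-in for the 3.8 MB file: 70,000 lines of ~50 bytes.
my $value = join '', map { "line $_: " . ('x' x 40) . "\n" } 1 .. 70_000;

# Both approaches must agree before their timings mean anything.
my @by_split = lines_by_split($value);
my @by_match = lines_by_match($value);
die "results differ\n"
    unless @by_split == @by_match
        && join("\n", @by_split) eq join("\n", @by_match);

# Run each approach for at least 2 CPU seconds and print a comparison table.
cmpthese(-2, {
    split => sub { my @lines = lines_by_split($value) },
    match => sub { my @lines = lines_by_match($value) },
});
```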

-----Original Message-----
From: Jay Kominek [mailto:Jay.Kominek at colorado.edu]
Sent: Thursday, June 27, 2002 4:01 PM
To: 'boulder-pm-list at happyfunball.pm.org'
Subject: RE: [boulder.pm] unsubscribe

On Thu, 27 Jun 2002, Keanan Smith wrote:

>    @{$me->{arrayrep}} = split(/(?=$me->{break})/,$value);
>    my $mypos = 0;
>    foreach $line (@{$me->{arrayrep}})

>     while ($value =~ /$me->{break}/g)
>     {
>       my $mypos = pos($value);

> I would think that the first (which calls the 'split' builtin to generate
> the array) would be faster than the second (which repeatedly does a
> regular expression match and assigns the list elements by hand).

> But in fact it's the reverse! Weird, eh?
> Anyone have a good explanation for why?

You're allocating more memory for the first one, as well as using a
zero-width assertion. If I recall correctly, zero-width assertions can
significantly increase the time it takes to perform a match, and they'd
certainly make it more complex to match repeatedly against a string.
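One way to isolate the zero-width assertion is to time a plain `split /\n/` against the lookahead form `split /(?=\n)/` used in the quoted code. The sketch below assumes `\n` as the break pattern (the thread doesn't say what `$me->{break}` holds) and a synthetic buffer; the sub names are hypothetical. Note that the two forms aren't equivalent: the lookahead consumes nothing, so every piece after the first keeps its leading newline.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Plain split: each newline is consumed, so no piece contains "\n".
sub split_consuming { return split /\n/, $_[0] }

# Lookahead split: matches the empty position before each newline, so
# nothing is consumed and every piece after the first starts with "\n".
sub split_lookahead { return split /(?=\n)/, $_[0] }

my @plain = split_consuming("a\nb\nc");   # ("a", "b", "c")
my @ahead = split_lookahead("a\nb\nc");   # ("a", "\nb", "\nc")

# Hypothetical synthetic input: 70,000 lines of ~50 bytes.
my $value = join '', map { "line $_: " . ('x' x 40) . "\n" } 1 .. 70_000;

cmpthese(-2, {
    consuming => sub { my @l = split_consuming($value) },
    lookahead => sub { my @l = split_lookahead($value) },
});
```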

The double indirection to access the elements of the array can't help
much, either. (Hopefully Perl is optimizing it away, but you never know.)
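Whether the double indirection matters can be measured directly. In a `foreach`, the list expression `@{$me->{arrayrep}}` is evaluated once before the loop starts, so the dereference should be paid per loop rather than per element. The sketch below, using a hypothetical `$me` shaped like the one in the quoted code, compares that against copying the array reference into a lexical first.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical object shaped like the one in the quoted code.
my $me = { arrayrep => [ map { "line $_: " . ('x' x 40) } 1 .. 70_000 ] };

# Iterate through the hash-of-arrayref directly.
sub total_via_hash {
    my ($obj) = @_;
    my $mypos = 0;
    foreach my $line (@{ $obj->{arrayrep} }) {
        $mypos += length($line);
    }
    return $mypos;
}

# Copy the array reference into a lexical first, then iterate.
sub total_via_lexical {
    my ($obj) = @_;
    my $lines = $obj->{arrayrep};
    my $mypos = 0;
    foreach my $line (@$lines) {
        $mypos += length($line);
    }
    return $mypos;
}

die "totals differ\n" unless total_via_hash($me) == total_via_lexical($me);

cmpthese(-2, {
    via_hash    => sub { total_via_hash($me) },
    via_lexical => sub { total_via_lexical($me) },
});
```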

Do you have any actual numbers for the difference in algorithmic and
constant-factor time?
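A rough way to separate algorithmic from constant-factor cost is to double the input size repeatedly and watch how each approach's time grows. The sketch below assumes synthetic lines of about 50 bytes and hypothetical helpers `by_split` and `by_match` that both just count lines. If both columns roughly double with each row, both approaches are linear and the gap between them is a constant factor. Single-run wall-clock times are noisy, so repeat runs before drawing conclusions.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical synthetic buffer of $n newline-terminated lines, ~50 bytes each.
sub make_buffer {
    my ($n) = @_;
    return join '', map { "line $_: " . ('x' x 40) . "\n" } 1 .. $n;
}

# Count lines by splitting the whole buffer into an array.
sub by_split {
    my ($text) = @_;
    my @lines = split /\n/, $text;
    return scalar @lines;
}

# Count lines by stepping through the newlines with m//g.
sub by_match {
    my ($text) = @_;
    my $count = 0;
    $count++ while $text =~ /\n/g;
    return $count;
}

# Wall-clock seconds for a single call of $code on $text.
sub time_once {
    my ($code, $text) = @_;
    my $t0 = time();
    $code->($text);
    return time() - $t0;
}

# Double n each row: roughly doubling times suggest linear scaling, and
# the ratio between the two columns is the constant-factor difference.
for my $n (20_000, 40_000, 80_000, 160_000) {
    my $text = make_buffer($n);
    printf "%7d lines: split %.4fs  match %.4fs\n",
        $n, time_once(\&by_split, $text), time_once(\&by_match, $text);
}
```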

- Jay Kominek <jay.kominek at colorado.edu>
  Plus ça change, plus c'est la même chose (the more things change, the more they stay the same)