[boulder.pm] unsubscribe

Keanan Smith KSmith at netLibrary.com
Thu Jun 27 17:46:31 CDT 2002


Erp my brain is broken :)

This actually reflects the problem at hand:


use Time::HiRes qw(time);

sub slurp
{
  my($fn) = @_;
  local $/;   # slurp mode; the old value is restored when the sub returns
  open(IN, "<", $fn) or die "Can't open $fn: $!";
  my $buff = <IN>;
  close IN;
  return $buff;
}


$value = slurp("E:\\52350\\source\\52350_.html");  # 3.7 Meg file with
#$value = slurp("E:\\67345\\TEST\\67345.mtml");
#$value = slurp("E:\\52350\\source\\52350_.html");



$timest = &time;
  @array = split(/\n/,$value); 
  my $mypos = 0;
  foreach $line (@array)
  {
    $mypos += length($line);


    $lines[$i] = $mypos;
    $positions{$mypos} = $i;
    $i++;
  }
print "Trial 1 took: ".(&time - $timest)."\n";

print "Source is ".length($value)." bytes and ".scalar(@array)." lines long.\n";


$timest = &time;
  $i=0;
  while ($value =~ /\n/g)
  {
    my $mypos = pos($value);
    $array[$i] = $1;   # NB: /\n/ has no capture group, so $1 is undef here
    
    $lines[$i] = $mypos;
    $positions{$mypos} = $i;
    $i++;

  }
print "Trial 2 took: ".(&time - $timest)."\n";


print "Source is ".length($value)." bytes and ".scalar(@array)." lines long.\n";



With output:

Trial 1 took: 1.95299994945526
Source is 3783482 bytes and 69559 lines long.
Trial 2 took: 1.33200001716614
Source is 3783482 bytes and 69559 lines long.

Although the example below does pose two interesting questions:
a. Why was it faster this time, when I'm doing more work?
b. Why is the while loop faster this time, but slower below? (That's really
odd...)

The only thing I can think of is that somehow there's some weird caching
going on.
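
One thing that can be checked independently of the timings is whether the two loops record the same offsets. The sketch below is not from the original script: the three-line string and the sub names offsets_by_split and offsets_by_match are made up for illustration. It mirrors the two trials on a tiny input: the split version sums line lengths with the newlines already removed, while the match version takes pos() just past each newline.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the slurped file: three short lines.
my $value = "ab\ncde\nf\n";

# Trial 1 style: running sum of line lengths after split.
# split drops the "\n" separators, so the sum falls behind the
# real offset by one byte per line.
sub offsets_by_split {
    my ($text) = @_;
    my @offsets;
    my $pos = 0;
    for my $line (split /\n/, $text) {
        $pos += length $line;
        push @offsets, $pos;
    }
    return @offsets;
}

# Trial 2 style: pos() after each /\n/g match, which is the
# offset just past each newline.
sub offsets_by_match {
    my ($text) = @_;
    my @offsets;
    while ($text =~ /\n/g) {
        push @offsets, pos($text);
    }
    return @offsets;
}

print "split: @{[ offsets_by_split($value) ]}\n";   # split: 2 5 6
print "match: @{[ offsets_by_match($value) ]}\n";   # match: 3 7 9
```

If the goal is for both trials to fill @lines and %positions with the same keys, adding 1 per line in the split loop (for the dropped newline) would make the two lists agree on this input.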

-----Original Message-----
From: Keanan Smith [mailto:KSmith at netLibrary.com]
Sent: Thursday, June 27, 2002 4:38 PM
To: 'boulder-pm-list at happyfunball.pm.org'
Subject: RE: [boulder.pm] unsubscribe


Here's a little more complete example:

use Time::HiRes qw(time);

sub slurp
{
  my($fn) = @_;
  local $/;   # slurp mode; the old value is restored when the sub returns
  open(IN, "<", $fn) or die "Can't open $fn: $!";
  my $buff = <IN>;
  close IN;
  return $buff;
}


$value = slurp("E:\\52350\\source\\52350_.html");  # 3.8 Meg file with about 69.5K lines


$timest = &time;
  @array = split(/\n/,$value); 
  my $mypos = 0;
  foreach $line (@array)
  {
    $mypos += length($line);
  }
print "Trial 1 took: ".(&time - $timest)."\n";

print "Source is ".length($value)." bytes and ".scalar(@array)." lines long.\n";


$timest = &time;
  while ($value =~ /\n/g)
  {
    my $mypos = pos($value);
    $array[$i] = $1;   # NB: /\n/ has no capture group, so $1 is undef here
    $i++;
  }
print "Trial 2 took: ".(&time - $timest)."\n";


print "Source is ".length($value)." bytes and ".scalar(@array)." lines long.\n";



Produces this output:

Trial 1 took: 0.220000028610229
Source is 3783482 bytes and 69559 lines long.
Trial 2 took: 0.490999937057495
Source is 3783482 bytes and 69559 lines long.


(Note that reversing the order of the trials doesn't make the split any
faster, or the while any slower)
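
Since the trials above share the globals @array and $i, one way to take ordering out of the picture is to give each variant its own lexicals and let the core Benchmark module do the timing. This is only a sketch under stated assumptions: the input is a synthetic string rather than the original 3.8 Meg file, and the sub names count_by_split and count_by_match are invented here. The numbers it prints will not match the ones quoted in this thread.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Synthetic stand-in for the real file (an assumption, not the original data).
my $value = join '', map { "line $_ of some filler text\n" } 1 .. 20_000;

# Variant 1: split into an array, then walk it summing lengths.
sub count_by_split {
    my ($text) = @_;
    my @lines = split /\n/, $text;
    my $pos = 0;
    $pos += length $_ for @lines;
    return scalar @lines;
}

# Variant 2: repeated /\n/g matches, reading pos() each time.
sub count_by_match {
    my ($text) = @_;
    my $n = 0;
    while ($text =~ /\n/g) {
        my $pos = pos($text);
        $n++;
    }
    return $n;
}

# A negative count asks Benchmark to run each variant for at least
# that many CPU seconds instead of a fixed number of iterations.
timethese(-1, {
    split_loop => sub { count_by_split($value) },
    match_loop => sub { count_by_match($value) },
});
```

Both subs copy the input string on every call, so that cost is the same for each variant. Because every call starts with fresh lexicals, neither variant inherits arrays grown by the other.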

-----Original Message-----
From: Jay Kominek [mailto:Jay.Kominek at colorado.edu]
Sent: Thursday, June 27, 2002 4:01 PM
To: 'boulder-pm-list at happyfunball.pm.org'
Subject: RE: [boulder.pm] unsubscribe



On Thu, 27 Jun 2002, Keanan Smith wrote:

>    @{$me->{arrayrep}} = split(/(?=$me->{break})/,$value);
>    my $mypos = 0;
>    foreach $line (@{$me->{arrayrep}})

>     while ($value =~ /$me->{break}/g)
>     {
>       my $mypos = pos($value);

> I would think that the first (which calls the 'split' builtin to generate
> the array) would be faster than the second (which repeatedly does a regular
> expression and assigns the list elements by hand).

> But in fact it's the reverse! Weird, eh?
> Anyone have a good explanation for why?

You're allocating more memory for the first one, as well as using a zero
width assertion. If I recall correctly, those (can) significantly increase
the time it takes to perform a match. They'd certainly make it more
complex to repeatedly match a string.

The double indirection to access the elements of the array can't help
much, either. (Hopefully Perl is optimizing it away, but you never know.)

Any actual values for the change in algorithmic and constant time?
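
To make the zero-width assertion point concrete, here is a small sketch of what the two kinds of split produce. The value of $break is a guess at what $me->{break} holds (a newline), and the sample string is made up. With the lookahead, nothing is consumed at the split point, so every piece after the first keeps its leading break and the pieces together are as long as the input.

```perl
use strict;
use warnings;

my $break = "\n";               # hypothetical value for $me->{break}
my $value = "ab\ncde\nf\n";     # made-up sample input

# Plain split: the separators are consumed and trailing empty
# fields are dropped.
my @plain = split /$break/, $value;

# Lookahead split: matches the empty string in front of each break,
# so the break stays attached to the start of the following piece.
my @ahead = split /(?=$break)/, $value;

printf "plain: %d pieces\n", scalar @plain;   # plain: 3 pieces
printf "ahead: %d pieces\n", scalar @ahead;   # ahead: 4 pieces
print  "lookahead pieces cover the whole input\n"
    if length(join '', @ahead) == length $value;
```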

- Jay Kominek <jay.kominek at colorado.edu>
  Plus ça change, plus c'est la même chose


