[Wellington-pm] Stripping spaces.

Mon Oct 11 16:04:46 CDT 2004

On Tue, 2004-10-12 at 09:31, Peter Love wrote:
> >     s/^\s+//g; s/\s+$//g;
> 
> Why have the g modifier on these?  As they are anchored, surely there is 
> only one occurrence?

Yes, the /g is unnecessary and including it does seem to make it a
little slower.
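
A quick way to convince yourself that /g changes nothing on the anchored patterns is to run both forms over a few strings and compare. This is only a sketch; the helper names trim_with_g and trim_without_g are made up for illustration and are not part of the benchmark.

```perl
use strict;
use warnings;

# Hypothetical helper: the anchored substitutions with /g, as quoted above.
sub trim_with_g {
    my ($s) = @_;
    $s =~ s/^\s+//g;
    $s =~ s/\s+$//g;
    return $s;
}

# Hypothetical helper: the same substitutions without /g.
sub trim_without_g {
    my ($s) = @_;
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    return $s;
}

# Compare the two forms on a handful of sample strings.
for my $input ("  foo  ", "foo", "\tfoo bar \n", "") {
    my $verdict = trim_with_g($input) eq trim_without_g($input)
        ? 'same'
        : 'different';
    print "$verdict\n";
}
```

Without /m, each anchor can only match in one place, so a second pass has nothing left to find.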

> How would the following compare with the other two?
>      s/^\s*(.*?)\s*$/$1/

Horribly :-)

The problem with it is that \s* can match zero characters, so the
pattern always matches and a substitution is always performed, even
when there is no whitespace to strip.  I tried tweaking it to look
like this:

s/^\s+(.*?)\s+$/$1/

But then I realised that it would only work for strings with whitespace
at both ends.
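
Both points can be checked with the return value of s///, which is the number of substitutions made (false if none). The following is a rough sketch; count_capture, count_two_regex and strict_capture are names invented here for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the \s* capture pattern substituted, else 0.
sub count_capture {
    my ($s) = @_;
    return ($s =~ s/^\s*(.*?)\s*$/$1/) ? 1 : 0;
}

# Hypothetical helper: how many of the two anchored substitutions fired.
sub count_two_regex {
    my ($s) = @_;
    my $n = 0;
    $n++ if $s =~ s/^\s+//;
    $n++ if $s =~ s/\s+$//;
    return $n;
}

# Hypothetical helper: the \s+ variant, returning the resulting string.
sub strict_capture {
    my ($s) = @_;
    $s =~ s/^\s+(.*?)\s+$/$1/;
    return $s;
}

printf "capture on 'foo':   %d substitution(s)\n", count_capture('foo');
printf "two_regex on 'foo': %d substitution(s)\n", count_two_regex('foo');
printf "strict on ' foo ':  [%s]\n", strict_capture(' foo ');
printf "strict on ' foo':   [%s]\n", strict_capture(' foo');
```

The capture pattern reports a substitution on a string that needed no trimming, while the \s+ variant leaves ' foo' untouched because there is no trailing whitespace for it to match.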

It really does depend on the data though.  When I used my system
dictionary file (which has no spaces), the results looked like this:

 one_regex: 13 wallclock secs
 two_regex:  9 wallclock secs
   capture: 26 wallclock secs

But when I used a variation of the file with a random number (0-99) of
spaces at the beginning and end, it looked like this:

 one_regex: 19 wallclock secs
 two_regex: 13 wallclock secs
   capture: 31 wallclock secs

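So although the capture approach starts out worse, it degrades less in
relative terms when it has more work to do: roughly 19% slower (26 to
31 secs), against roughly 45% for the other two.

Here is the benchmark script: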

use strict;
use warnings;

use Benchmark;

# Swap the comments to switch between the plain dictionary and the padded copy.
#my $dict_file = '/usr/share/dict/words';
my $dict_file = './space_words';

timethese(50, {
  'one_regex' => \&one_regex,
  'two_regex' => \&two_regex,
  'capture'   => \&capture,
});

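# Single alternation; /g is needed here so that both ends get stripped.
sub one_regex {
  open my $dict, '<', $dict_file or die "Cannot open $dict_file: $!";
  while(<$dict>) {
    s/^\s+|\s+$//g;
  }
}

# Two anchored substitutions; /g is redundant here but kept as benchmarked.
sub two_regex {
  open my $dict, '<', $dict_file or die "Cannot open $dict_file: $!";
  while(<$dict>) {
    s/^\s+//g;
    s/\s+$//g;
  }
}

# One substitution that captures everything between the whitespace.
sub capture {
  open my $dict, '<', $dict_file or die "Cannot open $dict_file: $!";
  while(<$dict>) {
    s/^\s*(.*?)\s*$/$1/;
  }
}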
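
For anyone wanting to repeat the second run, a padded file like ./space_words could be produced along these lines. This is a sketch under assumptions, not the script actually used: pad_word and write_padded are hypothetical names, and the input and output paths are taken from the command line.

```perl
use strict;
use warnings;

# Hypothetical helper: pad a word with 0-99 spaces on each side.
sub pad_word {
    my ($word) = @_;
    my $lead  = ' ' x int(rand(100));
    my $trail = ' ' x int(rand(100));
    return $lead . $word . $trail;
}

# Hypothetical helper: copy a word list, padding every line.
sub write_padded {
    my ($in_path, $out_path) = @_;
    open my $in,  '<', $in_path  or die "Cannot read $in_path: $!";
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    while (my $word = <$in>) {
        chomp $word;
        print {$out} pad_word($word), "\n";
    }
    close $out or die "Cannot close $out_path: $!";
    return;
}

# Only runs when given exactly two arguments: input path and output path.
write_padded(@ARGV) if @ARGV == 2;
```

Saved under some name of your choosing, it would be run with the dictionary path and ./space_words as its two arguments.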