[Wellington-pm] Stripping spaces.
Grant McLean
grant at mclean.net.nz
Mon Oct 11 16:04:46 CDT 2004
On Tue, 2004-10-12 at 09:31, Peter Love wrote:
> > s/^\s+//g; s/\s+$//g;
>
> Why have the g modifier on these? As they are anchored, surely there is
> only one occurrence?
Yes, the /g is unnecessary and including it does seem to make it a
little slower.
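A quick way to confirm that /g adds nothing here is to run both forms over the same input and compare the result and the substitution count. s/// returns the number of replacements it made. This is a minimal sketch, and the sample string is arbitrary.

```perl
use strict;
use warnings;

# Arbitrary sample with whitespace at both ends.
my $input = "   hello world  \t";

# Anchored patterns without /g.
my $plain   = $input;
my $n_plain = ($plain =~ s/^\s+//) + ($plain =~ s/\s+$//);

# The same anchored patterns with /g.
my $global   = $input;
my $n_global = ($global =~ s/^\s+//g) + ($global =~ s/\s+$//g);

# Each anchored pattern can only match once, so both forms agree.
print "plain:  '$plain' ($n_plain substitutions)\n";
print "global: '$global' ($n_global substitutions)\n";
```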
> How would the following compare with the other two?
> s/^\s*(.*?)\s*$/$1/
Horribly :-)
The problem with it is that \s* always matches, so it always does a
substitution even if there are no spaces, and the non-greedy (.*?) has
to retry \s*$ after every character it consumes. I tried tweaking it to
look like this:
s/^\s+(.*?)\s+$/$1/
But then I realised that it would only work for strings with spaces at
both ends.
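Both points can be checked directly, since s/// returns a true value only when it actually replaced something. In the sketch below (the helper names and sample strings are made up for illustration), the \s* version reports a substitution on a string with no spaces at all, and the \s+ version leaves a string with only leading spaces untouched.

```perl
use strict;
use warnings;

# Hypothetical helpers: return the (possibly modified) copy and
# whether s/// reported a substitution.
sub try_capture {
    my ($s) = @_;
    my $did = ($s =~ s/^\s*(.*?)\s*$/$1/) ? 1 : 0;
    return ($s, $did);
}

sub try_tweaked {
    my ($s) = @_;
    my $did = ($s =~ s/^\s+(.*?)\s+$/$1/) ? 1 : 0;
    return ($s, $did);
}

# No whitespace anywhere, yet the \s* version still substitutes.
my ($c_out, $c_did) = try_capture("word");

# Leading whitespace only: \s+$ cannot match, so nothing is stripped.
my ($t_out, $t_did) = try_tweaked("   word");

print "capture: '$c_out' substituted=$c_did\n";
print "tweaked: '$t_out' substituted=$t_did\n";
```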
It really does depend on the data, though. When I used my system
dictionary file (which has no spaces), the results looked like this:
one_regex: 13 wallclock secs
two_regex: 9 wallclock secs
capture: 26 wallclock secs
But when I used a variation of the file with a random number (0-99) of
spaces at the beginning and end of each line, it looked like this:
one_regex: 19 wallclock secs
two_regex: 13 wallclock secs
capture: 31 wallclock secs
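So although the capture approach starts out worse, it doesn't degrade as
much when it has to do more work: it slows down by about a fifth (26 to
31 secs), while the other two slow down by nearly half (13 to 19 and 9
to 13 secs).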
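Before comparing timings it is worth checking that the three substitutions produce the same output. The sketch below builds padded lines in memory with 0-99 spaces on each side, as in the test file described above. The space_words file itself wasn't posted, so the `pad` helper and the sample words are hypothetical stand-ins.

```perl
use strict;
use warnings;

# Hypothetical helper: surround a word with 0-99 spaces on each side and
# add a newline, mimicking one line of the padded test file.
sub pad {
    my ($word) = @_;
    return (' ' x int(rand 100)) . $word . (' ' x int(rand 100)) . "\n";
}

# The three substitutions from the benchmark, applied to a copy of the input.
sub one_regex { my $s = shift; $s =~ s/^\s+|\s+$//g;          return $s }
sub two_regex { my $s = shift; $s =~ s/^\s+//; $s =~ s/\s+$//; return $s }
sub capture   { my $s = shift; $s =~ s/^\s*(.*?)\s*$/$1/;      return $s }

my @words      = qw(apple banana cherry);   # arbitrary sample words
my $mismatches = 0;
for my $word (@words) {
    my $line = pad($word);
    my @out  = (one_regex($line), two_regex($line), capture($line));
    # Count the line as a mismatch if any approach leaves extra characters.
    $mismatches++ if grep { $_ ne $word } @out;
}
print $mismatches ? "outputs differ\n" : "all three agree\n";
```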
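use strict;
use warnings;
use Benchmark;

# Input file: the system dictionary, or the padded copy described above.
#my $dict_file = '/usr/share/dict/words';
my $dict_file = './space_words';

# Run each sub 50 times; every run re-reads the whole file.
timethese(50, {
    'one_regex' => \&one_regex,
    'two_regex' => \&two_regex,
    'capture'   => \&capture,
});

# One substitution with alternation; /g is needed here to hit both ends.
sub one_regex {
    open my $dict, '<', $dict_file or die "Can't open $dict_file: $!";
    while (<$dict>) {
        s/^\s+|\s+$//g;
    }
}

# Two anchored substitutions (the /g is redundant, as noted above).
sub two_regex {
    open my $dict, '<', $dict_file or die "Can't open $dict_file: $!";
    while (<$dict>) {
        s/^\s+//g;
        s/\s+$//g;
    }
}

# A single substitution that captures the text between the whitespace.
sub capture {
    open my $dict, '<', $dict_file or die "Can't open $dict_file: $!";
    while (<$dict>) {
        s/^\s*(.*?)\s*$/$1/;
    }
}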
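The benchmark above re-reads the file on every iteration, so part of each time is I/O shared by all three subs. One possible variation, sketched here on the assumption that the word list fits in memory, loads the lines once and uses Benchmark's cmpthese (imported explicitly) to print relative rates. The fallback sample lines are synthetic and only stand in for space_words when that file is absent.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Placeholder path: the padded word list from the post.
my $dict_file = './space_words';

# Read the lines once so the timings mostly reflect the substitutions.
# If the file is missing, fall back to synthetic padded lines (assumption).
my @lines;
if (open my $fh, '<', $dict_file) {
    @lines = <$fh>;
    close $fh;
} else {
    @lines = map { (' ' x ($_ % 100)) . "word$_" . (' ' x (99 - $_ % 100)) . "\n" } 1 .. 1000;
}

# Each sub trims a fresh copy, so @lines stays padded between runs.
# The copy costs the same for all three subs.
my %subs = (
    one_regex => sub { my @copy = @lines; for (@copy) { s/^\s+|\s+$//g } },
    two_regex => sub { my @copy = @lines; for (@copy) { s/^\s+//; s/\s+$// } },
    capture   => sub { my @copy = @lines; for (@copy) { s/^\s*(.*?)\s*$/$1/ } },
);

# A negative count means "run each sub for at least this many CPU seconds".
cmpthese(-1, \%subs);
```

cmpthese prints a table of iterations per second plus the percentage difference between each pair, which makes relative comparisons like the one above easier to read than raw wallclock seconds.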