SPUG: Counting the number of instances in a string using Perl's regexes...
Jason Lamport
jason at strangelight.com
Mon Dec 18 16:02:14 CST 2000
At 11:51 AM -0800 12/18/00, Jonathan Gardner wrote:
>Say I had a string with a whole bunch of characters - "asdfjkhafjklhaf".
>
>How would I go about counting the number of individual characters (say,
>'a')?
>
>One way I found in the book is this:
>@arr = $string =~ m/a/g;
>$number_of_a = scalar @arr;
>
>Is there a more elegant way of going about this? I want to skip the creation
>of @arr
Here's one way:
    $number_of_a = 0;
    $string =~ s/a/ ++$number_of_a , 'a' /eg;
5000 iterations on a 29k-character $string (which contained 1188
'a's) took 33 seconds on my machine. (In comparison, using the
original approach of assigning to @arr took 46 seconds.)
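For arbitrary patterns, two other common idioms avoid both the named array and the rewriting of $string that s///eg performs. The following is a minimal sketch, not a benchmarked recommendation; the sub names count_loop and count_list are made up for illustration.

```perl
use strict;
use warnings;

# Count matches by stepping through the string with //g in scalar context;
# each successful match advances pos() and bumps the counter.
sub count_loop {
    my ($string, $re) = @_;
    my $count = 0;
    $count++ while $string =~ /$re/g;
    return $count;
}

# Count matches by assigning the match list to an empty list; in scalar
# context a list assignment yields the number of elements on its right side.
sub count_list {
    my ($string, $re) = @_;
    my $count = () = $string =~ /$re/g;
    return $count;
}

my $string = "asdfjkhafjklhaf";
print count_loop($string, qr/a/), "\n";
print count_list($string, qr/a/), "\n";
```

Neither version modifies $string. The second still builds a temporary list behind the scenes, so I would expect it to be tidier rather than necessarily faster.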
If you're certain that you'll only be searching for single characters
(not strings of characters, or more complex patterns) this is much
more efficient:
    $number_of_a = ( $string =~ tr/a/a/ );
(And yes, I am aware that those parens are unnecessary, since =~ binds
more tightly than =; they make the code more readable IMO.)
The same 5000 iterations took only 5 seconds using tr.
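To repeat this kind of comparison on your own machine, something like the sketch below with the core Benchmark module should do. The test string is synthetic (an assumption, not the 29k string from the timings above), the sub names are invented for illustration, and absolute numbers will differ. It also uses tr/a//, which returns the same count as tr/a/a/ because an empty replacement list is replicated from the search list.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic test data, 30,000 characters. An assumption for illustration,
# not the string used for the timings quoted above.
my $string = "asdfjkhafjklhaf" x 2000;

# The approach from the book: collect every match in an array.
sub count_array { my @arr = $string =~ m/a/g; return scalar @arr }

# The s///eg approach: bump a counter while replacing each 'a' with 'a'.
sub count_subst {
    my $n = 0;
    $string =~ s/a/ ++$n , 'a' /eg;
    return $n;
}

# The tr approach: single characters only, no regex involved.
sub count_tr { return ( $string =~ tr/a// ) }

# Run each sub for at least one CPU second and print a comparison table.
cmpthese( -1, {
    array => \&count_array,
    subst => \&count_subst,
    tr    => \&count_tr,
} );
```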
>(I am worried about performance. The actual thing I will match is a
>little big and will take some time to write to an array.)
If you're worried about performance, doing a pattern match on a very
large scalar is probably not going to be the fastest approach,
regardless of whether you create an intermediate array or not.
Consider scanning the data as it's being input, rather than storing
it all in a huge scalar and *then* scanning it. E.g. if you're
reading from a file with filehandle IN, this would work:
    $/ = 'a';
    $number_of_a = -1;
    while ( <IN> ) {
        $string .= $_;
        ++$number_of_a;
    }
(The count starts at -1 because the last chunk read normally has no
'a' at the end. Surprisingly, this works as-is even when the file ends
with 'a': the loop body then runs one extra time with $_ set to the
empty string. At least it does in 5.004 -- I'm not sure if I'd trust
this behavior to remain constant in other versions.)
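If you would rather not depend on that end-of-file behavior, one option is to count only the chunks that actually end with the separator. This is a sketch under a few assumptions: the sub name count_while_reading is hypothetical, it takes a lexical filehandle instead of the bareword IN, and the demo reads from an in-memory filehandle, which needs Perl 5.8 or later.

```perl
use strict;
use warnings;

# Read $fh in chunks terminated by $sep and count how many chunks actually
# end with $sep. Returns the count and the reassembled data.
sub count_while_reading {
    my ($fh, $sep) = @_;
    local $/ = $sep;    # restored automatically when the sub returns
    my ($count, $string) = (0, '');
    while ( defined( my $chunk = <$fh> ) ) {
        $string .= $chunk;
        # Only a chunk that ends with the separator holds an occurrence;
        # the final chunk of the input may not.
        ++$count if $chunk =~ /\Q$sep\E\z/;
    }
    return ($count, $string);
}

# Demo on an in-memory filehandle so the sketch runs without a real file.
my $data = "asdfjkhafjklhaf";
open my $in, '<', \$data or die "cannot open in-memory handle: $!";
my ($number_of_a, $string) = count_while_reading($in, 'a');
close $in;
print "$number_of_a\n";
```

Because the counter starts at 0 and only real separators are counted, the result does not depend on whether an empty trailing chunk is returned.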
-jason