SPUG: Counting the number of instances in a string using Perl's regexes...

Jason Lamport jason at strangelight.com
Mon Dec 18 16:02:14 CST 2000


At 11:51 AM -0800 12/18/00, Jonathan Gardner wrote:
>Say I had a string with a whole bunch of characters - "asdfjkhafjklhaf".
>
>How would I go about counting the number of individual characters (say,
>'a')?
>
>One way I found in the book is this:
>@arr = $string =~ m/a/g;
>$number_of_a = scalar @arr;
>
>Is there a more elegant way of going about this? I want to skip the creation
>of @arr

Here's one way:

$number_of_a = 0;
# /e runs the replacement as Perl code: bump the counter, then put the 'a' back.
$string =~ s/a/ ++$number_of_a , 'a' /eg;

5000 iterations on a 29k-character $string (which contained 1188 
'a's) took 33 seconds on my machine. (In comparison, using the 
original approach of assigning to @arr took 46 seconds.)
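
For anyone who wants to try this on a small input, here is a minimal, self-contained sketch of the substitution trick. It also shows a scalar-context //g loop, which is not from the post above but is another common way to count matches without building an array. The variable names $count_subst and $count_loop are made up for illustration.

```perl
use strict;
use warnings;

my $string = "asdfjkhafjklhaf";

# Approach from the post: count as a side effect of a substitution
# that replaces each match with itself.
my $count_subst = 0;
$string =~ s/a/ ++$count_subst, 'a' /eg;

# Alternative (not from the post): in scalar context, //g finds one match
# per call, so the loop body runs once per occurrence and no list is built.
my $count_loop = 0;
$count_loop++ while $string =~ /a/g;

print "$count_subst $count_loop\n";   # prints "3 3"
```

Both forms accept an arbitrary pattern in place of the literal 'a', though the substitution form then needs to put back the matched text rather than a fixed string.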

If you're certain that you'll only be searching for single characters 
(not strings of characters, or more complex patterns) this is much 
more efficient:

$number_of_a = ( $string =~ tr/a/a/ );

(And yes, I am aware that those parens are probably unnecessary; 
they make the code more readable IMO.)
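
A small sketch of the tr approach on the sample string from the question. It relies on tr returning the number of characters it matched; with an empty replacement list the search list is reused, so the string is left unchanged. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $string = "asdfjkhafjklhaf";

my $count_a      = ( $string =~ tr/a/a/ );   # form used in the post
my $count_a_bare = ( $string =~ tr/a// );    # empty replacement list: same count
my $count_af     = ( $string =~ tr/af// );   # counts every 'a' or 'f'

print "$count_a $count_a_bare $count_af\n";  # prints "3 3 6"
```

Note that tr does not interpolate variables, so the characters to count have to be written literally (or the statement built with eval).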

The same 5000 iterations took only 5 seconds using tr.
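
To repeat the comparison on your own machine, something along these lines should work. This is only a sketch: the original 29k-character string is not available, so the test data here is random lowercase letters, and the iteration count of 200 is arbitrary. Absolute timings will differ a great deal from the numbers quoted above.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Made-up test data standing in for the post's 29k-character string.
my $string = join '', map { ('a' .. 'z')[ rand 26 ] } 1 .. 29_000;

my %count;   # last count seen by each approach, for a sanity check
timethese( 200, {
    array => sub { my @arr = $string =~ m/a/g; $count{array} = scalar @arr },
    subst => sub { my $n = 0; $string =~ s/a/ ++$n, 'a' /eg; $count{subst} = $n },
    tr    => sub { $count{tr} = ( $string =~ tr/a/a/ ) },
} );
```

Benchmark may warn about too few iterations for the tr case because it finishes so quickly; raising the count for that entry alone gives a steadier figure.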

>(I am worried about performance. The actual thing I will match is a
>little big and will take some time to write to an array.)

If you're worried about performance, doing a pattern match on a very 
large scalar is probably not going to be the fastest approach, 
regardless of whether you create an intermediate array or not. 
Consider scanning the data as it's being input, rather than storing 
it all in a huge scalar and *then* scanning it.  E.g. if you're 
reading from a file with filehandle IN, this would work:

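$/ = 'a';              # treat 'a' as the input record separator
$number_of_a = -1;     # the final record (text after the last 'a') is counted too
while ( <IN> ) {
	$string .= $_;     # rebuild the full text as we go
	++$number_of_a;
}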

(Surprisingly, this works as-is: when the file ends with 'a', the 
loop body runs one extra time with $_ set to the empty string, so 
the count still comes out right.  At least it does in 5.004 -- I'm 
not sure if I'd trust this behavior to remain constant in other 
versions.)
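
If you'd rather not depend on that end-of-file behavior at all, one option is to count only the records that actually end with the separator. A minimal sketch, assuming a helper named count_while_reading (a made-up name) and an in-memory filehandle standing in for IN:

```perl
use strict;
use warnings;

# Count occurrences of a separator while reading, without depending on
# whether an extra empty record is returned at end of file.
sub count_while_reading {
    my ($fh, $sep) = @_;
    local $/ = $sep;                 # restored automatically when the sub returns
    my ($text, $count) = ('', 0);
    while ( defined( my $record = <$fh> ) ) {
        $text .= $record;
        # Only a record that ends with the separator marks an occurrence.
        ++$count if $record =~ /\Q$sep\E\z/;
    }
    return ($text, $count);
}

# Demonstration: an in-memory filehandle stands in for IN.
my $data = "banana";
open my $in, '<', \$data or die "cannot open in-memory file: $!";
my ($string, $number_of_a) = count_while_reading($in, 'a');
close $in;

print "$number_of_a\n";   # prints "3"
```

Because the count starts at zero and is only bumped for terminated records, it gives the same answer whether or not the input ends with the separator.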

-jason





