SPUG: ifs and whiles and hashes...

Wed Aug 18 18:29:46 CDT 1999

On Wed, Aug 18, 1999 at 03:34:05PM -0700, Ryan Forsythe wrote:
> hi, it's ryan the perl newbie again, with another (probably) simple
> question...
> 
> i'm writing a program which opens a text file, searches each line, and
> extracts data from it and sticks it in a hash...well, that's what it's
> supposed to do.  here's a simplification of the loop i'm having problems
> with:
> 
> while (defined($dbaseLine = <DATABASE>))  {
> 
> 	#code here...
> 
> 	if ($dbaseLine =~ m/^\"(?:.*)\",\"(.*)\",\"(.*)\"/)  {
> 		
> 		$hash{'key1'} = $1;
> 		$hash{'key2'} = $2;
> 		#etc...
> 	}  else  {
> 		next;
> 	}
> }
> 
> however, my program has 26 of these '\"(.*)\",' in the 'if
> ($dbaseLine...' test.  when i run it, it assigns the $dbaseLine variable
> okay, but when it gets to that if test, it locks up and i watch perl's
> cpu time go up to 99%.  i'm assuming it's getting in an infinite loop,
> but why?  i don't understand how an if can cause an infinite loop,
> especially when it doesn't affect the test variable of the while it's
> wrapped in (if that makes any sense to anybody :))

The program isn't getting stuck in the while loop; it's getting stuck in
your regex.  I think it's because you're making the regex engine
backtrack each time it tries to match another \"(.*)\"...

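.* is greedy: each one first grabs everything up to the end of the line.
To make the rest of the regex match, the engine has to give text back
from the earlier greedy matches and try it against the later ones.  With
26 of them, the number of combinations it may try before giving up is
enormous, so a line that doesn't match can look like an infinite loop.
I can't explain it very well (Jeffrey Friedl does a good job in
'Mastering Regular Expressions').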

You almost never want to use .* or .*? to match stuff like that anyway.
You want to use this:

"([^"]*)"

(Look in 'Mastering Regular Expressions' for the discussion about
balanced matches; it talks about the evils of using .*? vs. using
negated character classes.)
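
To see the difference on a small case, here is a minimal sketch.  The
sample line is made up for illustration and is not from your data file:

```perl
use strict;
use warnings;

# Hypothetical sample line (an assumption, not real data).
my $line = '"alpha","beta","gamma"';

# Greedy .* runs to the end of the line, then backs up only as far as
# the last double quote, so it swallows the field separators too.
my ($greedy) = $line =~ /"(.*)"/;

# [^"]* can never match a double quote, so it has to stop at the end of
# the first field and leaves nothing to give back later.
my ($negated) = $line =~ /"([^"]*)"/;

print "greedy:  $greedy\n";     # alpha","beta","gamma
print "negated: $negated\n";    # alpha
```

Since the negated class can't cross a quote, there is only one way for
each field to match, which is what keeps the backtracking in check.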

If you know how many fields you're going to need to match, you don't
even have to create the regex by hand (if you have a huge regex, it's
easy to make small typos which are hard to track down):

my $field_regex = '"([^"]*)"';
my $num_fields  = 20;

# Repeat the single-field pattern $num_fields times, joined by commas.
my $line_regex = join ',', ($field_regex) x $num_fields;

Try using that or a variation thereof and see if it solves your cpu
usage problem...
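
As a rough sketch of how the generated pattern might plug into your
loop body: the sample line, the three-field count, and the key names
below are assumptions for illustration, and unlike your original regex
this version captures every field, including the first:

```perl
use strict;
use warnings;

my $field_regex = '"([^"]*)"';
my $num_fields  = 3;    # kept small for the example
my $line_regex  = join ',', ($field_regex) x $num_fields;

# Hypothetical sample line and hash keys (assumptions).
my $line = '"red","green","blue"';
my %hash;

# In list context a successful match returns the captured fields;
# a failed match returns an empty list, so the if body is skipped.
if (my @fields = $line =~ /^$line_regex/) {
    @hash{qw(key1 key2 key3)} = @fields;
}

print "$_ => $hash{$_}\n" for sort keys %hash;
```

Inside your while loop, the else/next branch isn't needed if the if
block is the last thing in the loop body.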

HTH,

Doug

-- 
Smithers: I'm afraid we have a bad image, Sir.  Market research shows
          people see you as somewhat of an ogre.
   Burns: I ought to club them and eat their bones!
