SPUG: ifs and whiles and hashes...

Wed Aug 18 18:51:05 CDT 1999

>> Message submitted at: Wed Aug 18 16:51:05 PDT 1999

According to Doug Beaver:
> 
> On Wed, Aug 18, 1999 at 03:34:05PM -0700, Ryan Forsythe wrote:
> > hi, it's ryan the perl newbie again, with another (probably) simple
> > question...
> > 
> > i'm writing a program which opens a text file, searches each line, and
> > extracts data from it and sticks it in a hash...well, that's what it's
> > supposed to do.  here's a simplification of the loop i'm having problems
> > with:
> > 
> > while (defined($dbaseLine = <DATABASE>))  {
> > 
> > 	#code here...
> > 
> > 	if ($dbaseLine =~ m/^\"(?:.*)\",\"(.*)\",\"(.*)\"/)  {
> > 		
> > 		$hash{'key1'} = $1;
> > 		$hash{'key2'} = $2;
> > 		#etc...
> > 	}  else  {
> > 		next;
> > 	}
> > }
> > 
> > however, my program has 26 of these '\"(.*)\",' in the 'if
> > ($dbaseLine...' test.  when i run it, it assigns the $dbaseLine variable
> > okay, but when it gets to that if test, it locks up and i watch perl's
> > cpu time go up to 99%.  i'm assuming it's getting in an infinite loop,
> > but why?  i don't understand how an if can cause an infinite loop,
> > especially when it doesn't affect the test variable of the while it's
> > wrapped in (if that makes any sense to anybody :))
> 
> The program isn't getting stuck on the while loop, it's getting stuck on
> your regex.  I think it's because you're making the regex engine
> backtrack each time it tries to match another \"(.*)\"...
> 
> .* is greedy, so in order to match the regex against the string, it
> keeps having to chop off text from previous greedy matches and try to
> use it on the new greedy matches.  I can't explain it very well (Jeffrey
> Friedl does a good job in 'Mastering Regular Expressions'), but it'll
> take forever to run.
> 
> You almost never want to use .* or .*? to match stuff like that
> anyway; you want to use this:
> 
> "([^"]*)"

- or ([^"]+), depending on the minimal acceptable match.

Good idea, Doug, but according to his code, he was trying to match up
to the next \" sequence, not the next ".  He may have been escaping
the " characters unnecessarily, in which case your approach above
would be appropriate, but I recommended the .*? approach on the basis
of his apparent need to match up to the next \" sequence.
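
To make the difference between the two approaches concrete, here is a
minimal sketch.  The sample line and the variable names are made up for
illustration (the real data file was not posted), and the patterns
assume the fields are delimited by plain " characters.  It shows how
.*? will stretch across quotes to force a match, while the negated
character class refuses to:

```perl
use strict;
use warnings;

# Hypothetical line with four quoted fields; the pattern expects only two.
my $line = '"a","b","c","d"';

# Negated character class: a field can never contain a quote, so a
# two-field pattern anchored at both ends cannot match this line.
my @by_class = $line =~ /^"([^"]*)","([^"]*)"$/;

# Non-greedy .*?: the last field is allowed to stretch across quotes
# until the rest of the pattern fits, so the match succeeds.
my @by_lazy = $line =~ /^"(.*?)","(.*?)"$/;

print "class: ", scalar(@by_class), " fields\n";   # class: 0 fields
print "lazy:  $by_lazy[1]\n";                      # lazy:  b","c","d
```

Which behavior is right depends on whether a field may legitimately
contain the delimiter.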

> 
> (Look in 'Mastering Regular Expressions' for the discussion about
> balanced matches, it talks about the evils of using .*? vs using negated
> character classes.)
> 
> If you know how many fields you're going to need to match, you don't
> even have to create the regex by hand (if you have a huge regex, it's
> easy to make small typos which are hard to track down):
> 
> my $field_regex = '"([^"]*)"';
> my $num_fields  = 20;
> my @tmp;
> 
> until (scalar @tmp >= $num_fields) {
>     push @tmp, $field_regex;
> }
> 
> my $line_regex = join ',', @tmp;
> 
> Try using that or a variation thereof and see if it solves your cpu
> usage problem...
> 
> HTH,
> 
> Doug
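
For what it's worth, here is one possible way to finish Doug's idea
and actually use the generated pattern.  This is only a sketch under
some assumptions: the sample line, the key1/key2/... naming, and the
field count of 3 (instead of Ryan's 26) are all invented for brevity,
and the x repetition operator stands in for the until loop:

```perl
use strict;
use warnings;

my $field_regex = '"([^"]*)"';
my $num_fields  = 3;    # assumption: kept small here; the real file has 26

# The x operator repeats a list, so no loop is needed to build the pieces.
my $line_regex = join ',', ($field_regex) x $num_fields;

# Compile once, anchored at the start of the line.
my $compiled = qr/^$line_regex/;

my $line = '"alpha","beta","gamma"';    # hypothetical input line
my %hash;

# In list context a successful match returns the captured fields.
if (my @fields = $line =~ $compiled) {
    # Hash slice: key1 => first field, key2 => second field, and so on.
    @hash{ map { "key$_" } 1 .. @fields } = @fields;
}

print "$_ => $hash{$_}\n" for sort keys %hash;
```

Capturing into a list this way also avoids writing out $1 through $26
by hand.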

*==================================================================*
| Tim Maher, PhD  CEO, Consultix &    (206) 781-UNIX/8649          |
|      Pacific Software Gurus, Inc.   Email: tim at consultix-inc.com |
| "The UNIX/Perl Training Experts"    http://www.consultix-inc.com |
|CLASSES: Shell+Utils: 8/23-27; Perl: 8/30-9/1;  Perl Modules: 9/2;|
*==================================================================*
