SPUG: ifs and whiles and hashes...

Darren/Torin/Who Ever... torin at daft.com
Thu Aug 19 03:02:09 CDT 1999


Ryan Forsythe <ryan2 at webrocket.net>, in an immanent manifestation of deity, wrote:
>	if ($dbaseLine =~ m/^\"(?:.*)\",\"(.*)\",\"(.*)\"/)  {
>		$hash{'key1'} = $1;
>		$hash{'key2'} = $2;
>		#etc...
>	}  else  {
>however, my program has 26 of these '\"(.*)\",' in the 'if
>($dbaseLine...' test.  when i run it, it assigns the $dbaseLine variable
>okay, but when it gets to that if test, it locks up and i watch perl's
>cpu time go up to 99%.  i'm assuming it's getting in an infinite loop,
>but why?  i don't understand how an if can cause and infinite loop,
>especially when it doesn't affect the test variable of the while it's
>wrapped in (if that makes any sense to anybody :))

The if isn't the infinite loop, the regex is.  And theoretically, it's not
an infinite loop.  But it probably won't finish before the projected
heat-death of the universe.

The problem is backtracking.  I think (someone please correct me if I'm
wrong) that with each /\"(.*)\",/ you add, you are increasing your
runtime exponentially: every greedy .* first grabs the rest of the line,
and when the match fails further on, the engine retries every
combination of places where each group could end.  Assuming that you
want each substring to stop as soon as it hits the sequence "," then
your regex will work much quicker if you use
/^\"(?:.*?)\",\"(.*?)\",\"(.*?)\"/.  This says to minimally match what's
inside quotes.  It stops as soon as it can complete a match rather than
trying for the largest match.  If you don't allow there to be escaped
quotes in your matched strings, it's much quicker to say:
/^\"(?:[^"]*)\",\"([^"]*)\",\"([^"]*)\"/.  That tells it that you want
everything to the right of a " until there is another ".
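
As a rough sketch of the negated-character-class approach, here is one
way to build the pattern for any number of fields instead of typing out
26 groups by hand.  The helper name parse_quoted_fields and the sample
line are made up for illustration, and it assumes no field ever contains
a double quote:

```perl
use strict;
use warnings;

# Hypothetical helper: match the first $n double-quoted fields of a line,
# separated by commas. Assumes no field contains a double quote.
sub parse_quoted_fields {
    my ($line, $n) = @_;
    # [^"]* cannot run past a closing quote, so each group can match in
    # only one way and a malformed line is rejected without a long search.
    my $pattern = join ',', ('"([^"]*)"') x $n;
    my @fields = $line =~ /^$pattern/;
    return @fields;    # empty list when the line does not match
}

my $dbaseLine = '"alpha","beta","gamma"';    # made-up sample data
my %hash;
if (my @fields = parse_quoted_fields($dbaseLine, 3)) {
    # Mirrors the original test: the first field is skipped.
    @hash{qw(key1 key2)} = @fields[1, 2];
}
print "$hash{key1} $hash{key2}\n";    # prints "beta gamma"
```

For the real data you would call it with 26 and pick out whichever
fields you need from the returned list.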

Note that you're just parsing CSV (comma separated values), a known
problem.  I asked for the CSV modules on CPAN and got the following:
Module          Bundle::DBD::CSV (N/A)
Module          DBD::CSV        (JWIED/DBD-CSV-0.1021.tar.gz)
Module          Text::CSV       (ALANCITT/Text-CSV-0.01.tar.gz)
Module          Text::CSV_XS    (JWIED/Text-CSV_XS-0.20.tar.gz)

You probably aren't interested in the DBD module unless you're
incorporating your data into a real database.  Text::CSV and
Text::CSV_XS would probably work for you though.

I saw the following regex in the Owl book (read it, learn it, love it):
push(@fields, $+) while $text =~ m{
    "([^\"\\]*(?:\\.[^\"\\]*)*)",?  # Standard quoted string (with possible comma)
  | ([^,]+),?                       # or   up to next comma (with possible comma)
  | ,                               # or   just a comma
}gx;

This isn't quite what you want but it's close.  Look on pages 205-209 of
the Owl Book (Mastering Regular Expressions by Jeffrey Friedl, published
by ORA, 1997; ISBN: 1-56592-257-3) for an explanation of this.
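
To see what that pattern does with a line, here is a small sketch that
wraps it in a sub.  The name split_csv_line and the sample input are
assumptions for illustration; the trailing-comma check is an extra step
not in the snippet quoted above.  Note that backslash escapes inside a
quoted field are kept as-is, not unescaped:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the pattern quoted above.
sub split_csv_line {
    my ($text) = @_;
    my @fields;
    # $+ holds whichever capture group matched on this pass
    # (undef when only the bare-comma alternative matched).
    push(@fields, $+) while $text =~ m{
        "([^\"\\]*(?:\\.[^\"\\]*)*)",?  # quoted string, optional trailing comma
      | ([^,]+),?                       # or unquoted text up to the next comma
      | ,                               # or an empty field
    }gx;
    # Extra step (not in the quoted snippet): a trailing comma means
    # one more empty field at the end of the line.
    push(@fields, undef) if substr($text, -1, 1) eq ',';
    return @fields;
}

# Made-up sample line with an embedded comma and escaped quotes.
my @fields = split_csv_line('"Doe, John",42,"say \"hi\""');
print scalar(@fields), " fields\n";
print "$_\n" for @fields;
```

Because the quoted-string alternative only ever moves forward through
characters that can't end the field, there is no room for the runaway
backtracking described above.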

-- 
<torin at daft.com> <http://www.daft.com/~torin> <torin at debian.org> <torin at io.com>
Darren Stalder/2608 Second Ave, @282/Seattle, WA 98121-1212/USA/+1-800-921-4996
@ Sysadmin, webweaver, postmaster for hire. C/Perl/CGI/Pilot programmer/tutor @
@		     Make a little hot-tub in your soul.		      @



 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    POST TO: spug-list at pm.org        PROBLEMS: owner-spug-list at pm.org
 Seattle Perl Users Group (SPUG) Home Page: http://www.halcyon.com/spug/
 SUBSCRIBE/UNSUBSCRIBE: Replace ACTION below by subscribe or unsubscribe
        Email to majordomo at pm.org: ACTION spug-list your_address
