SPUG: ifs and whiles and hashes...
Darren/Torin/Who Ever...
torin at daft.com
Thu Aug 19 03:02:09 CDT 1999
-----BEGIN PGP SIGNED MESSAGE-----
Ryan Forsythe <ryan2 at webrocket.net>, in an immanent manifestation of deity, wrote:
> if ($dbaseLine =~ m/^\"(?:.*)\",\"(.*)\",\"(.*)\"/) {
>
> $hash{'key1'} = $1;
> $hash{'key2'} = $2;
> #etc...
> } else {
>
>however, my program has 26 of these '\"(.*)\",' in the 'if
>($dbaseLine...' test. when i run it, it assigns the $dbaseLine variable
>okay, but when it gets to that if test, it locks up and i watch perl's
>cpu time go up to 99%. i'm assuming it's getting in an infinite loop,
>but why? i don't understand how an if can cause and infinite loop,
>especially when it doesn't affect the test variable of the while it's
>wrapped in (if that makes any sense to anybody :))
The if isn't the infinite loop, the regex is. And theoretically, it's not
an infinite loop. But it probably won't finish before the projected
heat-death of the universe.
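The problem is back-tracking. I think (someone please correct me if I'm
wrong) that each additional greedy /\"(.*)\",/ multiplies the number of
ways the engine can try to split the line, so the worst-case runtime
grows exponentially with the number of groups (on the order of n^k for
a line of length n and k groups). Assuming that you want each substring to stop as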
soon as it hits the sequence "," then your regex will work much quicker
if you use /^\"(?:.*?)\",\"(.*?)\",\"(.*?)\"/. This says to minimally
match what's inside quotes. It stops as soon as it can complete a match
rather than trying for the largest match. If your matched strings can't
contain escaped quotes, it's much quicker to say:
/^\"(?:[^"]*)\",\"([^"]*)\",\"([^"]*)\"/. That tells it that you want
everything to the right of a " until there is another ".
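Here is a minimal sketch of the character-class approach, cut down to
three fields. The sample line is made up for illustration; the real
lines have 26 quoted fields, and the key names are the ones from the
quoted code.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical sample record (not from the original program).
my $dbaseLine = '"id42","Seattle","WA"';
my %hash;

# [^"]* can never run past a closing quote, so each field has only
# one possible place to end and a bad line fails quickly.
if ($dbaseLine =~ m/^"(?:[^"]*)","([^"]*)","([^"]*)"/) {
    $hash{'key1'} = $1;
    $hash{'key2'} = $2;
} else {
    warn "line did not match: $dbaseLine\n";
}

print "$hash{'key1'} / $hash{'key2'}\n";   # prints "Seattle / WA"
```

The first field is matched but not captured, same as in the original
test, so $1 is the second field on the line.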
Note that you're just parsing CSV (comma separated values), a known
problem. I asked for the CSV modules on CPAN and got the following:
Module Bundle::DBD::CSV (N/A)
Module DBD::CSV (JWIED/DBD-CSV-0.1021.tar.gz)
Module Text::CSV (ALANCITT/Text-CSV-0.01.tar.gz)
Module Text::CSV_XS (JWIED/Text-CSV_XS-0.20.tar.gz)
You probably aren't interested in the DBD module unless you're
incorporating your data into a real database. Text::CSV and
Text::CSV_XS would probably work for you though.
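A rough sketch of what using Text::CSV might look like, assuming the
module is installed from CPAN (it is not part of the core Perl
distribution). The sample record is made up; new(), parse(), fields()
and error_input() are the module's documented methods.

```perl
#!/usr/bin/perl -w
use strict;
use Text::CSV;   # from CPAN, must be installed separately

my $csv  = Text::CSV->new();
my $line = '"id42","Seattle, WA","98121"';   # hypothetical record

my @fields;
if ($csv->parse($line)) {
    # parse() succeeded; fields() returns the unquoted values
    @fields = $csv->fields();
} else {
    warn "could not parse: ", $csv->error_input(), "\n";
}

print join(' | ', @fields), "\n";   # prints "id42 | Seattle, WA | 98121"
```

Note that the comma inside the second quoted field does not split it,
which is exactly the case a hand-rolled split(/,/) gets wrong.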
I saw the following regex in the Owl book (read it, learn it, love it):
push(@fields, $+) while $text =~ m{
"([^\"\\]*(?:\\.[^\"\\]*)*)",? # Standard quoted string (with possible comma)
| ([^,]+),? # or up to next comma (with possible comma)
| , # or just a comma
}gx;
This isn't quite what you want, but it's close. Look on pages 205-209 of
the Owl Book (Mastering Regular Expressions by Jeffrey Friedl, published
by ORA, 1997; ISBN: 1-56592-257-3) for an explanation of this.
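To try that regex out, it can be wrapped in a small sub. This is only a
sketch: split_csv_line is a name I made up, and the sample line is
invented. The regex itself is the one quoted above.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper around the regex quoted from the Owl book.
sub split_csv_line {
    my ($text) = @_;
    my @fields;
    # $+ is whichever set of parentheses matched on this pass.
    push(@fields, $+) while $text =~ m{
        "([^\"\\]*(?:\\.[^\"\\]*)*)",?   # standard quoted string (with possible comma)
      | ([^,]+),?                        # or up to next comma (with possible comma)
      | ,                                # or just a comma
    }gx;
    return @fields;
}

# Made-up sample: a quoted field with a comma, a bare field,
# and a quoted field containing backslash-escaped quotes.
my @fields = split_csv_line('"Seattle, WA",98121,"say \"hi\""');

print scalar(@fields), " fields\n";   # prints "3 fields"
print "$_\n" for @fields;
```

The backslashes in front of escaped quotes stay in the captured field,
and an empty field (the lone-comma case) should come back as undef since
no parentheses matched, so check for that before printing under -w.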
Darren
- --
<torin at daft.com> <http://www.daft.com/~torin> <torin at debian.org> <torin at io.com>
Darren Stalder/2608 Second Ave, @282/Seattle, WA 98121-1212/USA/+1-800-921-4996
@ Sysadmin, webweaver, postmaster for hire. C/Perl/CGI/Pilot programmer/tutor @
@ Make a little hot-tub in your soul. @
-----BEGIN PGP SIGNATURE-----
Version: 2.6.3a
Charset: noconv
Comment: Processed by Mailcrypt 3.5.1, an Emacs/PGP interface
iQCVAwUBN7u53Y4wrq++1Ls5AQH8LAP/ZjaxW20/odYfwQkR5okzZmiX2qAtxB67
BEQqokjq7gf6JbUvWC+pcGvJkyPQpq8wdDGhrmjHGpvqhBUWyoRzTbhaIwqlRNgJ
D4n3jjUCqsFUzkm9i+PhLbzkJ2nhdeaeDH1zcxMRiKWqVNqfPtJv01kxyG0a9fCX
KvFCOnfuHQ8=
=8Up+
-----END PGP SIGNATURE-----