[Chicago-talk] Performance issue

Jay Strauss me at heyjay.com
Wed Apr 20 20:42:56 PDT 2011


I was under the impression that regexes were slow.

I changed to your suggestion.  I think the real problem was using pipes.
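
For reference, a minimal sketch of what the direct-replacement approach might look like. The revised script was not posted, so this is an illustration under assumptions rather than the code actually run: the sub name csv_line_to_pipe and the sample line are made up, and it assumes every field is double-quoted with no escaped quotes inside fields.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption, not from the thread):
# turn one line of fully quoted CSV into pipe-delimited text.
sub csv_line_to_pipe {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;    # drop the line ending (LF or CRLF)
    $line =~ s/\A"//;        # drop the opening quote of the first field
    $line =~ s/"\z//;        # drop the closing quote of the last field
    $line =~ tr/|/-/;        # embedded "|" becomes "-" (must happen first)
    $line =~ s/","/|/g;      # the "," delimiter becomes "|", no split/join
    return $line;
}

# Made-up sample line, for illustration only.
print csv_line_to_pipe(qq{"a|b","c","d"\n}), "\n";    # prints a-b|c|d
```

To process a whole file, the same sub could be called from a loop such as while (my $line = <>) { print csv_line_to_pipe($line), "\n" }. The embedded pipes have to be converted before the delimiters, otherwise the new delimiters would be turned into "-" as well. Data with escaped quotes or a quote-comma-quote sequence inside a field would need a real parser such as Text::CSV.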

Thank you
Jay

On Wed, Apr 20, 2011 at 11:49 AM, tiger peng <tigerpeng2001 at yahoo.com> wrote:

> why split and join? try replace '","' with '|' directly.
>
>
> ------------------------------
> *From:* Jay Strauss <me at heyjay.com>
> *To:* Chicago.pm chatter <chicago-talk at pm.org>
> *Sent:* Wed, April 20, 2011 11:37:47 AM
> *Subject:* [Chicago-talk] Performance issue
>
> Hi all,
>
> I have a CSV file with quoted strings (i.e. "field1","field2",...).  The
> file is 3.5M records.  I'm running Strawberry Perl on Win7 (not that I think
> that's the issue).  What I need to do is convert any embedded "|" to "-" and
> convert the field delimiter "," (quote, comma, quote) to "|".  I know there
> are CPAN modules for parsing CSV, but my situation is pretty straightforward.
> I'm doing:
>
> use strict;
>
> while (<>) {
>     # Remove the first and last ", and remove the \n at the same time
>     $_ = substr $_, 1, -2;
>
>     s/\|/-/g;    # Change embedded "|" into "-"
>
>     my @words = split(/","/, $_, -1);    # Split on the remaining ","
>
>     print join("|", @words), "\n";
> }
>
> But it's taking what seems like a long time to run (about 15 mins).  I'd think
> this would be an ideal use for Perl, and that it could rip through the file
> lickety-split.
>
> Am I doing something costly in the script above that is making it run so
> slowly?
>
> Thanks
> Jay