[Chicago-talk] Performance issue

tiger peng tigerpeng2001 at yahoo.com
Wed Apr 20 09:49:03 PDT 2011


Why split and join? Try replacing '","' with '|' directly.
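A minimal sketch of that idea, assuming every field is double-quoted and no field contains the sequence '","' itself. The helper name convert_line and the sample line are made up for illustration, and nothing here has been timed against the 3.5M-record file, so treat any speedup as something to measure rather than a given:

```perl
use strict;
use warnings;

# Hypothetical helper: turn one quoted-CSV line into a pipe-delimited line
# without building an intermediate list via split/join.
sub convert_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;    # drop the line ending (LF or CRLF)
    $line =~ s/\A"//;        # drop the leading quote
    $line =~ s/"\z//;        # drop the trailing quote
    $line =~ tr/|/-/;        # change embedded "|" into "-"
    $line =~ s/","/|/g;      # replace the field delimiter directly
    return $line;
}

# Made-up sample record, not taken from the real file
print convert_line(qq{"a|b","c","d"\n}), "\n";    # prints a-b|c|d
```

In the original loop this could be called as `print convert_line($_), "\n" while <>;`. The tr/|/-/ step has to run before the delimiter substitution, otherwise the newly inserted "|" separators would be turned into "-" as well.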





________________________________
From: Jay Strauss <me at heyjay.com>
To: Chicago.pm chatter <chicago-talk at pm.org>
Sent: Wed, April 20, 2011 11:37:47 AM
Subject: [Chicago-talk] Performance issue

Hi all,

I have a CSV file with quoted strings (i.e. "field1","field2",...).  The file
is 3.5M records.  I'm running Strawberry Perl on Win7 (not that I think that's
the issue).  What I need to do is convert any embedded "|" to "-" and convert the
field delimiter ' "," ' to "|".  I know there are CPAN modules for parsing CSV, but
my situation is pretty straightforward.  I'm doing:

use strict;

while (<>) {
    # Remove the first and last ", and remove the \n at the same time
    $_ = substr $_, 1, -2;

    s/\|/-/g;    # Change embedded "|" into "-"

    my @words = split(/","/, $_, -1);    # Split on the remaining ","

    print join("|", @words), "\n";
}

But it's taking what seems like a long time to run (about 15 minutes).  I'd think
this would be an ideal use for Perl, and that it could rip through the file
lickety-split.

Am I doing something costly in the script above that is making it run so slowly?

Thanks
Jay