<html><head><style type="text/css"><!-- DIV {margin:0px;} --></style></head><body><div style="font-family:Courier New,courier,monaco,monospace,sans-serif;font-size:10pt">Why split and join? Try replacing '","' with '|' directly.<br><div><br></div><div style="font-family:Courier New, courier, monaco, monospace, sans-serif;font-size:10pt"><br><div style="font-family:times new roman, new york, times, serif;font-size:12pt"><font face="Tahoma" size="2"><hr size="1"><b><span style="font-weight: bold;">From:</span></b> Jay Strauss &lt;me@heyjay.com&gt;<br><b><span style="font-weight: bold;">To:</span></b> Chicago.pm chatter &lt;chicago-talk@pm.org&gt;<br><b><span style="font-weight: bold;">Sent:</span></b> Wed, April 20, 2011 11:37:47 AM<br><b><span style="font-weight: bold;">Subject:</span></b> [Chicago-talk] Performance issue<br></font><br>Hi all,<div><br></div><div>I have a CSV file with quoted strings (e.g. "field1","field2",...). The file is 3.5M
records. I'm running Strawberry Perl on Windows 7 (not that I think that's the issue). What I need to do is convert any embedded "|" to "-" and convert the field delimiter "," (including the quotes) to "|". I know there are CPAN modules for parsing CSV, but my situation is pretty straightforward. I'm doing:</div>
<div><br></div><div><div>use strict;</div><div><br></div><div>while(<>) {</div><div><br></div><div><span class="Apple-tab-span" style="white-space:pre;"> </span>$_ = substr $_, 1, -2; <span class="Apple-tab-span" style="white-space:pre;"> </span>#<span class="Apple-tab-span" style="white-space:pre;"> </span>Remove first and last ", and remove</div>
<div><span class="Apple-tab-span" style="white-space:pre;"> </span> #<span class="Apple-tab-span" style="white-space:pre;"> </span>the \n at the same time</div><div><span class="Apple-tab-span" style="white-space:pre;"> </span>#</div>
<div><br></div><div><span class="Apple-tab-span" style="white-space:pre;"> </span>s/\|/-/g;<span class="Apple-tab-span" style="white-space:pre;"> </span> # <span class="Apple-tab-span" style="white-space:pre;"> </span>Change embedded "|" into "-"</div>
<div><br></div><div><span class="Apple-tab-span" style="white-space:pre;"> </span>my @words = split(/\",\"/,$_,-1);<span class="Apple-tab-span" style="white-space:pre;"> </span># split on the remaining ","</div>
<div><br></div><div><span class="Apple-tab-span" style="white-space:pre;"> </span>print join("|", @words),"\n";</div><div>}</div></div><div><br></div><div>But it's taking what seems like a long time to run (about 15 minutes). I'd think this would be an ideal use for Perl, and that it could rip through the file lickety-split.</div>
<div><br></div><div>Am I doing something costly in the script above that is making it run so slowly?</div><div><br></div><div>Thanks</div><div>Jay</div><div><br></div>
</div></div>
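<div><br></div>
<div>A minimal sketch of the direct replacement, assuming every line has the form "field1","field2",... with no escaped quotes inside a field. The name convert_line is made up for illustration, and reading from STDIN is an assumption about how the file is fed in. Whether this is noticeably faster on the 3.5M-record file would have to be measured.</div>
<div><br></div>
<pre>
use strict;
use warnings;

# Hypothetical helper (not part of the original script): turns one
# line of the form "f1","f2",... into f1|f2|...
sub convert_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;   # drop the line ending (also copes with CRLF)
    $line =~ s/\A"//;       # drop the leading quote
    $line =~ s/"\z//;       # drop the trailing quote
    $line =~ tr/|/-/;       # embedded "|" becomes "-"
    $line =~ s/","/|/g;     # field delimiter "," becomes "|", no split/join
    return $line;
}

# Assumption: input arrives on STDIN and output goes to STDOUT.
while (defined(my $line = readline(*STDIN))) {
    print convert_line($line), "\n";
}
</pre>
<div><br></div>
<div>The tr step has to run before the delimiter substitution, otherwise the "|" characters that were just inserted would be turned into "-" as well. If a field can ever contain "," in its data, this shortcut breaks and a real CSV parser from CPAN is the safer route.</div>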
</div></body></html>