[Chicago-talk] Performance issue

Warren Lindsey warren.lindsey at gmail.com
Wed Apr 20 10:03:53 PDT 2011


I assume from your use of <> and print without a filehandle that you are going through pipes, reading from STDIN and writing to STDOUT. I suspect that opening explicit input and output filehandles will be more efficient, since there is less data movement between buffers.
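Here is a minimal sketch of that suggestion. It assumes the input and output paths are passed as two command-line arguments. The script name convert.pl and the helpers convert_record and convert_file are hypothetical. The per-record steps mirror Jay's script, but the leading and trailing quotes are removed with anchored substitutions after chomp, so a final line with no newline is handled too. Whether this is measurably faster than piping depends on the system and has not been timed here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the same per-record steps as the original script.
sub convert_record {
    my ($line) = @_;
    chomp $line;                          # drop the trailing newline
    $line =~ s/\A"//;                     # drop the leading quote
    $line =~ s/"\z//;                     # drop the trailing quote
    $line =~ s/\|/-/g;                    # embedded "|" becomes "-"
    my @fields = split /","/, $line, -1;  # split on the "," delimiters
    return join('|', @fields) . "\n";
}

# Hypothetical helper: read $in_path and write converted records to
# $out_path through explicitly opened filehandles, not STDIN/STDOUT.
sub convert_file {
    my ($in_path, $out_path) = @_;
    open my $in,  '<', $in_path  or die "Cannot open $in_path: $!";
    open my $out, '>', $out_path or die "Cannot open $out_path: $!";
    while (my $line = <$in>) {
        print {$out} convert_record($line);
    }
    close $in;
    close $out or die "Cannot close $out_path: $!";
    return;
}

# Usage (hypothetical file names): perl convert.pl input.csv output.txt
convert_file(@ARGV) if @ARGV == 2;
```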

Cheers,
Warren

On Apr 20, 2011, at 11:37 AM, Jay Strauss <me at heyjay.com> wrote:

> Hi all,
> 
> I have a CSV file with quoted strings (i.e. "field1","field2",...).  The file has 3.5M records.  I'm running Strawberry Perl on Windows 7 (not that I think that's the issue).  What I need to do is convert any embedded "|" to "-", and convert the field delimiter ' "," ' to "|".  I know there are CPAN modules for parsing CSV, but my situation is pretty straightforward.  I'm doing:
> 
> use strict;
> use warnings;
> 
> while (<>) {
> 
> 	chomp;			#	Remove the trailing newline
> 	$_ = substr $_, 1, -1;	#	Remove the first and last "
> 
> 	s/\|/-/g;		#	Change embedded "|" into "-"
> 
> 	my @words = split(/","/, $_, -1);	# Split on the remaining ","
> 
> 	print join("|", @words), "\n";
> }
> 
> But it's taking what seems like a long time to run (about 15 minutes).  I'd think this would be an ideal use for Perl, and that it could rip through the file lickety-split.
> 
> Am I doing something costly in the script above that is making it run so slowly?
> 
> Thanks
> Jay
> 
> _______________________________________________
> Chicago-talk mailing list
> Chicago-talk at pm.org
> http://mail.pm.org/mailman/listinfo/chicago-talk
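One possible per-line saving, offered as an untested suggestion: the split/join pair builds a list of field scalars for every record. The same result can be reached by editing the line in place. tr/|/-/ replaces characters without invoking the regex engine, and a single s/","/|/g rewrites the delimiters. This sketch assumes every field is wrapped in double quotes and that no field itself contains the sequence ",". The name convert_line is hypothetical. It keeps the original script's filter style (files named on the command line, output to STDOUT), but only runs the loop when arguments are given.

```perl
use strict;
use warnings;

# Hypothetical helper: convert one quoted-CSV record without building
# a list of fields. Assumes all fields are double-quoted and that no
# field contains the three-character sequence ",".
sub convert_line {
    my ($line) = @_;
    chomp $line;
    $line =~ tr/|/-/;     # embedded "|" becomes "-" (no regex engine needed)
    $line =~ s/\A"//;     # drop the leading quote
    $line =~ s/"\z//;     # drop the trailing quote
    $line =~ s/","/|/g;   # each "," delimiter becomes |
    return "$line\n";
}

# Usage (hypothetical names): perl convert.pl input.csv > output.txt
if (@ARGV) {
    print convert_line($_) while <>;
}
```

To check whether this actually beats the split/join version on real data, the core Benchmark module's cmpthese function can time both approaches on a sample of lines.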

