[Thousand-Oaks-pm] CSV code example

Chuck Hardin chardin at gmail.com
Sat Sep 15 12:28:37 PDT 2012


Beats me, jefe.  Like I said, I posted this on Barry Brevik's behalf.  I didn't look at it very hard.

Best,
CCH




On Sep 15, 2012, at 10:15 AM, Tony wrote:

> 
> 
> Chuck,  how should the 4th line look?  formatting wise.
> 
> Thanks,
> 
> Tony
> 
> 
> On Saturday 15 September 2012 06:26:18 Chuck Hardin wrote:
>> The following is a code example from TO-PM member Barry Brevik:
>> 
>> Since we did not meet this month, let me throw some code at 'ya.
>> 
>> I frequently have to make one-off utilities to parse customer CSV files.
>> I try to avoid using modules for really simple things, so I use the
>> subroutine shown below. I have received some pretty weird formatting,
>> and this code handles most of them. Keep in mind that the rows in the
>> __DATA__ section represent actual formatting of files that I have
>> received.
>> 
>> P.S. notice that the 4th row fails to parse... I have not dealt with it
>> yet. Anyone with improvements or nasty comments should go ahead and
>> post!
>> 
>> #
>> # parseCSV.pl
>> #
>> # This is a test wrapper for the parseCSV() subroutine.
>> #
>> use strict;
>> use warnings;
>> 
>> # Un-buffer STDOUT.
>> select((select(STDOUT), $| = 1)[0]);
>> 
>> while (<DATA>)
>> {
>>  print "csvLine before parseCSV: $_\n\n";
>>  my @csvArray = parseCSV($_);
>>  print "[$_]\n" foreach (@csvArray);
>>  print "\n\n";
>> }
>> 
>> #----------------------------------------------------------
>> # CALL with a CSV line.
>> #
>> # This routine parses a single CSV line and handles ',' chars embedded
>> # in fields as well as extraneous spaces in between dbl quoted fields.
>> # It is also resistant to extra dbl quotes within dbl quoted fields,
>> # but it will remove them.
>> #
>> sub parseCSV
>> {
>> my @columns = ();
>> 
>> if (my $csvline = shift)
>> {
>>   # If the CSV line has any portion with 2 or more sequential commas
>> ','
>>   # then replace the commas with pipe '|' characters.
>>   while ($csvline =~ /(,{2,})/)
>>   {
>>     my $commas = $1;
>>     my $pipes  = '|' x length($1);
>>     $csvline =~ s/^(.*)$commas(.*)/$1$pipes$2/;
>>   }
>> 
>>   # If there are any commas embedded in the CSV quoted fields, replace
>> them
>>   # with pipe '|' characters.
>>   $csvline =~ s/("[^",]+?),([^",]+?")/$1|$2/g;
>>   @columns = split ',', $csvline;   # Split the quoted fields at the
>> remaining commas.
>>   s/\|/,/g       foreach @columns;  # Replace pipe characters with
>> commas.
>>   s/\x22//g      foreach @columns;  # Remove double quotes from each
>> column.
>>   s/^\s+|\s+$//g foreach @columns;  # Remove leading and trailing
>> white space from each column.
>> }
>> 
>> return @columns;
>> }
>> 
>> __DATA__
>> "col 1","col 2","col 3","col 4"
>> "col 1"",""col,,,,,,2"",""col ,,3","col "4""
>> "col 1","col  '2'",col  '3'  ,"col, 4"
>> "col 1,a,1",col 2,"col,3,b",col 4,
> 
> _______________________________________________
> ThousandOaks.pm - Thousand Oaks Perl Mongers
> Website:      http://thousand-oaks.pm.org/
> Mailing list: http://mail.pm.org/mailman/listinfo/thousand-oaks-pm

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 495 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <http://mail.pm.org/pipermail/thousand-oaks-pm/attachments/20120915/5bbb843c/attachment.bin>


More information about the Thousand-Oaks-pm mailing list