[Thousand-Oaks-pm] CSV code example

Sat Sep 15 10:53:28 PDT 2012

Hi Chuck, I had some time while eating lunch today.

Tony G

#!/usr/bin/perl

use strict;

while (<DATA>) {
  chomp;
  print "csvLine before parseCSV: $_\n";
  print &parseCSV($_);
  print "\n\n";
}

sub parseCSV {
  my ($line) = @_;
  my(@line,$columns,$l);

  @line = split(/","|",|,"/);

  foreach $l ( @line ) {
     $l =~ s/"//g;
     $l =~ s/^\s+|\s+$//;
     print "[$l]\n";
  }
  return $columns;
}

__DATA__
"col 1","col 2","col 3","col 4"
"col 1"",""col,,,,,,2"",""col ,,3","col "4""
"col 1","col  '2'",col  '3'  ,"col, 4"
"col 1,a,1",col 2,"col,3,b",col 4,

On Saturday 15 September 2012 06:26:18 Chuck Hardin wrote:
> The following is a code example from TO-PM member Barry Brevik:
> 
> Since we did not meet this month, let me throw some code at 'ya.
> 
> I frequently have to make one-off utilities to parse customer CSV files.
> I try to avoid using modules for really simple things, so I use the
> subroutine shown below. I have received some pretty weird formatting,
> and this code handles most of them. Keep in mind that the rows in the
> __DATA__ section represent actual formatting of files that I have
> received.
> 
> P.S. notice that the 4th row fails to parse... I have not dealt with it
> yet. Anyone with improvements or nasty comments should go ahead and
> post!
> 
> #
> # parseCSV.pl
> #
> # This is a test wrapper for the parseCSV() subroutine.
> #
> use strict;
> use warnings;
> 
> # Un-buffer STDOUT.
> select((select(STDOUT), $| = 1)[0]);
> 
> while (<DATA>)
> {
>   print "csvLine before parseCSV: $_\n\n";
>   my @csvArray = parseCSV($_);
>   print "[$_]\n" foreach (@csvArray);
>   print "\n\n";
> }
> 
> #----------------------------------------------------------
> # CALL with a CSV line.
> #
> # This routine parses a single CSV line and handles ',' chars embedded
> # in fields as well as extraneous spaces in between dbl quoted fields.
> # It is also resistant to extra dbl quotes within dbl quoted fields,
> # but it will remove them.
> #
> sub parseCSV
> {
>  my @columns = ();
> 
>  if (my $csvline = shift)
>  {
>    # If the CSV line has any portion with 2 or more sequential commas
> ','
>    # then replace the commas with pipe '|' characters.
>    while ($csvline =~ /(,{2,})/)
>    {
>      my $commas = $1;
>      my $pipes  = '|' x length($1);
>      $csvline =~ s/^(.*)$commas(.*)/$1$pipes$2/;
>    }
> 
>    # If there are any commas embedded in the CSV quoted fields, replace
> them
>    # with pipe '|' characters.
>    $csvline =~ s/("[^",]+?),([^",]+?")/$1|$2/g;
>    @columns = split ',', $csvline;   # Split the quoted fields at the
> remaining commas.
>    s/\|/,/g       foreach @columns;  # Replace pipe characters with
> commas.
>    s/\x22//g      foreach @columns;  # Remove double quotes from each
> column.
>    s/^\s+|\s+$//g foreach @columns;  # Remove leading and trailing
> white space from each column.
>  }
> 
>  return @columns;
> }
> 
> __DATA__
> "col 1","col 2","col 3","col 4"
> "col 1"",""col,,,,,,2"",""col ,,3","col "4""
> "col 1","col  '2'",col  '3'  ,"col, 4"
> "col 1,a,1",col 2,"col,3,b",col 4,