From chardin at gmail.com Mon Sep 3 19:11:15 2012
From: chardin at gmail.com (Chuck Hardin)
Date: Mon, 3 Sep 2012 19:11:15 -0700
Subject: [Thousand-Oaks-pm] Thousand Oaks Perl Mongers meets on Wednesday, September 12!
Message-ID: <537A9ACF-DD50-4CBC-9310-CDE36935095B@gmail.com>

The next meeting of the Thousand Oaks Perl Mongers is a week from this coming Wednesday. Full particulars are, as usual, available at http://thousand-oaks.pm.org/ for your reference.

If you'd like to give a presentation, you're in luck! We have two open slots, so it's first come, first served. Be bold! We're friendly, and we enjoy learning.

See you then!

Best,
CCH

From daoswald at gmail.com Tue Sep 4 21:04:53 2012
From: daoswald at gmail.com (David Oswald)
Date: Tue, 4 Sep 2012 21:04:53 -0700
Subject: [Thousand-Oaks-pm] Possible topics
Message-ID:

If nobody steps up to give a presentation maybe we could spend some
time coming up with some project we could work together on over the
next few months as we find a little time here and there: A module
idea, or even working on taking over a couple of modules that have
been abandoned and deserve sprucing up.... some website that would
benefit the Perl community somehow... who knows.

I know the Birmingham Perl Mongers are the ones behind
cpantesters.org. That's a pretty ambitious project, and we're a small
group. But possibly we could come up with something appropriate.

...just a thought. It might be fun.
-- David Oswald daoswald at gmail.com From daoswald at gmail.com Sun Sep 9 17:57:32 2012 From: daoswald at gmail.com (David Oswald) Date: Sun, 9 Sep 2012 17:57:32 -0700 Subject: [Thousand-Oaks-pm] Possible topics In-Reply-To: References: Message-ID: On Tue, Sep 4, 2012 at 9:04 PM, David Oswald wrote: > If nobody steps up to give a presentation maybe we could spend some > time coming up with some project we could work together on over the > next few months as we find a little time here and there: A module > idea, or even working on taking over a couple of modules that have > been abandoned and deserve sprucing up.... some website that would > benefit the Perl community somehow... who knows. ^---------------- That was me.... but now regrettably I have a conflict: I found out a family member is coming to town, and have to pick him up from LAX around 6:45pm. Traffic would have to move like never before if I'm to make it out to Thousand Oaks by 7:00pm Wednesday. I don't think a presenter ever materialized for September. Dave -- David Oswald daoswald at gmail.com From chardin at gmail.com Sun Sep 9 18:11:58 2012 From: chardin at gmail.com (Chuck Hardin) Date: Sun, 9 Sep 2012 18:11:58 -0700 Subject: [Thousand-Oaks-pm] Possible topics In-Reply-To: References: Message-ID: Hm, it looks as if we don't have much material yet. Will someone step boldly forth? Failing that, should we skip a month? Best, CCH On Sep 9, 2012, at 5:57 PM, David Oswald wrote: > On Tue, Sep 4, 2012 at 9:04 PM, David Oswald wrote: >> If nobody steps up to give a presentation maybe we could spend some >> time coming up with some project we could work together on over the >> next few months as we find a little time here and there: A module >> idea, or even working on taking over a couple of modules that have >> been abandoned and deserve sprucing up.... some website that would >> benefit the Perl community somehow... who knows. > > ^---------------- That was me.... 
but now regrettably I have a > conflict: I found out a family member is coming to town, and have to > pick him up from LAX around 6:45pm. Traffic would have to move like > never before if I'm to make it out to Thousand Oaks by 7:00pm > Wednesday. > > I don't think a presenter ever materialized for September. > > Dave > > -- > > David Oswald > daoswald at gmail.com > > _______________________________________________ > ThousandOaks.pm - Thousand Oaks Perl Mongers > Website: http://thousand-oaks.pm.org/ > Mailing list: http://mail.pm.org/mailman/listinfo/thousand-oaks-pm -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 495 bytes Desc: Message signed with OpenPGP using GPGMail URL: From BBrevik at StellarMicro.com Mon Sep 10 11:39:15 2012 From: BBrevik at StellarMicro.com (Barry Brevik) Date: Mon, 10 Sep 2012 11:39:15 -0700 Subject: [Thousand-Oaks-pm] Possible topics In-Reply-To: References: Message-ID: <995C029A48947048B3280035B3B5433C015F4ED3@Stellar2k3-Exch.STELLARMICRO.LOCAL> Hey Dave, I saw your DotCloud article on blogs.perl.org; it was mentioned in Gabor Szabo's newsletter. Cool! -----Original Message----- From: Thousand-Oaks-pm [mailto:thousand-oaks-pm-bounces+bbrevik=stellarmicro.com at pm.org] On Behalf Of David Oswald Sent: Sunday, September 09, 2012 5:58 PM To: ThousandOaks.pm Subject: Re: [Thousand-Oaks-pm] Possible topics On Tue, Sep 4, 2012 at 9:04 PM, David Oswald wrote: > If nobody steps up to give a presentation maybe we could spend some > time coming up with some project we could work together on over the > next few months as we find a little time here and there: A module > idea, or even working on taking over a couple of modules that have > been abandoned and deserve sprucing up.... some website that would > benefit the Perl community somehow... who knows. ^---------------- That was me.... 
but now regrettably I have a conflict: I found out a family member is coming to town, and have to pick him up from LAX around 6:45pm. Traffic would have to move like never before if I'm to make it out to Thousand Oaks by 7:00pm Wednesday. I don't think a presenter ever materialized for September. Dave -- David Oswald daoswald at gmail.com _______________________________________________ ThousandOaks.pm - Thousand Oaks Perl Mongers Website: http://thousand-oaks.pm.org/ Mailing list: http://mail.pm.org/mailman/listinfo/thousand-oaks-pm From chardin at gmail.com Tue Sep 11 12:28:29 2012 From: chardin at gmail.com (Chuck Hardin) Date: Tue, 11 Sep 2012 12:28:29 -0700 Subject: [Thousand-Oaks-pm] Possible topics In-Reply-To: References: Message-ID: Never mind a topic; is anyone planning to attend? On Sun, Sep 9, 2012 at 6:11 PM, Chuck Hardin wrote: > Hm, it looks as if we don't have much material yet. Will someone step boldly forth? Failing that, should we skip a month? > > Best, > CCH > > > > > On Sep 9, 2012, at 5:57 PM, David Oswald wrote: > >> On Tue, Sep 4, 2012 at 9:04 PM, David Oswald wrote: >>> If nobody steps up to give a presentation maybe we could spend some >>> time coming up with some project we could work together on over the >>> next few months as we find a little time here and there: A module >>> idea, or even working on taking over a couple of modules that have >>> been abandoned and deserve sprucing up.... some website that would >>> benefit the Perl community somehow... who knows. >> >> ^---------------- That was me.... but now regrettably I have a >> conflict: I found out a family member is coming to town, and have to >> pick him up from LAX around 6:45pm. Traffic would have to move like >> never before if I'm to make it out to Thousand Oaks by 7:00pm >> Wednesday. >> >> I don't think a presenter ever materialized for September. 
>> >> Dave >> >> -- >> >> David Oswald >> daoswald at gmail.com >> >> _______________________________________________ >> ThousandOaks.pm - Thousand Oaks Perl Mongers >> Website: http://thousand-oaks.pm.org/ >> Mailing list: http://mail.pm.org/mailman/listinfo/thousand-oaks-pm > From BBrevik at StellarMicro.com Tue Sep 11 13:38:21 2012 From: BBrevik at StellarMicro.com (Barry Brevik) Date: Tue, 11 Sep 2012 13:38:21 -0700 Subject: [Thousand-Oaks-pm] Possible topics In-Reply-To: References: Message-ID: <995C029A48947048B3280035B3B5433C015F50AD@Stellar2k3-Exch.STELLARMICRO.LOCAL> I'm planning to attend if it is not just me . Barry Brevik From chardin at gmail.com Tue Sep 11 20:48:03 2012 From: chardin at gmail.com (Chuck Hardin) Date: Tue, 11 Sep 2012 20:48:03 -0700 Subject: [Thousand-Oaks-pm] Possible topics In-Reply-To: <995C029A48947048B3280035B3B5433C015F50AD@Stellar2k3-Exch.STELLARMICRO.LOCAL> References: <995C029A48947048B3280035B3B5433C015F50AD@Stellar2k3-Exch.STELLARMICRO.LOCAL> Message-ID: Nobody is speaking up, so let's call this month a wash and try for October. Website updated accordingly. Best, CCH On Sep 11, 2012, at 1:38 PM, Barry Brevik wrote: > I'm planning to attend if it is not just me . > > Barry Brevik > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 495 bytes Desc: Message signed with OpenPGP using GPGMail URL: From rbaumann at fireblood.com Fri Sep 14 07:30:40 2012 From: rbaumann at fireblood.com (rbaumann at fireblood.com) Date: Fri, 14 Sep 2012 08:30:40 -0600 Subject: [Thousand-Oaks-pm] Thousand-Oaks-pm Digest, Vol 93, Issue 4 In-Reply-To: References: Message-ID: Hi, all, I showed up at ValueClick's front door at a few minutes before 7 pm on Tuesday and found no one there. I did see that there had been some exchange about possibly not meeting this month but did not see a final conclusion on that. 
Normally does one show up at the main door to ValueClick, i.e. up on the second floor across from the elevators to attend meetings? Or is the meeting held elsewhere? I looked all around ValueClick's premises and did not see any other gatherings of people who resembled perl mongers. Regards, Richard Baumann From chardin at gmail.com Fri Sep 14 07:53:54 2012 From: chardin at gmail.com (Chuck Hardin) Date: Fri, 14 Sep 2012 07:53:54 -0700 Subject: [Thousand-Oaks-pm] Thousand-Oaks-pm Digest, Vol 93, Issue 4 In-Reply-To: References: Message-ID: <5F83EB76-CF7D-499C-8393-3708C0FE6AF4@gmail.com> We did send a cancellation notice: http://mail.pm.org/pipermail/thousand-oaks-pm/2012-September/000701.html Sorry if you didn't receive it. We do intend to meet in October. Best, CCH On Sep 14, 2012, at 7:30 AM, rbaumann at fireblood.com wrote: > Hi, all, > > I showed up at ValueClick's front door at a few minutes before 7 pm on Tuesday and found no one there. I did see that there had been some exchange about possibly not meeting this month but did not see a final conclusion on that. Normally does one show up at the main door to ValueClick, i.e. up on the second floor across from the elevators to attend meetings? Or is the meeting held elsewhere? I looked all around ValueClick's premises and did not see any other gatherings of people who resembled perl mongers. > > Regards, > Richard Baumann > > _______________________________________________ > ThousandOaks.pm - Thousand Oaks Perl Mongers > Website: http://thousand-oaks.pm.org/ > Mailing list: http://mail.pm.org/mailman/listinfo/thousand-oaks-pm -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc
Type: application/pgp-signature
Size: 495 bytes
Desc: Message signed with OpenPGP using GPGMail
URL:

From chardin at gmail.com Sat Sep 15 06:26:18 2012
From: chardin at gmail.com (Chuck Hardin)
Date: Sat, 15 Sep 2012 06:26:18 -0700
Subject: [Thousand-Oaks-pm] CSV code example
Message-ID: <52AB34E4-A320-43CE-8F3E-05807FCE795B@gmail.com>

The following is a code example from TO-PM member Barry Brevik:

Since we did not meet this month, let me throw some code at 'ya.

I frequently have to make one-off utilities to parse customer CSV files.
I try to avoid using modules for really simple things, so I use the
subroutine shown below. I have received some pretty weird formatting,
and this code handles most of them. Keep in mind that the rows in the
__DATA__ section represent actual formatting of files that I have
received.

P.S. notice that the 4th row fails to parse... I have not dealt with it
yet. Anyone with improvements or nasty comments should go ahead and
post!

#
# parseCSV.pl
#
# This is a test wrapper for the parseCSV() subroutine.
#
use strict;
use warnings;

# Un-buffer STDOUT.
select((select(STDOUT), $| = 1)[0]);

while (<DATA>)
{
  print "csvLine before parseCSV: $_\n\n";
  my @csvArray = parseCSV($_);
  print "[$_]\n" foreach (@csvArray);
  print "\n\n";
}

#----------------------------------------------------------
# CALL with a CSV line.
#
# This routine parses a single CSV line and handles ',' chars embedded
# in fields as well as extraneous spaces in between dbl quoted fields.
# It is also resistant to extra dbl quotes within dbl quoted fields,
# but it will remove them.
#
sub parseCSV
{
  my @columns = ();

  if (my $csvline = shift)
  {
    # If the CSV line has any portion with 2 or more sequential commas ','
    # then replace the commas with pipe '|' characters.
    while ($csvline =~ /(,{2,})/)
    {
      my $commas = $1;
      my $pipes = '|' x length($1);
      $csvline =~ s/^(.*)$commas(.*)/$1$pipes$2/;
    }

    # If there are any commas embedded in the CSV quoted fields, replace them
    # with pipe '|' characters.
    $csvline =~ s/("[^",]+?),([^",]+?")/$1|$2/g;
    @columns = split ',', $csvline;   # Split the quoted fields at the remaining commas.
    s/\|/,/g foreach @columns;        # Replace pipe characters with commas.
    s/\x22//g foreach @columns;       # Remove double quotes from each column.
    s/^\s+|\s+$//g foreach @columns;  # Remove leading and trailing white space from each column.
  }

  return @columns;
}

__DATA__
"col 1","col 2","col 3","col 4"
"col 1"",""col,,,,,,2"",""col ,,3","col "4""
"col 1","col '2'",col '3' ,"col, 4"
"col 1,a,1",col 2,"col,3,b",col 4,

From tony at metracom.com Sat Sep 15 10:15:57 2012
From: tony at metracom.com (Tony)
Date: Sat, 15 Sep 2012 10:15:57 -0700
Subject: [Thousand-Oaks-pm] CSV code example
In-Reply-To: <52AB34E4-A320-43CE-8F3E-05807FCE795B@gmail.com>
References: <52AB34E4-A320-43CE-8F3E-05807FCE795B@gmail.com>
Message-ID: <201209151015.58099.tony@metracom.com>

Chuck, how should the 4th line look, formatting-wise?

Thanks,

Tony

On Saturday 15 September 2012 06:26:18 Chuck Hardin wrote:
> The following is a code example from TO-PM member Barry Brevik:
>
> Since we did not meet this month, let me throw some code at 'ya.
>
> I frequently have to make one-off utilities to parse customer CSV files.
> I try to avoid using modules for really simple things, so I use the
> subroutine shown below. I have received some pretty weird formatting,
> and this code handles most of them. Keep in mind that the rows in the
> __DATA__ section represent actual formatting of files that I have
> received.
>
> P.S. notice that the 4th row fails to parse...
I have not dealt with it > yet. Anyone with improvements or nasty comments should go ahead and > post! > > # > # parseCSV.pl > # > # This is a test wrapper for the parseCSV() subroutine. > # > use strict; > use warnings; > > # Un-buffer STDOUT. > select((select(STDOUT), $| = 1)[0]); > > while () > { > print "csvLine before parseCSV: $_\n\n"; > my @csvArray = parseCSV($_); > print "[$_]\n" foreach (@csvArray); > print "\n\n"; > } > > #---------------------------------------------------------- > # CALL with a CSV line. > # > # This routine parses a single CSV line and handles ',' chars embedded > # in fields as well as extraneous spaces in between dbl quoted fields. > # It is also resistant to extra dbl quotes within dbl quoted fields, > # but it will remove them. > # > sub parseCSV > { > my @columns = (); > > if (my $csvline = shift) > { > # If the CSV line has any portion with 2 or more sequential commas > ',' > # then replace the commas with pipe '|' characters. > while ($csvline =~ /(,{2,})/) > { > my $commas = $1; > my $pipes = '|' x length($1); > $csvline =~ s/^(.*)$commas(.*)/$1$pipes$2/; > } > > # If there are any commas embedded in the CSV quoted fields, replace > them > # with pipe '|' characters. > $csvline =~ s/("[^",]+?),([^",]+?")/$1|$2/g; > @columns = split ',', $csvline; # Split the quoted fields at the > remaining commas. > s/\|/,/g foreach @columns; # Replace pipe characters with > commas. > s/\x22//g foreach @columns; # Remove double quotes from each > column. > s/^\s+|\s+$//g foreach @columns; # Remove leading and trailing > white space from each column. 
>   }
>
>   return @columns;
> }
>
> __DATA__
> "col 1","col 2","col 3","col 4"
> "col 1"",""col,,,,,,2"",""col ,,3","col "4""
> "col 1","col '2'",col '3' ,"col, 4"
> "col 1,a,1",col 2,"col,3,b",col 4,

From tony at metracom.com Sat Sep 15 10:53:28 2012
From: tony at metracom.com (Tony)
Date: Sat, 15 Sep 2012 10:53:28 -0700
Subject: [Thousand-Oaks-pm] CSV code example
In-Reply-To: <52AB34E4-A320-43CE-8F3E-05807FCE795B@gmail.com>
References: <52AB34E4-A320-43CE-8F3E-05807FCE795B@gmail.com>
Message-ID: <201209151053.29032.tony@metracom.com>

Hi Chuck,

I had some time while eating lunch today.

Tony G

#!/usr/bin/perl

use strict;

while (<DATA>) {
  chomp;
  print "csvLine before parseCSV: $_\n";
  print &parseCSV($_);
  print "\n\n";
}

sub parseCSV {
  my ($line) = @_;
  my(@line,$columns,$l);

  @line = split(/","|",|,"/);
  foreach $l ( @line ) {
    $l =~ s/"//g;
    $l =~ s/^\s+|\s+$//;
    print "[$l]\n";
  }
  return $columns;
}

__DATA__
"col 1","col 2","col 3","col 4"
"col 1"",""col,,,,,,2"",""col ,,3","col "4""
"col 1","col '2'",col '3' ,"col, 4"
"col 1,a,1",col 2,"col,3,b",col 4,

On Saturday 15 September 2012 06:26:18 Chuck Hardin wrote:
> The following is a code example from TO-PM member Barry Brevik:
>
> Since we did not meet this month, let me throw some code at 'ya.
>
> I frequently have to make one-off utilities to parse customer CSV files.
> I try to avoid using modules for really simple things, so I use the
> subroutine shown below. I have received some pretty weird formatting,
> and this code handles most of them. Keep in mind that the rows in the
> __DATA__ section represent actual formatting of files that I have
> received.
>
> P.S. notice that the 4th row fails to parse... I have not dealt with it
> yet. Anyone with improvements or nasty comments should go ahead and
> post!
>
> #
> # parseCSV.pl
> #
> # This is a test wrapper for the parseCSV() subroutine.
> #
> use strict;
> use warnings;
>
> # Un-buffer STDOUT.
> select((select(STDOUT), $| = 1)[0]); > > while () > { > print "csvLine before parseCSV: $_\n\n"; > my @csvArray = parseCSV($_); > print "[$_]\n" foreach (@csvArray); > print "\n\n"; > } > > #---------------------------------------------------------- > # CALL with a CSV line. > # > # This routine parses a single CSV line and handles ',' chars embedded > # in fields as well as extraneous spaces in between dbl quoted fields. > # It is also resistant to extra dbl quotes within dbl quoted fields, > # but it will remove them. > # > sub parseCSV > { > my @columns = (); > > if (my $csvline = shift) > { > # If the CSV line has any portion with 2 or more sequential commas > ',' > # then replace the commas with pipe '|' characters. > while ($csvline =~ /(,{2,})/) > { > my $commas = $1; > my $pipes = '|' x length($1); > $csvline =~ s/^(.*)$commas(.*)/$1$pipes$2/; > } > > # If there are any commas embedded in the CSV quoted fields, replace > them > # with pipe '|' characters. > $csvline =~ s/("[^",]+?),([^",]+?")/$1|$2/g; > @columns = split ',', $csvline; # Split the quoted fields at the > remaining commas. > s/\|/,/g foreach @columns; # Replace pipe characters with > commas. > s/\x22//g foreach @columns; # Remove double quotes from each > column. > s/^\s+|\s+$//g foreach @columns; # Remove leading and trailing > white space from each column. > } > > return @columns; > } > > __DATA__ > "col 1","col 2","col 3","col 4" > "col 1"",""col,,,,,,2"",""col ,,3","col "4"" > "col 1","col '2'",col '3' ,"col, 4" > "col 1,a,1",col 2,"col,3,b",col 4, From chardin at gmail.com Sat Sep 15 12:28:37 2012 From: chardin at gmail.com (Chuck Hardin) Date: Sat, 15 Sep 2012 12:28:37 -0700 Subject: [Thousand-Oaks-pm] CSV code example In-Reply-To: <201209151015.58099.tony@metracom.com> References: <52AB34E4-A320-43CE-8F3E-05807FCE795B@gmail.com> <201209151015.58099.tony@metracom.com> Message-ID: <338080B4-2DBD-4F37-BEC5-0FA383E1C9F3@gmail.com> Beats me, jefe. 
Like I said, I posted this on Barry Brevik's behalf. I didn't look at it very hard. Best, CCH On Sep 15, 2012, at 10:15 AM, Tony wrote: > > > Chuck, how should the 4th line look? formatting wise. > > Thanks, > > Tony > > > On Saturday 15 September 2012 06:26:18 Chuck Hardin wrote: >> The following is a code example from TO-PM member Barry Brevik: >> >> Since we did not meet this month, let me throw some code at 'ya. >> >> I frequently have to make one-off utilities to parse customer CSV files. >> I try to avoid using modules for really simple things, so I use the >> subroutine shown below. I have received some pretty weird formatting, >> and this code handles most of them. Keep in mind that the rows in the >> __DATA__ section represent actual formatting of files that I have >> received. >> >> P.S. notice that the 4th row fails to parse... I have not dealt with it >> yet. Anyone with improvements or nasty comments should go ahead and >> post! >> >> # >> # parseCSV.pl >> # >> # This is a test wrapper for the parseCSV() subroutine. >> # >> use strict; >> use warnings; >> >> # Un-buffer STDOUT. >> select((select(STDOUT), $| = 1)[0]); >> >> while () >> { >> print "csvLine before parseCSV: $_\n\n"; >> my @csvArray = parseCSV($_); >> print "[$_]\n" foreach (@csvArray); >> print "\n\n"; >> } >> >> #---------------------------------------------------------- >> # CALL with a CSV line. >> # >> # This routine parses a single CSV line and handles ',' chars embedded >> # in fields as well as extraneous spaces in between dbl quoted fields. >> # It is also resistant to extra dbl quotes within dbl quoted fields, >> # but it will remove them. >> # >> sub parseCSV >> { >> my @columns = (); >> >> if (my $csvline = shift) >> { >> # If the CSV line has any portion with 2 or more sequential commas >> ',' >> # then replace the commas with pipe '|' characters. 
>> while ($csvline =~ /(,{2,})/) >> { >> my $commas = $1; >> my $pipes = '|' x length($1); >> $csvline =~ s/^(.*)$commas(.*)/$1$pipes$2/; >> } >> >> # If there are any commas embedded in the CSV quoted fields, replace >> them >> # with pipe '|' characters. >> $csvline =~ s/("[^",]+?),([^",]+?")/$1|$2/g; >> @columns = split ',', $csvline; # Split the quoted fields at the >> remaining commas. >> s/\|/,/g foreach @columns; # Replace pipe characters with >> commas. >> s/\x22//g foreach @columns; # Remove double quotes from each >> column. >> s/^\s+|\s+$//g foreach @columns; # Remove leading and trailing >> white space from each column. >> } >> >> return @columns; >> } >> >> __DATA__ >> "col 1","col 2","col 3","col 4" >> "col 1"",""col,,,,,,2"",""col ,,3","col "4"" >> "col 1","col '2'",col '3' ,"col, 4" >> "col 1,a,1",col 2,"col,3,b",col 4, > > _______________________________________________ > ThousandOaks.pm - Thousand Oaks Perl Mongers > Website: http://thousand-oaks.pm.org/ > Mailing list: http://mail.pm.org/mailman/listinfo/thousand-oaks-pm -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 495 bytes Desc: Message signed with OpenPGP using GPGMail URL: From daoswald at gmail.com Sat Sep 15 15:54:38 2012 From: daoswald at gmail.com (David Oswald) Date: Sat, 15 Sep 2012 15:54:38 -0700 Subject: [Thousand-Oaks-pm] CSV code example In-Reply-To: <52AB34E4-A320-43CE-8F3E-05807FCE795B@gmail.com> References: <52AB34E4-A320-43CE-8F3E-05807FCE795B@gmail.com> Message-ID: On Sat, Sep 15, 2012 at 6:26 AM, Chuck Hardin wrote: > The following is a code example from TO-PM member Barry Brevik: > > Since we did not meet this month, let me throw some code at 'ya. > > I frequently have to make one-off utilities to parse customer CSV files. > I try to avoid using modules for really simple things, so I use the > subroutine shown below. 
I have received some pretty weird formatting,
> and this code handles most of them. Keep in mind that the rows in the
> __DATA__ section represent actual formatting of files that I have
> received.
>
> P.S. notice that the 4th row fails to parse... I have not dealt with it
> yet. Anyone with improvements or nasty comments should go ahead and
> post!

I tend to use modules to solve things when I have a suspicion that they're deceptive in their simplicity. I figure that the module has withstood the pressure of thousands of users, evolved through bug reports, unit tests, and so on, and emerged from that fiery forge a better solution than something I might come up with.

Nevertheless, such solutions are not perfect, and they do come at the cost of additional baggage in your scripts. Much of the baggage CPAN modules carry with them aims to solve problems that your specific situation doesn't have. Modules can also suffer from creeping featurism, growing to satisfy the infrequent needs of a vocal minority. But I still like the fact that the time I spend learning their API can save me time in the long run as I re-use that knowledge from project to project.

Anyway, I went ahead and plopped Barry's parser alongside Tony's parser, and added a call to Text::CSV's parser, and then displayed the results for each parse. You will see that Text::CSV fails with example #2, and disagrees with Tony's result for #4.

The CSV sample #2 really is broken. I understand the goal is to get it done. But it's hard to be certain what exactly the correct output *should* be. If you remove a comment from the object-instantiation line where I configure the CSV parser, it actually does produce a parse for #2, but it's probably not what you want it to be.

Here's the code:

use strict;
use warnings;
use Text::CSV;

chomp( my @csv_data = <DATA> );

my $csv_parser = Text::CSV->new(
    {
        binary           => 1,
        allow_whitespace => 1,
        # allow_loose_escapes => 1,
    }
) or die "Cannot use CSV:" . Text::CSV->error_diag();

foreach my $csv_line (@csv_data) {
    print "csvLine before parseCSV: $csv_line\n";
    local $" = '],[';
    my @barry_parsed = parseCSV_barry($csv_line);
    print "Barry's parser: [@barry_parsed]\n";
    my @tony_parsed = parseCSV_tony($csv_line);
    print "Tony's Parser: [@tony_parsed]\n";
    my @tcsv_parsed = parseCSV_tcsv( $csv_parser, $csv_line );
    print "Text::CSV's Parser: [@tcsv_parsed]\n\n";
}

sub parseCSV_tony {
    my @line = split /","|",|,"/, shift;
    my @fields;
    foreach my $l (@line) {
        $l =~ s/"//g;
        $l =~ s/^\s+|\s+$//;
        push @fields, $l // q{};
    }
    return @fields;
}

sub parseCSV_tcsv {
    my ( $parser, $line ) = @_;
    $parser->parse($line) or do {
        warn '*** Malformed CSV: <<'
          . $parser->error_input
          . ">>:\n*** "
          . $parser->error_diag;
        return;
    };
    return $parser->fields;
}

sub parseCSV_barry {
    my @columns = ();
    if ( my $csvline = shift ) {

        # If the CSV line has any portion with 2 or more sequential commas ','
        # then replace the commas with pipe '|' characters.
        while ( $csvline =~ /(,{2,})/ ) {
            my $commas = $1;
            my $pipes  = '|' x length($1);
            $csvline =~ s/^(.*)$commas(.*)/$1$pipes$2/;
        }

        # If there are any commas embedded in the CSV quoted fields, replace them
        # with pipe '|' characters.
        $csvline =~ s/("[^",]+?),([^",]+?")/$1|$2/g;
        @columns = split ',', $csvline;     # Split the quoted fields at the remaining commas.
        s/\|/,/g foreach @columns;          # Replace pipe characters with commas.
        s/\x22//g foreach @columns;         # Remove double quotes from each column.
        s/^\s+|\s+$//g foreach @columns;    # Remove leading and trailing white space from each column.
    }
    return @columns;
}

__DATA__
"col 1","col 2","col 3","col 4"
"col 1"",""col,,,,,,2"",""col ,,3","col "4""
"col 1","col '2'",col '3' ,"col, 4"
"col 1,a,1",col 2,"col,3,b",col 4,

****** And the output *******

$ ./mytest.pl
csvLine before parseCSV: "col 1","col 2","col 3","col 4"
Barry's parser: [col 1],[col 2],[col 3],[col 4]
Tony's Parser: [col 1],[col 2],[col 3],[col 4]
Text::CSV's Parser: [col 1],[col 2],[col 3],[col 4]

csvLine before parseCSV: "col 1"",""col,,,,,,2"",""col ,,3","col "4""
Barry's parser: [col 1],[col,,,,,,2],[col ,,3],[col 4]
Tony's Parser: [col 1],[col,,,,,,2],[col ,,3],[col 4]
*** Malformed CSV: <<"col 1"",""col,,,,,,2"",""col ,,3","col "4"">>:
*** EIQ - QUO character not allowed at ./mytest.pl line 43, <DATA> line 4.
Text::CSV's Parser: []

csvLine before parseCSV: "col 1","col '2'",col '3' ,"col, 4"
Barry's parser: [col 1],[col '2'],[col '3'],[col, 4]
Tony's Parser: [col 1],[col '2'],[col '3'],[col, 4]
Text::CSV's Parser: [col 1],[col '2'],[col '3'],[col, 4]

csvLine before parseCSV: "col 1,a,1",col 2,"col,3,b",col 4,
Barry's parser: [col 1],[a],[1],[col 2],[col],[3],[b],[col 4]
Tony's Parser: [col 1,a,1],[col 2],[col,3,b],[col 4,]
Text::CSV's Parser: [col 1,a,1],[col 2],[col,3,b],[col 4],[]

****** And the output from Devel::TraceUse (to see how expensive it was to pull in Text::CSV ) ********

$ perl -d:TraceUse ./mytest.pl
......
Modules used from ./mytest.pl:
   1.  strict 1.07, mytest.pl line 3 [main]
   2.  warnings 1.13, mytest.pl line 4 [main]
   3.  Text::CSV 1.21, mytest.pl line 5 [main]
   4.    Carp 1.26, Text/CSV.pm line 5
   5.      Exporter 5.66, Carp.pm line 35
   6.    vars 1.02, Text/CSV.pm line 6
   7.      warnings::register 1.02, vars.pm line 7
   8.    Text::CSV_XS 0.91, Text/CSV.pm line 150 (eval 1)
   9.      DynaLoader 1.14, Text/CSV_XS.pm line 26
  10.        Config, DynaLoader.pm line 22

You can see that, in my case, Text::CSV is using the Text::CSV_XS back-end. That only happens if you've explicitly installed Text::CSV_XS on your system.
Otherwise, it uses its own built-in Text::CSV_PP back-end. If we don't count pragmas, and don't count the XS plugin (which is optional), by using Text::CSV we're essentially asking Perl to include Text::CSV, Carp (which is CORE), and Exporter (which is CORE). So the only non-core dependency for Text::CSV is the module itself. The current version of Text::CSV has a 99.7% PASS rate with CPAN testers (having passed 2445 of 2451 smokers), and the six non-passes were all "UNKNOWN" (as opposed to FAIL).

One observation I wanted to make: In a situation where the CSV is broken enough to not parse with Text::CSV, it might be better for the script to make an easy-to-find annotation in your output file, and issue a warning on-screen so that a human can come back later and find the problem. It is possible that in tweaking a solution to automatically fix one broken construct, some other construct will fail as a result. At some point calling attention loudly to the problem might be better than trying to solve it programmatically. That's just one take on the issue, and obviously I don't know your specific use and need.

Thanks for posting!!!

Dave

--

David Oswald
daoswald at gmail.com

From cynthia-brevik at roadrunner.com Wed Sep 19 15:05:24 2012
From: cynthia-brevik at roadrunner.com (Cynthia Brevik)
Date: Wed, 19 Sep 2012 15:05:24 -0700
Subject: [Thousand-Oaks-pm] CSV files
Message-ID: <014601cd96b2$dea2d1d0$9be87570$@com>

Hi, I was the OP on the CSV file topic. I fell off the mailing list somehow, but I'm back. I did see all of the comments so far and want to thank you guys for being interested.

Dave, thanks for the great reply and I agree with you; when something is outside your expertise it is often best to use a module (and I often do). However, if your CSV file is so broken that you need a module, you might be better off asking the author to re-send something sane.
The purpose for this code was to have something lightweight that would handle the surprising crap that can come out of MS Excel.

Tony, to answer your question, the 4th record should parse (IMO) like this:

[col 1,a,1]
[col 2]
[col,3,b]
[col 4]
[]    # Note the empty field at the end.

I will be watching for any other posts that might come to the list, and I really hope to see you guys at the next meeting.

Barry Brevik

From cynthia-brevik at roadrunner.com Thu Sep 20 15:48:31 2012
From: cynthia-brevik at roadrunner.com (Cynthia Brevik)
Date: Thu, 20 Sep 2012 15:48:31 -0700
Subject: [Thousand-Oaks-pm] CSV files
In-Reply-To: <014601cd96b2$dea2d1d0$9be87570$@com>
References: <014601cd96b2$dea2d1d0$9be87570$@com>
Message-ID: <022701cd9782$0f62ac80$2e280580$@com>

Tony, I did not pay enough attention to the code you posted. You have an almost perfect solution for DATA record 4! I'm impressed since I could not come close in a reasonable length of time.

Thank you for posting.

Barry Brevik
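
For anyone following along, the parse Barry describes for record 4 (quoted commas kept, trailing empty field preserved) can be sketched without a module by walking the line one character at a time and tracking whether we are inside double quotes. This is only a minimal sketch under stated assumptions, not code from the thread: split_csv_line is a hypothetical name, and it assumes every double quote is a field delimiter, so it will not rescue rows with stray or doubled quotes like DATA record 2.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split one CSV line on commas
# that fall outside double quotes, then trim each field.
# Assumption: every double quote opens or closes a quoted field.
sub split_csv_line {
    my ($line) = @_;
    my @fields   = ('');
    my $in_quote = 0;

    for my $ch ( split //, $line ) {
        if ( $ch eq '"' ) {
            $in_quote = !$in_quote;    # toggle state; the quote itself is dropped
        }
        elsif ( $ch eq ',' && !$in_quote ) {
            push @fields, '';          # an unquoted comma starts a new field
        }
        else {
            $fields[-1] .= $ch;        # everything else belongs to the current field
        }
    }

    s/^\s+|\s+$//g for @fields;        # trim leading and trailing whitespace
    return @fields;
}

# DATA record 4 from the thread.
my @fields = split_csv_line('"col 1,a,1",col 2,"col,3,b",col 4,');
print "[$_]\n" for @fields;
```

Run on record 4, this prints the five fields listed above, including the empty one at the end. Because the empty field is pushed explicitly rather than produced by split, it is not silently dropped the way trailing empty fields are with a plain split ','. A row like record 2 would still need separate handling, or the loud warning Dave suggests.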