[Pdx-pm] reading a broken CSV file

Jeff Zucker jeff at vpservices.com
Fri Nov 21 18:03:45 CST 2003

Eric Shore Baur wrote:

Why not use DBD::CSV? It lets you query the data files with SQL, and it 
handles embedded newlines just fine.
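
Here is a rough sketch of what I mean, not a drop-in solution. It writes a two-record sample file shaped like your data (the second record has a field split over four lines) and reads it back through DBI. The table name, the file name, and all ten column names are made up for illustration; take the real ones from your field-listing files. I'm also assuming Unix line endings and a reasonably current DBD::CSV.

```perl
use strict;
use warnings;
use DBI;
use File::Temp qw(tempdir);

# Sample file shaped like the data in the question; the second
# record's last field spans four physical lines.
my $dir = tempdir(CLEANUP => 1);
open my $out, '>', "$dir/records.csv" or die "cannot write sample: $!";
print {$out} <<'CSV';
"title","some text","a date is next",1999/05/10,T,123,F,F,T,"more text"
"title","some text","a date is next",1999/05/10,T,123,F,F,T,"more text
goes here
the record
is done"
CSV
close $out or die "cannot close sample: $!";

my $dbh = DBI->connect('dbi:CSV:', undef, undef, {
    f_dir      => $dir,    # directory holding the data files
    csv_eol    => "\n",    # assumption: records end in a bare newline
    RaiseError => 1,       # die on errors instead of failing silently
});

# The data files carry no header row, so the column names are supplied
# by hand. These names are hypothetical placeholders.
$dbh->{csv_tables}{records} = {
    f_file    => 'records.csv',
    col_names => [qw(title summary label rec_date flag_a qty
                     flag_b flag_c flag_d notes)],
};

# Each element of @rows is one logical record, however many
# physical lines it occupied in the file.
my @rows = @{ $dbh->selectall_arrayref('SELECT title, notes FROM records') };

for my $row (@rows) {
    my ($title, $notes) = @$row;
    my $newlines = ($notes =~ tr/\n//);   # count embedded newlines
    print "$title: last field has $newlines embedded newline(s)\n";
}

$dbh->disconnect;
```

From there you can loop over the rows and feed them to your real database with an ordinary prepared INSERT on a second DBI handle.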


>	I'm doing an import from a CSV-style text file into a SQL database.
>The data is set up so that I have one set of text files with a field
>listing in them (so I know what matches up with what) and then the data
>files in a parent directory.
>	The data format looks something like this:
>"title","some text","a date is next",1999/05/10,T,123,F,F,T,"more text"
>	Fine... I can import that.  Unfortunately, some of the records have
>embedded newlines in them, so you end up with something like this:
>"title","some text","a date is next",1999/05/10,T,123,F,F,T,"more text
>goes here
>the record
>is done"
>	... or, potentially:
>"title","some text goes
>lines","a date is next",1999/05/10,T,123,F,F,T,"more text"
>	What I've been doing is simply doing the data import - letting
>those screwed up lines fail when the SQL inserts run and then going back
>and hand entering the screwed up data (since I'll end up with partial
>records, so I can search for the missing last field).  This is not,
>however, a very maintainable method.  (I have to re-import things when the
>data set changes, I get all new files, not just changes.)
>	Is there any neat/slick way to get this data in there on the first
>pass?  I tried using ParseWords, but I'm not sure if I utilized it to its
>fullest extent.  I briefly played with a CSV driver for DBI, but it
>couldn't handle things split over the newlines, either.
>	I did this a while ago in the first place; I'm just
>picking the project back up off the shelf, so to speak.  Although I had
>kind of figured I'd have to re-write from scratch, I didn't want to fight
>the same issues if there was an easy way out of it... any ideas?