[Pdx-pm] reading a broken CSV file
jeff at vpservices.com
Fri Nov 21 18:03:45 CST 2003
Eric Shore Baur wrote:
Why not use DBD::CSV, which will let you query the data files with SQL
and which handles embedded newlines just fine?
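
To show why a quote-aware parser copes with this, here is a minimal sketch. It uses Python's standard csv module rather than DBD::CSV, purely to illustrate the principle. The sample data and the read_records helper are made up for the example, and the second record's broken field is my guess at what such a record looks like, not a line from the real files.

```python
import csv
import io

# Hypothetical sample data: the second record has a newline inside a quoted field.
SAMPLE = (
    '"title","some text","a date is next",1999/05/10,T,123,F,F,T,"more text"\n'
    '"title","some text goes\nacross lines","a date is next",1999/05/10,T,123,F,F,T,"more text"\n'
)


def read_records(handle):
    """Return every record as a list of fields, keeping embedded newlines."""
    # csv.reader keeps reading physical lines until the open quote is closed,
    # so one logical record may span several lines of the file.
    return list(csv.reader(handle))


# newline="" leaves line endings untouched so the parser sees them itself.
records = read_records(io.StringIO(SAMPLE, newline=""))

for fields in records:
    print(len(fields), repr(fields[1]))
```

Against a real file the same helper would take a handle opened with open(path, newline=""). The point is only that records are split by the parser, not by reading the file one line at a time.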
> I'm doing an import from a CSV-style text file into a SQL database.
>The data is set up so that I have one set of text files with a field
>listing in them (so I know what matches up with what) and then the data
>files in a parent directory.
> The data format looks something like this:
>"title","some text","a date is next",1999/05/10,T,123,F,F,T,"more text"
> Fine... I can import that. Unfortunately, some of the records have
>embedded newlines in them, so you end up with something like this:
>"title","some text","a date is next",1999/05/10,T,123,F,F,T,"more text
> ... or, potentially:
>"title","some text goes
>lines","a date is next",1999/05/10,T,123,F,F,T,"more text"
> What I've been doing is simply doing the data import - letting
>those screwed up lines fail when the SQL inserts run and then going back
>and hand entering the screwed up data (since I'll end up with partial
>records, so I can search for the missing last field). This is not,
>however, a very maintainable method. (I have to re-import things when the
>data set changes, I get all new files, not just changes.)
> Is there any neat/slick way to get this data in there on the first
>pass? I tried using Text::ParseWords, but I'm not sure if I used it to its
>fullest extent. I briefly played with a CSV driver for DBI, but it
>couldn't handle things split over the newlines, either.
> I first did this a while ago; I'm just
>picking the project back up off the shelf, so to speak. Although I had
>kind of figured I'd have to re-write from scratch, I didn't want to fight
>the same issues if there was an easy way out of it... any ideas?