SPUG: PS/PDF parsing
aaron at activox.com
Sat Jun 9 00:48:19 CDT 2001
At 06:11 PM 6/8/01 -0700, El JoPe Magnifico wrote:
>There are various other useful reasons to want to be able to parse
>PS/PDF content, besides converting to HTML, which actually interests
>me very little: Batch-modifying PDF's, cataloguing content in PDF's
>found by a crawler, as an import filter for a layout program, etc.
...or running regexes to suck data out of PDFs and put it into a database,
breaking up Big Effing PDFs into Little Bitty ones that only have the parts
you want (based on their contents) and shoving those little bits into an
access control system, converting selected PDF contents on the fly into
XML so you can feed them to another system, and the list goes on.
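A minimal sketch of the regex-to-database idea, in Python purely for illustration. It assumes the text has already been pulled out of the PDF by some other tool (that extraction step is exactly the missing piece and is not shown here). The sample page text, the pattern, and the table layout are all hypothetical.

```python
import re
import sqlite3

# Hypothetical page text, standing in for whatever a PDF text
# extractor would hand back (one string per page).
SAMPLE_PAGES = [
    "Invoice No: 1001\nTotal: $250.00\n",
    "Invoice No: 1002\nTotal: $75.50\n",
    "Cover page, nothing useful here\n",
]

# Hypothetical pattern: an invoice number followed later by a total.
RECORD = re.compile(r"Invoice No:\s*(\d+).*?Total:\s*\$([\d.]+)", re.DOTALL)


def load_records(pages, conn):
    """Run the regex over each page's text and store every match."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices "
        "(page INTEGER, number TEXT, total REAL)"
    )
    for page_no, text in enumerate(pages, start=1):
        for number, total in RECORD.findall(text):
            conn.execute(
                "INSERT INTO invoices VALUES (?, ?, ?)",
                (page_no, number, float(total)),
            )
    conn.commit()


def fetch_records(conn):
    """Return all stored rows, ordered by the page they came from."""
    return conn.execute(
        "SELECT page, number, total FROM invoices ORDER BY page"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    load_records(SAMPLE_PAGES, conn)
    for row in fetch_records(conn):
        print(row)
```

The page column records which pages produced a match, which is the same information a splitter would need to decide which pages go into a smaller PDF.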
I also don't care a bit about layout. I usually work the other end. I get a
bunch of PDF stuff kicked to me, and somebody wants me to programmatically
rip the PDFs up and send them the good parts either as smaller PDFs or they
just want the data, please. Most of the PDF stuff on CPAN either focuses on
writing PDFs or only gives you functions to learn ABOUT the file (the
labeling crap I really don't care much about). There are some commercial
products that (for a steep price) give you some limited means to do this,
but even among the commercial stuff there is no Swiss Army knife of PDFs.
How about something to rip the effing chest of a PDF open and hold its
still-beating heart up to its face as it sinks to its knees and takes a
last breath, similar to the way that Spreadsheet::ParseExcel works?
Seattle Perl Users Group (SPUG) Home Page: http://www.halcyon.com/spug/