[Chicago-talk] Reading PDF files

Jim Thomason thomasoniii at gmail.com
Fri Jan 7 12:15:48 PST 2005


PDF::API2 is more or less the gold standard for Perl PDF modules. The
newer beta releases are production quality, so you can even use one of
those.

However, I don't know how much data you can get out of a PDF. AFAIK,
there's no good way to extract content from one. PDF was long ago
nicknamed the "roach motel" data format - data goes in, it doesn't
come out.

Adobe has supposedly been making strides to rectify that with the
newer versions, but I haven't kept up on it.

If anything can do it, it'd be PDF::API2. There's a mailing list
specifically for it over on Yahoo somewhere. I'd recommend asking over
there. The author is also -highly- responsive to
questions/suggestions/issues/etc.

-Jim.....

On Fri, 7 Jan 2005 12:10:23 -0800 (PST), Richard Solberg
<flateyjarbok at yahoo.com> wrote:
> Can anyone tell me if there are good modules for
> reading PDF files?  I am thinking that there should be
> a way to get output from a PDF file like LWP when
> getting a webpage. From there I could extract data
> using regex.  Or better yet if there are also modules
> that would work on a PDF and put data in table
> formats, something like the results from
> HTML::TableExtract.
> 
> I have looked on CPAN for PDF::stuff and none of it
> seems to do the above.  There were things more like
> creating PDF files or copying several pages to a new
> PDF file.
> 
> Any help appreciated.
> 
> Rich Solberg

