SPUG: PS/PDF parsing

Fri Jun 8 20:11:48 CDT 2001

On Fri, 8 Jun 2001, Richard Seymour UW-NPL wrote:
> Don't you consider PDF pretty much already "web-enabled",
> since (almost) all browsers carry around Acrobat Reader?

Personally, yes.  I should point out that the original request that
spurred me on to this line of questioning is _not_ my sole purpose
for asking.  As usual, it got me thinking off on a tangent.  =)

There are various other useful reasons to want to parse PS/PDF
content besides converting it to HTML, which actually interests me
very little: batch-modifying PDFs, cataloguing the content of PDFs
found by a crawler, serving as an import filter for a layout
program, etc.

On Fri, 8 Jun 2001, Scott Blachowicz wrote:
> That's going to be rather tricky since PS is a programming language
> that can be abused...you need a language interpreter (e.g.
> Ghostscript) to make sense out of it.

First, that'd be cheatin'.  =)

Second, COME ON, THIS IS PERL!  If Damian hasn't taught you by now
that it can parse anything on the planet, then I'm not going to try.

Yes, it's a difficult problem, but therefore also an interesting one.
I just want to know whether anyone has already made headway on it.
  -jp
