SPUG: PS/PDF parsing

Scott Blachowicz scott at sabmail.rresearch.com
Fri Jun 8 17:12:34 CDT 2001


On Fri, Jun 08, 2001 at 12:13:54PM -0700, El JoPe Magnifico wrote:
> Are there any modules that do parsing of Postscript/PDF files?
> I looked on CPAN and found lots that output Postscript or PDF,
> but none that do parsing, outside of the little bit necessary
> to add a stamp or cut marks to each page.
> 
> Context of the question: Someone in my office was looking for a
> PS-to-HTML converter.  There are a couple PS-to-text converters
> (using Ghostscript under the hood), but those strip most of the
> layout and formatting info that would be useful for conversion
> to something richer than plaintext.

That's going to be rather tricky since PS is a programming language
that can be abused...you need a language interpreter (e.g.
Ghostscript) to make sense of it.  I don't know what kind of
drivers there are for Ghostscript, but it seems like that's what you'd
need - some driver for GS that does some fancier conversions to some
sort of page layout language (e.g. TeX, troff, ...), then format to
HTML from that.  Another problem is going to be that since it's a
page layout sort of language, it's probably assuming a larger, fixed
size piece of paper for output and going to a language where the user
is expected to be able to resize the device could present problems...
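
To illustrate the "programming language" point, here is a small sketch
in Python (not something from the original thread) of what goes wrong
when you scan PostScript text instead of interpreting it.  The regex
and the names naive_show_strings, SIMPLE_PS and INDIRECT_PS are made up
for this example; the two PostScript fragments are meant to draw the
same text, but only the first keeps the string literal next to the
`show` operator.

```python
import re

# Naive extractor: looks for a string literal immediately followed by `show`.
# Deliberately simplistic, for illustration only; this is not a PostScript parser.
SHOW_LITERAL = re.compile(r"\(([^()]*)\)\s+show\b")


def naive_show_strings(ps_source):
    """Return the string literals that sit directly before a `show` operator."""
    return SHOW_LITERAL.findall(ps_source)


# Straightforward PostScript: the text sits right next to `show`.
SIMPLE_PS = """%!PS
/Helvetica findfont 12 scalefont setfont
72 720 moveto
(Hello, world) show
showpage
"""

# Same intended output, but the string is bound to a name before it is shown.
INDIRECT_PS = """%!PS
/Helvetica findfont 12 scalefont setfont
72 720 moveto
/msg (Hello, world) def
msg show
showpage
"""

if __name__ == "__main__":
    print(naive_show_strings(SIMPLE_PS))    # the literal is found
    print(naive_show_strings(INDIRECT_PS))  # nothing is found
```

The scan finds the text in the first fragment and nothing in the
second, even though the second only adds one level of indirection.
Real documents can build strings in loops, procedures and dictionaries,
which is why the text has to come out of an interpreter such as
Ghostscript rather than a pattern match.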

Scott

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
     POST TO: spug-list at pm.org       PROBLEMS: owner-spug-list at pm.org
      Subscriptions; Email to majordomo at pm.org:  ACTION  LIST  EMAIL
  Replace ACTION by subscribe or unsubscribe, EMAIL by your Email-address
 For daily traffic, use spug-list for LIST ;  for weekly, spug-list-digest
  Seattle Perl Users Group (SPUG) Home Page: http://www.halcyon.com/spug/




