SPUG: PS/PDF parsing
scott at sabmail.rresearch.com
Fri Jun 8 17:12:34 CDT 2001
On Fri, Jun 08, 2001 at 12:13:54PM -0700, El JoPe Magnifico wrote:
> Are there any modules that do parsing of Postscript/PDF files?
> I looked on CPAN and found lots that output Postscript or PDF,
> but none that do parsing, outside of the little bit necessary
> to add a stamp or cut marks to each page.
> Context of the question: Someone in my office was looking for a
> PS-to-HTML converter. There are a couple PS-to-text converters
> (using Ghostscript under the hood), but those strip most of the
> layout and formatting info that would be useful for conversion
> to something richer than plaintext.
That's going to be rather tricky, since PS is a programming language
that can be abused: a file can compute what it draws, so you need a
language interpreter (e.g. Ghostscript) to make sense of it. I don't
know what kind of drivers there are for Ghostscript, but it seems like
that's what you'd need: a GS driver that does a fancier conversion to
some sort of page layout language (e.g. TeX, troff, ...), which you
could then format to HTML. Another problem is that PS is a page layout
sort of language, so it probably assumes a larger, fixed-size piece of
paper for output. Going to a language where the user is expected to be
able to resize the display could present problems...
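
To make both points concrete, here is a rough sketch in Python (chosen
only for brevity). It is not Ghostscript and not a real PostScript
parser: the function names tokenize, run, and to_html are made up for
illustration, and the toy interpreter handles just numbers, simple
(string) literals, and the operators add, moveto, and show. It shows
that where the text lands is only known after running the program, and
that mapping a fixed 612x792 point page onto HTML needs some decision
about resizing (here, an assumed choice of percentage offsets).

```python
from html import escape


def tokenize(src):
    """Split toy PostScript source into (kind, value) tokens."""
    tokens, i = [], 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1
        elif ch == "(":
            # Toy assumption: no nested or escaped parentheses.
            end = src.index(")", i)
            tokens.append(("str", src[i + 1:end]))
            i = end + 1
        else:
            j = i
            while j < len(src) and not src[j].isspace() and src[j] != "(":
                j += 1
            word = src[i:j]
            try:
                tokens.append(("num", float(word)))
            except ValueError:
                tokens.append(("op", word))
            i = j
    return tokens


def run(src):
    """Interpret the toy program; return a list of (x, y, text) placements."""
    stack, placed, pos = [], [], (0.0, 0.0)
    for kind, val in tokenize(src):
        if kind != "op":
            stack.append(val)              # operands just go on the stack
        elif val == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif val == "moveto":
            y, x = stack.pop(), stack.pop()
            pos = (x, y)
        elif val == "show":
            placed.append((pos[0], pos[1], stack.pop()))
        else:
            raise ValueError(f"unsupported operator: {val}")
    return placed


def to_html(placed, page_w=612.0, page_h=792.0):
    """Emit absolutely positioned spans, offsets as percentages of the page.

    Defaults assume a US Letter page measured in points. PostScript's
    origin is bottom-left and y marks the text baseline, so the "top"
    value here is only an approximation.
    """
    spans = []
    for x, y, text in placed:
        left = 100.0 * x / page_w
        top = 100.0 * (page_h - y) / page_h
        spans.append(
            f'<span style="position:absolute;left:{left:.1f}%;'
            f'top:{top:.1f}%">{escape(text)}</span>'
        )
    return "\n".join(spans)


if __name__ == "__main__":
    # The y coordinate (700) never appears literally in the source.
    print(to_html(run("72 600 100 add moveto (Hello) show")))
```

Even this much loses fonts, line structure, and reading order, which
is why a real converter would want the interpreter to hand over richer
layout information than bare coordinates.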
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
POST TO: spug-list at pm.org PROBLEMS: owner-spug-list at pm.org
Subscriptions; Email to majordomo at pm.org: ACTION LIST EMAIL
Replace ACTION by subscribe or unsubscribe, EMAIL by your Email-address
For daily traffic, use spug-list for LIST ; for weekly, spug-list-digest
Seattle Perl Users Group (SPUG) Home Page: http://www.halcyon.com/spug/