SPUG: PS/PDF parsing
Richard Seymour UW-NPL
SEYMOUR at npl.npl.washington.edu
Fri Jun 8 17:40:42 CDT 2001
Am I missing something here?
On Fri, Jun 08, 2001 at 12:13:54PM -0700, El JoPe Magnifico wrote:
> Are there any modules that do parsing of Postscript/PDF files?
Don't you consider PDF pretty much already "web-enabled", since
(almost) every browser has the Acrobat Reader plug-in installed?
> Context of the question: Someone in my office was looking for a
> PS-to-HTML converter. There are a couple PS-to-text converters
> (using Ghostscript under the hood), but those strip most of the
> layout and formatting info that would be useful for conversion
> to something richer than plaintext.
Let's play give-the-user-what-they-asked-for (which may not be
what they wanted; that's a matter of user education).
Since Ghostscript can produce both GIF (in older versions) and PDF output,
simply change the background PS-to-text script to generate one of those instead.
If you generate PDF, you're (effectively) done: just serve the result.
If you generate GIF, then you'll need a tag like
<img src="scratch.gif">
to show it as a full-page image.
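For a multi-page document the same idea extends to one tag per page. A minimal sketch in Python, assuming the page images have already been rendered; the function name and the file names are hypothetical, not part of any existing script:

```python
from html import escape
from typing import List


def page_images_html(image_names: List[str], title: str = "Converted document") -> str:
    """Return a minimal HTML page with one full-width <img> tag per page image."""
    imgs = "\n".join(
        # escape() keeps odd file names from breaking the attribute quoting
        f'<img src="{escape(name)}" alt="Page {i}" style="width:100%">'
        for i, name in enumerate(image_names, start=1)
    )
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{escape(title)}</title></head>\n"
        f"<body>\n{imgs}\n</body></html>\n"
    )


if __name__ == "__main__":
    print(page_images_html(["scratch.gif"]))
```

Escaping the names matters only if file names can contain characters like `&` or `"`; for generated names like page1.gif it is a no-op.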
The GIF "solution" requires splitting the original document into
single pages, since each page becomes a separate image (or iteratively
calling Ghostscript with a page number until it returns a blank result).
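The iterative approach is just a loop that stops at the first empty result. A sketch in Python, assuming a hypothetical `render_page(n)` wrapper that runs Ghostscript for page n and returns the image bytes, or an empty result past the last page; how "blank" is detected in practice is also an assumption:

```python
from typing import Callable, List, Optional


def render_all_pages(render_page: Callable[[int], Optional[bytes]],
                     max_pages: int = 1000) -> List[bytes]:
    """Call render_page(1), render_page(2), ... until it returns an empty result.

    render_page is a hypothetical wrapper around a single-page Ghostscript run.
    max_pages guards against a renderer that never returns a blank result.
    """
    pages: List[bytes] = []
    for page_number in range(1, max_pages + 1):
        image = render_page(page_number)
        if not image:  # None or b"": assume we ran past the last page
            break
        pages.append(image)
    return pages


def fake_render(page_number: int) -> bytes:
    """Stand-in renderer for a three-page document (no Ghostscript needed)."""
    return b"GIF89a" if page_number <= 3 else b""


if __name__ == "__main__":
    print(len(render_all_pages(fake_render)))  # prints 3
```

As an alternative to the loop, Ghostscript can write one file per page itself when the output file name contains a printf-style `%d` (for example `-sOutputFile=page%d.gif`).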
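The PDF Ghostscript command would be something along the lines of:
gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -r72x72 -sOutputFile=- input.ps
where -r sets the resolution in pixels per inch, -q suppresses
Ghostscript's startup messages (which would otherwise end up mixed into
the output on stdout), and -dNOPAUSE -dBATCH make it exit when done
instead of pausing after each page and dropping into its interactive prompt.
There's also -g<width>x<height> to set the page size in pixels.
-sOutputFile=- writes to standard output; -sOutputFile=|command pipes
the output to a command.
It's a great tool...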
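If the background script builds this command programmatically, a small helper keeps the options in one place. A sketch in Python; `gs_pdf_command` is a hypothetical name, and the added -dSAFER flag (which restricts the file operations a PostScript program may perform) is my addition, not part of the command above:

```python
import shlex
from typing import List, Optional, Tuple


def gs_pdf_command(input_path: str,
                   output: str = "-",
                   resolution: int = 72,
                   size_px: Optional[Tuple[int, int]] = None) -> List[str]:
    """Build a Ghostscript argument list that converts PostScript to PDF.

    output: "-" for standard output, "|command" to pipe, or a file name.
    size_px: optional (width, height) in pixels, passed as -g<width>x<height>.
    """
    args = [
        "gs",
        "-q",          # suppress startup messages
        "-dSAFER",     # restrict file operations by the PostScript program
        "-dNOPAUSE",   # don't pause between pages
        "-dBATCH",     # exit when done instead of entering interactive mode
        "-sDEVICE=pdfwrite",
        f"-r{resolution}x{resolution}",
    ]
    if size_px is not None:
        width, height = size_px
        args.append(f"-g{width}x{height}")
    args += [f"-sOutputFile={output}", input_path]
    return args


if __name__ == "__main__":
    # Print the command rather than run it, so this works without gs installed.
    print(shlex.join(gs_pdf_command("input.ps", size_px=(612, 792))))
```

At -r72 one pixel is one PostScript point, so US Letter is -g612x792. To actually run the conversion, pass the list to `subprocess.run(gs_pdf_command("input.ps", output="out.pdf"), check=True)`; passing a list avoids shell-quoting problems with odd file names.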
--dick
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
POST TO: spug-list at pm.org PROBLEMS: owner-spug-list at pm.org
Subscriptions; Email to majordomo at pm.org: ACTION LIST EMAIL
Replace ACTION by subscribe or unsubscribe, EMAIL by your Email-address
For daily traffic, use spug-list for LIST ; for weekly, spug-list-digest
Seattle Perl Users Group (SPUG) Home Page: http://www.halcyon.com/spug/