SPUG: PS/PDF parsing

Richard Seymour UW-NPL SEYMOUR at npl.npl.washington.edu
Fri Jun 8 17:40:42 CDT 2001


Am I missing something here?

On Fri, Jun 08, 2001 at 12:13:54PM -0700, El JoPe Magnifico wrote:
> Are there any modules that do parsing of Postscript/PDF files?

Don't you consider PDF pretty much already "web-enabled", since
 (almost) every browser already has the Acrobat Reader plug-in?

> Context of the question: Someone in my office was looking for a
> PS-to-HTML converter.  There are a couple PS-to-text converters
> (using Ghostscript under the hood), but those strip most of the
> layout and formatting info that would be useful for conversion
> to something richer than plaintext.

Let's play give-the-user-what-they-asked-for (which may not be
 what they wanted; that's a matter of user education).

Since Ghostscript can produce both GIF (in older versions) and PDF
 output, simply change the PS-to-text background script to generate one
 of those instead...

If you generate PDF, you're (effectively) done: serve the result.
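
Serving the PDF can be as simple as a CGI script that sends the right
Content-Type header ahead of the bytes. A minimal Python sketch, assuming
a plain CGI setup; the function name pdf_cgi_response is hypothetical:

```python
def pdf_cgi_response(pdf_bytes: bytes) -> bytes:
    """Build a CGI response that hands a PDF straight to the browser.

    Assumes a CGI environment: the web server reads the header lines,
    then the blank line, then passes the body through unchanged.
    """
    header = (
        "Content-Type: application/pdf\n"
        f"Content-Length: {len(pdf_bytes)}\n"
        "\n"  # blank line ends the CGI header block
    ).encode("ascii")
    return header + pdf_bytes
```

A CGI script would read the PDF that the gs step wrote and send
pdf_cgi_response(data) with sys.stdout.buffer.write().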

If you generate GIF, then you'll need an
  <img src="scratch.gif">
 tag to show it as a full-page image.
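
With one image per page, a small script can wrap them all in a single
HTML page, one <img> tag per page. A minimal Python sketch; the function
name and the page image file names are hypothetical:

```python
from html import escape


def page_gallery_html(image_names, title="Converted document"):
    """Return an HTML page that shows each page image in order.

    image_names: page image file names, first page first (hypothetical
    names such as page1.gif, page2.gif, ...).
    """
    tags = "\n".join(
        # escape() also escapes quotes, so odd file names can't break the tag
        f'<p><img src="{escape(name)}" alt="Page {i}"></p>'
        for i, name in enumerate(image_names, start=1)
    )
    return (
        "<html><head><title>" + escape(title) + "</title></head>\n"
        "<body>\n" + tags + "\n</body></html>\n"
    )
```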

The latter (GIF) "solution" requires parsing the original document
 down to the single-page level (or calling Ghostscript repeatedly with
 successive page numbers until it returns a blank result).
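
Ghostscript can also split the pages itself: a %d in -sOutputFile is
replaced by the page number, so a single run writes one file per page.
A minimal Python sketch that builds and runs such a command, assuming the
png16m (24-bit PNG) device, since current Ghostscript releases no longer
include a GIF device; the function names and the page%d.png pattern are
hypothetical:

```python
import subprocess


def raster_pages_command(ps_file, device="png16m", resolution=72,
                         pattern="page%d.png"):
    """Build a gs argv that renders every page to its own image file.

    Ghostscript substitutes the page number (starting at 1) for %d,
    so one run produces page1.png, page2.png, ...
    """
    return [
        "gs",
        "-q",            # suppress startup messages
        "-dNOPAUSE",     # don't pause between pages
        "-dBATCH",       # exit when done instead of entering the gs prompt
        f"-sDEVICE={device}",
        f"-r{resolution}",          # pixels per inch
        f"-sOutputFile={pattern}",
        ps_file,
    ]


def render_pages(ps_file, **options):
    """Run gs; raises CalledProcessError if gs exits with an error."""
    subprocess.run(raster_pages_command(ps_file, **options), check=True)
```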

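The Ghostscript command for PDF output would be something along the lines of:
   gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -r72x72 -sOutputFile=- input.ps

where -r sets the resolution in pixels per inch (horizontal x vertical),
and -g(width)x(height) sets the page size in pixels.  -q suppresses the
startup messages that would otherwise mix with the PDF on standard output,
and -dNOPAUSE -dBATCH keep gs from pausing between pages and make it exit
when done instead of dropping to its interactive prompt.
-sOutputFile=- writes to standard output; -sOutputFile=|command (or
%pipe%command) pipes the output into a command.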
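
Driving that command from a script could look like the following minimal
Python sketch. It assumes gs is on the PATH and that -q is enough to keep
informational messages out of the PDF written to standard output; the
function names are hypothetical:

```python
import subprocess


def ps_to_pdf_command(ps_file):
    """Build the gs argv that converts a PostScript file to PDF on stdout."""
    return [
        "gs", "-q", "-dNOPAUSE", "-dBATCH",
        "-sDEVICE=pdfwrite",
        "-sOutputFile=-",   # "-" means standard output
        ps_file,
    ]


def ps_to_pdf(ps_file):
    """Run gs and return the PDF bytes it writes to standard output.

    Raises CalledProcessError if gs exits with an error.
    """
    result = subprocess.run(ps_to_pdf_command(ps_file),
                            stdout=subprocess.PIPE, check=True)
    return result.stdout
```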

It's a great tool...
--dick

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
     POST TO: spug-list at pm.org       PROBLEMS: owner-spug-list at pm.org
      Subscriptions; Email to majordomo at pm.org:  ACTION  LIST  EMAIL
  Replace ACTION by subscribe or unsubscribe, EMAIL by your Email-address
 For daily traffic, use spug-list for LIST ;  for weekly, spug-list-digest
  Seattle Perl Users Group (SPUG) Home Page: http://www.halcyon.com/spug/
