SPUG: PDF to HTML, or PDF extract

Sean Ruddy Sean at DigiDot.com
Thu May 17 12:38:07 CDT 2001


You can select the text tool from the toolbar and pull the text out manually
with the reader, as long as the file is not protected (locked). If it is
protected, no tool will be able to grab the text easily (for some reason it
took me a very long time to figure that out).

A company that I used to work for (http://www.verity.com) has a tool called
Export which will do a good conversion of a > 6 MB PDF (or 249 other MIME
types) into HTML or even XML now.  It costs a bunch, though...

Sean Ruddy
206-369-7188

-----Original Message-----
From: owner-spug-list at pm.org [mailto:owner-spug-list at pm.org]On Behalf Of
Lorraine Johnson
Sent: Thursday, May 17, 2001 9:47 AM
To: Seattle Perl Users Group
Subject: RE: SPUG: PDF to HTML, or PDF extract


(Maybe a stupid question, but...)  Can you get the file in its original
format?  The pdf format is usually an end/output product, not meant to be
modified.

Other than that, maybe Acrobat (not Acrobat Reader) will allow you to grab
text?

L

-----Original Message-----
From: Richard Wood [mailto:wildwood_players at yahoo.com]
Sent: Thursday, May 17, 2001 9:00 AM
To: Seattle Perl Users Group
Subject: SPUG: PDF to HTML, or PDF extract


I have a 1248 page pdf file.  I am only interested in
740 pages of the file (pp. 142 - 931).  These pages
contain definitions of roughly 600 alarm messages.
The alarms are all bookmarked.  I would like to do one
of the following:

link directly to each specific bookmarked alarm

(I know you can link to a specific page by using:
somefile.pdf#page=142 but can this be done with
bookmarks?)

convert the entire document to HTML

extract the 740 pages into a smaller pdf file

extract each bookmarked section into an individual pdf
or html file.
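
For the first option, a minimal sketch of how an HTML index of per-alarm
links could be generated using the #page= form mentioned above. It assumes
the alarm-to-page mapping has already been collected by hand or by some other
tool; the alarm names and page numbers below are hypothetical placeholders,
and page_link and build_index are illustrative helper names, not part of any
existing PDF library. It links to pages rather than to the bookmarks
themselves.

```python
from html import escape

# Hypothetical sample data: alarm name -> page where its definition starts.
ALARM_PAGES = {
    "ALARM 002": 143,
    "ALARM 001": 142,
}


def page_link(pdf_url, page):
    """Return a URL asking the viewer to open pdf_url at the given page."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return f"{pdf_url}#page={page}"


def build_index(pdf_url, alarm_pages):
    """Return an HTML list with one link per alarm, ordered by page number."""
    items = [
        f'<li><a href="{escape(page_link(pdf_url, page))}">{escape(name)}</a></li>'
        for name, page in sorted(alarm_pages.items(), key=lambda kv: kv[1])
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    print(build_index("somefile.pdf", ALARM_PAGES))
```

Whether the viewer honors the #page= fragment depends on the browser and
plug-in in use, so the generated links are worth checking in the target
environment.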

It seems to me that the Perl world would be awash with
PDF tools, since Perl is such a wonderful pattern
recognition and text manipulation language.

But, I have looked on CPAN, Monks, and the internet in
general and have not found any tools to do this.

I know that adobe has a site where you can convert pdf
to html and I have tried it.  But there appears to be
some file size limitation somewhere either on their
site or on my mail server that keeps this from
working.  The file is roughly 6 MB.

Any ideas?

Regards,

Rich Wood

=====
Richard O. Wood
Wildwood IT Consultants, Inc.
wildwood_players at yahoo.com
425.941.9437

 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
     POST TO: spug-list at pm.org       PROBLEMS: owner-spug-list at pm.org
      Subscriptions; Email to majordomo at pm.org:  ACTION  LIST  EMAIL
  Replace ACTION by subscribe or unsubscribe, EMAIL by your Email-address
 For daily traffic, use spug-list for LIST ;  for weekly, spug-list-digest
  Seattle Perl Users Group (SPUG) Home Page: http://www.halcyon.com/spug/








