SPUG: PDF to HTML, or PDF extract

Richard Wood wildwood_players at yahoo.com
Thu May 17 15:51:51 CDT 2001


Sean, 

Thanks for the lead on Verity.  I took a look at their
site.  You seem to be correct on the $$; even for a
big company it is $$.  Also, their HTML Export tool
doesn't list PDF as a format that it converts, but I
think the $$ would make that a moot point anyway.

In the meantime, I have copied and pasted the 740 pages
into a word processor and converted them to html
(unfortunately the result was heavily formatted).  Now I just
have to process the html to put in the links.
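
A rough sketch of that link step, in Python rather than Perl.  It
assumes each alarm title came out of the word processor as an <h2>
heading, which is only a guess about the export.  The names
add_anchors and slugify are made up for illustration.  It gives each
heading an id and builds a list of links to them.

```python
import re

# Assumption: every alarm title is wrapped in a plain <h2>...</h2> element.
HEADING_RE = re.compile(r"<h2>(.*?)</h2>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def slugify(text):
    """Turn heading text into a lowercase, hyphen-separated anchor id."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug or "section"


def add_anchors(html):
    """Return (html with ids on each <h2>, an HTML list linking to them)."""
    entries = []   # (slug, plain heading text), in document order
    seen = {}      # how many times each base slug has appeared so far

    def repl(match):
        title = match.group(1).strip()
        text = TAG_RE.sub("", title)      # drop inline tags such as <b>
        slug = slugify(text)
        seen[slug] = seen.get(slug, 0) + 1
        if seen[slug] > 1:                # keep repeated titles distinct
            slug = "%s-%d" % (slug, seen[slug])
        entries.append((slug, text))
        return '<h2 id="%s">%s</h2>' % (slug, title)

    linked = HEADING_RE.sub(repl, html)
    items = "\n".join(
        '<li><a href="#%s">%s</a></li>' % (slug, text) for slug, text in entries
    )
    return linked, "<ul>\n%s\n</ul>" % items
```

Calling add_anchors(page) on the exported page returns the rewritten
html plus an index that can be pasted at the top of the file.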

Thanks to everyone for their ideas.

Regards,

Rich Wood


--- Sean Ruddy <Sean at DigiDot.com> wrote:
> You can select the text tool from the toolbar and
> pull them out manually
> with the reader as long as it is not protected
> (locked) in which case no
> tool will be able to easily grab text (for some reason
> it took me a very long
> time to figure that out).
> 
> A company that I used to work for,
> http://www.verity.com, has a tool called
> Export which will do a good conversion of a 6+ MB
> pdf (or 249 other mime
> types) into html or even XML now.  It costs a bunch
> though...
> 
> Sean Ruddy
> 206-369-7188
> 
> -----Original Message-----
> From: owner-spug-list at pm.org
> [mailto:owner-spug-list at pm.org]On Behalf Of
> Lorraine Johnson
> Sent: Thursday, May 17, 2001 9:47 AM
> To: Seattle Perl Users Group
> Subject: RE: SPUG: PDF to HTML, or PDF extract
> 
> 
> (Maybe a stupid question, but...)  Can you get the
> file in its original
> format?  The pdf format is usually an end/output
> product, not meant to be
> modified.
> 
> Other than that, maybe Acrobat (not Acrobat Reader)
> will allow you to grab
> text?
> 
> L
> 
> -----Original Message-----
> From: Richard Wood
> [mailto:wildwood_players at yahoo.com]
> Sent: Thursday, May 17, 2001 9:00 AM
> To: Seattle Perl Users Group
> Subject: SPUG: PDF to HTML, or PDF extract
> 
> 
> I have a 1248 page pdf file.  I am only interested
> in
> 740 pages of the file (pp. 142 - 931).  These pages
> contain definitions of roughly 600 alarm messages.
> The alarms are all bookmarked.  I would like to do
> one
> of the following:
> 
> link directly to each specific bookmarked alarm
> 
> (I know you can link to a specific page by using:
> somefile.pdf#page=142 but can this be done with
> bookmarks?)
> 
> convert the entire document to HTML
> 
> extract the 740 pages into a smaller pdf file
> 
> extract each bookmarked section into an individual
> pdf
> or html file.
> 
> It seems to me that the perl world would be awash
> with
> pdf tools since perl is such a wonderful pattern
> recognition and text manipulation language.
> 
> But, I have looked on CPAN, Monks, and the internet
> in
> general and have not found any tools to do this.
> 
> I know that adobe has a site where you can convert
> pdf
> to html and I have tried it.  But there appears to
> be
> some file size limitation somewhere either on their
> site or on my mail server that keeps this from
> working.  The file is roughly 6-Meg.
> 
> Any ideas?
> 
> Regards,
> 
> Rich Wood
> 
> =====
> Richard O. Wood
> Wildwood IT Consultants, Inc.
> wildwood_players at yahoo.com
> 425.941.9437
> 
>  - - - - - - - - - - - - - - - - - - - - - - - - - -
> - - - - - - - - - - -
>      POST TO: spug-list at pm.org       PROBLEMS:
> owner-spug-list at pm.org
>       Subscriptions; Email to majordomo at pm.org: 
> ACTION  LIST  EMAIL
>   Replace ACTION by subscribe or unsubscribe, EMAIL
> by your Email-address
>  For daily traffic, use spug-list for LIST ;  for
> weekly, spug-list-digest
>   Seattle Perl Users Group (SPUG) Home Page:
> http://www.halcyon.com/spug/


=====
Richard O. Wood
Wildwood IT Consultants, Inc.
wildwood_players at yahoo.com
425.941.9437






