SPUG: PDF to HTML, or PDF extract

Sean Ruddy Sean at DigiDot.com
Thu May 17 23:24:40 CDT 2001


It will do PDF. That is new, so it may not be GA (generally available) yet.
I have in fact done it myself, on the laptop I returned :-(

Adobe must offer some API, but I'm sure they sell it for lots of money to
companies like Verity, who in turn charge lots of money to large companies.

You probably can't get the keys to go under the hood for a CPAN module.
Might not hurt to ask Adobe, though.

Sean


-----Original Message-----
From: owner-spug-list at pm.org [mailto:owner-spug-list at pm.org] On Behalf Of Richard Wood
Sent: Thursday, May 17, 2001 1:52 PM
To: Sean Ruddy; Seattle Perl Users Group
Subject: RE: SPUG: PDF to HTML, or PDF extract


Sean,

Thanks for the lead on Verity. I took a look at their
site. You seem to be right about the $$; even for a
big company it is $$. Also, their HTML Export tool
doesn't list PDF as a format that it converts, but I
think the $$ would make that a moot point anyway.

In the meantime, I have copied and pasted the 740 pages
into a word processor and converted them to HTML
(unfortunately, the text was heavily formatted). Now I just
have to process the HTML to put in the links.
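One way to do that post-processing is a small script that finds each alarm heading in the exported HTML, gives it an anchor, and builds an index of links to those anchors. The sketch below is a minimal Python version under a loud assumption: that every alarm definition starts with a paragraph like `<p>ALARM 1001: Fan failure</p>`. `ALARM_RE` and `add_alarm_links` are hypothetical names, and the pattern is the part that would have to be adapted to whatever markup the word processor actually emitted.

```python
import re

# Hypothetical pattern: assumes each alarm definition starts with a paragraph
# such as <p>ALARM 1001: Fan failure</p>. Adapt it to the markup the word
# processor actually exported.
ALARM_RE = re.compile(r"<p>(ALARM\s+(\d+):[^<]*)</p>")


def add_alarm_links(doc):
    """Return (doc_with_anchors, index_html) for the alarm headings in doc.

    Each matched heading gets an id="alarm-NNNN" attribute so it can be
    linked to, and index_html is a <ul> of links to those ids in document
    order. Duplicate alarm numbers would produce duplicate ids.
    """
    index_items = []

    def anchor(match):
        title, number = match.group(1), match.group(2)
        anchor_id = f"alarm-{number}"
        index_items.append(f'<li><a href="#{anchor_id}">{title}</a></li>')
        return f'<p id="{anchor_id}">{title}</p>'

    # re.sub calls anchor() for each match, left to right, so the index
    # comes out in the same order as the headings in the document.
    linked = ALARM_RE.sub(anchor, doc)
    index_html = "<ul>\n" + "\n".join(index_items) + "\n</ul>"
    return linked, index_html


if __name__ == "__main__":
    sample = ("<p>ALARM 1001: Fan failure</p>\n<p>Check the fan.</p>\n"
              "<p>ALARM 1002: Over temperature</p>")
    body, index = add_alarm_links(sample)
    print(index)
    print(body)
```

A regular expression is only reasonable here because the input is one machine-generated file with a predictable heading format; for less regular HTML, an HTML parser would be the safer choice.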

Thanks to everyone for their ideas.

Regards,

Rich Wood


--- Sean Ruddy <Sean at DigiDot.com> wrote:
> You can select the text tool from the toolbar and pull the text out
> manually with the Reader, as long as the file is not protected
> (locked), in which case no tool will be able to easily grab the text
> (for some reason it took me a very long time to figure that out).
>
> A company that I used to work for, http://www.verity.com, has a tool
> called Export which will do a good conversion of a >6 MB PDF (or 249
> other MIME types) into HTML, or even XML now. It costs a bunch,
> though...
>
> Sean Ruddy
> 206-369-7188
>
> -----Original Message-----
> From: owner-spug-list at pm.org [mailto:owner-spug-list at pm.org] On Behalf Of Lorraine Johnson
> Sent: Thursday, May 17, 2001 9:47 AM
> To: Seattle Perl Users Group
> Subject: RE: SPUG: PDF to HTML, or PDF extract
>
>
> (Maybe a stupid question, but...) Can you get the file in its
> original format? The PDF format is usually an end/output product,
> not meant to be modified.
>
> Other than that, maybe Acrobat (not Acrobat Reader) will allow you
> to grab the text?
>
> L
>
> -----Original Message-----
> From: Richard Wood [mailto:wildwood_players at yahoo.com]
> Sent: Thursday, May 17, 2001 9:00 AM
> To: Seattle Perl Users Group
> Subject: SPUG: PDF to HTML, or PDF extract
>
>
> I have a 1248-page PDF file. I am only interested in 740 pages of
> the file (pp. 142-931). These pages contain definitions of roughly
> 600 alarm messages. The alarms are all bookmarked. I would like to
> do one of the following:
>
> link directly to each specific bookmarked alarm
>
> (I know you can link to a specific page by using
> somefile.pdf#page=142, but can this be done with bookmarks?)
>
> convert the entire document to HTML
>
> extract the 740 pages into a smaller PDF file
>
> extract each bookmarked section into an individual PDF or HTML file.
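For the "link to each alarm" and "one file per bookmarked section" options, a key step is turning bookmark start pages into page ranges: each section runs from its bookmark's page to the page before the next bookmark, and the last one runs to the end of the area of interest (p. 931 here). The Python sketch below assumes the bookmark titles and 1-based start pages have already been pulled out of the PDF by some other tool; the sample `bookmarks` data and the function names are hypothetical. It only computes ranges and builds `#page=N` links; it does not read or write PDF files. Adobe's PDF open parameters also include a `nameddest=` option, which might allow per-alarm links only if the bookmarks happen to point at named destinations; that is worth checking rather than assuming.

```python
from typing import List, Tuple


def section_ranges(bookmarks: List[Tuple[str, int]],
                   last_page: int) -> List[Tuple[str, int, int]]:
    """Turn (title, start_page) bookmarks into (title, first, last) ranges.

    Pages are 1-based and inclusive. A section ends on the page before the
    next bookmark starts; if two bookmarks start on the same page, the
    earlier one is given just that page. The final section ends at last_page.
    """
    ordered = sorted(bookmarks, key=lambda b: b[1])  # stable: keeps ties in order
    ranges = []
    for i, (title, start) in enumerate(ordered):
        if i + 1 < len(ordered):
            end = max(start, ordered[i + 1][1] - 1)
        else:
            end = last_page
        ranges.append((title, start, end))
    return ranges


def page_link(pdf_url: str, page: int) -> str:
    """Link that asks the viewer to open pdf_url at the given page."""
    return f"{pdf_url}#page={page}"


if __name__ == "__main__":
    # Hypothetical sample data; real titles and pages would come from the
    # PDF's bookmark (outline) tree via some other tool.
    bookmarks = [("ALARM 1001", 142), ("ALARM 1002", 143),
                 ("ALARM 1003", 143), ("ALARM 1004", 146)]
    for title, first, last in section_ranges(bookmarks, 931):
        print(title, first, last, page_link("somefile.pdf", first))
```

With the sample data, the last section is reported as pages 146 to 931. The resulting ranges could then be handed to whatever PDF library or tool does the actual page extraction.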
>
> It seems to me that the Perl world would be awash with PDF tools,
> since Perl is such a wonderful pattern-recognition and
> text-manipulation language.
>
> But I have looked on CPAN, Monks, and the Internet in general and
> have not found any tools to do this.
>
> I know that Adobe has a site where you can convert PDF to HTML, and
> I have tried it. But there appears to be some file size limitation,
> either on their site or on my mail server, that keeps this from
> working. The file is roughly 6 MB.
>
> Any ideas?
>
> Regards,
>
> Rich Wood
>
> =====
> Richard O. Wood
> Wildwood IT Consultants, Inc.
> wildwood_players at yahoo.com
> 425.941.9437
>
>  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>      POST TO: spug-list at pm.org       PROBLEMS: owner-spug-list at pm.org
>       Subscriptions; Email to majordomo at pm.org:  ACTION  LIST  EMAIL
>   Replace ACTION by subscribe or unsubscribe, EMAIL by your Email-address
>  For daily traffic, use spug-list for LIST ;  for weekly, spug-list-digest
>   Seattle Perl Users Group (SPUG) Home Page: http://www.halcyon.com/spug/


=====
Richard O. Wood
Wildwood IT Consultants, Inc.
wildwood_players at yahoo.com
425.941.9437
