SPUG: PDF to HTML, or PDF extract
Richard Wood
wildwood_players at yahoo.com
Thu May 17 15:51:51 CDT 2001
Sean,
Thanks for the lead on Verity. I took a look at their
site. You seem to be right about the $$; even for a
big company it is expensive. Also, their HTML Export tool
doesn't list PDF as a format that it converts, but I
think the $$ would make that a moot point anyway.
In the meantime, I have copied and pasted the 740 pages
into a word processor and converted them to HTML
(unfortunately, the document was heavily formatted). Now I just
have to process the HTML to put in the links.
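
A minimal sketch of that link-insertion step, in Python for illustration.
It assumes each alarm definition in the converted HTML starts with a
heading such as <h2>ALM-0142 Link failure</h2>. That markup, the ID
format, the function names, and the file name alarms.html are all
hypothetical. The real word-processor output will differ, so the
pattern would need adjusting.

```python
import re

# Hypothetical heading pattern: an <h2> holding an alarm ID (letters, a
# dash, digits) followed by a title. Adjust this to the real markup.
HEADING = re.compile(
    r"<h2>\s*(?P<id>[A-Z]+-\d+)\s+(?P<title>.*?)\s*</h2>",
    re.IGNORECASE,
)


def add_anchors(html):
    """Give every matched heading an id attribute.

    Returns (new_html, index), where index is a list of (alarm_id, title)
    tuples in document order.
    """
    index = []

    def tag(match):
        alarm_id = match.group("id")
        title = match.group("title")
        index.append((alarm_id, title))
        return '<h2 id="{0}">{0} {1}</h2>'.format(alarm_id, title)

    return HEADING.sub(tag, html), index


def build_index(index, page="alarms.html"):
    """Build an HTML list with one link per alarm, pointing at its anchor."""
    items = [
        '<li><a href="{0}#{1}">{1} {2}</a></li>'.format(page, alarm_id, title)
        for alarm_id, title in index
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Running add_anchors over the converted file and writing out build_index
would give one link per alarm, which is roughly what the PDF bookmarks
provided.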
Thanks to everyone for their ideas.
Regards,
Rich Wood
--- Sean Ruddy <Sean at DigiDot.com> wrote:
> You can select the text tool from the toolbar and
> pull the text out manually
> with the reader as long as it is not protected
> (locked), in which case no
> tool will be able to easily grab text (for some reason
> it took me a very long
> time to figure that out).
>
> A company that I used to work for,
> http://www.verity.com, has a tool called
> Export which will do a good conversion of a 6+ MB
> PDF (or 249 other MIME
> types) into HTML or even XML now. It costs a bunch
> though...
>
> Sean Ruddy
> 206-369-7188
>
> -----Original Message-----
> From: owner-spug-list at pm.org
> [mailto:owner-spug-list at pm.org]On Behalf Of
> Lorraine Johnson
> Sent: Thursday, May 17, 2001 9:47 AM
> To: Seattle Perl Users Group
> Subject: RE: SPUG: PDF to HTML, or PDF extract
>
>
> (Maybe a stupid question, but...) Can you get the
> file in its original
> format? The pdf format is usually an end/output
> product, not meant to be
> modified.
>
> Other than that, maybe Acrobat (not Acrobat Reader)
> will allow you to grab
> text?
>
> L
>
> -----Original Message-----
> From: Richard Wood
> [mailto:wildwood_players at yahoo.com]
> Sent: Thursday, May 17, 2001 9:00 AM
> To: Seattle Perl Users Group
> Subject: SPUG: PDF to HTML, or PDF extract
>
>
> I have a 1248 page pdf file. I am only interested
> in
> 740 pages of the file (pp. 142 - 931). These pages
> contain definitions of roughly 600 alarm messages.
> The alarms are all bookmarked. I would like to do
> one
> of the following:
>
> link directly to each specific bookmarked alarm
>
> (I know you can link to a specific page by using:
> somefile.pdf#page=142 but can this be done with
> bookmarks?)
>
> convert the entire document to HTML
>
> extract the 740 pages into a smaller pdf file
>
> extract each bookmarked section into an individual
> pdf
> or html file.
>
> It seems to me that the Perl world would be awash
> with
> PDF tools since Perl is such a wonderful pattern
> recognition and text manipulation language.
>
> But, I have looked on CPAN, Monks, and the internet
> in
> general and have not found any tools to do this.
>
> I know that adobe has a site where you can convert
> pdf
> to html and I have tried it. But there appears to
> be
> some file size limitation somewhere either on their
> site or on my mail server that keeps this from
> working. The file is roughly 6-Meg.
>
> Any ideas?
>
> Regards,
>
> Rich Wood
>
> =====
> Richard O. Wood
> Wildwood IT Consultants, Inc.
> wildwood_players at yahoo.com
> 425.941.9437
>
> - - - - - - - - - - - - - - - - - - - - - - - - - -
> - - - - - - - - - - -
> POST TO: spug-list at pm.org PROBLEMS:
> owner-spug-list at pm.org
> Subscriptions; Email to majordomo at pm.org:
> ACTION LIST EMAIL
> Replace ACTION by subscribe or unsubscribe, EMAIL
> by your Email-address
> For daily traffic, use spug-list for LIST ; for
> weekly, spug-list-digest
> Seattle Perl Users Group (SPUG) Home Page:
> http://www.halcyon.com/spug/
>
>
>
>
=====
Richard O. Wood
Wildwood IT Consultants, Inc.
wildwood_players at yahoo.com
425.941.9437