SPUG: PDF to HTML?

Kenneth W. Meyer kmeyer at aa.net
Sat Jan 15 14:29:10 CST 2000


I don't understand why one would want to convert the usually excellent PDF renderings into poor approximations in HTML, when downloading and installing the Acrobat Viewer is free, takes 5-10 minutes (excluding download time), and requires no computer expertise. Moreover, it seems likely to me that a significant amount of reformatting of the conversions will be necessary, and with 100 documents to convert, that could be a real chore unless the documents are very regular in their layout. However...

I will assume that Jay has a directive from management; oh, well. To determine the best approach, it seems necessary to first understand the degree of fidelity expected. Is it just a matter of extracting text, or is the formatting complex? How big are the files? Are there graphics that must be preserved? How much manual reformatting is anticipated? What is the budget? (It seems to me that any reasonably priced tool would be repaid many times over in saved labor, unless maximizing billable contract hours is an objective.) Etc., etc.

I followed the great link to Adobe provided by Glenn. At that site, there is a plug-in for Acrobat 4.0.5 that apparently creates plain text from PDFs; i.e., it appears that all formatting and graphics are discarded. There are also free online conversion services, but they are aimed at vision-impaired users, so loading them with 100 documents for commercial ends might be a bit much.

Acrobat 4.0 lists for about $250, I believe (the 4.0.5 upgrade is free), but the academic price is $100 at the UW Bookstore Computer Annex (could Tim's courses qualify you?).

Finally, if you dig far enough into the Adobe site, it leads you to the ubiquitous third-party tool listings, and among them is a company that appears to specialize in products for converting PDFs to other formats. It has a product called Magellan ($200, with possible multi-product promotions) that, according to the site, does the job with graphics and all preserved. There also appears to be a free mail-in conversion service available from this site; it's worth a try, if only to check it out:

	http://www.bcl-computers.com/

I would be interested in any feedback on what you find out and decide to do, Jay.

Ken Meyer
kmeyer at aa.net

----------
From: 	taliesin at speakeasy.org[SMTP:taliesin at speakeasy.org]
Sent: 	Saturday, January 15, 2000 2:22 AM
To: 	Jay Scherrer
Cc: 	Tim Maher
Subject: 	Re: SPUG: PDF to HTML?

On Sat, 15 Jan 2000, Jay Scherrer wrote:

>I have a new project that I will be starting, and I was wondering if
>someone could point me in the right direction. I would like to convert
>several PDF docs into HTML with Perl. Is this possible?
>I figured I will need to start with pdftotext and then use
>text2html.pm. But as I have 100+ documents, there must be a way of
>templating the process.
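A minimal sketch of the templated, plain-text route described in the quoted question, assuming the pdftotext command-line tool (from xpdf or poppler) is installed and on the PATH. It is shown in Python rather than Perl; a small template function stands in for text2html.pm, whose interface is not reproduced here, and the script name pdf2html.py, PAGE_TEMPLATE, text_to_html and convert_pdf are all hypothetical. Only the text survives this route: layout, fonts and graphics are lost.

```python
# Hypothetical pdf2html.py: batch-convert a directory of PDFs to simple HTML.
# Assumes the pdftotext CLI (xpdf/poppler) is installed and on the PATH.
import html
import re
import subprocess
import sys
from pathlib import Path

# Hypothetical page template; str.format fills in {title} and {body}.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head><title>{title}</title></head>
<body>
<h1>{title}</h1>
{body}
</body>
</html>
"""


def text_to_html(text: str, title: str) -> str:
    """Wrap plain text in the template, one <p> per blank-line-separated block."""
    # pdftotext separates pages with form feeds; treat them as paragraph breaks.
    text = text.replace("\f", "\n\n")
    blocks = (b.strip() for b in re.split(r"\n\s*\n", text))
    paras = [f"<p>{html.escape(b)}</p>" for b in blocks if b]
    return PAGE_TEMPLATE.format(title=html.escape(title), body="\n".join(paras))


def convert_pdf(pdf: Path, out_dir: Path) -> Path:
    """Extract text with pdftotext ("-" = write to stdout) and save it as .html."""
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", str(pdf), "-"],
        capture_output=True,
        check=True,
    )
    text = result.stdout.decode("utf-8", errors="replace")
    out = out_dir / (pdf.stem + ".html")
    out.write_text(text_to_html(text, pdf.stem), encoding="utf-8")
    return out


def main(src: str, dest: str) -> None:
    """Convert every *.pdf in src, writing one .html per PDF into dest."""
    out_dir = Path(dest)
    out_dir.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(Path(src).glob("*.pdf")):
        try:
            print("wrote", convert_pdf(pdf, out_dir))
        except subprocess.CalledProcessError as err:
            print(f"skipped {pdf}: pdftotext exited {err.returncode}", file=sys.stderr)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: pdf2html.py PDF_DIR OUT_DIR", file=sys.stderr)
```

Run as `python pdf2html.py pdfs/ html/`; each foo.pdf becomes html/foo.html. Splitting on blank lines is only a crude paragraph heuristic, so documents with irregular layouts will still need hand cleanup.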

Any reason this has to be done in Perl? Adobe already has tools to
convert PDF to HTML, and they appear to be freebies. They all run on
Windows, but if that's not an issue, I think you're set.
No reason to reinvent the wheel.

Check out http://access.adobe.com for details.

-- Glenn


 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    POST TO: spug-list at pm.org        PROBLEMS: owner-spug-list at pm.org
 Seattle Perl Users Group (SPUG) Home Page: http://www.halcyon.com/spug/
 SUBSCRIBE/UNSUBSCRIBE: Replace ACTION below by subscribe or unsubscribe
        Email to majordomo at pm.org: ACTION spug-list your_address
