[tpm] PDF to OCR
Liam R E Quin
liam at holoweb.net
Wed Jan 28 19:08:46 PST 2009
On Wed, 2009-01-28 at 16:49 -0500, arocker at vex.net wrote:
> Has anyone any experience using the PDF::OCR module?
As far as I have been able to tell, the only OCR programs
that are worth anything at all are commercial. I've used
ABBYY FineReader for http://www.fromoldbooks.org/ and
can average under a minute per page including some hand
fix-ups, starting with, say, a good clean 400dpi grayscale
scan of a page.
Google Books' OCR is much crappier, and the GNU OCR is about
20 years behind the commercial stuff in quality.
But a lot of it depends on your source content -- some of
the packages are trained and developed with computer
printouts, for example, for OCR of business documents, and
may work well for that and really badly for other things;
I was working with 19th-century (and older) books.
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org www.advogato.org