moonbeam at catmanor.com
Fri Jan 24 19:27:14 CST 2003
> Regex problem. A user wants to cut-n-paste text from a WORD doc into a
>browser form. The WORD doc of course has nasty non-alpha characters in
>it that I'd like to remove in most cases, perhaps replace with other
>text in a few others. Here's an example:
> Your kids complain, Â~SNot that again!Â~T
>BTW,  was a "bullet" in WORD.
>What kind of regex would help clean this text, or is there a better
>solution to the problem?
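On the regex question itself, here is a minimal sketch of one possible cleanup pass, written in Python only for illustration (the same patterns should carry over to Perl's s///). It assumes the form input has already been decoded to Unicode, and that the odd characters are Word's "smart" punctuation; the sample above looks like curly quotes that were decoded with the wrong character set, but that is a guess. The replacement table and the name clean_word_text are hypothetical, not from any library.

```python
import re

# Assumed mapping from Word's "smart" punctuation to plain ASCII.
# Extend or change it to suit the text you actually receive.
REPLACEMENTS = {
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
    "\u2022": "*",    # bullet
}

# One alternation that matches any character in the table.
_KNOWN = re.compile("|".join(map(re.escape, REPLACEMENTS)))

# Anything that is not tab, newline, carriage return, or printable ASCII.
_LEFTOVER = re.compile(r"[^\x09\x0a\x0d\x20-\x7e]")


def clean_word_text(text):
    """Replace known Word punctuation, then drop any other non-ASCII."""
    text = _KNOWN.sub(lambda m: REPLACEMENTS[m.group(0)], text)
    return _LEFTOVER.sub("", text)
```

Replacing the known characters first and stripping afterwards keeps the quotes and bullets readable instead of deleting them. Note that the final strip also removes legitimate accented letters, so it is only appropriate if the form is meant to hold plain ASCII.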
I use a little program called 'antiword' that converts a Word binary
document into plain text or PostScript. The PostScript output preserves
the images and formatting. I used this program in a Perl CGI script that
connects via SMB to an NT server and lets the web user view the documents.
It would be really cool if we had a Perl port of this!
See: http://www.winfield.demon.nl/index.html
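As a rough sketch of how a script might call antiword, the following Python wraps the command line in two small helpers. It assumes antiword is installed and on the PATH, that it writes its result to standard output, and that PostScript is requested with -p followed by a paper size such as a4. The function names antiword_command and convert_doc are hypothetical, and this is not the CGI script described above.

```python
import subprocess


def antiword_command(doc_path, paper=None):
    """Build the antiword argument list.

    With paper=None the default plain-text output is requested;
    with a paper size (for example "a4") PostScript is requested.
    """
    cmd = ["antiword"]
    if paper is not None:
        cmd += ["-p", paper]
    cmd.append(doc_path)
    return cmd


def convert_doc(doc_path, paper=None):
    """Run antiword and return whatever it wrote to stdout, as bytes.

    Raises subprocess.CalledProcessError if antiword exits non-zero.
    """
    result = subprocess.run(
        antiword_command(doc_path, paper),
        capture_output=True,
        check=True,
    )
    return result.stdout
```

Passing the arguments as a list rather than a shell string avoids quoting problems when the file name comes from a web request, for example convert_doc("report.doc") for text or convert_doc("report.doc", paper="a4") for PostScript.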