[Edinburgh-pm] Converting Microsoft Word Special Characters
Robert Rothenberg
robrwo at gmail.com
Sat May 19 06:13:11 PDT 2012
Searching for "msword" on CPAN gives:
http://search.cpan.org/~amiri/MSWord-ToHTML-0.005/
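
The stray Â in the HTML::Entities output quoted below looks like the usual
sign of UTF-8 bytes being treated as Latin-1 characters: a non-breaking
space is the two bytes C2 A0 in UTF-8, and C2 on its own is Â in Latin-1.
If that is what is happening, decoding the input before converting it may
be most of the fix. Here is a minimal sketch, assuming the data arrives as
UTF-8 bytes. The clean_text name and the replacement table are made up for
illustration, and only the core Encode module is used:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical table: common "Word" characters and plain ASCII stand-ins.
my %ascii_for = (
    "\x{2018}" => "'",      # left single quote
    "\x{2019}" => "'",      # right single quote
    "\x{201C}" => '"',      # left double quote
    "\x{201D}" => '"',      # right double quote
    "\x{2013}" => '-',      # en dash
    "\x{2014}" => '--',     # em dash
    "\x{2026}" => '...',    # ellipsis
    "\x{2022}" => '*',      # bullet
    "\x{00A0}" => ' ',      # non-breaking space
);

sub clean_text {
    my ($bytes) = @_;

    # Assumption: the input is UTF-8 encoded bytes, so turn it into
    # Perl characters before looking at individual code points.
    my $text = decode('UTF-8', $bytes);

    # Swap known characters for their ASCII stand-in; anything else
    # outside ASCII becomes a numeric character reference.
    $text =~ s{([^\x00-\x7F])}{
        exists $ascii_for{$1} ? $ascii_for{$1} : sprintf('&#%d;', ord $1)
    }gex;

    return $text;
}

# The sample bytes: en dash, then a bullet with a non-breaking space on each side.
my $raw = "Doors of the Mind \xE2\x80\x93 Inner Mysteries"
        . "\xC2\xA0\xE2\x80\xA2\xC2\xA0Dark atmosphere";
print clean_text($raw), "\n";
# prints: Doors of the Mind - Inner Mysteries * Dark atmosphere
```

The fallback uses numeric character references rather than named HTML
entities because XML itself only predefines amp, lt, gt, apos and quot, so
something like &mdash; would not be valid in a plain XML feed.
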
On 19/05/12 00:56 Alex Brelsfoard wrote:
> Hi All,
>
> I was wondering if I could get some help here. I am looking for an existing
> function/method/module that will properly convert all special characters
> (like those from Microsoft Word: smart quotes, mdash, ellipses, bullet
> points, etc.) to either a matching simpler character, or an HTML entity.
>
> HTML::Entities comes close, but it does not handle everything correctly.
>
> I need to clean this data up for use in a Google product feed (XML).
>
> Here is an example of some text I am having trouble with:
> (the +'s are actually bullet points)
> ====== begin ======
> My doctor has recommended a dream specialist, and together we are trying to
> figure out what these nightmares mean. Jump into Hidden Object action in
> Doors of the Mind – Inner Mysteries.ADVANTAGES OF THE COMPLETE VERSION
> :DOORS OF THE MIND: INNER MYSTERIES + Dark atmosphere+ Spooky
> gameplay+ Explore a world of nightmares!
> ======= end =======
>
> And here is the output from using HTML::Entities:
> ====== begin ======
> My doctor has recommended a dream specialist, and together we are trying to
> figure out what these nightmares mean. Jump into Hidden Object action in
> Doors of the Mind â Inner Mysteries.ADVANTAGES OF THE
> COMPLETE VERSION :DOORS OF THE MIND: INNER
> MYSTERIESÂ +Â Dark atmosphere+Â Spooky
> gameplay+Â Explore a world of nightmares!
> ======= end =======
>
> Notice the extra Â all over the place.
>
> Any help you can provide would be immensely helpful.
>
> Thanks.
> --Alex