[Edinburgh-pm] Converting Microsoft Word Special Characters
Alex Brelsfoard
alex.brelsfoard at gmail.com
Fri May 18 16:56:44 PDT 2012
Hi All,
I was wondering if I could get some help here. I am looking for an
existing function/method/module that will properly convert all special
characters (like those from Microsoft Word: smart quotes, em dashes, ellipses,
bullet points, etc.) to either a matching simpler character or an HTML
entity.
HTML::Entities comes close, but it does not handle everything
correctly.
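To make the request concrete, here is a minimal sketch of the "matching simpler character" half of the conversion. The helper name simplify_typography and the particular replacements chosen are assumptions for illustration only, not an existing module; it expects an already-decoded character string rather than raw bytes.

```perl
use strict;
use warnings;

# Hypothetical mapping from common Word typography to plain ASCII.
# The choice of replacements is an assumption; adjust to taste.
my %ASCII_FOR = (
    "\x{2018}" => "'",     # left single quotation mark
    "\x{2019}" => "'",     # right single quotation mark
    "\x{201C}" => '"',     # left double quotation mark
    "\x{201D}" => '"',     # right double quotation mark
    "\x{2013}" => '-',     # en dash
    "\x{2014}" => '--',    # em dash
    "\x{2026}" => '...',   # horizontal ellipsis
    "\x{2022}" => '*',     # bullet
    "\x{00A0}" => ' ',     # no-break space
);

# Hypothetical helper: replace every mapped character, leave the rest alone.
sub simplify_typography {
    my ($text) = @_;
    my $class = join '', keys %ASCII_FOR;   # build a character class from the keys
    $text =~ s/([$class])/$ASCII_FOR{$1}/g;
    return $text;
}

print simplify_typography("\x{201C}Doors of the Mind\x{201D} \x{2013} Inner Mysteries\x{2026}"), "\n";
```

Characters that are not in the table pass through unchanged, so a table like this would have to be extended for each new character that turns up in the data.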
I need to clean this data up for use in a Google product feed (XML).
Here is an example of some text I am having trouble with:
(The +'s are actually bullet points.)
====== begin ======
My doctor has recommended a dream specialist, and together we are trying to
figure out what these nightmares mean. Jump into Hidden Object action in
Doors of the Mind – Inner Mysteries.ADVANTAGES OF THE COMPLETE VERSION
:DOORS OF THE MIND: INNER MYSTERIES + Dark atmosphere+ Spooky
gameplay+ Explore a world of nightmares!
======= end =======
And here is the output from using HTML::Entities:
====== begin ======
My doctor has recommended a dream specialist, and together we are trying to
figure out what these nightmares mean. Jump into Hidden Object action in
Doors of the Mind â Inner Mysteries.ADVANTAGES OF THE
COMPLETE VERSION :DOORS OF THE MIND: INNER
MYSTERIESÂ +Â Dark atmosphere+Â Spooky
gameplay+Â Explore a world of nightmares!
======= end =======
Notice the extra Â characters all over the place.
Any help you can provide would be much appreciated.
Thanks.
--Alex
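
A stray Â in front of what should be a plain space is the pattern that typically appears when UTF-8 bytes are passed to an entity encoder without being decoded into characters first: a no-break space is the two bytes C2 A0, and C2 read as a single Latin-1 character is Â. The sketch below assumes the input really is UTF-8 (not confirmed in the message above) and uses a hypothetical helper name, xml_safe_ascii. It emits numeric character references rather than named entities, since XML predefines only a handful of named entities. It uses only the core Encode module; the same decode step could equally be placed before a call to encode_entities from HTML::Entities.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn raw UTF-8 bytes into ASCII-only text that is
# safe to place inside an XML element.
sub xml_safe_ascii {
    my ($bytes) = @_;

    # Bytes -> characters. Assumes UTF-8 input; skipping this step makes
    # every byte of a multi-byte sequence look like a separate character.
    my $text = decode('UTF-8', $bytes);

    # Escape XML markup characters; the ampersand must be handled first.
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;

    # Everything outside ASCII becomes a numeric character reference.
    $text =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord $1)/ge;
    return $text;
}

# "\xE2\x80\x93" is an en dash and "\xC2\xA0" a no-break space, both as UTF-8 bytes.
print xml_safe_ascii("Doors of the Mind \xE2\x80\x93 Inner Mysteries\xC2\xA0+ Dark atmosphere"), "\n";
```

If the data turns out to be in Windows-1252 rather than UTF-8, the encoding name passed to decode would need to change accordingly (Encode knows it as 'cp1252').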