[Edinburgh-pm] Converting Microsoft Word Special Characters

Alex Brelsfoard alex.brelsfoard at gmail.com
Fri May 18 16:56:44 PDT 2012


Hi All,

I was wondering if I could get some help here.  I am looking for an
existing function/method/module that will properly convert all special
characters (like those from Microsoft Word: smart quotes, em dashes, ellipses,
bullet points, etc.) to either a matching simpler character, or an HTML
entity.
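
To make the first option concrete, here is a minimal sketch of mapping such characters to simpler ones. It is only an illustration: the table is deliberately incomplete, the name simplify_word_chars is made up (it is not an existing CPAN function), and it assumes the text has already been decoded into Perl characters rather than raw bytes.

```perl
use strict;
use warnings;

# Hypothetical lookup table: Word-style characters and a plain stand-in for each.
my %PLAIN_FOR = (
    "\x{2018}" => "'",      # left single quote
    "\x{2019}" => "'",      # right single quote / apostrophe
    "\x{201C}" => '"',      # left double quote
    "\x{201D}" => '"',      # right double quote
    "\x{2013}" => '-',      # en dash
    "\x{2014}" => '--',     # em dash
    "\x{2026}" => '...',    # ellipsis
    "\x{2022}" => '*',      # bullet
    "\x{00A0}" => ' ',      # non-breaking space
);

# One character class covering every key in the table.
my $class      = join '', keys %PLAIN_FOR;
my $word_chars = qr/[$class]/;

# simplify_word_chars is a made-up helper name, used here for illustration.
# It expects a decoded (character) string, not raw bytes.
sub simplify_word_chars {
    my ($text) = @_;
    $text =~ s/($word_chars)/$PLAIN_FOR{$1}/g;
    return $text;
}
```

Anything not listed in the table passes through unchanged, so a real version would need a longer table or a fallback for unknown characters.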

HTML::Entities comes close, but it does not handle everything
correctly.

I need to clean this data up for use in a Google product feed (XML).

Here is an example of some text I am having trouble with:
(the +'s are actually bullet points)
====== begin ======
My doctor has recommended a dream specialist, and together we are trying to
figure out what these nightmares mean. Jump into Hidden Object action in
Doors of the Mind – Inner Mysteries.ADVANTAGES OF THE COMPLETE VERSION
:DOORS OF THE MIND: INNER MYSTERIES + Dark atmosphere+ Spooky
gameplay+ Explore a world of nightmares!
======= end =======

And here is the output from using HTML::Entities:
====== begin ======
My doctor has recommended a dream specialist, and together we are trying to
figure out what these nightmares mean. Jump into Hidden Object action in
Doors of the Mind – Inner Mysteries.ADVANTAGES OF THE
COMPLETE VERSION :DOORS OF THE MIND: INNER
MYSTERIES + Dark atmosphere+ Spooky
gameplay+ Explore a world of nightmares!
======= end =======

Notice the extra  all over the place.
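
One possible cause, and this is only a guess since it depends on how the text is read in, is that the input is UTF-8 encoded bytes that are never decoded, so each byte of a multi-byte character gets escaped on its own. The sketch below assumes exactly that: it decodes first with Encode, then uses encode_entities_numeric from HTML::Entities, because numeric entities are valid in XML while named ones such as &nbsp; are not predefined there. The sample string is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities qw(encode_entities_numeric);

# Assumption: the raw text arrives as UTF-8 encoded bytes.
# "\xE2\x80\x93" is the UTF-8 byte sequence for an en dash (U+2013),
# "\xC2\xA0" is the one for a non-breaking space (U+00A0).
my $bytes = "Doors of the Mind \xE2\x80\x93 Inner\xC2\xA0Mysteries";

# Turn bytes into characters before escaping anything.
my $chars = decode('UTF-8', $bytes);

# Escape markup characters and non-ASCII characters as numeric entities.
my $safe = encode_entities_numeric($chars);

print "$safe\n";
```

If the input turns out to be in a different encoding (Windows-1252 is common for text pasted from Word), the first argument to decode would need to change accordingly.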

Any help you can provide would be greatly appreciated.

Thanks.
--Alex