[Edinburgh-pm] Converting Microsoft Word Special Characters

Robert Rothenberg robrwo at gmail.com
Sat May 19 06:13:11 PDT 2012


Searching for "msword" on CPAN gives

  http://search.cpan.org/~amiri/MSWord-ToHTML-0.005/
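
If a full Word-to-HTML conversion is more than the feed needs, a small
translation table may be enough. The sketch below is only an illustration:
the %replacement table and the clean_word_chars name are hypothetical, and
it assumes the input arrives as UTF-8 bytes. Stray characters like the ones
in your output often appear when UTF-8 bytes are entity-encoded without
being decoded first, though I cannot confirm that from the post alone.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical mapping from common "Word" characters to plain ASCII.
# Extend it with whatever else turns up in the data.
my %replacement = (
    "\x{2018}" => "'",      # left single quote
    "\x{2019}" => "'",      # right single quote
    "\x{201C}" => '"',      # left double quote
    "\x{201D}" => '"',      # right double quote
    "\x{2013}" => '-',      # en dash
    "\x{2014}" => '--',     # em dash
    "\x{2026}" => '...',    # ellipsis
    "\x{2022}" => '*',      # bullet
    "\x{00A0}" => ' ',      # no-break space
);

# Replace every non-ASCII character that has an entry in the table;
# characters without an entry are left untouched.
sub clean_word_chars {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/exists $replacement{$1} ? $replacement{$1} : $1/ge;
    return $text;
}

# Assumption: the raw data is UTF-8 bytes, so decode it before cleaning.
my $bytes = "Doors of the Mind \xE2\x80\x93 Inner Mysteries \xE2\x80\xA2 Spooky";
my $clean = clean_word_chars(decode('UTF-8', $bytes));
print "$clean\n";    # prints "Doors of the Mind - Inner Mysteries * Spooky"
```

Because the result is plain ASCII, the remaining step for an XML feed is
escaping the markup characters (&, <, >) in whatever way the feed generator
already does.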

On 19/05/12 00:56 Alex Brelsfoard wrote:
> Hi All,
> 
> I was wondering if I could get some help here.  I am looking for an existing
> function/method/module that will properly convert all special characters
> (like those from Microsoft Word: smart quotes, mdash, ellipses, bullet
> points, etc.) to either a matching simpler character, or an HTML entity.
> 
> HTML::Entities does a close job, but it does not handle everything correctly.
> 
> I need to clean this data up for use in a google product feed (xml).
> 
> Here is an example of some text I am having trouble with:
> ( the +'s are actually bullet points)
> ====== begin ======
> My doctor has recommended a dream specialist, and together we are trying to
> figure out what these nightmares mean. Jump into Hidden Object action in
> Doors of the Mind – Inner Mysteries.ADVANTAGES OF THE COMPLETE VERSION
> :DOORS OF THE MIND: INNER MYSTERIES + Dark atmosphere+ Spooky
> gameplay+ Explore a world of nightmares!
> ======= end =======
> 
> And here is the output from using HTML::Entities:
> ====== begin ======
> My doctor has recommended a dream specialist, and together we are trying to
> figure out what these nightmares mean. Jump into Hidden Object action in
> Doors of the Mind – Inner Mysteries.ADVANTAGES OF THE
> COMPLETE VERSION :DOORS OF THE MIND: INNER
> MYSTERIES + Dark atmosphere+ Spooky
> gameplay+ Explore a world of nightmares!
> ======= end =======
> 
> Notice the extra  all over the place.
> 
> Any help you can provide would be immensely helpful.
> 
> Thanks.
> --Alex
