[Chicago-talk] Reading extended ASCII characters from a text box

Shlomi Fish shlomif at shlomifish.org
Thu Oct 18 12:42:14 PDT 2012

Hi Jim,

On Thu, 18 Oct 2012 11:34:51 -0500
Jim Jacobus <JJacobus at PonyX.com> wrote:

> On our website we have an HTML text box so that customers can insert a 
> product description. The problem is that when they copy and paste from a 
> Word or other document format that has extended ASCII characters (like 
> inch marks, foot marks, Greek characters and even Latin-1 
> characters), the characters get translated to nulls and such.
> Is there a way to read the contents in raw form using Perl so I 
> can translate it? Or is this an HTML issue?

That sounds like an encoding problem. See:

* http://www.joelonsoftware.com/articles/Unicode.html

* http://perldoc.perl.org/perlunitut.html

* http://www.unicode.org/faq/unicode_web.html
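
As a rough illustration of the kind of mismatch those links describe, here
is a minimal sketch using the core Encode module. It assumes the page (and
therefore the submitted form data) is UTF-8, which may not match your setup;
the sample bytes and variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical raw bytes as a browser might submit them from a UTF-8 page:
# "5", U+2033 DOUBLE PRIME, " x 3", U+2032 PRIME, " caf", U+00E9.
my $raw = "5\xE2\x80\xB3 x 3\xE2\x80\xB2 caf\xC3\xA9";
my $byte_count = length $raw;

# Decode the bytes into Perl characters. FB_CROAK dies on malformed input
# instead of silently substituting; LEAVE_SRC keeps $raw unmodified.
my $text = decode('UTF-8', $raw, Encode::FB_CROAK() | Encode::LEAVE_SRC());

# Text pasted from Word often arrives as Windows-1252 when the page is not
# UTF-8. The bytes 0x93 and 0x94 are curly quotes in that encoding.
my $quoted = decode('cp1252', "\x93hi\x94");

# Encode again on output so the characters are not mangled on the way out.
binmode(STDOUT, ':encoding(UTF-8)');
printf "%d bytes -> %d characters\n", $byte_count, length $text;
print "$text\n$quoted\n";
```

The idea is to decode once on input with the encoding the browser actually
used, work with characters inside the program, and encode once on output.
If the bytes are decoded with the wrong encoding, or never decoded at all,
non-ASCII characters tend to come out as garbage or get dropped.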

If you're still running into problems, it would help to see a small,
self-contained example that reproduces them.


	Shlomi Fish

