[tpm] Handling utf8/unicode in strings (I think?)

Cees Hek ceeshek at gmail.com
Mon Nov 15 16:06:35 PST 2010


On Sat, Nov 13, 2010 at 8:24 AM, J. Bobby Lopez <jbl at jbldata.com> wrote:
> I'm outputting a string in perl, and it shows up something like this:
>
> "won\u00e2\u0080\u0099t easily play well "
>
>
> Where it should be like this:
>
> "won't easily play well"

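You probably have a cp1252 "smart" apostrophe (byte 0x92, which is
U+2019 RIGHT SINGLE QUOTATION MARK) in there that came from a Windows
program. Its UTF-8 form is the three bytes E2 80 99, and somewhere
along the way each byte is being treated as a separate character,
which is where \u00e2\u0080\u0099 comes from. You most likely need to
decode the input and/or encode the output of your Perl program. There
is an excellent document that explains all the ins and outs of
handling UTF-8 in Perl here: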

http://en.wikibooks.org/wiki/Perl_Programming/Unicode_UTF-8
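To see how one apostrophe can turn into three escapes, here is a small
sketch using the core Encode module. It assumes the original character
is the cp1252 smart apostrophe (0x92); that is a guess based on the
escapes in your output, not something I can confirm from here.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumed input: a "smart" apostrophe saved by a Windows program in cp1252.
my $cp1252_bytes = "\x92";

# Decoding with the cp1252 table gives U+2019 RIGHT SINGLE QUOTATION MARK.
my $apostrophe = decode('cp1252', $cp1252_bytes);
printf "cp1252 0x92 -> U+%04X\n", ord($apostrophe);

# Encoding that character as UTF-8 produces three bytes: E2 80 99.
my $utf8_bytes = encode('UTF-8', $apostrophe);
print join(' ', map { sprintf('%02X', ord($_)) } split(//, $utf8_bytes)), "\n";

# Treating each of those bytes as its own character and escaping it
# reproduces the \u00e2\u0080\u0099 seen in the original output.
my $escaped = join('', map { sprintf('\\u%04x', ord($_)) } split(//, $utf8_bytes));
print "$escaped\n";
```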

> I thought it was a terminal issue... and that was partially true.  I ran mc
> on this particular terminal, and the curses UI was all garbled.  So I fixed
> that by reconfiguring my locales, but perl is still outputting the noise
> above.

Well, if I run the following from my terminal I see the apostrophe correctly:

perl -MEncode -le 'binmode STDOUT, ":encoding(utf8)";
  print Encode::decode("utf8", "won\x{00e2}\x{0080}\x{0099}t easily play well")'

That decodes the raw UTF-8 bytes into a Perl character string (the
three bytes become the single character U+2019), and the
:encoding(utf8) layer on STDOUT converts it back to UTF-8 on output.
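The same round trip as a standalone script, in case that is easier to
poke at than a one-liner. This is only a sketch: it uses the stricter
"UTF-8" encoding name instead of Perl's lax "utf8", which makes no
difference for this input, and it prints the string length before and
after decoding so you can see three characters collapse into one.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Encode characters back to UTF-8 bytes on output (same idea as the one-liner).
binmode STDOUT, ':encoding(UTF-8)';

# The mangled string: three separate characters, one per UTF-8 byte.
my $raw = "won\x{00e2}\x{0080}\x{0099}t easily play well";

# decode() treats each character as a byte and reassembles the
# UTF-8 sequence E2 80 99 into the single character U+2019.
my $fixed = decode('UTF-8', $raw);

print length($raw), " -> ", length($fixed), " characters\n";
print "$fixed\n";
```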

If you run that and see garbled text, then perhaps it is your terminal
that is misconfigured. It could be set to a different character set
(mine is set to handle UTF-8):

LANG=en_AU.utf8
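To check which character set your locale reports from inside Perl,
something like the following should work on a POSIX system. It is a
sketch relying on the core I18N::Langinfo and POSIX modules; the
warning message is just illustrative.

```perl
use strict;
use warnings;
use I18N::Langinfo qw(langinfo CODESET);
use POSIX qw(setlocale LC_CTYPE);

# Take the character-type locale from the environment (LC_ALL, LC_CTYPE, LANG).
setlocale(LC_CTYPE, '');

# CODESET is the character set the locale expects,
# e.g. "UTF-8" for LANG=en_AU.utf8.
my $codeset = langinfo(CODESET);
print "locale codeset: $codeset\n";

if ($codeset !~ /^utf-?8$/i) {
    print "locale is not UTF-8; UTF-8 output may look garbled\n";
}
```

If you would rather not hard-code the encoding in binmode calls, the
core open pragma can set the standard handles from the locale with
use open qw(:std :locale);.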

Cheers,

Cees

