[tpm] trouble with unicode versus octets
stuart at morungos.com
Mon Oct 15 12:23:36 PDT 2012
It's not an uncommon problem, but it's a messy one. And it's basically an application decision.
The module you need is Encode, and what you probably need is
use Encode;
my $encoded = Encode::encode('utf8', $utf8_string);
which translates a character string, which you can't print as-is, into a set of UTF-8 bytes, which you can. That stops the print error. However, this is for printing or writing over a network connection, and you might need a different encoding depending on your protocol. The Encode module can do almost any encoding you like or need, and many that seem ridiculous.
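As a rough sketch of the "different encoding depending on your protocol" point: the same character string comes out as a different set of bytes for each encoding you ask for. The sample text and variable names here are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A character string containing one code point above 255 (U+20AC, euro sign).
my $chars = "price: \x{20AC}5";

# Encode to octets before handing the data to a socket or filehandle.
my $utf8_octets  = encode('UTF-8',    $chars);
my $utf16_octets = encode('UTF-16LE', $chars);

# Same text, different byte counts depending on the encoding chosen.
printf "%d characters, %d UTF-8 octets, %d UTF-16LE octets\n",
    length($chars), length($utf8_octets), length($utf16_octets);
# prints "9 characters, 11 UTF-8 octets, 18 UTF-16LE octets"
```

Either octet string is safe to write; which one is right is down to what the other end of the connection expects.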
In the case of UTF-8 there is an extra wrinkle: Perl stores wide-character strings internally as UTF-8 and tracks that with an internal flag, so UTF-8 data can appear to work even when an encoding step is missing. That is highly confusing special behaviour, and it's often worth testing your app by printing or sending non-UTF-8 data to flush out these issues.
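One way to see why non-UTF8 test data is useful, as a sketch only: the wide character complaint is triggered only by characters above 255. A Latin-1 style string slips through silently as raw bytes, which may not be what the other end expects. The helper name has_wide_chars is hypothetical, not part of any module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: true if any character is above 255, i.e. the string
# cannot be written as raw octets without being encoded first.
sub has_wide_chars {
    my ($string) = @_;
    return $string =~ /[^\x00-\xFF]/ ? 1 : 0;
}

my $latin = "caf\x{E9}";        # e-acute is code point 233, fits in one octet
my $wide  = "smile \x{263A}";   # U+263A is above 255
my $bytes = encode('UTF-8', $wide);

printf "latin: %d, wide: %d, encoded: %d\n",
    has_wide_chars($latin), has_wide_chars($wide), has_wide_chars($bytes);
# prints "latin: 0, wide: 1, encoded: 0"
```

The $latin case never raises an error, yet it would go out as the single byte 0xE9 rather than as UTF-8, so a test suite containing only that kind of data can miss the problem entirely.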
The problems are worse if you don't know what your strings are to begin with. It's best to help your app by making everything UTF8 (internally) as soon as possible, assuming it isn't already. There is no way to tell reliably whether a piece of random data really is UTF8 text, as that's really down to how it is supposed to be interpreted.
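A minimal sketch of decoding at the point of input, assuming the incoming bytes are supposed to be UTF-8. The helper name decode_utf8_or_undef is hypothetical. A strict decode can tell you that the bytes are not well-formed UTF-8, but a successful decode still doesn't prove the sender meant them as UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: decode incoming octets as strict UTF-8, or return
# undef if they are not well-formed UTF-8.
sub decode_utf8_or_undef {
    my ($octets) = @_;
    # FB_CROAK makes decode die on malformed input; LEAVE_SRC stops it
    # from modifying $octets in place.
    my $chars = eval { decode('UTF-8', $octets, FB_CROAK | LEAVE_SRC) };
    return $chars;
}

my $good = decode_utf8_or_undef("caf\xC3\xA9s");  # well-formed UTF-8
my $bad  = decode_utf8_or_undef("caf\xE9s");      # Latin-1 bytes, not UTF-8

printf "good: %s\n", defined($good) ? length($good) . " characters" : "rejected";
printf "bad: %s\n",  defined($bad)  ? "decoded" : "rejected";
```

What to do with rejected input (fall back to another encoding, drop it, log it) is the application decision mentioned above.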
On 2012-10-15, at 1:59 PM, Fulko Hew wrote:
> I have a problem (so what else is new!) that I haven't yet found a solution to ...
> In my app, I receive strings, massage them, and 'push_write' them to an AnyEvent socket.
> Occasionally, my app receives a Unicode string...
> so when the write happens, Perl (inside the AnyEvent module)
> dies with the error:
> Wide character in subroutine entry at ...
> What I haven't figured out yet is, how to coerce the character string into
> an octet string (for the rest of its life, ie. in subsequent modules)
> so the warning/dying goes away.