[tpm] Fwd: Perl and Unicode

Antonio Sun antoniosun at lavabit.com
Fri May 1 10:29:05 PDT 2009


Hi Abram,

When you talked about the difficulties that Perl has calculating string
lengths, I didn't quite understand your explanation because I didn't catch
the term that you used. Could you explain it in writing please?

AFAIK, how Perl interprets string length depends on the encoding, e.g.:

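use utf8;    # source file is saved as UTF-8, so literals are decoded to characters
print length("骆驼"); # 2, because there are 2 Chinese characters
# However,
no utf8;     # literals are treated as raw bytes again
print length("骆驼"); # 6, the 2 Chinese characters take up 6 bytes.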

I.e., Perl can return either the character count or the byte count, whichever
you need. Am I missing anything?
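
To make the distinction concrete without relying on how the source file is
saved, here is a minimal sketch that uses the core Encode module instead of a
pragma. The variable names are only illustrative, and the bytes are written as
escapes on the assumption that the text is UTF-8 encoded:

```perl
use strict;
use warnings;
use Encode qw(decode);

# The six UTF-8 bytes for the two characters, written as escapes so the
# result does not depend on the encoding of this file.
my $bytes = "\xe9\xaa\x86\xe9\xa9\xbc";

# length() on an undecoded string counts bytes.
my $byte_count = length($bytes);

# Decoding turns the byte string into a character string,
# and length() then counts characters.
my $chars      = decode('UTF-8', $bytes);
my $char_count = length($chars);

print "bytes: $byte_count, characters: $char_count\n";
```

So the answer seems to depend on whether the string has been decoded, rather
than on length() itself.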

BTW,

Does anyone know how to split a Unicode string into individual characters?
E.g., from "骆驼" to '骆' & '驼'?
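
One approach that I believe works, offered only as a sketch: once the string
holds decoded characters, splitting on the empty pattern yields one element
per code point. The variable names below are illustrative, and the characters
are written as \x{...} escapes so the example does not depend on the file's
encoding:

```perl
use strict;
use warnings;

# Encode characters as UTF-8 when printing them.
binmode(STDOUT, ':encoding(UTF-8)');

# U+9A86 and U+9A7C, the two characters from the example above.
my $string = "\x{9A86}\x{9A7C}";

# An empty pattern splits between every character (code point).
my @characters = split //, $string;

print "$_\n" for @characters;
```

Note that this splits by code point. Text with combining marks would probably
need the \X regex escape instead, e.g. `my @clusters = $string =~ /\X/g;`.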

Thanks

Antonio