[tpm] Perl and Unicode

Fri May 1 10:03:01 PDT 2009

Hi Abram,

When you talked about the difficulties that Perl has calculating string
lengths, I didn't quite understand your explanation because I didn't catch
the term that you used. Could you explain it in writing please?

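AFAIK, how Perl computes a string's length depends on whether the string is
treated as characters or as raw bytes, which in turn depends on the encoding
in effect. E.g.: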

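# Assumes this file is saved as UTF-8.
use utf8;              # string literals are decoded as UTF-8 characters
print length("骆驼");  # 2, because there are 2 Chinese characters
# However,
no utf8;               # string literals are left as raw bytes
print length("骆驼");  # 6, because the 2 Chinese characters take up 6 bytes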
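
The same two counts can also be obtained without relying on a pragma. Here
is a minimal sketch, assuming the core Encode module is available; the
variable names are only illustrative, and the bytes are written as escapes
so the result does not depend on how the file is saved:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The UTF-8 bytes of "骆驼", written as escapes so the result does not
# depend on the encoding of this source file.
my $bytes = "\xE9\xAA\x86\xE9\xA9\xBC";

# Decoding turns the byte string into a character string.
my $chars = decode('UTF-8', $bytes);

my $char_count = length($chars);                   # counts characters
my $byte_count = length(encode('UTF-8', $chars));  # counts bytes

print "characters: $char_count, bytes: $byte_count\n";
```

As far as I can tell, this prints "characters: 2, bytes: 6".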

I.e., Perl can return whichever string length you want. Am I missing
anything?

BTW,

Does anyone know how to split a Unicode string into individual characters?
E.g., from "骆驼" to '骆' and '驼'?

Thanks

Antonio