[tpm] Fwd: Perl and Unicode
Mike Stok
mike at stok.ca
Fri May 1 13:58:43 PDT 2009
On May 1, 2009, at 1:29 PM, Antonio Sun wrote:
> Hi Abram,
>
> When you talked about the difficulties that Perl has calculating string
> lengths, I didn't quite understand your explanation because I didn't catch
> the term that you used. Could you explain it in writing please?
>
> AFAIK, how Perl interprets string length depends on the encoding. E.g.,
>
> use encoding utf8;
> print length("骆驼"); # 2, because there are 2 Chinese characters
> # However,
> no encoding;
> print length("骆驼"); # 6, the 2 Chinese characters take up 6 bytes.
>
> I.e., Perl has the capability to return whatever string length you
> want. Am I missing anything?
>
> BTW,
>
> Does anyone know how to split a Unicode string into individual
> characters? E.g., from "骆驼" to '骆' & '驼'?
There has to be a better way than this:
DB<7> @chars = map { chr } unpack('U*', "骆驼")
DB<8> x @chars
0 '\x{9A86}'
1 '\x{9A7C}'
DB<9> print "@chars"
骆 驼
DB<10>
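
One candidate for that better way, as a minimal sketch rather than a definitive answer: split with an empty pattern, which breaks a decoded string between every character. It assumes the script is saved as UTF-8 (hence "use utf8") and that STDOUT should be encoded as UTF-8; the variable names are only illustrative. It also shows the character count next to the byte count of the UTF-8 encoded form, which are the two lengths Antonio mentions.

```perl
use strict;
use warnings;
use utf8;                              # assumption: this source file is saved as UTF-8
use Encode qw(encode);

binmode(STDOUT, ':encoding(UTF-8)');   # encode output as UTF-8 when printing

my $string = "骆驼";

# An empty pattern splits a decoded string between every character.
my @chars = split //, $string;

# length() counts characters on the decoded string and
# bytes on the explicitly UTF-8 encoded copy.
my $char_count = length $string;
my $byte_count = length encode('UTF-8', $string);

print "characters: $char_count\n";     # characters: 2
print "bytes:      $byte_count\n";     # bytes:      6
print join(' ', @chars), "\n";         # 骆 驼
```

Note that this splits on code points. If the text may contain combining marks, matching with /\X/g instead would return extended grapheme clusters.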
Mike
> Thanks
>
> Antonio
>
>
> _______________________________________________
> toronto-pm mailing list
> toronto-pm at pm.org
> http://mail.pm.org/mailman/listinfo/toronto-pm
--
Mike Stok <mike at stok.ca>
http://www.stok.ca/~mike/
The "`Stok' disclaimers" apply.