[tpm] Perl and Unicode [do not read this from a terminal]
Abram Hindle
abram.hindle at softwareprocess.us
Fri May 1 11:47:55 PDT 2009
Hi,
I did not mean Perl in particular. I meant that it is hard to measure
string lengths in Unicode because of diacritics. It really depends on what
you mean by string length: bytes, code points, rendered characters, etc.
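To make the diacritics point concrete, here is a small sketch, assuming
nothing beyond the core Unicode::Normalize module. The sample letter
(e with an acute accent) is my own pick for illustration. It renders the
same either way, but the code point count depends on whether the accent is
precomposed or a separate combining mark:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

# The same visible letter written two ways (illustrative sample).
my $precomposed = "\x{E9}";     # U+00E9, e with acute as one code point
my $decomposed  = "e\x{301}";   # U+0065 followed by U+0301 combining acute

# length() counts code points here, not what you see on screen.
my $len_pre = length($precomposed);
my $len_dec = length($decomposed);

# Normalizing moves between the two forms.
my $len_nfc = length(NFC($decomposed));    # composed form
my $len_nfd = length(NFD($precomposed));   # decomposed form

print "precomposed: $len_pre\n";   # 1
print "decomposed:  $len_dec\n";   # 2
print "after NFC:   $len_nfc\n";   # 1
print "after NFD:   $len_nfd\n";   # 2
```

So two strings that look identical can disagree on length until they are
normalized the same way.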
Apologies to console users:
‸̵̵̧̱̱̱̣̊̃̊‸̧̋̊‸̵̧̱̱̋̆̋̆̋̃‸̧a <-- see this character?
Depending on what you've got, you'll get wildly different lengths:
TCL says 29
Perl says 61, 29 with use utf8;
echo -n ‸̵̵̧̱̱̱̣̊̃̊‸̧̋̊‸̵̧̱̱̋̆̋̆̋̃‸̧a | perl -e 'print length(<>).$/' -> prints 61
Whereas that character looks like 9 characters on my screen, and
Thunderbird says it is 1.
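The numbers above line up with three different ways of counting. The 61
looks like a UTF-8 byte count and the 29 a code point count (4 carets at
3 bytes each, 24 combining marks at 2 bytes each, plus the "a"). A rough
sketch of all three counts, using a much shorter made-up sample string so
it is safe to paste anywhere; the variable names are just for illustration:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Illustrative sample: "a" carrying three combining marks, followed by "b".
my $text = "a\x{30A}\x{303}\x{331}b";

# 1. Bytes: what you get when the string is treated as raw UTF-8 octets.
my $bytes = length(encode('UTF-8', $text));

# 2. Code points: what length() reports on a decoded string.
my $codepoints = length($text);

# 3. Grapheme clusters: \X matches a base character plus its combining
#    marks, which is closer to what a reader would call one character.
my @clusters  = $text =~ /\X/g;
my $graphemes = scalar @clusters;

print "bytes:       $bytes\n";        # 8
print "code points: $codepoints\n";   # 5
print "graphemes:   $graphemes\n";    # 2
```

The same \X match is one way to split a string into user-visible
characters rather than code points. None of the three counts says how wide
the thing renders on a given screen, which is a separate problem.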
So Unicode has a lot of rough edges. Try pasting this into an IRC channel
on freenode and see how people react:
‸̵̵̱̣̋̌̋̊‸̋̃̊‸̊̋‸̵̱̱̱̃̋̋‸̊‸̵̱̱̱̋̋̌‸̵̵̱̣̱̋̌̃̋̋‸̧̧̱̊̃̃‸‸̵̣̊̋‸‸̧̋‸̵̱̱̊̋̌̌̋̊‸̵̧̱̣̣‸‸̵̧̧̱̱̋̊̆̆̌‸̵̵̌̆̃̆̌̆̋̊o
abram
Antonio Sun wrote:
> Hi Abram,
>
> When you talked about the difficulties that Perl has calculating string
> lengths, I didn't quite understand your explanation because I didn't catch
> the term that you used. Could you explain it in writing please?
>
> AFAIK, how Perl interprets string length depends on encoding, E.g.,
>
> use encoding utf8;
> print length("骆驼"); # 2, because there are 2 Chinese characters
> # However,
> no encoding;
> print length("骆驼"); # 6, the 2 Chinese characters take up 6 bytes.
>
> I.e., Perl has the capability to return whatever string length you want. Do
> I miss anything?
>
> BTW,
>
> Anyone knows how to split an Unicode string into individual characters?
> E.g., from "骆驼" to '骆' & '驼'?
>
> Thanks
>
> Antonio
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> toronto-pm mailing list
> toronto-pm at pm.org
> http://mail.pm.org/mailman/listinfo/toronto-pm