[mplspm]: Unicode help

Ken Williams ken at mathforum.org
Thu Mar 25 21:02:17 CST 2004


On Mar 25, 2004, at 5:43 PM, Mark Allen wrote:
> So the UTF8 file ought to be twice as big as the ASCII file and have a
> totally different MD5 hash.  What am I missing?

Maybe you're thinking of UTF16?  According to 
http://www.unicode.org/standard/principles.html :

------------
UTF-8 is a way of transforming all Unicode characters into a variable 
length encoding of bytes. It has the advantages that the Unicode 
characters corresponding to the familiar ASCII set have the same byte 
values as ASCII, and that Unicode characters transformed into UTF-8 can 
be used with much existing software without extensive software 
rewrites.  
------------

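The key phrase is "variable length".  Characters in the ASCII set 
still take a single byte (8 bits) each, with the same byte values as 
in ASCII, so a file containing only ASCII text is byte-for-byte 
identical in UTF-8 and keeps the same size and MD5 hash.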

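Try adding some words with non-ASCII characters (fahrvergnügen, 
fiancée) and see what happens: each accented letter takes two bytes 
in UTF-8, so the size and the hash will change.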
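
A quick way to check this is to compare byte lengths and hashes 
directly.  The following is only a rough sketch in Python, not anything 
from the original thread; the helper name describe() and the sample 
words are made up for illustration, and it assumes the accented letter 
is stored as a single precomposed code point (U+00E9).

```python
import hashlib

def describe(text):
    """Return (UTF-8 byte count, UTF-16 byte count, MD5 hex digest of the UTF-8 bytes)."""
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-le")  # little-endian, no byte-order mark
    return len(utf8), len(utf16), hashlib.md5(utf8).hexdigest()

# Hypothetical sample words: one pure ASCII, one with a precomposed e-acute.
ascii_word = "fiancee"
accented_word = "fianc\u00e9e"

# For pure ASCII text the UTF-8 bytes are the ASCII bytes,
# so the MD5 hash is the same either way.
assert ascii_word.encode("utf-8") == ascii_word.encode("ascii")

for word in (ascii_word, accented_word):
    n_utf8, n_utf16, digest = describe(word)
    print(f"{word!r}: utf-8={n_utf8} bytes, utf-16={n_utf16} bytes, md5={digest}")
```

Both words are seven characters long.  The ASCII word takes 7 bytes in 
UTF-8 and 14 in UTF-16; the accented word takes 8 bytes in UTF-8 
(the accented letter needs two) and still 14 in UTF-16.  The "twice as 
big" expectation matches UTF-16, not UTF-8.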

  -Ken



--------------------------------------------------
Minneapolis Perl Mongers mailing list

To unsubscribe, send mail to majordomo at pm.org
with "unsubscribe mpls" in the body of the message.


