[tpm] Detecting whether a file is encoded as UTF8 or UTF16
Indy Singh
indy at indigostar.com
Thu Oct 28 09:45:21 PDT 2010
You could open the file in binary mode and look for the byte order mark (BOM) at the beginning. For example, a UTF-8 file saved with a BOM looks like this in a hex dump:
0000:0000 EF BB BF 61 62 63 0D 0A ...abc..
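Notice the three extra bytes, EF BB BF, at the start. A UTF-16 file with a BOM starts with FF FE (little-endian) or FE FF (big-endian) instead. The BOM is optional, though, so a file without one could still be either encoding. I'm not sure about strings.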
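The BOM check can be combined with a fallback heuristic for files that have no BOM, which is the case where `file` tends to report "data". Below is a minimal Python sketch, not a tried-and-true method. The function name `sniff_encoding` and the 40%/10% zero-byte thresholds are illustrative assumptions. The UTF-16 test also assumes the text is mostly ASCII, so that every other byte is zero.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Heuristically guess whether raw bytes are UTF-8 or UTF-16.

    Returns "utf-8", "utf-16-le", "utf-16-be" or "unknown".
    UTF-32 is deliberately ignored in this sketch.
    """
    # 1. Byte order marks: reliable when present, but optional.
    if data.startswith(codecs.BOM_UTF8):        # EF BB BF
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE):    # FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):    # FE FF
        return "utf-16-be"

    # 2. No BOM. Mostly-ASCII UTF-16 text has a zero byte in every other
    #    position: odd offsets for little-endian, even offsets for big-endian.
    #    This must run before the UTF-8 test, because NUL bytes are valid UTF-8.
    pairs = len(data) // 2
    if pairs:
        even_zeros = data[0:2 * pairs:2].count(0)
        odd_zeros = data[1:2 * pairs:2].count(0)
        if odd_zeros > 0.4 * pairs and even_zeros < 0.1 * pairs:
            return "utf-16-le"
        if even_zeros > 0.4 * pairs and odd_zeros < 0.1 * pairs:
            return "utf-16-be"

    # 3. Strict UTF-8 decoding rejects most non-UTF-8 byte sequences.
    #    final=False tolerates a multi-byte character cut off at the end,
    #    so a prefix of a larger file can be passed in.
    try:
        codecs.getincrementaldecoder("utf-8")().decode(data, final=False)
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"
```

To check a file, read a prefix in binary mode, e.g. `with open(path, "rb") as f: print(sniff_encoding(f.read(4096)))`. Plain ASCII passes the UTF-8 test, which is correct since ASCII is a subset of UTF-8.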
Indy Singh
IndigoSTAR Software -- www.indigostar.com
----- Original Message -----
From: J. Bobby Lopez
To: Toronto Perl Mongers
Sent: Thursday, October 28, 2010 12:26 PM
Subject: [tpm] Detecting whether a file is encoded as UTF8 or UTF16
Does anyone have a tried and true method of detecting whether a file (or string) is encoded as UTF-8 or UTF-16?
I'm not talking about converting from one to the other (for that I'm aware of iconv); I'm talking about simple detection, especially when the file is simply described as "data" by the 'file' command on the command line.
Thanks!
Bobby
------------------------------------------------------------------------------
_______________________________________________
toronto-pm mailing list
toronto-pm at pm.org
http://mail.pm.org/mailman/listinfo/toronto-pm