[tpm] Detecting whether a file is encoded as UTF8 or UTF16

Indy Singh indy at indigostar.com
Thu Oct 28 09:45:21 PDT 2010


You could open the file in binary mode and look for the byte order mark (BOM) at the beginning.  For example, a UTF-8 file saved with a BOM looks like this:
0000:0000  EF BB BF 61 62 63 0D 0A   ...abc..

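Notice the three extra bytes (EF BB BF) ahead of "abc".  A UTF-16 file would start with FF FE or FE FF instead.  Keep in mind that the BOM is optional, so a file without one can still be UTF-8 or UTF-16.  Not sure about strings.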
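
As a rough illustration of the BOM check, here is a minimal Python sketch.  The function names (sniff_bom, sniff_file, looks_like_utf8) are made up for this example; only the BOM constants come from Python's standard codecs module.  The same check works on a byte string already in memory, provided the BOM has not been stripped.  The fallback for BOM-less data is only a heuristic: a strict UTF-8 decode that succeeds suggests UTF-8 but does not prove it.

```python
import codecs

# BOM signatures from the standard library. The UTF-32 entries come first
# because the UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data):
    """Return an encoding name if data starts with a known BOM, else None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def sniff_file(path):
    """Read the first four bytes of a file in binary mode and check for a BOM."""
    with open(path, "rb") as fh:
        return sniff_bom(fh.read(4))


def looks_like_utf8(data):
    """Heuristic for BOM-less data: does it decode as strict UTF-8?"""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

For the bytes in the dump above, sniff_bom(b"\xef\xbb\xbfabc\r\n") returns "utf-8".  When sniff_bom returns None, the data has no BOM and you are left with heuristics such as looks_like_utf8.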

Indy Singh
IndigoSTAR Software -- www.indigostar.com

  ----- Original Message ----- 
  From: J. Bobby Lopez 
  To: Toronto Perl Mongers 
  Sent: Thursday, October 28, 2010 12:26 PM
  Subject: [tpm] Detecting whether a file is encoded as UTF8 or UTF16


  Does anyone have a tried-and-true method of detecting whether a file (or string) is encoded as UTF8 or UTF16?

  I'm not talking about converting from one to the other; for that I'm aware of ICONV.  I'm talking about simple detection, especially if the file is simply described as "data" by the 'file' command on the command line.

  Thanks!

  Bobby


