<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META content="text/html; charset=iso-8859-1" http-equiv=Content-Type>
<META name=GENERATOR content="MSHTML 8.00.6001.18975">
<STYLE></STYLE>
</HEAD>
<BODY bgColor=#ffffff>
<DIV><FONT size=2 face=Arial>You could open the file in binary mode and look for
the byte order mark (BOM), the extra marker bytes at the beginning. For example, a
UTF-8 file with a BOM looks like this in a hex dump:</FONT></DIV>
<DIV><FONT size=2 face=Arial>0000:0000 EF BB BF 61 62 63 0D 0A
...abc.</FONT></DIV>
<DIV><FONT size=2 face=Arial></FONT> </DIV>
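<DIV><FONT size=2 face=Arial>Notice the three extra bytes, EF BB BF: that is the
UTF-8 BOM. A UTF-16 file normally starts with FF FE (little-endian) or FE FF
(big-endian) instead. Not every file has a BOM, though, and I'm not sure about
strings.</FONT></DIV>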
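<DIV><FONT size=2 face=Arial>Here is a minimal sketch of the same idea in Python, assuming the whole file fits in memory. The function names sniff_encoding and sniff_file and the 30% threshold are hypothetical choices, not part of any library. The sketch checks for a BOM first. When there is none (the "data" case), it falls back to two rough heuristics: a zero byte in every other position suggests mostly-ASCII UTF-16, and a clean strict UTF-8 decode suggests UTF-8.</FONT></DIV>
<PRE>
```python
import codecs
from typing import Optional


def sniff_encoding(data: bytes) -> Optional[str]:
    """Guess whether raw bytes are UTF-8 or UTF-16.

    Returns "utf-8", "utf-16-le", "utf-16-be", or None if unsure.
    UTF-32 is deliberately ignored in this sketch.
    """
    # 1. A byte order mark (BOM), if present, is the most reliable signal.
    if data.startswith(codecs.BOM_UTF8):        # EF BB BF
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE):    # FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):    # FE FF
        return "utf-16-be"

    # 2. No BOM: mostly-ASCII UTF-16 text has a zero byte in every other
    #    position (odd offsets for little-endian, even for big-endian).
    #    This must run before the UTF-8 check, because NUL bytes are
    #    valid UTF-8. The 30% threshold is an arbitrary assumption.
    if len(data) >= 2:
        pairs = len(data) // 2
        even_nuls = data[0::2].count(0)
        odd_nuls = data[1::2].count(0)
        if odd_nuls > even_nuls and odd_nuls > pairs * 0.3:
            return "utf-16-le"
        if even_nuls > odd_nuls and even_nuls > pairs * 0.3:
            return "utf-16-be"

    # 3. Otherwise, bytes that decode cleanly as UTF-8 are probably UTF-8.
    #    Plain ASCII also lands here, since ASCII is a subset of UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return None


def sniff_file(path: str) -> Optional[str]:
    # Open in binary mode so no decoding or newline translation happens.
    with open(path, "rb") as f:
        return sniff_encoding(f.read())
```
</PRE>
<DIV><FONT size=2 face=Arial>The BOM check is reliable when a marker is present; the fallback is only a guess, and text that is mostly non-ASCII can defeat it, in which case the sketch may return None.</FONT></DIV>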
<DIV><FONT size=2 face=Arial></FONT><BR>Indy Singh<BR>IndigoSTAR Software -- <A
href="http://www.indigostar.com">www.indigostar.com</A><BR></DIV>
<BLOCKQUOTE
style="BORDER-LEFT: #000000 2px solid; PADDING-LEFT: 5px; PADDING-RIGHT: 0px; MARGIN-LEFT: 5px; MARGIN-RIGHT: 0px">
<DIV style="FONT: 10pt arial">----- Original Message ----- </DIV>
<DIV
style="FONT: 10pt arial; BACKGROUND: #e4e4e4; font-color: black"><B>From:</B>
<A title=jbl@jbldata.com href="mailto:jbl@jbldata.com">J. Bobby Lopez</A>
</DIV>
<DIV style="FONT: 10pt arial"><B>To:</B> <A title=tpm@to.pm.org
href="mailto:tpm@to.pm.org">Toronto Perl Mongers</A> </DIV>
<DIV style="FONT: 10pt arial"><B>Sent:</B> Thursday, October 28, 2010 12:26
PM</DIV>
<DIV style="FONT: 10pt arial"><B>Subject:</B> [tpm] Detecting whether a file
is encoded as UTF8 or UTF16</DIV>
<DIV><BR></DIV>Does anyone have a tried-and-true method of detecting whether a
file (or string) is encoded as UTF-8 or UTF-16?<BR><BR>I'm not talking about
converting from one to the other; for that I'm aware of iconv. I'm talking
about simple detection, especially if the file is simply described as "data" by the
'file' command on the command line.<BR><BR>Thanks!<BR><BR>Bobby<BR>
<P>
<HR>
<P></P>_______________________________________________<BR>toronto-pm mailing
list<BR>toronto-pm@pm.org<BR>http://mail.pm.org/mailman/listinfo/toronto-pm<BR></BLOCKQUOTE></BODY></HTML>