I have never had to do this myself, but wouldn't it just be a matter of comparing the most significant bits with a bitwise operation?<br><br>Solli M. Honório<br><br><div class="gmail_quote">2010/10/20 Andre Carneiro <span dir="ltr"><<a href="mailto:andregarciacarneiro@gmail.com">andregarciacarneiro@gmail.com</a>></span><br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">I have already tried this module. It does not always detect the encoding correctly. But since it has been a long time since I last tried it (about two years ago), it may be worth another look, given that its latest update was this year.<div>
<br></div><div>There is also a caveat in the module's documentation:</div><div><br><div>Because of the algorithm used, ISO-8859 series and other single-byte encodings do not work well unless either one of ISO-8859 is the only one suspect (besides ascii and utf8).</div>
<div><br></div><div><br></div><div><br></div><div>Cheers!</div><div><br></div><div><br><div class="gmail_quote">2010/10/19 Solli Honorio <span dir="ltr"><<a href="mailto:shonorio@gmail.com" target="_blank">shonorio@gmail.com</a>></span><div>
<div></div><div class="h5"><br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">Stanislaw,<br><br>Does <a href="http://search.cpan.org/%7Edankogai/Encode-2.40/lib/Encode/Guess.pm" target="_blank">http://search.cpan.org/~dankogai/Encode-2.40/lib/Encode/Guess.pm</a> do what you need?<br>
<br>Solli<br><br><div class="gmail_quote">
2010/10/19 Stanislaw Pusep <span dir="ltr"><<a href="mailto:creaktive@gmail.com" target="_blank">creaktive@gmail.com</a>></span><div><div></div><div><br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
Thanks, Daniel!<br>Indeed, it is much more efficient to save the encoded data to a file and then open and read it through Perl's "built-in converter" than to do crazy conversions with inline buffers.<br>I have just one question left: what about detecting the encoding of a string? PHP has mb_detect_encoding() (<a href="http://php.net/manual/en/function.mb-detect-encoding.php" target="_blank">http://php.net/manual/en/function.mb-detect-encoding.php</a>; that is where I lifted my detect_utf8() from), whereas in Perl neither utf8::is_utf8() nor utf8::valid() does that.<br clear="all">
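One possible way to approach the detection question, as a minimal sketch and not a definitive answer: ask Encode to decode the bytes strictly as UTF-8 and treat a failure as "not UTF-8". The helper names looks_like_utf8() and decode_guess() are hypothetical, and the fallback to ISO-8859-1 is an assumption that mirrors the detect_utf8() fallback used earlier in this thread. It cannot tell single-byte encodings apart from each other.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: true when $bytes is well-formed UTF-8.
sub looks_like_utf8 {
    my ($bytes) = @_;
    # decode() may modify its source when a CHECK value is given,
    # so the strict decode runs on a copy.
    my $copy = $bytes;
    return eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

# Hypothetical helper: decode raw bytes into Perl characters.
# Assumption: anything that is not valid UTF-8 is ISO-8859-1.
sub decode_guess {
    my ($bytes) = @_;
    my $encoding = looks_like_utf8($bytes) ? 'UTF-8' : 'ISO-8859-1';
    return decode($encoding, $bytes);
}
```

With the text decoded into characters this way, uc() and /\w/ should treat accented letters as letters, without forcing the UTF-8 flag by hand.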
<br>ABS()<div><div></div><div><br><br>
<br><br><div class="gmail_quote">On Tue, Oct 19, 2010 at 01:12, Daniel de Oliveira Mantovani <span dir="ltr"><<a href="mailto:mantovani@perl.org.br" target="_blank">mantovani@perl.org.br</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
perl -e '{binmode STDOUT,":utf8";use open IO => ":utf8";print uc($_)<br>
while <>}' teste.txt<br>
<br>
"Setting the default encoding<br>
You can set the encoding for all streams with the open pragma. If you want<br>
to use the same default encoding for all input and output filehandles, you<br>
can set them at the same time with the IO setting:<br>
use open IO => ':utf8';<br>
You can set the default encoding for just output handles with the OUT<br>
setting:<br>
use open OUT => ':utf8';<br>
Similarly, you can set all of the input filehandles to have the encoding that<br>
you need:<br>
use open IN => ':utf8';<br>
You can even set the default encoding for the input and output streams<br>
separately, but in the same call to open:<br>
use open IN => ":cp1251", OUT => ":shiftjis";<br>
The -C switch tells Perl to switch on various Unicode features. You can<br>
selectively turn on features by specifying the ones that you want without having<br>
to change the source code. If you use that switch with no specifiers, Perl uses<br>
UTF-8 for all of the standard filehandles and any that you open yourself:<br>
<div><div></div><div>"<br>
<br>
<br>
<br>
2010/10/19 Daniel de Oliveira Mantovani <<a href="mailto:mantovani@perl.org.br" target="_blank">mantovani@perl.org.br</a>>:<br>
> Argh, sorry, I have gone many, many, many hours without sleep.<br>
><br>
> perl -Mutf8 -pe 'binmode STDIN, ":utf8";$_=uc' texte.txt<br>
><br>
> This is what you need.<br>
><br>
> Sorry again.<br>
><br>
><br>
> 2010/10/19 Daniel de Oliveira Mantovani <<a href="mailto:mantovani@perl.org.br" target="_blank">mantovani@perl.org.br</a>>:<br>
>> perl -Mutf8 -pe '$_=uc' teste.txt<br>
>><br>
>> 2010/10/18 Stanislaw Pusep <<a href="mailto:creaktive@gmail.com" target="_blank">creaktive@gmail.com</a>>:<br>
>>> Yes, I did :)<br>
>>><br>
>>> "The following functions are defined in the utf8:: package by the Perl core.<br>
>>> You do not need to say use utf8 to use these and in fact you should not say<br>
>>> that unless you really want to have UTF-8 source code."<br>
>>><br>
>>> Anyway, I tried this:<br>
>>> perl -pe 'utf8::encode($_);$_=uc' teste.txt<br>
>>><br>
>>> As expected, it prints the correct characters on screen, but without<br>
>>> converting the accented letters to uppercase. Go figure :(<br>
>>><br>
>>> ABS()<br>
>>><br>
>>><br>
>>><br>
>>> 2010/10/18 Daniel de Oliveira Mantovani <<a href="mailto:mantovani@perl.org.br" target="_blank">mantovani@perl.org.br</a>><br>
>>>><br>
>>>> Did you read the whole manual?<br>
>>>><br>
>>>> "Converts in-place the internal octet sequence in the native encoding<br>
>>>> (Latin-1 or EBCDIC) to the equivalent character sequence in UTF-X.<br>
>>>> $string already encoded as characters does no harm. Returns the number<br>
>>>> of octets necessary to represent the string as UTF-X. Can be used to<br>
>>>> make sure that the UTF-8 flag is on, so that "\w" or "lc()" work as<br>
>>>> Unicode on strings containing characters in the range 0x80-0xFF (on<br>
>>>> ASCII<br>
>>>> and derivatives)."<br>
>>>><br>
>>>><br>
>>>> 2010/10/18 Stanislaw Pusep <<a href="mailto:creaktive@gmail.com" target="_blank">creaktive@gmail.com</a>>:<br>
>>>> > Unfortunately...<br>
>>>> ><br>
>>>> > <a href="http://perldoc.perl.org/utf8.html" target="_blank">http://perldoc.perl.org/utf8.html</a><br>
>>>> > Do not use this pragma for anything else than telling Perl that your<br>
>>>> > script<br>
>>>> > is written in UTF-8.<br>
>>>> ><br>
>>>> > My current reference on Perl and UTF-8 is this one (the Russian original,<br>
>>>> > not the translation):<br>
>>>> ><br>
>>>> > <a href="http://translate.google.com/translate?hl=en-US&sl=ru&tl=en&u=http%3A%2F%2Fxpoint.ru%2Fknow-how%2FPerl%2FPodderzhkaUnicode" target="_blank">http://translate.google.com/translate?hl=en-US&sl=ru&tl=en&u=http%3A%2F%2Fxpoint.ru%2Fknow-how%2FPerl%2FPodderzhkaUnicode</a><br>
>>>> ><br>
>>>> > ABS()<br>
>>>> ><br>
>>>> ><br>
>>>> ><br>
>>>> > 2010/10/18 Daniel de Oliveira Mantovani <<a href="mailto:mantovani@perl.org.br" target="_blank">mantovani@perl.org.br</a>><br>
>>>> >><br>
>>>> >> 2010/10/18 Daniel de Oliveira Mantovani <<a href="mailto:mantovani@perl.org.br" target="_blank">mantovani@perl.org.br</a>>:<br>
>>>> >> <code><br>
>>>> >> my $text;{$/=$\;$text=<>};<br>
>>>> >> sub do_what_I_want {return uc($_[0])};<br>
>>>> >> when (detect_utf8($buf)) {<br>
>>>> >> {<br>
>>>> >> require utf8;<br>
>>>> >> do_what_I_want(...)<br>
>>>> >> }<br>
>>>> >> }<br>
>>>> >><br>
>>>> >> { do_what_I_want(...) }<br>
>>>> >> </code><br>
>>>> >><br>
>>>> >> Now it is right.<br>
>>>> >><br>
>>>> >> ><br>
>>>> >> > /me ;)<br>
>>>> >> ><br>
>>>> >> ><br>
>>>> >> > Search StackOverflow for Perl and encoding; brian d foy gave<br>
>>>> >> > a very useful explanation there.<br>
>>>> >> ><br>
>>>> >> > 2010/10/18 Stanislaw Pusep <<a href="mailto:creaktive@gmail.com" target="_blank">creaktive@gmail.com</a>>:<br>
>>>> >> >> I am sure this subject has come up several times on the list, so,<br>
>>>> >> >> ATTENTION: Perl has excellent mechanisms for handling I/O in<br>
>>>> >> >> various encodings in the most practical way possible. For example,<br>
>>>> >> >> you can take an ISO-8859-1 file from STDIN and send it to STDOUT as<br>
>>>> >> >> UTF-8; that is a piece of cake. Whenever you open a handle, you just<br>
>>>> >> >> specify what is inside and...<br>
>>>> >> >> That is exactly MY problem: I never know in advance what is inside :P<br>
>>>> >> >> The most viable solution I have found so far is:<br>
>>>> >> >><br>
>>>> >> >> my $buf;<br>
>>>> >> >><br>
>>>> >> >><br>
>>>> >> >> eval {<br>
>>>> >> >> open(TXT, '<', $file) or die "cannot open $file: $!";<br>
>>>> >> >><br>
>>>> >> >><br>
>>>> >> >> binmode TXT, ':bytes';<br>
>>>> >> >> local $/ = undef;<br>
>>>> >> >><br>
>>>> >> >><br>
>>>> >> >> $buf = <TXT>;<br>
>>>> >> >> close TXT;<br>
>>>> >> >><br>
>>>> >> >><br>
>>>> >> >> };<br>
>>>> >> >><br>
>>>> >> >> my $iconv = new Text::Iconv(detect_utf8($buf) ? 'utf-8' :<br>
>>>> >> >> 'iso-8859-1', 'utf-8');<br>
>>>> >> >><br>
>>>> >> >><br>
>>>> >> >> $buf = $iconv->convert($buf);<br>
>>>> >> >><br>
>>>> >> >><br>
>>>> >> >> Encode::_utf8_on($buf);<br>
>>>> >> >><br>
>>>> >> >> To explain: I open the file "raw", with no encoding layer, and<br>
>>>> >> >> load its contents into the buffer. Then I use Text::Iconv to convert<br>
>>>> >> >> the encoding.<br>
>>>> >> >> A very important detail: even if the data is already in UTF-8,<br>
>>>> >> >> Text::Iconv still has to be applied. And that is not all: Perl does<br>
>>>> >> >> not recognize the buffer as UTF-8 encoded until I force the UTF-8<br>
>>>> >> >> flag on.<br>
>>>> >> >> Done! After all that, $buf is genuine UTF-8. I can call uc() and "ã"<br>
>>>> >> >> becomes "Ã", and /\w/ matches the accented letters too.<br>
>>>> >> >> Here is the complete code: <a href="http://tinypaste.com/c3680" target="_blank">http://tinypaste.com/c3680</a><br>
>>>> >> >> The question is: is there a less inefficient way to do this?<br>
>>>> >> >><br>
>>>> >> >> ABS()<br>
>>>> >> >><br>
>>>> >> >><br>
>>>> >> >> _______________________________________________<br>
>>>> >> >> SaoPaulo-pm mailing list<br>
>>>> >> >> <a href="mailto:SaoPaulo-pm@pm.org" target="_blank">SaoPaulo-pm@pm.org</a><br>
>>>> >> >> <a href="http://mail.pm.org/mailman/listinfo/saopaulo-pm" target="_blank">http://mail.pm.org/mailman/listinfo/saopaulo-pm</a><br>
>>>> >> >><br>
>>>> >> ><br>
>>>> >> ><br>
>>>> >> ><br>
>>>> >> > --<br>
>>>> >> > "If you’ve never written anything thoughtful, then you’ve never had<br>
>>>> >> > any difficult, important, or interesting thoughts. That’s the secret:<br>
>>>> >> > people who don’t write, are people who don’t think."<br>
>>>> >> ><br>
>>>> >><br>
>>>> >><br>
>>>> >><br>
>>>> ><br>
>>>> ><br>
>>>> ><br>
>>>><br>
>>>><br>
>>>><br>
>>><br>
>>><br>
>>><br>
>><br>
>><br>
>><br>
>><br>
><br>
><br>
><br>
><br>
<br>
<br>
<br>
</div></div>
<div><div></div><div>
</div></div></blockquote></div><br>
</div></div><br>
</blockquote></div></div></div><font color="#888888"><br><br clear="all"><br>-- <br>"o animal satisfeito dorme". - Guimarães Rosa<br>
</font><br>
</blockquote></div></div></div><br><br clear="all"><br>-- <br>André Garcia Carneiro<br>Perl Analyst/Developer<br>
(11)82907780<br>
</div></div>
<br>
</blockquote></div><br><br clear="all"><br>-- <br>"o animal satisfeito dorme". - Guimarães Rosa<br>
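For what it is worth, the bitwise idea from the top of this message could look roughly like the sketch below. It walks the bytes and checks that each one matches the UTF-8 lead/continuation bit patterns. The name looks_like_utf8_bits() is hypothetical, and this is only a structural check under a stated assumption: it does not reject overlong forms or surrogates, so a strict decoder is still the safer judge.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when every byte of $bytes fits the UTF-8
# lead/continuation bit patterns, 0 otherwise.
# Assumption: structural check only (overlong forms are not rejected).
sub looks_like_utf8_bits {
    my ($bytes) = @_;
    my @octets = unpack 'C*', $bytes;
    my $i = 0;
    while ($i < @octets) {
        my $lead = $octets[$i++];
        my $need;    # number of continuation bytes expected after $lead
        if    (($lead & 0x80) == 0x00) { $need = 0 }    # 0xxxxxxx (ASCII)
        elsif (($lead & 0xE0) == 0xC0) { $need = 1 }    # 110xxxxx
        elsif (($lead & 0xF0) == 0xE0) { $need = 2 }    # 1110xxxx
        elsif (($lead & 0xF8) == 0xF0) { $need = 3 }    # 11110xxx
        else                           { return 0 }     # stray continuation or invalid lead
        for (1 .. $need) {
            return 0 if $i >= @octets;                    # truncated sequence
            return 0 if ($octets[$i++] & 0xC0) != 0x80;   # must be 10xxxxxx
        }
    }
    return 1;
}
```

A lone ISO-8859-1 accented byte such as 0xE3 looks like a three-byte lead with nothing valid after it, so it fails the check, which is the signal the thread's detect_utf8() relies on.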