[SP-pm] working with UTF-8 and ISO-8859-1 simultaneously

Daniel de Oliveira Mantovani mantovani at perl.org.br
Mon Oct 18 20:12:35 PDT 2010


perl -e '{binmode STDOUT,":utf8";use open IO => ":utf8";print uc($_)
while <>}' teste.txt
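
When the encoding of the input is not known in advance, as in the original
question further down this thread, one common approach is to try a strict
UTF-8 decode first and fall back to ISO-8859-1. What follows is only a
minimal sketch, assuming those are the only two encodings that can occur;
decode_utf8_or_latin1 is a hypothetical helper name, not something taken
from the thread.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: decode the octets as strict UTF-8 when they are
# valid UTF-8, otherwise assume ISO-8859-1 (any byte sequence is valid
# Latin-1, so the fallback cannot fail).
sub decode_utf8_or_latin1 {
    my ($octets) = @_;
    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps
    # $octets untouched so it can be reused for the fallback.
    my $text = eval { decode('UTF-8', $octets, FB_CROAK | LEAVE_SRC) };
    return defined $text ? $text : decode('ISO-8859-1', $octets);
}

# The letter a-tilde as two UTF-8 octets and as one ISO-8859-1 octet.
my $from_utf8   = decode_utf8_or_latin1("\xC3\xA3");
my $from_latin1 = decode_utf8_or_latin1("\xE3");

# Both are now character strings, so uc() handles the accented letter.
binmode STDOUT, ':encoding(UTF-8)';
print uc($from_utf8), "\n";
print uc($from_latin1), "\n";
```

One caveat of this design: ISO-8859-1 text that happens to form valid UTF-8
byte sequences will be taken for UTF-8, so the check is a heuristic rather
than a guarantee.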

"Setting the default encoding
You can set the encoding for all streams with the open pragma. If you want
to use the same default encoding for all input and output filehandles, you
can set them at the same time with the IO setting:
use open IO => ':utf8';
You can set the default encoding for just output handles with the OUT
setting:
use open OUT => ':utf8';
Similarly, you can set all of the input filehandles to have the encoding that
you need:
use open IN => ':utf8';
You can even set the default encoding for the input and output streams
separately, but in the same call to open:
use open IN => ":cp1251", OUT => ":shiftjis";
The -C switch tells Perl to switch on various Unicode features. You can
selectively turn on features by specifying the ones that you want without having
to change the source code. If you use that switch with no specifiers, Perl uses
UTF-8 for all of the standard filehandles and any that you open yourself:
"

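The IN and OUT settings described above can be tried on a small throwaway
file. This is a rough sketch, assuming the input is ISO-8859-1 and the
output should be UTF-8; the temporary file and its contents are made up
for the illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small ISO-8859-1 file: the word "acao" with c-cedilla and
# a-tilde stored as single bytes (0xE7 and 0xE3).
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;                         # write raw bytes
print {$fh} "a\xE7\xE3o\n";
close $fh or die "close $path: $!";

my $line;
{
    # Inside this block, handles opened without explicit layers decode
    # ISO-8859-1 on input and encode UTF-8 on output.
    use open IN => ':encoding(iso-8859-1)', OUT => ':encoding(UTF-8)';

    open my $in, '<', $path or die "open $path: $!";
    $line = <$in>;
    close $in;
}
chomp $line;

# $line holds characters, not bytes, so uc() uppercases the accents too.
binmode STDOUT, ':encoding(UTF-8)';
print uc($line), "\n";
```

The pragma is lexically scoped, which is why it sits inside a bare block
here: the temporary file above it is still written as raw bytes.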


2010/10/19 Daniel de Oliveira Mantovani <mantovani at perl.org.br>:
> Argh, sorry, I have gone many, many, many hours without sleep.
>
> perl -Mutf8 -pe 'binmode STDIN, ":utf8";$_=uc' teste.txt
>
> This is what you need.
>
> Sorry again.
>
>
> 2010/10/19 Daniel de Oliveira Mantovani <mantovani at perl.org.br>:
>> perl -Mutf8 -pe '$_=uc' teste.txt
>>
>> 2010/10/18 Stanislaw Pusep <creaktive at gmail.com>:
>>> Yes, I read it :)
>>>
>>> "The following functions are defined in the utf8:: package by the Perl core.
>>> You do not need to say use utf8 to use these and in fact you should not say
>>> that unless you really want to have UTF-8 source code."
>>>
>>> Anyway, I tried this:
>>> perl -pe 'utf8::encode($_);$_=uc' teste.txt
>>>
>>> As expected, it prints the correct characters on the screen. But it does
>>> not convert the accented letters to uppercase. Go figure :(
>>>
>>> ABS()
>>>
>>>
>>>
>>> 2010/10/18 Daniel de Oliveira Mantovani <mantovani at perl.org.br>
>>>>
>>>> Did you read the whole manual?
>>>>
>>>> "Converts in-place the internal octet sequence in the native encoding
>>>> (Latin-1 or EBCDIC) to the equivalent character sequence in UTF-X.
>>>> $string already encoded as characters does no harm. Returns the number
>>>> of octets necessary to represent the string as UTF-X. Can be used to
>>>> make sure that the UTF-8 flag is on, so that "\w" or "lc()" work as
>>>> Unicode on strings containing characters in the range 0x80-0xFF (on
>>>> ASCII and derivatives)."
>>>>
>>>>
>>>> 2010/10/18 Stanislaw Pusep <creaktive at gmail.com>:
>>>> > Unfortunately...
>>>> >
>>>> > http://perldoc.perl.org/utf8.html
>>>> > Do not use this pragma for anything else than telling Perl that your
>>>> > script is written in UTF-8.
>>>> >
>>>> > My current reference on Perl and UTF-8 is this one (the original in
>>>> > Russian, not the translation):
>>>> >
>>>> > http://translate.google.com/translate?hl=en-US&sl=ru&tl=en&u=http%3A%2F%2Fxpoint.ru%2Fknow-how%2FPerl%2FPodderzhkaUnicode
>>>> >
>>>> > ABS()
>>>> >
>>>> >
>>>> >
>>>> > 2010/10/18 Daniel de Oliveira Mantovani <mantovani at perl.org.br>
>>>> >>
>>>> >> 2010/10/18 Daniel de Oliveira Mantovani <mantovani at perl.org.br>:
>>>> >> <code>
>>>> >>  my $buf = do { local $/; <> };   # slurp all of the input
>>>> >>  sub do_what_I_want { return uc($_[0]) }
>>>> >>  if (detect_utf8($buf)) {
>>>> >>     require utf8;
>>>> >>     do_what_I_want(...)
>>>> >>  }
>>>> >>  else {
>>>> >>     do_what_I_want(...)
>>>> >>  }
>>>> >> </code>
>>>> >>
>>>> >> Now it's right.
>>>> >>
>>>> >> >
>>>> >> > /me ;)
>>>> >> >
>>>> >> >
>>>> >> > Search StackOverflow for Perl and encoding; brian d foy gave
>>>> >> > a very useful explanation there.
>>>> >> >
>>>> >> > 2010/10/18 Stanislaw Pusep <creaktive at gmail.com>:
>>>> >> >> I'm sure this subject has come up several times on the list, so,
>>>> >> >> ATTENTION: Perl has excellent mechanisms for handling I/O in
>>>> >> >> various encodings in the most practical way possible. For example,
>>>> >> >> you can take an ISO-8859-1 file from STDIN and send it to STDOUT as
>>>> >> >> UTF-8; that is a piece of cake. Whenever you open a handle, you
>>>> >> >> just specify what is inside and...
>>>> >> >> That is exactly MY problem: I never know in advance what is inside :P
>>>> >> >> The most viable solution I have found so far is:
>>>> >> >>
>>>> >> >>         my $buf;
>>>> >> >>
>>>> >> >>         eval {
>>>> >> >>                 open(TXT, '<', $file) or die "cannot open $file: $!";
>>>> >> >>
>>>> >> >>                 binmode TXT, ':bytes';
>>>> >> >>                 local $/ = undef;
>>>> >> >>
>>>> >> >>                 $buf = <TXT>;
>>>> >> >>                 close TXT;
>>>> >> >>         };
>>>> >> >>
>>>> >> >>         my $iconv = Text::Iconv->new(detect_utf8($buf) ? 'utf-8' : 'iso-8859-1', 'utf-8');
>>>> >> >>
>>>> >> >>         $buf = $iconv->convert($buf);
>>>> >> >>
>>>> >> >>         Encode::_utf8_on($buf);
>>>> >> >>
>>>> >> >> To explain: I open the file "raw", with no encoding layer, and load
>>>> >> >> its contents into the buffer. Then I use Text::Iconv to convert the
>>>> >> >> encoding. A very important detail: even if the data is already in
>>>> >> >> UTF-8, Text::Iconv still has to be applied. And that is not all:
>>>> >> >> Perl does not recognize the buffer as UTF-8 encoded until I force
>>>> >> >> the UTF-8 flag on.
>>>> >> >> Done! After all that, $buf is authentic UTF-8. I can call uc() and
>>>> >> >> "ã" becomes "Ã", and /\w/ matches the accented letters too.
>>>> >> >> Here is the complete code: http://tinypaste.com/c3680
>>>> >> >> The question is: is there a less inefficient way to do this?
>>>> >> >>
>>>> >> >> ABS()
>>>> >> >>
>>>> >> >>
>>>> >> >> _______________________________________________
>>>> >> >> SaoPaulo-pm mailing list
>>>> >> >> SaoPaulo-pm at pm.org
>>>> >> >> http://mail.pm.org/mailman/listinfo/saopaulo-pm
>>>> >> >>
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> > --
>>>> >> > "If you’ve never written anything thoughtful, then you’ve never had
>>>> >> > any difficult, important, or interesting thoughts. That’s the secret:
>>>> >> > people who don’t write, are people who don’t think."