[SP-pm] working with UTF-8 and ISO-8859-1 simultaneously

Daniel de Oliveira Mantovani mantovani at perl.org.br
Mon Oct 18 19:47:26 PDT 2010


Argh, sorry, I have gone many, many, many hours without sleep.

perl -CSD -pe '$_=uc' teste.txt

That is what you need.

Sorry again.
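
For a file that is known to be UTF-8, the same idea can be written as a small script. This is only a minimal sketch: uc_utf8_file is a name made up for the example, and it assumes the input really is UTF-8.

```perl
use strict;
use warnings;

# Uppercase every line of a UTF-8 file. The ':encoding(UTF-8)' layer decodes
# octets into characters on input, so uc() sees "ã" as one character.
# uc_utf8_file is a hypothetical helper name, not something from the thread.
sub uc_utf8_file {
    my ($path) = @_;
    open my $in, '<:encoding(UTF-8)', $path or die "cannot open $path: $!";
    my @lines = map { uc } <$in>;
    close $in;
    return @lines;
}

# Encode characters back into UTF-8 octets on the way out.
binmode STDOUT, ':encoding(UTF-8)';
print uc_utf8_file($ARGV[0]) if @ARGV;
```

The layer on STDOUT matters as much as the one on input: without it Perl prints the characters in its internal form and may warn about wide characters.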


2010/10/19 Daniel de Oliveira Mantovani <mantovani at perl.org.br>:
> perl -Mutf8 -pe '$_=uc' teste.txt
>
> 2010/10/18 Stanislaw Pusep <creaktive at gmail.com>:
>> Yes, I did :)
>>
>> "The following functions are defined in the utf8:: package by the Perl core.
>> You do not need to say use utf8 to use these and in fact you should not say
>> that unless you really want to have UTF-8 source code."
>>
>> Anyway, I tried this:
>> perl -pe 'utf8::encode($_);$_=uc' teste.txt
>>
>> As expected, it prints the correct characters on screen, but it does not
>> convert the accented letters to uppercase. Go figure :(
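
One likely reading of that result: utf8::encode() goes from characters to UTF-8 octets, which is the opposite direction from what uc() needs. A small sketch of the difference, assuming the input octet is ISO-8859-1 (the variable names are only for illustration):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $raw = "\xE3";                        # "ã" as one ISO-8859-1 octet

# utf8::encode() turns characters into UTF-8 octets: E3 becomes C3 A3.
my $octets = $raw;
utf8::encode($octets);
my $upper_octets = uc $octets;           # on plain octets uc() only changes ASCII letters

# Decoding goes from octets to characters; uc() can then map U+00E3 to U+00C3.
my $upper_chars = uc(decode('ISO-8859-1', $raw));

binmode STDOUT, ':raw';
print encode('UTF-8', $upper_chars), "\n";   # writes the octets C3 83
```

So the output of the one-liner looks right on a UTF-8 terminal because the octets are valid UTF-8, but uc() never saw a character it could uppercase.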
>>
>> ABS()
>>
>>
>>
>> 2010/10/18 Daniel de Oliveira Mantovani <mantovani at perl.org.br>
>>>
>>> Did you read the whole manual?
>>>
>>> "Converts in-place the internal octet sequence in the native encoding
>>> (Latin-1 or EBCDIC) to the equivalent character sequence in UTF-X.
>>> $string already encoded as characters does no harm. Returns the number
>>> of octets necessary to represent the string as UTF-X. Can be used to
>>> make sure that the UTF-8 flag is on, so that "\w" or "lc()" work as
>>> Unicode on strings containing characters in the range 0x80-0xFF (on
>>> ASCII and derivatives)."
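
The passage quoted above describes utf8::upgrade(). A short sketch of the effect it mentions, assuming a Perl without `use feature 'unicode_strings'` and without `use locale` in scope:

```perl
use strict;
use warnings;

my $s = "\xE3";                  # "ã" as a single octet, UTF-8 flag off

# With the flag off, uc() leaves characters in the range 0x80-0xFF alone.
my $before = uc $s;

# utf8::upgrade() changes only the internal representation; the string
# still holds the same single character, but the flag is now on.
utf8::upgrade($s);
my $after = uc $s;               # Unicode rules apply: U+00E3 becomes U+00C3

printf "before: %vX  after: %vX\n", $before, $after;
```

This only helps when the octets are ISO-8859-1 to begin with. For data that is already UTF-8 on disk, upgrading would treat each octet as its own character.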
>>>
>>>
>>> 2010/10/18 Stanislaw Pusep <creaktive at gmail.com>:
>>> > Unfortunately...
>>> >
>>> > http://perldoc.perl.org/utf8.html
>>> > Do not use this pragma for anything else than telling Perl that your
>>> > script is written in UTF-8.
>>> >
>>> > My current reference on Perl and UTF-8 is this one (the original in
>>> > Russian, not the translation):
>>> >
>>> > http://translate.google.com/translate?hl=en-US&sl=ru&tl=en&u=http%3A%2F%2Fxpoint.ru%2Fknow-how%2FPerl%2FPodderzhkaUnicode
>>> >
>>> > ABS()
>>> >
>>> >
>>> >
>>> > 2010/10/18 Daniel de Oliveira Mantovani <mantovani at perl.org.br>
>>> >>
>>> >> 2010/10/18 Daniel de Oliveira Mantovani <mantovani at perl.org.br>:
>>> >> <code>
>>> >>  # slurp the whole input into $buf
>>> >>  my $buf = do { local $/; <> };
>>> >>  # uc() takes one scalar; uc(@_) would uppercase the argument count
>>> >>  sub do_what_I_want { return uc($_[0]) }
>>> >>  if (detect_utf8($buf)) {
>>> >>      require utf8;
>>> >>      do_what_I_want(...)
>>> >>  }
>>> >>  else {
>>> >>      do_what_I_want(...)
>>> >>  }
>>> >> </code>
>>> >>
>>> >> There, now it's right.
>>> >>
>>> >> >
>>> >> > /me ;)
>>> >> >
>>> >> >
>>> >> > Search StackOverflow for Perl and encoding; brian d foy gave a
>>> >> > very useful explanation.
>>> >> >
>>> >> > 2010/10/18 Stanislaw Pusep <creaktive at gmail.com>:
>>> >> >> I'm sure this subject has come up on the list several times, so,
>>> >> >> ATTENTION: Perl has excellent mechanisms for handling I/O in various
>>> >> >> encodings in the most practical way possible. For example, you can
>>> >> >> take an ISO-8859-1 file from STDIN and send it to STDOUT as UTF-8;
>>> >> >> that is a piece of cake. Whenever you open a handle, you just specify
>>> >> >> what is inside and...
>>> >> >> And that is MY problem: I never know beforehand what is inside :P
>>> >> >> The most viable solution I have found so far is:
>>> >> >>
>>> >> >>         use Encode;
>>> >> >>         use Text::Iconv;
>>> >> >>
>>> >> >>         my $buf;
>>> >> >>         eval {
>>> >> >>                 open(TXT, '<', $file) or die "cannot open $file: $!";
>>> >> >>                 binmode TXT, ':bytes';   # read raw octets, no decoding
>>> >> >>                 local $/ = undef;        # slurp the whole file
>>> >> >>                 $buf = <TXT>;
>>> >> >>                 close TXT;
>>> >> >>         };
>>> >> >>
>>> >> >>         my $iconv = Text::Iconv->new(detect_utf8($buf) ? 'utf-8' : 'iso-8859-1', 'utf-8');
>>> >> >>         $buf = $iconv->convert($buf);
>>> >> >>         Encode::_utf8_on($buf);          # mark the octets as UTF-8 characters
>>> >> >>
>>> >> >> To explain: I open the file "raw", with no decoding layer, and load
>>> >> >> the contents into the buffer. Then I use Text::Iconv to convert the
>>> >> >> encoding. A very important detail: even if the data is already UTF-8,
>>> >> >> Text::Iconv still has to be applied. And that is not all: Perl does
>>> >> >> not treat the buffer as UTF-8 until I force the UTF-8 flag on.
>>> >> >> Done! After all that, $buf is genuine UTF-8. I can call uc() and "ã"
>>> >> >> becomes "Ã", and /\w/ matches the accented letters too.
>>> >> >> Here is the complete code: http://tinypaste.com/c3680
>>> >> >> The question is: is there a less inefficient way to do this?
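
One possible answer, as a sketch only: let Encode do both the detection and the conversion by trying a strict UTF-8 decode and falling back to ISO-8859-1 when that fails. decode_guess is a name invented for this example, and it assumes the input can only be one of those two encodings.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Turn octets of unknown encoding into a character string: try strict UTF-8
# first, then fall back to ISO-8859-1, which accepts any octet sequence.
# decode_guess is a hypothetical helper name.
sub decode_guess {
    my ($octets) = @_;
    # FB_CROAK makes decode() die on malformed UTF-8; LEAVE_SRC keeps
    # $octets intact so the fallback sees the original data.
    my $text = eval {
        decode('UTF-8', $octets, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return defined $text ? $text : decode('ISO-8859-1', $octets);
}

binmode STDOUT, ':encoding(UTF-8)';
print uc(decode_guess("n\xC3\xA3o")), "\n";   # UTF-8 input
print uc(decode_guess("n\xE3o")), "\n";       # ISO-8859-1 input
```

Both calls yield the same character string, so uc() and /\w/ behave the same afterwards and there is no need to touch the UTF-8 flag by hand. The guess can be wrong for ISO-8859-1 text that happens to form valid UTF-8 sequences, although that is uncommon in real Portuguese text.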
>>> >> >>
>>> >> >> ABS()
>>> >> >>
>>> >> >>
>>> >> >> _______________________________________________
>>> >> >> SaoPaulo-pm mailing list
>>> >> >> SaoPaulo-pm at pm.org
>>> >> >> http://mail.pm.org/mailman/listinfo/saopaulo-pm
>>> >> >>
>>> >> >
>>> >> >
>>> >> >
>>> >> > --
>>> >> > "If you’ve never written anything thoughtful, then you’ve never had
>>> >> > any difficult, important, or interesting thoughts. That’s the secret:
>>> >> > people who don’t write, are people who don’t think."
>>> >> >


