[tpm] Manipulating utf8 strings with Perl

Vinny Alves vinny at usestrict.net
Tue May 8 12:27:38 PDT 2012


Do you absolutely need to work on the hex strings? If not, using binmode
should make your oneliner read the input as UTF-8.

perl -e 'binmode(STDIN,":encoding(UTF-8)"); while(<>){
s/<s:Body>/<s:Body xmlns:a=...>/; print }' < myfile.utf8

Vinny
http://cronblocks.com


On Tue, May 8, 2012 at 11:21 AM, Antonio Sun <antoniosun at lavabit.com> wrote:

> Hi,
>
> I want to do (one-line) replacement with Perl on utf8 strings.
>
> Here is the hex dump of exactly what the utf8 strings look like:
>
> cat myfile.utf8 | od -t x1 | head -3
> 0000000 3c 00 73 00 3a 00 45 00 6e 00 76 00 65 00 6c 00
> 0000020 6f 00 70 00 65 00 20 00 78 00 6d 00 6c 00 6e 00
> 0000040 73 00 3a 00 73 00 3d 00 22 00 68 00 74 00 74 00
>
> I.e., each character takes 2 bytes.
>
> I don't know how to strip the high byte (please help), but this is what
> the strings actually are:
>
> <s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
>   <s:Body>
>    . . .
>
> I just want to do some string manipulations with one-line Perl, e.g.:
>
>  cat myfile.utf8 | perl -p000e 's/<s:Body>/<s:Body xmlns:a=...>/'
>
> How can I do that?
>
> Thanks
>
>
> _______________________________________________
> toronto-pm mailing list
> toronto-pm at pm.org
> http://mail.pm.org/mailman/listinfo/toronto-pm
>
>