[Chicago-talk] Validating utf-8.

Jonathan Rockway jon-chicagotalk at jrock.us
Fri Oct 3 08:19:11 PDT 2008


* On Fri, Oct 03 2008, Elliot Shank wrote:
> Elliot Shank wrote:
>> Using the built-in IO layers seems to hide problems, i.e.
>>
>>    open my $handle, '<:utf8', $file
>>
>> doesn't work.  If I feed that a binary file which is plainly not utf-8, perl blithely reads the file without complaint.
>
> Well, not without warnings, but I don't really want to hook $SIG{__WARN__} looking for specific strings, which is pretty fragile.

If you use Encode::decode directly, you can specify exactly how to
handle errors:

  http://search.cpan.org/~dankogai/Encode-2.26/Encode.pm#Handling_Malformed_Data

I think:

  my $string = Encode::decode('utf-8', $octets, Encode::FB_CROAK);

will do what you want.
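A minimal sketch of that approach, assuming you just want a yes/no answer per file. It reads the file with the `:raw` layer so nothing is decoded during the read. It then passes the octets to `Encode::decode` with `FB_CROAK` and traps the exception with `eval`. The helper names `is_valid_utf8` and `file_is_valid_utf8` are hypothetical. OR-ing in `LEAVE_SRC` is an extra choice on my part: the Encode docs note that when a CHECK value is given, decode may otherwise modify the source buffer in place.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: returns 1 if $octets is well-formed (strict) UTF-8, else 0.
sub is_valid_utf8 {
    my ($octets) = @_;
    my $ok = eval {
        # FB_CROAK: die on the first malformed sequence instead of substituting.
        # LEAVE_SRC: don't consume/modify the source buffer while decoding.
        Encode::decode('UTF-8', $octets, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

# Hypothetical helper: slurp a file as raw bytes and validate it.
sub file_is_valid_utf8 {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $octets = do { local $/; <$fh> };    # slurp mode
    close $fh;
    $octets = '' unless defined $octets;    # treat an empty file as valid
    return is_valid_utf8($octets);
}

# Usage: perl check_utf8.pl file1 file2 ...
for my $file (@ARGV) {
    printf "%s: %s\n", $file,
        file_is_valid_utf8($file) ? 'valid UTF-8' : 'NOT valid UTF-8';
}
```

Unlike the `:utf8` layer, this gives you a hard failure you can test for, with no need to hook `$SIG{__WARN__}`.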

Regards,
Jonathan Rockway

--
print just => another => perl => hacker => if $,=$"

