[Philadelphia-pm] Unicode BOM in input files
John Karr
brainbuz at brainbuz.org
Tue Oct 27 15:18:16 PDT 2020
One of my many wishes for Perl 7 is a switch to native Unicode string
handling. Unfortunately, given the effort it has taken just to get strict
and warnings enabled by default (work I've been doing a little of and Jim
Keenan a lot of), and given how much would probably break in Perl and on
CPAN, that is really unlikely barring a deep-pocketed corporate sponsor.
I recently discovered a trick that helps with one of the problems caused
by Perl not being Unicode-native.
If you add 'export PERL_UNICODE=AS' to your environment, many wide
character errors will vanish. The same can be done with the -C switch to
perl, or (for STDOUT only) by adding 'binmode(STDOUT, ":utf8");' to your
boilerplate.
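
For anyone who would rather not depend on the environment, here is a rough in-script approximation of what PERL_UNICODE=AS does: "S" puts a UTF-8 layer on STDIN, STDOUT and STDERR, and "A" treats the elements of @ARGV as UTF-8. The helper name setup_unicode_io is made up for illustration, and this sketch uses :encoding(UTF-8) where the switch itself uses the laxer :utf8 layer, so treat it as a sketch rather than an exact equivalent.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not from the original post): approximates
# PERL_UNICODE=AS from inside the script.
sub setup_unicode_io {
    my @args = @_;

    # "S": UTF-8 layers on the three standard handles.
    binmode($_, ':encoding(UTF-8)') for \*STDIN, \*STDOUT, \*STDERR;

    # "A": decode command-line arguments from UTF-8 bytes to characters.
    return map { decode('UTF-8', $_) } @args;
}

@ARGV = setup_unicode_io(@ARGV);

# A string with a character above 0xFF now prints without a
# "Wide character in print" warning.
print "smiley: \x{263A}\n";
```

The environment variable has the advantage of covering every script you run; the in-script version has the advantage of travelling with the code.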
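Unfortunately, changing the < in open to <:encoding(UTF-8) does not
strip the BOM: the three bytes are decoded to the single character U+FEFF,
which still sits at the start of the first line. But once the input is
decoded,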
$line =~ s/^\N{BOM}//; # will remove the BOM
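
Putting the two pieces together, a minimal sketch might look like the following. The helper name first_line_without_bom is invented for this example, and it writes \x{FEFF} (the code point that \N{BOM} names) so that it does not rely on charnames being loaded. It assumes the file really is UTF-8.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): return the first
# line of a UTF-8 file with any leading BOM removed.
sub first_line_without_bom {
    my ($path) = @_;

    # Decode on input, so the BOM bytes EF BB BF arrive as U+FEFF.
    open my $fh, '<:encoding(UTF-8)', $path
        or die "Cannot open $path: $!";
    my $line = <$fh>;
    close $fh;

    return unless defined $line;    # empty file

    $line =~ s/^\x{FEFF}//;         # same character as \N{BOM}
    return $line;
}
```

With that in place, a test such as `if ($line =~ /^Your data:/)` matches whether or not the file starts with a BOM. Without the :encoding(UTF-8) layer the substitution does nothing, because the line then begins with three separate bytes rather than one U+FEFF character.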
This is all the sort of headache I want Perl to allow me to magically
and blissfully never think about.
On 10/27/20 4:21 PM, James E Keenan wrote:
> On 10/27/20 3:45 PM, Eric Roode wrote:
>> Hello fellow mongers!
>>
>> Today I opened and read a file. Advanced stuff, right? :-)
>>
>> open my $fh, '<', 'file.dat';
>> $line = <$fh>;
>> if ($line =~ /^Your data:/) ....
>>
>>
>> The problem is that the input file has a Unicode BOM (byte-order
>> mark), so the first three bytes of the string are in fact 0xEF, 0xBB,
>> and 0xBF. So the match fails, even though if you look at the file in
>> an editor, it looks like it begins with "Your data". It took me a
>> fair amount of time to figure this out.
>>
>
> Yes, this is annoying. I have encountered the problem before, in the
> form of a bug report for my CPAN distro Text-CSV-Hashify:
> https://rt.cpan.org/Ticket/Display.html?id=130048
>
> If you read that ticket, you will appreciate some of the complexities
> in this issue. Unfortunately, I haven't had time to develop a
> solution -- magical, automagical or otherwise.
>
> Thank you very much.
> Jim Keenan
> _______________________________________________
> Philadelphia-pm mailing list
> Philadelphia-pm at pm.org
> https://mail.pm.org/mailman/listinfo/philadelphia-pm