[Philadelphia-pm] Unicode BOM in input files
James E Keenan
jkeenan at pobox.com
Tue Oct 27 13:21:40 PDT 2020
On 10/27/20 3:45 PM, Eric Roode wrote:
> Hello fellow mongers!
>
> Today I opened and read a file. Advanced stuff, right? :-)
>
> open my $fh, '<', 'file.dat' or die "Cannot open file.dat: $!";
> my $line = <$fh>;
> if ($line =~ /^Your data:/) { ... }
>
>
> The problem is that the input file has a Unicode BOM (byte-order
> mark), so the first three bytes of the string are in fact 0xEF, 0xBB,
> and 0xBF. So the match fails, even though if you look at the file in an
> editor, it looks like it begins with "Your data". It took me a fair
> amount of time to figure this out.
>
Yes, this is annoying. I have encountered the problem before, in the
form of a bug report for my CPAN distro Text-CSV-Hashify:
https://rt.cpan.org/Ticket/Display.html?id=130048
If you read that ticket, you will appreciate some of the complexities in
this issue. Unfortunately, I haven't had time to develop a solution --
magical, automagical or otherwise.
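
This is not a fix for the ticket above, only a rough sketch of the simplest
case: the file is assumed to be UTF-8, it is read as raw bytes, and the
three BOM bytes quoted above are stripped from the first line before
matching. The helper name strip_utf8_bom and the temporary sample file are
made up for illustration; UTF-16 and UTF-32 BOMs are not handled.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not from any module): remove a leading UTF-8 BOM
# (bytes 0xEF 0xBB 0xBF) from a string that was read as raw bytes.
sub strip_utf8_bom {
    my ($line) = @_;
    $line =~ s/\A\xEF\xBB\xBF//;
    return $line;
}

# Create a small sample file that starts with a BOM, so the sketch
# can be run on its own. In real use this would be 'file.dat'.
my ($out, $file) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "\xEF\xBB\xBFYour data: 42\n";
close $out or die "Cannot close $file: $!";

# Read the first line as raw bytes and strip the BOM before matching.
open my $fh, '<:raw', $file or die "Cannot open $file: $!";
my $line = <$fh>;
close $fh;

$line = strip_utf8_bom($line);
print "matched\n" if $line =~ /^Your data:/;
```

Only the first line needs this treatment, since a BOM appears only at the
very start of a file. If the file is opened with an :encoding(UTF-8) layer
instead of :raw, the BOM arrives as the single character U+FEFF, and the
pattern would have to be \A\x{FEFF} rather than the three bytes.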
Thank you very much.
Jim Keenan