[Philadelphia-pm] Unicode BOM in input files

James E Keenan jkeenan at pobox.com
Tue Oct 27 13:21:40 PDT 2020


On 10/27/20 3:45 PM, Eric Roode wrote:
> Hello fellow mongers!
> 
>      Today I opened and read a file.  Advanced stuff, right?  :-)
> 
>     open my $fh, '<', 'file.dat';
>     $line = <$fh>;
>     if ($line =~ /^Your data:/) ....
> 
> 
>      The problem is that the input file has a Unicode BOM (byte-order 
> mark), so the first three bytes of the string are in fact 0xEF, 0xBB, 
> and 0xBF.  So the match fails, even though if you look at the file in an 
> editor, it looks like it begins with "Your data".  It took me a fair 
> amount of time to figure this out.
> 

Yes, this is annoying.  I have encountered the problem before, in the 
form of a bug report for my CPAN distro Text-CSV-Hashify:
https://rt.cpan.org/Ticket/Display.html?id=130048

If you read that ticket, you will appreciate some of the complexities in 
this issue.  Unfortunately, I haven't had time to develop a solution -- 
magical, automagical or otherwise.
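
That said, for the narrow case in the quoted message (a file known to be
UTF-8 whose first three bytes are 0xEF, 0xBB, 0xBF), a minimal sketch might
look like the following.  It assumes the file really is UTF-8; the helper
name first_line_without_bom is hypothetical, not something from the ticket
or from Text-CSV-Hashify, and it does nothing for UTF-16/UTF-32 BOMs.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first line of a UTF-8 file with any
# leading byte-order mark removed.  Assumes the file is UTF-8 encoded.
sub first_line_without_bom {
    my ($path) = @_;

    # Decode on read so the three BOM bytes arrive as one character, U+FEFF.
    open my $fh, '<:encoding(UTF-8)', $path
        or die "Cannot open $path: $!";
    my $line = <$fh>;
    close $fh;

    return unless defined $line;    # empty file

    # Strip the BOM only if it is the very first character.
    $line =~ s/\A\x{FEFF}//;
    return $line;
}
```

With that in place, something like
my $line = first_line_without_bom('file.dat'); followed by the original
test, $line =~ /^Your data:/, should match as expected.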

Thank you very much.
Jim Keenan

