<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>One of my many wishes for Perl 7 is a switch to native Unicode
      string handling. Unfortunately, given the effort it has taken just
      to get strict and warnings enabled (I've been doing a little of
      that work and Jim Keenan a lot of it), and given how much would
      probably break in Perl and on CPAN, pulling that off is really
      unlikely barring a deep-pocketed corporate sponsor.</p>

    <p>I recently discovered a trick that helps with one of the problems
      caused by Perl not being Unicode-native. <br>
    </p>

    <p>If you add 'export PERL_UNICODE=AS' to your environment, many
      "Wide character" warnings will vanish. The same thing can be done
      with the -C switch to Perl (-CAS), or for a single handle by
      adding 'binmode(STDOUT, ":utf8");' to your boilerplate. <br>
    </p>
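    <p>A minimal sketch of the per-handle approach, for anyone who would
      rather not change the environment. The helper name utf8_bytes_for
      and the sample character are illustrative assumptions; an
      in-memory handle stands in for STDOUT so the written bytes can be
      inspected, and the stricter ':encoding(UTF-8)' layer is used in
      place of ':utf8'.</p>
<pre>
```perl
use strict;
use warnings;

# Print $text through a handle that has an explicit UTF-8 output layer
# and return the bytes that were written. The in-memory handle is only
# a stand-in for STDOUT so the result can be examined.
sub utf8_bytes_for {
    my ($text) = @_;
    my $buffer = '';
    open my $out, '>:encoding(UTF-8)', \$buffer
        or die "Cannot open in-memory handle: $!";
    print {$out} $text;
    close $out or die "Cannot close in-memory handle: $!";
    return $buffer;
}

# Hypothetical sample: U+2603 SNOWMAN is a "wide" character (above 0xFF).
my $bytes = utf8_bytes_for("\x{2603}\n");
printf "%d bytes written\n", length $bytes;

# The same layer applied to the real STDOUT, one handle at a time
# instead of globally through PERL_UNICODE or -C.
binmode(STDOUT, ':encoding(UTF-8)');
print "\x{2603}\n";
```
</pre>
    <p>Because the layer is set on the handle, printing the wide
      character should not trigger a "Wide character in print"
      warning.</p>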

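    <p>Unfortunately, changing the '&lt;' in open to
      '&lt;:encoding(UTF-8)' does not remove the BOM; the layer decodes
      it into a single U+FEFF character at the start of the first line.
      Once the line has been decoded, though, <br>
    </p>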

    <p>    $line =~ s/^\N{BOM}//;  # will remove the BOM</p>
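    <p>Putting the two pieces together, here is a hedged sketch of
      reading Eric's kind of file. The helper name
      first_line_without_bom and the sample file contents are
      assumptions made up for illustration; \x{FEFF} is used instead of
      \N{BOM} only so the snippet does not depend on named-alias
      support.</p>
<pre>
```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read the first line of $path as UTF-8 text and drop a leading BOM,
# if there is one.
sub first_line_without_bom {
    my ($path) = @_;
    open my $fh, '<:encoding(UTF-8)', $path
        or die "Cannot open $path: $!";
    my $line = <$fh>;
    close $fh;
    return unless defined $line;
    $line =~ s/^\x{FEFF}//;    # U+FEFF is the BOM after decoding
    return $line;
}

# Build a hypothetical sample file that starts with the UTF-8 BOM bytes.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;                  # write raw bytes, no layers
print {$tmp} "\xEF\xBB\xBFYour data: 42\n";
close $tmp or die "Cannot close $path: $!";

my $line = first_line_without_bom($path);
print "matched\n" if $line =~ /^Your data:/;
```
</pre>
    <p>The substitution only matches because the line was decoded on
      input; on a handle opened with a plain '&lt;' the BOM is still
      three separate bytes and would have to be matched as
      \xEF\xBB\xBF.</p>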

    <p>This is exactly the sort of headache I want Perl to let me
      magically and blissfully never think about.<br>
    </p>


    <div class="moz-cite-prefix">On 10/27/20 4:21 PM, James E Keenan

      wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:c57cef27-dc8b-fcab-6147-cdd6225a1ca1@pobox.com">On

      10/27/20 3:45 PM, Eric Roode wrote:

      <br>

      <blockquote type="cite">Hello fellow mongers!

        <br>

        <br>

             Today I opened and read a file.  Advanced stuff, right? 

        :-)

        <br>

        <br>

            open my $fh, '<', 'file.dat';

        <br>

            $line = <$fh>;

        <br>

            if ($line =~ /^Your data:/) ....

        <br>

        <br>

        <br>

             The problem is that the input file has a Unicode BOM

        (byte-order mark), so the first three bytes of the string are in

        fact 0xEF, 0xBB, and 0xBF.  So the match fails, even though if

        you look at the file in an editor, it looks like it begins with

        "Your data".  It took me a fair amount of time to figure this

        out.

        <br>

        <br>

      </blockquote>

      <br>

      Yes, this is annoying.  I have encountered the problem before, in

      the form of a bug report for my CPAN distro Text-CSV-Hashify:

      <br>

      <a class="moz-txt-link-freetext" href="https://rt.cpan.org/Ticket/Display.html?id=130048">https://rt.cpan.org/Ticket/Display.html?id=130048</a>

      <br>

      <br>

      If you read that ticket, you will appreciate some of the

      complexities in this issue.  Unfortunately, I haven't had time to

      develop a solution -- magical, automagical or otherwise.

      <br>

      <br>

      Thank you very much.

      <br>

      Jim Keenan

      <br>

      _______________________________________________

      <br>

      Philadelphia-pm mailing list

      <br>

      <a class="moz-txt-link-abbreviated" href="mailto:Philadelphia-pm@pm.org">Philadelphia-pm@pm.org</a>

      <br>

      <a class="moz-txt-link-freetext" href="https://mail.pm.org/mailman/listinfo/philadelphia-pm">https://mail.pm.org/mailman/listinfo/philadelphia-pm</a>

      <br>

    </blockquote>

  </body>

</html>