[Omaha.pm] Hello Perl Gurus

Sterling Hanenkamp sterling at hanenkamp.com
Fri Nov 13 11:53:55 PST 2015


The short answer is that you are working with UTF-8 encoded files and the
first record in each file begins with 3 extra bytes (called the Byte Order
Mark or BOM: EF BB BF) that mark the file as Unicode. Those bytes are the
"M-oM-;M-?" in front of 514027 in your error output. However, you are not
telling Perl to treat the files as UTF-8. You need to either:

   1. Save your files in ASCII (which will probably break hospitals.csv
   since one of the hospital names contains a Unicode character).
   2. Tell Perl to read the files as UTF-8.

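To do #2, you can just change the file open lines to:

open (HOSPITALS, '<:encoding(UTF-8)', "hospitals.csv") or die $!;
...
open (PATIENTS, '<:encoding(UTF-8)', "september.csv") or die $!;

(The shorter '<:utf8' layer also works, but it does not validate the
input.) Note that decoding alone does not remove the BOM: the three bytes
become the single character U+FEFF at the start of the first line, so you
still need to strip it, e.g. with s/^\x{FEFF}// on the first record.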
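
Here is a minimal sketch of the read path, under a few assumptions: the
sample file, its contents, and the variable names are made up for
illustration, and it uses a lexical filehandle rather than the barewords
in your script. It writes a small BOM-prefixed file, reads it back through
a decoding layer, and strips the decoded BOM (U+FEFF) from the first line.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a small sample file that starts with a UTF-8 BOM (EF BB BF).
# The file name and contents are invented for this illustration.
my ($out, $path) = tempfile(SUFFIX => '.csv', UNLINK => 1);
binmode $out;
print {$out} "\xEF\xBB\xBF514027,1001\n514028,1002\n";
close $out or die "close failed: $!";

# Read the file back through a decoding layer.
open(my $in, '<:encoding(UTF-8)', $path) or die "Cannot open $path: $!";
my @patient_ids;
while (my $line = <$in>) {
    chomp $line;
    # The decoded BOM appears as U+FEFF, and only on the first line.
    $line =~ s/^\x{FEFF}// if $. == 1;
    my ($patient_id) = split /,/, $line;
    push @patient_ids, $patient_id;
}
close $in;

print "$_\n" for @patient_ids;
```

Without the substitution, the first ID would still carry an invisible
leading character, which is why substr and int misbehave on the first
record only.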

Another option is to use File::BOM <https://metacpan.org/pod/File::BOM> to
add BOM detection to your script.

As a positive side effect of the change suggested above, the 3-argument
version of open is safer than the 2-arg version if you ever decide to use a
variable in your filenames (e.g., "$month.csv"), because the mode is passed
separately and can never be picked up from the filename itself.
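
To illustrate the difference, here is a small sketch. The value of $month
is a hypothetical hostile input, and everything happens in a scratch
directory so no real files are touched.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory, removed automatically when the script exits.
my $dir = tempdir(CLEANUP => 1);

# Hypothetical value, e.g. from user input; note the leading '>'.
my $month = ">$dir/september";

# 2-arg open: the leading '>' is parsed as a mode, so this silently
# creates (or truncates) september.csv for writing instead of reading it.
open(my $two_arg, "$month.csv") or die "2-arg open failed: $!";
close $two_arg;
my $created_by_two_arg = -e "$dir/september.csv" ? 1 : 0;

# 3-arg open: the mode is fixed to '<' and the name is used literally.
# No path beginning with '>' exists here, so the open simply fails.
my $three_arg_ok = open(my $three_arg, '<', "$month.csv") ? 1 : 0;

print "2-arg open created september.csv: $created_by_two_arg\n";
print "3-arg open succeeded: $three_arg_ok\n";
```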

Perl has excellent Unicode support (better than most), but, for whatever
reason, it does not have built-in BOM detection for input files. (It does
detect BOM for script files it will be executing, just not for regular
input files.)
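
If you would rather not pull in a module, a hand-rolled check is short.
The sketch below is only one way to do it: open_utf8_skip_bom is a
hypothetical helper name, and it handles the UTF-8 BOM only (not UTF-16 or
UTF-32). It looks at the first three raw bytes, rewinds if they are not a
BOM, and then switches the handle to UTF-8 decoding.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: open a file for reading as UTF-8, skipping a
# leading UTF-8 BOM (bytes EF BB BF) if one is present.
sub open_utf8_skip_bom {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    my $got = read($fh, my $head, 3);
    die "Cannot read $path: $!" unless defined $got;
    # Not a BOM: rewind so the first bytes are not lost.
    unless ($head eq "\xEF\xBB\xBF") {
        seek($fh, 0, 0) or die "Cannot rewind $path: $!";
    }
    # From here on, decode everything as UTF-8.
    binmode($fh, ':encoding(UTF-8)') or die "Cannot set layer on $path: $!";
    return $fh;
}

# Write one sample file with a BOM and one without (invented contents).
my %first_line;
for my $case ([with_bom => "\xEF\xBB\xBF"], [without_bom => '']) {
    my ($label, $prefix) = @$case;
    my ($out, $path) = tempfile(SUFFIX => '.csv', UNLINK => 1);
    binmode $out;
    print {$out} $prefix, "514027,1001\n";
    close $out or die "close failed: $!";

    my $in = open_utf8_skip_bom($path);
    chomp(my $line = <$in>);
    close $in;
    $first_line{$label} = $line;
}

print "$_: $first_line{$_}\n" for sort keys %first_line;
```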

Cheers.

On Fri, Nov 13, 2015 at 11:33 AM Simons, Tony <ts-pm at tvortex.net> wrote:

> Please excuse the test message in reply to Paul's message.  I added some
> printf calls to the code to test the output.  The result Paul is seeing
> occurs only on the first record, at the first occurrence of:
>
> my $firstDir = substr ($patientId...
>
> If I print the values of $patientId and $hospitalId before the substr, the
> data appears to be correct.
>
> Since the data is numeric in nature, I also tried:
> my $patientId = int $ids[0];
>
> which resulted in the following as an error, since it's not text:
>
> M-oM-;M-?514027
>
> So it appears to be something that's happening with the substr and the
> data in the file.  I see no special characters in the file itself using
> :set list in vi.  I also ran dos2unix on the file to make sure it's using
> the right format.  I have read that there are problems with Perl and files
> in UTF-8 format.  Is that a potential problem?

-- 
Sterling Hanenkamp
http://sterling.hanenkamp.com/stfl/
785-370-4454