Script Question

Craig Freter cfreter at digex.net
Thu Aug 26 14:19:16 CDT 1999


James,

I modified your student parser script.  I used a different approach, in
that I try to match the entire student line with a single regular
expression.  I don't know if that makes the script more 'perlish', but
you might find my approach interesting.
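
For illustration only: the attached script is not shown in this message, so
the following is a minimal sketch of what a single-regular-expression approach
could look like, not the attached test2.pl itself. The field layout is inferred
from the sample class list quoted below, the helper names proper and
student_to_csv are hypothetical, and the e-mail address is assumed to contain
a literal "@" (the archive displays it as " at ").

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One pattern for a whole student line. The layout is an assumption based on
# the sample class list, not a documented registrar format.
my $student_re = qr{
    ^ \d{3}-\d{2}-(\d{4})      # SSN, capturing only the last four digits
    \s+    ([^,]+) , \s* (.+?) # last name, then first name and optional initial
    \s{2,} (\S+)               # class standing
    \s+    (\S+)               # major
    \s+    (\S+)               # grading method
    \s+    (\d+\.\d+)          # credits
    \s+    ([\d-]+|_+)         # phone number, or underscores when none is listed
    \s+    (([\w.-]+)\@\S+)    # full address, with the login ID captured inside
    \s* \z
}x;

# Hypothetical helper: "KUBLE-KAHN" becomes "Kuble-Kahn".
sub proper {
    my ($text) = @_;
    $text =~ s/(\w+)/\u\L$1/g;
    return $text;
}

# Hypothetical helper: returns one comma-separated record, or nothing when
# the line is not a student line (headers, blank lines, and so on).
sub student_to_csv {
    my ($sect, $line) = @_;
    my ($ssn4, $last, $first, $standing, $major, $method, $credits,
        $phone, $email, $login) = $line =~ $student_re
        or return;
    return join ',', $sect, $ssn4, proper($last), proper($first),
        proper($standing), $major, $method, $credits, $phone,
        lc($login), lc($email);
}

# Made-up input in the same shape as the sample class list.
my @lines = (
    'Subject: Class List for Fall 1999 BIOL302L0402',
    '',
    '111-22-3333  KUBLE-KAHN, KRIS K.      JUNIOR        VPAV/BIOL  Reg   2.00  301-555-2222 kkuble1@umbc.edu',
);

my $sect = '';
for my $line (@lines) {
    # The section number is the last four digits of the course code.
    $sect = $1 if $line =~ /^Subject:.*\b[A-Z]+\d{3}[A-Z]?(\d{4})\b/;
    my $csv = student_to_csv($sect, $line) or next;
    print "$csv\n";
}
```

Because every field is captured in one match, a line either yields a complete
record or is skipped, which avoids a chain of substitutions that each depend on
the result of the previous one.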

"James W. Sandoz; (BIO;FAC)" wrote:
> OK.  I've written a script which parses a class list sent by our
> registrar.  The email is exported and then parsed so that the students can
> be entered into a spreadsheet (It's 'prettified' regarding case and each
> field becomes comma-separated).  The class list can contain up to 400
> students, but in my case contains fewer.
> The script works just fine, but it looks awful (that is, not 'perlish').
> It's evolved over the past half year. I probably should re-write it from
> scratch, but that's a later problem.
> 
> If anyone has suggestions, I'd appreciate them.  Below is a typical class
> list email (I hope line wrapping doesn't interfere; each new record begins
> with the SSN), and below that is the Perl script.
> 
> Email from registrar:
> 
> From classlists at umbc.edu Thu Aug 12 21:22:56 1999
> Date: Thu, 12 Aug 1999 14:57:00
> From: classlists at umbc.edu
> To: sandoz at umbc.edu
> Cc: dina at umbc.edu
> Subject: Class List for Fall 1999 BIOL302L0402
> 
> 123-45-6789  APPLEJAK, ABBLE                                                       SOPHOMORE     BIOL       Reg   2.00  301-555-5555 applej1 at umbc.edu
> 321-54-9876  EINSTEIN, ALBERT I.                                                   JUNIOR        BIOL       Reg   2.00  410-555-1234 aeinst1 at umbc.edu
> 111-22-3333  KUBLE-KAHN, KRIS K.                                                   JUNIOR        VPAV/BIOL  Reg   2.00  301-555-2222 kkuble1 at umbc.edu
> 
> Script:
> #!/usr/local/bin/perl5 -wi.bak
> 
> #=============================
> #"format_classlist" by JW Sandoz, Department of Biology, UMBC
> # August 25, 1999
> # Normal disclaimers: Worked fine for me.  Should for you. No guarantees,
> # though.
> #=============================
> 
> # This script formats classlists at UMBC as mailed through EASI/myUMBC
> # to a format more easily parsed into a spreadsheet.
> # One needs to export the email to a file in home directory (easily done
> # in Pine with 'export').  Then type
> # "parse_classlist <filename_of_classlist_that_you_exported>"
> # The parsed file contains all the fields that the email class list
> # holds, EXCEPT the SSN is parsed to the last four digits (I use these as
> # the Password for the student).
> # In addition, the umbc username is parsed so that the unique id is
> # captured in the field "Login ID".  The entire email address is retained
> # as well.
> 
> $x = shift;
> unshift @ARGV, $x;  # capturing filename and then returning it to argv
> 
> while (<>) {
>         if (m/[A-Z]+\d\d\d[A-Z]?(\d\d\d\d)/) {$sect = $1}# capture sect
>                 #else {}
>         local $_  = lc();       # lowercases everything
>         s/^\D.*//g;             # removes lines without starting number
>         s/^\s*$//;                      # removes blank lines
>         s/\d\d\d-\d\d-//g;      # leaves last 4 digits of ssn
>         s/(\w+.*,)\s(\w+\.?)?(\s+)(\w)/$1$2$3$4/; #capturing names
>         s/\s{2,}\b/,/g; #replace multiple space with ','
>         s/\s+,/,/g;     #remove spaces before existing ','s
>         s/(\d)\s/$1,/;  #puts a comma at the end of the phone number
>         s/  / /;                #removes one of the spaces if two exist
>         s/(_{9,12}) /$1,/; #puts comma at end of dashes (no phone #)
>         s/([-])([a-z])/$1\u$2/g; #Caps second (hyphenated) name
>         s/ (\w)/ \u$1/; #removes leading space from MI and uppercases it
>         s/(\w+)(\@.*)/\l$1,\l$1$2/; #separate username into Login ID + username
>         if ($1) {
>         print "$sect," . $_;  # prints changes to file
>         }               #prepends the section number to each student's record. Useful
>                         #if a course has more than one section.
> }
> 
> open (FILE, "$x") or die $!;
>         @a = <FILE>;
> close FILE;
> 
> #the following prettifies the text: leading caps instead of all caps
> 
> @proper = map {(my $y = $_) =~ s/\,(.)/\,\u$+/g; $y } @a;
>         #oops. It uppercases the username as well.
> @proper2 = map {(my $y = $_) =~ s/\,([A-Z]\w{1,6})\,([A-Z]\w{1,6}\@)/\,\l$1,\l$2/; $y } @proper;
> # @proper2 fixes (lowercases) username
> 
> unshift @proper2, ('Sect,','SSN,','Last Name,','First Name,','Standing,','Major,','Grade_Method,','Credits,','Phone,','Login ID,','email',"\n");
> # above adds a heading for each field
> 
> open (FH2, ">$x") or die $!;
>         print FH2 @proper2; # writes to the file
> close FH2;
> 
> Mr. James W. Sandoz, Instructor, UMBC Dept of Biol Sciences,
>                                  1000 Hilltop Circle
>                                  Catonsville, MD 21250
> voice: (410) 455-3497; fax: 455-3875; net: sandoz at umbc.edu

-- 
All that is complex is not useful,
and all that is useful is simple.
                       -- Mikhail Kalashnikov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test2.pl
Type: application/x-perl
Size: 1712 bytes
Desc: not available
Url : http://mail.pm.org/archives/baltimore-pm/attachments/19990826/d1fa4c42/test2.bin

