[Chicago-talk] Script comments

Steven Lembark lembark at wrkhors.com
Tue Jan 17 15:20:03 PST 2006


> 
>    1  #!/usr/local/bin/perl
>    2  #
> ========================================================================
> ======

Using blocks of '#' survive wraps without blowing up the code.

>    4  # backup.pl

Perl Library?

> ------------------------------------------------------------------------
> ------
>   71  #                        G L O B A L   V A R I A B L E S
>   72  #
> ------------------------------------------------------------------------
> ------
>   73  my $DEBUG       = 0; 
>   74  my $DEBUG_LEVEL = 1;
>   75  my $VERBOSE     = 0;
>   76  my $LOGFILE     = "$FindBin::Bin/$NAME.log"; 
>   77  
>   78  my $TESTMODE      = 0;
>   79  my $TYPE        = "full";
>   80  my $REPTDEST    = "systems@\lists.chicagogsb.edu";
>   81  my $SMTPHOST    = "localhost";
>   82  my $SMTPFROM    = "IMS_Backup";
>   83  
>   84  my $STARTLETTER   = "A";
>   85  my $ENDLETTER   = "Z";
>   86  
>   87  my $REGFILE     = "/etc/msgregistry.inf";
>   88  my $HOSTNAME    = hostname;
>   89  
>   90  my $IMSROOT       = get_ims_root($REGFILE);
>   91  my $BKPFILE     = $IMSROOT . "/" . "msg-$HOSTNAME" . "/" .
> "backup-groups.conf";
>   92  my $IMSBKPCMD   = $IMSROOT . "/bin/msg/store/bin/imsbackup";
>   93  
>   94  my $BKPDEST     = "/mnt/imsbkvol";
>   95  
>   96  my $NUMPROCS      = 4;
>   97  
>   98  
>   99  # used for statistics
>  100  my %JOB_START;
>  101  my %JOB_END;
>  102  my %JOB_SIZE;

Lexicals aren't "global", the farthest out they can go
is the package they live in (vs. global variables that
can be accessed from everywhere).  

You might also want to initialize the lists to () for
prettier debugging and some easier checks.

> ------
>  106  #                   C O M M A N D   L I N E   O P T I O N S
>  107  #
> ------------------------------------------------------------------------

Ditto the suggestion for Pod::Usage.

> ------
>  168  #                              V A R I A B L E S
>  169  #
> ------------------------------------------------------------------------

Why not put these in the same place as your package variables? If it
looks like one place in the code defines the package stuff it makes
finding things harder if they aren't in it (and the Getopt::Long 
block is long).

> ------
>  170  
>  171  # upper case the start and end letters
>  172  $STARTLETTER    = uc($STARTLETTER);
>  173  $ENDLETTER      = uc($ENDLETTER);
>  174  
>  175  my $name        = "main()";

Check out the use of "caller" in a list context. The
$subroutine argument saves you from setting this.

    With EXPR, it returns some extra information that the debugger uses to
print a stack trace.  The value of EXPR indicates how many call
    frames to go back before the current one.

    ($package, $filename, $line, $subroutine, $hasargs,
    $wantarray, $evaltext, $is_require, $hints, $bitmask) = caller($i);


>  191  unless ( $TESTMODE ) {
>  192      open( STDOUT, ">$LOGFILE" ) or die "Unable to dup STDOUT:
> ($!)\n";
>  193      open( STDIN,  "/dev/null" ) or die "Can't read /dev/null:
> ($!)";
>  194      open( STDERR, ">>&STDOUT" ) or die "Can't dup stdout: ($!)";

Might want to start using the three-argument version. It'll
make the second open a bit more obvious. 

Q: Why bother to open /dev/null? Just close STDIN if you
   want to ignore it -- saves flushing it, above.

You can also save the "or die..." checks via:

    use Fatal qw( open );

which converts failure to an exception for you automatically.

Closing flushes the handle automatically, which a re-open
will do for you.

That leaves you with:

    use Fatal qw( open );

    open my $logfh, '>', $logpath;

    # alive at this point means the log path is open 
    # and error messages no longer need go to the 
    # original file handles to be noticed.

    close STDIN;

    open STDOUT, '>>&' , $logfh; # double-check the exact syntax
    open STDERR, '>>&' , $logfh; # double-check the exact syntax
    

It'd probably be cleaner to create the anon-sub's 
as variables first and pass them in as:

    $forker->foo( $job_start );

    $forker->bar( $job_finish );

Putting them at the top in the package variables
section also saves anyone from having to wander
this far down the code in order to change some 
part of them (e.g., adding logging later on).

    ############################################################
    # package variables
    ############################################################

    ############################################################
    # passed to Parllel::Forkatosis as init, cleanup
    # handlers.

    my %fork_handler_subz = 
    (
        run_on_start =>
        sub
        {
            ...
        },

        run_on_finish =>
        sub
        {
            ...
        },

    );

    ...

    # deep in the bowels of your code where noone needs
    # to be mucking around you can stuff them up the 
    # forkatosis handler and go along with life.

    $forker->$_( $fork_handler_subz{ $_ } )
    for keys %fork_handler_subz;

>  220  # create the fork manager object
>  221  my $forker = new Parallel::ForkManager($NUMPROCS);
>  222  
>  223  # callback for each child start
>  224  $forker->run_on_start( sub {
>  225                                  my ($pid, $group) = @_;
>  226                                  logmsg("$group: Started $group,
> pid = $pid");
>  227                                  $JOB_START{$group} = time();
>  228                              });
>  229  
>  230  # callback for each child finish
>  231  $forker->run_on_finish( sub { 
>  232                                  my ($pid, $exit_code, $group) =
> @_;
>  233                                  logmsg("$group: Finished $group,
> pid = $pid, exit code = $exit_code");
>  234                                  $JOB_END{$group} = time();
>  235  
>  236                                  # if the dump file is present save
> the file size
>  237                                  # otherwise use 0 as the size
> (file isn't there)
>  238                                  if ( -f "$BKPDEST/$group" ) {
>  239                                      my $f =
> stat("$BKPDEST/$group");
>  240                                      $JOB_SIZE{$group} = $f->size;
>  241                                      logmsg("$group: size => " .
> $f->size);
>  242                                  } else {
>  243                                      $JOB_SIZE{$group} = 0;
>  244                                  }
>  245                              });
>  246  
>  247  
>  248  
>  249  ###
>  250  # Start children, keep specified number of children running
>  251  ###
>  252  foreach my $letter (@letters) {
>  253      my $group = "group" . $letter;
>  254      logmsg("forker(): working on letter $letter");
>  255  
>  256      my $pid = $forker->start($group) and next;
>  257  
>  258      my $cmd;
>  259      $cmd = $IMSBKPCMD;
>  260      $cmd .= " " . "-f-";
>  261      $cmd .= " " . "/$HOSTNAME/$group";
>  262      $cmd .= " " . ">";
>  263      $cmd .= " " . "$BKPDEST" . "/" . $group;

This might be a bit less error prone using with a list,
which can be passed directly to system (see perldoc -f
system for reasons why). It'd also be easier
to read if you were consistent using "$foo/$bar" vs. 
$foo . '/' . $bar.

    my $group_path = "$BKPDEST/$group";

    my @cmd = 
    (
        $imsbkpcmd,
        '-f-',
        "/$hostname/$group",
        '>',
        "$bkpdest/$group"
    );

    system @cmd unless $debug_mode;

If this is getting passed to system it'll be cleaner
in most cases to just leave it as a list:

(see perldoc -f system for reasons).


>  265      debug("$name: $cmd");
>  266  
>  267      # only run the system() call outside of testmode
>  268      if ( ! $TESTMODE ) {
>  269          $rc = system($cmd);
>  270          $rc = $rc / 256;
>  271      } else {
>  272          $rc = 0;
>  273      }
>  274  
>  275      logmsg("forker(): system call returned : $rc"); 
>  276  
>  277      $forker->finish($rc, $group);  # closes process
>  278  }
>  279  
>  280  
>  281  # wait for all children to complete (avoids zombies)
>  282  logmsg("$name: waiting for all children to finish");
>  283  $forker->wait_all_children();

This all reduces to:

    use Schedule::Parallel;

    sub handle_a_letter
    {
        my $letter = shift
        or croak "Bogus handle_a_letter: missing the letter";

        my @cmd = ( ... );

        system @cmd;
    }

    my @queue
    = map 
    {
        my $letter = $_;

        sub { handle_a_letter $letter }
    }
    @letterz;

    runqueue @queue and die "Failed execution...";

which deals with the waits, and dodges having to
hardwire separate pre- and post- condidtions
for you.

This might also make testing a bit easier since you
could call handle_a_letter directly to check the
results of a single backup.

>  307  open(LOGFILE, "<$LOGFILE") or die "Unable to open logfile $LOGFILE

That or just open it with a '+' and use rewind -- saves a 
possible source of erros and data loss if someone blows 
off the file before you are done processing the data.

> for read ($!)\n";
>  308  
>  309  if ( $DEBUG ) {
>  310      $sender = new Mail::Sender {

You're always better off avoiding the indirect object
calls.

Mail::Sender->new( ... )

>  329  while(<LOGFILE>) {
>  330      $sender->SendEx($_);
>  331  };

You can loose a trailing blank line this way, but
using defined will also dodge a warning.


>  357  sub logmsg {

    my $debug_prefix = '';

    sub debug
    {
        # notice the trailing space.

        $debug = 'DEBUG ';
    }

    sub logmsg
    {
        use Date::Format;

        my $message = shift
        or croak 'Bogus logmsg: missing the message';

        my $datestr
        = time2str '%m-%d-%Y %H:%M:%S', time;

        print "$debug_preifx$debug_prefix $name $datestr $message";
    }


Might be better of with "%Y-%m-%d %H:%M:%S' (i.e., ISO format)
since all the date packages can process it properly and the
stuff sorts lexically if you want to combine files later on.


-- 
Steven Lembark                                       85-09 90th Street
Workhorse Computing                                Woodhaven, NY 11421
lembark at wrkhors.com                                     1 888 359 3508


More information about the Chicago-talk mailing list