Code Check

McLean, Grant grant.mclean at bearingpoint.com
Mon Apr 28 20:09:58 CDT 2003


James Eaton wrote
> http://forsale.orcon.net.nz/gm-rss0.3.0.cgi
> 
> > > Being a novice to Perl and CGI programming is there 
> > > someone out there who would volunteer to check out
> > > my CGI program.

I've just made a brief pass through it and it looks pretty 
good to me - especially for a 'novice'.  Here are a few
random thoughts that struck me as I read through.  As they
are all style related and style is a matter of personal
taste, I won't be in the least offended if you wish to
ignore or debate them.

You have this code to enable debugging:

  if (defined($ARGV[0])) {
    if ($ARGV[0] eq "-d") {
      print "DEBUG - SET<br>\n";
      $debug = $true;
    }
  }

which could be collapsed into:

  if(@ARGV  and  $ARGV[0] eq "-d") {
    print "DEBUG - SET<br>\n";
    $debug = $true;
  }

or even possibly:

  my $debug = (@ARGV and $ARGV[0] eq "-d") and print "DEBUG - SET<br>\n";

Presumably, $ARGV[0] will only be set when the script is
invoked from the command line.  An alternative approach
would be to always enable debugging when the script is run 
interactively (STDOUT is attached to a TTY):

  my $debug = (-t STDOUT);


You have a block of code that looks like this:

  my $dest_name = $prefs{"dest_name"};
  my $title = $prefs{"title"};
  my $description = $prefs{"description"};
  my $language = $prefs{"language"};
  my $copyright = $prefs{"copyright"};
  my $webmaster_name = $prefs{"nameofwebmaster"};

I find it easier to read this sort of stuff if the '='
symbols are vertically aligned.  Also, as you're aware, it's 
not (always) necessary to quote 'barewords' used as hash keys.
So you could rewrite that as:

  my $dest_name      = $prefs{dest_name};
  my $title          = $prefs{title};
  my $description    = $prefs{description};
  my $language       = $prefs{language};
  my $copyright      = $prefs{copyright};
  my $webmaster_name = $prefs{nameofwebmaster};

Having said that though, do you really need to copy the hash 
values out into individual variables anyway?  Why not just
change your code from this style:

  $rss->channel(
    title           => $title,
    'link'          => $link,
    description     => $description,
    image           => $image,

To say this:

  $rss->channel(
    title           => $prefs{title},
    link            => $link,
    description     => $prefs{description},
    image           => $prefs{image},

That would enable you to throw away a dozen lines of code.
(Throwing away code feels soooo good).


Your config file parsing code is also re-inventing the wheel
somewhat.  I'll refrain from suggesting that you could do it
all in two lines of code with XML::Simple (oops it slipped
out) and suggest a few other worthy alternatives

  http://search.cpan.org/dist/Config-IniFiles/
  http://search.cpan.org/dist/Config-Properties/
  http://search.cpan.org/dist/YAML/


I noticed this use of a variable:

  my $i = 1;
  # 150 lines snipped
  until ($i > $numberofitems) {

You are effectively using $i as a global variable which
means that the place where it is declared and initialised
is a fair distance from where it is used (the same could
be said for a number of your other variables).  I would be
inclined to declare it and initialise it in the same
place.  Eg:

  for my $i (1..$numberofitems) {


I notice you're using the eof function to detect the end of
file.  This raised a red flag for me.  In Perl it is almost
never necessary to explicitly check for end of file and there
are some subtle gotchas that can make using it undesirable.  
I'd be inclined to change the whole loop to something like:

  my $i = 1;
  while(my $eline = <ENTRYLIST>) {
    last if($i++ > $numberofitems);

    # your stuff here
  }


You have this snippet:

  READ:  while (<ITEMFILE>) {
    $eline = $_;
    if ($eline =~ /span class="rss:item"/) {
      $span_complete = $false;
      foreach $eline (<ITEMFILE>) {
        if ($eline =~ /\/span/) {
          $span_complete = $true;
          last READ;
        }

The first two lines could be combined:

  READ:  while ($eline = <ITEMFILE>) {

and reading from the same filehandle using both while
and foreach is a bit odd.  This line:

  foreach $eline (<ITEMFILE>) {

actually slurps in all the remaining lines from ITEMFILE,
but you then go on to ignore any files which follow the 
closing </span>.  If you used a while loop rather than a 
foreach then you wouldn't waste cycles reading in those 
trailing lines.


Some people would say that if you're going to parse HTML
then you should do it with a parser module like:

  http://search.cpan.org/dist/HTML-Parser/

I use regexes on HTML all the time and as long as you're
aware of the potential traps then it's a reasonable enough
approach (especially when you have to work with badly formed 
HTML).  One interesting alternative approach is to use a 
module like XML::LibXML (which can read HTML) to parse the 
HTML into a DOM tree then you can use XPath expressions to 
select and extract the data you're interested in.


Well, that's enough from me.  I'll be interested to see what 
others thought.

Regards
Grant





More information about the Wellington-pm mailing list