From jandanz at bigfoot.com  Thu Apr 24 02:14:49 2003
From: jandanz at bigfoot.com (James @bigfoot.com)
Date: Thu Aug  5 00:24:18 2004
Subject: XML::RSS
Message-ID: <1051168498.2052.33.camel@argon>

I am writing a CGI script that will use XML::RSS. I realise that some
hosting services will either not have this installed or not be willing
to install it. It has been suggested that I could include XML::RSS with
my code. Could someone please explain what is meant by this, and how I
would go about it?


-- 
====================
James J. Eaton
jandanz@bigfoot.com
====================
==


From ewen at naos.co.nz  Thu Apr 24 02:57:59 2003
From: ewen at naos.co.nz (Ewen McNeill)
Date: Thu Aug  5 00:24:18 2004
Subject: XML::RSS 
In-Reply-To: Message from "James @bigfoot.com" <jandanz@bigfoot.com> 
   of "24 Apr 2003 19:14:49 +1200." <1051168498.2052.33.camel@argon> 
Message-ID: <20030424075759.2CFE6AE4F5@basilica.la.naos.co.nz>

In message <1051168498.2052.33.camel@argon>, "James @bigfoot.com" writes:
>I am writing a CGI script that will use XML::RSS. I realise that some
>hosting services will either not have this installed or not be willing
>to install it. It has been suggested that I could include XML::RSS with
>my code. Could someone please explain what is meant by this, and how I
>would go about it?

Without knowing what the person who suggested it was thinking, it seems
to me there are a couple of possibilities:
- upload the XML::RSS module files to sit in your cgi-bin directory, and
  add the appropriate directory to your perl @INC path
- textually include the contents of the XML::RSS module in your cgi-bin
  program, surrounded by appropriate package statements and the like to
  make perl happy.

Of these two the first is going to be more maintainable and the second
is going to minimise the number of files you need to upload (and, eg,
might be useful if you have to send them the script for inclusion).

Both will work only if the code is "pure perl" and doesn't need anything
compiled/adapted/etc for the machine it's running on.  With XML::RSS
based on what I guess it's doing, it's probably going to be okay.
Beware, however, of its dependencies -- it may well want an XML library
which may well not be installed...
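
A minimal sketch of the second option (textual inclusion), using a
made-up module called My::Feed rather than the real XML::RSS source.
The module name, its methods and the feed title are all hypothetical;
the point is only the shape: a package block, plus an %INC entry so
perl treats the module as already loaded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# --- inlined module: the body of the .pm file would be pasted here ---
BEGIN {
    package My::Feed;            # hypothetical stand-in for the real module

    sub new {
        my ($class, %args) = @_;
        return bless { title => $args{title} }, $class;
    }

    sub title { return $_[0]{title} }

    # Mark the module as loaded, so a later "use My::Feed" or
    # "require My::Feed" does not go searching @INC for a file.
    $INC{'My/Feed.pm'} = __FILE__;
}
# --- end of inlined module ---

my $feed = My::Feed->new(title => "Example feed");
print $feed->title, "\n";
```

The package statement only lasts until the end of the enclosing block,
so the rest of the script carries on in package main as usual.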

Ewen

From ewen at naos.co.nz  Thu Apr 24 03:15:21 2003
From: ewen at naos.co.nz (Ewen McNeill)
Date: Thu Aug  5 00:24:18 2004
Subject: XML::RSS 
In-Reply-To: Message from James J Eaton & Annette Eaton <jandanz@bigfoot.com> 
   of "24 Apr 2003 20:06:06 +1200." <1051171567.2047.39.camel@argon> 
Message-ID: <20030424081521.5409EAE4F5@basilica.la.naos.co.nz>

NOTE: CC'd back to the Perl Mongers list 'cause I figure others might be
interested.

In message <1051171567.2047.39.camel@argon>, James J Eaton & Annette Eaton writes:
>On Thu, 2003-04-24 at 19:57, Ewen McNeill wrote:
>> - upload the XML::RSS module files to sit in your cgi-bin directory, and
>>   add the appropriate directory to your perl @INC path
>
>I like this option, but how can a user change the @INC path for their
>hosting service? Is it easy to do, or is each site going to be different?

What I'd suggest doing is uploading it to a subdirectory of cgi-bin, and
then having the script push an appropriate relative path onto the @INC
path.

For instance, suppose that you create a "local-libs" directory under the
cgi-bin directory, and have: 

..../cgi-bin/local-libs/XML/RSS.pm

Then your cgi-bin script could contain:

push @INC, "local-libs";

and it should be fine.  This is all off the top of my head, but I think
it should work give or take:
- the @INC modifications might need to be done in a BEGIN { } block to
  ensure they happen early enough; 
- you might need to fiddle with the exact path to get the one that
  corresponds with the CGI spec.  IIRC cgi-bin programs are supposed to
  be run with their current directory set to the directory containing
  the program, but I may be misremembering, as it's ages since I've
  written/installed a cgi-bin script that cared.
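
Putting those two caveats together, a sketch might look like the
following.  It assumes the "local-libs" directory from the example above
and assumes the script really is started with cgi-bin as its current
directory; the plain-text response is only there to make the result
visible.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Runs at compile time, before any later "use" statement is processed.
# "local-libs" is relative to the current directory, which is assumed
# (not guaranteed) to be the cgi-bin directory.
BEGIN { unshift @INC, "local-libs" }

# Load the module at run time inside eval, so that a missing copy gives
# a readable message instead of a bare server error.
my $have_rss = eval { require XML::RSS; 1 };

print "Content-type: text/plain\n\n";
if ($have_rss) {
    print "XML::RSS loaded from $INC{'XML/RSS.pm'}\n";
} else {
    print "XML::RSS not found; searched: @INC\n";
}
```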

Ewen

From jandanz at bigfoot.com  Thu Apr 24 05:41:04 2003
From: jandanz at bigfoot.com (James @bigfoot.com)
Date: Thu Aug  5 00:24:18 2004
Subject: XML::RSS
In-Reply-To: <20030424081521.5409EAE4F5@basilica.la.naos.co.nz>
References: <20030424081521.5409EAE4F5@basilica.la.naos.co.nz>
Message-ID: <1051180865.2041.47.camel@argon>

On Thu, 2003-04-24 at 20:15, Ewen McNeill wrote:
> NOTE: CC'd back to the Perl Mongers list 'cause I figure others might be
> interested.
> 
> In message <1051171567.2047.39.camel@argon>, James J Eaton & Annette Eaton writes:
> >On Thu, 2003-04-24 at 19:57, Ewen McNeill wrote:
> >> - upload the XML::RSS module files to sit in your cgi-bin directory, and
> >>   add the appropriate directory to your perl @INC path
> >
> >I like this option, but how can a user change the @INC path for their
> >hosting service? Is it easy to do, or is each site going to be different?
> 
> What I'd suggest doing is uploading it to a subdirectory of cgi-bin, and
> then having the script push an appropriate relative path onto the @INC
> path.
> 
> For instance, suppose that you create a "local-libs" directory under the
> cgi-bin directory, and have: 
> 
> ..../cgi-bin/local-libs/XML/RSS.pm

Wouldn't it be just as easy to put the XML::RSS module in the cgi-bin?
Wouldn't that be in the @INC path?

James
 
-- 
====================
James J. Eaton
jandanz@bigfoot.com
====================
==


From ewen at naos.co.nz  Thu Apr 24 05:56:09 2003
From: ewen at naos.co.nz (Ewen McNeill)
Date: Thu Aug  5 00:24:18 2004
Subject: XML::RSS 
In-Reply-To: Message from "James @bigfoot.com" <jandanz@bigfoot.com> 
   of "24 Apr 2003 22:41:04 +1200." <1051180865.2041.47.camel@argon> 
Message-ID: <20030424105609.C077FAE4F5@basilica.la.naos.co.nz>

In message <1051180865.2041.47.camel@argon>, "James @bigfoot.com" writes:
>On Thu, 2003-04-24 at 20:15, Ewen McNeill wrote:
>> For instance, suppose that you create a "local-libs" directory under the
>> cgi-bin directory, and have: 
>> 
>> ..../cgi-bin/local-libs/XML/RSS.pm
>
>Wouldn't it be just as easy to put the XML::RSS module in the cgi-bin?
>Wouldn't that be in the @INC path?

You can't just put RSS.pm in the cgi-bin directory, and expect to do
"use XML::RSS".  Even if the cgi-bin directory were in the @INC path
(which it might be either in the website's perl setup, or via "." being
in the @INC path, and it being the current directory at invocation
time), Perl is still going to want to see the directory structure.

You might be able to get away with:

..../cgi-bin/XML/RSS.pm

but personally I'd find that a bit messy if I had to deal with a bunch
of libraries "included" like that.
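
To make the directory-structure point concrete, here is a small sketch
of the lookup perl performs.  The helper names module_to_path and
find_module are made up for illustration; perl does this internally when
it sees "use" or "require".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn a module name into the relative file name perl searches for:
# every "::" becomes "/" and ".pm" is appended.
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;
    return "$path.pm";
}

# Return the first matching file under the directories in @INC,
# or nothing if the module cannot be found.
sub find_module {
    my ($module) = @_;
    my $rel = module_to_path($module);
    for my $dir (@INC) {
        next if ref $dir;                      # skip code-ref hooks
        return "$dir/$rel" if -f "$dir/$rel";
    }
    return;
}

print module_to_path("XML::RSS"), "\n";        # XML/RSS.pm
my $found = find_module("XML::RSS");
print defined $found ? "found at $found\n" : "XML::RSS is not in \@INC\n";
```

So a bare RSS.pm sitting directly in a directory on @INC would only
satisfy "use RSS", not "use XML::RSS".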

Ewen

From jandanz at bigfoot.com  Fri Apr 25 18:49:17 2003
From: jandanz at bigfoot.com (James @bigfoot.com)
Date: Thu Aug  5 00:24:18 2004
Subject: XML::RSS
In-Reply-To: <20030424105609.C077FAE4F5@basilica.la.naos.co.nz>
References: <20030424105609.C077FAE4F5@basilica.la.naos.co.nz>
Message-ID: <1051314557.1875.2.camel@argon>

I have just found in the Perl documentation a reference to 
use lib
<quote>
Probably the most convenient solution from your users' perspective is
for you to add a use lib pragma near the top of your script. That way
the users of the program don't need to take any special action to run
your program. Imagine a hypothetical project called Spectre whose
programs rely on its own set of libraries. Those programs could have a
statement like this at their start:

use lib "/projects/spectre/lib";
</quote>
I presume that this would be put into a Begin sub:
Begin (
     use lib "/projects/spectre/lib";
)
James
-- 
===================
James J. Eaton
james@eaton.net.nz
===================
-- 
====================
James J. Eaton
jandanz@bigfoot.com
====================
==


From michael at diaspora.gen.nz  Fri Apr 25 19:22:58 2003
From: michael at diaspora.gen.nz (michael@diaspora.gen.nz)
Date: Thu Aug  5 00:24:18 2004
Subject: XML::RSS  
In-Reply-To: Your message of "26 Apr 2003 11:49:17 +1200."
             <1051314557.1875.2.camel@argon> 
Message-ID: <E199DT1-00055i-00@israel.diaspora.gen.nz>

>use lib "/projects/spectre/lib";
></quote>
>I presume that this would be put into a Begin sub:
>Begin (
>     use lib "/projects/spectre/lib";
>)

Sort of.  "perldoc -u lib" reveals:

    The parameters to C<use lib> are added to the start of the perl search
    path. Saying

	use lib LIST;

    is I<almost> the same as saying

	BEGIN { unshift(@INC, LIST) }

"use" is processed at compile time, rather than at run time; obviously,
the compiler needs to know what extra semantics you're importing into
your namespace, so it can generate the appropriate code.

In your case, you want to alter @INC to include your cgi-bin directory,
so you can include your own copy of XML::RSS.[0]  So you want to put:

    use lib "path/to/your/cgi-bin";

at the top of your script, which is *almost* like saying[1]:

    BEGIN { unshift(@INC, "path/to/your/cgi-bin") }

Ewen's suggesting that if you want to use lots of local modules, you
reduce the mess by creating "path/to/your/cgi-bin/local-libs", and put
the modules in there; the "use lib" statement changes in a corresponding
manner.
    -- michael.

[0] However, if you want to then write "use XML::RSS", you'll need to
    put the RSS.pm file in path/to/your/cgi-bin/XML/, to preserve the
    convention that :: means "a directory".  This is documented under
    "perldoc -f require"; "::" gets replaced with "/".  

[1] Which in turn, if you're unfamiliar with unshift (perldoc -f
    unshift), means something like:

	BEGIN { @INC = ("path/to/your/cgi-bin", @INC) }
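
The compile-time versus run-time difference can be seen with a short
experiment.  This is only a sketch; the directory names
"compiletime-libs" and "runtime-libs" are made up and need not exist.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our %seen_at_compile_time;

push @INC, "runtime-libs";       # ordinary statement: executed at run time
use lib "compiletime-libs";      # executed as soon as it has been parsed

BEGIN {
    # Snapshot of @INC taken while the script is still being compiled.
    %seen_at_compile_time = map { $_ => 1 } grep { !ref($_) } @INC;
}

for my $dir ("compiletime-libs", "runtime-libs") {
    printf "%-16s in \@INC at compile time: %s\n",
        $dir, $seen_at_compile_time{$dir} ? "yes" : "no";
}
```

Even though the push appears first in the file, the BEGIN block sees
only the directory added by "use lib", which is why a plain push is too
late for any "use" statement that follows it.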

From ewen at naos.co.nz  Fri Apr 25 19:44:15 2003
From: ewen at naos.co.nz (Ewen McNeill)
Date: Thu Aug  5 00:24:18 2004
Subject: XML::RSS 
In-Reply-To: Message from "James @bigfoot.com" <jandanz@bigfoot.com> 
   of "26 Apr 2003 11:49:17 +1200." <1051314557.1875.2.camel@argon> 
Message-ID: <20030426004415.3781FAE4F5@basilica.la.naos.co.nz>

In message <1051314557.1875.2.camel@argon>, "James @bigfoot.com" writes:
>I have just found in the Perl documentation a reference to use lib

"use lib" is (loosely) equivalent to push @INC, "directory".  (Michael's
explanation of what it means is more detailed, and more correct.)

However "use lib" will not solve your problem by itself because:
(a) you cannot upload files outside the cgi-bin directory, and 
(b) you don't necessarily know where the cgi-bin directory is anyway

Hence my suggestion of placing a sub-directory under the cgi-bin
directory, and using a relative path to it.  As I said IIRC the cgi-bin
starts with the current directory being the cgi-bin directory from where
it is run.

However feel free to use "use lib" instead of "push @INC ..." if you're
using a new enough version of perl to support it.  (5.something IIRC.)
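
One possible way around point (b) that does not depend on the current
directory is the FindBin module that ships with perl.  The sketch below
is an alternative to the relative-path approach, not something tested on
any particular hosting service, and "local-libs" is the hypothetical
sub-directory from earlier in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use FindBin qw($Bin);            # $Bin: directory containing this script
use lib "$Bin/local-libs";       # sub-directory sitting next to the script

print "Content-type: text/plain\n\n";
print "script directory: $Bin\n";
print "library directory added to \@INC: $Bin/local-libs\n";
```

Because "use FindBin" is itself processed at compile time, $Bin is
already set by the time the "use lib" line is reached.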

Ewen

From jandanz at bigfoot.com  Mon Apr 28 02:31:12 2003
From: jandanz at bigfoot.com (James @bigfoot.com)
Date: Thu Aug  5 00:24:18 2004
Subject: Code Check
Message-ID: <1051515072.2033.16.camel@argon>

Being a novice at Perl and CGI programming, I wonder whether there is
someone out there who would volunteer to check my CGI program.

I have tested it locally and under BOA and Apache in controlled
conditions and it works.

I would appreciate constructive criticism. Just let me know your email
address and I will email it to you. It is not very long, only about 280
lines.

Thank you 
James 
-- 
====================
James J. Eaton
jandanz@bigfoot.com
====================
==


From douglas at katipo.co.nz  Mon Apr 28 03:16:37 2003
From: douglas at katipo.co.nz (Douglas Bagnall)
Date: Thu Aug  5 00:24:18 2004
Subject: Code Check
In-Reply-To: <1051515072.2033.16.camel@argon>
References: <1051515072.2033.16.camel@argon>
Message-ID: <3EACE365.5000602@katipo.co.nz>


hi James,

Post a URL that serves the source as text/plain, and let everyone pick 
at it.

cheers,

Douglas



From jandanz at bigfoot.com  Mon Apr 28 04:10:51 2003
From: jandanz at bigfoot.com (James J Eaton & Annette Eaton)
Date: Thu Aug  5 00:24:18 2004
Subject: Code Check
In-Reply-To: <3EACE365.5000602@katipo.co.nz>
References: <1051515072.2033.16.camel@argon> 
	<3EACE365.5000602@katipo.co.nz>
Message-ID: <1051521051.2033.30.camel@argon>

Ok here goes:

http://forsale.orcon.net.nz/gm-rss0.3.0.cgi

James
-- 
=======================
James & Annette Eaton
jandanz@bigfoot.com
=======================


From grant.mclean at bearingpoint.com  Mon Apr 28 20:09:58 2003
From: grant.mclean at bearingpoint.com (McLean, Grant)
Date: Thu Aug  5 00:24:18 2004
Subject: Code Check
Message-ID: <53B8C97B11002E49BB494F8610AE9538A16A39@kccxoex03.corp.kpmgconsulting.com>

James Eaton wrote
> http://forsale.orcon.net.nz/gm-rss0.3.0.cgi
> 
> > > Being a novice to Perl and CGI programming is there 
> > > someone out there who would volunteer to check out
> > > my CGI program.

I've just made a brief pass through it and it looks pretty 
good to me - especially for a 'novice'.  Here are a few
random thoughts that struck me as I read through.  As they
are all style related and style is a matter of personal
taste, I won't be in the least offended if you wish to
ignore or debate them.

You have this code to enable debugging:

  if (defined($ARGV[0])) {
    if ($ARGV[0] eq "-d") {
      print "DEBUG - SET<br>\n";
      $debug = $true;
    }
  }

which could be collapsed into:

  if(@ARGV  and  $ARGV[0] eq "-d") {
    print "DEBUG - SET<br>\n";
    $debug = $true;
  }

or even possibly:

  my $debug = (@ARGV and $ARGV[0] eq "-d") and print "DEBUG - SET<br>\n";

Presumably, $ARGV[0] will only be set when the script is
invoked from the command line.  An alternative approach
would be to always enable debugging when the script is run 
interactively (STDOUT is attached to a TTY):

  my $debug = (-t STDOUT);


You have a block of code that looks like this:

  my $dest_name = $prefs{"dest_name"};
  my $title = $prefs{"title"};
  my $description = $prefs{"description"};
  my $language = $prefs{"language"};
  my $copyright = $prefs{"copyright"};
  my $webmaster_name = $prefs{"nameofwebmaster"};

I find it easier to read this sort of stuff if the '='
symbols are vertically aligned.  Also, as you're aware, it's 
not (always) necessary to quote 'barewords' used as hash keys.
So you could rewrite that as:

  my $dest_name      = $prefs{dest_name};
  my $title          = $prefs{title};
  my $description    = $prefs{description};
  my $language       = $prefs{language};
  my $copyright      = $prefs{copyright};
  my $webmaster_name = $prefs{nameofwebmaster};

Having said that though, do you really need to copy the hash 
values out into individual variables anyway?  Why not just
change your code from this style:

  $rss->channel(
    title           => $title,
    'link'          => $link,
    description     => $description,
    image           => $image,

To say this:

  $rss->channel(
    title           => $prefs{title},
    link            => $link,
    description     => $prefs{description},
    image           => $prefs{image},

That would enable you to throw away a dozen lines of code.
(Throwing away code feels soooo good).
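
If some of the individual variables are still wanted, a hash slice is a
possible middle ground.  This is a sketch with made-up sample values in
%prefs; the key names are taken from the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample values standing in for the parsed config file.
my %prefs = (
    title       => "Example feed",
    description => "An illustrative description",
    language    => "en",
);

# A hash slice copies several values out in one statement ...
my ($title, $description, $language) = @prefs{qw(title description language)};

# ... and map can build a key/value list straight from the hash, which
# could then be handed to a method call such as $rss->channel(%channel_args).
my %channel_args = map { $_ => $prefs{$_} } qw(title description language);

print "$title [$language]: $description\n";
print "channel arguments: ", join(", ", sort keys %channel_args), "\n";
```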


Your config file parsing code is also re-inventing the wheel
somewhat.  I'll refrain from suggesting that you could do it
all in two lines of code with XML::Simple (oops it slipped
out) and suggest a few other worthy alternatives

  http://search.cpan.org/dist/Config-IniFiles/
  http://search.cpan.org/dist/Config-Properties/
  http://search.cpan.org/dist/YAML/


I noticed this use of a variable:

  my $i = 1;
  # 150 lines snipped
  until ($i > $numberofitems) {

You are effectively using $i as a global variable which
means that the place where it is declared and initialised
is a fair distance from where it is used (the same could
be said for a number of your other variables).  I would be
inclined to declare it and initialise it in the same
place.  Eg:

  for my $i (1..$numberofitems) {


I notice you're using the eof function to detect the end of
file.  This raised a red flag for me.  In Perl it is almost
never necessary to explicitly check for end of file and there
are some subtle gotchas that can make using it undesirable.  
I'd be inclined to change the whole loop to something like:

  my $i = 1;
  while(my $eline = <ENTRYLIST>) {
    last if($i++ > $numberofitems);

    # your stuff here
  }
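
A self-contained variant of the same idea, for trying out at the command
line.  It reads from an in-memory filehandle with made-up data instead
of the real entry list, and counts the items collected rather than using
a separate $i.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up data standing in for the entry list file.
my $data = join "", map("entry $_\n", 1 .. 5);
open my $entrylist, '<', \$data or die "cannot open in-memory file: $!";

my $numberofitems = 3;
my @items;

# The loop ends by itself when the file runs out; no eof test is needed.
while (my $eline = <$entrylist>) {
    last if @items >= $numberofitems;
    chomp $eline;
    push @items, $eline;
}
close $entrylist;

print scalar(@items), " items: @items\n";
```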


You have this snippet:

  READ:  while (<ITEMFILE>) {
    $eline = $_;
    if ($eline =~ /span class="rss:item"/) {
      $span_complete = $false;
      foreach $eline (<ITEMFILE>) {
        if ($eline =~ /\/span/) {
          $span_complete = $true;
          last READ;
        }

The first two lines could be combined:

  READ:  while ($eline = <ITEMFILE>) {

and reading from the same filehandle using both while
and foreach is a bit odd.  This line:

  foreach $eline (<ITEMFILE>) {

actually slurps in all the remaining lines from ITEMFILE,
but you then go on to ignore any lines which follow the 
closing </span>.  If you used a while loop rather than a 
foreach then you wouldn't waste cycles reading in those 
trailing lines.
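
The difference is easy to check with an in-memory filehandle.  In this
sketch the data and the helper name lines_left_after are made up; it
counts how many lines are still unread after each style of loop has
bailed out at the closing </span>.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $data = "a\n</span>\nb\nc\n";    # two lines follow the closing tag

sub lines_left_after {
    my ($style) = @_;
    open my $fh, '<', \$data or die "cannot open in-memory file: $!";
    if ($style eq 'foreach') {
        foreach my $line (<$fh>) {      # list context: reads every line up front
            last if $line =~ m{/span};
        }
    } else {
        while (my $line = <$fh>) {      # scalar context: one line per iteration
            last if $line =~ m{/span};
        }
    }
    my @rest = <$fh>;                   # whatever has not been read yet
    close $fh;
    return scalar @rest;
}

printf "foreach leaves %d unread lines, while leaves %d\n",
    lines_left_after('foreach'), lines_left_after('while');
```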


Some people would say that if you're going to parse HTML
then you should do it with a parser module like:

  http://search.cpan.org/dist/HTML-Parser/

I use regexes on HTML all the time and as long as you're
aware of the potential traps then it's a reasonable enough
approach (especially when you have to work with badly formed 
HTML).  One interesting alternative approach is to use a 
module like XML::LibXML (which can read HTML) to parse the 
HTML into a DOM tree then you can use XPath expressions to 
select and extract the data you're interested in.


Well, that's enough from me.  I'll be interested to see what 
others thought.

Regards
Grant


From michael at diaspora.gen.nz  Tue Apr 29 03:14:40 2003
From: michael at diaspora.gen.nz (michael@diaspora.gen.nz)
Date: Thu Aug  5 00:24:18 2004
Subject: Code Check  
In-Reply-To: Your message of "Tue, 29 Apr 2003 02:09:58 +0100."
             <53B8C97B11002E49BB494F8610AE9538A16A39@kccxoex03.corp.kpmgconsulting.com> 
Message-ID: <E19AQG8-0007si-00@israel.diaspora.gen.nz>

>I've just made a brief pass through it and it looks pretty 
>good to me - especially for a 'novice'.  Here are a few
>random thoughts that struck me as I read through.  As they
>are all style related and style is a matter of personal
>taste, I won't be in the least offended if you wish to
>ignore or debate them.

Please, take the following in the same spirit.  Particularly the good
for a "novice" bit; it's quite clear that you've programmed in other
languages before, and are just a Perl novice.

>Well, that's enough from me.  I'll be interested to see what 
>others thought.

I'll just focus on one bit, the loop that looks like:

    READ: while (<ITEMFILE>) {
	$eline = $_;
	if ($eline =~ /span class="rss:item"/) {
	    $span_complete = $false;
	    foreach $eline (<ITEMFILE>) {
		if ($eline =~ /\/span/) {
		    $span_complete = $true;
		    last READ;
		}		
		$eline =~ s/\&/\&amp;/g;
		$eline =~ s/\/span//g;
		$eline =~ s/<\/div>//g;
		$eline =~ s/</\&lt;/g;
		$eline =~ s/>/\&gt;/g;
		$eline =~ s/"/\&quot;/g;
		$eline =~ s/'/\&apos;/g;
		$itemline[$j] = $eline;
		$j++;
	    }		
	}
    }		

I'd be inclined to write that something like this:

    sub SPAN_NOT_SEEN { -1 }
    sub IN_SPAN       {  0 }
    sub SPAN_COMPLETE {  1 }

    ...

    $span_state = SPAN_NOT_SEEN;
    for (<ITEMFILE>) {
	if (/span class="rss:item"/ .. m!/span!) {
	    $span_state = IN_SPAN;
	    my $closing = m!/span!;	# test before the s!!! below removes it
	    s!</div>!!g;
	    s!/span!!g;
	    $_ = CGI::escapeHTML($_);
	    s!'!&apos;!g;		# after escaping, so its & is left alone
	    push @itemline, $_;
	    if ($closing) {
		$span_state = SPAN_COMPLETE; 
		last
	    }
	}
    }

    ... do something with $span_state

Several comments:

(1) I hate leaning toothpicks (the "s/\/span//g" in the original),
    and always prefer changing the delimiters.

(2) Whenever I see "$array[$counter] = $item; $counter++", I
    always replace that with push, as I can get rid of a variable
    that way, and that's Good.

(3) The flip-flop operator is a cool piece of Perl syntactic sugar;
    it's reasonably easy to understand what it does from looking at it,
    but if you haven't used sed, you won't have seen it before.

(4) I'd check CGI::escapeHTML to see if it also does quoting of
    &apos;, which it might well do -- then you could get rid of another
    line.

(5) $false and $true strike me as very wrong; either use 1 and 0,
    or at least something like:

	sub FALSE { 0 }
	sub TRUE  { 1 }

    and use case to distinguish them as globals.  You've actually got
    three cases here, rather than the implied boolean of "$span_complete",
    so I renamed it to "$span_state".
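
For anyone who has not met the flip-flop operator, here is a stand-alone
sketch to experiment with.  The input lines are made up, and it uses the
core "use constant" pragma in place of the hand-written constant subs,
which is a substitution on my part and not what the code above does.
Unlike the original loop it leaves out any escaping and simply collects
the lines between the two tags.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use constant { SPAN_NOT_SEEN => -1, IN_SPAN => 0, SPAN_COMPLETE => 1 };

# Made-up input standing in for the lines of ITEMFILE.
my @input = (
    qq{<p>before</p>\n},
    qq{<span class="rss:item">\n},
    qq{first line\n},
    qq{second line\n},
    qq{</span>\n},
    qq{<p>after</p>\n},
);

my $span_state = SPAN_NOT_SEEN;
my @itemline;
for my $line (@input) {
    # True from the line matching the left-hand pattern up to and
    # including the line matching the right-hand one, false elsewhere.
    if ($line =~ /span class="rss:item"/ .. $line =~ m{/span}) {
        if ($line =~ /span class="rss:item"/) {
            $span_state = IN_SPAN;          # opening line itself is skipped
            next;
        }
        if ($line =~ m{/span}) {
            $span_state = SPAN_COMPLETE;    # closing line ends the search
            last;
        }
        push @itemline, $line;
    }
}

print "state: $span_state, collected: ", scalar(@itemline), " lines\n";
```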