From jandanz at bigfoot.com Thu Apr 24 02:14:49 2003
From: jandanz at bigfoot.com (James @bigfoot.com)
Date: Thu Aug 5 00:24:18 2004
Subject: XML::RSS
Message-ID: <1051168498.2052.33.camel@argon>

I am writing a CGI script that will use XML::RSS. I realise that some
hosting services will either not have this installed, or not be willing
to install it. It has been suggested that I could include XML::RSS with
my code. Could someone please explain what is meant by this, and how I
would go about it?

--
====================
James J. Eaton
jandanz@bigfoot.com
====================
==

From ewen at naos.co.nz Thu Apr 24 02:57:59 2003
From: ewen at naos.co.nz (Ewen McNeill)
Date: Thu Aug 5 00:24:18 2004
Subject: XML::RSS
In-Reply-To: Message from "James @bigfoot.com"
    of "24 Apr 2003 19:14:49 +1200." <1051168498.2052.33.camel@argon>
Message-ID: <20030424075759.2CFE6AE4F5@basilica.la.naos.co.nz>

In message <1051168498.2052.33.camel@argon>, "James @bigfoot.com" writes:
>I am writing a CGI script that will use XML::RSS. I realise that some
>hosting services will either not have this installed, or not be willing
>to install it. It has been suggested that I could include XML::RSS with
>my code. Could someone please explain what is meant by this, and how I
>would go about it?

Without knowing what the person who suggested it was thinking, it seems
to me there are a couple of possibilities:

 - upload the XML::RSS module files to sit in your cgi-bin directory, and
   add the appropriate directory to your perl @INC path

 - textually include the contents of the XML::RSS module into your
   cgi-bin program, surrounded by appropriate package statements and the
   like to make perl happy.

Of these two, the first is going to be more maintainable and the second
is going to minimise the number of files you need to upload (and, eg,
might be useful if you have to send them the script for inclusion).

Both will work only if the code is "pure perl" and doesn't need anything
compiled/adapted/etc for the machine it's running on. With XML::RSS based
on what I guess it's doing, it's probably going to be okay. Beware,
however, of its dependencies -- it may well want an XML library which may
well not be installed...

Ewen

From ewen at naos.co.nz Thu Apr 24 03:15:21 2003
From: ewen at naos.co.nz (Ewen McNeill)
Date: Thu Aug 5 00:24:18 2004
Subject: XML::RSS
In-Reply-To: Message from James J Eaton & Annette Eaton
    of "24 Apr 2003 20:06:06 +1200." <1051171567.2047.39.camel@argon>
Message-ID: <20030424081521.5409EAE4F5@basilica.la.naos.co.nz>

NOTE: CC'd back to the Perl Mongers list 'cause I figure others might be
interested.

In message <1051171567.2047.39.camel@argon>, James J Eaton & Annette Eaton writes:
>On Thu, 2003-04-24 at 19:57, Ewen McNeill wrote:
>> - upload the XML::RSS module files to sit in your cgi-bin directory, and
>>   add the appropriate directory to your perl @INC path
>
>I like this option, but how can a user change the @INC path for their
>hosting service? Is it easy to do, or is each site going to be different?

What I'd suggest doing is uploading it to a subdirectory of cgi-bin, and
then having the script push an appropriate relative path onto the @INC
path.

For instance, suppose that you create a "local-libs" directory under the
cgi-bin directory, and have:

    ..../cgi-bin/local-libs/XML/RSS.pm

Then your cgi-bin script could contain:

    push @INC, "local-libs";

and it should be fine.

This is all off the top of my head, but I think it should work give or
take:

 - the @INC modifications might need to be done in a BEGIN { } block
   to ensure they happen early enough;

 - you might need to fiddle with the exact path to get the one that
   corresponds with the CGI spec. IIRC cgi-bin programs are supposed
   to be run with the current directory of their binary directory, but
   I may be misremembering, as it's ages since I've written/installed
   a cgi-bin script that cared.
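
If the current directory turns out not to be dependable, the two caveats
above could be handled together along the following lines. This is only a
sketch: it assumes the core FindBin module is acceptable on the host, it
reuses the "local-libs" directory name from the example above, and it has
not been tried on any particular hosting service.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# FindBin is a core module; $Bin is the directory containing the
# running script, whatever the current directory happens to be.
use FindBin qw($Bin);

# "use lib" takes effect at compile time, so the extra directory is
# already at the front of @INC when later "use" lines are processed.
use lib "$Bin/local-libs";

# With .../cgi-bin/local-libs/XML/RSS.pm uploaded, this line could
# then be uncommented:
# use XML::RSS;

print "First place perl will look: $INC[0]\n";
```

Because both "use" lines run during compilation, no explicit BEGIN block
is needed in this form.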

Ewen

From jandanz at bigfoot.com Thu Apr 24 05:41:04 2003
From: jandanz at bigfoot.com (James @bigfoot.com)
Date: Thu Aug 5 00:24:18 2004
Subject: XML::RSS
In-Reply-To: <20030424081521.5409EAE4F5@basilica.la.naos.co.nz>
References: <20030424081521.5409EAE4F5@basilica.la.naos.co.nz>
Message-ID: <1051180865.2041.47.camel@argon>

On Thu, 2003-04-24 at 20:15, Ewen McNeill wrote:
> NOTE: CC'd back to the Perl Mongers list 'cause I figure others might be
> interested.
>
> In message <1051171567.2047.39.camel@argon>, James J Eaton & Annette Eaton writes:
> >On Thu, 2003-04-24 at 19:57, Ewen McNeill wrote:
> >> - upload the XML::RSS module files to sit in your cgi-bin directory, and
> >>   add the appropriate directory to your perl @INC path
> >
> >I like this option, but how can a user change the @INC path for their
> >hosting service? Is it easy to do, or is each site going to be different?
>
> What I'd suggest doing is uploading it to a subdirectory of cgi-bin, and
> then having the script push an appropriate relative path onto the @INC
> path.
>
> For instance, suppose that you create a "local-libs" directory under the
> cgi-bin directory, and have:
>
> ..../cgi-bin/local-libs/XML/RSS.pm

Wouldn't it be just as easy to put the XML::RSS module in the cgi-bin?
Wouldn't that be in the @INC path?

James
--
====================
James J. Eaton
jandanz@bigfoot.com
====================
==

From ewen at naos.co.nz Thu Apr 24 05:56:09 2003
From: ewen at naos.co.nz (Ewen McNeill)
Date: Thu Aug 5 00:24:18 2004
Subject: XML::RSS
In-Reply-To: Message from "James @bigfoot.com"
    of "24 Apr 2003 22:41:04 +1200." <1051180865.2041.47.camel@argon>
Message-ID: <20030424105609.C077FAE4F5@basilica.la.naos.co.nz>

In message <1051180865.2041.47.camel@argon>, "James @bigfoot.com" writes:
>On Thu, 2003-04-24 at 20:15, Ewen McNeill wrote:
>> For instance, suppose that you create a "local-libs" directory under the
>> cgi-bin directory, and have:
>>
>> ..../cgi-bin/local-libs/XML/RSS.pm
>
>Wouldn't it be just as easy to put the XML::RSS module in the cgi-bin?
>Wouldn't that be in the @INC path?

You can't just put RSS.pm in the cgi-bin directory, and expect to do
"use XML::RSS". Even if the cgi-bin directory were in the @INC path
(which it might be either in the website's perl setup, or via "." being
in the @INC path, and it being the current directory at invocation
time), Perl is still going to want to see the directory structure.

You might be able to get away with:

    ..../cgi-bin/XML/RSS.pm

but personally I'd find that a bit messy if I had to deal with a bunch
of libraries "included" like that.

Ewen

From jandanz at bigfoot.com Fri Apr 25 18:49:17 2003
From: jandanz at bigfoot.com (James @bigfoot.com)
Date: Thu Aug 5 00:24:18 2004
Subject: XML::RSS
In-Reply-To: <20030424105609.C077FAE4F5@basilica.la.naos.co.nz>
References: <20030424105609.C077FAE4F5@basilica.la.naos.co.nz>
Message-ID: <1051314557.1875.2.camel@argon>

I have just found in the Perl documentation a reference to use lib:

Probably the most convenient solution from your users' perspective is
for you to add a use lib pragma near the top of your script. That way
the users of the program don't need to take any special action to run
your program. Imagine a hypothetical project called Spectre whose
programs rely on its own set of libraries.
Those programs could have a statement like this at their start:

    use lib "/projects/spectre/lib";

I presume that this would be put into a Begin sub:

    Begin (
        use lib "/projects/spectre/lib";
    )

James

On Thu, 2003-04-24 at 22:56, Ewen McNeill wrote:
> In message <1051180865.2041.47.camel@argon>, "James @bigfoot.com" writes:
> >On Thu, 2003-04-24 at 20:15, Ewen McNeill wrote:
> >> For instance, suppose that you create a "local-libs" directory under the
> >> cgi-bin directory, and have:
> >>
> >> ..../cgi-bin/local-libs/XML/RSS.pm
> >
> >Wouldn't it be just as easy to put the XML::RSS module in the cgi-bin?
> >Wouldn't that be in the @INC path?
>
> You can't just put RSS.pm in the cgi-bin directory, and expect to do
> "use XML::RSS". Even if the cgi-bin directory were in the @INC path
> (which it might be either in the website's perl setup, or via "." being
> in the @INC path, and it being the current directory at invocation
> time), Perl is still going to want to see the directory structure.
>
> You might be able to get away with:
>
> ..../cgi-bin/XML/RSS.pm
>
> but personally I'd find that a bit messy if I had to deal with a bunch
> of libraries "included" like that.
>
> Ewen
>
--
===================
James J. Eaton
james@eaton.net.nz
===================
--
====================
James J. Eaton
jandanz@bigfoot.com
====================
==

From michael at diaspora.gen.nz Fri Apr 25 19:22:58 2003
From: michael at diaspora.gen.nz (michael@diaspora.gen.nz)
Date: Thu Aug 5 00:24:18 2004
Subject: XML::RSS
In-Reply-To: Your message of "26 Apr 2003 11:49:17 +1200."
    <1051314557.1875.2.camel@argon>
Message-ID:

>use lib "/projects/spectre/lib";
>
>I presume that this would be put into a Begin sub:
>Begin (
> use lib "/projects/spectre/lib";
>)

Sort of. "perldoc -u lib" reveals:

    The parameters to C<use lib> are added to the start of the perl
    search path.
    Saying

        use lib LIST;

    is I<almost> the same as saying

        BEGIN { unshift(@INC, LIST) }

"use" is processed at compile time, rather than at run time; obviously,
the compiler needs to know what extra semantics you're importing into
your namespace, so it can generate the appropriate code.

In your case, you want to alter @INC to include your cgi-bin directory,
so you can include your own copy of XML::RSS.[0] So you want to put:

    use lib "path/to/your/cgi-bin";

at the top of your script, which is *almost* like saying[1]:

    BEGIN { unshift(@INC, "path/to/your/cgi-bin") }

Ewen's suggesting that if you want to use lots of local modules, you
reduce the mess by creating "path/to/your/cgi-bin/local-libs", and put
the modules in there; the "use lib" statement changes in a corresponding
manner.

    -- michael.

[0] However, if you want to then write "use XML::RSS", you'll need to
    put the RSS.pm file in path/to/your/cgi-bin/XML/, to preserve the
    convention that :: means "a directory". This is documented under
    "perldoc -f require"; "::" gets replaced with "/".

[1] Which in turn, if you're unfamiliar with unshift (perldoc -f
    unshift), means something like:

        BEGIN { @INC = ("path/to/your/cgi-bin", @INC) }

From ewen at naos.co.nz Fri Apr 25 19:44:15 2003
From: ewen at naos.co.nz (Ewen McNeill)
Date: Thu Aug 5 00:24:18 2004
Subject: XML::RSS
In-Reply-To: Message from "James @bigfoot.com"
    of "26 Apr 2003 11:49:17 +1200." <1051314557.1875.2.camel@argon>
Message-ID: <20030426004415.3781FAE4F5@basilica.la.naos.co.nz>

In message <1051314557.1875.2.camel@argon>, "James @bigfoot.com" writes:
>I have just found in the Perl documentation a reference to use lib

"use lib" is (loosely) equivalent to push @INC, "directory".
(Michael's explanation of what it means is more detailed, and more
correct.)
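
The compile-time versus run-time distinction can be seen in a small
script. The following is a sketch for illustration only: the @order array
and the "local-libs" path are made up for the demonstration, and nothing
needs to exist at that path for it to run.

```perl
use strict;
use warnings;

our @order;    # package variable, visible inside BEGIN blocks

# An ordinary statement: runs only after the whole file has compiled.
push @order, "run time";

# BEGIN blocks run as soon as the compiler has finished parsing them.
BEGIN { push @order, "BEGIN block" }

# "use lib" also acts during compilation and prepends to @INC.
use lib "local-libs";
BEGIN { push @order, "use lib done, \$INC[0] is $INC[0]" }

# The run-time push is listed last although it appears first in the file.
print "$_\n" for @order;
```

The same ordering is why a plain "push @INC" placed after a "use
XML::RSS" line would come too late to help that "use".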

However "use lib" will not solve your problem by itself because:

(a) you cannot upload files outside the cgi-bin directory, and

(b) you don't necessarily know where the cgi-bin directory is anyway

Hence my suggestion of placing a sub-directory under the cgi-bin
directory, and using a relative path to it. As I said IIRC the cgi-bin
starts with the current directory being the cgi-bin directory from where
it is run.

However feel free to use "use lib" instead of "push @INC ..." if you're
using a new enough version of perl to support it. (5.something IIRC.)

Ewen

From jandanz at bigfoot.com Mon Apr 28 02:31:12 2003
From: jandanz at bigfoot.com (James @bigfoot.com)
Date: Thu Aug 5 00:24:18 2004
Subject: Code Check
Message-ID: <1051515072.2033.16.camel@argon>

Being a novice to Perl and CGI programming is there someone out there
who would volunteer to check out my CGI program.

I have tested it locally and under BOA and Apache in controlled
condition and it works.

I would appreciate constructive criticism. Just let me know your email
address and I will email it to you. It is not very long only about 280
lines.

Thank you
James
--
====================
James J. Eaton
jandanz@bigfoot.com
====================
==

From douglas at katipo.co.nz Mon Apr 28 03:16:37 2003
From: douglas at katipo.co.nz (Douglas Bagnall)
Date: Thu Aug 5 00:24:18 2004
Subject: Code Check
In-Reply-To: <1051515072.2033.16.camel@argon>
References: <1051515072.2033.16.camel@argon>
Message-ID: <3EACE365.5000602@katipo.co.nz>

hi James,

Post an url that serves the source as text/plain, and let everyone pick
at it.

cheers,

Douglas

James @bigfoot.com wrote:
> Being a novice to Perl and CGI programming is there someone out there
> who would volunteer to check out my CGI program.
>
> I have tested it locally and under BOA and Apache in controlled
> condition and it works.
>
> I would appreciate constructive criticism. Just let me know your email
> address and I will email it to you. It is not very long only about 280
> lines.
>
> Thank you
> James

From jandanz at bigfoot.com Mon Apr 28 04:10:51 2003
From: jandanz at bigfoot.com (James J Eaton & Annette Eaton)
Date: Thu Aug 5 00:24:18 2004
Subject: Code Check
In-Reply-To: <3EACE365.5000602@katipo.co.nz>
References: <1051515072.2033.16.camel@argon> <3EACE365.5000602@katipo.co.nz>
Message-ID: <1051521051.2033.30.camel@argon>

Ok here goes:

http://forsale.orcon.net.nz/gm-rss0.3.0.cgi

James

On Mon, 2003-04-28 at 20:16, Douglas Bagnall wrote:
>
> hi James,
>
> Post an url that serves the source as text/plain, and let everyone pick
> at it.
>
> cheers,
>
> Douglas
>
> James @bigfoot.com wrote:
> > Being a novice to Perl and CGI programming is there someone out there
> > who would volunteer to check out my CGI program.
> >
> > I have tested it locally and under BOA and Apache in controlled
> > condition and it works.
> >
> > I would appreciate constructive criticism. Just let me know your email
> > address and I will email it to you. It is not very long only about 280
> > lines.
> >
> > Thank you
> > James
> >
--
=======================
James & Annette Eaton
jandanz@bigfoot.com
=======================

From grant.mclean at bearingpoint.com Mon Apr 28 20:09:58 2003
From: grant.mclean at bearingpoint.com (McLean, Grant)
Date: Thu Aug 5 00:24:18 2004
Subject: Code Check
Message-ID: <53B8C97B11002E49BB494F8610AE9538A16A39@kccxoex03.corp.kpmgconsulting.com>

James Eaton wrote
> http://forsale.orcon.net.nz/gm-rss0.3.0.cgi
>
> > > Being a novice to Perl and CGI programming is there
> > > someone out there who would volunteer to check out
> > > my CGI program.

I've just made a brief pass through it and it looks pretty good to me -
especially for a 'novice'. Here are a few random thoughts that struck me
as I read through. As they are all style related and style is a matter
of personal taste, I won't be in the least offended if you wish to
ignore or debate them.

You have this code to enable debugging:

  if (defined($ARGV[0])) {
    if ($ARGV[0] eq "-d") {
      print "DEBUG - SET\n";
      $debug = $true;
    }
  }

which could be collapsed into:

  if(@ARGV and $ARGV[0] eq "-d") {
    print "DEBUG - SET\n";
    $debug = $true;
  }

or even possibly:

  my $debug = (@ARGV and $ARGV[0] eq "-d")
    and print "DEBUG - SET\n";

Presumably, $ARGV[0] will only be set when the script is invoked from
the command line. An alternative approach would be to always enable
debugging when the script is run interactively (STDOUT is attached to a
TTY):

  my $debug = (-t STDOUT);

You have a block of code that looks like this:

  my $dest_name = $prefs{"dest_name"};
  my $title = $prefs{"title"};
  my $description = $prefs{"description"};
  my $language = $prefs{"language"};
  my $copyright = $prefs{"copyright"};
  my $webmaster_name = $prefs{"nameofwebmaster"};

I find it easier to read this sort of stuff if the '=' symbols are
vertically aligned. Also, as you're aware, it's not (always) necessary
to quote 'barewords' used as hash keys. So you could rewrite that as:

  my $dest_name      = $prefs{dest_name};
  my $title          = $prefs{title};
  my $description    = $prefs{description};
  my $language       = $prefs{language};
  my $copyright      = $prefs{copyright};
  my $webmaster_name = $prefs{nameofwebmaster};

Having said that though, do you really need to copy the hash values out
into individual variables anyway? Why not just change your code from
this style:

  $rss->channel(
    title       => $title,
    'link'      => $link,
    description => $description,
    image       => $image,

To say this:

  $rss->channel(
    title       => $prefs{title},
    link        => $link,
    description => $prefs{description},
    image       => $prefs{image},

That would enable you to throw away a dozen lines of code. (Throwing
away code feels soooo good).

Your config file parsing code is also re-inventing the wheel somewhat.
I'll refrain from suggesting that you could do it all in two lines of
code with XML::Simple (oops it slipped out) and suggest a few other
worthy alternatives:

  http://search.cpan.org/dist/Config-IniFiles/
  http://search.cpan.org/dist/Config-Properties/
  http://search.cpan.org/dist/YAML/

I noticed this use of a variable:

  my $i = 1;

  # 150 lines snipped

  until ($i > $numberofitems) {

You are effectively using $i as a global variable which means that the
place where it is declared and initialised is a fair distance from where
it is used (the same could be said for a number of your other
variables). I would be inclined to declare it and initialise it in the
same place. Eg:

  for my $i (1..$numberofitems) {

I notice you're using the eof function to detect the end of file. This
raised a red flag for me. In Perl it is almost never necessary to
explicitly check for end of file and there are some subtle gotchas that
can make using it undesirable. I'd be inclined to change the whole loop
to something like:

  my $i = 1;
  while(my $eline = ) {
    last if($i++ > $numberofitems);
    # your stuff here
  }

You have this snippet:

  READ: while (<ITEMFILE>) {
    $eline = $_;
    if ($eline =~ /span class="rss:item"/) {
      $span_complete = $false;
      foreach $eline (<ITEMFILE>) {
        if ($eline =~ /\/span/) {
          $span_complete = $true;
          last READ;
        }

The first two lines could be combined:

  READ: while ($eline = <ITEMFILE>) {

and reading from the same filehandle using both while and foreach is a
bit odd. This line:

  foreach $eline (<ITEMFILE>) {

actually slurps in all the remaining lines from ITEMFILE, but you then
go on to ignore any lines which follow the closing </span>. If you used
a while loop rather than a foreach then you wouldn't waste cycles
reading in those trailing lines.
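
To make the slurping point concrete, here is a sketch of the while-only
form. It is an illustration under assumptions, not a drop-in replacement:
the sample text, the in-memory filehandle $fh and the $in_span flag are
all made up for the example, and the real script reads from ITEMFILE.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the real item file.
my $html = <<'END_HTML';
intro line
<span class="rss:item">
First & second
</span>
trailing line
END_HTML

# An in-memory filehandle keeps the sketch self-contained.
open my $fh, '<', \$html or die "Cannot open in-memory file: $!";

my @itemline;
my $in_span = 0;
while (my $eline = <$fh>) {    # one line per iteration, ends at EOF
    chomp $eline;
    if (!$in_span) {
        $in_span = 1 if $eline =~ /span class="rss:item"/;
        next;
    }
    last if $eline =~ m{/span};    # stop here; later lines stay unread
    push @itemline, $eline;
}

# The line after the closing tag is still waiting on the filehandle.
my $unread = <$fh>;
close $fh;

print "collected: @itemline\n";
print "still unread: $unread";
```

Since the loop leaves as soon as the closing tag is seen, nothing after
it is pulled into memory, which is the saving described above.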

Some people would say that if you're going to parse HTML then you should
do it with a parser module like:

  http://search.cpan.org/dist/HTML-Parser/

I use regexes on HTML all the time and as long as you're aware of the
potential traps then it's a reasonable enough approach (especially when
you have to work with badly formed HTML).

One interesting alternative approach is to use a module like XML::LibXML
(which can read HTML) to parse the HTML into a DOM tree then you can use
XPath expressions to select and extract the data you're interested in.

Well, that's enough from me. I'll be interested to see what others
thought.

Regards
Grant

From michael at diaspora.gen.nz Tue Apr 29 03:14:40 2003
From: michael at diaspora.gen.nz (michael@diaspora.gen.nz)
Date: Thu Aug 5 00:24:18 2004
Subject: Code Check
In-Reply-To: Your message of "Tue, 29 Apr 2003 02:09:58 +0100."
    <53B8C97B11002E49BB494F8610AE9538A16A39@kccxoex03.corp.kpmgconsulting.com>
Message-ID:

>I've just made a brief pass through it and it looks pretty
>good to me - especially for a 'novice'. Here are a few
>random thoughts that struck me as I read through. As they
>are all style related and style is a matter of personal
>taste, I won't be in the least offended if you wish to
>ignore or debate them.

Please, take the following in the same spirit. Particularly the good
for a "novice" bit; it's quite clear that you've programmed in other
languages before, and are just a Perl novice.

>Well, that's enough from me. I'll be interested to see what
>others thought.

I'll just focus on one bit, the loop that looks like:

    READ: while (<ITEMFILE>) {
        $eline = $_;
        if ($eline =~ /span class="rss:item"/) {
            $span_complete = $false;
            foreach $eline (<ITEMFILE>) {
                if ($eline =~ /\/span/) {
                    $span_complete = $true;
                    last READ;
                }
                $eline =~ s/\&/\&amp;/g;
                $eline =~ s/\/span//g;
                $eline =~ s/<\/div>//g;
                $eline =~ s/</\&lt;/g;
                $eline =~ s/>/\&gt;/g;
                $eline =~ s/"/\&quot;/g;
                $eline =~ s/'/\'/g;
                $itemline[$j] = $eline;
                $j++;
            }
        }
    }

I'd be inclined to write that something like this:

    sub SPAN_NOT_SEEN { -1 }
    sub IN_SPAN { 0 }
    sub SPAN_COMPLETE { 1 }

    ...

    $span_state = SPAN_NOT_SEEN;
    for (<ITEMFILE>) {
        if (/span class="rss:item"/ .. m!/span!) {
            $span_state = IN_SPAN;
            s!!!g;
            s!/span!!g;
            s!'!'!!g;
            push @itemline, CGI::escapeHTML($_);
            if (m!/span!) { $span_state = SPAN_COMPLETE; last }
        }
    }

    ... do something with $span_state

Several comments:

(1) I hate leaning toothpicks (the "s/\/span//g" in the original), and
    always prefer changing the delimiters.

(2) Whenever I see "$array[$counter] = $item; $counter++", I always
    replace that with push, as I can get rid of a variable that way,
    and that's Good.

(3) The flip-flop operator is a cool piece of Perl syntactic sugar;
    it's reasonably easy to understand what it does from looking at it,
    but if you haven't used sed, you won't have seen it before.

(4) I'd check CGI::escapeHTML to see if it also does quoting of ',
    which it might well do -- then you could get rid of another line.

(5) $false and $true strike me as very wrong; either use 1 and 0, or at
    least something like:

        sub FALSE { 0 }
        sub TRUE { 1 }

    and use case to distinguish them as globals. You've actually got
    three cases here, rather than the implied boolean of
    "$span_complete", so I renamed it to "$span_state".
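
For anyone who has not met the flip-flop before, points (2) and (3) can
be tried out in isolation. This is a sketch only: the sample lines are
invented, and escape_html is a hypothetical stand-in written out by hand
so the example has no dependency on the CGI module; it is not claimed to
match what CGI::escapeHTML does in every detail.

```perl
use strict;
use warnings;

# Hypothetical helper: escapes the four characters that matter most.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must come first, or later entities get re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Invented sample lines standing in for the contents of the item file.
my @lines = (
    'before',
    '<span class="rss:item">',
    'Fish & chips',
    '</span>',
    'after',
);

my @itemline;
for (@lines) {
    # In scalar context ".." is the flip-flop: false until the left
    # pattern matches, then true up to and including the line on which
    # the right pattern matches.
    if (/span class="rss:item"/ .. m!/span!) {
        next if /span/;    # skip the opening and closing marker lines
        push @itemline, escape_html($_);    # push replaces the $j counter
    }
}

print "$_\n" for @itemline;
```

Only the line between the two markers is kept, and no index variable is
needed to append it.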