From andrew at sweger.net Wed Dec 2 17:51:32 2009
From: andrew at sweger.net (Andrew Sweger)
Date: Wed, 2 Dec 2009 17:51:32 -0800 (PST)
Subject: SPUG: 45% off Ebook Purchases from O'Reilly
Message-ID:

It's not real clear to me what "limited time" means. But here's a chance
to get O'Reilly ebooks at 45% off. The 35% off print books stands, of
course.

---------------------------------------------------------------------

View this information as HTML in your browser, click here:
http://post.oreilly.com/rd/9z1ziaieh1am23jhkerev8rm3dvk18clp58dsoib4q0

Special offer for O'Reilly User Group program members:

Along with your 35% discount off print books, you can now get *45% off
all ebooks* you purchase direct from oreilly.com for a limited time.

When you buy an O'Reilly ebook you get lifetime access to the book, and
whenever possible we make it available to you in four, DRM-free file
formats--PDF, .epub, Kindle-compatible .mobi, and Android ebook--that
you can use on the devices of your choice. Our ebook files are fully
searchable, and you can cut-and-paste and print them. We also alert you
when we've updated the files with corrections and additions.

Just use code DSUG when ordering online at
http://post.oreilly.com/rd/9z1zmt80e0s79hh3ekkun36arb6750099mib373odl0

Read more about our ebook formats and the ways to use them here:
http://post.oreilly.com/rd/9z1zpimij73sgdkvqfk0ma89q7l9i3i9cb53nddjk80

From skendric at fhcrc.org Fri Dec 4 05:07:31 2009
From: skendric at fhcrc.org (Stuart Kendrick)
Date: Fri, 04 Dec 2009 05:07:31 -0800
Subject: SPUG: Security Seminar / December 10 / FHCRC
Message-ID: <4B190993.8070701@fhcrc.org>

Hi folks,

We're hosting Garth Brown of Semaphore and Matt Sommer of Google for a
public seminar on the Morphing Landscape of IT Security.

Matt is lead security analyst for Google -- spends his days managing
penetration testing and security assessments.
A couple of smart guys, long-time buds, wanting to contribute some of
what they've been learning to the community.

https://vishnu.fhcrc.org/security-seminar/IT-Security-Landscape-Morphs.pdf

Forward as widely as you see fit.

--sk

Stuart Kendrick
Fred Hutchinson Cancer Research Center
Seattle, WA USA

From cmeyer at helvella.org Tue Dec 8 10:15:56 2009
From: cmeyer at helvella.org (Colin Meyer)
Date: Tue, 8 Dec 2009 10:15:56 -0800
Subject: SPUG: XPath on (less-than-perfect) HTML
In-Reply-To: <1258514921.7075.19.camel@norseth>
References: <05999990-5A43-48C4-8AB4-FB84859EFE99@att.net>
 <1258514921.7075.19.camel@norseth>
Message-ID: <20091208181555.GA15794@infula.marketoutsider.com>

Just came across this blog post on xpath webscraping (via perlbuzz):

http://ssscripting.blogspot.com/2009/12/using-perl-to-scrape-web.html

It agrees with C.J.'s suggestion of using HTML::TreeBuilder::XPath

-Colin.

On Tue, Nov 17, 2009 at 07:28:41PM -0800, C.J. Adams-Collier wrote:
> HTML::TreeBuilder::XPath
>
> On Tue, 2009-11-17 at 13:33 -0800, Michael R. Wolf wrote:
>
> > Yes, I know that XPath can only be applied to well-formed XML.
> >
> > That's the theoretical, pure, absolute truth.
> >
> > I'm working in the real world where I can't find a well-formed page.
> > (For instance, http://validator.w3c.org does not validate such biggies
> > as amazon.com, ask.com, google.com, or msn.com). For (my) practical
> > purposes, there are no valid pages.

From twists at gmail.com Tue Dec 8 11:41:38 2009
From: twists at gmail.com (Joshua ben Jore)
Date: Tue, 8 Dec 2009 11:41:38 -0800
Subject: SPUG: When is a caret just a caret? And what about dollar?
In-Reply-To:
References: <3E1FFC9E-DA66-4018-89A5-A47E020C4F2A@att.net>
 <53473.97.113.85.41.1255488635.squirrel@webmail.efn.org>
Message-ID:

On Wed, Oct 14, 2009 at 6:58 AM, Michael R. Wolf wrote:
>
> On Oct 13, 2009, at 7:50 PM, Yitzchak Scott-Thoennes wrote:
>
>> You left out $foo[EXPR] and $foo{EXPR}, which may interpolate $foo
>> or may interpolate a hash or array element, depending on perl's guess.
>
> Guess? You mean it's non-deterministic? Oh, the horror! :-)

It's deterministic but probably difficult to prove correct. The code
is S_intuit_more in toke.c:

 * Returns TRUE if there's more to the expression (e.g., a subscript),
 * FALSE otherwise. It deals with "$foo[3]" and /$foo[3]/ and
 * /$foo[0123456789$]+/
 *
 * ->[ and ->{ return TRUE
 * { and [ outside a pattern are always subscripts, so return TRUE
 * if we're outside a pattern and it's not { or [, then return FALSE
 * if we're in a pattern and the first char is a {
 *   {4,5} (any digits around the comma) returns FALSE
 * if we're in a pattern and the first char is a [
 *   [] returns FALSE
 *   [SOMETHING] has a funky algorithm to decide whether it's a
 *      character class or not. It has to deal with things like
 *      /$foo[-3]/ and /$foo[$bar]/ as well as /$foo[$\d]+
 * anything else returns TRUE

/* This is the one truly awful dwimmer necessary to conflate C and sed. */

Josh

From MichaelRWolf at att.net Tue Dec 8 17:43:36 2009
From: MichaelRWolf at att.net (Michael R. Wolf)
Date: Tue, 8 Dec 2009 17:43:36 -0800
Subject: SPUG: XPath on (less-than-perfect) HTML
In-Reply-To: <20091208181555.GA15794@infula.marketoutsider.com>
References: <05999990-5A43-48C4-8AB4-FB84859EFE99@att.net>
 <1258514921.7075.19.camel@norseth>
 <20091208181555.GA15794@infula.marketoutsider.com>
Message-ID:

On Dec 8, 2009, at 10:15 AM, Colin Meyer wrote:

> Just came across this blog post on xpath webscraping (via perlbuzz):
>
> http://ssscripting.blogspot.com/2009/12/using-perl-to-scrape-web.html
>
> It agrees with C.J.'s suggestion of using HTML::TreeBuilder::XPath

Colin,

Thanks.
C.J.,

Thanks again

All,

In essence, this article gets a nodeset via

    use WWW::Mechanize;
    use HTML::TreeBuilder::XPath;

    $agent = WWW::Mechanize->new();
    $agent->get($url);
    $content = $agent->content();

    @nodes = HTML::TreeBuilder::XPath->new()
                 ->parse($content)
                 ->findnodes($xpath);

    $text = join '', map { $_->content()->[0] } @nodes;

I'm getting similar results via

    use XML::LibXML;

    $content = DITTO;    # fetched exactly as above

    my %parse_options = (suppress_errors => 1, recover => 1);
    @nodes = XML::LibXML->new(\%parse_options)
                 ->parse_html_string($content)
                 ->findnodes($xpath);

    $text = join '', map { $_->textContent() } @nodes;

So, I asked myself, "Self, what's the difference between starting with
XML::LibXML and starting with HTML::TreeBuilder if I get to pass an
XPath off to a findnodes() method in either case?". In chasing the
provenance, I found that they're both maintained by Michel Rodriguez,
and have almost identical MANIFEST files. (They're identical in the
names (but not the contents) of the lib/(XML|Tree)/*.pm files and
differ in the names and number of the t/*.t files.)

A high-level code review suggested that the lib/*.pm files were mostly
copy/paste-identical.

Here's the best (high-level) contrast I could find in the documentation.

From the XML::XPathEngine POD:

    SEE ALSO
        Tree::XPathEngine for a similar module for non-XML trees.

Although XML brings to mind 'well-formed' whereas HTML (not XHTML)
does not, I guess I'm fortunate to be able to use XPath in the XML-ish
packages by using the qw(suppress_errors recover) options to the
parser to handle my HTML. (Aside. This was the answer to my earlier
posting on how to get over the non-well-formed issue.) I guess I
started with XML::LibXML because I didn't think that XPath would be
applicable to non-XML (i.e. HTML). It appears that findnodes($xpath)
works for a $tree (or $doc or $dom) parsed from either package.
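For anyone wanting to try the comparison, here is a minimal side-by-side
sketch of the two routes described above. It assumes both
HTML::TreeBuilder::XPath and a reasonably recent XML::LibXML (one that
provides load_html) are installed. The sample markup, the //li query and
the two helper sub names are made up for illustration; none of them come
from the article or from the code quoted in this thread.

```perl
use strict;
use warnings;

# Deliberately sloppy HTML: neither <li> is closed.
my $html  = '<html><body><ul><li>one<li>two</ul></body></html>';
my $xpath = '//li';

# Route 1: HTML::TreeBuilder::XPath, which expects tag soup by default.
sub texts_via_treebuilder {
    my ($html, $xpath) = @_;
    require HTML::TreeBuilder::XPath;
    my $tree  = HTML::TreeBuilder::XPath->new_from_content($html);
    my @texts = map { $_->as_text } $tree->findnodes($xpath);
    $tree->delete;    # HTML::Element trees should be freed explicitly
    return @texts;
}

# Route 2: XML::LibXML's HTML parser, told explicitly to recover quietly.
sub texts_via_libxml {
    my ($html, $xpath) = @_;
    require XML::LibXML;
    my $doc = XML::LibXML->load_html(
        string          => $html,
        recover         => 1,
        suppress_errors => 1,
    );
    return map { $_->textContent } $doc->findnodes($xpath);
}

# Run whichever routes are available; a missing module is reported, not fatal.
for my $route ([treebuilder => \&texts_via_treebuilder],
               [libxml      => \&texts_via_libxml]) {
    my ($name, $code) = @$route;
    my @texts = eval { $code->($html, $xpath) };
    if ($@) {
        chomp(my $err = $@);
        print "$name: skipped ($err)\n";
        next;
    }
    print "$name: ", join('|', @texts), "\n";
}
```

In both cases the caller only sees findnodes($xpath); what differs is how
each parser repairs the markup before the query runs, so results may
diverge on pages that are broken in less obvious ways than this one.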
Could the only difference be that I've got to be explicit with the
XML::LibXML parser about recovering on non-well-formed input while the
HTML one already (tacitly) expects non-well-formed?

Since my code's got to run on Mac, Windows and CentOS it would be
great to hear if anyone's got a strong preference for, or history
with, one versus the other.

Thanks,
Michael

--
Michael R. Wolf
    All mammals learn by playing!
        MichaelRWolf at att.net

From cjac at colliertech.org Tue Dec 8 18:30:53 2009
From: cjac at colliertech.org (C.J. Adams-Collier)
Date: Tue, 08 Dec 2009 18:30:53 -0800
Subject: SPUG: XPath on (less-than-perfect) HTML
In-Reply-To:
References: <05999990-5A43-48C4-8AB4-FB84859EFE99@att.net>
 <1258514921.7075.19.camel@norseth>
 <20091208181555.GA15794@infula.marketoutsider.com>
Message-ID: <1260325853.4961.56.camel@calcifer>

yay me.

On Tue, 2009-12-08 at 17:43 -0800, Michael R. Wolf wrote:
> [...]

From sthoenna at gmail.com Tue Dec 8 19:44:17 2009
From: sthoenna at gmail.com (Yitzchak Scott-Thoennes)
Date: Tue, 8 Dec 2009 19:44:17 -0800
Subject: SPUG: XPath on (less-than-perfect) HTML
In-Reply-To:
References: <05999990-5A43-48C4-8AB4-FB84859EFE99@att.net>
 <1258514921.7075.19.camel@norseth>
 <20091208181555.GA15794@infula.marketoutsider.com>
Message-ID: <7ee730350912081944s5eae722fhfcb30e8184cf2b49@mail.gmail.com>

On Tue, Dec 8, 2009 at 5:43 PM, Michael R. Wolf wrote:
>
> Although XML brings to mind 'well-formed' whereas HTML (not XHTML)
> does not, I guess I'm fortunate to be able to use XPath in the XML-ish
> packages by using the qw(suppress_errors recover) options to the
> parser to handle my HTML. (Aside. This was the answer to my earlier
> posting on how to get over the non-well-formed issue.) I guess I
> started with XML::LibXML because I didn't think that XPath would be
> applicable to non-XML (i.e. HTML). It appears that findnodes($xpath)
> works for a $tree (or $doc or $dom) parsed from either package.
>
> Could the only difference be that I've got to be explicit with the
> XML::LibXML parser about recovering on non-well-formed input while the
> HTML one already (tacitly) expects non-well-formed?

No personal experience, but it's not just about recovering, but
recovering the way a browser would have interpreted the HTML.

From the TreeBuilder POD:

> HTML is rather harder to parse than people who write it generally suspect.
>
> Here's the problem: HTML is a kind of SGML that permits "minimization"
> and "implication". In short, this means that you don't have to close
> every tag you open (because the opening of a subsequent tag may
> implicitly close it), and if you use a tag that can't occur in the
> context you seem to using it in, under certain conditions the parser
> will be able to realize you mean to leave the current context and enter
> the new one, that being the only one that your code could correctly be
> interpreted in.
>
> Now, this would all work flawlessly and unproblematically if: 1) all
> the rules that both prescribe and describe HTML were (and had been)
> clearly set out, and 2) everyone was aware of these rules and wrote
> their code in compliance to them.
>
> However, it didn't happen that way, and so most HTML pages are
> difficult if not impossible to correctly parse with nearly any set of
> straightforward SGML rules. That's why the internals of HTML::TreeBuilder
> consist of lots and lots of special cases -- instead of being just a
> generic SGML parser with HTML DTD rules plugged in.
> ...
>
> The HTML::TreeBuilder source may seem long and complex, but it is rather
> well commented, and symbol names are generally self-explanatory. (You are
> encouraged to read the Mozilla HTML parser source for comparison.) Some
> of the complexity comes from little-used features, and some of it comes
> from having the HTML tokenizer (HTML::Parser) being a separate module,
> requiring somewhat of a different interface than you'd find in a combined
> tokenizer and tree-builder. But most of the length of the source comes
> from the fact that it's essentially a long list of special cases,
> with lots and lots of sanity-checking, and sanity-recovery -- because,
> as Roseanne Rosannadanna once said, "it's always something".
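
The "implication" rule the POD describes (an opening tag implicitly
closing an earlier one) can be illustrated with a toy sketch in core
Perl. This is emphatically not how HTML::TreeBuilder is implemented: the
%closes table, the %void list and close_implied_tags() are hypothetical
names invented here, and they cover only a handful of elements, which is
rather the POD's point about special cases.

```perl
use strict;
use warnings;

# Hypothetical, heavily abridged rule table: opening the key element
# implicitly closes an open element of any kind listed in the value.
my %closes = (
    li => [qw(li)],
    p  => [qw(p)],
    td => [qw(td th)],
    th => [qw(td th)],
);

# Elements that never take an end tag, so they are never left "open".
my %void = map { $_ => 1 } qw(br hr img input link meta);

sub close_implied_tags {
    my ($html) = @_;
    my @open;        # stack of element names still awaiting an end tag
    my $out = '';

    for my $token ($html =~ /(<[^>]*>|[^<]+)/g) {
        if ($token =~ m{^<([A-Za-z][A-Za-z0-9]*)}) {          # start tag
            my $name = lc $1;
            my @ends = @{ $closes{$name} || [] };
            # Emit end tags for open elements this start tag implies closed.
            while (@open && grep { $_ eq $open[-1] } @ends) {
                $out .= '</' . pop(@open) . '>';
            }
            push @open, $name unless $void{$name};
        }
        elsif ($token =~ m{^</([A-Za-z][A-Za-z0-9]*)}) {      # end tag
            my $name = lc $1;
            next unless grep { $_ eq $name } @open;   # stray end tag: drop it
            # Close anything still open inside the element being ended.
            $out .= '</' . pop(@open) . '>' while $open[-1] ne $name;
            pop @open;
        }
        $out .= $token;    # text, comments and the tag itself pass through
    }

    # Close whatever is still open at end of input.
    $out .= '</' . pop(@open) . '>' while @open;
    return $out;
}

# prints <ul><li>one</li><li>two</li></ul>
print close_implied_tags('<ul><li>one<li>two</ul>'), "\n";
```

Even this toy needs a rule table, a void-element list and a policy for
stray end tags; a parser that aims to match browser behaviour has to
carry many more such rules.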
From kevin-spug at fink.com Tue Dec 8 20:04:53 2009
From: kevin-spug at fink.com (Kevin Fink)
Date: Tue, 8 Dec 2009 20:04:53 -0800
Subject: SPUG: Question on Class::DBI, MySQL, mod_perl locking
Message-ID:

I have a very simple dynamic page that is not working, and I'm not
sure what I'm doing wrong. The page loads a record out of a database
and displays a variety of options for one of the fields. When the user
submits one of those, the page saves that back into the database, then
loads the next applicable record. About as simple as you can get.
Everything works fine, except when I try to submit the data the first
time, nothing changes in the database, and the second time I try to
submit it locks until the request times out with the following error:

COD::DB::Domain COD::DB::Domain=HASH(0x552b689d40) destroyed without
saving changes to domain_category_id at
/usr/lib64/perl5/vendor_perl/5.8.5/x86_64-linux-thread-multi/ModPerl/RegistryCooker.pm
line 202
[Tue Dec 08 19:38:58 2009] [error] Can't update 5200: DBD::mysql::st
execute failed: Lock wait timeout exceeded; try restarting transaction
[for Statement "UPDATE domain\nSET domain_category_id = ?\nWHERE
domain_id=?\n"] at
/usr/lib/perl5/site_perl/5.8.5/DBIx/ContextualFetch.pm line 52,
line 27.\n at /var/www/cod/domains.cgi line 30\n

The relevant section of code is:

{
    ...
    my $record = COD::DB::Domain->search(domain => $domain)->first;
    $record->domain_category_id($id);
    $record->update;
}

COD::DB::Domain isa Class::DBI. domain_category_id is a FK to another
table, represented by COD::DB::DomainCategory (but I don't think
that's relevant here - but could very easily be wrong).

From what little I understand of Class::DBI I thought the DB
transaction would be committed when $record goes out of scope, but I
don't think that's happening, so when I grab the next domain I get the
same one again (since it's still available for changes), and then it
starts to chase its tail.
If I restart the web server between each submission I don't get the
lock, but nothing gets changed in the database.

So how do I get the record to be updated so the next web call sees the
change?

I can make the updates via the MySQL client without any problems, so
I'm assuming it's a mod_perl/Class::DBI issue.

Apache/2.0.52
This is perl, v5.8.5 built for x86_64-linux-thread-multi
Class::DBI version 3.0.17
MySQL 4.1.22

Any thoughts?

Kevin

From brian at massassi.com Tue Dec 8 20:42:26 2009
From: brian at massassi.com (Brian E. Lozier)
Date: Tue, 8 Dec 2009 20:42:26 -0800
Subject: SPUG: Question on Class::DBI, MySQL, mod_perl locking
In-Reply-To:
References:
Message-ID: <3ec919520912082042h26c24ec0odf37d3ab63812373@mail.gmail.com>

You could try explicitly committing the transaction.

$record->dbi_commit(); # Class::DBI way of committing

Code I write generally doesn't have auto commit enabled, maybe the
author of that code did something similar. You can look in the
connection code and look for AutoCommit => 0.

Brian

On Tue, Dec 8, 2009 at 8:04 PM, Kevin Fink wrote:
> [...]
> _____________________________________________________________
> Seattle Perl Users Group Mailing List
>      POST TO: spug-list at pm.org
> SUBSCRIPTION: http://mail.pm.org/mailman/listinfo/spug-list
>     MEETINGS: 3rd Tuesdays
>     WEB PAGE: http://seattleperl.org/

From kevin-spug at fink.com Tue Dec 8 20:44:42 2009
From: kevin-spug at fink.com (Kevin Fink)
Date: Tue, 8 Dec 2009 20:44:42 -0800
Subject: SPUG: Question on Class::DBI, MySQL, mod_perl locking
In-Reply-To:
References:
Message-ID:

Ah, nothing like sending out a detailed description of a problem to
help you figure it out for yourself...

I changed my web server config to run a single process, and that
eliminated the locking issue, so it definitely seems like it has to do
with objects not getting released and DESTROYed properly.

However, the value of the field doesn't change in the database,
despite the database claiming that it does. Here's a section of strace
output showing the update:

14097 write(1, "G\0\0\0\3UPDATE domain\nSET domain_category_id = \'2\'\nWHERE domain_id=\'5200\'\n", 75) = 75
14097 setsockopt(1, SOL_SOCKET, SO_RCVTIMEO, "\2003\341\1\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0
14097 read(1, "0\0\0\1", 4) = 4
14097 read(1, "\0\1\0\0\0\0\0(Rows matched: 1 Changed: 1 Warnings: 0", 48) = 48

So MySQL says it matched 1 row and changed 1 row. But if I query that
in the DB, it hasn't changed. Maybe a transaction not being committed?
So I added an explicit dbi_commit() call, and lo and behold, the data
is being changed!

On Tue, Dec 8, 2009 at 8:04 PM, Kevin Fink wrote:
> [...]

From kevin-spug at fink.com Tue Dec 8 20:47:00 2009
From: kevin-spug at fink.com (Kevin Fink)
Date: Tue, 8 Dec 2009 20:47:00 -0800
Subject: SPUG: Question on Class::DBI, MySQL, mod_perl locking
In-Reply-To: <3ec919520912082042h26c24ec0odf37d3ab63812373@mail.gmail.com>
References: <3ec919520912082042h26c24ec0odf37d3ab63812373@mail.gmail.com>
Message-ID:

Yep, that was it - I figured it out just before I saw your message.

Thanks!

Kevin

On Tue, Dec 8, 2009 at 8:42 PM, Brian E. Lozier wrote:
> You could try explicitly committing the transaction.
>
> $record->dbi_commit(); # Class::DBI way of committing
>
> Code I write generally doesn't have auto commit enabled, maybe the
> author of that code did something similar. You can look in the
> connection code and look for AutoCommit => 0.
>
> Brian
>
> On Tue, Dec 8, 2009 at 8:04 PM, Kevin Fink wrote:
>> [...]

From MichaelRWolf at att.net Tue Dec 8 23:59:58 2009
From: MichaelRWolf at att.net (Michael R. Wolf)
Date: Tue, 8 Dec 2009 23:59:58 -0800
Subject: SPUG: XPath on (less-than-perfect) HTML
In-Reply-To: <7ee730350912081944s5eae722fhfcb30e8184cf2b49@mail.gmail.com>
References: <05999990-5A43-48C4-8AB4-FB84859EFE99@att.net>
 <1258514921.7075.19.camel@norseth>
 <20091208181555.GA15794@infula.marketoutsider.com>
 <7ee730350912081944s5eae722fhfcb30e8184cf2b49@mail.gmail.com>
Message-ID:

On Dec 8, 2009, at 7:44 PM, Yitzchak Scott-Thoennes wrote:

> On Tue, Dec 8, 2009 at 5:43 PM, Michael R. Wolf wrote:

[...]

>> Could the only difference be that I've got to be explicit with the
>> XML::LibXML parser about recovering on non-well-formed input while
>> the HTML one already (tacitly) expects non-well-formed.
>
> No personal experience, but it's not just about recovering, but
> recovering the way a browser would have interpreted the HTML.

Good point. Thanks. Do you know if it has a "quirks mode"?

> From the TreeBuilder POD:
>

[...]

>> -- because,
>> as Roseanne Rosannadanna once said, "it's always something".

Or, as my Dad said. If it's not one thing, it's ... 10!

--
Michael R. Wolf
    All mammals learn by playing!
        MichaelRWolf at att.net

From jobs-noreply at seattleperl.org Wed Dec 9 16:03:28 2009
From: jobs-noreply at seattleperl.org (SPUG Jobs)
Date: Wed, 9 Dec 2009 16:03:28 -0800 (PST)
Subject: SPUG: JOB: Perl/Java @ Speakeasy
Message-ID:

Description

Position Summary:

We're looking for an experienced, enthusiastic Perl & Java developer to
work with our CPE team on a project. Ideal candidate will have
experience with networks and SIP protocols. This project requires solid
recent Perl coding.
The contractor will spend 80% of the time coding in Perl and 20% in
Java.

Your responsibilities include but are not limited to:

* Develop software using our current application standards in Java
  and Perl
* Database development experience with SQL, Oracle, database design
* Participate in various phases of SDLC including design, coding,
  reviews, testing and documentation
* Utilize design methodologies, object-oriented design and design
  patterns
* Work with other teams throughout the company to determine
  feasibility, business and functional requirements and technical
  designs on assigned projects
* Provide ongoing support, maintenance and enhancement of systems

Required Skills & Experiences

Skills and experience:

* BS in Computer science with 3+ years of industry experience
* 3+ years solid hands-on experience with Java development required
* 3+ years solid development experience with Perl required
* Database development experience with SQL, Oracle, database design
* Network experience a plus
* Experience working with SIP protocols a plus

Ideally will also have:

* 2+ years experience with Spring and Hibernate
* Experience with Web Services and XML technologies (SOAP, XSD,
  XMLBeans, etc...)
* Experience with Agile software development methodologies
* Experience with design patterns and modeling methodologies such as
  UML
* Experience with multi-tiered enterprise software application design
  and development
* Knowledge of computer network infrastructures, technologies and
  protocols
* Excellent written and verbal communication skills
* Ability to work well independently or within a team, especially
  cross-functional teams
* Ability to determine unique and creative solutions to problems
  within a rapid development environment

If interested, please send me your resume.
Lori Barry, PHR, SPHR Manager, Recruiting Speakeasy Direct > 206 971 5154 * Fax > 206 971 5191 Email > lori.barry at hq.speakeasy.net * Web > http://www.speakeasy.net/ STRATEGIC | ACHIEVER | LEARNER | WOO | ANALYTICAL Voice * Data * Managed Services From michaelrwolf at att.net Thu Dec 10 19:52:37 2009 From: michaelrwolf at att.net (Michael R. Wolf) Date: Thu, 10 Dec 2009 19:52:37 -0800 Subject: SPUG: Perl shines for data validation Message-ID: I just saw this as a suggestion to flag invalid phone numbers as a validation routine for importing into a hosted database application. I think they were serious. LEN( SUBSTITUTE( SUBSTITUTE( SUBSTITUTE( SUBSTITUTE( SUBSTITUTE( SUBSTITUTE(MobilePhone, "(", ""), ")", ""), "-","")," ",""),".",""),"/","")) <> 10 Eeeeekk. SQL is not the right tool for this job. -- Michael R. Wolf All mammals learn by playing! MichaelRWolf at att.net From Chris.Callan at Tectura.com Fri Dec 11 06:45:31 2009 From: Chris.Callan at Tectura.com (Callan, Chris) Date: Fri, 11 Dec 2009 07:45:31 -0700 Subject: SPUG: Perl shines for data validation In-Reply-To: References: Message-ID: <5303BDF3F253554E889BE362FA0C604E0F3ABCBF@MAIL1.TecturaCorp.net> If it's the only tool/language available for the processing, then it is what it is. Chris Callan "When your map and the terrain disagree, believe the terrain." ~Military quote on "Ground Truth" and navigation 3:80 -----Original Message----- From: spug-list-bounces+chris.callan=tectura.com at pm.org [mailto:spug-list-bounces+chris.callan=tectura.com at pm.org] On Behalf Of Michael R. Wolf Sent: Thursday, December 10, 2009 7:53 PM To: Spug-List at Pm.Org Subject: SPUG: Perl shines for data validation I just saw this as a suggestion to flag invalid phone numbers as a validation routine for importing into a hosted database application. I think they were serious. 
LEN( SUBSTITUTE( SUBSTITUTE( SUBSTITUTE( SUBSTITUTE( SUBSTITUTE( SUBSTITUTE(MobilePhone, "(", ""), ")", ""), "-","")," ",""),".",""),"/","")) <> 10

Eeeeekk. SQL is not the right tool for this job.

--
Michael R. Wolf
All mammals learn by playing!
MichaelRWolf at att.net
_____________________________________________________________
Seattle Perl Users Group Mailing List
POST TO: spug-list at pm.org
SUBSCRIPTION: http://mail.pm.org/mailman/listinfo/spug-list
MEETINGS: 3rd Tuesdays
WEB PAGE: http://seattleperl.org/

From andrew at sweger.net Tue Dec 22 09:12:47 2009
From: andrew at sweger.net (Andrew Sweger)
Date: Tue, 22 Dec 2009 09:12:47 -0800 (PST)
Subject: SPUG: OSCON Call for Proposals Now Open
Message-ID:

The OSCON 2010 CFP is open until February 1st:

OSCON, the O'Reilly Open Source Convention
July 19 - 23, 2010
Oregon Convention Center
Portland, OR
http://en.oreilly.com/oscon2010

Faster, Freer, Smarter: Whatever your Goal, Make It Happen with Open Source

More than 2,500 experts, developers, sys admins, and hackers will meet up at OSCON 2010 to explore the tools, services, and platforms that make up the vibrant open source ecosystem. Join us!

The OSCON Call for Participation is now open. If you have winning techniques, favorite lifesavers, war stories, productivity tips, or other ideas to share, we want to hear from you. We're especially on the look-out for ways to do more with less, design and usability best practices, mobile device innovations, cloud computing, parallelization, open standards and data, open source in government, business models, and beyond.

Speak up about the freedom--and opportunity--of open source at OSCON 2010.
Submit your proposal by February 1, 2010 at: http://en.oreilly.com/oscon2010/public/cfp/92

From choward at indicium.us Tue Dec 22 12:25:17 2009
From: choward at indicium.us (Christopher Howard)
Date: Tue, 22 Dec 2009 11:25:17 -0900
Subject: SPUG: LEGO::NXT
Message-ID: <4B312B2D.7040103@indicium.us>

Does anyone on this list have experience working with LEGO::NXT?

--
Christopher Howard
http://indicium.us
http://theologia.indicium.us
http://robots.arsc.edu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 261 bytes
Desc: OpenPGP digital signature
URL:

From atom.powers at gmail.com Tue Dec 22 12:45:33 2009
From: atom.powers at gmail.com (Atom Powers)
Date: Tue, 22 Dec 2009 12:45:33 -0800
Subject: SPUG: LEGO::NXT
In-Reply-To: <4B312B2D.7040103@indicium.us>
References: <4B312B2D.7040103@indicium.us>
Message-ID:

I have one, played with it for a while, but never did anything really interesting with it.

On Tue, Dec 22, 2009 at 12:25 PM, Christopher Howard wrote:
> Does anyone on this list have experience working with LEGO::NXT?
>
> --
> Christopher Howard
> http://indicium.us
> http://theologia.indicium.us
> http://robots.arsc.edu
>
>
> _____________________________________________________________
> Seattle Perl Users Group Mailing List
> POST TO: spug-list at pm.org
> SUBSCRIPTION: http://mail.pm.org/mailman/listinfo/spug-list
> MEETINGS: 3rd Tuesdays
> WEB PAGE: http://seattleperl.org/
>

--
Perfection is just a word I use occasionally with mustard.
--Atom Powers--

From choward at indicium.us Tue Dec 22 12:53:48 2009
From: choward at indicium.us (Christopher Howard)
Date: Tue, 22 Dec 2009 11:53:48 -0900
Subject: SPUG: LEGO::NXT
In-Reply-To:
References: <4B312B2D.7040103@indicium.us>
Message-ID: <4B3131DC.2060101@indicium.us>

Atom Powers wrote:
> I have one, played with it for a while, but never did anything really
> interesting with it.
>
> On Tue, Dec 22, 2009 at 12:25 PM, Christopher Howard wrote:
>> Does anyone on this list have experience working with LEGO::NXT?
>>
>> --
>> Christopher Howard
>> http://indicium.us
>> http://theologia.indicium.us
>> http://robots.arsc.edu
>>
>>
>> _____________________________________________________________
>> Seattle Perl Users Group Mailing List
>> POST TO: spug-list at pm.org
>> SUBSCRIPTION: http://mail.pm.org/mailman/listinfo/spug-list
>> MEETINGS: 3rd Tuesdays
>> WEB PAGE: http://seattleperl.org/
>>

I was trying to use the LEGO::NXT module to control my Mindstorm 2.0 bot over bluetooth, but I am concerned about one annoying limitation, which I described in the post: http://forums.nxtasy.org/index.php?showtopic=4579

I was hoping one of you guys might have played around with the module before.

--
Christopher Howard
http://indicium.us
http://theologia.indicium.us
http://robots.arsc.edu

From telcodev at gmail.com Tue Dec 22 20:49:13 2009
From: telcodev at gmail.com (Joseph Werner)
Date: Tue, 22 Dec 2009 20:49:13 -0800
Subject: SPUG: Strange code effect
Message-ID: <4c93055f0912222049i329ae952s746ea69a9941f4f6@mail.gmail.com>

Hey guys [and gals],

I encountered a strange Perl code phenomenon dealing with some object references, and an array of object references.
Specifically, I could not shift a reference to an object off of an array of object references. More troubling, I also was unable to use the object references in a boolean context. I have written code very similar to this dozens if not hundreds of times and never encountered this type of problem...

The code is all proprietary, and the powers that be frown on release of any sort, so I will pseudo code what I am talking about. We are running Active State Perl 5.10 on a Win32 platform, if that matters

my $some_parms = shift;
my $objhref = get_objects($some_parms);
# $objhref = {
#     '01' => bless( {}, 'ASpecialObject' ),
#     '03' => bless( {}, 'ASpecialObject' ),
#     '02' => bless( {}, 'ASpecialObject' )
# };

# ASpecialObject class does have the <=> overloaded
# But I cannot see the making a difference...

# Sorting works fine:

my @arrayoforefs;
eval {
    @arrayoforefs = sort { $a <=> $b } values %{$objhref};
}
# Error checks ignored

# Server dies silently at the following statement:

my $chosenO = shift @arrayoforefs;

# If I do this instead:

my $chosenO = $arrayoforefs[0];

# Then this causes the server to silently crash:

return unless $chosenO;

__END__

Any help here?

Thanks,
Saltbreez

From mail.spammagnet at gmail.com Wed Dec 23 12:36:02 2009
From: mail.spammagnet at gmail.com (BenRifkah Bergsten-Buret)
Date: Wed, 23 Dec 2009 12:36:02 -0800
Subject: SPUG: Strange code effect
In-Reply-To: <4c93055f0912222049i329ae952s746ea69a9941f4f6@mail.gmail.com>
References: <4c93055f0912222049i329ae952s746ea69a9941f4f6@mail.gmail.com>
Message-ID:

See comments below.

On Tue, Dec 22, 2009 at 8:49 PM, Joseph Werner wrote:
> Hey guys [and gals],
>
> I encountered a strange Perl code phenomenon dealing with some object
> references, and and array of object references. Specifically, I could
> not shift a reference to an object off of an array of object
> references. More troubling, I also was unable to use the object
> references in a boolean context.
I have written code very similar to
> this dozens if not hundreds of times and never encountered this type
> of problem...
>
> The code is all proprietary, and the powers that be frown on release
> of any sort, so I will pseudo code what I am talking about. We are
> running Active State Perl 5.10 on a Win32 platform, if that matters
>
> my $some_parms = shift;
> my $objhref = get_objects($some_parms);
> # $objhref = {
> #     '01' => bless( {}, 'ASpecialObject' ),
> #     '03' => bless( {}, 'ASpecialObject' ),
> #     '02' => bless( {}, 'ASpecialObject' )
> # };
>

Have you tried assigning anything from $objhref directly at this point? For example, do you get an error if you try to do $myobj = $objhref->{'01'}? (The key needs its quotes: an unquoted 01 is read as a number and looks up the key "1".)

> # ASpecialObject class does have the <=> overloaded
> # But I cannot see the making a difference...
>
> # Sorting works fine:
>
> my @arrayoforefs;
> eval {
>     @arrayoforefs = sort { $a <=> $b } values %{$objhref};
> }
> # Error checks ignored
>

Since you've wrapped this in an eval it seems like you're expecting exceptions to be thrown. Why ignore them? See if there is an exception that might lead you in the right direction.

Also, are you able to access anything within @arrayoforefs without using an assignment? For example what happens when you do warn "arrayofrefs[0]: @arrayofrefs[0]"? Does the server crash?

>
> # Server dies silently at the following statement:
>
> my $chosenO = shift @arrayoforefs;
>
> # If I do this instead:
>
> my $chosenO = $arrayoforefs[0];
>
> # Then this causes the server to silently crash:
>
> return unless $chosenO;
>

You've said "server dies silently" and "silently crash" but I'm not clear what you mean. Is this two different things or the same? By "crash" do you mean that the server process exits with a core dump? Is this an Apache server? Mod_perl? When perl "dies" it generates a message. Also, in my experience when Apache dumps core it puts a message in the log but you've said "silently" so perhaps something else is going on.
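On the eval question, here is a minimal sketch of keeping the error from an eval instead of dropping it, so that a "silent" failure at least leaves a message behind. The helper name try_block and the sample data are made up for illustration; nothing here comes from the proprietary code, and where the message should go (a log file, STDERR) is left to the server.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run a code block and hand back the exception, if any,
# instead of letting it vanish. Returns (1, undef) on success, (0, $error)
# on failure.
sub try_block {
    my ($code) = @_;
    my $ok = eval { $code->(); 1 };
    return (1, undef) if $ok;
    my $err = $@ || 'unknown error';    # $@ can be empty if something reset it
    return (0, $err);
}

# Sunny-day case: the sort runs and the result is kept.
my @sorted;
my ($ok, $err) = try_block(sub { @sorted = sort { $a <=> $b } (3, 1, 2) });
print "sorted: @sorted\n" if $ok;

# Failure case: the exception text survives and can be logged.
my ($ok2, $err2) = try_block(sub { die "comparison blew up\n" });
warn "caught: $err2" unless $ok2;
```

Having the block return 1 as its last statement avoids relying on $@ alone to decide whether the eval succeeded.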
My suggestion is to code up the simplest case you can where the problem is encountered. Strip out as much code as possible that doesn't directly relate to what you're doing. For example, the following script can be run outside of your web server and outside of your Mason environment:

#!/usr/bin/perl

use strict;
use warnings;
use lib "/my/code/path";
use ASpecialObject;

my $objects = {
    01 => ASpecialObject->new(),
    02 => ASpecialObject->new(),
};

my @sorted = sort {$a <=> $b} values %{$objects};

#Does this result in the same problems you're having?
my $chosen0 = shift @sorted;
print "If you see this then the script is done\n";
__END__

You'll probably have to include some object initialization code but if you still have problems with this then you know the problem is somewhere in ASpecialObject.pm. Then you can start stripping out the code from there to find the minimum amount of code that causes the problem.

--
Ben

From mail.spammagnet at gmail.com Wed Dec 23 12:40:09 2009
From: mail.spammagnet at gmail.com (BenRifkah Bergsten-Buret)
Date: Wed, 23 Dec 2009 12:40:09 -0800
Subject: SPUG: Strange code effect
In-Reply-To:
References: <4c93055f0912222049i329ae952s746ea69a9941f4f6@mail.gmail.com>
Message-ID:

So I'm getting my lists confused. When I said this:

On Wed, Dec 23, 2009 at 12:36 PM, BenRifkah Bergsten-Buret <mail.spammagnet at gmail.com> wrote:
> For example, the following script can be run outside of your web server and
> outside of your Mason environment:
>

I thought I was responding to the HTML::Mason list. Please disregard that part.

--
Ben
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From telcodev at gmail.com Wed Dec 23 14:21:12 2009
From: telcodev at gmail.com (Joseph Werner)
Date: Wed, 23 Dec 2009 14:21:12 -0800
Subject: SPUG: Strange code effect
In-Reply-To:
References: <4c93055f0912222049i329ae952s746ea69a9941f4f6@mail.gmail.com>
Message-ID: <4c93055f0912231421g724d27e2w122de21787cc80e3@mail.gmail.com>

Hey Ben,

Thanks for the feedback!

First, let me bring you up to current status. I attempted to construct, by building up to, the problem. This attempt failed to reproduce the central first issue: why was I unable to shift the object off of the array? However, it did answer the second question: The object failed to respond in a boolean context because one overloaded operator had been declared, but neither bool nor stringify had been declared. STRANGER still, fixing the second problem [bool overload] made the first [shift from array] problem disappear. I STILL cannot connect the two. Anyhow, problem solved, but the central question remains unanswered.

No, we are not using Apache. Wish we were. I tread a tightrope of my nondisclosure commitments, but I can say: we use thin servers, what I have come to call 'campstove' servers [all the expectations of Apache in 2,000 lines of code or less] and due to the environment in which the server runs, either the crashes produce output to a log file, or we never see it.

See additional comments below.

On Wed, Dec 23, 2009 at 12:36 PM, BenRifkah Bergsten-Buret wrote:
> See comments below.
>
> On Tue, Dec 22, 2009 at 8:49 PM, Joseph Werner wrote:
>>
>> Hey guys [and gals],
>>
>> I encountered a strange Perl code phenomenon dealing with some object
>> references, and and array of object references. Specifically, I could
>> not shift a reference to an object off of an array of object
>> references. More troubling, I also was unable to use the object
>> references in a boolean context. I have written code very similar to
>> this dozens if not hundreds of times and never encountered this type
>> of problem...
>>
>> The code is all proprietary, and the powers that be frown on release
>> of any sort, so I will pseudo code what I am talking about. We are
>> running Active State Perl 5.10 on a Win32 platform, if that matters
>>
>> my $some_parms = shift;
>> my $objhref = get_objects($some_parms);
>> # $objhref = {
>> #     '01' => bless( {}, 'ASpecialObject' ),
>> #     '03' => bless( {}, 'ASpecialObject' ),
>> #     '02' => bless( {}, 'ASpecialObject' )
>> # };
>>
>
> Have you tried assigning anything from $objhref directly at this point? For
> example, do you get an error if you try to do $myobj = $objhref->{'01'}?

Yes, the hash ref is fully loaded and performs these types of operations.

>
>>
>> # ASpecialObject class does have the <=> overloaded
>> # But I cannot see the making a difference...
>>
>> # Sorting works fine:
>>
>> my @arrayoforefs;
>> eval {
>>     @arrayoforefs = sort { $a <=> $b } values %{$objhref};
>> }
>> # Error checks ignored
>
> Since you've wrapped this in an eval it seems like you're expecting
> exceptions to be thrown. Why ignore them? See if there is an exception
> that might lead you in the right direction.

My BAD. Ignoring exceptions was only pseudocode. In production, we do trap the exceptions. But it is a sunny day scenario [from the point of view of the evaled statement] that is producing the problem, so I omitted any checking to keep the waters clear.

>
> Also, are you able to access anything within @arrayoforefs without using an
> assignment? For example what happens when you do warn "arrayofrefs[0]:
> @arrayofrefs[0]"? Does the server crash?

Not certain if @arrayofrefs[0] is what you mean? A slice of one element? I will try and get back to you.
>
>>
>> # Server dies silently at the following statement:
>>
>> my $chosenO = shift @arrayoforefs;
>>
>> # If I do this instead:
>>
>> my $chosenO = $arrayoforefs[0];
>>
>> # Then this causes the server to silently crash:
>>
>> return unless $chosenO;
>
> You've said "server dies silently" and "silently crash" but I'm not clear
> what you mean. Is this two different things or the same? By "crash" do you
> mean that the server process exits with a core dump? Is this an Apache
> server? Mod_perl? When perl "dies" it generates a message. Also, in my
> experience when Apache dumps core it puts a message in the log but you've
> said "silently" so perhaps something else is going on.

Yes, silently. No core. Frustrating, nothing prints to the log at all, just the server is no longer running.

>
> My suggestion is to code up the simplest case you can where the problem is
> encountered. Strip out as much code as possible that doesn't directly
> relate to what you're doing. For example, the following script can be run
> outside of your web server and outside of your Mason environment:

Unfortunately my attempts to reproduce the array shift issue outside of the server have failed.

Hey Thanks again for the feedback

>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
> use lib "/my/code/path";
> use ASpecialObject;
>
> my $objects = {
>     01 => ASpecialObject->new(),
>     02 => ASpecialObject->new(),
> };
>
> my @sorted = sort {$a <=> $b} values %{$objects};
>
> #Does this result in the same problems you're having?
> my $chosen0 = shift @sorted;
> print "If you see this then the script is done\n";
> __END__
>
> You'll probably have to include some object initialization code but if you
> still have problems with this then you know the problem is somewhere in
> ASpecialObject.pm. Then you can start stripping out the code from there to
> find the minimum amount of code that causes the problem.
>
> --
> Ben
>

From skylos at gmail.com Wed Dec 23 14:52:12 2009
From: skylos at gmail.com (Skylos)
Date: Wed, 23 Dec 2009 14:52:12 -0800
Subject: SPUG: Strange code effect
In-Reply-To: <4c93055f0912231421g724d27e2w122de21787cc80e3@mail.gmail.com>
References: <4c93055f0912222049i329ae952s746ea69a9941f4f6@mail.gmail.com> <4c93055f0912231421g724d27e2w122de21787cc80e3@mail.gmail.com>
Message-ID: <3650cdc00912231452h13cd7b48mae7e257f64805635@mail.gmail.com>

That's interesting.

So something I had considered when you described the problem initially -

Hm, he's using overrides - that has peril because you are inserting yourself inside of system operations on your data type. If he was crashing in his override... or didn't define a needed one a null pointer might do that... but naw, the default behavior should be intact if you don't custom define an override... so that couldn't be the problem

- was wrong, because the boolean override changed it.

Which makes me think "something about that value when a shift is done uses the value in a boolean context"

Exactly what it is about a shift that initiates a boolean context - and why failure to define an override bypassed the normal behavior - isn't clear to me. But it makes perfect sense to me. Context is an unintuitive matter at times - sometimes you use values in a context without realizing because the context is implicit - for instance, anything in the final position of a block has the context of the block itself. I've gone so far as to do ; return (); } at the end of a block explicitly to keep the returning operation in a void context, thus keeping the value to myself. I wouldn't normally, but in this case it is an object that absolutely must be destroyed at the exit of this block. To have it be intercepted would lead to undefined operation within my system. Kind of like resurrecting a destroyed object with funky DESTROY subs.
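For what it's worth, the boolean half of this can be sketched outside any server. The classes OnlyCompare and CompareAndBool below are hypothetical stand-ins, not the real ASpecialObject, and the assumption is that the real class declared '<=>' without a 'fallback'. In that situation perl has no conversion to build a truth test from, so the test throws instead of defaulting to "a reference is true".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for ASpecialObject: only '<=>' is overloaded,
# and no 'fallback' is given.
package OnlyCompare;
use overload '<=>' => sub {
    my ($x, $y, $swapped) = @_;
    my $r = $x->{n} <=> $y->{n};    # sketch: both operands are objects
    return $swapped ? -$r : $r;
};
sub new { my ($class, $n) = @_; return bless { n => $n }, $class }

# Same comparison, plus an explicit 'bool' so truth tests are defined.
package CompareAndBool;
use overload
    '<=>' => sub {
        my ($x, $y, $swapped) = @_;
        my $r = $x->{n} <=> $y->{n};
        return $swapped ? -$r : $r;
    },
    'bool' => sub { 1 };
sub new { my ($class, $n) = @_; return bless { n => $n }, $class }

package main;

# Sorting and shifting need only '<=>' and plain assignment.
my @sorted = sort { $a <=> $b } map { OnlyCompare->new($_) } 3, 1, 2;
my $first  = shift @sorted;

# A truth test on OnlyCompare is expected to throw; trap it and keep the error.
my $bool_ok  = eval { my $unused = $first ? 1 : 0; 1 };
my $bool_err = $bool_ok ? '' : $@;
print "truth test failed: $bool_err" unless $bool_ok;

# With 'bool' declared, the same kind of test is fine.
my $fixed = CompareAndBool->new(1);
print "CompareAndBool object is true\n" if $fixed;
```

Passing fallback => 1 to use overload is another way to get perl's default behaviour back for operators the class does not declare. None of this explains why the shift itself misbehaved inside the server; in this sketch the shift is unremarkable.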
It may be that the boolean expression evaluation within the shift routine is purposeful - or it may merely be a side effect of some sequence that put the object value into a boolean context somewhere. Skylos "If only I could get rid of hunger by rubbing my belly" - Diogenes On Wed, Dec 23, 2009 at 2:21 PM, Joseph Werner wrote: > Hey Ben, > > Thanks for the feedback! > > First, let me bring you up to current status. I attempted to > construct, by building up to, the problem. This attempt failed to > reproduce the central first issue: why was I unable to shift the > object off of the array? However, it did answer the second question: > The object failed to respond in a boolean context because one > overloaded operator had been declared, but neither bool nor stringify > had been declared. STRANGER still, fixing the second problem [bool > overload] made the first [shift from array] problem disappear. I > STILL cannot connect the two. Anyhow, problem solved, but the central > question remains unanswered. > > No, we are not using Apache. Wish we were. I tread a tightrope of my > nondisclosure commitments, but I can say: we use thin servers, what I > have com to call 'campstove' servers [all the expectations of Apache > in 2,000 lines of code or less] and due to the environment in which > the server runs, either the crashes produce output to a log file, or > we never see it. > > See additional comments below. > > On Wed, Dec 23, 2009 at 12:36 PM, BenRifkah Bergsten-Buret > wrote: > > See comments below. > > > > On Tue, Dec 22, 2009 at 8:49 PM, Joseph Werner > wrote: > >> > >> Hey guys [and gals], > >> > >> I encountered a strange Perl code phenomenon dealing with some object > >> references, and and array of object references. Specifically, I could > >> not shift a reference to an object off of an array of object > >> references. More troubling, I also was unable to use the object > >> references in a boolean context. 
I have written code very similar to > >> this dozens if not hundreds of times and never encountered this type > >> of problem... > >> > >> The code is all proprietary, and the powers that be frown on release > >> of any sort, so I will pseudo code what I am talking about. We are > >> running Active State Perl 5.10 on a Win32 platform, if that matters > >> > >> my $some_parms = shift; > >> my $objhref = get_objects($some_parms); > >> # $objhref = { > >> # '01' => bless( {}, 'ASpecialObject' ), > >> # '03' => bless( {}, 'ASpecialObject' ), > >> # '02' => bless( {}, 'ASpecialObject' ) > >> # }; > >> > > > > Have you tried assigning anything from $objhref directly at this point? > For > > example, do you get an error if you try to do $myobj = $objhref->{01}? > > Yes, the hash ref is fully loaded and performs these type of operations. > > > > >> > >> # ASpecialObject class does have the <=> overloaded > >> # But I cannot see the making a difference... > >> > >> # Sorting works fine: > >> > >> my @arrayoforefs; > >> eval { > >> @arrayoforefs = sort { $a <=> $b } values %{$objhref}; > >> } > >> # Error checks ignored > > > > Since you've wrapped this in an eval it seems like you're expecting > > exceptions to be thrown. Why ignore them? See if there is an exception > > that might lead you in the right direction. > > My BAD. Ignoring exceptions was only pseudocode. In production, we do > trap the exceptions. But it is a sunny day scenario [from the point of > view of the evaled statement] that is producing the problem, so I > omitted any checking to keep the waters clear. > > > > > Also, are you able to access anything within @arrayoforefs without using > an > > assignment? For example what happens when you do warn "arrayofrefs[0]: > > @arrayofrefs[0]"? Do the server crash? > > Not certian if what @arrayofrefs[0] is what you mean? A slice of one > element? I will try and get back to you. 
> > > > >> > >> # Server dies silently at the following statement: > >> > >> my $chosenO = shift @arrayoforefs; > >> > >> # If I do this instead: > >> > >> my $chosenO = $arrayoforefs[0]; > >> > >> # Then this causes the server to silently crash: > >> > >> return unless $chosenO; > > > > You've said "server dies silently" and "silently crash" but I'm not clear > > what you mean. Is this two different things or the same? By "crash" do > you > > mean that the server process exits with a core dump? Is this an Apache > > server? Mod_perl? When perl "dies" it generates a message. Also, in my > > experience when Apache dumps core it puts a message in the log but you've > > said "silently" so perhaps something else is going on. > > Yes, silently. No core. Frustrating, nothing prints to the log at all, > just the server is no longer running. > > > > > My suggestion is to code up the simplest case you can where the problem > is > > encountered. Strip out as much code as possible that doesn't directly > > relate to what you're doing. For example, the following script can be > run > > outside of your web server and outside of your Mason environment: > > Unfortunately my attempts to reproduce the array shift issue outside > of the server have failed. > > Hey Thanks again for the feedback > > > > > #!/usr/bin/perl > > > > use strict; > > use warnings; > > use lib "/my/code/path"; > > use ASpecialObject; > > > > my $objects = { > > 01 => ASpecialObject->new(), > > 02 => ASpecialObject->new(), > > }; > > > > my @sorted = sort {$a <=> $b} values %{$objects}; > > > > #Does this result in the same problems you're having? > > my $chosen0 = shift @sorted; > > print "If you see this then the script is done\n"; > > __END__ > > > > You'll probably have to include some object initialization code but if > you > > still have problems with this then you know the problem is somewhere in > > ASpecialObject.pm. 
Then you can start stripping out the code from there > to > > find the minimum amount of code that causes the problem. > > > > -- > > Ben > > > _____________________________________________________________ > Seattle Perl Users Group Mailing List > POST TO: spug-list at pm.org > SUBSCRIPTION: http://mail.pm.org/mailman/listinfo/spug-list > MEETINGS: 3rd Tuesdays > WEB PAGE: http://seattleperl.org/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From telcodev at gmail.com Wed Dec 23 15:27:42 2009 From: telcodev at gmail.com (Joseph Werner) Date: Wed, 23 Dec 2009 15:27:42 -0800 Subject: SPUG: Strange code effect In-Reply-To: <3650cdc00912231452h13cd7b48mae7e257f64805635@mail.gmail.com> References: <4c93055f0912222049i329ae952s746ea69a9941f4f6@mail.gmail.com> <4c93055f0912231421g724d27e2w122de21787cc80e3@mail.gmail.com> <3650cdc00912231452h13cd7b48mae7e257f64805635@mail.gmail.com> Message-ID: <4c93055f0912231527j137fd7c2t4d34bc3d106af402@mail.gmail.com> On Wed, Dec 23, 2009 at 2:52 PM, Skylos wrote: > ... > Which makes me think "something about that value when a shift is done uses > the value in a boolean context" > ... Thanks for the feedback Skylos, This is exactly the same conclusion that we have come too; but I STILL cannot place a boolean context in the a simple shift assignment. The powers that be crack the whip and say "Problem solved, move on". Code that does not behave the way I expect it to bothers me... Saltbreez On Wed, Dec 23, 2009 at 2:52 PM, Skylos wrote: > Thats interesting. > > So something I had considered when you described the problem initially - > > Hm, he's using overrides - that has peril because you are inserting yourself > inside of system operations on your data type. If he was crashing in his > override... or didn't define a needed one a null pointer might do that... > but naw, the default behavior should be intact if you don't custom define an > override... 
so that couldn't be the problem > > - was wrong, because the boolean override changed it. > > Which makes me think "something about that value when a shift is done uses > the value in a boolean context" > > Exactly what it is about a shift that initiates a boolean context - and why > failure to define an override bypassed the normal behavior - isn't clear to > me. But it makes perfect sense to me. Context is an unintuitive matter at > times - sometimes you use values in a context without realizing because the > context is implicit - for instance, anything in the final position of a > block has the context of the block itself. I've gone so far as to do > ; return (); } at the end of a block > explicitly to keep the returning operation in a void context, thus keeping > the value to myself. I wouldn't normally, but in this case it is an object > that absolutely must be destroyed at the exit of this block. To have it be > intercepted would lead to undefined operation within my system. Kind of > like resurrecting a destroyed object with funky DESTROY subs. > > It may be that the boolean expression evaluation within the shift routine is > purposeful - or it may merely be a side effect of some sequence that put the > object value into a boolean context somewhere. > > Skylos > > "If only I could get rid of hunger by rubbing my belly" - Diogenes > > > On Wed, Dec 23, 2009 at 2:21 PM, Joseph Werner wrote: >> >> Hey Ben, >> >> Thanks for the feedback! >> >> First, let me bring you up to current status. I attempted to >> construct, by building up to, the problem. This attempt failed to >> reproduce the central first issue: why was I unable to shift the >> object off of the array? However, it did answer the second question: >> The object failed to respond in a boolean context because one >> overloaded operator had been declared, but neither bool nor stringify >> had been declared. 
STRANGER still, fixing the second problem [bool >> overload] made the first [shift from array] problem disappear. I >> STILL cannot connect the two. Anyhow, problem solved, but the central >> question remains unanswered. >> >> No, we are not using Apache. Wish we were. I tread a tightrope of my >> nondisclosure commitments, but I can say: we use thin servers, what I >> have com to call 'campstove' servers [all the expectations of Apache >> in 2,000 lines of code or less] and due to the environment in which >> the server runs, either the crashes produce output to a log file, or >> we never see it. >> >> See additional comments below. >> >> On Wed, Dec 23, 2009 at 12:36 PM, BenRifkah Bergsten-Buret >> wrote: >> > See comments below. >> > >> > On Tue, Dec 22, 2009 at 8:49 PM, Joseph Werner >> > wrote: >> >> >> >> Hey guys [and gals], >> >> >> >> I encountered a strange Perl code phenomenon dealing with some object >> >> references, and and array of object references. Specifically, I could >> >> not shift a reference to an object off of an array of object >> >> references. More troubling, I also was unable to use the object >> >> references in a boolean context. I have written code very similar to >> >> this dozens if not hundreds of times and never encountered this type >> >> of problem... >> >> >> >> The code is all proprietary, and the powers that be frown on release >> >> of any sort, so I will pseudo code what I am talking about. We are >> >> running Active State Perl 5.10 on a Win32 platform, if that matters >> >> >> >> my $some_parms = shift; >> >> my $objhref = get_objects($some_parms); >> >> # $objhref = { >> >> # '01' => bless( {}, 'ASpecialObject' ), >> >> # '03' => bless( {}, 'ASpecialObject' ), >> >> # '02' => bless( {}, 'ASpecialObject' ) >> >> # }; >> >> >> > >> > Have you tried assigning anything from $objhref directly at this point? >> > For >> > example, do you get an error if you try to do $myobj = $objhref->{01}? 
>> >> Yes, the hash ref is fully loaded and performs these type of operations. >> >> > >> >> >> >> # ASpecialObject class does have the <=> overloaded >> >> # But I cannot see the making a difference... >> >> >> >> # Sorting works fine: >> >> >> >> my @arrayoforefs; >> >> eval { >> >> @arrayoforefs = sort { $a <=> $b } values %{$objhref}; >> >> } >> >> # Error checks ignored >> > >> > Since you've wrapped this in an eval it seems like you're expecting >> > exceptions to be thrown. Why ignore them? See if there is an exception >> > that might lead you in the right direction. >> >> My BAD. Ignoring exceptions was only pseudocode. In production, we do >> trap the exceptions. But it is a sunny day scenario [from the point of >> view of the evaled statement] that is producing the problem, so I >> omitted any checking to keep the waters clear. >> >> > >> > Also, are you able to access anything within @arrayoforefs without using >> > an >> > assignment? For example what happens when you do warn "arrayofrefs[0]: >> > @arrayofrefs[0]"? Do the server crash? >> >> Not certian if what @arrayofrefs[0] is what you mean? A slice of one >> element? I will try and get back to you. >> >> > >> >> >> >> # Server dies silently at the following statement: >> >> >> >> my $chosenO = shift @arrayoforefs; >> >> >> >> # If I do this instead: >> >> >> >> my $chosenO = $arrayoforefs[0]; >> >> >> >> # Then this causes the server to silently crash: >> >> >> >> return unless $chosenO; >> > >> > You've said "server dies silently" and "silently crash" but I'm not >> > clear >> > what you mean. Is this two different things or the same? By "crash" do >> > you >> > mean that the server process exits with a core dump? Is this an Apache >> > server? Mod_perl? When perl "dies" it generates a message. Also, in >> > my >> > experience when Apache dumps core it puts a message in the log but >> > you've >> > said "silently" so perhaps something else is going on. >> >> Yes, silently. No core. 
Frustrating, nothing prints to the log at all,
>> just the server is no longer running.
>>
>> >
>> > My suggestion is to code up the simplest case you can where the problem
>> > is encountered. Strip out as much code as possible that doesn't directly
>> > relate to what you're doing. For example, the following script can be
>> > run outside of your web server and outside of your Mason environment:
>>
>> Unfortunately my attempts to reproduce the array shift issue outside
>> of the server have failed.
>>
>> Hey, thanks again for the feedback.
>>
>> >
>> > #!/usr/bin/perl
>> >
>> > use strict;
>> > use warnings;
>> > use lib "/my/code/path";
>> > use ASpecialObject;
>> >
>> > # Keys are quoted: a bare 01 is an octal literal and becomes the key "1"
>> > my $objects = {
>> >   '01' => ASpecialObject->new(),
>> >   '02' => ASpecialObject->new(),
>> > };
>> >
>> > my @sorted = sort {$a <=> $b} values %{$objects};
>> >
>> > #Does this result in the same problems you're having?
>> > my $chosen0 = shift @sorted;
>> > print "If you see this then the script is done\n";
>> > __END__
>> >
>> > You'll probably have to include some object initialization code but if
>> > you still have problems with this then you know the problem is
>> > somewhere in ASpecialObject.pm. Then you can start stripping out the
>> > code from there to find the minimum amount of code that causes the
>> > problem.
>> >
>> > --
>> > Ben
>> >
>> _____________________________________________________________
>> Seattle Perl Users Group Mailing List
>> POST TO: spug-list at pm.org
>> SUBSCRIPTION: http://mail.pm.org/mailman/listinfo/spug-list
>> MEETINGS: 3rd Tuesdays
>> WEB PAGE: http://seattleperl.org/
>
>

--
I require any third parties to obtain my permission to submit my
information to any other party for each such submission.
I further require any third party to follow up on any submittal of my
information by sending detailed information regarding each such
submission to telcodev at gmail.com

Joseph Werner

From mail.spammagnet at gmail.com Wed Dec 23 16:07:23 2009
From: mail.spammagnet at gmail.com (BenRifkah Bergsten-Buret)
Date: Wed, 23 Dec 2009 16:07:23 -0800
Subject: SPUG: Strange code effect
In-Reply-To: <4c93055f0912231527j137fd7c2t4d34bc3d106af402@mail.gmail.com>
References: <4c93055f0912222049i329ae952s746ea69a9941f4f6@mail.gmail.com>
 <4c93055f0912231421g724d27e2w122de21787cc80e3@mail.gmail.com>
 <3650cdc00912231452h13cd7b48mae7e257f64805635@mail.gmail.com>
 <4c93055f0912231527j137fd7c2t4d34bc3d106af402@mail.gmail.com>
Message-ID:

On Wed, Dec 23, 2009 at 3:27 PM, Joseph Werner wrote:

> On Wed, Dec 23, 2009 at 2:52 PM, Skylos wrote:
> > ...
> > Which makes me think "something about that value when a shift is done
> > uses the value in a boolean context"
> > ...
>
> Thanks for the feedback Skylos. This is exactly the same conclusion
> that we have come to; but I STILL cannot place a boolean context in
> a simple shift assignment. The powers that be crack the whip and
> say "Problem solved, move on". Code that does not behave the way I
> expect it to bothers me...
>

I was considering that shift had an implicit boolean context as well, so I
did some digging. Based on my test implementation it appears that shift
doesn't have an implicit boolean context. Perhaps the boolean context is
occurring after the $chosenO is returned?

Here's my test script is_shift_boolean.pl:

 1 #!/usr/bin/perl
 2
 3 use strict;
 4 use warnings;
 5 use Carp;
 6
 7 my @objects = map{Snoop->new()} 1..3;
 8
 9 # Implicit boolean context here?
10 my $first = shift @objects;
11
12 # explicit boolean context here
13 if ($first) {
14     # nothing to do here.
15 }
16
17 package Snoop;
18
19 use overload (
20     q{bool} => sub {
21         my $self = shift;
22         Carp::confess("Boolean context on objnum $self->{objnum}");
23         return $self;
24     },
25 );
26
27 my $objnum = 0;
28
29 sub new {
30     my $class = shift;
31     return bless {objnum => $objnum++}, $class;
32 }
33 __END__

This uses the overload pragma to do the operator overloading so if you're
using something else the results may be different.

Upon execution I got the following output:

Boolean context on objnum 0 at is_shift_boolean.pl line 22
    Snoop::__ANON__('Snoop=HASH(0x814ccd4)', 'undef', '') called at is_shift_boolean.pl line 13

This reports only one boolean context, in the if statement at line 13.
There is no report of boolean context from line 10 where the shift is.

A mystery for the ages,

--
Ben

From skylos at gmail.com Wed Dec 23 16:19:40 2009
From: skylos at gmail.com (Skylos)
Date: Wed, 23 Dec 2009 16:19:40 -0800
Subject: SPUG: Strange code effect
In-Reply-To:
References: <4c93055f0912222049i329ae952s746ea69a9941f4f6@mail.gmail.com>
 <4c93055f0912231421g724d27e2w122de21787cc80e3@mail.gmail.com>
 <3650cdc00912231452h13cd7b48mae7e257f64805635@mail.gmail.com>
 <4c93055f0912231527j137fd7c2t4d34bc3d106af402@mail.gmail.com>
Message-ID: <3650cdc00912231619w21841f15x1ab7aad5e0b8e3d6@mail.gmail.com>

Ooh good point Ben! What if there was something else that hit the bool
functionality of the object though? It occurs to me that sorting can be a
complex operation - maybe something happened when it was sorted, like this
mod of your lil test script?

 1 #!/usr/bin/perl
 2
 3 use strict;
 4 use warnings;
 5 use Carp;
 6
 7 my @objects = sort reverse map{Snoop->new()} 1..3;
 8
 9 # Implicit boolean context here?
10 my $first = shift @objects;
11
12 # explicit boolean context here
13 if ($first) {
14     # nothing to do here.
15 }
16
17 package Snoop;
18
19 use overload (
20     q{bool} => sub {
21         my $self = shift;
22         Carp::confess("Boolean context on objnum $self->{objnum}");
23         return $self;
24     },
25     q{cmp} => sub {
26         my $self = shift;
27         my $other = shift;
28         if ($self) {
29             Carp::confess "Did I bool in the cmp?";
30         }
31         return $self->{objnum} <=> $other->{objnum};
32     },
33 );
34
35 my $objnum = 0;
36
37 sub new {
38     my $class = shift;
39     return bless {objnum => $objnum++}, $class;
40 }
41 __END__

That outputs for me:

Boolean context on objnum 2 at testbool line 22
    Snoop::__ANON__('Snoop=HASH(0x8924c8c)', 'undef', '') called at testbool line 28
    Snoop::__ANON__('Snoop=HASH(0x8924c8c)', 'Snoop=HASH(0x8918258)', '') called at testbool line 7

One thing I've learned for sure now is that if you're going to use the
overload pragma, you probably should define something for *all* of the
operators/contexts - otherwise, you risk the undefined operation error all
over the place! I couldn't sort the array without defining the cmp
operation...

But seriously, maybe that's how it happened: due to the lack of actual
feedback about where the error was, it perhaps *seemed* to be on the shift
when it was somewhere down in the sort?

An idea at any rate,

Skylos

"If only I could get rid of hunger by rubbing my belly" - Diogenes

On Wed, Dec 23, 2009 at 4:07 PM, BenRifkah Bergsten-Buret <
mail.spammagnet at gmail.com> wrote:

> On Wed, Dec 23, 2009 at 3:27 PM, Joseph Werner wrote:
>
>> On Wed, Dec 23, 2009 at 2:52 PM, Skylos wrote:
>> > ...
>> > Which makes me think "something about that value when a shift is done
>> > uses the value in a boolean context"
>> > ...
>>
>> Thanks for the feedback Skylos. This is exactly the same conclusion
>> that we have come to; but I STILL cannot place a boolean context in
>> a simple shift assignment. The powers that be crack the whip and
>> say "Problem solved, move on". Code that does not behave the way I
>> expect it to bothers me...
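Skylos's theory, that the boolean overload fires somewhere inside the sort
rather than at the shift, can be checked without Carp by simply counting
calls. What follows is a minimal sketch, not Joseph's code: the Snoop class
is borrowed from Ben's test script, the $bool_calls counter is a
hypothetical addition, and the <=> handler deliberately tests $self in
boolean context as a stand-in for whatever ASpecialObject's real comparator
does (which the thread never shows).

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Snoop;

# Hypothetical counter: how many times the bool overload has fired.
our $bool_calls = 0;

use overload
    'bool' => sub { $bool_calls++; return 1 },
    '<=>'  => sub {
        my ($self, $other, $swapped) = @_;
        return 0 unless $self;    # boolean test on an object: fires 'bool'
        my $result = $self->{objnum} <=> $other->{objnum};
        return $swapped ? -$result : $result;
    };

my $objnum = 0;

sub new {
    my $class = shift;
    return bless { objnum => $objnum++ }, $class;
}

package main;

# The sort calls the <=> handler, which in turn hits the bool overload.
my @objects = sort { $b <=> $a } map { Snoop->new() } 1 .. 3;
my $after_sort = $Snoop::bool_calls;

# The shift only moves a reference; no boolean test is involved.
my $first = shift @objects;
my $after_shift = $Snoop::bool_calls;

print "bool calls during sort:  $after_sort\n";
print "bool calls during shift: ", $after_shift - $after_sort, "\n";
```

If the first count is non-zero and the second stays at zero, a bool handler
that dies would take the process down during the sort, and a server with no
useful logging could easily make that look like a failure at the next
statement.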
From derykus at gmail.com Wed Dec 23 18:46:18 2009
From: derykus at gmail.com (Charles DeRykus)
Date: Wed, 23 Dec 2009 18:46:18 -0800
Subject: SPUG: Strange code effect
In-Reply-To: <4c93055f0912231421g724d27e2w122de21787cc80e3@mail.gmail.com>
References: <4c93055f0912222049i329ae952s746ea69a9941f4f6@mail.gmail.com>
 <4c93055f0912231421g724d27e2w122de21787cc80e3@mail.gmail.com>
Message-ID: <175825720912231846w9bf180emc893db088d2ad442@mail.gmail.com>

> ...
> # Server dies silently at the following statement
> my $chosenO = shift @arrayoforefs;
> # If I do this instead:
> my $chosenO = $arrayoforefs[0];
> # Then this causes the server to silently crash:
> return unless $chosenO;
> ...

In my experience, when the server goes "silently into that good night", an
untrapped signal is often the culprit. You might try using
Signal::StackTrace to get a stack dump. Just identify some of the usual
signal suspects, e.g.,

use Signal::StackTrace qw/ SEGV TERM PIPE /; # etc...

And the trace will help pin down what's happened.

--
Charles DeRykus

From jobs-noreply at seattleperl.org Tue Dec 29 17:11:28 2009
From: jobs-noreply at seattleperl.org (SPUG Jobs)
Date: Tue, 29 Dec 2009 17:11:28 -0800 (PST)
Subject: SPUG: JOB: Perl 12 mo contract, GE Healthcare
Message-ID:

GE Healthcare is looking for a Perl contractor to work on tools that
assist in configuration management tasks. We deploy much of our code on
an older port of Perl running on the HP NonStop (Tandem).

Required skillset:

* Fluent in idiomatic Perl.
* Able to control complexity.
* Comfortable with pair programming and rigorous code reviews.
* Configuration management expertise on Unix or Linux systems a plus.

This position is located in Seattle and the expected contract duration is
12 months. Telecommuting is possible for part of the time, after the first
few months in the Seattle office to get off to a good start.
Pay range is negotiable depending on qualifications. Placement will be
through a contracting company.

Please contact Rob Parks at rob.parks at ge.com to apply.

From twists at gmail.com Thu Dec 31 13:15:50 2009
From: twists at gmail.com (Joshua ben Jore)
Date: Thu, 31 Dec 2009 13:15:50 -0800
Subject: SPUG: XPath on (less-than-perfect) HTML
In-Reply-To: <05999990-5A43-48C4-8AB4-FB84859EFE99@att.net>
References: <05999990-5A43-48C4-8AB4-FB84859EFE99@att.net>
Message-ID:

On Tue, Nov 17, 2009 at 1:33 PM, Michael R. Wolf wrote:
> Yes, I know that XPath can only be applied to well-formed XML.
>
> That's the theoretical, pure, absolute truth.
>
> I'm working in the real world where I can't find a well-formed page. (For
> instance, http://validator.w3c.org does not validate such biggies as
> amazon.com, ask.com, google.com, or msn.com). For (my) practical purposes,
> there are no valid pages.
>
> What am I to (practically, not theoretically) do?
>
> What tricks do practical XPath users know that I might not?
>
> I'm trying to scrape pages across sites to aggregate data.
>
> I'm loath to use regular expressions for all the pure reasons, but if pure
> isn't workable outside the ivory towers, that purity is useless in the real
> world.
>
> I've already tried:
>    tidy -asxhtml
>    tidy -asxml
>    HTML::TokeParser
>    XML::XPath
>    XML::LibXML

I've happily used XML::LibXML per Randal Schwartz in Linux Magazine
(Jun 2003) at http://www.stonehenge.com/merlyn/LinuxMag/col49.html

Josh

From MichaelRWolf at att.net Thu Dec 31 13:41:51 2009
From: MichaelRWolf at att.net (Michael R. Wolf)
Date: Thu, 31 Dec 2009 13:41:51 -0800
Subject: SPUG: XPath on (less-than-perfect) HTML
In-Reply-To:
References: <05999990-5A43-48C4-8AB4-FB84859EFE99@att.net>
Message-ID:

On Dec 31, 2009, at 1:15 PM, Joshua ben Jore wrote:

> On Tue, Nov 17, 2009 at 1:33 PM, Michael R. Wolf
> wrote:
>> Yes, I know that XPath can only be applied to well-formed XML.
>>
>> That's the theoretical, pure, absolute truth.
>
> I've happily used XML::LibXML per Randal Schwartz in Linux Magazine
> (Jun 2003) at http://www.stonehenge.com/merlyn/LinuxMag/col49.html

Thanks. Randal's article(s) were one of my motivations for using XPath.

I got my code working after fixing two version problems on my Mac, both of
which I think were nice, though in hindsight, I don't think change #1 was
strictly necessary. Without a deep analysis of the changes, I went with my
gut (and the expertise of the authors) and updated the CPAN module.

1. Updated XML::LibXML to version 1.70 from CPAN
2. Updated libxml2 (version 2.7.6) from MacPorts

I've appended a fragment of the code I got working. It's not yet perfect
(for some[1] definition of perfect), but it works. That is, I did the
elegant "growth" phase but haven't completed the elegant "prune" phase.

Enjoy,
Michael

Notes:

1. For *this* definition of perfection...

   Perfection is achieved not when you have nothing more to add, but
   when you have nothing left to take away.
   -- Antoine de Saint-Exupery
   -- as quoted on http://perlgolf.sourceforge.net

================================================================

use XML::LibXML;

my %parse_options = (
    #suppress_warnings => 1,
    suppress_errors => 1,
    recover => 1,
    # validation => 0,
);

# Former versions...
my $dom;
if (XML::LibXML->can('load_html')) {
    # Works on mac at v1.70, but not on PC at v1.65
    # my $dom = $parser->load_html(string=>$content, \%parse_options);
    $dom = XML::LibXML->load_html(string=>$content, \%parse_options);
} else {
    # Works on PC at v1.65
    my $parser = XML::LibXML->new(\%parse_options);
    my $doc = $parser->parse_html_string($content, \%parse_options);
    $dom = $doc;
}

#... snip, snip...

# findnodes (plural) is the XML::LibXML method; it returns a list of nodes
my @nodes = $dom->findnodes($xpath);

--
Michael R. Wolf
    All mammals learn by playing!
        MichaelRWolf at att.net
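
For anyone who wants to try the approach end to end, here is a minimal
self-contained sketch. It assumes XML::LibXML 1.70 or later (where
load_html is available) on top of a working libxml2; the sample HTML and
the XPath expression are made up for illustration and are not taken from
Michael's scraper.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;    # assumes v1.70+, where load_html is available

# Hypothetical, deliberately sloppy HTML: unclosed <p> tags, no </html>.
my $content = <<'HTML';
<html><body>
<p class="price">10
<p class="price">20
</body>
HTML

# recover => 1 asks libxml2 to keep parsing past errors;
# suppress_errors keeps the parser's complaints off STDERR.
my $dom = XML::LibXML->load_html(
    string          => $content,
    recover         => 1,
    suppress_errors => 1,
);

# In list context findnodes returns every node matching the XPath.
my @nodes = $dom->findnodes('//p[@class="price"]');

for my $node (@nodes) {
    my $text = $node->textContent;
    $text =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    print "$text\n";
}
```

The point of the design is that libxml2's HTML parser, not the XPath
engine, absorbs the sloppiness: once the page has been parsed into a DOM,
XPath queries run against a tree that is well formed by construction.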