From toby.corkindale at strategicdata.com.au Wed Oct 3 16:56:44 2012 From: toby.corkindale at strategicdata.com.au (Toby Corkindale) Date: Thu, 04 Oct 2012 09:56:44 +1000 Subject: [Melbourne-pm] PostgreSQL and indexing with json types Message-ID: <506CD0BC.1080704@strategicdata.com.au> Hi, Just something interesting I noticed today.. As you've probably already seen, PostgreSQL 9.2 added support for storing JSON documents in a 'json' type field. It also adds the V8 javascript engine as a first-class procedural language. Initially I thought that you'd probably end up with a lot of tables with key metadata separated out into your int/char/etc types so you could index it.. but someone has pointed out that you can index into the json data using the v8 engine and Postgres' functional indexes. That is reasonably nifty. Currently you need to write a couple of helper functions yourself, but I suspect we'll see more of this brought into the built-in functions by 9.3. For examples, see: http://people.planetpostgresql.org/andrew/index.php?/archives/249-Using-PLV8-to-index-JSON.html Cheers, Toby From alfiej at opera.com Wed Oct 3 17:06:10 2012 From: alfiej at opera.com (Alfie John) Date: Thu, 04 Oct 2012 10:06:10 +1000 Subject: [Melbourne-pm] Meeting this month? Message-ID: <1349309170.2714.140661136187497.08AF4515@webmail.messagingengine.com> Hi guys, We were planning on having Rob Norris talk this month but he won't be able to present this month. Does anyone have a talk they would like to present for next week? Alfie -- Alfie John alfiej at opera.com From daniel at rimspace.net Wed Oct 3 18:40:57 2012 From: daniel at rimspace.net (Daniel Pittman) Date: Wed, 3 Oct 2012 18:40:57 -0700 Subject: [Melbourne-pm] PostgreSQL and indexing with json types In-Reply-To: <506CD0BC.1080704@strategicdata.com.au> References: <506CD0BC.1080704@strategicdata.com.au> Message-ID: On Wed, Oct 3, 2012 at 4:56 PM, Toby Corkindale wrote: > Just something interesting I noticed today.. 
> > As you've probably already seen, PostgreSQL 9.2 added support for storing > JSON documents in a 'json' type field. It also adds the V8 javascript engine > as a first-class procedural language. PostgreSQL, the best FOSS NoSQL available. :) > Initially I thought that you'd probably end up with a lot of tables with key > metadata separated out into your int/char/etc types so you could index it.. > but someone has pointed out that you can index into the json data using the > v8 engine and Postgres' functional indexes. > > That is reasonably nifty. Currently you need to write a couple of helper > functions yourself, but I suspect we'll see more of this brought into the > built-in functions by 9.3. Their XML and HSTORE engines have done unstructured data for a while, and the indexing over them (which is pretty much equivalent to the JSON indexing) is really, really good. So, not only does it work, it works damn well. Better still, combining unstructured and structured storage in one engine works well, and you get all the usual transactional magic. -- Daniel Pittman ? Made with 100 percent post-consumer electrons From slundie at westpac.com.au Thu Oct 4 17:08:45 2012 From: slundie at westpac.com.au (Sam Lundie) Date: Fri, 5 Oct 2012 10:08:45 +1000 Subject: [Melbourne-pm] I'm Overseas Message-ID: I will be out of the office starting 21/09/2012 and will not return until 09/10/2012. For anything urgent please see Josh Nast , for TRP enquiries Julie Langford. Thanks, Sam Lundie Unless otherwise stated, this email is confidential. If received in error, please delete and inform the sender by return email. Unauthorised use, copying or distribution is prohibited. Westpac Banking Corporation (ABN 33 007 457 141) is not responsible for viruses, or for delays, errors or interception in transmission. Unless stated or apparent from its terms, any opinion is not the opinion of Westpac Banking Corporation. 
This message also includes information on Westpac Institutional Bank available at westpac.com.au/wibinfo From alfiej at opera.com Tue Oct 9 14:32:42 2012 From: alfiej at opera.com (Alfie John) Date: Wed, 10 Oct 2012 08:32:42 +1100 Subject: [Melbourne-pm] No meeting this month In-Reply-To: <1349309170.2714.140661136187497.08AF4515@webmail.messagingengine.com> References: <1349309170.2714.140661136187497.08AF4515@webmail.messagingengine.com> Message-ID: <1349818362.28254.140661138623281.62A386F2@webmail.messagingengine.com> Hi guys, On Thu, Oct 4, 2012, at 11:06 AM, Alfie John wrote: > We were planning on having Rob Norris talk this month but he won't be > able to present this month. Does anyone have a talk they would like to > present for next week? Unfortunately with the lack of talks there will be no meeting tonight. Alfie -- Alfie John alfiej at opera.com From cas at taz.net.au Tue Oct 9 15:18:14 2012 From: cas at taz.net.au (Craig Sanders) Date: Wed, 10 Oct 2012 09:18:14 +1100 Subject: [Melbourne-pm] PostgreSQL and indexing with json types In-Reply-To: <506CD0BC.1080704@strategicdata.com.au> References: <506CD0BC.1080704@strategicdata.com.au> Message-ID: <20121009221814.GA6924@taz.net.au> On Thu, Oct 04, 2012 at 09:56:44AM +1000, Toby Corkindale wrote: > Hi, > Just something interesting I noticed today.. > > As you've probably already seen, PostgreSQL 9.2 added support for > storing JSON documents in a 'json' type field. this is a cool and interesting feature but it will serve to encourage even more morons to avoid even basic/minimalist database normalisation methods. i'm getting tired of seeing FOSS programs that use databases just stuff JSON or XML or CSV or similar into a text field. their reasoning seems to be that the data will only ever be used by their own web app or via their REST API so there's no need to spend any time thinking about or designing the database schema (or maybe there's no reasoning and they're just ignorant and stupid). 
Data lock-in is no less irritating in FOSS apps than it is in proprietary apps. also, why do something in the database server when you can do a half-arsed emulation of it 10,000 times less efficiently in your crappy php web app? pg's support for JSON fields isn't a bad thing - overall, it's a good thing...it will mitigate some of the problems - but it won't do anything to educate web-dev morons about database design. craig -- craig sanders BOFH excuse #233: TCP/IP UDP alarm threshold is set too low. From mathew.blair.robertson at gmail.com Tue Oct 9 16:09:41 2012 From: mathew.blair.robertson at gmail.com (Mathew Robertson) Date: Wed, 10 Oct 2012 10:09:41 +1100 Subject: [Melbourne-pm] PostgreSQL and indexing with json types In-Reply-To: <20121009221814.GA6924@taz.net.au> References: <506CD0BC.1080704@strategicdata.com.au> <20121009221814.GA6924@taz.net.au> Message-ID: > > > Just something interesting I noticed today.. > > > > As you've probably already seen, PostgreSQL 9.2 added support for > > storing JSON documents in a 'json' type field. > > this is a cool and interesting feature but it will serve to encourage > even more morons to avoid even basic/minimalist database normalisation > methods. > > [snipped] > > pg's support for JSON fields isn't a bad thing - overall, it's a good > thing...it will mitigate some of the problems - but it won't do anything > to educate web-dev morons about database design. > While having support for more languages within the server, is generally a good thing... I am generally in agreement that storing JSON directly to the database, is a bad thing -> as it will encourage developers to no bother to sanitize their inputs. cheers, Mathew -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From toby.corkindale at strategicdata.com.au Tue Oct 9 16:42:41 2012 From: toby.corkindale at strategicdata.com.au (Toby Corkindale) Date: Wed, 10 Oct 2012 10:42:41 +1100 Subject: [Melbourne-pm] PostgreSQL and indexing with json types In-Reply-To: <20121009221814.GA6924@taz.net.au> References: <506CD0BC.1080704@strategicdata.com.au> <20121009221814.GA6924@taz.net.au> Message-ID: <5074B671.90103@strategicdata.com.au> On 10/10/12 09:18, Craig Sanders wrote: > On Thu, Oct 04, 2012 at 09:56:44AM +1000, Toby Corkindale wrote: >> Hi, >> Just something interesting I noticed today.. >> >> As you've probably already seen, PostgreSQL 9.2 added support for >> storing JSON documents in a 'json' type field. > > this is a cool and interesting feature but it will serve to encourage > even more morons to avoid even basic/minimalist database normalisation > methods. > > i'm getting tired of seeing FOSS programs that use databases just > stuff JSON or XML or CSV or similar into a text field. > > their reasoning seems to be that the data will only ever be used by > their own web app or via their REST API so there's no need to spend any > time thinking about or designing the database schema (or maybe there's > no reasoning and they're just ignorant and stupid). Data lock-in is no > less irritating in FOSS apps than it is in proprietary apps. > > also, why do something in the database server when you can do a > half-arsed emulation of it 10,000 times less efficiently in your crappy > php web app? > > pg's support for JSON fields isn't a bad thing - overall, it's a good > thing...it will mitigate some of the problems - but it won't do anything > to educate web-dev morons about database design. Nothing has stopped people putting arbitrary documents into TEXT or BYTEA fields before -- I've seen plenty of cases where people use Perl's Storable, Python's Pickles, YAML, or custom serialisation, and just dump that into a text field in the database. 
In one company I worked at, someone had even written a query-creator which would build the appropriate LIKE phrase to search within the serialised format! (But needless to say, it performed atrociously slowly, because you're doing a full table scan every time..) I agree with you, that this practice is usually indicative of poor design -- even if the database offers ways to index and query the document natively. (Which is still a big improvement though) However, I think you're missing a point. Sometimes you are really storing a whole document, or at least a large amount of very structured, self-contained data. For these cases, JSON storage is quite useful and document storage is a lot neater (and performant) than having to write a whole heap of code to deparse and reparse it into all the constituent components. -Toby From toby.corkindale at strategicdata.com.au Wed Oct 10 15:33:11 2012 From: toby.corkindale at strategicdata.com.au (Toby Corkindale) Date: Thu, 11 Oct 2012 09:33:11 +1100 Subject: [Melbourne-pm] Survey about newcomer experience and citizenship behavior in the Perl community Message-ID: <5075F7A7.8030609@strategicdata.com.au> Forwarding this along as requested.. # ---------------------------------------- Hi, My name is Kevin Carillo and I am a PhD student at the School of Information Management of Victoria University of Wellington (New Zealand). I am currently running a survey that aims at studying how the experience of a Perl community newcomer has an influence on this person's actions and project contributions in the community. I would like to kindly request the leaders of the PerlMongers group to forward the survey invitation to their respective pm mailing lists. The more respondents we get, the more the data will help the overall Perl community. The dataset will be released under a CC license. Karen Pauley, Nat Torkington, and Mark Keating have already been informed about the research project and they all have been supportive and helpful. 
The study has been already advertised in different Perl resources. The survey targets contributors to Perl sub-projects endorsed by the Perl Foundation and who joined Perl within the last 2 years. You can find a blog post about the research project on blogs.perl.org that can be found at: http://blogs.perl.org/users/kevin_carillo/2012/10/newcomer-experience-and-contributor-behavior-in-perl-and-other-foss-communities---survey.html The direct link to the survey is: https://limesurvey.sim.vuw.ac.nz/index.php?sid=89971&lang=en This survey is anonymous, and no information is used to identify participants. The Human Ethics Committee of the School of Information Management has approved this research project. Thank you, Kevin Carillo School of Information Management Victoria University of Wellington PO Box 600, Wellington NEW ZEALAND (04) 463 5233 ext. 8679 | Room RH401 kevin.carillo at sim.vuw.ac.nz http://kevincarillo.org/ From alfiej at opera.com Tue Oct 16 15:52:57 2012 From: alfiej at opera.com (Alfie John) Date: Wed, 17 Oct 2012 09:52:57 +1100 Subject: [Melbourne-pm] Meeting for November? Message-ID: <1350427977.23416.140661141647593.246CB243@webmail.messagingengine.com> Hi guys, Does anyone have a talk they would like to present for a November meeting? Alfie -- Alfie John alfiej at opera.com From nathan.bailey at monash.edu Wed Oct 17 00:09:22 2012 From: nathan.bailey at monash.edu (Nathan Bailey) Date: Wed, 17 Oct 2012 18:09:22 +1100 Subject: [Melbourne-pm] Regexp: What's the right way to do this? Message-ID: <909FC6B9-5F2B-4747-9A98-C55901B99744@monash.edu> The code below works, but the commented out bits don't. I presume that when $shh and $smm are defined on the first loop through, they get undefined on the next time through? What's the "right" way to do this, TIMTOWTDI notwithstanding :-) N while(<>) { #if (($shh,$smm) = m#^\s*
(\d+):(\d+) -#) { if (m#^\s*
(\d+):(\d+) -#) { $shh = $1; $smm = 2; $event++; #} elsif (($fhh,$fmm) = m#^\s*(\d+):(\d+)
#) { } elsif (m#^\s*(\d+):(\d+)
#) { $fhh = $1; $fmm = 2; $event++; #} elsif (($summary) = m#^\s*

(.*)

#) { } elsif (m#^\s*

(.*)

#) { $summary = $1; $event++; } elsif (($description) = m#^\s*

(.*)

#) { $event++; } } From tjc at wintrmute.net Wed Oct 17 00:10:52 2012 From: tjc at wintrmute.net (Toby Wintermute) Date: Wed, 17 Oct 2012 18:10:52 +1100 Subject: [Melbourne-pm] Regexp: What's the right way to do this? In-Reply-To: <909FC6B9-5F2B-4747-9A98-C55901B99744@monash.edu> References: <909FC6B9-5F2B-4747-9A98-C55901B99744@monash.edu> Message-ID: On 17 October 2012 18:09, Nathan Bailey wrote: > The code below works, but the commented out bits don't. I presume that when $shh and $smm are defined on the first loop through, they get undefined on the next time through? > > What's the "right" way to do this, TIMTOWTDI notwithstanding :-) use HTML::TreeBuilder; -Toby From nathan.bailey at monash.edu Wed Oct 17 00:11:54 2012 From: nathan.bailey at monash.edu (Nathan Bailey) Date: Wed, 17 Oct 2012 18:11:54 +1100 Subject: [Melbourne-pm] Regexp: What's the right way to do this? In-Reply-To: References: <909FC6B9-5F2B-4747-9A98-C55901B99744@monash.edu> Message-ID: <8434DC51-14DA-4327-8BC6-F709FF01AB37@monash.edu> I knew someone would say that :P It's a regexp question, not an HTML parsing question! N On 17/10/2012, at 6:10 PM, Toby Wintermute wrote: > On 17 October 2012 18:09, Nathan Bailey wrote: >> The code below works, but the commented out bits don't. I presume that when $shh and $smm are defined on the first loop through, they get undefined on the next time through? >> >> What's the "right" way to do this, TIMTOWTDI notwithstanding :-) > > > use HTML::TreeBuilder; > > > -Toby From schwern at pobox.com Wed Oct 17 01:14:29 2012 From: schwern at pobox.com (Michael G Schwern) Date: Wed, 17 Oct 2012 01:14:29 -0700 Subject: [Melbourne-pm] Regexp: What's the right way to do this? In-Reply-To: <8434DC51-14DA-4327-8BC6-F709FF01AB37@monash.edu> References: <909FC6B9-5F2B-4747-9A98-C55901B99744@monash.edu> <8434DC51-14DA-4327-8BC6-F709FF01AB37@monash.edu> Message-ID: <507E68E5.3030705@pobox.com> "Should I use regexes to parse HTML?" No, do not use regexes to parse HTML. 
While it may seem easy to put together a quick and dirty HTML scanner with regexes, it will very quickly get very ugly. HTML parsing requires matching balanced characters and tags such as < and > and quotes which regexes do very poorly in addition to all the little special cases like comments. In addition, you're going to forget many small things, like casing and spaces, which you'll be hunting down forever. For example...
2:3 -
2:3 -

blah

blah

If you patch up your regexes to cover those, maybe an activity for the next meeting might be to come up with more to break your regexes. :) There, your regex question is answered. :P It's quicker even in the short run to use a pre existing, well documented, parser like HTML::TreeBuilder as evidenced by the fact that you're posting on a mailing list for help with your regex based HTML parser. You even get search facilities like XPath (see HTML::TreeBuilder::XPath and http://www.w3schools.com/xpath/). use HTML::TreeBuilder::XPath; use v5.14; my $tree= HTML::TreeBuilder::XPath->new; $tree->parse_file(shift); my @event_times = $tree->findnodes( '//div[starts-with(@class, "event-time-calendar-")]' ); for my $event_time (@event_times) { my($hour, $min) = $event_time->as_text =~ /(\d+):(\d+)/; say "Event at $hour:$min"; } Once you learn how to use an HTML parser and XPath you'll never have to write a hacky HTML regex parser again. O(1) learning efficiency! If you're doing this as an exercise in learning regexes, well, don't ignore the lesson just because its not what you expected to learn. If you want to learn "from scratch" look into writing a grammar parser. On 2012.10.17 12:11 AM, Nathan Bailey wrote:> I knew someone would say that :P > > It's a regexp question, not an HTML parsing question! > N > > On 17/10/2012, at 6:10 PM, Toby Wintermute wrote: > >> On 17 October 2012 18:09, Nathan Bailey wrote: >>> The code below works, but the commented out bits don't. I presume that when $shh and $smm are defined on the first loop through, they get undefined on the next time through? >>> >>> What's the "right" way to do this, TIMTOWTDI notwithstanding :-) >> >> >> use HTML::TreeBuilder; >> >> >> -Toby > > _______________________________________________ > Melbourne-pm mailing list > Melbourne-pm at pm.org > http://mail.pm.org/mailman/listinfo/melbourne-pm > -- s7ank: i want to be one of those guys that types "s/j&jd//.^$ueu*///djsls/sm." 
and it's a perl script that turns dog crap into gold. From nathan.bailey at monash.edu Wed Oct 17 03:36:56 2012 From: nathan.bailey at monash.edu (Nathan Bailey) Date: Wed, 17 Oct 2012 21:36:56 +1100 Subject: [Melbourne-pm] Regexps - how does the lexical scope of capture buffers work? (Was: Regexp: What's the right way to do this?) In-Reply-To: <507E68E5.3030705@pobox.com> References: <909FC6B9-5F2B-4747-9A98-C55901B99744@monash.edu> <8434DC51-14DA-4327-8BC6-F709FF01AB37@monash.edu> <507E68E5.3030705@pobox.com> Message-ID: <74EFE4B2-56A3-4FB2-869B-46ED388ECA38@monash.edu> I really wish I had obfuscated the contents of the regexps :-( My question, which I thought I had clearly stated, related to the lexical scope of capture buffers, and why one approach to capture buffers worked and another didn't. Let's try again: #if (($start_time) = m#^\s*(\d+:\d+) -#) { if (m#^\s*(\d+:\d+) -#) { $start_time = $2; #} elsif (($finish_time) = m#^\s*(\d+:\d+)#) { } elsif (m#^\s*(\d+:\d+)#) { $finish_time = $1; $event++; } Why is $start_time undefined when we get to $finish_time in the first version (commented out) and not in the second? And is there a good/better way to collect multiple values over multiple lines than this? thanks, Nathan
From shlomif at shlomifish.org Wed Oct 17 04:38:26 2012 From: shlomif at shlomifish.org (Shlomi Fish) Date: Wed, 17 Oct 2012 13:38:26 +0200 Subject: [Melbourne-pm] Regexps - how does the lexical scope of capture buffers work? (Was: Regexp: What's the right way to do this?) In-Reply-To: <74EFE4B2-56A3-4FB2-869B-46ED388ECA38@monash.edu> References: <909FC6B9-5F2B-4747-9A98-C55901B99744@monash.edu> <8434DC51-14DA-4327-8BC6-F709FF01AB37@monash.edu> <507E68E5.3030705@pobox.com> <74EFE4B2-56A3-4FB2-869B-46ED388ECA38@monash.edu> Message-ID: <20121017133826.76dcbeb3@lap.shlomifish.org> Hi Nathan, On Wed, 17 Oct 2012 21:36:56 +1100 Nathan Bailey wrote: > I really wish I had obfuscated the contents of the regexps :-( > > My question, which I thought I had clearly stated, related to the > lexical scope of capture buffers, and why one approach to capture > buffers worked and another didn't. > > Let's try again: > #if (($start_time) = m#^\s*(\d+:\d+) -#) { > if (m#^\s*(\d+:\d+) -#) { > $start_time = $2; > #} elsif (($finish_time) = m#^\s*(\d+:\d+)#) { > } elsif (m#^\s*(\d+:\d+)#) { > $finish_time = $1; > $event++; > } > > Why is $start_time undefined when we get to $finish_time in the first > version (commented out) and not in the second? > > And is there a good/better way to collect multiple values over > multiple lines than this? > Your code is hard to follow. Can you post a reproducing example for the issue you're trying to resolve, without the need to uncomment stuff? You can also post a working and non-working version. 
Regards,

	Shlomi Fish

> thanks,
> Nathan

-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
Stop Using MSIE - http://www.shlomifish.org/no-ie/

But if you're writing [open source software] for the world, you have to
listen to your customers - this doesn't change just because they're not
paying you in money.
    -- Eric S. Raymond in The Cathedral and the Bazaar

Please reply to list if it's a mailing list post - http://shlom.in/reply .

From nathan.bailey at monash.edu  Wed Oct 17 05:24:03 2012
From: nathan.bailey at monash.edu (Nathan Bailey)
Date: Wed, 17 Oct 2012 23:24:03 +1100
Subject: [Melbourne-pm] Regexps - how does the lexical scope of capture buffers work? (Was: Regexp: What's the right way to do this?)
In-Reply-To: 
References: <909FC6B9-5F2B-4747-9A98-C55901B99744@monash.edu> <8434DC51-14DA-4327-8BC6-F709FF01AB37@monash.edu> <507E68E5.3030705@pobox.com> <74EFE4B2-56A3-4FB2-869B-46ED388ECA38@monash.edu> <20121017133826.76dcbeb3@lap.shlomifish.org>
Message-ID: 

Oops, accidentally only replied to Shlomi.

With the input file:
8:10 -
9:15

a.pl produces:
8:10 - 9:15

but b.pl produces:
Why is start_time undefined now? at b.pl line 8, <> line 2.
 - 9:15

I presume it fails because the regexp fails, returning undef which is
then assigned to $start_time, rather than failing and skipping the
assignment.

I'm just wondering if there's a better way to grab text out of multiple
lines that are related to each other. A simple solution would be to go
for multi-line strings but I'm actually curious to know (a) if that's
the way the evaluation of the regexp and assignment works and (b) if
there are better ways of doing multi-line parsing, without simply
treating it as one big complex line.

N

PS: Files attached in a tarball.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: files.tgz
Type: application/octet-stream
Size: 366 bytes
Desc: not available
URL: 

From shlomif at shlomifish.org  Wed Oct 17 05:49:44 2012
From: shlomif at shlomifish.org (Shlomi Fish)
Date: Wed, 17 Oct 2012 14:49:44 +0200
Subject: [Melbourne-pm] Regexps - how does the lexical scope of capture buffers work? (Was: Regexp: What's the right way to do this?)
In-Reply-To: 
References: <909FC6B9-5F2B-4747-9A98-C55901B99744@monash.edu> <8434DC51-14DA-4327-8BC6-F709FF01AB37@monash.edu> <507E68E5.3030705@pobox.com> <74EFE4B2-56A3-4FB2-869B-46ED388ECA38@monash.edu> <20121017133826.76dcbeb3@lap.shlomifish.org>
Message-ID: <20121017144944.28fd3014@lap.shlomifish.org>

Hi Nathan,

On Wed, 17 Oct 2012 23:24:03 +1100 Nathan Bailey wrote:

> Oops, accidentally only replied to Shlomi.
>
> With the input file:
> 8:10 -
> 9:15
>
> a.pl produces:
> 8:10 - 9:15
>
> but b.pl produces:
> Why is start_time undefined now? at b.pl line 8, <> line 2.
> - 9:15
>
> I presume it fails because the regexp fails, returning undef which is
> then assigned to $start_time, rather than failing and skipping the
> assignment.

Well, with

    use strict;

    my ($start_time, $finish_time);
    while (<>) {
        if (($start_time) = m#^\s*(\d+:\d+) -#) {
            ;
        } elsif (($finish_time) = m#^\s*(\d+:\d+)#) {
            ;
            warn "Why is start_time undefined now?" if !defined $start_time;
            print "$start_time - $finish_time\n";
        }
    }

if the clause in the "if" line fails, then the list ($start_time) will be
assigned from the empty list, causing $start_time to become undef. What
you want is:

    if (my ($new_start_time) = m#^\s*(\d+:\d+) -#) {
        $start_time = $new_start_time;
    }

Untested.

Regards,

	Shlomi Fish

-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
UNIX Fortune Cookies - http://www.shlomifish.org/humour/fortunes/

Chuck Norris doesn't commit changes, the changes commit for him.
    -- Araujo

Please reply to list if it's a mailing list post - http://shlom.in/reply .
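A minimal sketch of the behaviour described in the reply above, assuming the two-line input from this thread ("8:10 -" followed by "9:15"). The regexes are the ones quoted in the thread; the sub name pair_times is hypothetical and only there for illustration. The first statement shows that a failed match in list context returns an empty list, so the list assignment resets the variable to undef. The sub shows one possible way around it: capture into a fresh lexical and copy it across only when the match succeeds.

```perl
use strict;
use warnings;

# A failed match in list context returns an empty list, so this list
# assignment overwrites the previous value of $kept with undef.
my $kept = '8:10';
($kept) = '9:15' =~ m#^\s*(\d+:\d+) -#;
print defined $kept ? "kept\n" : "clobbered to undef\n";

# pair_times is a hypothetical helper, not code from the thread.
# It pairs a start time on one line with a finish time on a later line.
sub pair_times {
    my @lines = @_;
    my ($start_time, @pairs);
    for my $line (@lines) {
        # Capture into a new lexical; $start_time changes only on success.
        if (my ($new_start) = $line =~ m#^\s*(\d+:\d+) -#) {
            $start_time = $new_start;
        }
        elsif (my ($finish) = $line =~ m#^\s*(\d+:\d+)#) {
            push @pairs, "$start_time - $finish" if defined $start_time;
            undef $start_time;    # this pair is complete, start afresh
        }
    }
    return @pairs;
}

print "$_\n" for pair_times("8:10 -\n", "9:15\n");
```

Keeping "what this line matched" ($new_start, $finish) separate from "what is being remembered" ($start_time) means a non-matching line can never wipe out state collected from an earlier line.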
From schwern at pobox.com  Wed Oct 17 12:57:30 2012
From: schwern at pobox.com (Michael G Schwern)
Date: Wed, 17 Oct 2012 12:57:30 -0700
Subject: [Melbourne-pm] Regexps - how does the lexical scope of capture buffers work? (Was: Regexp: What's the right way to do this?)
In-Reply-To: <74EFE4B2-56A3-4FB2-869B-46ED388ECA38@monash.edu>
References: <909FC6B9-5F2B-4747-9A98-C55901B99744@monash.edu> <8434DC51-14DA-4327-8BC6-F709FF01AB37@monash.edu> <507E68E5.3030705@pobox.com> <74EFE4B2-56A3-4FB2-869B-46ED388ECA38@monash.edu>
Message-ID: <507F0DAA.6030105@pobox.com>

On 2012.10.17 3:36 AM, Nathan Bailey wrote:
> I really wish I had obfuscated the contents of the regexps :-(
>
> My question, which I thought I had clearly stated, related to the lexical scope of capture buffers, and why one approach to capture buffers worked and another didn't.
>
> Let's try again:
> #if (($start_time) = m#^\s*(\d+:\d+) -#) {
> if (m#^\s*(\d+:\d+) -#) {
>     $start_time = $2;
> #} elsif (($finish_time) = m#^\s*(\d+:\d+)#) {
> } elsif (m#^\s*(\d+:\d+)#) {
>     $finish_time = $1;
>     $event++;
> }
>
> Why is $start_time undefined when we get to $finish_time in the
> first version (commented out) and not in the second?

To clarify...

    if (($start_time) = m#^\s*(\d+:\d+) -#) {
        print "$start_time\n";
    } elsif (($finish_time) = m#^\s*(\d+:\d+)#) {
        print "$finish_time\n";
    }

vs

    if (m#^\s*(\d+:\d+) -#) {
        $start_time = $2;    # that should be $1
        print "$start_time\n";
    } elsif (m#^\s*(\d+:\d+)#) {
        $finish_time = $1;
        $event++;
        print "$finish_time\n";
    }

The regexes are a red herring. This has to do with how lexical variables
and conditions work.

Presumably you run this code more than once, maybe in a loop. And maybe
$start_time and $finish_time are globals, or they're lexicals but declared
outside the loop like this...

    my($start_time, $finish_time);
    while() {
        if( m#^\s*(\d+:\d+) -# ) {
            $start_time = $1;
        } elsif( m#^\s*(\d+:\d+)# ) {
            $finish_time = $1;
        }
        print "$start_time - $finish_time\n";
    }

In the above version $start_time and $finish_time are only changed if their
regexes match. And because it's an if/elsif condition only one of them is
going to change per loop. But their values persist from one loop to the
next, so you're A) only ever going to get one of them set and B) you're
always going to get one of them from the last loop. This is bad.

    my($start_time, $finish_time);
    while() {
        if( ($start_time) = m#^\s*(\d+:\d+) -# ) {
            ...
        } elsif( ($finish_time) = m#^\s*(\d+:\d+)# ) {
            ...
        }
        print "$start_time - $finish_time\n";
    }

You're in the same boat here, only now because the first condition always
runs $start_time will always be set to something. Maybe a value, maybe
undef. Either way, there's still data persisting from one iteration to the
next which I presume you don't want? Even if you do, you're better off
having variables for "what I saw this iteration" and "what I'm remembering".

Simplest way to fix this is to move the lexical variables inside the loop
so they're cleared on every iteration.

    while() {
        my($start_time, $finish_time);
        if( m#^\s*(\d+:\d+) -# ) {
            $start_time = $1;
        } elsif( m#^\s*(\d+:\d+)# ) {
            $finish_time = $1;
        }
        print "$start_time - $finish_time\n";
    }

Eliminating the regexes might make it clearer.

    my($odd, $even);
    for my $num (1..10) {
        if( $num % 2 ) { $odd = $num }
        elsif( !($num % 2) ) { $even = $num }
        print "$even - $odd\n";
    }

vs

    for my $num (1..10) {
        my($odd, $even);
        if( $num % 2 ) { $odd = $num }
        elsif( !($num % 2) ) { $even = $num }
        print "$even - $odd\n";
    }

Still not a regex question. ;)

-- 
54. "Napalm sticks to kids" is *not* a motivational phrase.
    -- The 213 Things Skippy Is No Longer Allowed To Do In The U.S. Army
           http://skippyslist.com/list/

From schwern at pobox.com  Wed Oct 17 13:45:17 2012
From: schwern at pobox.com (Michael G Schwern)
Date: Wed, 17 Oct 2012 13:45:17 -0700
Subject: [Melbourne-pm] Regexps - how does the lexical scope of capture buffers work? (Was: Regexp: What's the right way to do this?)
In-Reply-To: 
References: <909FC6B9-5F2B-4747-9A98-C55901B99744@monash.edu> <8434DC51-14DA-4327-8BC6-F709FF01AB37@monash.edu> <507E68E5.3030705@pobox.com> <74EFE4B2-56A3-4FB2-869B-46ED388ECA38@monash.edu> <20121017133826.76dcbeb3@lap.shlomifish.org>
Message-ID: <507F18DD.7000900@pobox.com>

On 2012.10.17 5:24 AM, Nathan Bailey wrote:
> I'm just wondering if there's a better way to grab text out of
> multiple lines that are related to each other. A simple solution
> would be to go for multi-line strings but I'm actually curious
> to know (a) if that's the way the evaluation of the regexp and
> assignment works and (b) if there are better ways of doing
> multi-line parsing, without simply treating it as one big complex
> line.

You don't want the answer to be "use an HTML parser", so here's a sort of
a Look Into Your Future as you try to parse HTML with regexes...

Reading your original code, it seems like you're trying to parse this:
12:45 - 14:00
but doing it line by line with individual regexes. HTML doesn't give two hoots about newlines, so trying to understand it line by line has lots of problems. This means you have to carry state over from one line to another, which gets complicated. Worse, you have to check that nothing else came between them else you get fooled by this:
12:45 -
The time is now 05:46
If you try to parse as one big string... /\G
(\d+):(\d+) - (\d+):(\d+)
/msg That works for this:
12:45 - 14:00
15:00 - 16:00
But to account for whitespace and casing the regex really needs to be... /\G\s+(\d+):(\d+)\s+-\s+(\d+):(\d+)/imsg Yuck. You'll run into trouble with this: /\G.*

/msgi

foo

bar

Slurped up too much. Have to make it non-greedy. /\G(.*?)

/msgi And then you're told its not just

tags that might contain the summary, but

tags as well. This gets into the joy of variable balancing tags in regexes. /\G<(p|div)\s+class\s+=\s+"summary"\s+>(.*?)/msgi And it all seems to be working fine until...

foo

bar

baz

Now you're hosed. Regexes are *terrible* at trying to match nested balanced delimiters. HTML is all about nested balanced delimiters. Solving this requires wall-banging complexity. http://perldoc.perl.org/perlfaq4.html#How-do-I-find-matching%2fnesting-anything%3f http://perldoc.perl.org/perlre.html#%28%3fPARNO%29-%28%3f-PARNO%29-%28%3f%2bPARNO%29-%28%3fR%29-%28%3f0%29 https://metacpan.org/module/Regexp::Common::balanced There are a class of problems which look easy to solve with regexes, but are actually nigh impossible to get even mostly correct. This is one of them. I expect you'll try anyway. :) "Parse HTML with a regex" is right up there with "write a template language" and "write an ORM" for Perl rites of passage. -- 185. My name is not a killing word. -- The 213 Things Skippy Is No Longer Allowed To Do In The U.S. Army http://skippyslist.com/list/ From peter at vereshagin.org Wed Oct 17 14:33:44 2012 From: peter at vereshagin.org (Peter Vereshagin) Date: Thu, 18 Oct 2012 01:33:44 +0400 Subject: [Melbourne-pm] Regexps - how does the lexical scope of capture buffers work? (Was: Regexp: What's the right way to do this?) In-Reply-To: <507F18DD.7000900@pobox.com> References: <909FC6B9-5F2B-4747-9A98-C55901B99744@monash.edu> <8434DC51-14DA-4327-8BC6-F709FF01AB37@monash.edu> <507E68E5.3030705@pobox.com> <74EFE4B2-56A3-4FB2-869B-46ED388ECA38@monash.edu> <20121017133826.76dcbeb3@lap.shlomifish.org> <507F18DD.7000900@pobox.com> Message-ID: <20121017213344.GD5407@external.screwed.box> Hi guys. The what is wrong with this: #!/usr/bin/env perl use strict; use warnings; use autodie; my $str = ''; while ( my $buf .= ) { $str .= $buf; if (my ( $hh_mm_start => $hh_mm_end ) = $str =~ m/ ]*>\s* (\d\d?:\d\d?) \s*-\s* (\d\d?:\d\d?) /sx ) { use Data::Dump; ddx $hh_mm_start => $hh_mm_end; $str = ''; } } __DATA__
12:45 - 14:00
12:45 -
The time is now 05:46
? There is also an old bold P::RecD : http://search.cpan.org/dist/Parse-RecDescent/ I use to parse MySQL dumps with it here: http://gitweb.vereshagin.org/endvance/blob_plain/HEAD:/endvance/README But surely HTML::* can make you happy, too. 2012/10/17 13:45:17 -0700 Michael G Schwern => To melbourne-pm at pm.org : MGS> On 2012.10.17 5:24 AM, Nathan Bailey wrote: MGS> > I'm just wondering if there's a better way to grab text out of MGS> > multiple lines that are related to each other. A simple solution MGS> > would be to go for multi-line strings but I'm actually curious MGS> > to know (a) if that's the way the evaluation of the regexp and MGS> > assignment works and (b) if there are better ways of doing MGS> > multi-line parsing, without simply treating it as one big complex MGS> > line. MGS> MGS> You don't want to the answer to be "use an HTML parser", so here's a sort of a MGS> Look Into Your Future as you try to parse HTML with regexes... MGS> MGS> Reading your original code, it seems like you're trying to parse this: MGS> MGS>
12:45 - MGS> 14:00
MGS> MGS> but doing it line by line with individual regexes. HTML doesn't give two MGS> hoots about newlines, so trying to understand it line by line has lots of MGS> problems. This means you have to carry state over from one line to another, MGS> which gets complicated. Worse, you have to check that nothing else came MGS> between them else you get fooled by this: MGS> MGS>
12:45 - MGS>
MGS>
The time is now MGS> 05:46
MGS> MGS> If you try to parse as one big string... MGS> MGS> /\G
(\d+):(\d+) - (\d+):(\d+)
/msg MGS> MGS> That works for this: MGS> MGS>
12:45 - 14:00
15:00 - 16:00
MGS> MGS> But to account for whitespace and casing the regex really needs to be... MGS> MGS> /\G\s+(\d+):(\d+)\s+-\s+(\d+):(\d+)
/imsg MGS> MGS> Yuck. MGS> MGS> You'll run into trouble with this: MGS> MGS> /\G.*

/msgi MGS> MGS>

foo

MGS>

bar

MGS> MGS> Slurped up too much. Have to make it non-greedy. MGS> MGS> /\G(.*?)

/msgi MGS> MGS> And then you're told its not just

tags that might contain the summary, but MGS>

tags as well. This gets into the joy of variable balancing tags in regexes. MGS> MGS> /\G<(p|div)\s+class\s+=\s+"summary"\s+>(.*?)/msgi MGS> MGS> And it all seems to be working fine until... MGS> MGS>

foo

bar

baz

MGS> MGS> Now you're hosed. Regexes are *terrible* at trying to match nested balanced MGS> delimiters. HTML is all about nested balanced delimiters. Solving this MGS> requires wall-banging complexity. MGS> http://perldoc.perl.org/perlfaq4.html#How-do-I-find-matching%2fnesting-anything%3f MGS> http://perldoc.perl.org/perlre.html#%28%3fPARNO%29-%28%3f-PARNO%29-%28%3f%2bPARNO%29-%28%3fR%29-%28%3f0%29 MGS> https://metacpan.org/module/Regexp::Common::balanced MGS> MGS> There are a class of problems which look easy to solve with regexes, but are MGS> actually nigh impossible to get even mostly correct. This is one of them. MGS> MGS> I expect you'll try anyway. :) "Parse HTML with a regex" is right up there MGS> with "write a template language" and "write an ORM" for Perl rites of passage. MGS> MGS> MGS> -- MGS> 185. My name is not a killing word. MGS> -- The 213 Things Skippy Is No Longer Allowed To Do In The U.S. Army MGS> http://skippyslist.com/list/ MGS> _______________________________________________ MGS> Melbourne-pm mailing list MGS> Melbourne-pm at pm.org MGS> http://mail.pm.org/mailman/listinfo/melbourne-pm -- Peter Vereshagin (http://vereshagin.org) pgp: A0E26627 From peter at vereshagin.org Wed Oct 17 14:42:58 2012 From: peter at vereshagin.org (Peter Vereshagin) Date: Thu, 18 Oct 2012 01:42:58 +0400 Subject: [Melbourne-pm] Regexps - how does the lexical scope of capture buffers work? (Was: Regexp: What's the right way to do this?) In-Reply-To: <20121017213344.GD5407@external.screwed.box> References: <909FC6B9-5F2B-4747-9A98-C55901B99744@monash.edu> <8434DC51-14DA-4327-8BC6-F709FF01AB37@monash.edu> <507E68E5.3030705@pobox.com> <74EFE4B2-56A3-4FB2-869B-46ED388ECA38@monash.edu> <20121017133826.76dcbeb3@lap.shlomifish.org> <507F18DD.7000900@pobox.com> <20121017213344.GD5407@external.screwed.box> Message-ID: <20121017214258.GE5407@external.screwed.box> Hello. 
2012/10/18 01:33:44 +0400 Peter Vereshagin => To melbourne-pm at pm.org : PV> The what is wrong with this: PV> ? Oops: -while ( my $buf .= ) { +while ( my $buf = ) { -- Peter Vereshagin (http://vereshagin.org) pgp: A0E26627 From schwern at pobox.com Wed Oct 17 15:46:45 2012 From: schwern at pobox.com (Michael G Schwern) Date: Wed, 17 Oct 2012 15:46:45 -0700 Subject: [Melbourne-pm] Regexps - how does the lexical scope of capture buffers work? (Was: Regexp: What's the right way to do this?) In-Reply-To: <20121017213344.GD5407@external.screwed.box> References: <909FC6B9-5F2B-4747-9A98-C55901B99744@monash.edu> <8434DC51-14DA-4327-8BC6-F709FF01AB37@monash.edu> <507E68E5.3030705@pobox.com> <74EFE4B2-56A3-4FB2-869B-46ED388ECA38@monash.edu> <20121017133826.76dcbeb3@lap.shlomifish.org> <507F18DD.7000900@pobox.com> <20121017213344.GD5407@external.screwed.box> Message-ID: <507F3555.5020404@pobox.com> On 2012.10.17 2:33 PM, Peter Vereshagin wrote: > The what is wrong with this: Very clever. Now match the contents of

and

and associate all three pieces of data together. (It finds all div's with the proper content rather than just the ones of the event-time calendar class, but that's a fairly trivial fix.) > #!/usr/bin/env perl > use strict; > use warnings; > use autodie; > > my $str = ''; > while ( my $buf .= ) { > $str .= $buf; > if (my ( $hh_mm_start => $hh_mm_end ) > = $str =~ m/ > ]*>\s* > (\d\d?:\d\d?) > \s*-\s* > (\d\d?:\d\d?) > /sx > ) > { > use Data::Dump; > ddx $hh_mm_start => $hh_mm_end; > $str = ''; > } > } -- Life is like a sewer - what you get out of it depends on what you put into it. - Tom Lehrer From nathan.bailey at monash.edu Wed Oct 17 16:33:35 2012 From: nathan.bailey at monash.edu (Nathan Bailey) Date: Thu, 18 Oct 2012 10:33:35 +1100 Subject: [Melbourne-pm] Regexps - how does the lexical scope of capture buffers work? (Was: Regexp: What's the right way to do this?) In-Reply-To: <507F0DAA.6030105@pobox.com> References: <909FC6B9-5F2B-4747-9A98-C55901B99744@monash.edu> <8434DC51-14DA-4327-8BC6-F709FF01AB37@monash.edu> <507E68E5.3030705@pobox.com> <74EFE4B2-56A3-4FB2-869B-46ED388ECA38@monash.edu> <507F0DAA.6030105@pobox.com> Message-ID: On 18/10/2012, at 6:57 AM, Michael G Schwern wrote: > On 2012.10.17 3:36 AM, Nathan Bailey wrote: > In the above version $start_time and $finish_time are only changed if their > regexes match. And because it's an if/elsif condition only one of them is > going to change per loop. But their values persist from one loop to the next, > so you're A) only ever going to get one of them set and B) you're always going > to get one of them from the last loop. This is bad. To be clear, this was b.pl: use strict; my ($start_time, $finish_time); while (<>) { if (($start_time) = m#^\s*(\d+:\d+) -#) { ; } elsif (($finish_time) = m#^\s*(\d+:\d+)#) { ; warn "Why is start_time undefined now?" 
if !defined $start_time; print "$start_time - $finish_time\n"; } } and this was input.txt: 8:10 - 9:15 As Shlomi noted, the regexp capture buffer asks "What are the contents of the match?" and as the match failed, the contents are undefined. Perl then happily assigns undef to the left-hand side ($start_time), overwriting the "8:10" successfully read in the previous iteration. So we get the output " - 9:15" on the second iteration of the loop, rather than the more desirable "8:10 - 9:15". My first question is really a language design one. Regexp evaluations short circuit on failure; why don't if statement assignments do the same? I would think the above use case is far more common/likely than the current one, which would theoretically allow someone to collect a bunch of undefs through each loop iteration for the ifs that fail (and as you note, there are other ways to get the right-hand side to fail into undef). My second question is what's a better way to do this. I can think of two ways: 1. Assign the capture buffer (ie. $start_time = $1), which is what a.pl does 2. Use a multi-line string regexp that pulls out both start and finish time at once I was wondering if there was a deep fu way that I hadn't considered. N From schwern at pobox.com Wed Oct 17 22:23:46 2012 From: schwern at pobox.com (Michael G Schwern) Date: Wed, 17 Oct 2012 22:23:46 -0700 Subject: [Melbourne-pm] Regexps - how does the lexical scope of capture buffers work? (Was: Regexp: What's the right way to do this?) In-Reply-To: References: <909FC6B9-5F2B-4747-9A98-C55901B99744@monash.edu> <8434DC51-14DA-4327-8BC6-F709FF01AB37@monash.edu> <507E68E5.3030705@pobox.com> <74EFE4B2-56A3-4FB2-869B-46ED388ECA38@monash.edu> <507F0DAA.6030105@pobox.com> Message-ID: <507F9262.3060007@pobox.com> On 2012.10.17 4:33 PM, Nathan Bailey wrote: > My first question is really a language design one. Regexp evaluations > short circuit on failure; why don't if statement assignments do the same? 
> I would think the above use case is far more common/likely than the current
> one, which would theoretically allow someone to collect a bunch of undefs
> through each loop iteration for the ifs that fail (and as you note, there
> are other ways to get the right-hand side to fail into undef).

I'm not sure what you mean by "regexp evaluations short circuit on failure".
I'm going to assume you're asking why when you run this code...

    sub bar { 0 }

    $foo = 42;
    if( $foo = bar() ) {
        ...
    }
    else {
        print $foo;  # what do you expect here?
    }

...why doesn't it print 42?

From a pragmatic POV, it's impossible to evaluate arbitrary code without
actually running it. See also "The Halting Problem". Once you've run it,
you'd have to roll back any changes it made which isn't possible in most
languages/interpreters. It's theoretically possible using something called
Software Transactional Memory but that's way beyond Perl.
http://en.wikipedia.org/wiki/Software_transactional_memory

And then there's side effects, printing to the screen, setting global
variables, network, disk and database access... how do you control them? I
don't even think STM can account for that.

From a language design perspective, there's lots and lots of cases where you
want to use changes and side effects from a failed conditional. Here's a
couple examples off the top of my head.

The first illustrates where you want to use a side effect from a failed
condition.

    if( open my $fh, $file ) {
        print <$fh>;
    }
    else {
        # $! is a global set as a side effect of the failed open
        print "There was an error: $!\n";
    }

This one uses a change, in this case a variable assignment.

    # This is longhand for open || die
    if( !open $fh, $file ) {
        die "Can't open $file: $!";
    }

    # The condition failed, but we still want to use a variable assigned
    # in it.
    print <$fh>;

Regexes, OTOH, are their own little machines within a machine with clear
boundaries where they communicate with Perl.
They don't so much short-circuit on failure as they simply do not clear out their associated global variables until they have to. I'm willing to bet this was originally an implementation quirk, possibly an overly aggressive optimization, which became a feature and/or compatibility issue. If you were making it today, you'd want each regex to clear its associated globals to avoid exactly the sort of problem you're having. Better yet, you wouldn't use globals and avoid the problem of regexes clobbering each other. The regex would return a match object you could get information out of. # something like this if( my $match = $string =~ /foo (.*?) bar/ ) { print $match->capture(1); } > My second question is what's a better way to do this. I can think of two ways: > 1. Assign the capture buffer (ie. $start_time = $1), which is what a.pl does > 2. Use a multi-line string regexp that pulls out both start and finish time at once > > I was wondering if there was a deep fu way that I hadn't considered. Use a p--... oh nevermind. :P -- Anyway, last I saw him, the TPF goons were pouring concrete around him, leaving only one hole each for air, tea, and power. No ethernet, because he's using git. -- Eric Wilhelm on one of my disappearances From peter at vereshagin.org Wed Oct 17 22:34:32 2012 From: peter at vereshagin.org (Peter Vereshagin) Date: Thu, 18 Oct 2012 09:34:32 +0400 Subject: [Melbourne-pm] Regexps - how does the lexical scope of capture buffers work? (Was: Regexp: What's the right way to do this?) In-Reply-To: <507F3555.5020404@pobox.com> References: <8434DC51-14DA-4327-8BC6-F709FF01AB37@monash.edu> <507E68E5.3030705@pobox.com> <74EFE4B2-56A3-4FB2-869B-46ED388ECA38@monash.edu> <20121017133826.76dcbeb3@lap.shlomifish.org> <507F18DD.7000900@pobox.com> <20121017213344.GD5407@external.screwed.box> <507F3555.5020404@pobox.com> Message-ID: <20121018053432.GF5407@external.screwed.box> Hello. 
2012/10/17 15:46:45 -0700 Michael G Schwern => To melbourne-pm at pm.org : MGS> On 2012.10.17 2:33 PM, Peter Vereshagin wrote: MGS> > The what is wrong with this: MGS> MGS> Very clever. Now match the contents of

and

class="description"> and associate all three pieces of data together. Thank me for your very clever note. MGS> (It finds all div's with the proper content rather than just the ones of the MGS> event-time calendar class, but that's a fairly trivial fix.) http://paste2.org/p/2348988 -- Peter Vereshagin (http://vereshagin.org) pgp: A0E26627 From peter at vereshagin.org Wed Oct 17 22:41:52 2012 From: peter at vereshagin.org (Peter Vereshagin) Date: Thu, 18 Oct 2012 09:41:52 +0400 Subject: [Melbourne-pm] Regexps - how does the lexical scope of capture buffers work? (Was: Regexp: What's the right way to do this?) In-Reply-To: References: <909FC6B9-5F2B-4747-9A98-C55901B99744@monash.edu> <8434DC51-14DA-4327-8BC6-F709FF01AB37@monash.edu> <507E68E5.3030705@pobox.com> <74EFE4B2-56A3-4FB2-869B-46ED388ECA38@monash.edu> <507F0DAA.6030105@pobox.com> Message-ID: <20121018054152.GG5407@external.screwed.box> Hello. 2012/10/18 10:33:35 +1100 Nathan Bailey => To Michael G Schwern : NB> if (($start_time) = m#^\s*(\d+:\d+) -#) { NB> My first question is really a language design one. Regexp evaluations short circuit on failure; why don't if statement assignments do the same? Perhaps for '=~' operator no any variable changes when regex doesn't match. And returns empty array in array context. But in the code above you have the assignment '=' operator. So IMO in the case the regex doesn't match you have this: if ( ($start_time) = () ) { thus effectively undef()ining the scalar variable. -- Peter Vereshagin (http://vereshagin.org) pgp: A0E26627 From nathan.bailey at monash.edu Wed Oct 17 22:59:02 2012 From: nathan.bailey at monash.edu (Nathan Bailey) Date: Thu, 18 Oct 2012 16:59:02 +1100 Subject: [Melbourne-pm] Regexps - how does the lexical scope of capture buffers work? (Was: Regexp: What's the right way to do this?) 
In-Reply-To: <507F9262.3060007@pobox.com> References: <909FC6B9-5F2B-4747-9A98-C55901B99744@monash.edu> <8434DC51-14DA-4327-8BC6-F709FF01AB37@monash.edu> <507E68E5.3030705@pobox.com> <74EFE4B2-56A3-4FB2-869B-46ED388ECA38@monash.edu> <507F0DAA.6030105@pobox.com> <507F9262.3060007@pobox.com> Message-ID: <0F5E096A-2B46-4E7D-BE00-AA46BC3C75E6@monash.edu> On 18/10/2012, at 4:23 PM, Michael G Schwern wrote: > On 2012.10.17 4:33 PM, Nathan Bailey wrote: > I'm not sure what you mean by "regexp evaluations short circuit on failure". As I understand it, the 'c' in the below regular expression never gets evaluated: if ("aa" =~ /bc/) { ... > I'm going to assume you're asking why when you run this code... > if( $foo = bar() ) { ... > And then there's side effects, printing to the screen, setting global > variables, network, disk and database access... how do you control them? I > don't even think STE can account for that. Thank-you, that's actually a really good answer - if the if statement includes some major side effect, it's not reasonable to expect that it could be undone on failure, and it is reasonable to expect that someone might want to record that failure, separate from the execution of the subsequent block of code. > The regex would return a match object you could get information out of. > # something like this > if( my $match = $string =~ /foo (.*?) bar/ ) { > print $match->capture(1); > } Interesting. That has a certain elegance to it. Maybe we should hassle Damian :-) >> I was wondering if there was a deep fu way that I hadn't considered. > Use a p--... oh nevermind. :P parser? I suspect Peter's Parse::RecDescent suggestion is actually the generic answer to my question (of which HTML::{TreeBuilder,TokeParser} and its cousins are a specific case for HTML). Beyond a certain point of regexp fu, you have to look at the document rather than the line. 
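For the simple two-line case that started the thread, "looking at the document rather than the line" can be sketched with a single regex run over the whole input. This is only an illustration under stated assumptions: the sample text is modelled on the input.txt quoted earlier (start time followed by " -", finish time on the next line), and time_ranges is a made-up helper name. It has all the fragility described above once real HTML is involved.

```perl
use strict;
use warnings;

# Assumed sample input, modelled on the input.txt quoted earlier in the
# thread: a start time ending in " -" and the finish time on the next line.
my $text = "8:10 -\n9:15\n12:45 -\n14:00\n";

# Hypothetical helper: return [start, finish] pairs found anywhere in $text.
sub time_ranges {
    my ($text) = @_;
    my @ranges;
    # \s* also matches the newline, so one pair may span two lines.
    # With /g in scalar context each iteration resumes after the last match.
    while ( $text =~ /(\d+:\d+)\s*-\s*(\d+:\d+)/g ) {
        push @ranges, [ $1, $2 ];
    }
    return @ranges;
}

# prints "8:10 - 9:15" then "12:45 - 14:00"
print "$_->[0] - $_->[1]\n" for time_ranges($text);
```

Because both times come out of the same successful match, nothing has to be carried over from one loop iteration to the next, which sidesteps the undef-on-failed-match surprise.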
> -- > Anyway, last I saw him, the TPF goons were pouring concrete around him, > leaving only one hole each for air, tea, and power. No ethernet, > because he's using git. > -- Eric Wilhelm on one of my disappearances There seems to be a certain lack of output capacity in this model? (and I'm not referring to the code :-) N From schwern at pobox.com Wed Oct 17 23:42:20 2012 From: schwern at pobox.com (Michael G Schwern) Date: Wed, 17 Oct 2012 23:42:20 -0700 Subject: [Melbourne-pm] Regexps - how does the lexical scope of capture buffers work? (Was: Regexp: What's the right way to do this?) In-Reply-To: <0F5E096A-2B46-4E7D-BE00-AA46BC3C75E6@monash.edu> References: <909FC6B9-5F2B-4747-9A98-C55901B99744@monash.edu> <8434DC51-14DA-4327-8BC6-F709FF01AB37@monash.edu> <507E68E5.3030705@pobox.com> <74EFE4B2-56A3-4FB2-869B-46ED388ECA38@monash.edu> <507F0DAA.6030105@pobox.com> <507F9262.3060007@pobox.com> <0F5E096A-2B46-4E7D-BE00-AA46BC3C75E6@monash.edu> Message-ID: <507FA4CC.7080104@pobox.com> On 2012.10.17 10:59 PM, Nathan Bailey wrote: > On 18/10/2012, at 4:23 PM, Michael G Schwern wrote: >> On 2012.10.17 4:33 PM, Nathan Bailey wrote: >> I'm not sure what you mean by "regexp evaluations short circuit on failure". > > As I understand it, the 'c' in the below regular expression never gets evaluated: > if ("aa" =~ /bc/) { ... Sooorta. It gets evaluated in the process of compiling the regex, but when run it never bothers to check if there's a 'c' because there's never a 'b'... maybe. It depends on how the regex is implemented. It's possible instead of looking first for 'b' and then 'c' looks for 'bc'... but go a step down and the string comparison probably never tries to compare "b" to "c". This is basically how strcmp works. # Pretend this is a low level language... sub strcmp { my($left, $right) = @_; # Different lengths, don't bother comparing. 
# (This isn't efficient in C, but it is in Perl) return 0 if length($left) != length($right); for my $idx (0..length($left)-1) { # If you encounter a different character, stop. return 0 if substr($left, $idx, 1) ne substr($right, $idx, 1); } # Made it this far, must be the same return 1; } The key thing that separates that from trying to roll back a condition is strcmp() doesn't change anything outside its scope in the process of doing its work. There's nothing to roll back, you just stop, and there's no side effects. >> I'm going to assume you're asking why when you run this code... >> if( $foo = bar() ) { > ... >> And then there's side effects, printing to the screen, setting global >> variables, network, disk and database access... how do you control them? I >> don't even think STE can account for that. > > Thank-you, that's actually a really good answer - if the if statement includes > some major side effect, it's not reasonable to expect that it could be undone > on failure, and it is reasonable to expect that someone might want to record > that failure, separate from the execution of the subsequent block of code. Yes. Any side effect. Even simple assignment is a side effect. >> The regex would return a match object you could get information out of. >> # something like this >> if( my $match = $string =~ /foo (.*?) bar/ ) { >> print $match->capture(1); >> } > > Interesting. That has a certain elegance to it. Maybe we should hassle Damian :-) > >>> I was wondering if there was a deep fu way that I hadn't considered. >> Use a p--... oh nevermind. :P > > parser? I suspect Peter's Parse::RecDescent suggestion is actually the generic answer > to my question (of which HTML::{TreeBuilder,TokeParser} and its cousins are a specific > case for HTML). Beyond a certain point of regexp fu, you have to look at the document > rather than the line. Pretty much. Though to be precise, it's not so much about looking at the document as it is understanding the grammar. 
You can still work on a complicated document element by element, but those
elements are not delimited by newlines. They're delimited by... something
else. Once you understand the grammar you can iterate through the elements
like you iterate through lines. Except they can be nested. Not a perfect
analogy.

For example, most HTML/XML parsers parse the whole document into a DOM
(Document Object Model), basically a bunch of objects representing all the
things in the document. This is very convenient to work with, it allows
things like XPath I showed earlier, but consumes a lot of memory and you
can't do anything until it's done parsing. This is sort of like slurping a
whole file into an array.

OTOH a SAX parser reads a document element by element and lets you do
something to each element. This is more like reading line by line in a file.
https://secure.wikimedia.org/wikipedia/en/wiki/SAX_parser

>> --
>> Anyway, last I saw him, the TPF goons were pouring concrete around him,
>> leaving only one hole each for air, tea, and power. No ethernet,
>> because he's using git.
>> -- Eric Wilhelm on one of my disappearances
>
> There seems to be a certain lack of output capacity in this model?
> (and I'm not referring to the code :-)

So THAT'S why I haven't been getting anything done!

--
THIS I COMMAND!

From andrewmcnnz at gmail.com Thu Oct 18 11:08:46 2012
From: andrewmcnnz at gmail.com (Andrew McNaughton)
Date: Fri, 19 Oct 2012 05:08:46 +1100
Subject: [Melbourne-pm] Regexps - how does the lexical scope of capture buffers work? (Was: Regexp: What's the right way to do this?)
In-Reply-To: <74EFE4B2-56A3-4FB2-869B-46ED388ECA38@monash.edu>
References: <909FC6B9-5F2B-4747-9A98-C55901B99744@monash.edu> <8434DC51-14DA-4327-8BC6-F709FF01AB37@monash.edu> <507E68E5.3030705@pobox.com> <74EFE4B2-56A3-4FB2-869B-46ED388ECA38@monash.edu>
Message-ID: <508045AE.3040601@mcnaughty.com>

This is tangential to the OP's question, except that it's about the same idiom.
I started to think about what was going on in order to explain it, and
realised that there's a subtle difference between an array of undefs and a
list of undefs. I've gone for years without noticing this.

So, there's this general idiom for assigning variables from a regex into a
list of variables.

    ($var1, $var2, ...) = m/(.)(.).../;

You're evaluating m// in list context. If it doesn't match, it returns an
empty list, and $var1, $var2, etc are set to undef.

So consider that in a conditional:

    if( ($var1,$var2,...) = m/(.)(.).../ ) { ... }

Perl does what you expect, but when you look closely it's pretty clever,
and I don't know that I've seen this documented:

    if ( (undef,undef) ) { }  # a list of undefs is false

    @arr = (undef,undef);
    if ( @arr ) {}  # an array of undefs is true

At one level it's quirky behaviour that a perl programmer of many years
may not have considered. At another level, it enables a very useful
idiom, and we can mostly just get on and use it without worrying about
the subtleties. Very perlish.

Regards,
Andrew McNaughton

On 17/10/12 21:36, Nathan Bailey wrote:
> I really wish I had obfuscated the contents of the regexps :-(
>
> My question, which I thought I had clearly stated, related to the lexical scope of capture buffers, and why one approach to capture buffers worked and another didn't.
>
> Let's try again:
> #if (($start_time) = m#^\s*(\d+:\d+) -#) {
> if (m#^\s*(\d+:\d+) -#) {
>     $start_time = $2;
> #} elsif (($finish_time) = m#^\s*(\d+:\d+)#) {
> } elsif (m#^\s*(\d+:\d+)#) {
>     $finish_time = $1;
>     $event++;
> }
>
> Why is $start_time undefined when we get to $finish_time in the first version (commented out) and not in the second?
>
> And is there a good/better way to collect multiple values over multiple lines than this?
>
> thanks,
> Nathan
>
> On 17/10/2012, at 7:14 PM, Michael G Schwern wrote:
>
>> "Should I use regexes to parse HTML?"
>>
>> No, do not use regexes to parse HTML.
While it may seem easy to put together >> a quick and dirty HTML scanner with regexes, it will very quickly get very >> ugly. HTML parsing requires matching balanced characters and tags such as < >> and > and quotes which regexes do very poorly in addition to all the little >> special cases like comments. >> >> In addition, you're going to forget many small things, like casing and spaces, >> which you'll be hunting down forever. For example... >> >>

2:3 -
>>
2:3 -
>>

blah

>>

blah

>> >> >> If you patch up your regexes to cover those, maybe an activity for the next >> meeting might be to come up with more to break your regexes. :) >> >> There, your regex question is answered. :P >> >> It's quicker even in the short run to use a pre existing, well documented, >> parser like HTML::TreeBuilder as evidenced by the fact that you're posting on >> a mailing list for help with your regex based HTML parser. You even get >> search facilities like XPath (see HTML::TreeBuilder::XPath and >> http://www.w3schools.com/xpath/). >> >> use HTML::TreeBuilder::XPath; >> use v5.14; >> >> my $tree= HTML::TreeBuilder::XPath->new; >> $tree->parse_file(shift); >> >> my @event_times = $tree->findnodes( >> '//div[starts-with(@class, "event-time-calendar-")]' >> ); >> >> for my $event_time (@event_times) { >> my($hour, $min) = $event_time->as_text =~ /(\d+):(\d+)/; >> say "Event at $hour:$min"; >> } >> >> Once you learn how to use an HTML parser and XPath you'll never have to write >> a hacky HTML regex parser again. O(1) learning efficiency! >> >> If you're doing this as an exercise in learning regexes, well, don't ignore >> the lesson just because its not what you expected to learn. If you want to >> learn "from scratch" look into writing a grammar parser. >> >> >> On 2012.10.17 12:11 AM, Nathan Bailey wrote:> I knew someone would say that :P >>> It's a regexp question, not an HTML parsing question! >>> N >>> >>> On 17/10/2012, at 6:10 PM, Toby Wintermute wrote: >>> >>>> On 17 October 2012 18:09, Nathan Bailey wrote: >>>>> The code below works, but the commented out bits don't. I presume that >> when $shh and $smm are defined on the first loop through, they get undefined >> on the next time through? 
>>>>> What's the "right" way to do this, TIMTOWTDI notwithstanding :-) >>>> >>>> use HTML::TreeBuilder; >>>> >>>> >>>> -Toby >>> _______________________________________________ >>> Melbourne-pm mailing list >>> Melbourne-pm at pm.org >>> http://mail.pm.org/mailman/listinfo/melbourne-pm >>> >> >> -- >> s7ank: i want to be one of those guys that types "s/j&jd//.^$ueu*///djsls/sm." >> and it's a perl script that turns dog crap into gold. > _______________________________________________ > Melbourne-pm mailing list > Melbourne-pm at pm.org > http://mail.pm.org/mailman/listinfo/melbourne-pm From schwern at pobox.com Thu Oct 18 15:51:59 2012 From: schwern at pobox.com (Michael G Schwern) Date: Thu, 18 Oct 2012 15:51:59 -0700 Subject: [Melbourne-pm] Regexps - how does the lexical scope of capture buffers work? (Was: Regexp: What's the right way to do this?) In-Reply-To: <508045AE.3040601@mcnaughty.com> References: <909FC6B9-5F2B-4747-9A98-C55901B99744@monash.edu> <8434DC51-14DA-4327-8BC6-F709FF01AB37@monash.edu> <507E68E5.3030705@pobox.com> <74EFE4B2-56A3-4FB2-869B-46ED388ECA38@monash.edu> <508045AE.3040601@mcnaughty.com> Message-ID: <5080880F.90302@pobox.com> On 2012.10.18 11:08 AM, Andrew McNaughton wrote: > Perl does what you expect, but when you look closely it's pretty clever, > and I don't know that I've seen this documented: > > if ( (undef,undef) ) { } # a list of undefs is false > > @arr = (undef,undef); > if ( @arr ) {} # an array of undefs is true > > At one level it's quirky behaviour that a perl programmer of many years > may not have considered. At another level, it enables a very useful > idiom, and we can mostly just get on and use it without worrying about > the subtleties. Very perlish. It's worse than that. print "True" if (undef, 1); True print "True" if (1, undef); False Why? Because that's not really a list, it's the scalar comma operator. From perlop... http://perldoc.perl.org/perlop.html#Comma-Operator Binary "," is the comma operator. 
In scalar context it evaluates its left argument, throws that value away,
then evaluates its right argument and returns that value. This is just like
C's comma operator. In list context, it's just the list argument separator,
and inserts both its arguments into the list. These arguments are also
evaluated from left to right.

@arr = (undef, undef) is true in a condition because an array in scalar
context returns the number of elements.

(undef, undef) is false as a condition because it's in scalar context so
it's the comma operator. Its scalar return value is the last expression,
undef.

(undef, 1) is true as a condition for the same reason, but its last
expression is 1.

The problem is inflamed by pretending the scalar comma operator is a "list"
and involving it in the "list vs array" madness.
http://perldoc.perl.org/perlfaq4.html#What-is-the-difference-between-a-list-and-an-array%3f

Given how often one intends to use the comma operator as the comma operator,
rather than mistakenly using it, the language would have been better off
without overloading the meaning of comma.

The comma operator is a bit silly, just syntax sugar for "cram a bunch of
statements into one line and return the last one" usually used for evil. It
doesn't even apply any runtime logic.

    my $foo = (foo(), bar(), baz());

is equivalent to

    foo();
    bar();
    my $foo = baz();

--
124. Two drink limit does not mean first and last.
    -- The 213 Things Skippy Is No Longer Allowed To Do In The U.S. Army
       http://skippyslist.com/list/

From toby.corkindale at strategicdata.com.au Sun Oct 21 22:36:13 2012
From: toby.corkindale at strategicdata.com.au (Toby Corkindale)
Date: Mon, 22 Oct 2012 16:36:13 +1100
Subject: [Melbourne-pm] Perl switch statements
Message-ID: <5084DB4D.2020703@strategicdata.com.au>

Something bugs me about Perl's switch statements.

If you put a given(){} block at the end of a function, the function will
return the matched result.
However if you attempt to assign the result of given() directly to a variable, it will fail. Who came up with this and what were they thinking? I suppose there's a good reason, but I can't see what it is. Compare these: --------------------------------------- sub foo { given (shift) { when ('ay') { "yay" } when ('bee') { "hurrah" } default { "what?" } } } say foo("bee"); # outputs: hurrah --------------------------------------- sub bar { say given (shift) { when ('ay') { "yay" } when ('bee') { "hurrah" } default { "what?" } } bar("bee"); # outputs: From toby.corkindale at strategicdata.com.au Sun Oct 21 22:39:49 2012 From: toby.corkindale at strategicdata.com.au (Toby Corkindale) Date: Mon, 22 Oct 2012 16:39:49 +1100 Subject: [Melbourne-pm] Perl switch statements In-Reply-To: <5084DB4D.2020703@strategicdata.com.au> References: <5084DB4D.2020703@strategicdata.com.au> Message-ID: <5084DC25.905@strategicdata.com.au> On 22/10/12 16:36, Toby Corkindale wrote: > Something bugs me about Perl's switch statements. > > If you put a given(){} block at the end of a function, the function will > return the matched result. However if you attempt to assign the result > of given() directly to a variable, it will fail. > > Who came up with this and what were they thinking? I suppose there's a > good reason, but I can't see what it is. > > Compare these: > --------------------------------------- > > sub foo { > given (shift) { > when ('ay') { "yay" } > when ('bee') { "hurrah" } > default { "what?" } > } > } > say foo("bee"); > # outputs: hurrah > > --------------------------------------- > > sub bar { > say given (shift) { > when ('ay') { "yay" } > when ('bee') { "hurrah" } > default { "what?" } > } Oops. My copy of the function above is missing the final closing curly brace. It still doesn't work with it though, but you get a different error. 
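A minimal sketch of the return-versus-assign behaviour in question, written with if/else rather than given so that it runs without the switch feature. The sub name cheer and the values are made up for illustration. A sub with no explicit return hands back the last expression it evaluated, and wrapping the same statement in a do block is one way to use it on the right-hand side of an assignment:

```perl
use strict;
use warnings;

# Hypothetical helper: the if/else is the last statement in the sub,
# so the last expression evaluated inside it becomes the return value.
sub cheer {
    my $thing = shift;
    if ($thing eq 'ay') { "yay" } else { "boo" }
}

# The same statement cannot be assigned directly, but a do block
# turns it into an expression whose value can be stored.
my $thing = 'bee';
my $value = do {
    if ($thing eq 'ay') { "yay" } else { "boo" }
};

print cheer('ay'), "\n";    # prints: yay
print "$value\n";           # prints: boo
```

An explicit return in each branch would say the same thing more plainly; the sketch only shows why the implicit form appears to work in one place and not the other.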
From toby.corkindale at strategicdata.com.au Sun Oct 21 22:51:26 2012 From: toby.corkindale at strategicdata.com.au (Toby Corkindale) Date: Mon, 22 Oct 2012 16:51:26 +1100 Subject: [Melbourne-pm] Perl switch statements In-Reply-To: <5084DB4D.2020703@strategicdata.com.au> References: <5084DB4D.2020703@strategicdata.com.au> Message-ID: <5084DEDE.9020706@strategicdata.com.au> On 22/10/12 16:36, Toby Corkindale wrote: > Something bugs me about Perl's switch statements. > > If you put a given(){} block at the end of a function, the function will > return the matched result. However if you attempt to assign the result > of given() directly to a variable, it will fail. > > Who came up with this and what were they thinking? I suppose there's a > good reason, but I can't see what it is. > [snip example] I suppose it's consistent with other control syntax, e.g. sub foo { my $thing = shift; if ($thing eq 'ay') { "yay" } else { "boo" } } say foo("ay"); # outputs: yay but that still bothers me.. Why is the result good enough to return but not to assign?
From mathew.blair.robertson at gmail.com Sun Oct 21 22:51:48 2012 From: mathew.blair.robertson at gmail.com (Mathew Robertson) Date: Mon, 22 Oct 2012 16:51:48 +1100 Subject: [Melbourne-pm] Perl switch statements In-Reply-To: <5084DC25.905@strategicdata.com.au> References: <5084DB4D.2020703@strategicdata.com.au> <5084DC25.905@strategicdata.com.au> Message-ID: What is also surprising is this: sub moo { given (shift) { when ('ay') { "yay" } when ('bee') { "hurrah" } default { "what?" } }; say $_; } moo("bee"); I thought the when() implemented the test against $_... or even: sub moo { given (shift) { when ('ay') { "yay" } when ('bee') { $_ = "hurrah" } default { "what?" } }; say $_; } moo("bee"); On 22 October 2012 16:39, Toby Corkindale < toby.corkindale at strategicdata.com.au> wrote: > On 22/10/12 16:36, Toby Corkindale wrote: > >> Something bugs me about Perl's switch statements. >> >> If you put a given(){} block at the end of a function, the function will >> return the matched result. However if you attempt to assign the result >> of given() directly to a variable, it will fail. >> >> Who came up with this and what were they thinking? I suppose there's a >> good reason, but I can't see what it is. >> >> Compare these: >> --------------------------------------- >> >> sub foo { >> given (shift) { >> when ('ay') { "yay" } >> when ('bee') { "hurrah" } >> default { "what?" } >> } >> } >> say foo("bee"); >> # outputs: hurrah >> >> --------------------------------------- >> >> sub bar { >> say given (shift) { >> when ('ay') { "yay" } >> when ('bee') { "hurrah" } >> default { "what?" } >> } >> > > Oops. > My copy of the function above is missing the final closing curly brace. It > still doesn't work with it though, but you get a different error. 
> > _______________________________________________ > Melbourne-pm mailing list > Melbourne-pm at pm.org > http://mail.pm.org/mailman/listinfo/melbourne-pm >
From saramic at gmail.com Mon Oct 22 16:01:24 2012 From: saramic at gmail.com (Michael Milewski) Date: Mon, 22 Oct 2012 23:01:24 +0000 (UTC) Subject: [Melbourne-pm] Invitation to connect on LinkedIn Message-ID: <1585878119.402959.1350946884384.JavaMail.app@ela4-app2320.prod> LinkedIn ------------ I'd like to add you to my professional network on LinkedIn. - Michael Michael Milewski Senior Developer at bikeExchange.com.au Melbourne Area, Australia Confirm that you know Michael Milewski: https://www.linkedin.com/e/-yxd6q-h8m6w7bu-6p/isd/9203078966/dHQ8dyOd/?hs=false&tok=12_9UuFUp5A5s1 -- You are receiving Invitation to Connect emails. Click to unsubscribe: http://www.linkedin.com/e/-yxd6q-h8m6w7bu-6p/zbdiygbg38K2WiNKkDqiKXBfiQMUPpo/goo/melbourne-pm%40pm%2Eorg/20061/I3078349436_1/?hs=false&tok=2zr_ho73B5A5s1 (c) 2012 LinkedIn Corporation.
2029 Stierlin Ct, Mountain View, CA 94043, USA. -------------- next part -------------- An HTML attachment was scrubbed... URL: From schwern at pobox.com Mon Oct 22 16:24:50 2012 From: schwern at pobox.com (Michael G Schwern) Date: Mon, 22 Oct 2012 16:24:50 -0700 Subject: [Melbourne-pm] Perl switch statements In-Reply-To: <5084DEDE.9020706@strategicdata.com.au> References: <5084DB4D.2020703@strategicdata.com.au> <5084DEDE.9020706@strategicdata.com.au> Message-ID: <5085D5C2.3050201@pobox.com> On 2012.10.21 10:51 PM, Toby Corkindale wrote: > Why is the result good enough to return but not to assign? given doesn't return anything. What's tripping you up there is that in the absence of an explicit return, a subroutine or do block returns the last evaluated expression. That includes if/else and while loops but not for loops. Before 5.14.0 it didn't include given either. http://perldoc.perl.org/5.14.0/perldelta.html#Changes-to-Syntax-or-to-Perl-Operators That's why golfing horrors like this work: sub foo { if( 1 ) { "foo"; } else { "bar"; } } say foo(); If you see that the author is either A) falsely lazy or B) still thinks return incurs a performance penalty which was like 15 years ago. There's been plenty of clamoring for given to actually be usable in the right hand side of an expression. I'm not sure what the status of that is, I thought it was a done deal. Right now in order to do that you have to wrap it in a do block. my $value = do { given ($grade) { 'Well done!' when 'A'; 'Try harder!' when 'B'; 'You need help!!!' when 'C'; default { 'You are just making it up!' } } }; -- ROCKS FALL! EVERYONE DIES! 
http://www.somethingpositive.net/sp05032002.shtml From schwern at pobox.com Mon Oct 22 16:29:02 2012 From: schwern at pobox.com (Michael G Schwern) Date: Mon, 22 Oct 2012 16:29:02 -0700 Subject: [Melbourne-pm] Perl switch statements In-Reply-To: References: <5084DB4D.2020703@strategicdata.com.au> <5084DC25.905@strategicdata.com.au> Message-ID: <5085D6BE.3020307@pobox.com> On 2012.10.21 10:51 PM, Mathew Robertson wrote: > What is also surprising is this: > > sub moo { > given (shift) { > when ('ay') { "yay" } > when ('bee') { "hurrah" } > default { "what?" } > }; > say $_; > } > moo("bee"); > > > I thought the when() implemented the test against $_... It does. $_ is local to the given block, same as a foreach loop. > or even: > > sub moo { > given (shift) { > when ('ay') { "yay" } > when ('bee') { $_ = "hurrah" } > default { "what?" } > }; > say $_; > } > moo("bee"); Same deal. $_ can't escape the given block. If it did, nested givens would clobber each other. -- There will be snacks. From mathew.blair.robertson at gmail.com Mon Oct 22 17:04:33 2012 From: mathew.blair.robertson at gmail.com (Mathew Robertson) Date: Tue, 23 Oct 2012 11:04:33 +1100 Subject: [Melbourne-pm] Perl switch statements In-Reply-To: <5085D6BE.3020307@pobox.com> References: <5084DB4D.2020703@strategicdata.com.au> <5084DC25.905@strategicdata.com.au> <5085D6BE.3020307@pobox.com> Message-ID: > > What is also surprising is this: > > > > sub moo { > > given (shift) { > > when ('ay') { "yay" } > > when ('bee') { "hurrah" } > > default { "what?" } > > }; > > say $_; > > } > > moo("bee"); > > > > > > I thought the when() implemented the test against $_... > > It does. $_ is local to the given block, same as a foreach loop. > I always thought that "unless $_ is explicitly localised"... would manipulate the global $_ foreach (1..2) { foreach ('a'..'c') { print $_." "; last; } print $_." "; } print $/; ie: the output is "a 1 a 2"... on v5.14 and v5.8.8 ... 
I could have sworn that I have used that idiom in the past, expecting to get "a a a a"... but there you go. To clarify, I would have expected $x to not be lexical... I am deliberately reusing $x in the child scope assignment, without localising $x (via 'my' or 'local'): my $x = 'foo'; foreach $x (1..2) { foreach $x ('a'..'c') { print $x." "; } print $x." "; } print $x.$/; gives: a b c 1 a b c 2 foo when I told the code to generate: a b c c a b c c c > > > or even: > > > > sub moo { > > given (shift) { > > when ('ay') { "yay" } > > when ('bee') { $_ = "hurrah" } > > default { "what?" } > > }; > > say $_; > > } > > moo("bee"); > > Same deal. $_ can't escape the given block. If it did, nested givens would > clobber each other. > I'm going to say either a) no it wouldn't, or b) that is expected... depending on the use case. i.e. either you are in block scope so other 'when's aren't executed, or you are explicitly manipulating $_ so that you can deliberately manipulate the outer scope. In any case, the point is moot as $_ and $x are localised to the scope - and we (well, 'I') learnt something new. cheers, Mathew
From jarich at perltraining.com.au Mon Oct 22 22:10:07 2012 From: jarich at perltraining.com.au (Jacinta Richardson) Date: Tue, 23 Oct 2012 16:10:07 +1100 Subject: [Melbourne-pm] Perl switch statements In-Reply-To: References: <5084DB4D.2020703@strategicdata.com.au> <5084DC25.905@strategicdata.com.au> <5085D6BE.3020307@pobox.com> Message-ID: <508626AF.9070905@perltraining.com.au> On 23/10/12 11:04, Mathew Robertson wrote: > > To clarify, I would have expected $x to not be lexical... I am > deliberately reusing $x in the child scope assignment, without > localising $x (via 'my' or 'local'): > > my $x = 'foo'; > foreach $x (1..2) { > foreach $x ('a'..'c') { > print $x." "; > } > print $x." "; > } > print $x.$/; > > gives: a b c 1 a b c 2 foo > when I told the code to generate: a b c c a b c c c $x is lexical. That's what declaring it with a my does... except... you'll find that the following two loops are effectively identical. foreach $x ( 1..10) { say $x; } and { local $x = 1; while( $x <= 10 ) { say $x; $x++; } } Notice that extra set of braces and the local? As per perlsyn, this is intentional: The "foreach" loop iterates over a normal list value and sets the variable VAR to be each element of the list in turn. If the variable is preceded with the keyword "my", then it is lexically scoped, and is therefore visible only within the loop. Otherwise, the variable is implicitly local to the loop and regains its former value upon exiting the loop. If the variable was previously declared with "my", it uses that variable instead of the global one, but it's still localized to the loop. This implicit localization occurs only in a "foreach" loop. No point arguing, it's existed this way for a long, long time. ;) J
From Martin.G.Ryan at team.telstra.com Tue Oct 23 15:25:55 2012 From: Martin.G.Ryan at team.telstra.com (Ryan, Martin G) Date: Wed, 24 Oct 2012 09:25:55 +1100 Subject: [Melbourne-pm] Perl switch statements In-Reply-To: <508626AF.9070905@perltraining.com.au> References: <5084DB4D.2020703@strategicdata.com.au> <5084DC25.905@strategicdata.com.au> <5085D6BE.3020307@pobox.com> <508626AF.9070905@perltraining.com.au> Message-ID: <589EE331794E0B4DA62A9ADE89BCB405924747CBD7@WSMSG3103V.srv.dir.telstra.com> Jacinta, > On Tuesday, 23 October 2012 4:10 PM, Jacinta Richardson [jarich at perltraining.com.au] wrote; > { > local $x = 1; > while( $x <= 10 ) { > say $x; > $x++; > } > } > > Notice that extra set of braces and the local? > > As per perlsyn, this is intentional: > > The "foreach" loop iterates over a normal list value and sets the >
variable VAR to be each element of the list in turn. If the variable > is preceded with the keyword "my", then it is lexically scoped, and is > therefore visible only within the loop. Otherwise, the variable is > implicitly local to the loop and regains its former value upon exiting > the loop. If the variable was previously declared with "my", it uses > that variable instead of the global one, but it's still localized to > the loop. This implicit localization occurs only in a "foreach" loop. > > > No point arguing, it's existed this way for a long, long time. ;) Thank you for expanding on that - I found it very illuminating. Fascinating how if it was previously declared with "my", it localizes a lexical variable. (which you can't do normally, yes?) I've always used a fresh variable for the cause - "$i" if I'm running low on imagination - and hence the question doesn't arise (probably best that way for the sanity of future maintainers). Martin From schwern at pobox.com Tue Oct 23 16:54:55 2012 From: schwern at pobox.com (Michael G Schwern) Date: Tue, 23 Oct 2012 16:54:55 -0700 Subject: [Melbourne-pm] Perl switch statements In-Reply-To: <589EE331794E0B4DA62A9ADE89BCB405924747CBD7@WSMSG3103V.srv.dir.telstra.com> References: <5084DB4D.2020703@strategicdata.com.au> <5084DC25.905@strategicdata.com.au> <5085D6BE.3020307@pobox.com> <508626AF.9070905@perltraining.com.au> <589EE331794E0B4DA62A9ADE89BCB405924747CBD7@WSMSG3103V.srv.dir.telstra.com> Message-ID: <50872E4F.7050704@pobox.com> On 2012.10.23 3:25 PM, Ryan, Martin G wrote: > Thank you for expanding on that - I found it very illuminating. > Fascinating how if it was previously declared with "my", it localizes > a lexical variable. (which you can't do normally, yes?) You're right that you can't localize a lexical variable. What really happens is foreach forces the global (and localized) $_ to shadow your lexical $_.
Sort of like this: $ perl -wle 'my $foo = 23; { our $foo; local $foo = 42; } print $foo' 23 $ perl -wle 'my $_ = 23; print $_; for(1..3) { print $_ } print $_' 23 1 2 3 23 > I've always used a fresh variable for the cause - "$i" if I'm running low on > imagination - and hence the question doesn't arise (probably best that way for > the sanity of future maintainers). This is a good idea. $_ has so many side effects associated with it if you have the option to avoid it do so. -- Anyway, last I saw him, the TPF goons were pouring concrete around him, leaving only one hole each for air, tea, and power. No ethernet, because he's using git. -- Eric Wilhelm on one of my disappearances From jarich at perltraining.com.au Tue Oct 23 17:19:24 2012 From: jarich at perltraining.com.au (Jacinta Richardson) Date: Wed, 24 Oct 2012 11:19:24 +1100 Subject: [Melbourne-pm] Early Bird Ticket Sales end 26th of October Message-ID: <5087340C.6010404@perltraining.com.au> Share Freely Early Bird Ticket Sales end 26th of October The OSDC organising team in Sydney invite you to register for this years conference while early bird tickets are still available. Register today: http://www.osdc.com.au/ OSDC is a grass-roots style conference by developers for developers. If you're developing something that's Open Source, or you are using Open Source tools within your business, this conference is for you. This year, for four days starting December 4th, the Open Source Developers Conference is taking place in Sydney at the University of Technology, Broadway Campus. December 5th-8th is the main conference, and as is tradition with OSDC there will be a dinner event for all attendees (Thursday evening, December 6th). December 4th will be a CMS Expo day, noting the importance of Content Management Systems in the current web environment. The day will be based around skill sharing tutorials, case studies and talks from contributors in Open Source CMS projects. 
The OSDC 2012 Sydney organising team, http://www.osdc.com.au/ info at osdc.com.au From scottp at dd.com.au Tue Oct 23 18:13:11 2012 From: scottp at dd.com.au (Scott Penrose) Date: Wed, 24 Oct 2012 12:13:11 +1100 Subject: [Melbourne-pm] Early Bird Ticket Sales end 26th of October In-Reply-To: <5087340C.6010404@perltraining.com.au> References: <5087340C.6010404@perltraining.com.au> Message-ID: <415C16D6-C43F-4904-AC7B-74D9BDFA5AC0@dd.com.au> I will be doing a talk on Perl (and other open source stuff) going to Antarctica. Anyone else coming along, talking? Scooter On 24/10/2012, at 11:19 AM, Jacinta Richardson wrote: > Share Freely > Early Bird Ticket Sales end 26th of October > > The OSDC organising team in Sydney invite you to register for this years > conference while early bird tickets are still available. > > Register today: http://www.osdc.com.au/ > > OSDC is a grass-roots style conference by developers for developers. If > you're developing something that's Open Source, or you are using Open > Source tools within your business, this conference is for you. > > This year, for four days starting December 4th, the Open Source > Developers Conference is taking place in Sydney at the University of > Technology, Broadway Campus. December 5th-8th is the main conference, > and as is tradition with OSDC there will be a dinner event for all > attendees (Thursday evening, December 6th). > > December 4th will be a CMS Expo day, noting the importance of Content > Management Systems in the current web environment. The day will be based > around skill sharing tutorials, case studies and talks from contributors > in Open Source CMS projects. 
> > The OSDC 2012 Sydney organising team, > http://www.osdc.com.au/ > info at osdc.com.au > > _______________________________________________ > Melbourne-pm mailing list > Melbourne-pm at pm.org > http://mail.pm.org/mailman/listinfo/melbourne-pm From jarich at perltraining.com.au Tue Oct 23 18:40:27 2012 From: jarich at perltraining.com.au (Jacinta Richardson) Date: Wed, 24 Oct 2012 12:40:27 +1100 Subject: [Melbourne-pm] Early Bird Ticket Sales end 26th of October In-Reply-To: <5087340C.6010404@perltraining.com.au> References: <5087340C.6010404@perltraining.com.au> Message-ID: <5087470B.20805@perltraining.com.au> So the dinner is on the Wednesday night, not the Thursday night as per this announcement. Just so you know which night to keep free. Dinner ticket is included with conference entry. J PS If you're a speaker, they're still working out how to register speakers, but entry for speakers should be free so don't panic you should hear something soon. On 24/10/12 11:19, Jacinta Richardson wrote: > Share Freely > Early Bird Ticket Sales end 26th of October > > The OSDC organising team in Sydney invite you to register for this years > conference while early bird tickets are still available. > > Register today: http://www.osdc.com.au/ > > OSDC is a grass-roots style conference by developers for developers. If > you're developing something that's Open Source, or you are using Open > Source tools within your business, this conference is for you. > > This year, for four days starting December 4th, the Open Source > Developers Conference is taking place in Sydney at the University of > Technology, Broadway Campus. December 5th-8th is the main conference, > and as is tradition with OSDC there will be a dinner event for all > attendees (Thursday evening, December 6th). > > December 4th will be a CMS Expo day, noting the importance of Content > Management Systems in the current web environment. 
The day will be based > around skill sharing tutorials, case studies and talks from contributors > in Open Source CMS projects. > > The OSDC 2012 Sydney organising team, > http://www.osdc.com.au/ > info at osdc.com.au > > _______________________________________________ > Melbourne-pm mailing list > Melbourne-pm at pm.org > http://mail.pm.org/mailman/listinfo/melbourne-pm > From mathew.blair.robertson at gmail.com Mon Oct 29 18:49:22 2012 From: mathew.blair.robertson at gmail.com (Mathew Robertson) Date: Tue, 30 Oct 2012 12:49:22 +1100 Subject: [Melbourne-pm] given/when Message-ID: Hi list, What is the expected output of this: use strict; use v5.14; sub true { 1 } sub false { 0 } my $x = 0; given ($x) { when (true) { print "true".$/; } when (false) { print "false".$/; } default { print "unknown".$/; } } or: sub M { 'M' } sub N { 'N' } my $x = "N"; given ($x) { when (M) { print "m".$/; } when (N) { print "n".$/; } default { print "unknown".$/; } } cheers, Mathew -------------- next part -------------- An HTML attachment was scrubbed... URL: From damian at conway.org Mon Oct 29 19:27:28 2012 From: damian at conway.org (Damian Conway) Date: Tue, 30 Oct 2012 13:27:28 +1100 Subject: [Melbourne-pm] given/when In-Reply-To: References: Message-ID: Mathew Robertson asked: > What is the expected output of this: > my $x = 0; > given ($x) { > when (true) { print "true".$/; } > when (false) { print "false".$/; } > default { print "unknown".$/; } > } Expected outcome is: "true" As explained in perlsyn: Most of the time, "when(EXPR)" is treated as an implicit smart match of $_, i.e. "$_ ~~ EXPR". (See "Smart matching in detail" for more information on smart matching.) But when EXPR is one of the below exceptional cases, it is used directly as a boolean: o a subroutine or method call Yes, it's arguably broken and/or stupid. But it's as documented. 
Workaround: my $TRUE = 1; my $FALSE = 0; my $x = 0; given ($x) { when ($TRUE) { print "true".$/; } when ($FALSE) { print "false".$/; } default { print "unknown".$/; } } Damian From mathew.blair.robertson at gmail.com Mon Oct 29 21:37:44 2012 From: mathew.blair.robertson at gmail.com (Mathew Robertson) Date: Tue, 30 Oct 2012 15:37:44 +1100 Subject: [Melbourne-pm] given/when In-Reply-To: References: Message-ID: On 30 October 2012 13:27, Damian Conway wrote: > Mathew Robertson asked: > > > What is the expected output of this: > > my $x = 0; > > given ($x) { > > when (true) { print "true".$/; } > > when (false) { print "false".$/; } > > default { print "unknown".$/; } > > } > > Expected outcome is: "true" > > As explained in perlsyn: > > Most of the time, "when(EXPR)" is treated as an implicit smart match of > $_, i.e. "$_ ~~ EXPR". (See "Smart matching in detail" for more > information on smart matching.) But when EXPR is one of the below > exceptional cases, it is used directly as a boolean: > > o a subroutine or method call > > Yes, it's arguably broken and/or stupid. But it's as documented. > > Workaround: > > my $TRUE = 1; > my $FALSE = 0; > my $x = 0; > given ($x) { > when ($TRUE) { print "true".$/; } > when ($FALSE) { print "false".$/; } > default { print "unknown".$/; } > } > > Damian > I found that passage in perlsyn, but it didn't explain the behaviour, aka: EXPR is an expression that is used as a boolean, so I assumed that it referred to the returned value.. cf: undef is boolean-false inside "if (...)" context and "0E0" is boolean-true inside "if (...)". And that is why I posted the alternative using "M" and "N"... which I expected to be boolean-true inside the conditional. Indeed it was documented, but I didn't understand the *magic*... :) cheers, Mathew
From damian at conway.org Mon Oct 29 23:26:05 2012 From: damian at conway.org (Damian Conway) Date: Tue, 30 Oct 2012 17:26:05 +1100 Subject: [Melbourne-pm] given/when In-Reply-To: References: Message-ID: > Indeed it was documented, but I didn't understand the *magic*... :) Sorry. Let me try again. Normally, this: when (EXPR) {...} is the same as: if ($_ ~~ EXPR) {...; break; } But not in the Eight Special Cases (as listed in perlsyn). One of those E.S.C. is when EXPR is a subroutine call: when (foo()) {...} which is always equivalent to: if (foo()) {...; break; } So, in your first example: when (true) { print "true".$/; } is equivalent to: if (true()) { print "true".$/; break; } which is the same as: if (1) { print "true".$/; break; } The value 1 is true, so the first when triggers, prints "true", and then control exits the given. And your second example: when (M) { print "m".$/; } is equivalent to: if (M) { print "m".$/; break; } which is the same as: if ('M') { print "m".$/; break; } The string 'M' is true, so the first when triggers, prints "m", and then control exits the given. Damian From alfiej at opera.com Tue Oct 30 16:46:17 2012 From: alfiej at opera.com (Alfie John) Date: Wed, 31 Oct 2012 10:46:17 +1100 Subject: [Melbourne-pm] No meeting for November Message-ID: <1351640777.16628.140661147513885.134C4E48@webmail.messagingengine.com> Hi guys, As nobody has put up their hand for a talk in November and being so close to Christmas celebrations, unless anybody says otherwise, I think November will also be a non-meeting month.
Alfie -- Alfie John alfiej at opera.com From toby.corkindale at strategicdata.com.au Tue Oct 30 16:54:44 2012 From: toby.corkindale at strategicdata.com.au (Toby Corkindale) Date: Wed, 31 Oct 2012 10:54:44 +1100 Subject: [Melbourne-pm] No meeting for November In-Reply-To: <1351640777.16628.140661147513885.134C4E48@webmail.messagingengine.com> References: <1351640777.16628.140661147513885.134C4E48@webmail.messagingengine.com> Message-ID: <509068C4.8080605@strategicdata.com.au> For December maybe we should try for a big social meeting? I also wonder if maybe we could put in some social meetings interspersed with the technical meetings.. even if there's no tech talk, often there's some interesting chatter and war stories. On 31/10/12 10:46, Alfie John wrote: > Hi guys, > > As nobody has put up their hand for a talk in November and being so > close to Christmas celebrations, unless anybody says otherwise, I think > November will also be a non-meeting month. > > Alfie > From andrew at sericyb.com.au Tue Oct 30 16:55:44 2012 From: andrew at sericyb.com.au (Andrew Pam) Date: Wed, 31 Oct 2012 10:55:44 +1100 Subject: [Melbourne-pm] No meeting for November In-Reply-To: <509068C4.8080605@strategicdata.com.au> References: <1351640777.16628.140661147513885.134C4E48@webmail.messagingengine.com> <509068C4.8080605@strategicdata.com.au> Message-ID: <50906900.7040709@sericyb.com.au> On 31/10/12 10:54, Toby Corkindale wrote: > For December maybe we should try for a big social meeting? > > I also wonder if maybe we could put in some social meetings interspersed > with the technical meetings.. even if there's no tech talk, often > there's some interesting chatter and war stories. 
+1 Andrew From toby.corkindale at strategicdata.com.au Tue Oct 30 17:03:37 2012 From: toby.corkindale at strategicdata.com.au (Toby Corkindale) Date: Wed, 31 Oct 2012 11:03:37 +1100 Subject: [Melbourne-pm] Early Bird Ticket Sales end 26th of October In-Reply-To: <415C16D6-C43F-4904-AC7B-74D9BDFA5AC0@dd.com.au> References: <5087340C.6010404@perltraining.com.au> <415C16D6-C43F-4904-AC7B-74D9BDFA5AC0@dd.com.au> Message-ID: <50906AD9.2030400@strategicdata.com.au> On 24/10/12 12:13, Scott Penrose wrote: > I will be doing a talk on Perl (and other open source stuff) going to Antarctica. > Anyone else coming along, talking? I think some colleagues are going. I didn't feel like there was that much content that I was interested in, so am going to hit up a different conference this year. (Note there is definitely SOME content I want to see, but.. it was only really one talk per day of the conference.) -Toby From alfiej at opera.com Tue Oct 30 17:08:06 2012 From: alfiej at opera.com (Alfie John) Date: Wed, 31 Oct 2012 11:08:06 +1100 Subject: [Melbourne-pm] No meeting for November In-Reply-To: <50906900.7040709@sericyb.com.au> References: <1351640777.16628.140661147513885.134C4E48@webmail.messagingengine.com> <509068C4.8080605@strategicdata.com.au> <50906900.7040709@sericyb.com.au> Message-ID: <1351642086.24386.140661147519949.70F11C9F@webmail.messagingengine.com> On Wed, Oct 31, 2012, at 10:55 AM, Andrew Pam wrote: > On 31/10/12 10:54, Toby Corkindale wrote: > > For December maybe we should try for a big social meeting? > > > > I also wonder if maybe we could put in some social meetings interspersed > > with the technical meetings.. even if there's no tech talk, often > > there's some interesting chatter and war stories. > > +1 Maybe even for November then? The pub at Fed Square had a nice vibe and had lots of different beers? 
Alfie -- Alfie John alfiej at opera.com From sam at nipl.net Tue Oct 30 22:30:32 2012 From: sam at nipl.net (Sam Watkins) Date: Wed, 31 Oct 2012 16:30:32 +1100 Subject: [Melbourne-pm] No meeting for November In-Reply-To: <1351642086.24386.140661147519949.70F11C9F@webmail.messagingengine.com> References: <1351640777.16628.140661147513885.134C4E48@webmail.messagingengine.com> <509068C4.8080605@strategicdata.com.au> <50906900.7040709@sericyb.com.au> <1351642086.24386.140661147519949.70F11C9F@webmail.messagingengine.com> Message-ID: <20121031053032.GB4064@opal.nipl.net> > November then? The pub at Fed Square had a nice vibe and > had lots of different beers? Sounds good, I would like to attend a social meeting. We're sure to talk about tech stuff and perl anyways. Just discovered can control brightness of two LEDs on my Pandora, including the power LED, on a scale from 0 to 255 by writing to files in /sys/class/leds/... The other four LEDs are on/off - but that need not stop an enterprising hacker from writing a kernel module (or shell script) to switch them on and off real fast for simulated brightness control. There is space for 2 extra leds and a backlight led, but they were a bit cheap to populate them :) still, 6 leds is good for real "blinken lights". This is the kind of inspirational work I might talk about after a few beers! Has anyone actually written a kernel module in perl yet? Talk about geek cred - software controlled dimmable power LED :) I like it. Specifically I need to turn it off, for nighttime gaming, so it doesn't hurt my delicate eyes. And to save electricity :p
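The sysfs trick above might look roughly like this in Perl. This is only a sketch: set_brightness is a made-up helper, and the file it writes to is supplied by the caller, since the exact path under /sys/class/leds/... is not given here. The demo writes to a temporary file as a stand-in so it can be run on any machine:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write a brightness level (0..255) to a file,
# the way one would write to an LED's control file under sysfs.
sub set_brightness {
    my ($file, $level) = @_;
    die "brightness must be between 0 and 255\n"
        if $level < 0 || $level > 255;
    open my $fh, '>', $file or die "cannot open $file: $!\n";
    print {$fh} "$level\n";
    close $fh or die "cannot close $file: $!\n";
    return $level;
}

# Stand-in for the real LED file (assumption: the caller knows the real path).
my (undef, $demo_file) = tempfile(UNLINK => 1);

set_brightness($demo_file, 255);    # full brightness
set_brightness($demo_file, 0);      # off, for nighttime gaming
```

On real hardware the write would most likely need root or suitable permissions on the file.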