[Melbourne-pm] Regexps - how does the lexical scope of capture buffers work? (Was: Regexp: What's the right way to do this?)

Nathan Bailey nathan.bailey at monash.edu
Wed Oct 17 16:33:35 PDT 2012


On 18/10/2012, at 6:57 AM, Michael G Schwern wrote:
> On 2012.10.17 3:36 AM, Nathan Bailey wrote:
> In the above version $start_time and $finish_time are only changed if their
> regexes match.  And because it's an if/elsif condition only one of them is
> going to change per loop.  But their values persist from one loop to the next,
> so you're A) only ever going to get one of them set and B) you're always going
> to get one of them from the last loop.  This is bad.


To be clear, this was b.pl:

use strict;
my ($start_time, $finish_time);
while (<>) {
   if (($start_time) = m#^\s*(\d+:\d+) -#) {
      ;
   } elsif (($finish_time) = m#^\s*(\d+:\d+)#) {
      ;
      warn "Why is start_time undefined now?" if !defined $start_time;
      print "$start_time - $finish_time\n";
   }
}

and this was input.txt:
   8:10 -
   9:15

As Shlomi noted, the list assignment asks "What are the contents of the match?" and, as the match failed, there are none: a failed match in list context returns the empty list. Perl then happily assigns undef to the left-hand side ($start_time), overwriting the "8:10" successfully read in the previous iteration. So we get the output " - 9:15" on the second iteration of the loop, rather than the more desirable "8:10 - 9:15".
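A minimal sketch of that behaviour in isolation, assuming the second line of input.txt is fed to the first regexp from b.pl. The variable names $line and $matched are mine, for illustration only:

```perl
use strict;
use warnings;

my $start_time = '8:10';         # value captured on an earlier line
my $line       = "   9:15\n";    # second line of input.txt

# A list assignment in scalar context yields the number of elements
# on its right-hand side: 0 here, because the failed match returned
# the empty list. The assignment itself still runs, so $start_time
# is reset to undef.
my $matched = ( ($start_time) = $line =~ m#^\s*(\d+:\d+) -# );

print "matched: $matched\n";
print "start_time is ", ( defined $start_time ? $start_time : 'undef' ), "\n";
```

So the if condition is false, as expected, but the assignment has already clobbered the old value by the time the condition is tested.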

My first question is really a language design one. Regexp evaluations short circuit on failure; why don't assignments in an if condition do the same? I would think the use case above (keep the old value when the match fails) is far more common/likely than the current behaviour, which would theoretically allow someone to collect a bunch of undefs through each loop iteration for the ifs that fail (and as you note, there are other ways to get the right-hand side to fail into undef).

My second question is what's a better way to do this. I can think of two ways:
	1. Assign the capture buffer (i.e. $start_time = $1), which is what a.pl does
	2. Use a multi-line string regexp that pulls out both start and finish time at once
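Both options might look roughly like the sketch below. This is not a.pl itself (which isn't shown here); the sub names pair_by_line and pair_by_slurp are hypothetical, and clearing $start_time after each pair is my own assumption, added so a stale start time is never reused on a later line.

```perl
use strict;
use warnings;

# Option 1: test the match on its own and copy $1 only on success,
# so a failed match leaves $start_time untouched.
sub pair_by_line {
    my @lines = @_;
    my ( $start_time, @pairs );
    for (@lines) {
        if (m#^\s*(\d+:\d+) -#) {
            $start_time = $1;
        }
        elsif (m#^\s*(\d+:\d+)#) {
            push @pairs, "$start_time - $1" if defined $start_time;
            undef $start_time;    # assumption: never reuse a stale start
        }
    }
    return @pairs;
}

# Option 2: take the whole text as one string and capture both times
# with a single regexp. /m lets ^ match at each line start; /g walks
# through every start/finish pair in the text.
sub pair_by_slurp {
    my ($text) = @_;
    my @pairs;
    while ( $text =~ m#^\s*(\d+:\d+) -\s+(\d+:\d+)#mg ) {
        push @pairs, "$1 - $2";
    }
    return @pairs;
}

my $input = "   8:10 -\n   9:15\n";
print "$_\n" for pair_by_line( split /^/, $input );
print "$_\n" for pair_by_slurp($input);
```

With the input.txt above, each sub yields the single pair "8:10 - 9:15". Option 1 still works line by line on <>; option 2 needs the whole input in memory first.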

I was wondering if there was a deep fu way that I hadn't considered.
N

