[Pdx-pm] regexp and semi-greedy match - THANKS!

Keith Lofstrom keithl at kl-ic.com
Mon Oct 15 09:33:43 PDT 2007


# from Keith Lofstrom
# on Sunday 14 October 2007 22:03:

>"Greedy" regexp is just a tiny bit too greedy.  If I use a pattern
> match like:
>
>   if( /(a-z0-9_.-)-(\d*)\.(raw)$/i ) {     # this does NOT work

On Sun, Oct 14, 2007 at 11:48:55PM -0700, Eric Wilhelm wrote:
> I don't think it is a greedy bug.  The first group is literal.  Are you 
> trying for a character class (needs square brackets) and why?
> 
>    m/^(.*)-(\d+)\.raw$/
...

Thank you!  I did forget the brackets, and I should have used \d+ .
The actual regexp, which captures files suffixed .raw or .out or
.RAW or .OUT and ignores other files (as mentioned in the boring
details), is now:

     m/(.*)-(\d+)\.(raw|out)$/i

... and that works fine.  Also, as mentioned in the boring details,
I am trying to use the minimum number of digits necessary, mostly
because the second program (the one that uses the file list) seems to
have limited filename buffer space.  The new version of the rename
program is at http://www.keithl.com/ndir2 .  Again, do not show that
to children, or read it after eating.
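
For anyone following along, here is a minimal sketch of how the
corrected pattern behaves, and why the original one never matched.
The file names are hypothetical, made up for this illustration; they
are not taken from the real simulator output, and this is not the
code in ndir2.

```perl
use strict;
use warnings;

# Hypothetical file names, invented for this illustration.
my @names = ( 'amp-7.raw', 'amp-12.OUT', 'big-amp-003.out',
              'amp-7.log', 'notes.raw' );

my %parsed;
for my $name (@names) {
    # Greedy (.*) grabs as much as it can, then backtracks to the LAST
    # "-digits.suffix" tail, so base names containing hyphens stay whole.
    if ( $name =~ m/(.*)-(\d+)\.(raw|out)$/i ) {
        $parsed{$name} = [ $1, $2, lc $3 ];
        print "$name: base=$1 number=$2 suffix=$3\n";
    }
    else {
        print "$name: ignored\n";
    }
}

# In the original pattern the first group is a literal string
# ("a-z0-9_", then any one character, then "-"), not a character
# class, so ordinary file names never match it.
print "literal group does not match\n"
    unless 'amp-7.raw' =~ /(a-z0-9_.-)-(\d*)\.(raw)$/i;
```

Using \d+ rather than \d* also matters: \d* would accept a name with
a hyphen but no digits at all, such as "amp-.raw".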

Keith


PS: the two programs this sits between are part of a big, expensive
integrated circuit simulator CAD suite, which I use in place of an
ULTRA-expensive CAD suite, which I need to design chips.  Someday 
there will be an open source replacement, I hope, but until then
I am stuck with funky behavior.  Fortunately, Perl can repair much
file damage.  Also, I can send the Perl code to the proprietary
vendor and say "this is how the data SHOULD look.  Fix the CAD tool."

-- 
Keith Lofstrom          keithl at keithl.com         Voice (503)-520-1993
KLIC --- Keith Lofstrom Integrated Circuits --- "Your Ideas in Silicon"
Design Contracting in Bipolar and CMOS - Analog, Digital, and Scan ICs

