From autarch at urth.org Wed Sep 7 11:05:49 2005
From: autarch at urth.org (Dave Rolsky)
Date: Wed, 7 Sep 2005 13:05:49 -0500 (CDT)
Subject: [Mpls-pm] Tech Meeting next Wednesday, 7 PM at TechPro
Message-ID:

Date & time: Wednesday, September 14, 2005, 7 PM

Where: Tech-Pro - 3000 Centre Point Drive, Roseville, MN
http://maps.google.com/maps?q=3000+Centre+Point+Drive,+Roseville,+MN&spn=0.021465,0.040525&hl=en

Topic: Exception Handling, Logging, and Parameter Validation in Perl
(and if there's time, I could do the DateTime talk too, but we'll see)

Also: free pizza courtesy of Tech-Pro.


-dave

/*===================================================
VegGuide.Org                        www.BookIRead.com
Your guide to all that's veg.       My book blog
===================================================*/

From gary.vollink at gmail.com Wed Sep 7 14:23:39 2005
From: gary.vollink at gmail.com (Gary Vollink)
Date: Wed, 7 Sep 2005 16:23:39 -0500
Subject: [Mpls-pm] Tech Meeting next Wednesday, 7 PM at TechPro
In-Reply-To:
References:
Message-ID:

I, for one, welcome our new Pizza overlords.

No, wait... I mean, I'll be there, notebook in hand.

On 9/7/05, Dave Rolsky wrote:
> Date & time: Wednesday, September 14, 2005, 7 PM
>
> Where: Tech-Pro - 3000 Centre Point Drive, Roseville, MN
> http://maps.google.com/maps?q=3000+Centre+Point+Drive,+Roseville,+MN&spn=0.021465,0.040525&hl=en
>
> Topic: Exception Handling, Logging, and Parameter Validation in Perl
> (and if there's time, I could do the DateTime talk too, but we'll see)
>
> Also: free pizza courtesy of Tech-Pro.
>
>
>
> -dave
>
> /*===================================================
> VegGuide.Org                        www.BookIRead.com
> Your guide to all that's veg.       My book blog
> ===================================================*/
> _______________________________________________
> Mpls-pm mailing list
> Mpls-pm at pm.org
> http://mail.pm.org/mailman/listinfo/mpls-pm
>

From autarch at urth.org Fri Sep 9 14:11:04 2005
From: autarch at urth.org (Dave Rolsky)
Date: Fri, 9 Sep 2005 16:11:04 -0500 (CDT)
Subject: [Mpls-pm] Getting free ORA books
Message-ID:

We can get free books as part of the user group program. I'd be happy to
request these as long as someone promises to write a review of the book in
question. We can post the reviews on our pm group website.

Let me know if you want a book to review.


-dave

/*===================================================
VegGuide.Org                        www.BookIRead.com
Your guide to all that's veg.       My book blog
===================================================*/

From autarch at urth.org Fri Sep 9 14:35:24 2005
From: autarch at urth.org (Dave Rolsky)
Date: Fri, 9 Sep 2005 16:35:24 -0500 (CDT)
Subject: [Mpls-pm] Getting free ORA books
In-Reply-To:
References:
Message-ID:

[ Moving back to the list ]

On Fri, 9 Sep 2005, Gary Vollink wrote:

> ORA is O'Reilly & Assoc isn't it. Duh. Silly me. I'm still not
> sure, but hmmm. Perl specific categories, random selection, or pick
> from the library? That is to say, Learning Python, or Windows 2003
> Server in a Nutshell probably wouldn't be my cup of tea.

This can be any ORA book, I think. They're probably more interested in
having new books reviewed, of course.


-dave

/*===================================================
VegGuide.Org                        www.BookIRead.com
Your guide to all that's veg.       My book blog
===================================================*/

From andy at petdance.com Fri Sep 9 14:41:44 2005
From: andy at petdance.com (Andy Lester)
Date: Fri, 9 Sep 2005 16:41:44 -0500
Subject: [Mpls-pm] Getting free ORA books
In-Reply-To:
References:
Message-ID: <20050909214144.GA10937@petdance.com>

On Fri, Sep 09, 2005 at 04:35:24PM -0500, Dave Rolsky (autarch at urth.org) wrote:
> > ORA is O'Reilly & Assoc isn't it. Duh. Silly me. I'm still not

Actually, they're now ORM: O'Reilly Media.

xoxo,
Andy

-- 
Andy Lester => andy at petdance.com => www.petdance.com => AIM:petdance

From ringert at consumption.net Fri Sep 9 15:02:19 2005
From: ringert at consumption.net (ringert@consumption.net)
Date: Fri, 9 Sep 2005 15:02:19 -0700
Subject: [Mpls-pm] Getting free ORA books
In-Reply-To:
References:
Message-ID: <1126303339.4322066b1a250@www.consumption.net>

Quoting Dave Rolsky :

> [ Moving back to the list ]
>
> On Fri, 9 Sep 2005, Gary Vollink wrote:
>
> > ORA is O'Reilly & Assoc isn't it. Duh. Silly me. I'm still not
> > sure, but hmmm. Perl specific categories, random selection, or pick
> > from the library? That is to say, Learning Python, or Windows 2003
> > Server in a Nutshell probably wouldn't be my cup of tea.
>
> This can be any ORA book, I think. They're probably more interested in
> having new books reviewed, of course.
>
>
> -dave

I would be happy to read and review at least these titles:

Switching to VoIP
http://www.oreilly.com/catalog/switchingvoip/

Learning SQL
http://www.oreilly.com/catalog/learningsql/

-------------------------------------------------
This mail sent through IMP: http://horde.org/imp/

From gary.vollink at gmail.com Fri Sep 9 15:05:54 2005
From: gary.vollink at gmail.com (Gary Vollink)
Date: Fri, 9 Sep 2005 17:05:54 -0500
Subject: [Mpls-pm] Getting free ORA books
In-Reply-To: <20050909214144.GA10937@petdance.com>
References: <20050909214144.GA10937@petdance.com>
Message-ID:

Dave, Andy:

Thanks for that mental adjustment.
:-)
*was thinking 'ORA' eq 'Oracle', and feeling silly for it*

Dave,

In their 'coming this month' catalog, I would be very interested in
'Asterisk: The Future of Telephony'. I would be happy to review it.

Please, do get details, and share them if you can.

From perlmongers.20.thulben at spamgourmet.com Tue Sep 13 11:58:10 2005
From: perlmongers.20.thulben at spamgourmet.com (perlmongers.20.thulben@spamgourmet.com)
Date: Tue, 13 Sep 2005 13:58:10 -0500
Subject: [Mpls-pm] Mpls-pm Digest, Vol 17, Issue 1
In-Reply-To:
References:
Message-ID: <95cd929105091311586f6cb2f7@mail.gmail.com>

Do we need to RSVP for this seeing as how there will be space and food
considerations?

From ejseim at tech-pro.com Tue Sep 13 12:05:29 2005
From: ejseim at tech-pro.com (Ehren J. Seim)
Date: Tue, 13 Sep 2005 14:05:29 -0500
Subject: [Mpls-pm] Mpls-pm Digest, Vol 17, Issue 1
Message-ID:

We have enough space and pizza planned for roughly 25-30 individuals
(can probably accommodate more). A general understanding of the number
attending would help, but not required....unless we're expecting over
30?

-Ehren

From autarch at urth.org Tue Sep 13 12:15:50 2005
From: autarch at urth.org (Dave Rolsky)
Date: Tue, 13 Sep 2005 14:15:50 -0500 (CDT)
Subject: [Mpls-pm] Mpls-pm Digest, Vol 17, Issue 1
In-Reply-To: <95cd929105091311586f6cb2f7@mail.gmail.com>
References: <95cd929105091311586f6cb2f7@mail.gmail.com>
Message-ID:

On Tue, 13 Sep 2005, perlmongers.20.thulben at spamgourmet.com wrote:

> Do we need to RSVP for this seeing as how there will be space and food
> considerations?

Probably not. I told Ehren at Tech-Pro to expect 15-20, which is typical
of past talks.


-dave

/*===================================================
VegGuide.Org                        www.BookIRead.com
Your guide to all that's veg.       My book blog
===================================================*/

From twists at gmail.com Tue Sep 13 12:22:39 2005
From: twists at gmail.com (Joshua ben Jore)
Date: Tue, 13 Sep 2005 14:22:39 -0500
Subject: [Mpls-pm] Mpls-pm Digest, Vol 17, Issue 1
In-Reply-To:
References:
Message-ID:

On 9/13/05, Ehren J. Seim wrote:
> We have enough space and pizza planned for roughly 25-30 individuals
> (can probably accommodate more). A general understanding of the number
> attending would help, but not required....unless we're expecting over
> 30?

I'm bringing myself.

Josh

From craig at wavefront.net Wed Sep 14 08:31:41 2005
From: craig at wavefront.net (Craig S. Wilson)
Date: Wed, 14 Sep 2005 10:31:41 -0500
Subject: [Mpls-pm] Getting free ORA books
In-Reply-To:
References:
Message-ID: <4328425D.7020001@wavefront.net>

Dave Rolsky wrote:
> We can get free books as part of the user group program. I'd be happy to
> request these as long as someone promises to write a review of the book in
> question. We can post the reviews on our pm group website.
>
> Let me know if you want a book to review.

I would be interested in _Perl Best Practices_.

-- 
---------------------------------------
Craig S. Wilson
craig at wavefront.net
WaveFront Communications, Inc.
1677 Lake Valentine Road
Arden Hills MN 55112-2840
1.651.638.9594
1.612.865.8794
===============================
Note: If you send me HTML-mail,
it will probably end up in my
SPAM bucket.
---------------------------------------

From shane at aptest.com Wed Sep 14 09:14:10 2005
From: shane at aptest.com (Shane McCarron)
Date: Wed, 14 Sep 2005 11:14:10 -0500
Subject: [Mpls-pm] Getting free ORA books
In-Reply-To: <4328425D.7020001@wavefront.net>
References: <4328425D.7020001@wavefront.net>
Message-ID: <43284C52.1010403@aptest.com>

I bet you would - I've seen your code ;-)

Craig S. Wilson wrote:

>Dave Rolsky wrote:
>
>>We can get free books as part of the user group program. I'd be happy to
>>request these as long as someone promises to write a review of the book in
>>question. We can post the reviews on our pm group website.
>>
>>Let me know if you want a book to review.
>
>I would be interested in _Perl Best Practices_.

-- 
Shane P. McCarron                    Phone: +1 763 786-8160 x120
Managing Director                      Fax: +1 763 786-8180
ApTest Minnesota                      Inet: shane at aptest.com

From gary.vollink at gmail.com Wed Sep 14 20:48:39 2005
From: gary.vollink at gmail.com (Gary Vollink)
Date: Wed, 14 Sep 2005 22:48:39 -0500
Subject: [Mpls-pm] Big Thanks and Reminders
Message-ID:

I'd like to shout out a big "THANK YOU" to Ehren Seim, and his
employer Tech-Pro for hosting tonight. I, for one, quite enjoyed the
whole experience.

Everybody (including me) Lightning Talks. Just because I can't think
of a subject right now, doesn't mean I won't think of one over the
next week or so. So think of lightning talk subjects that you might
want to bring up (and I'll be thinking of them too).

Dave, you mentioned "remind me to send functional examples of some of
this code", and I would also suggest that the slides / presentation
would also be useful. (Pretty please).

Thanks,
Gary Allen
http://www.vollink.com/gary/

From autarch at urth.org Wed Sep 14 21:04:03 2005
From: autarch at urth.org (Dave Rolsky)
Date: Wed, 14 Sep 2005 23:04:03 -0500 (CDT)
Subject: [Mpls-pm] Big Thanks and Reminders
In-Reply-To:
References:
Message-ID:

On Wed, 14 Sep 2005, Gary Vollink wrote:

> I'd like to shout out a big "THANK YOU" to Ehren Seim, and his
> employer Tech-Pro for hosting tonight. I, for one, quite enjoyed the
> whole experience.

Seconded.

> Dave, you mentioned "remind me to send functional examples of some of
> this code", and I would also suggest that the slides / presentation
> would also be useful. (Pretty please).

The slides are here: http://hew.ca/cgi-bin/page.pl?mode=View&fieldnum=25

I'll send the code separately.

-dave

/*===================================================
VegGuide.Org                        www.BookIRead.com
Your guide to all that's veg.       My book blog
===================================================*/

From autarch at urth.org Wed Sep 14 21:09:01 2005
From: autarch at urth.org (Dave Rolsky)
Date: Wed, 14 Sep 2005 23:09:01 -0500 (CDT)
Subject: [Mpls-pm] Code I whiteboarded tonight
Message-ID:

So I said I'd send this out.

The discussion was related to overriding $SIG{__DIE__} and how you can do
that more or less safely.

So here's the $SIG{__DIE__} handler used by Mason:

  sub rethrow_exception {
      my ($err) = @_;
      return unless $err;

      if ( UNIVERSAL::can($err, 'rethrow') ) {
          $err->rethrow;
      }
      elsif ( ref $err ) {
          die $err;
      }

      HTML::Mason::Exception->throw(error => $err);
  }

The key here is that it's careful not to mess with refs/objects, so they
are just passed through as is. The rethrow() method is provided via
Exception::Class::Base.

It's also important to note that Exception::Class::Base objects carry a
stack trace with them, so that even after they're rethrown the stack trace
still leads you back to where the original error occurred, not to this
error handling code.

Also note that our idiom for overriding $SIG{__DIE__} looks more or less
like this:

  sub foo {
      local $SIG{__DIE__} = \&rethrow_exception;

      eval {
          thing();
          other_thing();
          whatever();
      };
  }

So the override is not happening everywhere, nor does it happen simply by
_loading_ Mason's modules.


-dave

/*===================================================
VegGuide.Org                        www.BookIRead.com
Your guide to all that's veg.       My book blog
===================================================*/

From twists at gmail.com Wed Sep 14 21:12:40 2005
From: twists at gmail.com (Joshua ben Jore)
Date: Wed, 14 Sep 2005 23:12:40 -0500
Subject: [Mpls-pm] Big Thanks and Reminders
In-Reply-To:
References:
Message-ID:

On 9/14/05, Gary Vollink wrote:
> Everybody (including me) Lightning Talks. Just because I can't think
> of a subject right now, doesn't mean I won't think of one over the
> next week or so. So think of lightning talk subjects that you might
> want to bring up (and I'll be thinking of them too).

I'd like to do one on one of the things I've been playing with
recently. I can't say I've actually found any production level uses
for any of these but it's the stuff that interests me. The
Data::Postponed thing is actually documented and usable. I'd also like
to do it because I need the practice.

Coercively inverting execution flow using Data::Postponed.
Extending regexp syntax to add locally useful stuff.
A lispy view into perl (this is great for extending B::Lint and
writing your own stricter-than-strict rules)

Josh

From twists at gmail.com Wed Sep 14 21:17:25 2005
From: twists at gmail.com (Joshua ben Jore)
Date: Wed, 14 Sep 2005 23:17:25 -0500
Subject: [Mpls-pm] Code I whiteboarded tonight
In-Reply-To:
References:
Message-ID:

On 9/14/05, Dave Rolsky wrote:
> So I said I'd send this out.
>
> The discussion was related to overriding $SIG{__DIE__} and how you can do
> that more or less safely.
>
> So here's the $SIG{__DIE__} handler used by Mason:
>
>   sub rethrow_exception {
>       my ($err) = @_;
>       return unless $err;
>
>       if ( UNIVERSAL::can($err, 'rethrow') ) {
>           $err->rethrow;
>       }
>       elsif ( ref $err ) {
>           die $err;
>       }

If you're going to use isa()/can() and you think your code might be
used by someone who writes tests, you should be using them as methods
anytime you have an object. People using Test::MockObject rely on the
method form of isa() being overrideable.

I haven't used T::MO myself but I think I'm accurately relaying what
the Test::MockObject users are typically complaining about.

Josh

From ttausend at gmail.com Thu Sep 15 07:06:43 2005
From: ttausend at gmail.com (Troy E. Hove Tausend)
Date: Thu, 15 Sep 2005 09:06:43 -0500
Subject: [Mpls-pm] Code I whiteboarded tonight
In-Reply-To:
References:
Message-ID: <200509150906.44009.ttausend@gmail.com>

On Wednesday 14 September 2005 11:17 pm, Joshua ben Jore wrote:
>
> If you're going to use isa()/can() and you think your code might be
> used by someone who writes tests, you should be using them as methods
> anytime you have an object. People using Test::MockObject rely on the
> method form of isa() being overrideable.
>
> I haven't used T::MO myself but I think I'm accurately relaying what
> the Test::MockObject users are typically complaining about.
>
> Josh

I have used Test::MockObject. I would say that it was not intended to
replace "real" objects in the tests for those classes/objects, which
happens to be the only place that my code gets picky about pedigree. It is
useful for when you are testing something else that you might be throwing
your objects at and need a stand-in/stunt double because the "real" object
does undesirable things.

It appears also that there are now UNIVERSAL::isa and UNIVERSAL::can
modules used by Test::MockObject to "fix" the functions so they will behave
using the class's methods when appropriate. They are very recent (copyright
2005).

I see chromatic has now also added Test::MockObject::Extends so you can
selectively mock methods of an existing class/object. I ended up writing a
module that does basically the same thing, but on a much more subtle level
so that even stack traces don't see the subterfuge and the mocking can
occur dynamically during execution, plus it works against plain old
subroutines as well. I'll look into submitting a patch, although I think
the goals are different enough that my fancy pants stuff doesn't make sense
as a subset of Test::MockObject.

-- 
Troy E. Hove Tausend
ttausend at hotmail.com
ttausend at gmail.com
ttausend at mn.rr.com

From gypsy at freeq.com Thu Sep 15 08:49:49 2005
From: gypsy at freeq.com (Gypsy Rogers)
Date: 15 Sep 2005 15:49:49 -0000
Subject: [Mpls-pm] Managed Hosting Woes (off topic)
Message-ID: <20050915154949.89619.qmail@ll.gypsy.org>

A scant few of you will remember that once upon a time I owned a hosting
company and hosted the Twin Cities Perl Mongers' server before pm.org gave
us all these nice toys.

I sold my hosting business several years ago but have always maintained a
single server for my stuff. Now, I'm having a very hard time with
Managed/Dedicated services from vendors. I thought I'd throw this out to
the local Perl Mongers to see if any of you have any good recommendations
for a reasonably priced Dedicated server hosting company, or Co-Location
hosting, or would be interested in starting an Internet hosting Co-op with
me.

See the details in my LiveJournal
http://www.livejournal.com/users/hightekvagabond/88745.html

Please reply there or to me directly rather than spamming the list.

Thanks for your input :)

From autarch at urth.org Sat Sep 17 21:54:24 2005
From: autarch at urth.org (Dave Rolsky)
Date: Sat, 17 Sep 2005 23:54:24 -0500 (CDT)
Subject: [Mpls-pm] Call for lightning talks for October
Message-ID:

So next month we're planning to do lightning talks. I'm thinking that 90
minutes or so with a break in the middle should be good. At conferences
where this is done there's usually a pretty strict 5 minute time limit but
in our case I don't know that we'll have 16 different talks, so I'm
thinking 5-10 minutes per talk.

Please send talk submissions to me directly so I can put together a
schedule. The talk submission should be a title, a very brief description
(1-2 sentences) and how much time you want. Multiple submissions are
great, but if we get lots of submissions I'll just pick one.

I'd like to come up with a schedule by the beginning of October so please
submit yours soon.

Some possible talk ideas:

- Some modules on CPAN I love
- Some modules on CPAN I have to use but hate
- A great module idea
- Why I love Perl
- Why I hate Perl
- Why I love technology X
- A cool app/tool I wrote
- Cool thing you've never heard about

The great thing about lightning talks is that you really don't need to be
a super-experienced presenter to put together 5 minutes worth of stuff.

I'd also encourage people not to worry about slides too much either.
Realistically, you only have time for a few slides at most, so this isn't
something that you should spend lots of your time on. And of
course, with a 5 minute talk, it's easy enough to have no slides at all.

Also, for those who are interested, the tool I used to make my slides for
the last talk is called Spork and it's on CPAN. There's also a really nice
plugin called Spork::Hilite for doing cool code highlighting.


-dave

/*===================================================
VegGuide.Org                        www.BookIRead.com
Your guide to all that's veg.       My book blog
===================================================*/

From ken at mathforum.org Sun Sep 18 18:14:07 2005
From: ken at mathforum.org (Ken Williams)
Date: Sun, 18 Sep 2005 20:14:07 -0500
Subject: [Mpls-pm] Call for lightning talks for October
In-Reply-To:
References:
Message-ID: <6e9122820cf1ade7091c4a3f3cf063fd@mathforum.org>

On Sep 17, 2005, at 11:54 PM, Dave Rolsky wrote:

> I'd also encourage people not to worry about slides too much either.
> Realistically, you only have time for a few slides at most, so this
> isn't
> something that should you should spend lots of your time on. And of
> course, with a 5 minute talk, it's easy enough to have no slides at
> all.

For those with slides, it would be good to either put them up on the
web, or convert them to PDF, so we can switch from one person's talk to
another easily.

And when I say "we," I unfortunately don't include me, because I'll be
on the east coast on the 12th.

-Ken

From craig at wavefront.net Sun Sep 18 18:25:42 2005
From: craig at wavefront.net (Craig S. Wilson)
Date: Sun, 18 Sep 2005 20:25:42 -0500
Subject: [Mpls-pm] Call for lightning talks for October
In-Reply-To: <6e9122820cf1ade7091c4a3f3cf063fd@mathforum.org>
References: <6e9122820cf1ade7091c4a3f3cf063fd@mathforum.org>
Message-ID: <432E1396.5000403@wavefront.net>

Ken Williams wrote:
> For those with slides, it would be good to either put them up on the
> web, or convert them to PDF, so we can switch from one person's talk to
> another easily.

I would give a talk about my not yet finished PDF::Table module, except
that I am teaching a class on Wednesday night this semester. Maybe I
will tell my students that we will be meeting at the Perl Mongers
meeting that night.

-- 
---------------------------------------
Craig S. Wilson
craig at wavefront.net
WaveFront Communications, Inc.
1677 Lake Valentine Road
Arden Hills MN 55112-2840
1.651.638.9594
1.612.865.8794
===============================
Note: If you send me HTML-mail,
it will probably end up in my
SPAM bucket.
---------------------------------------

From rfischer at corradiation.net Mon Sep 19 08:30:21 2005
From: rfischer at corradiation.net (Robert Fischer)
Date: Mon, 19 Sep 2005 10:30:21 -0500 (CDT)
Subject: [Mpls-pm] Interesting RegEx Problem
In-Reply-To: <432E1396.5000403@wavefront.net>
References: <6e9122820cf1ade7091c4a3f3cf063fd@mathforum.org> <432E1396.5000403@wavefront.net>
Message-ID: <31596.198.203.175.175.1127143821.squirrel@webmail.corradiation.net>

I've got a problem that just came up at work. Any help?

Given an arbitrary string and a collection of regular expressions, and
assuming that one and only one of the regular expressions match the
string, what is the best way to find the matching regular expression?

~~ Robert Fischer.
rfischer at corradiation.net 651-398-8010 From gypsy at freeq.com Mon Sep 19 08:47:17 2005 From: gypsy at freeq.com (Gypsy Rogers) Date: 19 Sep 2005 15:47:17 -0000 Subject: [Mpls-pm] Interesting RegEx Problem In-Reply-To: <> References: <> Message-ID: <20050919154717.3562.qmail@ll.gypsy.org> The most straight forward is a series of elsif statements, not the prettiest but it works clean. if ($string =~ /regex1/){ $match = "regex1"; } elsif ($string =~ /regex2/) { $match = "regex2";} elsif ($string =~ /regex3/) { $match = "regex3";} else { $match = "Your string is worthless"; } No hubburis here, and I'm sure someone else will come up with a much more fun and acrobatic way to do it, but I've learned to keep my code easy for Jr Level coders to read to save me headaches. So, this works. On Mon, 19 Sep 2005 10:30:21 -0500 (CDT), "Robert Fischer" wrote : > I've got a problem that just came up at work. Any help? > > Given an arbitrary string and a collection of regular expressions, and > assuming that one and only one of the regular expressions match the > string, what is the best way to find the matching regular expression? > > ~~ Robert Fischer. > rfischer at corradiation.net > 651-398-8010 > > _______________________________________________ > Mpls-pm mailing list > Mpls-pm at pm.org > http://mail.pm.org/mailman/listinfo/mpls-pm > > > From ian at indecorous.com Mon Sep 19 08:55:45 2005 From: ian at indecorous.com (Ian Malpass) Date: Mon, 19 Sep 2005 16:55:45 +0100 (BST) Subject: [Mpls-pm] Interesting RegEx Problem In-Reply-To: <31596.198.203.175.175.1127143821.squirrel@webmail.corradiation.net> References: <6e9122820cf1ade7091c4a3f3cf063fd@mathforum.org> <432E1396.5000403@wavefront.net> <31596.198.203.175.175.1127143821.squirrel@webmail.corradiation.net> Message-ID: On Mon, 19 Sep 2005, Robert Fischer wrote: > I've got a problem that just came up at work. Any help? 
> > Given an arbitrary string and a collection of regular expressions, and > assuming that one and only one of the regular expressions match the > string, what is the best way to find the matching regular expression? Are you looking for something more efficient than trying each one until you find one that matches? If not, this would work, I think: use strict; use warnings; my $string = "A fool and his money are soon parted"; my %patterns = ( foo => qr/foo/, bar => qr/bar/, baz => qr/baz/ ); while ( my ( $name, $pattern ) = each %patterns ) { if ( $string =~ $pattern ) { print "Matched $name\n"; last; } } Tries them in hash-random order. Ian P.S. Gypsy's if/elsif solution is also fine, but requires a bit more cut-and-paste when adding or removing patterns later. Depends on the number of patterns you have, and how often they change, I suppose. Also a judgement call on which is easier for other coders to understand. I personally prefer mine (but then I suppose I would) since there's no duplicated code. - --------------------------------------------------------------------------- The soul would have no rainbows if the eyes held no tears. Ian Malpass From jim at acadcam.com Mon Sep 19 09:00:08 2005 From: jim at acadcam.com (Jim Anderson) Date: Mon, 19 Sep 2005 11:00:08 -0500 Subject: [Mpls-pm] Interesting RegEx Problem In-Reply-To: <31596.198.203.175.175.1127143821.squirrel@webmail.corradiation.net>; from rfischer@corradiation.net on Mon, Sep 19, 2005 at 10:30:21AM -0500 References: <6e9122820cf1ade7091c4a3f3cf063fd@mathforum.org> <432E1396.5000403@wavefront.net> <31596.198.203.175.175.1127143821.squirrel@webmail.corradiation.net> Message-ID: <20050919110008.A2047@acadcam.com> On Mon, Sep 19, 2005 at 10:30:21AM -0500, Robert Fischer wrote: > I've got a problem that just came up at work. Any help? 
> > Given an arbitrary string and a collection of regular expressions, and > assuming that one and only one of the regular expressions match the > string, what is the best way to find the matching regular expression? Beyond the suggested "just keep trying them sequentially", if you have a history of the arbitrary strings, you could sort the regular expressions by frequency of occurance if performance is an issue. -- Jim Anderson (612) 782-0456 jim at acadcam.com Anderson CAD/CAM, Inc Lucifer designed MS-DOS to try 2500 Highway 88, Suite 108 men's souls. St Anthony, MN 55418 Then he had a better idea... From gypsy at freeq.com Mon Sep 19 09:12:01 2005 From: gypsy at freeq.com (Gypsy Rogers) Date: 19 Sep 2005 16:12:01 -0000 Subject: [Mpls-pm] Interesting RegEx Problem In-Reply-To: <> References: <> Message-ID: <20050919161201.12935.qmail@ll.gypsy.org> Like I said, one of my biggest motivations recently has become to make sure I don't have to spend time explaining my code to someone else when I hand it off to them to maintain. That being said, yeah, if you have more then 3 expressions to check the loop is probably the way to go. I just threw out the first thing to pop to my mind. :) On Mon, 19 Sep 2005 16:55:45 +0100 (BST), Ian Malpass wrote : > On Mon, 19 Sep 2005, Robert Fischer wrote: > > > I've got a problem that just came up at work. Any help? > > > > Given an arbitrary string and a collection of regular expressions, and > > assuming that one and only one of the regular expressions match the > > string, what is the best way to find the matching regular expression? > > Are you looking for something more efficient than trying each one until > you find one that matches? 
> > If not, this would work, I think: > > use strict; > use warnings; > > my $string = "A fool and his money are soon parted"; > > my %patterns = ( > foo => qr/foo/, > bar => qr/bar/, > baz => qr/baz/ > ); > > while ( my ( $name, $pattern ) = each %patterns ) { > if ( $string =~ $pattern ) { > print "Matched $name\n"; > last; > } > } > > Tries them in hash-random order. > > Ian > > P.S. Gypsy's if/elsif solution is also fine, but requires a bit more > cut-and-paste when adding or removing patterns later. Depends on the > number of patterns you have, and how often they change, I suppose. Also a > judgement call on which is easier for other coders to understand. I > personally prefer mine (but then I suppose I would) since there's no > duplicated code. > > - > --------------------------------------------------------------------------- > > The soul would have no rainbows if the eyes held no tears. > > Ian Malpass > > _______________________________________________ > Mpls-pm mailing list > Mpls-pm at pm.org > http://mail.pm.org/mailman/listinfo/mpls-pm > > > From twists at gmail.com Mon Sep 19 09:28:42 2005 From: twists at gmail.com (Joshua ben Jore) Date: Mon, 19 Sep 2005 11:28:42 -0500 Subject: [Mpls-pm] Interesting RegEx Problem In-Reply-To: <20050919161201.12935.qmail@ll.gypsy.org> References: <20050919161201.12935.qmail@ll.gypsy.org> Message-ID: On 19 Sep 2005 16:12:01 -0000, Gypsy Rogers wrote: > > Like I said, one of my biggest motivations recently has become to make sure > I don't have to spend time explaining my code to someone else when I hand it > off to them to maintain. > > That being said, yeah, if you have more then 3 > expressions to check the loop is probably the way to go. I just threw out > the first thing to pop to my mind. :) I've been hoping someone would mention List::Util::first. 
use List::Util 'first'; $matching_expression = first { $text =~ $_ } @candidate_expressions; Josh From gypsy at freeq.com Mon Sep 19 09:42:26 2005 From: gypsy at freeq.com (Gypsy Rogers) Date: 19 Sep 2005 16:42:26 -0000 Subject: [Mpls-pm] Interesting RegEx Problem In-Reply-To: <> References: <> Message-ID: <20050919164226.22493.qmail@ll.gypsy.org> Congratulations, your hope has been fulfilled, someone just did: You! *winks* On Mon, 19 Sep 2005 11:28:42 -0500, Joshua ben Jore wrote : > On 19 Sep 2005 16:12:01 -0000, Gypsy Rogers wrote: > > > > Like I said, one of my biggest motivations recently has become to make sure > > I don't have to spend time explaining my code to someone else when I hand it > > off to them to maintain. > > > > That being said, yeah, if you have more than 3 > > expressions to check, the loop is probably the way to go. I just threw out > > the first thing to pop to my mind. :) > > I've been hoping someone would mention List::Util::first. > > use List::Util 'first'; > $matching_expression = first { $text =~ $_ } @candidate_expressions; > > Josh > > > From ian at indecorous.com Mon Sep 19 10:14:10 2005 From: ian at indecorous.com (Ian Malpass) Date: Mon, 19 Sep 2005 18:14:10 +0100 (BST) Subject: [Mpls-pm] Interesting RegEx Problem In-Reply-To: References: <20050919161201.12935.qmail@ll.gypsy.org> Message-ID: On Mon, 19 Sep 2005, Joshua ben Jore wrote: > I've been hoping someone would mention List::Util::first. > > use List::Util 'first'; > $matching_expression = first { $text =~ $_ } @candidate_expressions; Problem with that is that you get a regex out at the end, which you then need to hack on to make useful. List::Util has useful stuff in it, and if I was doing the list manipulations it covers frequently in a piece of code I'd probably use it, but it seems a shame to require the module just to get a for loop with a break in it. I also don't find the above syntax all that clear.
Especially if the use and the call to first() are separated by a lot of code. The name 'first' is fairly self-explanatory, I suppose, but it's not a standard perl function, which might fox the unwary. All comes down to philosophy and house style, in the end. And documentation. There is, after all, more than one way to do it. Ian - --------------------------------------------------------------------------- The soul would have no rainbows if the eyes held no tears. Ian Malpass From rfischer at corradiation.net Mon Sep 19 10:39:27 2005 From: rfischer at corradiation.net (Robert Fischer) Date: Mon, 19 Sep 2005 12:39:27 -0500 (CDT) Subject: [Mpls-pm] Interesting RegEx Problem In-Reply-To: References: <20050919161201.12935.qmail@ll.gypsy.org> Message-ID: <42144.198.203.175.175.1127151567.squirrel@webmail.corradiation.net> List::Util::first is a great shortcut for implementing the sequential-search algorithm, but I'm looking for something better than sequential-search. I need to rephrase the problem, though, because I realize I left a major aspect out of the phrasing: Given a *collection of* strings and a collection of regular expression strings, where it is known that each string is matched by precisely one regular expression, how do you most efficiently develop the mapping? Given that arrangement, I'm currently looking at lexically sorting the strings and getting an MRU list for the pattern matches. That implementation assumes that strings which are lexically close are liable to match the same (or similar) regular expressions. As a tangential note: is there a concept of a "distance" between regular expressions which can be reasonably implemented? If so, has anyone implemented it yet? String distance certainly doesn't work, because \d{3} and [1-90][1-90][1-90] are implementation-identical, but have a drastic edit distance. ~~ Robert Fischer. 
rfischer at corradiation.net 651-398-8010 > On Mon, 19 Sep 2005, Joshua ben Jore wrote: > >> I've been hoping someone would mention List::Util::first. >> >> use List::Util 'first'; >> $matching_expression = first { $text =~ $_ } @candidate_expressions > > Problem with that is that you get a regex out at the end, which you then > need to hack on to make useful. > > List::Util has useful stuff in it, and if I was doing the list > manipulations it covers frequently in a piece of code I'd probably use it, > but it seems a shame to require the module just to get a for loop with a > break in it. > > I also don't find the above syntax all that clear. Especially if the use > and the call to first() are separated by a lot of code. The name 'first' > is fairly self-explanatory, I suppose, but it's not a standard perl > function, which might fox the unwary. > > All comes down to philosophy and house style, in the end. And > documentation. There is, after all, more than one way to do it. > > Ian > > - > --------------------------------------------------------------------------- > > The soul would have no rainbows if the eyes held no tears. 
> > Ian Malpass > > _______________________________________________ > Mpls-pm mailing list > Mpls-pm at pm.org > http://mail.pm.org/mailman/listinfo/mpls-pm > From ian at indecorous.com Mon Sep 19 11:11:24 2005 From: ian at indecorous.com (Ian Malpass) Date: Mon, 19 Sep 2005 19:11:24 +0100 (BST) Subject: [Mpls-pm] Interesting RegEx Problem In-Reply-To: <42144.198.203.175.175.1127151567.squirrel@webmail.corradiation.net> References: <20050919161201.12935.qmail@ll.gypsy.org> <42144.198.203.175.175.1127151567.squirrel@webmail.corradiation.net> Message-ID: On Mon, 19 Sep 2005, Robert Fischer wrote: > I need to rephrase the problem, though, because I realize I left a major > aspect out of the phrasing: Given a *collection of* strings and a > collection of regular expression strings, where it is known that each > string is matched by precisely one regular expression, how do you most > efficiently develop the mapping? > > Given that arrangement, I'm currently looking at lexically sorting the > strings and getting an MRU list for the pattern matches. That > implementation assumes that strings which are lexically close are liable > to match the same (or similar) regular expressions. Are they, though? "Alphabet" and "ZZ 9 Plural Z Alpha" both match /Alpha/. And /^.*Alpha.*$/ for that matter, if you need to match the entire string. A bit of a pathological case, I know, but it illustrates the point. I'm not saying your approach won't bring some improvement, but the extra list maintenance and sorting will add overhead too. I'd profile it against a more straightforward approach (e.g. nested loops) and see what benefits it brings. Probably depends on the set sizes. Certainly depends on the set contents. More complexity brings more opportunity for things to go wrong, and more difficulty maintaining things later :( Ian - --------------------------------------------------------------------------- The soul would have no rainbows if the eyes held no tears.
Ian Malpass From ngraham at urth.org Mon Sep 19 12:27:24 2005 From: ngraham at urth.org (Nathan Graham) Date: Mon, 19 Sep 2005 14:27:24 -0500 (CDT) Subject: [Mpls-pm] Interesting RegEx Problem In-Reply-To: <42144.198.203.175.175.1127151567.squirrel@webmail.corradiation.net> References: <20050919161201.12935.qmail@ll.gypsy.org> <42144.198.203.175.175.1127151567.squirrel@webmail.corradiation.net> Message-ID: <38225.24.223.246.250.1127158044.squirrel@webmail.urth.org> > I need to rephrase the problem, though, because I realize I left a major > aspect out of the phrasing: Given a *collection of* strings and a > collection of regular expression strings, where it is known that each > string is matched by precisely one regular expression, how do you most > efficiently develop the mapping? If you have lots of patterns and only a few strings, then using the study() function might speed things up. If you just want a 1 to 1 mapping, then I would suggest using a two dimensional array like so: use strict; my @strings = qw(foo bar baz); my @matches; my %patterns = ( foo => qr/foo/, bar => qr/bar/, baz => qr/baz/ ); for(@strings) { study; while ( my ( $name, $pattern ) = each %patterns ) { if (/$pattern/) { push(@matches, [ $_, $pattern ]); delete $patterns{$name}; } } } for(@matches) { print "$_->[0] matches $_->[1]\n" } Don't know if this helps. -Nathan From glenn at easy-access.com Mon Sep 19 12:46:02 2005 From: glenn at easy-access.com (Glenn Bushee) Date: Mon, 19 Sep 2005 14:46:02 -0500 Subject: [Mpls-pm] Interesting RegEx Problem In-Reply-To: References: <20050919161201.12935.qmail@ll.gypsy.org> <42144.198.203.175.175.1127151567.squirrel@webmail.corradiation.net> Message-ID: If I understand the problem correctly, it reminds me of something Friedl mentioned in a talk that he was doing about his work at Yahoo -- making a regex engine so that company names (and various versions of them) could be efficiently substituted in news articles with the stock symbol and lookup link.
Unfortunately, I couldn't find a module on CPAN that facilitates this or includes his code, but this might give a different direction to go on this -- perhaps even contacting Friedl directly. And if you were going to take the collection and perform a search like what was mentioned before, essentially iterating through the collection and then through the regexes for each, perhaps you could do an up-front sorting of the collection and the regexes. For example, if the collection is of various lengths: # try to match the easier ones (roughly speaking) first @collection = sort { length($a) <=> length($b) } @collection; # sort out the regexes by complexity (this could get ugly) or # at least get the faster matches up front # Perhaps a hash could be used instead to build a smarter version of this. # Example below is to pull out the regexes that do a beginning match first # eg: @regexes = ('/abc/', '/^abc/', '/abc$/', '/^abc$/'); @regexes = sort { $b =~ m#^/\^# <=> $a =~ m#^/\^# } @regexes; Then use these to do the iterative approach. And if the regex is to only be used once in the assumed 1-to-1 relationship, you can splice it out of @regexes as you go onto the next string. I hope this wasn't too confusing or off the track you were going. - Glenn On 9/19/05, Ian Malpass wrote: > On Mon, 19 Sep 2005, Robert Fischer wrote: > > > I need to rephrase the problem, though, because I realize I left a major > > aspect out of the phrasing: Given a *collection of* strings and a > > collection of regular expression strings, where it is known that each > > string is matched by precisely one regular expression, how do you most > > efficiently develop the mapping? > > > > Given that arrangement, I'm currently looking at lexically sorting the > > strings and getting an MRU list for the pattern matches. That > > implementation assumes that strings which are lexically close are liable > > to match the same (or similar) regular expressions. > > Are they, though?
"Alphabet" and "ZZ 9 Plural Z Alpha" both match /Alpha/. > And /^.*Alpha.*$/ for that matter, if you need to match the entire string. > A bit of a pathalogical case, I know, but it illustrates the point. > > I'm not saying your approach won't bring some improvement, but the extra > list maintenance and sorting will add overhead too. I'd profile it against > a more straightforward approach (e.g. nested loops) and see what benefits > it brings. Probably depends on the set sizes. Certainly depends on the set > contents. > > More complexity brings more opportunity for things to go wrong, and more > difficulty maintaining things later :( > > Ian > > - > --------------------------------------------------------------------------- > > The soul would have no rainbows if the eyes held no tears. > > Ian Malpass > > _______________________________________________ > Mpls-pm mailing list > Mpls-pm at pm.org > http://mail.pm.org/mailman/listinfo/mpls-pm > From rfischer at corradiation.net Mon Sep 19 13:26:37 2005 From: rfischer at corradiation.net (Robert Fischer) Date: Mon, 19 Sep 2005 15:26:37 -0500 (CDT) Subject: [Mpls-pm] Interesting RegEx Problem In-Reply-To: References: <20050919161201.12935.qmail@ll.gypsy.org> <42144.198.203.175.175.1127151567.squirrel@webmail.corradiation.net> Message-ID: <18115.198.203.175.175.1127161597.squirrel@webmail.corradiation.net> Glenn and Ian: What you're talking about is exactly on the line of what I'm looking at. The relationship is not 1-to-1 (it's many-to-one), but the thoughts on sorting and the like is exactly what I'm looking at. ~~ Robert. > If I understand the problem correctly, it reminds me of something > Friedl mentioned in a talk that he was doing about his work at Yahoo > -- making a regex engine so that company names (and various versions > of them) could be efficiently substituted in news articles with the > stock symbol and lookup link. 
Unfortunately, I couldn't find a module > on CPAN that facilitates this or includes his code, but this might > give a different direction to go on this -- perhaps even contacting > Friedl directly. > > And if you were going to take the collection and perform a search like > what was mentioned before, essentially iterating through the > collection and then through the regexes for each, perhaps you could do > an up-front sorting of the collection and the regexes. > > For example, if the collection is of various lengths: > > # try to match the easier ones (roughly speaking) first > @collection = sort { length($a) <=> length($b) } @collection; > > # sort out the regexes by complexity (this could get ugly) or > # at least get the faster matches up front > # Perhaps a hash could be used instead to build a smarter version of this. > # Example below is to pull out the regexes that do a begining match first > # eg: @regexes = ('/abc/', '/^abc/', '/abc$/', '/^abc$/'); > @regexes = sort { $b =~ m#^/\^# <=> $a =~ m#^/\^# } @regexes; > > Then use these to do the iterative approach. And if the regex is to > only be used once in the assumed 1-to-1 relationship, you can splice > it out of @regexes as you go onto the next string. > > I hope this wasn't too confusing or off the track you were going. > > - Glenn > > > > > On 9/19/05, Ian Malpass wrote: >> On Mon, 19 Sep 2005, Robert Fischer wrote: >> >> > I need to rephrase the problem, though, because I realize I left a >> major >> > aspect out of the phrasing: Given a *collection of* strings and a >> > collection of regular expression strings, where it is known that each >> > string is matched by precisely one regular expression, how do you most >> > efficiently develop the mapping? >> > >> > Given that arrangement, I'm currently looking at lexically sorting the >> > strings and getting an MRU list for the pattern matches. 
That >> > implementation assumes that strings which are lexically close are >> liable >> > to match the same (or similar) regular expressions. >> >> Are they, though? "Alphabet" and "ZZ 9 Plural Z Alpha" both match >> /Alpha/. >> And /^.*Alpha.*$/ for that matter, if you need to match the entire >> string. >> A bit of a pathalogical case, I know, but it illustrates the point. >> >> I'm not saying your approach won't bring some improvement, but the extra >> list maintenance and sorting will add overhead too. I'd profile it >> against >> a more straightforward approach (e.g. nested loops) and see what >> benefits >> it brings. Probably depends on the set sizes. Certainly depends on the >> set >> contents. >> >> More complexity brings more opportunity for things to go wrong, and more >> difficulty maintaining things later :( >> >> Ian >> >> - >> --------------------------------------------------------------------------- >> >> The soul would have no rainbows if the eyes held no tears. >> >> Ian Malpass >> >> _______________________________________________ >> Mpls-pm mailing list >> Mpls-pm at pm.org >> http://mail.pm.org/mailman/listinfo/mpls-pm >> > _______________________________________________ > Mpls-pm mailing list > Mpls-pm at pm.org > http://mail.pm.org/mailman/listinfo/mpls-pm > ~~ Robert Fischer. rfischer at corradiation.net 651-398-8010 From ian at indecorous.com Mon Sep 19 13:56:53 2005 From: ian at indecorous.com (Ian Malpass) Date: Mon, 19 Sep 2005 21:56:53 +0100 (BST) Subject: [Mpls-pm] Interesting RegEx Problem In-Reply-To: <38225.24.223.246.250.1127158044.squirrel@webmail.urth.org> References: <20050919161201.12935.qmail@ll.gypsy.org> <42144.198.203.175.175.1127151567.squirrel@webmail.corradiation.net> <38225.24.223.246.250.1127158044.squirrel@webmail.urth.org> Message-ID: On Mon, 19 Sep 2005, Nathan Graham wrote: > If you have lots of patterns and only a few strings, then using the > study() function might speed things up. 
Oh yes, I'd forgotten about study(). Note that the documentation is equivocal about the performance benefits of study()[0], since there is overhead in studying a string, and no guarantee of it giving you anything in return. Anyone want to do a lightning talk on profiling and benchmarking in perl? Ian [0] "...you probably want to compare run times with and without it to see which runs faster" - perldoc -f study - --------------------------------------------------------------------------- The soul would have no rainbows if the eyes held no tears. Ian Malpass From andy at petdance.com Mon Sep 19 14:01:44 2005 From: andy at petdance.com (Andy Lester) Date: Mon, 19 Sep 2005 16:01:44 -0500 Subject: [Mpls-pm] Interesting RegEx Problem In-Reply-To: References: <20050919161201.12935.qmail@ll.gypsy.org> <42144.198.203.175.175.1127151567.squirrel@webmail.corradiation.net> <38225.24.223.246.250.1127158044.squirrel@webmail.urth.org> Message-ID: <20050919210144.GB18287@petdance.com> On Mon, Sep 19, 2005 at 09:56:53PM +0100, Ian Malpass (ian at indecorous.com) wrote: > Oh yes, I'd forgotten about study(). Note that the documentation is > equivocal about the performance benefits of study()[0], since there is > overhead in studying a string, and no guarantee of it giving you anything > in return. As I understand it, study says "I'm going to find the first instance of each of the 256 possible characters in the string, so that strings anchored with a literal can start right there." This means, if you have $str = "Minneapolis Perl Mongers"; then $str =~ /Perl/ can start right at position 12 rather than scanning to find it, and $str =~ /fried cheese/ doesn't have to match anything because there's no "f" in $str, $str =~ /.+flooble/ doesn't do anything differently.
xox,o Andy -- Andy Lester => andy at petdance.com => www.petdance.com => AIM:petdance From ian at indecorous.com Mon Sep 19 16:32:00 2005 From: ian at indecorous.com (Ian Malpass) Date: Tue, 20 Sep 2005 00:32:00 +0100 (BST) Subject: [Mpls-pm] Interesting RegEx Problem In-Reply-To: <20050919210144.GB18287@petdance.com> References: <20050919161201.12935.qmail@ll.gypsy.org> <42144.198.203.175.175.1127151567.squirrel@webmail.corradiation.net> <38225.24.223.246.250.1127158044.squirrel@webmail.urth.org> <20050919210144.GB18287@petdance.com> Message-ID: On Mon, 19 Sep 2005, Andy Lester wrote: > Date: Mon, 19 Sep 2005 16:01:44 -0500 > From: Andy Lester > To: Ian Malpass > Cc: mpls-pm at pm.org > Subject: Re: [Mpls-pm] Interesting RegEx Problem > As I understand it, study says "I'm going to find the first instance of each of > the 256 possible characters in the string, so that strings anchored with > a literal can start right there." It actually goes a little further, and does some symbol frequency checking too. From the perldoc: >>>>> The way "study" works is this: a linked list of every character in the string to be searched is made, so we know, for example, where all the 'k' characters are. From each search string, the rarest character is selected, based on some static frequency tables constructed from some C programs and English text. Only those places that contain this "rarest" character are examined. For example, here is a loop that inserts index producing entries before any line containing a certain pattern: while (<>) { study; print ".IX foo\n" if /\bfoo\b/; print ".IX bar\n" if /\bbar\b/; print ".IX blurfl\n" if /\bblurfl\b/; # ... print; } In searching for "/\bfoo\b/", only those locations in $_ that contain "f" will be looked at, because "f" is rarer than "o". In general, this is a big win except in pathological cases. The only question is whether it saves you more time than it took to build the linked list in the first place. 
<<<<< So, if all the patterns are equally likely to match, then the amount of time it takes to find the pattern that matches will be directly proportional to the number of patterns. If all the input strings are of similar lengths, then I would guess that study() will take a similar amount of time on each, and that it would take longer on longer strings. That would suggest there's a tipping point beyond which using study is of benefit, but I think that can only be determined empirically :) The nice thing is that because it's so localised in the code, it's easy to turn it on or off with a flag or environment variable. Even if it's not of benefit now, it might be in two years' time when the number of patterns has inflated wildly.... Certainly it should be documented in the code that study() might be applied at a given point even if it isn't used. Still, "Optimise later, if at all", and all that jazz. Ian - --------------------------------------------------------------------------- The soul would have no rainbows if the eyes held no tears. Ian Malpass From rfischer at corradiation.net Mon Sep 19 16:59:15 2005 From: rfischer at corradiation.net (Robert Fischer) Date: Mon, 19 Sep 2005 18:59:15 -0500 (CDT) Subject: [Mpls-pm] Interesting RegEx Problem In-Reply-To: References: <20050919161201.12935.qmail@ll.gypsy.org> <42144.198.203.175.175.1127151567.squirrel@webmail.corradiation.net> <38225.24.223.246.250.1127158044.squirrel@webmail.urth.org> <20050919210144.GB18287@petdance.com> Message-ID: <2438.209.98.66.210.1127174355.squirrel@webmail.corradiation.net> > Still, > > "Optimise later, if at all", and all that jazz. > > Ian > That's generally the plan right now. I'm more looking for good algorithms right now: I'll leave performance tweaks like "study" for if/when this hunk of code becomes a performance bottleneck worth being concerned about. ~~ Robert Fischer.
rfischer at corradiation.net 651-398-8010 From ken at mathforum.org Mon Sep 19 17:35:18 2005 From: ken at mathforum.org (Ken Williams) Date: Mon, 19 Sep 2005 19:35:18 -0500 Subject: [Mpls-pm] Interesting RegEx Problem In-Reply-To: <42144.198.203.175.175.1127151567.squirrel@webmail.corradiation.net> References: <20050919161201.12935.qmail@ll.gypsy.org> <42144.198.203.175.175.1127151567.squirrel@webmail.corradiation.net> Message-ID: <8b1f1faad135bddb1bad4555311a423b@mathforum.org> On Sep 19, 2005, at 12:39 PM, Robert Fischer wrote: > List::Util::first is a great shortcut for implementing the > sequential-search algorithm, but I'm looking for something better than > sequential-search. Check out Regexp::Assemble: http://search.cpan.org/~dland/Regexp-Assemble-0.17/Assemble.pm It delves into the guts of the regexes and finds a good (i.e. short) regex that will match the alternation of the set of regexes. For example: my $ra = Regexp::Assemble->new; $ra->add( 'ab+c' ); $ra->add( 'ab+\\d*\\s+c' ); $ra->add( 'a\\w+\\d+' ); $ra->add( 'a\\d+' ); print $ra->re; # prints (?:a(?:b+(?:\d*\s+)?c|(?:\w+)?\d+)) It can also track which of the original patterns matched. > As a tangential note: is there a concept of a "distance" between > regular > expressions which can be reasonably implemented? If so, has anyone > implemented it yet? String distance certainly doesn't work, because > \d{3} > and [1-90][1-90][1-90] are implementation-identical, but have a drastic > edit distance. Your best bet might be to convert each regex to a canonical DFA (if possible - it won't be possible if you're using capturing groups or other regex extensions) and then use a graph-theoretical metric on the underlying DFA. Definitely a research problem, I'm thinking. 
=) -Ken From ian at indecorous.com Mon Sep 19 19:25:23 2005 From: ian at indecorous.com (Ian Malpass) Date: Tue, 20 Sep 2005 03:25:23 +0100 (BST) Subject: [Mpls-pm] Interesting RegEx Problem In-Reply-To: <2438.209.98.66.210.1127174355.squirrel@webmail.corradiation.net> References: <20050919161201.12935.qmail@ll.gypsy.org> <42144.198.203.175.175.1127151567.squirrel@webmail.corradiation.net> <38225.24.223.246.250.1127158044.squirrel@webmail.urth.org> <20050919210144.GB18287@petdance.com> <2438.209.98.66.210.1127174355.squirrel@webmail.corradiation.net> Message-ID: On Mon, 19 Sep 2005, Robert Fischer wrote: > That's generally the plan right now. I'm more looking for good algorithms > right now: I'll leave performance tweaks like "study" for if/when this > hunk of code becomes a performance bottleneck worth being concerned about. Then I would suggest doing two loops, with the inner one breaking when you hit a match. Using qr// to pre-build the regexes (as we've done in the various examples posted) will save you much of the overhead[0], as well as making your code clearer. It'll work, and it'll be clear, it'll be easy to debug, and it's an isolated section of code that you can pull out and replace with a fancier algorithm later. Later might be quite soon, but the two loops are five minutes of code writing. Finish the whole app, benchmark it and profile it, and then you'll know if getting fancy is (a) necessary, and (b) effective. Without the benchmark of the simple solution, how will you know how good your better algorithm is, and whether it's worth losing clarity and ease-of-maintenance for the performance improvement? I'll stop now ;) Ian [0] Of course, we've been blithely assuming that the patterns are loop-invariant, so precompiling them with qr// is valid, but I got that impression from your problem statement :) - --------------------------------------------------------------------------- The soul would have no rainbows if the eyes held no tears. Ian Malpass
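A minimal sketch of the two-loop approach suggested above, for the many-to-one case Robert described. The sample strings, the pattern names and the %name_for hash are made up for illustration, and the patterns are assumed to be loop-invariant so that precompiling them with qr// is valid. Patterns are held in an array rather than a hash so the search order is predictable.

```perl
use strict;
use warnings;

# Hypothetical sample data; none of these strings or names come from the thread.
my @strings = ( 'foo fighters', 'bar none', 'food', 'bazaar' );

# Patterns precompiled once with qr//, stored as [ name, regex ] pairs
# so they are tried in a fixed order.
my @patterns = (
    [ foo => qr/foo/ ],
    [ bar => qr/bar/ ],
    [ baz => qr/baz/ ],
);

my %name_for;    # string => name of the pattern that matched it

STRING:
for my $string (@strings) {
    for my $entry (@patterns) {
        my ( $name, $pattern ) = @$entry;
        if ( $string =~ $pattern ) {
            $name_for{$string} = $name;
            next STRING;    # exactly one pattern is assumed to match, so stop here
        }
    }
    warn "No pattern matched '$string'\n";
}

print "$_ => $name_for{$_}\n" for sort keys %name_for;
```

Because the matching is isolated in the inner loop, it could later be swapped for something fancier (an MRU ordering of @patterns, say) and benchmarked against this version.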