From westerman at purdue.edu Thu Apr 5 06:24:32 2007
From: westerman at purdue.edu (Rick Westerman)
Date: Thu, 05 Apr 2007 09:24:32 -0400
Subject: [Purdue-pm] Next technical meeting in 5 days
Message-ID: <4614F890.700@purdue.edu>

A reminder that the next Perl Mongers meeting is in 5 days. Tuesday,
April 10th.

*6:00-7:30pm, ME 119. Hope to see you there!*

--
Rick Westerman
westerman at purdue.edu

Bioinformatics specialist at the Genomics Facility.
Phone: (765) 494-0505 FAX: (765) 496-7255
Department of Horticulture and Landscape Architecture
625 Agriculture Mall Drive
West Lafayette, IN 47907-2010
Physically located in room S049, WSLR building

From westerman at purdue.edu Tue Apr 10 07:21:31 2007
From: westerman at purdue.edu (Rick Westerman)
Date: Tue, 10 Apr 2007 10:21:31 -0400
Subject: [Purdue-pm] Technical meeting tonight.
Message-ID: <461B9D6B.2020609@purdue.edu>

We have our April technical meeting tonight, *6:00-7:30pm, Tue Apr 10
2007* Mechanical Engineering 119. The topics include:

Logging
Unneeded Complexity
How using BioPerl? made my program run 250 times slower
How improving my program made it 54,000% faster
The Basics of Perl on Win32
Clockwork Magick
HOP to it! A review of the "Higher Order Perl" book

Hope to see you there!

--
Rick Westerman
westerman at purdue.edu

Bioinformatics specialist at the Genomics Facility.
Phone: (765) 494-0505 FAX: (765) 496-7255
Department of Horticulture and Landscape Architecture
625 Agriculture Mall Drive
West Lafayette, IN 47907-2010
Physically located in room S049, WSLR building

From westerman at purdue.edu Tue Apr 10 07:24:23 2007
From: westerman at purdue.edu (Rick Westerman)
Date: Tue, 10 Apr 2007 10:24:23 -0400
Subject: [Purdue-pm] Technical meeting tonight (without the hypertext this time!)
Message-ID: <461B9E17.8030306@purdue.edu>

We have our April technical meeting tonight, 6:00-7:30pm, Tue Apr 10
2007, Mechanical Engineering 119. The topics include:

Logging
Unneeded Complexity
How using BioPerl made my program run 250 times slower
How improving my program made it 54,000% faster
The Basics of Perl on Win32
Clockwork Magick
HOP to it! A review of the "Higher Order Perl" book

Hope to see you there!

--
Rick Westerman
westerman at purdue.edu

Bioinformatics specialist at the Genomics Facility.
Phone: (765) 494-0505 FAX: (765) 496-7255
Department of Horticulture and Landscape Architecture
625 Agriculture Mall Drive
West Lafayette, IN 47907-2010
Physically located in room S049, WSLR building

From jacoby at csociety.ecn.purdue.edu Tue Apr 10 07:33:13 2007
From: jacoby at csociety.ecn.purdue.edu (David Jacoby)
Date: Tue, 10 Apr 2007 10:33:13 -0400 (EDT)
Subject: [Purdue-pm] Technical meeting tonight (without the hypertext this time!)
In-Reply-To: <461B9E17.8030306@purdue.edu>
Message-ID:

On Tue, 10 Apr 2007, Rick Westerman wrote:

> We have our April technical meeting tonight, 6:00-7:30pm, Tue Apr 10
> 2007, Mechanical Engineering 119. The topics include:
>
> Logging
> Unneeded Complexity
> How using BioPerl made my program run 250 times slower
> How improving my program made it 54,000% faster
> The Basics of Perl on Win32
> Clockwork Magick
> HOP to it! A review of the "Higher Order Perl" book

It's a shame that I'm probably going to miss the front part, because
that sounds like some great presentations. And because I can't promise
to have the laptop, I won't be talking on Win32 Perl. I'll have a second
presentation on programming images, instead.

If I can make it....

> Hope to see you there!

--
Dave Jacoby -- jacoby at csociety.org
"After three days without programming, life becomes meaningless."
The Tao of Programming
657 days and counting.....

From jacoby at csociety.ecn.purdue.edu Tue Apr 10 20:40:48 2007
From: jacoby at csociety.ecn.purdue.edu (David Jacoby)
Date: Tue, 10 Apr 2007 23:40:48 -0400 (EDT)
Subject: [Purdue-pm] Missed Presentation
Message-ID:

I certainly hope that you guys aren't still waiting for me.

Sorry I couldn't make it. I had a talk on programming images in Perl put
together, drawn partially from _Graphics Programming With Perl_, which
is now available for check-out. The code and examples are up on my
Presentations area, here:

http://csociety.org/~jacoby/Presentations/Images/

So, I'd love to hear comments on, or see the slides from, the
presentations I missed. 54,000% faster, Mark? You must be slinging black
magic, or you must've coded it in bad ways the first time. And Rick?
Looking at closures for another presentation, it struck me that, once
you go there, once you start thinking about that, you open the door into
everything HOP goes into, so I think I really missed something by
missing your talk.

There is something I *can* present to you, even having missed the
meeting. PerlCast has brian d foy talking about Benchmarking and his
upcoming book from O'Reilly, _Mastering Perl_. From the look of it, that
book will be the next in the list of Perl 'must-reads'.

http://www.perlcast.com/audio/Perlcast_Presentation_002.mp3
http://www.perlcast.com/

--
Dave Jacoby -- jacoby at csociety.org
"After three days without programming, life becomes meaningless."
The Tao of Programming
657 days and counting.....

From westerman at purdue.edu Wed Apr 11 08:04:12 2007
From: westerman at purdue.edu (Rick Westerman)
Date: Wed, 11 Apr 2007 11:04:12 -0400
Subject: [Purdue-pm] Missed Presentation
In-Reply-To:
References:
Message-ID: <461CF8EC.7030206@purdue.edu>

David Jacoby wrote:
> I certainly hope that you guys aren't still waiting for me.
>
We aren't. The meeting, as usual, was lightly attended (6 of us) but
very informative.

> Sorry I couldn't make it. I had a talk on programming ...

Since you have so many talks in store, we decided to make the next
technical meeting an "all Dave all the time" meeting. :-)

Actually I will probably continue exploring the Higher Order Perl (HOP)
book since I was only able to present the first 5 chapters last night.
What I did present is now on-line at the Mongers web site.

Mark's "unneeded complexity" talk segued nicely into my HOP talk -- we
both touched on the idea of table-driven programming.

The 54,000% faster lightning talk of Mark's was about using multiple
simple regex's or-ed together instead of a single complex regex.
Personally I can't replicate his results but then I don't have his code
in front of me. He also may have been using an unneeded capture in his
complex regex.

Greg's talk about BioPerl slowing down his program was interesting. It
is hard to compare his BioPerl-based program to his non-BioPerl-based
program since they are slightly different (although they read the same
type of data files). Personally I'm not very fond of BioPerl since it
seems to be a large cumbersome beast. Phillip likes it though. It would
be interesting to do an exact side-by-side comparison but I don't see
anyone doing this in the near future.

Unfortunately I was late and missed Doug's talk on logging.

--
Rick Westerman
westerman at purdue.edu

Bioinformatics specialist at the Genomics Facility.
Phone: (765) 494-0505 FAX: (765) 496-7255
Department of Horticulture and Landscape Architecture
625 Agriculture Mall Drive
West Lafayette, IN 47907-2010
Physically located in room S049, WSLR building

From mark at ecn.purdue.edu Wed Apr 11 09:41:17 2007
From: mark at ecn.purdue.edu (Mark Senn)
Date: Wed, 11 Apr 2007 12:41:17 -0400
Subject: [Purdue-pm] Missed Presentation
In-Reply-To: <461CF8EC.7030206@purdue.edu>
References: <461CF8EC.7030206@purdue.edu>
Message-ID: <26218.1176309677@pier.ecn.purdue.edu>

> The 54,000% faster lightning talk of Mark's was about using multiple
> simple regex's or-ed together instead of a single complex regex.
> Personally I can't replicate his results but then I don't have his code
> in front of me. He also may have been using an unneeded capture in his
> complex regex.

To get the 54,000% speedup the following was done:

  o Four regexes that were being done near the end of ~2000 regexes
    were moved so they would be at the beginning.

  o Right after some of these common cases were eliminated from $_
    a "study" command was done to (probably) make the rest of the
    ~2000 regexes run faster.

  o Regexes of the form /a|b|c/ were changed to /a/ || /b/ || /c/.

I didn't do timings for how much difference was due to the different
factors.

Here's a test program to compare /a/ || /b/ vs. /a|b/

===== start with next line
#!/usr/local/bin/perl

# Read the system word list into @word, one word per element.
$dict = '/usr/dict/words';
@ARGV = $dict;
@word = <>;
print scalar @word, " words read from $dict\n";

use Benchmark;

# Run each snippet 100 times: separate regexes joined with ||
# versus a single regex using | alternation, with and without /o.
timethese (
    100 => {
        'expression or'   => '@x = grep /ae/ || /ei/ || /io/ || /ou/ || /uy/, @word',
        'expression or o' => '@x = grep /ae/o || /ei/o || /io/o || /ou/o || /uy/o, @word',
        'regex or x'      => '@x = grep /ae | ei | io | ou | uy/x, @word',
        'regex or ox'     => '@x = grep /ae | ei | io | ou | uy/ox, @word',
    }
);
===== end with previous line

I get

% ./benchmark.pl
25143 words read from /usr/dict/words
Benchmark: timing 100 iterations of expression or, expression or o, regex or ox, regex or x...
  expression or:  5 wallclock secs ( 4.85 usr +  0.00 sys =  4.85 CPU) @ 20.62/s (n=100)
expression or o:  5 wallclock secs ( 4.86 usr +  0.00 sys =  4.86 CPU) @ 20.58/s (n=100)
    regex or ox: 12 wallclock secs (11.63 usr +  0.01 sys = 11.64 CPU) @  8.59/s (n=100)
     regex or x: 11 wallclock secs (11.60 usr +  0.00 sys = 11.60 CPU) @  8.62/s (n=100)

-mark

From andy at petdance.com Wed Apr 11 10:28:47 2007
From: andy at petdance.com (Andy Lester)
Date: Wed, 11 Apr 2007 12:28:47 -0500
Subject: [Purdue-pm] Missed Presentation
In-Reply-To: <26218.1176309677@pier.ecn.purdue.edu>
References: <461CF8EC.7030206@purdue.edu> <26218.1176309677@pier.ecn.purdue.edu>
Message-ID: <43B19578-9F5E-4577-B037-B94564B1D9C6@petdance.com>

On Apr 11, 2007, at 11:41 AM, Mark Senn wrote:

> o Right after some of these common cases were eliminated from $_
>   a "study" command was done to (probably) make the rest of the
>   ~2000 regexes run faster.

study is not helpful in the vast majority of cases. All it does is make
a table of where the first occurrence of each of 256 bytes is in the
string. This means that if you have a 1,000-character string, and you
search for lots of strings that begin with a constant character, then
the matcher can jump right to it. For example:

"This is a very long [... 900 characters skipped...] string that I have
here, ending at position 1000"

Now, if you are matching this against the regex /Icky/, the matcher will
try to find the first letter "I" that matches. That may take scanning
through the first 900+ characters until you get to it. But what study
does is build a table of the 256 possible bytes and where they first
appear, so that in this case, the scanner can jump right to that
position and start matching.

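The first-occurrence table described above can be sketched in a few
lines of Perl. This is only an illustration of the idea as stated, not
how perl implements study internally. The sub name
first_occurrence_table and the sample string are made up for the
example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: map each byte value (0..255) to the offset of its
# first occurrence in $text, or -1 if that byte never appears.
# Illustrative only; this is not perl's internal study code.
sub first_occurrence_table {
    my ($text) = @_;
    my @first = (-1) x 256;
    my $pos   = 0;
    for my $byte (unpack 'C*', $text) {
        $first[$byte] = $pos if $first[$byte] < 0;
        $pos++;
    }
    return \@first;
}

# Made-up 1,000-character string: 900 filler characters with no
# capital "I", then the interesting part, then padding.
my $text = ('x' x 900) . ' string that I have here';
$text .= '.' x (1000 - length $text);

my $table = first_occurrence_table($text);

# When looking for /Icky/, a scanner holding this table could start at
# the first "I" instead of walking through the 900 filler characters.
my $start = $table->[ ord 'I' ];
print "first 'I' at offset $start\n";
print "index() agrees\n" if $start == index($text, 'I');
```

The table costs one pass over the string to build, so it can only pay
off when the same string is then searched many times, which fits the
~2000-regex case mentioned earlier in the thread.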
--
Andy Lester => andy at petdance.com => www.petdance.com => AIM:petdance

From westerman at purdue.edu Wed Apr 11 11:07:07 2007
From: westerman at purdue.edu (Rick Westerman)
Date: Wed, 11 Apr 2007 14:07:07 -0400
Subject: [Purdue-pm] Missed Presentation
In-Reply-To: <26218.1176309677@pier.ecn.purdue.edu>
References: <461CF8EC.7030206@purdue.edu> <26218.1176309677@pier.ecn.purdue.edu>
Message-ID: <461D23CB.5070702@purdue.edu>

Mark Senn wrote:
>
> To get the 54,000% speedup the following was done
> ...
> o Regexes of the form /a|b|c/ were changed to /a/ || /b/ || /c/.
>
Yes. Using a modified version of your program I see that writing the
regex in the latter form is faster by about a factor of two. This could
be significant with a large data file or a slower machine.

--
Rick Westerman
westerman at purdue.edu

Bioinformatics specialist at the Genomics Facility.
Phone: (765) 494-0505 FAX: (765) 496-7255
Department of Horticulture and Landscape Architecture
625 Agriculture Mall Drive
West Lafayette, IN 47907-2010
Physically located in room S049, WSLR building

From westerman at purdue.edu Mon Apr 23 07:22:54 2007
From: westerman at purdue.edu (Rick Westerman)
Date: Mon, 23 Apr 2007 10:22:54 -0400
Subject: [Purdue-pm] Reminder: Social meeting tomorrow night, April 24th
In-Reply-To: <15520.1177330342@pier.ecn.purdue.edu>
References: <15520.1177330342@pier.ecn.purdue.edu>
Message-ID: <462CC13E.3080500@purdue.edu>

*Just a reminder. Social Meeting, 7:00pm, Tue Apr 24 2007*
Cafe Royale

--
Rick Westerman
westerman at purdue.edu

Bioinformatics specialist at the Genomics Facility.
Phone: (765) 494-0505 FAX: (765) 496-7255
Department of Horticulture and Landscape Architecture
625 Agriculture Mall Drive
West Lafayette, IN 47907-2010
Physically located in room S049, WSLR building