From jkeen at verizon.net Thu Oct 16 18:44:46 2008 From: jkeen at verizon.net (James E Keenan) Date: Thu, 16 Oct 2008 21:44:46 -0400 Subject: [Neworleans-pm] Out-of-town Visitor Message-ID: I'll be in NOLA Wed Nov 12 - Sat Nov 15 for my approximately biannual musical pub crawl. If there are any Perlmongers who would like to get together, please let me know. Thanks. Jim Keenan From estrabd at mailcan.com Fri Oct 17 09:09:24 2008 From: estrabd at mailcan.com (B. Estrade) Date: Fri, 17 Oct 2008 11:09:24 -0500 Subject: [Neworleans-pm] http://neworleans.pm.org/ Message-ID: <20081017160924.GL46257@bc3.hpc.lsu.edu> The recent email from James Keenan sparked me to think about NOPM for the first time in a long time. I know things have been dead for a while, but why is http://neworleans.pm.org/ resolving to some real estate page? Is this a project of whoever controls the web server that NOPM had been using? I am just curious - I was surprised to find something strange in its place. And is there an interest in reviving things, even only "virtually"? I am in Baton Rouge, but am still very much interested in Perl. Cheers, Brett -- B. Estrade Louisiana Optical Network Initiative +1.225.578.1920 aim: bz743 :wq From djohn at archdiocese-no.org Fri Oct 17 09:24:00 2008 From: djohn at archdiocese-no.org (David B. John) Date: Fri, 17 Oct 2008 11:24:00 -0500 Subject: [Neworleans-pm] http://neworleans.pm.org/ In-Reply-To: <20081017160924.GL46257@bc3.hpc.lsu.edu> References: <20081017160924.GL46257@bc3.hpc.lsu.edu> Message-ID: <1224260640.12746.22.camel@isd4> On Fri, 2008-10-17 at 11:09 -0500, B. Estrade wrote: > > And is there an interest in reviving things, even only "virtually"? I am in Baton Rouge, but am still very much interested in Perl. > > Cheers, > Brett > Picked up my first Perl book ("Learning Perl", O'Reilly) about 6 months ago. Best read ever. I'm interested, if only in a "virtual" sense. David -- God Bless. -- David B. John Department of Information Technology Archdiocese of New Orleans -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From estrabd at mailcan.com Fri Oct 17 09:46:26 2008 From: estrabd at mailcan.com (B. Estrade) Date: Fri, 17 Oct 2008 11:46:26 -0500 Subject: [Neworleans-pm] http://neworleans.pm.org/ In-Reply-To: <1224260640.12746.22.camel@isd4> References: <20081017160924.GL46257@bc3.hpc.lsu.edu> <1224260640.12746.22.camel@isd4> Message-ID: <20081017164626.GM46257@bc3.hpc.lsu.edu> On Fri, Oct 17, 2008 at 11:24:00AM -0500, David B. John wrote: > On Fri, 2008-10-17 at 11:09 -0500, B. Estrade wrote: > > > > > And is there an interest in reviving things, even only "virtually"? I am in Baton Rouge, but am still very much interested in Perl. > > > > Cheers, > > Brett > > > > Picked up my first Perl book ("Learning Perl", O'reilly) about 6 months > ago. Best read ever. I'm interested, if only in a "virtual" sense. Very nice - and thanks for the reply. My favorite Perl book of all time was Damian Conway's Object Oriented Perl. It covers OOP, but only after going over the set of language features that makes Perl, Perl. So it is worth buying simply for the introduction. I am not sure if it's been updated for Perl 6, but I don't think it's a problem if not. http://www.amazon.com/Object-Oriented-Perl-Comprehensive-Programming/dp/1884777791
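(For anyone who, like David, is coming straight from Learning Perl: the classic bless-based object style that Conway's book builds on looks roughly like the sketch below. This is an illustration only, not an excerpt from the book; the Counter class and its methods are invented for the example.)

use strict;
use warnings;

package Counter;                  # a minimal, classic Perl 5 class

sub new {
    my ($class, $start) = @_;
    my $self = { count => defined $start ? $start : 0 };
    return bless $self, $class;   # bless ties the plain hashref to the class
}

sub increment { my ($self) = @_; return ++$self->{count} }
sub count     { my ($self) = @_; return $self->{count}   }

package main;

my $c = Counter->new(5);
$c->increment;
print $c->count, "\n";            # prints 6

The book itself goes well beyond this - closure-based objects, ties, class methods, overloading - which is what makes it worth the read.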
Another favorite of mine is a digest of "The Perl Journal" topics called "Computer Science & Perl Programming". It covers a wide array of important data structure and programming topics. http://www.amazon.com/Computer-Science-Perl-Programming-Best/dp/0596003102 I highly recommend it even if just for the introductory material. I am still getting my head around MJD's Higher Order Perl, but I'd only recommend this if you're interested in functional programming. It is very insightful, though. I use Perl mostly for conceptualizing - I've gotten to the point where I can rapidly prototype algorithms and data structures, which is useful in my area of study. I've also been lurking on the Perl 6 dev list, and I am pretty excited about some of the features that they're putting into the language. Cheers, Brett > > David > > -- > God Bless. > -- > David B. John > Department of Information Technology > Archdiocese of New Orleans > _______________________________________________ > NewOrleans-pm mailing list > NewOrleans-pm at pm.org > http://mail.pm.org/mailman/listinfo/neworleans-pm -- B. Estrade Louisiana Optical Network Initiative +1.225.578.1920 aim: bz743 :wq From djohn at archdiocese-no.org Mon Oct 27 13:59:32 2008 From: djohn at archdiocese-no.org (David B. John) Date: Mon, 27 Oct 2008 15:59:32 -0500 Subject: [Neworleans-pm] split vs. match Message-ID: <1225141172.27985.36.camel@isd4> (perl v5.8.8 on Ubuntu Hardy Heron/2.6.24-21-generic.) I have an http logfile I'm trying to parse which looks like: 2008:10:24-00:00:06 x.x.x.x httpproxy[4997]: id="0001" severity="info" sys="SecureWeb" sub="http" name="http access" action="pass" method="GET" srcip="x.x.x.x" user="" statuscode="200" cached="0" profile="profile_0" filteraction="action_REF_DefaultHTTPCFFAction" size="78632" time="105 ms" request="0x90075b60" url="http://sb.google.com/safebrowsing/update?client=navclient-auto-ffox&appver=2.0.0.17&version=goog-white-domain:1:481,goog-white-url:1:371,goog-black-url:1:25401,goog-black-enchash:1:62374" error="" category="175,178" categoryname="Software/Hardware,Internet Services" content-type="text/html" (Nothing Spectacular.) If I loop through the log file and do: my ($date,$fwip,$proxy,$id,$severity,$sys,$sub,$name,$action,$method, $srcip,$user,$statuscode,$cached,$profile,$filteraction,$size,$time, $request,$url,$error,$category,$category_name,$content_type) = $_ =~ /\w+=".*?"|\S+/g; life is good but could be better (~ 75 seconds on an HP dc7700S for a compressed 500 MB logfile). However, I'd really like to use split b/c it's so much faster (~ 15 seconds). The problem is that if I split, sometimes $name, $time, or $category_name will include a space within the quotes, which I don't want to split on (see above). I used Text::ParseWords but gave up after waiting 5 minutes. I know I can use a regex with split but I'm stumped as to how I would go about writing it. E.g., split on a space except when it is enclosed in quotes. Also, since it would still be using a regex, would it theoretically be any faster than the example above, or should I just live with it? Thanks. David From estrabd at mailcan.com Mon Oct 27 14:27:45 2008 From: estrabd at mailcan.com (B. Estrade) Date: Mon, 27 Oct 2008 16:27:45 -0500 Subject: [Neworleans-pm] split vs.
match In-Reply-To: <1225141172.27985.36.camel@isd4> References: <1225141172.27985.36.camel@isd4> Message-ID: <20081027212745.GE12981@bc3.hpc.lsu.edu> David, Maybe the following will help: http://oreilly.com/catalog/perlwsmng/chapter/ch08.html I don't know a whole bunch about parsing tons of text with regexes. I do have one immediate suggestion - that "/g" might not be necessary. Anyway, take a look at that link; it might help. Cheers, Brett On Mon, Oct 27, 2008 at 03:59:32PM -0500, David B. John wrote: > (perl v5.8.8 on Ubuntu Hardy Heron/2.6.24-21-generic.) > > I have an http logfile I'm trying to parse which looks like: > > 2008:10:24-00:00:06 x.x.x.x httpproxy[4997]: id="0001" severity="info" > sys="SecureWeb" sub="http" name="http access" action="pass" method="GET" > srcip="x.x.x.x" user="" statuscode="200" cached="0" profile="profile_0" > filteraction="action_REF_DefaultHTTPCFFAction" size="78632" time="105 > ms" request="0x90075b60" > url="http://sb.google.com/safebrowsing/update?client=navclient-auto-ffox&appver=2.0.0.17&version=goog-white-domain:1:481,goog-white-url:1:371,goog-black-url:1:25401,goog-black-enchash:1:62374" error="" category="175,178" categoryname="Software/Hardware,Internet Services" content-type="text/html" > > > (Nothing Spectacular.) > > If I loop through the log file and do: > > my ($date,$fwip,$proxy,$id,$severity,$sys,$sub,$name,$action,$method, > $srcip,$user,$statuscode,$cached,$profile,$filteraction,$size,$time, > $request,$url,$error,$category,$category_name,$content_type) = > $_ =~ /\w+=".*?"|\S+/g; > > life is good but could be better (~ 75 seconds on a Hp dc7700S for a > compressed 500 MB logfile). > > However, I'd really like to use split b/c it's so much faster (~ 15 > seconds). The problem is if I split, sometimes $name, $time or > $category_name will include a space within the quotes which I don't want > to split on (see above). > > I used Text::ParseWords but gave up after waiting 5 minutes. > > I know I can use a regex with split but I'm stumped as to how I would go > about writing it. E.g. split on a space except when enclosed in quotes. > Also, would it theoretically be any faster than the example above since > it's using regex or should I just live with it? > > Thanks. > > David > > > _______________________________________________ > NewOrleans-pm mailing list > NewOrleans-pm at pm.org > http://mail.pm.org/mailman/listinfo/neworleans-pm -- B. Estrade Louisiana Optical Network Initiative +1.225.578.1920 aim: bz743 :wq From donnie at solomonstreet.com Mon Oct 27 22:54:53 2008 From: donnie at solomonstreet.com (Donnie Cameron) Date: Tue, 28 Oct 2008 01:54:53 -0400 Subject: [Neworleans-pm] split vs. match In-Reply-To: <20081027212745.GE12981@bc3.hpc.lsu.edu> References: <1225141172.27985.36.camel@isd4> <20081027212745.GE12981@bc3.hpc.lsu.edu> Message-ID: <24e3b4050810272254s44308700g13d228658010e1f8@mail.gmail.com> David, The split function is not going to make things any faster. In fact, without resorting to the use of another language, I can't think of a faster way of doing it than you have suggested. Even if you were to split on something like a quote followed by a space (/" /) and then reattach the quote to the end of each resulting element (work that is vastly simpler than regex matching), the process would end up being slower than regex matching, because the regex matching happens in machine language while that seemingly simpler work happens in interpreted Perl.
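(For illustration, here is roughly what the split David was stumped on could look like, next to his original match - a sketch only, not tested against the real Astaro log format, and the loop and variable names are made up for the example. The lookahead permits a split on whitespace only when an even number of double quotes remains ahead, so spaces inside quoted values survive; note that it rescans toward the end of the line at every candidate split point, which is exactly the kind of heavier regex that tends to erase split's usual speed advantage.)

use strict;
use warnings;

while (my $line = <>) {
    # Original approach: one match in list context.  The /g is what makes it
    # return every field; without /g, a list-context match with no capture
    # groups just returns a single true value.
    my @by_match = $line =~ /\w+=".*?"|\S+/g;

    # The split variant: break on whitespace only when the quotes remaining
    # ahead pair off evenly, i.e. never inside a quoted value, so
    # name="http access" stays one field.
    my @by_split = split /\s+(?=(?:[^"]*"[^"]*")*[^"]*$)/, $line;

    # For well-formed lines (balanced, unescaped quotes) the two should
    # produce the same list of fields.
}

A quick run through the core Benchmark module's cmpthese() over a sample of real lines would settle which of the two actually wins on this data.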
I'm convinced also that even if you were to use the index function, your Perl code would still be slower than the regex-based solution you described. In the past, I have tried a number of tricks to try to beat simple regex matching for this type of work and I've seldom been able to beat the regex matching. (When I write "this type of work", I am of course excluding regular Apache-like log files and other files that are designed to be easy and fast to parse. I'm talking about more thoughtless file designs, such as the one you described.) You could roll out your own C extension, but that's just ridiculous because the hardware to process the slower and more general Perl regex would be less expensive than your time. I don't know how you timed the split function, but I suspect that it was much faster because its regex was probably much simpler. If you try the split function with a more complicated regex, I'm sure you'll find that split isn't so fast any more. You do need the /g at the end, of course. --Donnie On Mon, Oct 27, 2008 at 5:27 PM, B. Estrade wrote: > David, > > Maybe the following will help: > > http://oreilly.com/catalog/perlwsmng/chapter/ch08.html > > I don't know a whole bunch about parsing tons of text with regexes. I do > have one immediate suggestion - that "/g" might not be necessary. Anyway, > take a look at that link; it might help. > > Cheers, > Brett > > On Mon, Oct 27, 2008 at 03:59:32PM -0500, David B. John wrote: > > (perl v5.8.8 on Ubuntu Hardy Heron/2.6.24-21-generic.) > > > > I have an http logfile I'm trying to parse which looks like: > > > > 2008:10:24-00:00:06 x.x.x.x httpproxy[4997]: id="0001" severity="info" > > sys="SecureWeb" sub="http" name="http access" action="pass" method="GET" > > srcip="x.x.x.x" user="" statuscode="200" cached="0" profile="profile_0" > > filteraction="action_REF_DefaultHTTPCFFAction" size="78632" time="105 > > ms" request="0x90075b60" > > url=" > http://sb.google.com/safebrowsing/update?client=navclient-auto-ffox&appver=2.0.0.17&version=goog-white-domain:1:481,goog-white-url:1:371,goog-black-url:1:25401,goog-black-enchash:1:62374" > error="" category="175,178" categoryname="Software/Hardware,Internet > Services" content-type="text/html" > > > > > > (Nothing Spectacular.) > > > > If I loop through the log file and do: > > > > my > ($date,$fwip,$proxy,$id,$severity,$sys,$sub,$name,$action,$method, > > $srcip,$user,$statuscode,$cached,$profile,$filteraction,$size,$time, > > $request,$url,$error,$category,$category_name,$content_type) = > > $_ =~ /\w+=".*?"|\S+/g; > > > > life is good but could be better (~ 75 seconds on a Hp dc7700S for a > > compressed 500 MB logfile). > > > > However, I'd really like to use split b/c it's so much faster (~ 15 > > seconds). The problem is if I split, sometimes $name, $time or > > $category_name will include a space within the quotes which I don't want > > to split on (see above). > > > > I used Text::ParseWords but gave up after waiting 5 minutes. > > > > I know I can use a regex with split but I'm stumped as to how I would go > > about writing it. E.g. split on a space except when enclosed in quotes. > > Also, would it theoretically be any faster than the example above since > > it's using regex or should I just live with it? > > > > Thanks. > > > > David > > > > > > _______________________________________________ > > NewOrleans-pm mailing list > > NewOrleans-pm at pm.org > > http://mail.pm.org/mailman/listinfo/neworleans-pm > > -- > B. 
Estrade > Louisiana Optical Network Initiative > +1.225.578.1920 aim: bz743 > :wq > _______________________________________________ > NewOrleans-pm mailing list > NewOrleans-pm at pm.org > http://mail.pm.org/mailman/listinfo/neworleans-pm > -------------- next part -------------- An HTML attachment was scrubbed... URL: From djohn at archdiocese-no.org Tue Oct 28 05:51:11 2008 From: djohn at archdiocese-no.org (David B. John) Date: Tue, 28 Oct 2008 07:51:11 -0500 Subject: [Neworleans-pm] split vs. match In-Reply-To: <24e3b4050810272254s44308700g13d228658010e1f8@mail.gmail.com> References: <1225141172.27985.36.camel@isd4> <20081027212745.GE12981@bc3.hpc.lsu.edu> <24e3b4050810272254s44308700g13d228658010e1f8@mail.gmail.com> Message-ID: <1225198271.6315.5.camel@isd4> On Tue, 2008-10-28 at 01:54 -0400, Donnie Cameron wrote: > David, > > The split function is not going to make things any faster. In fact, > without resorting to the use of another language, I can't think of a > faster way of doing it than you have suggested. Even if you were to > split on something like a quote followed by a space (/" /) and then > reattach the quote to the end of each resulting element (work that is > vastly simpler than regex matching), the process would end up being > slower than regex matching because the regex maching happens in > machine language and the more efficient work happens in Perl. I'm > convinced also that even if you were to use the index function, your > Perl code would still be slower than the regex-based solution you > described. > > In the past, I have tried a number of tricks to try to beat simple > regex matching for this type of work and I've seldom been able to beat > the regex matching. (When I write "this type of work", I am of course > excluding regular Apache-like log files and other files that are > designed to be easy and fast to parse. I'm talking about more > thoughtless file designs, such as the one you described.) > > You could roll out your own C extension, but that's just ridiculous > because the hardware to process the slower and more general Perl regex > would be less expensive than your time. > > I don't know how you timed the split function, but I suspect that it > was much faster because its regex was probably much simpler. If you > try the split function with a more complicated regex, I'm sure you'll > find that split isn't so fast any more. > > You do need the /g at the end, of course. > > --Donnie > Thanks Donnie. I can live with that. :) David -------------- next part -------------- An HTML attachment was scrubbed... URL: From estrabd at mailcan.com Tue Oct 28 06:17:59 2008 From: estrabd at mailcan.com (B. Estrade) Date: Tue, 28 Oct 2008 08:17:59 -0500 Subject: [Neworleans-pm] [neworleans-pm-owner@pm.org: Re: split vs. match] Message-ID: <20081028131759.GI12981@bc3.hpc.lsu.edu> Sorry for the dupes - it's been so long, I can't keep straight which email address is subscribed here :)...my reply is below. On Tue, Oct 28, 2008 at 07:51:11AM -0500, David B. John wrote: > On Tue, 2008-10-28 at 01:54 -0400, Donnie Cameron wrote: > > > David, > > > > The split function is not going to make things any faster. In fact, > > without resorting to the use of another language, I can't think of a > > faster way of doing it than you have suggested. 
Even if you were to > > split on something like a quote followed by a space (/" /) and then > > reattach the quote to the end of each resulting element (work that is > > vastly simpler than regex matching), the process would end up being > > slower than regex matching because the regex maching happens in > > machine language and the more efficient work happens in Perl. I'm > > convinced also that even if you were to use the index function, your > > Perl code would still be slower than the regex-based solution you > > described. > > > > In the past, I have tried a number of tricks to try to beat simple > > regex matching for this type of work and I've seldom been able to beat > > the regex matching. (When I write "this type of work", I am of course > > excluding regular Apache-like log files and other files that are > > designed to be easy and fast to parse. I'm talking about more > > thoughtless file designs, such as the one you described.) > > > > You could roll out your own C extension, but that's just ridiculous > > because the hardware to process the slower and more general Perl regex > > would be less expensive than your time. > > > > I don't know how you timed the split function, but I suspect that it > > was much faster because its regex was probably much simpler. If you > > try the split function with a more complicated regex, I'm sure you'll > > find that split isn't so fast any more. > > > > You do need the /g at the end, of course. Do you or don't you? I am not familiar with using the "g" switch in a pure match - I usually just use it when doing global search and replaces. > > > > --Donnie > > > > Thanks Donnie. I can live with that. :) David: Is this for Apache? If so, you are treading on well-worn ground - http://www.google.com/search?q=perl+parse+apache+log+file&ie=utf-8&oe=utf-8&aq=t&rls=com.ubuntu:en-US:unofficial&client=firefox-a Also, if you want to analyze your log files, you may want to check out AWStats - http://awstats.sourceforge.net/. Lastly, you can try an approach that essentially parses parts of the file in parallel. I am not familiar with writing multi-threaded Perl scripts, but that would allow you to get further speed-up once you've found the magic regex to use. Of course, you might have to deal with bringing back the results in some ordered way, so it is a rather advanced approach to take. Cheers, Brett > David > > -- B. Estrade Louisiana Optical Network Initiative +1.225.578.1920 aim: bz743 :wq From estrabd at mailcan.com Tue Oct 28 06:22:58 2008 From: estrabd at mailcan.com (B. Estrade) Date: Tue, 28 Oct 2008 08:22:58 -0500 Subject: [Neworleans-pm] Perl 6? Message-ID: <20081028132258.GJ12981@bc3.hpc.lsu.edu> Has anyone been following or playing with Perl 6/Parrot? I've been lurking on the lists since the early days, and have gone through fits where I contribute automated smoke tests, but that is about it. For those of you who recall my Perl FLaT project (http://www.0x743.com/flat), I may try to reimplement some of it in Perl 6. I have a feeling that with all the Perl magick on which I rely, it is bound to break something :). cheers, Brett -- B. Estrade Louisiana Optical Network Initiative +1.225.578.1920 aim: bz743 :wq From djohn at archdiocese-no.org Tue Oct 28 09:20:06 2008 From: djohn at archdiocese-no.org (David B. John) Date: Tue, 28 Oct 2008 11:20:06 -0500 Subject: [Neworleans-pm] split vs. 
match In-Reply-To: <20081028131345.GG12981@bc3.hpc.lsu.edu> References: <1225141172.27985.36.camel@isd4> <20081027212745.GE12981@bc3.hpc.lsu.edu> <24e3b4050810272254s44308700g13d228658010e1f8@mail.gmail.com> <1225198271.6315.5.camel@isd4> <20081028131345.GG12981@bc3.hpc.lsu.edu> Message-ID: <1225210806.9168.24.camel@isd4> On Tue, 2008-10-28 at 08:13 -0500, B. Estrade wrote: > > > > > > You do need the /g at the end, of course. > > Do you or don't you? I am not familiar with using the "g" switch in a pure match - I usually just use it when doing global search and replaces. > If /g was not used then it would only match once. > > David: > > Is this for Apache? If so, you are treading on well-worn ground - > > http://www.google.com/search?q=perl+parse+apache+log+file&ie=utf-8&oe=utf-8&aq=t&rls=com.ubuntu:en-US:unofficial&client=firefox-a I wish it was, that would be easier ;) This is how the Astaro Firewall spits out the http proxy logs. > > Also, if you want to analyze your log files, you may want to check out AWStats - http://awstats.sourceforge.net/. > > Lastly, you can try an approach that essentially parses parts of the file in parallel. I am not familiar with writing multi-threaded Perl scripts, but that would allow you to get further speed-up once you've found the magic regex to use. Of course, you might have to deal with bringing back the results in some ordered way, so it is a rather advanced approach to take. Yep, we use AWStats for the web sites on Plesk. I'll be alright with the regex approach. Essentially, I wrote a parser in bash a while back that matches keywords and excludes certain URLs in order to see who's browsing pron @ work. I've been too lazy to redo it right in Perl. Just getting around to it. ;) On Tue, 2008-10-28 at 08:16 -0500, B. Estrade wrote: > > > You should check out a fairly old language called APL - > http://en.wikipedia.org/wiki/APL_programming_language > > People are saying that Perl is becoming more LISP-like and more > APL-like :). > Now that's intriguing. LISP too. I wonder how much LISP T2 has? ;) David -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part URL: From jkeen at verizon.net Tue Oct 28 19:01:48 2008 From: jkeen at verizon.net (James E Keenan) Date: Tue, 28 Oct 2008 22:01:48 -0400 Subject: [Neworleans-pm] Perl 6? In-Reply-To: References: Message-ID: <12CE158D-44D4-4161-8B65-1F26061C858B@verizon.net> On Oct 28, 2008, at 3:00 PM, neworleans-pm-request at pm.org wrote: > > Message: 1 > Date: Tue, 28 Oct 2008 08:22:58 -0500 > From: "B. Estrade" > Subject: [Neworleans-pm] Perl 6? > To: New Orleans Perl Mongers > > Has anyone been following or playing with Perl 6/Parrot? I've been > lurking on the lists since the early days, and have gone through > fits where I contribute automated smoke tests, but that is about it. > > I've been participating in the Parrot project since November 2006 (about 8 months after my last visit to NOLA). I fell into it by accident while attending a Perl hackathon in Chicago that year and running out of other things to do. Since I have only elementary C skills and no formal background in compilers or virtual machines, I can't claim to have worked on the essential parts of Parrot. But I have become a de facto maintainer of the Perl 5 code used in the 'configure' and 'build' stages of the Parrot build.
I spoke on those efforts at YAPC::NA::2007 in Houston (http://thenceforward.net/perl/yapc/YAPC-NA-2007/cft.tgz), and at YAPC in Chicago this year I led a 'Parrot/Rakudo' build fest, the object of which was to have people build and compile Parrot on their laptops (or remote machines) and then have them build and compile Rakudo -- the Perl 6 implementation on Parrot -- just enough to get to "Hello world" in Perl 6. I have not followed the Perl 6 language discussions at all, and I don't have enough tuits to follow Rakudo's ongoing development. However, in our local Perl user group, Perl Seminar NY, we are planning to have short, bimonthly sessions organized around the theme of "Perl 5 to Perl 6," the focus of which will be to let people see what features of Perl 6 are currently available in the Rakudo implementation and how they differ from Perl 5. Perhaps we could get together and discuss this when I'm in New Orleans in a few weeks. And the lead Rakudo developer is Patrick Michaud, who lives outside Dallas, which, in the larger scheme of things, is not that far from your town. Jim Keenan BTW: Did anyone bother to archive the New Orleans perlmongers wiki before the site lapsed?
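(For the curious, the "Hello world" target of the build fest, plus a couple of the small Perl 6 differences a Perl 5 programmer notices first, looks like the lines below. The syntax follows the Perl 6 design documents of the time; whether any particular 2008 Rakudo snapshot ran every line of it is not guaranteed, and the names are invented for the example.)

say "Hello, world!";            # say adds the trailing newline for you

my @mongers = <Brett David Donnie Jim>;
for @mongers -> $name {         # pointy block instead of foreach my $name
    say "Hello, $name!";
}

say @mongers[0];                # the sigil stays @ when you index an array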