From joey at joeykelly.net Mon Oct 4 16:08:54 2004
From: joey at joeykelly.net (Joey Kelly)
Date: Mon Oct 4 15:57:28 2004
Subject: [Neworleans-pm] web malware blocker
Message-ID: <200410041608.54115.joey@joeykelly.net>

Y'all,

I've come up with an interesting project. As most of you know, it is possible to filter for viruses on the mail server, using amavisd-new and clamav with postfix, for example. I would like to filter web traffic in a similar manner, using available open source tools whenever possible.

The best setup would be to find a web proxy engine that allows plugins, so that I would only have to write the scanning plugin. I don't yet know whether squid (my preferred proxy) allows this.

The next best thing, in my mind, is to do something like what amavis does for sendmail. In that scenario, you run two copies of sendmail with amavis sitting between them, so that all email passes through amavis and whatever scanners amavis is configured to use.

Bringing this configuration over to my project, I envision running two instances of squid on different ports and telling the user-facing proxy (the one the browser is configured to use) to fetch everything from an upstream proxy. In between the two, I run my malware scanner.

Remember, #3 is profit, but #2 is the hard part ;-)

Here are a few random thoughts...

Clamav can be run as a standalone daemon that accepts discrete files. It will also scan a file or directory if you tell it to. It gives you a thumbs-up or thumbs-down, and that verdict can be used to feed bad URLs to a blocklist that the internal copy of squid can make use of.

I'm worried about speed, but the cost of piping everything through a scanner might be offset by the fact that we're running a cache, after all.

Web malware can be defined as viruses, spyware, cross-site scripting, malicious javascript, etc. I wouldn't know how to scan for much except viruses, but others could write plugins if we can come up with a working framework for scanning web traffic.
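The scan-then-blocklist step described above could look something like the following Perl sketch. It is only an illustration under stated assumptions: it assumes `clamdscan` is on the PATH and relies on ClamAV's exit codes (0 = no virus, 1 = virus found, 2 = error). The helper names (`clam_verdict`, `scan_file`, `blocklist_entry`, `block_url`) and the blocklist format (one anchored regex per line, for a squid `url_regex` ACL) are hypothetical, not an existing plugin API.

```perl
use strict;
use warnings;

# Translate the raw $? left by system() into a verdict. ClamAV's scanners
# exit with 0 (no virus found), 1 (virus found) or 2 (an error occurred).
sub clam_verdict {
    my ($status) = @_;
    return 'error' if $status == -1;     # the scanner could not be started
    return 'error' if $status & 127;     # the scanner died from a signal
    my $code = $status >> 8;
    return $code == 0 ? 'clean'
         : $code == 1 ? 'infected'
         :              'error';
}

# Hypothetical helper: hand one downloaded object to the clamd daemon.
sub scan_file {
    my ($path) = @_;
    system('clamdscan', '--no-summary', $path);
    return clam_verdict($?);
}

# Hypothetical helper: escape POSIX regex metacharacters so the URL matches
# literally, then anchor it, giving one line for a squid url_regex ACL file.
sub blocklist_entry {
    my ($url) = @_;
    (my $re = $url) =~ s/([.^\$*+?()\[\]{}|\\])/\\$1/g;
    return "^$re\$";
}

# Hypothetical helper: append an infected URL to the blocklist file.
sub block_url {
    my ($url, $listfile) = @_;
    open my $fh, '>>', $listfile or die "Can't append to $listfile: $!\n";
    print {$fh} blocklist_entry($url), "\n";
    close $fh or die "Can't close $listfile: $!\n";
}
```

A caller might run `block_url($url, '/etc/squid/badurls') if scan_file($tmpfile) eq 'infected';`, with the internal squid reading the list through `acl badurls url_regex "/etc/squid/badurls"` and `http_access deny badurls`. The path is an assumption, and squid only rereads the file on `squid -k reconfigure`.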
Thanks for reading. All feedback is welcome. Help is even more welcome :-)

-- 
Joey Kelly
< Minister of the Gospel | Linux Consultant >
http://joeykelly.net

"I may have invented it, but Bill made it famous."
 --- David Bradley, the IBM employee that invented CTRL-ALT-DEL

From estrabd at yahoo.com Wed Oct 6 10:33:26 2004
From: estrabd at yahoo.com (E. Strade, B.D.)
Date: Wed Oct 6 10:33:35 2004
Subject: [Neworleans-pm] Fwd: Perl 'Expert' Quiz of the Week #25 (acrostic puzzle generator)
Message-ID: <1097076806.9359.205895153@webmail.messagingengine.com>

=====
http://www.brettsbsd.net/~estrabd

__________________________________
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
http://sitebuilder.yahoo.com

----- Original message -----
From: "Mark Jason Dominus"
To: perl-qotw@plover.com
Date: Wed, 06 Oct 2004 11:22:29 -0400
Subject: Perl 'Expert' Quiz of the Week #25 (acrostic puzzle generator)

IMPORTANT: Please do not post solutions, hints, or other spoilers
        until at least 60 hours after the date of this message.
        Thanks.

IMPORTANT: S'il vous plaît, attendez au minimum 60 heures après la
        date de ce message avant de poster solutions, indices ou autres
        révélations. Merci.

Qing3 Zhu4Yi4: Qing3 Ning2 Deng3Dao4 Jie1Dao4 Ben3 Xin4Xi2 Zhi1Hou4 60
        Xiao3Shi2, Zai4 Fa1Biao3 Jie3Da2, Ti2Shi4, Huo4 Qi2Ta1 Hui4
        Xie4Lou4 Da2An4 De5 Jian4Yi4. Xie4Xie4.

UWAGA: Prosimy nie publikowac rozwiazan, dodatkowych badz pomocniczych
        informacjii przez co najmniej 60 godzin od daty tej wiadomosci.
        Dziekuje.

----------------------------------------------------------------

You will write a program to generate double-acrostic puzzles.

A double-acrostic puzzle is a little like a crossword puzzle, except that the words don't cross. The goal of the puzzle is to determine the contents of a secret quotation. The solver receives a list of crossword-style clues. The solution to each clue is a word or a short phrase.
Each letter in a clue answer is transferred to the corresponding labeled space in a grid. When all the spaces in the grid are filled in with the correct letters, the grid will contain the secret quotation. At each point in the solution process, the solver can work forwards, using the clue answers to determine letters in the quotation, or backwards, completing partial words in the quotation and then transferring the inferred letters back to the clue answers.

Here's a very small example:

    __  __  __  __  __  __  __  __  __  __  __  __  __
    1d  2f  3g  4d  5b  6c  7g  8a  9f  10e 11g 12d 13a

    __  __  __  __  __  __  __  __  __  __  __  __  __  __
    14f 15a 16e 17c 18b 19b 20a 21e 20f 23a 24e 25g 26c 27d

    __  __  __  __  __  __  __  __  __  __  __  __  __
    28f 29g 30b 31a 32  33b 34L 35e 36f 37e 38a 39d 40c

----------------------------------------------------------------

    a. __ __ __ __ __ __      Unorthodox belief
       15 23 38 8  13 31

    b. __ __ __ __ __ __      Where the sun comes up
       19 18 33 30 5  22

    c. __ __ __ __ __ __ __   A new twist
       40 24 6  26 34 32 17

    d. __ __ __ __ __         Check the records
       39 4  27 12 1

    e. __ __ __ __ __ __      Strive against
       24 35 21 16 10 37

    f. __ __ __ __ __ __      A score and half
       28 2  36 9  14 20

    g. __ __ __ __ __         Followed by a ho?
       29 11 25 7  3

Suppose you guess that the answer to clue c, "A new twist", is "upgrade". You would fill in "UPGRADE" in the seven blanks following clue c, and then transfer the letters to the corresponding spaces in the quotation grid: "U" into space 40, "P" into space 24, and so on up to "E" in space 17.

At this point the word at spaces 38-39-40 in the main grid is "__U", and there are very few three-letter words in English that end in 'U'. The most common is "YOU", so perhaps you tentatively guess that 38-39-40 is "YOU". The notation "a" below space 38 means that that letter is found in the answer to clue a, so you can fill in "Y" in the third space of "a", and similarly, the first letter of d would be "O".
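The letter-transfer step in this walkthrough can be sketched in a few lines of Perl. The grid positions are copied from clue c of the example above; the `%grid` hash and the `transfer` helper are illustrative assumptions and not part of the quiz specification.

```perl
use strict;
use warnings;

# Grid positions for the letters of clue c, in order (from the example).
my @clue_c = (40, 24, 6, 26, 34, 32, 17);

# Copy each letter of a guessed answer into the grid position it is labeled
# with. Returns false, without touching the grid, if the lengths disagree.
sub transfer {
    my ($grid, $positions, $answer) = @_;
    my @letters = split //, uc $answer;
    return 0 unless @letters == @$positions;
    @{$grid}{@$positions} = @letters;    # hash slice: position => letter
    return 1;
}

my %grid;    # grid position => letter filled in so far
transfer(\%grid, \@clue_c, 'upgrade') or die "wrong length\n";

# Show the word at spaces 38-39-40, using '_' for letters not yet known.
my $word = join '', map { $grid{$_} // '_' } 38 .. 40;
print "$word\n";    # prints __U
```

Guessing "YOU" for spaces 38-40 would then be one more hash assignment, and the same position lists let the solver work backwards from the grid to the clue answers.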
There's a third source of information in these puzzles: The initial letters of the clue answers form the name of the author of the quotation, and sometimes the title of its source. In this example, letters 15-19-40-39-24-38-29 spell the last name of the author.

Typical examples are larger than this. The quotation is usually between 100 and 200 letters long, and there are typically 20-30 clues.

Write a program, "acrostic2", which generates a double acrostic puzzle, given a quotation, an author credit, and a dictionary file:

        acrostic2 quotefile dictionary

The quote file will contain the quotation, followed by a blank line, followed by the author or source credit. The program should remove all nonalphabetic characters from the quotation and source credit. Upper and lowercase letters are considered equivalent.

The dictionary should contain one word per line. Again upper and lower case letters are equivalent.

The program should output a list of clue answer values, one per line, such that:

        0. No two clue answers are the same,
        1. Each clue answer is a dictionary entry,
        2. There are exactly as many clue answers as there are
           alphabetic characters in the source credit,
        3. Each clue answer begins with the corresponding letter of
           the source credit, and
        4. The letters in the output can be rearranged to form the
           quotation, with no letters left over.

The program is not responsible for assigning the letters to the grid or for formatting the puzzle, because coming up with the answer list is the hard part. If you do decide to have your program emit completed puzzles, please do not make this the default behavior; have it be enabled by a command-line option. (Note that there is a quality-of-implementation issue here: the letters from any one or two clue answers should be scattered around the grid, and not clustered together into just a few words.)

[ ADMIN: Dan Sanderson assures me that he is working on writing up sample solutions for the RPN calculator quiz from last week.
The report will be along when he finishes it. -- MJD ]

From estrabd at yahoo.com Wed Oct 6 16:58:47 2004
From: estrabd at yahoo.com (E. Strade, B.D.)
Date: Wed Oct 6 16:58:59 2004
Subject: [Neworleans-pm] Fwd: Re: Perl 'Expert' Quiz of the Week #25 (acrostic puzzle generator)
Message-ID: <1097099927.11294.205924436@webmail.messagingengine.com>

----- Original message -----
From: "Mark Jason Dominus"
To: perl-qotw@plover.com
Date: Wed, 06 Oct 2004 13:22:26 -0400
Subject: Re: Perl 'Expert' Quiz of the Week #25 (acrostic puzzle generator)

I wrote:
> Here's a very small example:

It has come to my attention that my example contained many errors, the largest of which was that although the quotation contained 40 letters, the clue answers totalled 41. This is incorrect; the quotation and the answers should contain the same number of letters. There were other errors as well. I recommend that you not try to solve the example puzzle.

If I correct the example, I will post the corrected version to the perl-qotw-discuss list, which is archived on the web at these two locations:

        http://perl.plover.com/~alias/list.cgi/1/
        http://news.gmane.org/thread.php?group=gmane.comp.lang.perl.qotw.discuss

My apologies to anyone whose time was wasted.

From dave at gnofn.org Fri Oct 8 14:31:28 2004
From: dave at gnofn.org (Dave Cash)
Date: Fri Oct 8 15:07:10 2004
Subject: [Neworleans-pm] Meeting Tonight
Message-ID: <20041008142741.Y93217@sparkie.gnofn.org>

Monthly meeting at Fair Grinds is tonight, at 5pm. See http://neworleans.pm.org/ for more info.

Hope to see you all there!

Dave

/L\_/E\_/A\_/R\_/N\_/T\_/E\_/A\_/C\_/H\_/L\_/E\_/A\_/R\_/N\
Dave Cash                              Power to the People!
Frolicking in Fields of Garlic               Right On-Line!
dave@gnofn.org                                  Dig it all.
From donnie at solomonstreet.com Fri Oct 8 20:01:41 2004
From: donnie at solomonstreet.com (Donnie Cameron)
Date: Fri Oct 8 20:02:25 2004
Subject: [Neworleans-pm] Meeting Tonight
In-Reply-To: <20041008142741.Y93217@sparkie.gnofn.org>
References: <20041008142741.Y93217@sparkie.gnofn.org>
Message-ID: <41673875.1090703@solomonstreet.com>

Hi All,

I'm sorry I missed the meeting. I'm still in New Jersey, working on a contract. I might have to spend a few months here. Let's hope you remember me when I get back. In the meantime, I'll be watching this list, so don't say anything nasty about me.

Oh, by the way, this new contract involves a lot of Perl programming. I'm getting a sense that people around here are starting to think really hard about moving away from ASP and towards Perl and even PHP. The problems I hear most often have to do with DLLs. Most of the people I've spoken to here feel more confident about deploying Web apps in Perl, even on Windows machines. Clearly my opinions are not scientific; I haven't spoken to thousands of people or anything like that. But I find the move to Perl surprising because I thought .NET would suck up all of the ASP crowd, particularly in this part of the country.

--Donnie
_______________________________________________
NewOrleans-pm mailing list
NewOrleans-pm@mail.pm.org
http://mail.pm.org/mailman/listinfo/neworleans-pm

From erin at thelaines.org Fri Oct 8 22:25:44 2004
From: erin at thelaines.org (Erin Laine)
Date: Fri Oct 8 22:25:50 2004
Subject: [Neworleans-pm] Sending Mail via Perl on Win32
Message-ID: <41675A38.5040405@thelaines.org>

I'm developing a Perl script on NT4 that processes web form input, including the collection of an email address. I want to send an email back to the user to confirm the address they entered is working.

I'm using the Net::SMTP Perl package, and it works, but the response time after the user submits the form until my "thank you" page is displayed is about 30 seconds. Either the Net::SMTP package is slow or the path to my SMTP server is. I've tried the Mail::Sender package with similar results.

So I'm thinking of having the confirmation email sent at a later time. Maybe a Perl script running as an NT service that monitors a directory for files with a certain extension. These files would be created at form processing time and would contain the info needed to send the confirmation email.

If you are still following this, does this sound like a reasonable approach? Or can you recommend something that might be better?

Thanks.

Erin Laine
... feeling clumsy on NT

From donnie at solomonstreet.com Sat Oct 9 00:23:12 2004
From: donnie at solomonstreet.com (Donnie Cameron)
Date: Sat Oct 9 00:23:38 2004
Subject: [Neworleans-pm] Sending Mail via Perl on Win32
In-Reply-To: <41675A38.5040405@thelaines.org>
References: <41675A38.5040405@thelaines.org>
Message-ID: <416775C0.6050009@solomonstreet.com>

Hi Erin,

The first thing I would do is determine if other scripts take a long time to execute. Try running a CGI script that loads a couple of common packages (like the CGI package, for example), but not the Net::SMTP or Mail::Sender packages.
If the script takes a long time to return a Web page, then the problem is not related to sending mail, but rather to loading and running Perl.

If it turns out that your server is running Perl scripts slowly, there are a couple of things you can try to speed the scripts up significantly. I won't go into detail because I can't remember how to do it, but you can load the entire Perl interpreter into a RAM disk.

Another thing you can try is getting rid of IIS and loading something like the Xitami server, which is small, fast, and easy to configure (you have to do almost nothing in terms of configuration to get it running). I did this once because IIS was running too slowly and discovered that Xitami provided a huge (20X) increase in speed. I think you need a special license for IIS in order for the HTTP server to perform adequately.

If, on the other hand, it turns out that the slowness is due to SMTP, you could try putting another SMTP server in place. Or, you could write a program that uses MAPI to send the message. There might be a Perl module out there for MAPI. If not, you could write the MAPI part in another language (such as C++, VB, or C#) and then execute it from your Perl script. MAPI is slow, but certainly not on the order of tens of seconds.

--Donnie

Erin Laine wrote:
> I'm developing a Perl script on NT4 that processes web form input
> including the collection of an email address. I want to send an email
> back to the user to confirm the address they entered is working.
>
> I'm using the Net::SMTP Perl package, and it works, but the response
> time after the user submits the form until my "thank you" page is
> displayed is about 30 seconds. Either the Net::SMTP package is slow or
> the path to my SMTP server is. I've tried the Mail::Sender package with
> similar results.
>
> So I'm thinking of having the confirmation email sent at a later time.
> Maybe a Perl script running as an NT service that monitors a directory
> for files with a certain extension.
> These files would be created at
> form processing time and would contain the info needed to send the
> confirmation email.
>
> If you are still following this, does this sound like a reasonable
> approach? Or can you recommend something that might be better?
>
> Thanks.
>
> Erin Laine
> ... feeling clumsy on NT

From joey at joeykelly.net Sat Oct 9 00:53:24 2004
From: joey at joeykelly.net (Joey Kelly)
Date: Sat Oct 9 00:42:14 2004
Subject: [Neworleans-pm] Sending Mail via Perl on Win32
In-Reply-To: <41675A38.5040405@thelaines.org>
References: <41675A38.5040405@thelaines.org>
Message-ID: <200410090053.24517.joey@joeykelly.net>

> I'm using the Net::SMTP Perl package, and it works, but the response
> time after the user submits the form until my "thank you" page is
> displayed is about 30 seconds. Either the Net::SMTP package is slow or
> the path to my SMTP server is. I've tried the Mail::Sender package with
> similar results.

Ugh, that sounds eerily similar to what happens if either apache or DNS isn't configured right.

From dave at gnofn.org Tue Oct 12 11:33:51 2004
From: dave at gnofn.org (Dave Cash)
Date: Tue Oct 12 15:00:37 2004
Subject: [Neworleans-pm] New Meeting Times, Meeting Summaries, etc.
Message-ID: <20041012112925.O40226@sparkie.gnofn.org>

Hello, all.

Sorry it took so long, but I've posted meeting summaries for the last two meetings. Please feel free to add or correct anything I may have left out or got wrong:

        http://neworleans.pm.org/

At the last meeting, Joey and I talked to Robert (one of the owners of Fair Grinds) and we were able to tweak our meeting time to go a little longer and run a little later.
The new meeting time is 5:30pm to 8:00pm, still the second Friday of each month. And Robert said he'd reserve that block of time for us until we decide we don't want it anymore.

I look forward to seeing any and all of you who can make it to the next meeting, on Friday, November 12.

Take care,

Dave

From dave at gnofn.org Tue Oct 12 11:43:34 2004
From: dave at gnofn.org (Dave Cash)
Date: Tue Oct 12 15:06:39 2004
Subject: [Neworleans-pm] Sending Mail via Perl on Win32
In-Reply-To: <416775C0.6050009@solomonstreet.com>
References: <41675A38.5040405@thelaines.org> <416775C0.6050009@solomonstreet.com>
Message-ID: <20041012113702.B40226@sparkie.gnofn.org>

On Sat, 9 Oct 2004, Donnie Cameron wrote:

> The first thing I would do is determine if other scripts take a
> long time to execute. Try running a CGI script that loads a couple
> of common packages (like the CGI package, for example), but not
> the Net::SMTP or Mail::Sender packages. If the script takes a long
> time to return a Web page, then the problem is not related to
> sending mail, but rather to loading and running Perl.

Erin,

This is good advice. I concur with the idea of isolating the different suspects and seeing which one is doing the crawling.

> If it turns out that your server is running Perl scripts slowly,
> there are a couple of things you can try to speed the scripts up
> significantly. I won't go into detail because I can't remember how
> to do it, but you can load the entire Perl interpreter into a RAM
> disk.
>
> Another thing you can try is getting rid of IIS and loading
> something like the Xitami server, which is small, fast, and easy
> to configure (you have to do almost nothing in terms of
> configuration to get it running).
> I did this once because IIS was
> running too slowly and discovered that Xitami provided a huge
> (20X) increase in speed. I think you need a special license for
> IIS in order for the HTTP server to perform adequately.

Another option is to use Apache for Win32. And if you still have some slowness, you could even get mod_perl running with it to help speed things up (mod_perl is super fast, but definitely a resource hog). I'd stick to Apache 1 if you decide to go this route.

> If, on the other hand, it turns out that the slowness is due to
> SMTP, you could try putting another SMTP server in place. Or, you
> could write a program that uses MAPI to send the message. There
> might be a Perl module out there for MAPI. If not, you could write
> the MAPI part in another language (such as C++, VB, or C#) and
> then execute it from your Perl script. MAPI is slow, but certainly
> not on the order of tens of seconds.

Another approach here is to just have Net::SMTP or Mail::Sender (my mail module of choice, BTW) connect to a remote SMTP server that will allow you to relay through it (such as another one on your LAN, preferably on a *nix machine); you may see some performance improvement that way.

Good luck. I'm sure we'd all love to know what the problem ends up being and what solution path you choose.

Take care,

Dave

From estrabd at yahoo.com Thu Oct 14 11:28:35 2004
From: estrabd at yahoo.com (E. Strade, B.D.)
Date: Thu Oct 14 11:29:53 2004
Subject: [Neworleans-pm] Fwd: Solutions and Discussion for Perl Quiz of the Week #25 (RPN calculator)
Message-ID: <1097771315.23135.206474000@webmail.messagingengine.com>

----- Original message -----
From: "Dan Sanderson"
To: perl-qotw@plover.com
Date: Thu, 14 Oct 2004 10:33:37 -0400
Subject: Solutions and Discussion for Perl Quiz of the Week #25 (RPN calculator)

Sample solutions and discussion
Perl Quiz of The Week #25 (20040928)

(The quiz question is archived at http://perl.plover.com/qotw/r/025 .)

== Posted Solutions

As of October 3, 2004, twelve people submitted solutions to this quiz. Nine used Perl, one used Python and two used Ruby. Thanks to everyone who participated in this week's quiz.

        Rod Adams
        http://perl.plover.com/~alias/list.cgi?1:mss:2295

        Kester Allen
        http://perl.plover.com/~alias/list.cgi?1:mss:2297

        Rich Bishop
        http://perl.plover.com/~alias/list.cgi?1:mss:2292

        Dan Boger
        http://perl.plover.com/~alias/list.cgi?1:mss:2302

        Roger Burton West
        http://perl.plover.com/~alias/list.cgi?1:msp:2285

        Michael Carman
        http://perl.plover.com/~alias/list.cgi?1:msp:2288
        http://perl.plover.com/~alias/list.cgi?1:mss:2294

        Andrew Dalke (Python)
        http://perl.plover.com/~alias/list.cgi?1:mss:2300

        Jon Ericson
        http://perl.plover.com/~alias/list.cgi?1:msp:2294

        James Edward Gray II (Ruby)
        http://perl.plover.com/~alias/list.cgi?1:msp:2284

        Zed Lopez
        http://perl.plover.com/~alias/list.cgi?1:msp:2297

        Alex Smolianinov
        http://perl.plover.com/~alias/list.cgi?1:mss:2304
        http://perl.plover.com/~alias/list.cgi?1:mss:2317

        Mike Stok (Ruby)
        http://perl.plover.com/~alias/list.cgi?1:mss:2303

I tried to get every solution working. A few I couldn't get to run, and a few more could not do everything listed in the original problem, but most solutions did quite well and bugs seemed minor.
The following test script exercised the basic features (without causing error conditions):

    3 5 + 2 * 2 3 + / 5 - 10 3 % 2 8 ** drop 100 swap clear 100 dup 200 dup + + +

Many solutions implemented extensions to the basic problem, including some or all of the suggestions listed with the problem text, and many good original ideas. I did not test every feature of every implementation, but you should give them a try; they're neat.

As this was intended to be a "regular" quiz, I will walk through a simple Perl implementation. We can discuss fancier topics in perl-qotw-discuss.

== Prompting For and Processing Input

A fancy prompt was not actually a requirement of the quiz, but it's easy to include one using the Perl module Term::ReadLine. The following is an example of setting up a prompt and looping over input, one line at a time, until the user hits Control-D (the End Of File character):

    use Term::ReadLine;

    my $rl = Term::ReadLine->new('RPN calculator');
    my $prompt = '> ';
    # ...
    while (defined(my $line = $rl->readline($prompt))) {
        # ...
    }
    exit(0);

Without a fancy prompt, we could just read lines of input from STDIN the usual way:

    while (defined(my $line = <STDIN>)) {
        # ...
    }

The calculator processes input one term at a time, and there can be multiple terms on a line of input. We can split the line into a list of terms using the split operator. Giving split the special pattern ' ' (a single space in quotes) splits on runs of white space and, unlike /\s+/, does not produce an empty first term when the line begins with white space:

    for my $term (split(' ', $line)) {
        # ...
    }

== The Stack

Each term will be either a number or an operation. If the term is a number, it gets pushed onto a stack. If the term is an operation, the operation executes, possibly using or modifying the contents of the stack.

Perl arrays can behave as stacks easily, using Perl's push and pop operators. The calculator only needs one stack:

    my @stack;

After each line is processed, the stack is printed to the terminal. The problem text showed the stack being printed from "bottom" to "top," one element per line.
Each line included a number representing the stack level, where 0 is the top. This display makes it easy to use stack manipulation functions that take a stack level number as a parameter, like the suggested "roll" operation.

If we're using Perl's push and pop operators, we must remember that the top of the stack is the last element in the @stack array. The following loop prints the elements from bottom to top, numbering each one by its distance from the top:

    for my $i (0 .. $#stack) {
        my $level = $#stack - $i;    # the last element (the top) is level 0
        print "$level: $stack[$i]\n";
    }

== Processing Terms

The calculator needs to do something different if a term matches an operation, or if it matches a number. If it matches neither a number nor a known operation, it is an error. A simple way to do something different for each case would be an if/elsif/else structure:

    if ($term =~ / (a pattern that matches a number) /) {
        # Push the number onto the stack.
        # ...
    } elsif ($term eq '+') {
        # Execute the plus (+) operation.
        # ...
    } elsif ($term eq '-') {
        # Execute the minus (-) operation.
        # ...
    # ...
    } else {
        print "Invalid term: $term\n";
    }

We can construct a regular expression that matches a number by deciding what makes a number. For instance, a number might begin with a minus sign (if it is negative), then have zero or more digits, then optionally have a decimal point and one or more digits, as long as at least one digit appears somewhere. Such a regular expression might look like this:

    if ($term =~ /^-?(?=\.?\d)\d*(?:\.\d+)?$/) {

The lookahead (?=\.?\d) enforces the "at least one digit" rule; without it, the pattern would also match an empty string or a lone minus sign.

Several posted solutions used the Regexp::Common module to provide the pattern that matches a number. The patterns included in this module are thorough, well tested, and cover many more cases than the pattern above. The module includes patterns for numbers, dates, delimited strings, balanced parentheses, profane language, and much more.

    use Regexp::Common;
    # ...
    if ($term =~ /^$RE{num}{real}$/) {

If the term is a number, push it onto the stack.
For this example, we'll assume that the term is something Perl understands as a number, so we can treat the $term value as a number without further manipulation. Only numbers are allowed on our stack, so we simply need to push it:

    push @stack, $term;

== Arithmetic Operations

If the term is an operation, the calculator performs the operation using values on the stack. The basic problem description listed six arithmetic operations, each of which takes two values and results in one value:

    +  -  *  /  %  **

Here is a simple implementation of the plus (+) operation, which pops two values off the top of the stack, adds them together, and pushes the result on the stack:

    } elsif ($term eq '+') {
        my $a = pop @stack;
        my $b = pop @stack;
        push @stack, $b + $a;

The remaining arithmetic operations could be written similarly. Notice that for some operations, the order of terms is important. The problem text says "20 5 /" results in 20 divided by 5, which equals 4. This means the division operation will pop the 5 first ($a) then the 20 ($b), so it must push $b / $a to produce the correct result.

== Stack Manipulation Operations

Here are possible implementations for the four stack manipulation operations mentioned in the problem:

    } elsif ($term eq 'drop') {
        pop @stack;

    } elsif ($term eq 'swap') {
        my $a = pop @stack;
        my $b = pop @stack;
        push @stack, $a, $b;

    } elsif ($term eq 'clear') {
        @stack = ();

    } elsif ($term eq 'dup') {
        push @stack, $stack[$#stack];

Notice how this implementation of "dup" gets the value of the last element in the array (the top of the stack) without popping it.

== Errors and Edge Cases

The problem text stated that if a term or operation causes an error, processing of the line stops, allowing the user to correct the error without further operations on the line causing undesirable damage to the contents of the stack.

So far, we have code that detects one error case, when the term is neither a number nor a valid operation.
We want it to both print an error and stop processing the line. Since we're processing terms in a for loop, Perl's "last" operator will exit the loop without processing additional terms.

    } else {
        print "Invalid term: $term\n";
        last;
    }

Another error case might be using an operation when there aren't enough elements on the stack. Using the + operation when there is only one item doesn't make sense, and is probably an error the user wants to correct before proceeding. In this implementation, we can use "last" again when an error is encountered:

    } elsif ($term eq '+') {
        if (scalar(@stack) < 2) {
            print "Not enough elements on stack\n";
            last;
        }
        my $a = pop @stack;
        my $b = pop @stack;
        push @stack, $b + $a;

Some operations have specific error cases. For instance, "99 0 /" would cause Perl to attempt to divide by zero, which would terminate the program with an "Illegal division by zero" error. We may want the division operation to catch this case as an error and continue to run:

    } elsif ($term eq '/') {
        if (scalar(@stack) < 2) {
            print "Not enough elements on stack\n";
            last;
        }
        if ($stack[$#stack] == 0) {
            print "Division by zero attempted\n";
            last;
        }
        my $a = pop @stack;
        my $b = pop @stack;
        push @stack, $b / $a;

There are other edge cases we may want to consider to make the calculator more useful. For example, we may want a blank line to simply re-display the prompt, instead of complaining.

== Other Ways: A Dispatch Table

The implementation above uses an if/elsif/else structure to check if $term is equal to one of the strings that represents an operation. For each term, each operation is compared until a match is found.

Because the term has to match an exact string, we could use a hash table instead of a bunch of conditions to look up the operation to perform. A hash lookup takes about the same time no matter how many operations there are, so this could be faster, and it may make the source code easier to read if there are many possible operations. We could even let other modules or plugins add operations to the calculator simply by modifying the hash table.
A hash table can refer to operations using subroutine references. For instance:

    sub plus {
        my $a = pop @stack;
        my $b = pop @stack;
        push @stack, $b + $a;
    }

    my %operations = (
        '+' => \&plus,
        # ...
    );

If no other part of the program is going to use the &plus subroutine, we can write this as an anonymous subroutine:

    my %operations = (
        '+' => sub {
            # ...
            my $a = pop @stack;
            my $b = pop @stack;
            push @stack, $b + $a;
        },
        # ...
    );

We can then replace all the operations in our if/elsif/else structure with:

    } elsif (exists($operations{$term})) {
        $operations{$term}->();

Now that the code for executing the plus operation is in an anonymous subroutine, the routine can no longer use "last" to stop processing of terms if there is an error. We need a different way for subroutines to report errors to the main loop. The subroutines could simply return true if successful or false if there was an error, or they could return the empty string ("") on success and an error message if there was an error.

A better way might be to use Perl exception handling. The subroutine could call "die" with a message if there was an error. The code calling the subroutine could be wrapped in an "eval" block, which executes the code and stores the message of any "die" call in the $@ special variable.

        '/' => sub {
            die "Not enough elements on stack\n" if (scalar(@stack) < 2);
            die "Division by zero attempted\n" if $stack[$#stack] == 0;
            my $a = pop @stack;
            my $b = pop @stack;
            push @stack, $b / $a;
        },
        # ...

    } elsif (exists($operations{$term})) {
        eval { $operations{$term}->(); };
        if ($@) {
            print $@;
            last;
        }

(Ending each die message with "\n" keeps Perl from appending the script name and line number to it.)

A nice side-effect of using exception handling is that we no longer need to worry about calculation errors that would cause Perl to die. Specifically, we do not need to check if the / operator would cause division by zero, because if it did, Perl would throw an exception that would get caught in the eval block, and an appropriate error message would be printed. One caveat: by the time Perl raises that exception, the operation has already popped its operands, so they are lost from the stack unless the calling code saves and restores the stack around the eval.
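Pulling these pieces together, here is a minimal, self-contained sketch of the dispatch-table approach. It is not one of the posted solutions: the `process_line` and `show_stack` names are assumptions, only the basic operations are included, and operands are named `$top`/`$below` rather than `$a`/`$b`. It also takes a snapshot of the stack before each term, so a term that dies (including a division by zero raised by Perl itself) leaves the stack as it was, which addresses the lost-operands caveat.

```perl
use strict;
use warnings;

my @stack;

# Pop the top two values, or die (leaving the stack untouched) if there
# are not enough. Returns (top, next-to-top).
sub pop2 {
    die "Not enough values on stack\n" if @stack < 2;
    return (pop @stack, pop @stack);
}

# Binary operations compute "below OP top", so "20 5 /" leaves 4.
my %operations = (
    '+'     => sub { my ($top, $below) = pop2(); push @stack, $below + $top },
    '-'     => sub { my ($top, $below) = pop2(); push @stack, $below - $top },
    '*'     => sub { my ($top, $below) = pop2(); push @stack, $below * $top },
    '/'     => sub { my ($top, $below) = pop2(); push @stack, $below / $top },
    '%'     => sub { my ($top, $below) = pop2(); push @stack, $below % $top },
    '**'    => sub { my ($top, $below) = pop2(); push @stack, $below ** $top },
    'swap'  => sub { my ($top, $below) = pop2(); push @stack, $top, $below },
    'drop'  => sub { die "Stack is empty\n" unless @stack; pop @stack },
    'dup'   => sub { die "Stack is empty\n" unless @stack; push @stack, $stack[-1] },
    'clear' => sub { @stack = () },
);

# Process one line of input. Returns '' on success, or the error message
# of the first failing term; the rest of the line is then skipped.
sub process_line {
    my ($line) = @_;
    for my $term (split ' ', $line) {
        my @saved = @stack;    # snapshot, so a failing term cannot eat operands
        my $ok = eval {
            if ($term =~ /^-?(?=\.?\d)\d*(?:\.\d+)?$/) {
                push @stack, $term;
            } elsif (exists $operations{$term}) {
                $operations{$term}->();
            } else {
                die "Invalid term: $term\n";
            }
            1;    # eval returns true only if nothing died
        };
        unless ($ok) {
            @stack = @saved;
            return $@;
        }
    }
    return '';
}

# Print the stack from bottom to top; the top of the stack is level 0.
sub show_stack {
    for my $i (0 .. $#stack) {
        printf "%d: %s\n", $#stack - $i, $stack[$i];
    }
}
```

A driver loop would then be along the lines of `while (defined(my $line = <STDIN>)) { print process_line($line); show_stack(); }`. With this design, entering "99 0 / 7" reports Perl's "Illegal division by zero" message, skips the 7, and leaves 99 and 0 on the stack.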
== Other Ways: Re-using Code

Our operation routines are getting rather long, repeating common tasks like checking the number of elements on the stack, popping the appropriate number of items, and so forth. The posted solutions used a variety of techniques to make operation definitions more succinct.

We could have a subroutine that checks that there are at least two elements on the stack, reports an error if there aren't, and otherwise pops and returns the two elements (top of the stack first):

    sub pop2 {
        die "Not enough values on stack\n" if scalar(@stack) < 2;
        return (pop @stack, pop @stack);
    }

Then our six arithmetic operations could be defined as:

    '+'  => sub { my ($a, $b) = &pop2; push @stack, $b + $a; },
    '-'  => sub { my ($a, $b) = &pop2; push @stack, $b - $a; },
    '*'  => sub { my ($a, $b) = &pop2; push @stack, $b * $a; },
    '/'  => sub { my ($a, $b) = &pop2; push @stack, $b / $a; },
    '%'  => sub { my ($a, $b) = &pop2; push @stack, $b % $a; },
    '**' => sub { my ($a, $b) = &pop2; push @stack, $b ** $a; },

We could go further and define a subroutine that evaluates any binary operation that can be written in Perl using infix notation:

    sub do_binary {
        my ($op) = @_;
        die "Not enough values on stack\n" if scalar(@stack) < 2;
        my ($a, $b) = (pop @stack, pop @stack);
        push @stack, eval "$b $op $a";
    }

    # ...
    '+' => sub { do_binary('+'); },
    # ...

== Extensions

One extension suggested by the problem text involved additional Perl builtin operations. Some of these were infix binary operations ($b & $a), while others were prefix unary (cos $a) or prefix binary (atan2 $b, $a). If succinct dispatch code was used, these may require additional supporting code.

Another involved entering numbers in different bases: hexadecimal (0xABCD), binary (0b11010), and octal (0762). Perl understands these forms as numeric literals in source code, but strings containing them would need to be eval'd (or converted with Perl's oct function) into numeric form before being pushed onto the stack; otherwise they remain strings.
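Here is a sketch of a few of these extensions, assuming the stack is a plain array as above. In this version, do_binary takes its operator with `my ($op) = @_` and keeps the operands in lexicals, so only the operator text goes through the string eval. Prefix builtins like cos and atan2 get small wrappers of their own. Input written as 0x.., 0b.. or with a leading 0 is converted with Perl's built-in oct function instead of a string eval. The names parse_number and eval_term, and the exact operator list, are hypothetical.

```perl
use strict;
use warnings;

my @stack;

# Hypothetical helper: turn a term into a number, or return undef.
# oct() understands the 0x (hex), 0b (binary) and leading-0 (octal)
# forms; plain decimals are numified with + 0.
sub parse_number {
    my ($term) = @_;
    return oct($term) if $term =~ /^0(?:x[0-9A-Fa-f]+|b[01]+|[0-7]+)$/;
    return $term + 0  if $term =~ /^-?(?:\d+\.?\d*|\.\d+)$/;
    return;
}

# Infix binary operators via string eval. Only the operator text is
# spliced into the code; $op must come from our own table, never the user.
sub do_binary {
    my ($op) = @_;
    die "Not enough values on stack\n" if @stack < 2;
    my ($x, $y) = (pop @stack, pop @stack);    # $x was on top
    my $result = eval "\$y $op \$x";
    die $@ if $@;
    push @stack, $result;
}

my %operations = (
    # One closure per infix operator.
    (map { my $op = $_; ($op => sub { do_binary($op) }) } qw(+ - * / % ** & |)),

    # Prefix unary builtin.
    'cos' => sub {
        die "Not enough values on stack\n" if @stack < 1;
        push @stack, cos(pop @stack);
    },

    # Prefix binary builtin: atan2 $b, $a.
    'atan2' => sub {
        die "Not enough values on stack\n" if @stack < 2;
        my ($x, $y) = (pop @stack, pop @stack);
        push @stack, atan2($y, $x);
    },
);

# Push a number, or run an operation; die on anything else.
sub eval_term {
    my ($term) = @_;
    my $n = parse_number($term);
    if (defined $n) {
        push @stack, $n;
        return;
    }
    my $code = $operations{$term} or die "Invalid term: $term\n";
    $code->();
}
```

Keeping operands out of the eval'd string avoids any surprises from how a number stringifies. It also means the bitwise operators see numeric values rather than strings.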
The pattern match for numbers would also need to be changed to accept the new forms.

The problem text suggested different display modes for the stack, controlled by operations that otherwise did not affect the stack contents. A state variable, set by the "dec", "bin", "oct" and "hex" operations, could control the stack printing routine.

"roll" and "rolld" could be implemented using Perl's splice operator.

== Conclusion

The posted solutions had a variety of strategies for re-using code, and a bunch of good ideas for extensions. They're all worth checking out. Thanks again to those who submitted solutions.

CPAN has several modules for RPN calculators. Math::RPN implements an rpn() routine that takes one or more terms in a comma-delimited string and returns the result, supporting several dozen operations. Parse::RPN is similar to Math::RPN, with many more operations supported. Tk::Calculator::RPN::HP is a Perl/Tk widget that implements an RPN calculator based on many of the Hewlett-Packard calculators.

http://search.cpan.org/~fdulau/Parse-RPN-2.7/RPN.pm
http://search.cpan.org/~owen/Math-RPN-1.08/RPN/RPN.pm
http://search.cpan.org/~lusol/Tk-Calculator-RPN-HP-0.6/Tk/Calculator/RPN/HP.pm

The interactive terminal-based RPN calculator I've been using for a while is called "vc", and is written in Perl. It includes vector math, undo/redo, variables and macro programmability.

http://perl.foundries.sourceforge.net/article.pl?sid=02/10/19/0132232&mode=thread&tid=167

-- Dan

From estrabd at yahoo.com Thu Oct 14 18:02:13 2004
From: estrabd at yahoo.com (E. Strade, B.D.)
Date: Thu Oct 14 18:02:16 2004
Subject: [Neworleans-pm] Fwd: Perl Quiz of the Week #26 (Acrostic puzzle formatter)
Message-ID: <1097794933.7340.206500487@webmail.messagingengine.com>

=====
http://www.brettsbsd.net/~estrabd

----- Original message -----
From: "Mark Jason Dominus"
To: perl-qotw@plover.com
Date: Thu, 14 Oct 2004 16:49:48 -0400
Subject: Perl Quiz of the Week #26 (Acrostic puzzle formatter)

IMPORTANT: Please do not post solutions, hints, or other spoilers until at least 60 hours after the date of this message. Thanks.

----------------------------------------------------------------

Last week I asked folks to write a program to generate acrostic puzzles, given a quotation and a source credit. (See http://perl.plover.com/qotw/e/025 for complete details.) Here's an example puzzle:

    ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
     1G  2A  3C  4A  5B  6F  7C  8G  9E 10D 11G 12B 13E

    ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
    14D 15G 16F 17E 18B 19B 20E 21D 22F 23A 24G 25C 26F 27G

    ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___ ___
    28A 29F 30B 31G 32A 33G 34E 35D 36C 37B 38A 39D 40C


    A. ___ ___ ___ ___ ___ ___           Fly headlong
         2   4  38  28  32  23

    B. ___ ___ ___ ___ ___ ___           Where the sun comes up
        19  18  12  30   5  37

    C. ___ ___ ___ ___ ___               Relinquish
        40  25  36   7   3

    D. ___ ___ ___ ___ ___               Valuable property
        39  10  21  35  14

    E. ___ ___ ___ ___ ___               Dangerous
         9  17  13  34  20

    F. ___ ___ ___ ___ ___               Formerly yours
        22  29   6  26  16

    G. ___ ___ ___ ___ ___ ___ ___ ___   Why you're like your dad
        15   8  24  11  27  33   1  31

There are two parts. The top part is a quotation; the bottom part is a list of clues and answer words.

To solve the puzzle, you guess an answer word based on the clue and fill in its letters. You then transfer each letter to the correspondingly-numbered space in the quotation. This may give you enough information to guess one of the words in the quotation; you can then transfer letters from the quotation back to the correspondingly-numbered spaces in the answer words. (For example, suppose you guess that the second word in the quotation is "BRAISING". Then you put an "R" into space 5B, and also into space 5 of answer word B in the bottom section.) The goal is to find the entire quotation.

This type of puzzle is called an "acrostic" puzzle because the initial letters of the answer words spell the name of the author of the quotation, or the title of the source.

Actually, the puzzle above is not exactly the example I posted last week, because the example I posted contained several errors. I later posted a correction, which *also* contained several errors. Ouch! Although it is straightforward in principle to take a quotation and an appropriate list of answer words and construct the puzzle, it turned out to be a lot trickier to actually do this than I thought it would be. I kept getting the letters confused. I also inserted several typos. Clearly, this is a task that is well-suited for a computer.

This week, you'll write a program to format and print puzzles. The input to the program will be a file with two sections. The first section will be the quotation, and will be followed by a single blank line, and then the second section, which will be the list of answer words, one per line. A sample input:

    Paul Gray ... has always edited and improved my writings ... In
    return, I never mention his name unless somebody points out an
    error ... in which case I always say, "Paul Gray told me that."

    Ravish
    Organdy
    Bastion
    Earthy
    Rescind
    Transit
    Elysian
    Midspan
    Avow
    Company
    Hearty
    Odious
    Limp
    Lignum
    Eastern
    Rumple
    Nowise
    Enough
    Reliant
    Study
    Lawyer
    Atheism
    Wade

Your program will emit an output analogous to the one at the beginning of this message. The letters in the quotation should be replaced by blanks; punctuation should be deleted. The blanks should be numbered consecutively, and each blank should be labeled with the letter (A, B, C, ...) that identifies the answer word in which its letter appears.

The bottom section of your puzzle should have blanks of the appropriate length for each answer word, and the blanks should be labeled with the appropriate numbers.

Each number should appear exactly once in the upper section and once in the lower section. Each upper-section space should have the correct letter.

Your program does not have to generate the clues themselves; those will be added later by a human. That is, it only needs to generate the

    G. ___ ___ ___ ___ ___ ___ ___ ___
        15   8  24  11  27  33   1  31

part of the answer section; it does not need to come up with "Why you're like your dad".

An important quality factor is that the letters of an answer word should not be clumped together in the quotation; they should be spread out to appear in many different words in the quotation, and vice versa.

If there are more than 26 answer words, it is traditional to label them "AA", "BB", and so on.

The example quotation above is an abridged version of the following:

    Much of what I write was suggested by Paul Gray, and in all cases,
    he has always edited and improved my writings before they are
    published. In return, I never mention his name unless somebody
    points out an error in one of my publications, in which case I
    always say, "Paul Gray told me that."

        -- Robert E. Machol, "Lerner's Law". _OR/MS Today_, 1998

"Lerner's Law", according to Dr. Machol, is "no good deed goes unpunished."

From estrabd at yahoo.com Thu Oct 21 09:56:38 2004
From: estrabd at yahoo.com (E. Strade, B.D.)
Date: Thu Oct 21 09:56:47 2004
Subject: [Neworleans-pm] Fwd: Perl 'Expert' Quiz of the Week #26 (Tk roller coaster simulation)
Message-ID: <1098370598.19450.206968490@webmail.messagingengine.com>

=====
http://www.brettsbsd.net/~estrabd

----- Original message -----
From: "Xavier Noria"
To: perl-qotw@plover.com
Date: Thu, 21 Oct 2004 08:09:53 -0400
Subject: Perl 'Expert' Quiz of the Week #26 (Tk roller coaster simulation)

IMPORTANT: Please do not post solutions, hints, or other spoilers until at least 60 hours after the date of this message. Thanks.

----------------------------------------------------------------

You will write a program that simulates a two-dimensional roller coaster.

A 2-D roller coaster track is given on stdin as a sequence of scaled line segments:

    ...
    153.167706326906 170.818594853651
    152.705297542038 170.778692088403
    152.230255082701 170.621111633323
    151.772089241241 170.340303566736
    ...

All numbers are positive floats. The first coordinate (x) is between 0 and 600, and the second coordinate (y) is between 0 and 400. In principle each 10 in the input file means 1 meter, so the roller coasters described have a maximum height of 40 meters, but feel free to play with the scale.
There are three examples here:

    http://perl.plover.com/qotw/misc/e026/quad.rc
    http://perl.plover.com/qotw/misc/e026/loops.rc
    http://perl.plover.com/qotw/misc/e026/huricane.rc

Write a Tk application that draws the track in a Tk::Canvas and simulates the run of a car of mass m that is released at the start of the track and moved only by gravity. We assume there's no friction. The run ends either when one end of the track is reached, or when the car comes to rest in a balanced position. A track may make the car go backwards at some point.

For instance, a possible solution would run loops.rc more or less like this:

    http://perl.plover.com/qotw/misc/e026/loops.mov

Since physics is not the topic of QOTW, here's a summary of an approach to the problem in case you don't want to work that part out yourself.

Let's suppose that the car starts motionless at (50, 60) and slides down a slope to (20, 20). Its potential energy at the top is m*g*60, where m is the mass of the car, and g is a constant that represents the strength of gravity. At the bottom, the potential energy is only m*g*20. The law of conservation of energy says that the extra m*g*40 has to go somewhere, and in fact it has turned into kinetic energy, which is the energy of the car's motion. Physics says that the kinetic energy (k) of the car is m*v**2/2, where m is the mass of the car and v is its velocity. So we have m*g*40 = m*v**2/2, so v = sqrt(80g) at the bottom. The average velocity over the entire slope is half this, or sqrt(20g). Since the length of the slope is 50 units (it drops 40 units over a horizontal run of 30), the car takes 50/sqrt(20g) time units to slide down the slope.

How fast it actually slides depends on g, the strength of gravity. If g is large it will slide faster, and if g is zero there is no gravity, so it won't slide down at all; it will stay at the top forever. The gravity constant g can be built into your program or supplied via a command-line option.
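The energy bookkeeping above can be checked numerically. The following minimal sketch works the slope example straight from the energy equations. The helper names speed_at and segment_time are hypothetical. The sketch assumes y is height above the ground (on a Tk::Canvas, y grows downward, so a drawing routine would have to flip it) and that the car started at rest at height $y0.

```perl
use strict;
use warnings;

# Speed of a frictionless car released at rest from height $y0, once it
# is at height $y: m*g*(y0 - y) = m*v**2/2, and the mass cancels.
sub speed_at {
    my ($g, $y0, $y) = @_;
    my $drop = $y0 - $y;
    die "car cannot reach height $y\n" if $drop < 0;
    return sqrt(2 * $g * $drop);
}

# Time to traverse one straight segment ($x1,$y1) -> ($x2,$y2), using
# v(average) = (v(start) + v(end)) / 2 and t * v(average) = l.
sub segment_time {
    my ($g, $y0, $x1, $y1, $x2, $y2) = @_;
    my $l       = sqrt(($x2 - $x1)**2 + ($y2 - $y1)**2);
    my $v_start = speed_at($g, $y0, $y1);
    my $v_end   = speed_at($g, $y0, $y2);
    my $v_avg   = ($v_start + $v_end) / 2;
    die "car is at rest on this segment\n" if $v_avg == 0;
    return $l / $v_avg;
}

# The slope from the text: start at rest at (50, 60), slide to (20, 20).
my $g = 9.8;    # assumed value; any positive constant works
printf "speed at bottom: %.4f\n", speed_at($g, 60, 20);
printf "time on slope:   %.4f\n", segment_time($g, 60, 50, 60, 20, 20);
```

Averaging the end speeds is exact on a straight frictionless segment, because acceleration along it is constant. Subdividing track into shorter segments mainly helps where the track curves.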
In summary:

    p = m*g*y
    k = m*(v**2)/2
    p + k = constant   (whatever potential energy the car loses, it gains as kinetic energy)
    v(average) = (v(start) + v(end))/2
    t * v(average) = l

    y          = height above the ground at a particular time
    p          = potential energy at that time
    m          = mass of the car
    g          = gravity constant
    k          = kinetic energy at a particular time
    v          = velocity of the car at that time
    v(start)   = velocity of the car at the start of a segment of track
    v(end)     = velocity of the car at the end of the segment
    v(average) = average velocity of the car over the segment
    l          = length of the segment
    t          = time to traverse the segment

To get a more accurate simulation, your program can divide each straight segment of track into several smaller straight segments and simulate each one separately.

From jkeen at verizon.net Mon Oct 25 21:47:56 2004
From: jkeen at verizon.net (James Keenan)
Date: Mon Oct 25 21:47:26 2004
Subject: [Neworleans-pm] Visiting New Orleans
Message-ID: <710672B0-26F9-11D9-8B65-000D932B9CD4@verizon.net>

Friends:

I have been lurking on this list for a couple of months. I am planning a trip to New Orleans in December to visit friends. I have some flexibility in my scheduling and am interested in knowing whether you will be holding your December meeting on Friday the 10th, as listed on your web page. If so, I would like to attend your meeting.

If your meeting organizer would like to respond off list, that would be fine. (I separately e-mailed speakers@neworleans.pm.org but haven't gotten a response.)

Thank you very much.

Jim Keenan
Brooklyn, NY
jkeenan [at] cpan [dot] org

From EmailLists at SimonDorfman.com Mon Oct 25 23:12:58 2004
From: EmailLists at SimonDorfman.com (Simon Dorfman)
Date: Mon Oct 25 23:13:06 2004
Subject: [Neworleans-pm] Perl optimization article
Message-ID:

http://www-106.ibm.com/developerworks/library/l-optperl.html

Optimize Perl
Squeeze the most from your code

Level: Intermediate

Martin C. Brown (questions@mcslp.com)
Freelance writer and consultant
19 Oct 2004

Perl is an incredibly flexible language, but its ease of use can lead to some sloppy and lazy programming habits. We're all guilty of them, but there are some quick steps you can take to improve the performance of your Perl applications. In this article, I'll look at the key areas of optimization, which solutions work and which don't, and how to continue to build and extend your applications with optimization and speed in mind.

Sloppy programming, sloppy performance

I'll be honest: I love Perl and I use it everywhere. I've written Web sites, administration scripts, and games using Perl. I frequently save time by getting Perl to do and check things automatically for me, everything from my lottery numbers to the stock markets, and I even use it to automatically file my e-mail. Because Perl makes it so easy to do all of these things, there's a tendency to forget about optimization. In many cases this isn't the end of the world. So what if it takes an extra few milliseconds to look up your stock reports or parse those log files?

However, those same lazy habits that cost milliseconds in a small application are multiplied when dealing with larger scale development projects. It's the one area where the Perl mantra of TMTOWTDI (There's More Than One Way To Do It) starts to look like a bad plan. If you need speed, there may be only one or two ways to achieve the fastest results, whereas there are many slower alternatives. Ultimately, sloppy programming -- even if you achieve the desired result -- is going to result in sloppy performance. So, in this article I'm going to look at some of the key techniques you can use to squeeze those extra cycles out of your Perl application.

Approaching optimization

First of all, it's worth remembering that Perl is a compiled language. The source code you write is compiled on the fly into the bytecode that is executed.
The bytecode is itself based on a range of instructions, all of which are written in a highly optimized form of C. However, even within these instructions, some operations that can achieve similar results are more highly optimized than others. Overall, this means that it's the combination of the logic sequence you use and the bytecode that is generated from this that ultimately affects performance. The differences between certain similar operations can be drastic. Consider the code in Listings 1 and 2. Both create a concatenated string, one through ordinary concatenation and the other through generating an array and concatenating it with join.

Listing 1. Concatenating a string, version 1

    my $string = 'abcdefghijklmnopqrstuvwxyz';
    my $concat = '';
    foreach my $count (1..999999) {
        $concat .= $string;
    }

Listing 2. Concatenating a string, version 2

    my $string = 'abcdefghijklmnopqrstuvwxyz';
    my @concat;
    foreach my $count (1..999999) {
        push @concat,$string;
    }
    my $concat = join('',@concat);

Running Listing 1, I get a time of 1.765 seconds, whereas Listing 2 requires 5.244 seconds. Both generate a string, so what's taking up the time? Conventional wisdom (including that of the Perl team) would say that concatenating a string is a time-expensive process, because we have to extend the memory allocation for the variable and then copy the string and its addition into the new variable. Conversely, adding a string to an array should be relatively easy. We also have the added problem of duplicating the string concatenation using join(), which adds an extra second.

The problem, in this instance, is that push()-ing strings onto an array is time-intensive. First of all, we have a function call (which means pushing items onto a stack, and then taking them off), and we also have the additional array management overhead. In contrast, concatenating a string is pretty much just a case of running a single opcode to append a string variable to an existing string variable.
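To reproduce a comparison like this on your own machine, the core Benchmark module's cmpthese can time the variants side by side. This is a sketch, not the author's harness. The sub names and the smaller 10,000-piece size are assumptions. It also adds a third variant that uses the x repetition operator, which takes the string on the left and the count on the right. Absolute numbers will vary with the machine and the perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $string = 'abcdefghijklmnopqrstuvwxyz';

# Listing 1 style: append to one growing scalar.
sub by_concat {
    my ($n) = @_;
    my $concat = '';
    $concat .= $string for 1 .. $n;
    return $concat;
}

# Listing 2 style: collect pieces in an array, then join once.
sub by_join {
    my ($n) = @_;
    my @parts;
    push @parts, $string for 1 .. $n;
    return join '', @parts;
}

# Repetition operator: string on the left, count on the right.
sub by_repeat {
    my ($n) = @_;
    return $string x $n;
}

# A negative count means "run each variant for at least 1 CPU second";
# cmpthese then prints a table of rates and relative speeds.
cmpthese(-1, {
    concat => sub { by_concat(10_000) },
    join   => sub { by_join(10_000) },
    repeat => sub { by_repeat(10_000) },
});
```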
Even if we set the array size to alleviate the overhead (using $#concat = 999999), we still only save another second.

The above is an extreme example, and there are times when using an array will be much quicker than using strings. A good example is when you need to reuse a particular sequence but with an alternate order or a different interstitial character. Arrays are also useful, of course, if you want to rearrange or reorder the contents.

By the way, in this example, an even quicker way of producing a string that repeats the alphabet 999,999 times would be to use the repetition operator, which takes the string on its left and the count on its right:

    $concat = 'abcdefghijklmnopqrstuvwxyz' x 999999;

Individually, many of the techniques covered here won't make a huge difference, but combined in one application, you could shave a few hundred milliseconds, or even seconds, off of your Perl applications.

Use references

If you work with large arrays or hashes and use them as arguments to functions, pass a reference instead of the variable itself. By using a reference, you tell the function to point to the information. Without a reference, you copy the entire array or hash onto the function call stack, and then copy it again in the function. References also save memory (which reduces footprint and management overheads) and simplify your programming.

String handling

If you are using static strings in your application a lot (for example, in a Web application), remember to use single quotes rather than doubles. Double quotes force Perl to look for a potential interpolation of information, which adds to the overhead of printing out the string:

    print 'A string','another string',"\n";

I've also used commas to separate arguments rather than using a period to concatenate the string first. This simplifies the process; print simply sends each argument to the output file. Concatenation would concatenate the string and print it as one argument.
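The difference between passing an array and passing a reference can be sketched as follows. The sub names are hypothetical. Strictly speaking, Perl pushes each element onto its argument stack as an alias in @_, and the real copy happens when the sub assigns @_ to a lexical array. The reference version passes a single scalar and reads the caller's array in place.

```perl
use strict;
use warnings;

# Passing the array flattens it: every element goes onto the argument
# stack (as aliases in @_), and my (@list) = @_ then copies each value.
sub total_by_copy {
    my (@list) = @_;
    my $total = 0;
    $total += $_ for @list;
    return $total;
}

# Passing a reference puts one scalar in @_; the sub reads the caller's
# array through it without copying the elements.
sub total_by_ref {
    my ($list) = @_;
    my $total = 0;
    $total += $_ for @$list;
    return $total;
}

my @big = (1 .. 100_000);
print total_by_copy(@big), "\n";
print total_by_ref(\@big), "\n";    # note the backslash: \@big
```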
Loops

As you've already seen, function calls with arguments are expensive, because for the function call to work, Perl has to put the arguments onto the call stack, call the function, and then receive the responses back through the stack again. All of this requires overhead and processing that we could probably do without. For this reason, excessive function calls in a loop are generally a bad idea. Again, it comes down to a comparison of numbers. Looping through 1,000 items and passing information to a function will trigger the function call 1,000 times. To get around this, I just switch the sequence around. Instead of using the format in Listing 3, I use the approach in Listing 4.

Listing 3. Loop calling functions

    foreach my $item (keys %{$values}) {
        $values->{$item}->{result} = calculate($values->{$item});
    }

    sub calculate {
        my ($item) = @_;
        return ($item->{adda}+$item->{addb});
    }

Listing 4. Function using loops

    calculate_list($values);

    sub calculate_list {
        my ($list) = @_;
        foreach my $item (keys %{$list}) {
            $list->{$item}->{result} = ($list->{$item}->{adda}+$list->{$item}->{addb});
        }
    }

Better still, in a simple calculation like this one or for any straightforward loop work, use map:

    map { $values->{$_}->{result} = $values->{$_}->{adda}+$values->{$_}->{addb} } keys %{$values};

Remember also that each iteration through the loop wastes time, so rather than working through the same loop a number of times, try to perform all the actions in one pass through the loop.

Sorts

Another common operation related to loops is sorting information, particularly keys in a hash. It's tempting in this instance to embed some processing of list elements into the sort operation, such as the one shown here in Listing 5.

Listing 5. Bad sorting

    my @marksorted = sort {
        sprintf('%s%s%s',
                $marked_items->{$b}->{'upddate'},
                $marked_items->{$b}->{'updtime'},
                $marked_items->{$b}->{itemid})
        <=>
        sprintf('%s%s%s',
                $marked_items->{$a}->{'upddate'},
                $marked_items->{$a}->{'updtime'},
                $marked_items->{$a}->{itemid})
    } keys %{$marked_items};

This is a fairly typical sort of complex data, in this case ordering something by date, time, and ID number by concatenating the numbers into a single number that we can then sort numerically. The problem is that the sort works through the list of items and moves them up or down through the list based on the comparison operation. In effect, this is a type of loop, but unlike the loop examples we've already seen, a sprintf call has to be made for each comparison. That's at least twice for each iteration, and the exact number of iterations through the list will depend on how ordered it was to begin with. For example, with a 10,000-item list you could expect to call sprintf over 240,000 times.

The solution is to create a list that contains the sort information, and generate the sort field information just once. Taking the sample in Listing 5 as a guide, I'd rewrite that fragment into something like the code in Listing 6.

Listing 6. Better sorting

    map { $marked_items->{$_}->{sort} = sprintf('%s%s%s',
              $marked_items->{$_}->{'upddate'},
              $marked_items->{$_}->{'updtime'},
              $marked_items->{$_}->{itemid}) } keys %{$marked_items};

    my @marksorted = sort { $marked_items->{$b}->{sort} <=>
                            $marked_items->{$a}->{sort} } keys %{$marked_items};

Instead of calling sprintf all those times, we call it just once for each item in the hash in order to generate a sort field in the hash, and then use that sort field directly during the sort. The sorting process only has to access the sort field's value. You have cut down the calls on that 10,000-item hash from 240,000 to just 10,000.
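A closely related idiom, the Schwartzian transform, computes the sort data once per item without writing an extra field into the hash. Note that this is a swapped-in technique, not the article's Listing 6. The sketch below uses hypothetical data shaped like $marked_items. It also compares the date, time and ID fields one after another instead of concatenating them, because a concatenated key only sorts correctly when every field has a fixed width.

```perl
use strict;
use warnings;

# Hypothetical data shaped like $marked_items in Listings 5 and 6.
my $marked_items = {
    k1 => { upddate => 20041019, updtime => 930,  itemid => 7  },
    k2 => { upddate => 20041019, updtime => 930,  itemid => 12 },
    k3 => { upddate => 20041020, updtime => 100,  itemid => 3  },
    k4 => { upddate => 20041018, updtime => 2359, itemid => 5  },
};

# Schwartzian transform: build [key, date, time, id] once per item,
# sort on the cached fields (newest first), then keep only the keys.
my @marksorted =
    map  { $_->[0] }
    sort {
           $b->[1] <=> $a->[1]
        || $b->[2] <=> $a->[2]
        || $b->[3] <=> $a->[3]
    }
    map  {
        my $item = $marked_items->{$_};
        [ $_, @{$item}{qw(upddate updtime itemid)} ]
    }
    keys %$marked_items;

print "@marksorted\n";
```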
Depending on what the original sort block does, the approach in Listing 6 can save as much as half the time taken by the version in Listing 5. If you produce these hashes from the results of a database query (through MySQL or similar), you can sort within the query and record the order as you build the hash, so you won't need to iterate over the information again.

Using short circuit logic

Related to the sort operation is how to work through a list of alternative values. Using many if statements can be incredibly time consuming. For example, look at the code in Listing 7.

Listing 7. Making a choice

    if ($userchoice > 0) {
        $realchoice = $userchoice;
    } elsif ($systemchoice > 0) {
        $realchoice = $systemchoice;
    } else {
        $realchoice = $defaultchoice;
    }

Aside from the waste of space in terms of sheer content, there are a couple of problems with this structure. First, from a programming perspective, it never checks whether any of the variables have a valid value, a fact that would be highlighted if warnings were switched on. Second, it has to check each option until it gets to the one it wants, which is wasteful, as comparison operations (particularly on strings) are time consuming.

Both problems can be solved by using short circuit logic. If you use the logical || operator, Perl will use the first true value it comes across, in order, from left to right. The moment it finds a valid value, it doesn't bother processing any of the other values. In addition, because Perl is looking for a true value, it also ignores undefined values without complaining about them. So we can rewrite the above into a single line:

    $realchoice = $userchoice || $systemchoice || $defaultchoice;

If $userchoice is a true value, Perl doesn't even look at the other variables. If $userchoice is false (see Table 1), then Perl checks the value of $systemchoice and so on until it gets to the last value, which is always used, whether it's true or not. Note that this is not exactly equivalent to Listing 7: a negative number is true, so || accepts it even though it fails the > 0 test.

Table 1. $userchoice values

    Value                                                Logical value
    Negative number                                      True
    Zero                                                 False
    Positive number                                      True
    Empty string                                         False
    The string "0"                                       False
    Any other non-empty string                           True
    Undefined value                                      False
    Empty list (including hashes)                        False
    List with at least one element (including hashes)    True

Use AutoLoader

One of the most expensive portions of the execution of a Perl script is the compilation of source code into the bytecode that is actually executed. On a small script with no external modules, the process takes milliseconds. But start to include a few of your own external modules and the time increases. The reason is that Perl does little more with a module than import the text and run it through the same compilation stage. That can turn your 200-line script into a 10,000- or 20,000-line script very quickly. The result is that you increase the initial stages of the compilation process before the script even starts to do any work.

During the normal execution of your script, it may be that you only use 10 percent, or even 5 percent, of all the functions defined in those modules. So why load them all when you start the script? The solution is to use AutoLoader, which acts a bit like a dynamic loader for Perl modules. This uses files generated by the AutoSplit system, which divides up a module into the individual functions. When you load the module through use, all you do is load the stub code for the module. It's only when you call a function contained within the module that the AutoLoader steps in and then loads and compiles the code only for that function. The result is that you convert that 20,000-line script with modules back into a 200-line script, speeding up the initial loading and compilation stages.

I've saved as much as two seconds just by converting one of my applications to use the AutoLoader system in place of preloading.
It's easy to use: just change your modules from the format shown in Listing 8 to that shown in Listing 9, and then make sure to use AutoSplit to create the loading functions you need. AutoLoader loads individual functions on demand, so you no longer have to preload them. Note, though, that AutoLoader does not export anything: callers either use fully qualified names such as MyModule::MySub(), or you keep Exporter for that.

Listing 8. A standard module

    package MyModule;
    use OtherModule;
    require Exporter;
    @ISA    = qw/Exporter/;
    @EXPORT = qw/MySub/;
    sub MySub {
       ...
    }
    1;

Listing 9. An autoloading module

    package MyModule;
    use OtherModule;
    use AutoLoader 'AUTOLOAD';
    1;
    __END__
    sub MySub {
       ...
    }

The main difference here is that functions you want to autoload are no longer defined within the module's package space but in the data section at the end of the module (after the __END__ token). AutoSplit will place any functions defined here into the special AutoLoader files. To split up the module, use the following command line:

    perl -e 'use AutoSplit; autosplit($ARGV[0], $ARGV[1], 0, 1, 1)' MyModule.pm auto

Using bytecode and the compiler back ends

There are three ways to use the compiler: bytecode production, full compilation, or simply as a debugging/optimizing tool. The first two methods rely on converting your original Perl source into its compiled bytecode form and storing this precompiled version for execution. This is best done through the perlcc command. These two modes follow the same basic model but produce the final result differently.

In bytecode mode, the resulting compiled bytecode is written out to another Perl script. The script consists of the ByteLoader preamble, with the compiled code stored as a byte string. To create this bytecode version, use the -B option to the perlcc command. For example:

    $ perlcc -B script.pl

This will create a file, a.out. The output, however, is not very Web friendly.
The resulting file can be executed with any Perl executable on any platform (Perl bytecode is platform independent):

    $ perl a.out

This saves Perl from having to compile the script from its source code into the bytecode each time. Instead, it just runs the bytecode that was generated. This is similar to the process behind Java compilation and is, in fact, the same one step away from being a truly compiled form of the language. On short scripts, especially those that use a number of external modules, you probably won't notice a huge speed increase. On larger scripts that "stand alone" without a lot of external module use, you should see a noticeable improvement.

The full compilation mode is almost identical, except that instead of producing a Perl script with the compiled bytecode embedded in it, perlcc produces a version embedded into C source that is then compiled into a full-blown, standalone executable. This is not cross-platform compatible, but it does allow you to distribute an executable version of a Perl script without giving out the source. Note, however, that this doesn't convert the Perl into C; it just embeds Perl bytecode into a C-based application. This is actually the default mode of perlcc, so a simple:

    $ perlcc script.pl

will create, and compile, a standalone application called a.out.

One of the lesser-known solutions for both debugging and optimizing your code is to use the Perl compiler with one of the many "back ends." The back ends are actually what drive the perlcc command, and it's possible to use a back-end module directly to create a C source file that you can examine. The Perl compiler works by taking the generated bytecode and then outputting the results in a variety of different ways. Because you're looking at the opcodes generated during the compilation stage, you get to see the code after Perl's own internal optimizations have been applied.
Provided you know the Perl opcodes, you can begin to identify where the potential bottlenecks might be. From a debugging perspective, go with back ends such as Terse (which is itself a wrapper around Concise) and Showlex. Listing 10 shows what the original Listing 1 looks like through the Terse back end.

Listing 10. Using Terse to study bytecode

LISTOP (0x306230) leave [1]
    OP (0x305f60) enter
    COP (0x3062d0) nextstate
    BINOP (0x306210) sassign
        SVOP (0x301ab0) const [7] PV (0x1809f9c) "abcdefghijklmnopqrstuvwxyz"
        OP (0x305c30) padsv [1]
    COP (0x305c70) nextstate
    BINOP (0x305c50) sassign
        SVOP (0x306330) const [8] PV (0x180be60) ""
        OP (0x306310) padsv [2]
    COP (0x305f20) nextstate
    BINOP (0x305f00) leaveloop
        LOOP (0x305d10) enteriter [3]
            OP (0x305cf0) null [3]
            UNOP (0x305cd0) null [141]
                OP (0x305e80) pushmark
                SVOP (0x3065d0) const [9] IV (0x180be30) 1
                SVOP (0x3065f0) const [10] IV (0x1801240) 999999
        UNOP (0x305ee0) null
            LOGOP (0x305ec0) and
                OP (0x305d50) iter
                LISTOP (0x305e60) lineseq
                    COP (0x305e10) nextstate
                    BINOP (0x305df0) concat [6]
                        OP (0x305d70) padsv [2]
                        OP (0x305dd0) padsv [1]
                    OP (0x305ea0) unstack
concat1.pl syntax OK

Other tools

What I've covered here focuses entirely on the code that makes up your applications. While that's where most of the problems will be, there are tools and systems that can help you identify and locate problems in your code and, ultimately, improve performance.

Warnings/strict execution

It's a common recommendation, but it really can make a difference. Use the warnings and strict pragmas to ensure nothing funny is going on with variable use, typos, and other inconsistencies. Using them in all your scripts will help you eliminate all sorts of problems, many of which can be the source of performance bottlenecks. Common faults picked up by these pragmas include ambiguous references and dereferences, use of undefined values, and typos in the names of variables and functions.
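To make the typo case concrete, here is a minimal sketch that compiles a code snippet under use strict inside a string eval and reports any compile-time error. The helper strict_error_for and both snippets are hypothetical; only the pragma's behavior is being illustrated.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compile $code as the body of an anonymous sub under 'use strict'
# (without running it) and return the compile-time error, or an
# empty string if the snippet compiled cleanly.
sub strict_error_for {
    my ($code) = @_;
    local $@;
    my $ok = eval "use strict; my \$compiled = sub { $code }; 1";
    return $ok ? '' : $@;
}

# A typo in a variable name ($totl instead of $total) is a fatal
# compile-time error under strict 'vars'.
my $typo = strict_error_for('my $total = 0; $totl += 1; return $total;');
print "typo:  $typo";

# The same code without the typo compiles cleanly.
my $clean = strict_error_for('my $total = 0; $total += 1; return $total;');
print 'clean: ', ($clean eq '' ? "compiled OK\n" : $clean);
```

Because the check happens at compile time, the misspelled variable is reported before any of the snippet runs, rather than silently creating a new global that stays undefined.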
All of this help, though, comes at a slight performance cost. I keep warnings and strict on while programming and debugging, and I switch them off once the script is ready to be used in the real world. It won't save much, but every millisecond counts.

Profiling

Profiling is a useful tool for optimizing code, but all it does is identify the potential location of the problem; it doesn't actually point out what the issue is or how to resolve it. Also, because profiling relies on monitoring the number of executions of different parts of your application, it can, on occasion, give misleading advice about where a problem lies and the best approach for resolving it. However, profiling is still a useful, and often vital, part of the optimization process. Just don't rely on it to tell you everything you need to know.

Debugging

To me, a badly optimized program is a program with a bug. The reverse is also true: bugs often lead to performance problems. Classic examples are badly dereferenced variables and reading or filtering the wrong information. It doesn't matter whether your debugging technique involves print statements or the full-blown debugger provided with Perl. The sooner you eliminate the bugs, the sooner you can start optimizing your application.

Putting it all together

Now that you know the techniques, here is how to use them together to produce optimized applications. I generally follow this sequence when optimizing:

1. Write the program as efficiently as possible using the techniques above. Once you start to use them regularly, they become the only way you program.
2. Once the program is finished, or at least in a releasable state, read through the code by hand and double-check that you are using the most efficient solution. You'll be able to spot a number of issues just by re-reading, and you might pick up a few potential bugs, too.
3. Debug your program.
   Bugs can cause performance problems, so always eliminate bugs before attempting more intensive optimization.
4. Run the profiler. I always do this once on any serious application, just to see if there's something (often obvious) that I might have missed.
5. Go back to step 1 and repeat. I've lost track of the number of times I've completely missed a potential optimization the first time around. Either I'll go back and repeat the process two or three times in one session, or I'll leave, do another project, and return a few days, weeks, or months later. By then, you'll often have found an alternative way of doing something that saves time.

At the end of the day, there is no magic wand that will optimize your software for you. Even with the debugger and profiler, all you get is information about what might be causing a performance problem, not necessarily any helpful advice on what you should do to fix it.

Be aware as well that there is a limit to what you can optimize. Some operations will simply take a lot of time to complete. If you have to work through a 10,000-item hash, there's no way of simplifying that process. But as you've seen, there might be ways of reducing the overhead in each case.

Resources

- Read about techniques for debugging Perl in "Cultured Perl: Debugging Perl with ease" (developerWorks, November 2000).
- You'll find more information on Perl programming in Ted Zlatanov's Cultured Perl column on developerWorks.
- Visit CPAN for all the Perl modules you could ever want.
- Check out the O'Reilly Network's Perl.com for Perl information and related resources.
- Find more resources for Linux developers in the developerWorks Linux zone.
- Download no-charge trial versions of IBM middleware products that run on Linux, including WebSphere® Studio Application Developer, WebSphere Application Server, DB2® Universal Database, Tivoli® Access Manager, and Tivoli Directory Server, and explore how-to articles and tech support, in the Speed-start your Linux app section of developerWorks.
- Get involved in the developerWorks community by participating in developerWorks blogs.
- Purchase Linux books at discounted prices in the Linux section of the Developer Bookstore.

About the author

Martin C. Brown is a former IT Director with experience in cross-platform integration. A keen developer, he has produced dynamic sites for blue-chip customers including HP and Oracle and is the Technical Director of Foodware.net. Now a freelance writer and consultant, MC, as he is better known, works closely with Microsoft as an SME, is the LAMP Technologies Editor for LinuxWorld magazine, is a core member of the AnswerSquad.com team, and has written a number of books on topics as diverse as Microsoft Certification, iMacs, and open source programming. Despite his best attempts, he remains a regular and voracious programmer on many platforms and in numerous environments. MC can be contacted at questions@mcslp.com, or through his Web site.

From joey at joeykelly.net Tue Oct 26 11:02:14 2004
From: joey at joeykelly.net (Joey Kelly)
Date: Tue Oct 26 10:51:53 2004
Subject: [Neworleans-pm] Visiting New Orleans
In-Reply-To: <710672B0-26F9-11D9-8B65-000D932B9CD4@verizon.net>
References: <710672B0-26F9-11D9-8B65-000D932B9CD4@verizon.net>
Message-ID: <200410261102.14700.joey@joeykelly.net>

On Monday 25 October 2004 9:47 pm, James Keenan spake:
> Friends:
>
> I have been lurking on this list for a couple of months. I am planning
> a trip to New Orleans in December to visit friends. I have some
> flexibility in my scheduling and am interested in knowing if you will
> be holding your December meeting on Friday the 10th as listed on your
> web page. If so, I would like to attend your meeting.

I haven't heard anything about the meeting being cancelled, so yeah, we're holding it.
Looking forward to meeting you :-)

-- 
Joey Kelly
< Minister of the Gospel | Linux Consultant >
http://joeykelly.net

"I may have invented it, but Bill made it famous."
 --- David Bradley, the IBM employee that invented CTRL-ALT-DEL

From jkeen at verizon.net Tue Oct 26 18:02:16 2004
From: jkeen at verizon.net (James Keenan)
Date: Tue Oct 26 18:01:42 2004
Subject: [Neworleans-pm] Perl optimization article
In-Reply-To: <200410261552.i9QFq3XE013742@www.pm.org>
References: <200410261552.i9QFq3XE013742@www.pm.org>
Message-ID: <14A884BA-27A3-11D9-BD2C-000D932B9CD4@verizon.net>

> Message: 2
> Date: Mon, 25 Oct 2004 23:12:58 -0500
> From: Simon Dorfman
> Subject: [Neworleans-pm] Perl optimization article
> To:
> Message-ID:
> Content-Type: text/plain; charset="ISO-8859-1"
>
> http://www-106.ibm.com/developerworks/library/l-optperl.html
>
>

You should note that this article has been subjected to some scorching criticisms on comp.lang.perl.misc starting on 10/23/04. Here, for example, is what Uri Guttman wrote:

"so many misconceptions about perl that i can't even start. and he misses so many ways to optimize perl as well. no mention of the Benchmark.pm module.

perl bytecode is generally useless and doesn't give much speedup

choosing a better algorithm and/or data structure is the best way to optimize code in any language. if you think perl's speed is the problem then you either chose the wrong language or don't know how to code perl efficiently.

examining bytecode is silly in this context. it still won't show you which ops are the bottlenecks.

the speed difference between single and double quoted strings is negligible. try a benchmark yourself. double quoted strings are converted to a join of the string parts at compile time so there is no runtime loss for simple strings."
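Uri's suggestion to "try a benchmark yourself" can be followed with the core Benchmark module he mentions. A minimal sketch follows; the labels, the 26-letter string, and the iteration count are arbitrary assumptions, and the rates printed will vary by machine and perl version.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Arbitrary iteration count; raise it for steadier numbers.
my $count = 1_000_000;

# cmpthese runs each sub $count times, prints a comparison chart, and
# returns the chart as an array ref of rows (the first row holds the labels).
my $rows = cmpthese($count, {
    single => sub { my $s = 'abcdefghijklmnopqrstuvwxyz'; return $s },
    double => sub { my $s = "abcdefghijklmnopqrstuvwxyz"; return $s },
});
```

Because a double-quoted string with nothing to interpolate compiles to the same constant as its single-quoted form, any difference between the two rates should be within run-to-run noise.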
Jim Keenan

From dave at gnofn.org Thu Oct 28 11:12:42 2004
From: dave at gnofn.org (Dave Cash)
Date: Thu Oct 28 11:23:38 2004
Subject: [Neworleans-pm] Projector for December Meeting
Message-ID: <20041028110251.X55288@sparkie.gnofn.org>

Hello, all.

I've been corresponding with Jim Keenan from NY Perl Seminar, a Perl Mongers affiliate. He's offered to give a talk at our December meeting. We've discussed which of his talks would fit best with the skill level of our group here. Of his talks, we think the one on his CPAN module List::Compare would be best. He plans to talk about how a non-expert Perl hacker can prepare a CPAN distribution, basic object-oriented Perl, how a test suite works, how a module evolves over time, etc. I think this would be a good intro to a lot of this stuff for a lot of us.

One thing Jim would very much like to have for the presentation is a video projector. Does anyone on this list have access to a video projector we could use or borrow for our December 10 meeting? The meeting will be from 5:30 to 8:00.

I'm excited that we'll be having our first guest speaker soon.

Thanks!

Dave

/L\_/E\_/A\_/R\_/N\_/T\_/E\_/A\_/C\_/H\_/L\_/E\_/A\_/R\_/N\
Dave Cash
Power to the People!
Frolicking in Fields of Garlic
Right On-Line!
dave@gnofn.org
Dig it all.