From alec.clews at gmail.com Thu May 1 02:24:38 2008 From: alec.clews at gmail.com (Alec Clews) Date: Thu, 01 May 2008 19:24:38 +1000 Subject: [Melbourne-pm] Next meeting: 14th May. LIGHTNING TALKS In-Reply-To: <48195B2B.6030700@perltraining.com.au> References: <48192965.7030601@perltraining.com.au> <48193686.9040700@rea-group.com> <48195B2B.6030700@perltraining.com.au> Message-ID: <1209633878.6358.1.camel@seven> If possible I'd love a quick look at how to use git as a client for Subversion? On Thu, 2008-05-01 at 15:54 +1000, Paul Fenwick wrote: > G'day Toby / MPM, > > Toby Corkindale wrote: > > > It's not strictly Perl, nor a 5 min talk, but I mentioned to Paul at the > > last meeting that I could do a quick talk on using Git for source > > control, with a quick run-through of how to use it. > > Oh goodness! You did indeed! Talking about git could be considered a bit > Perlish, since the Perl 5 source is moving to git as its source-control > system. You could make it especially Perlish if you added git-fu on how to > run Perl::Critic or something similar before code gets committed. > > > I'm still happy to do a cut-down version in five minutes though, if > > people are interested? > > You immediately qualify on the 5-minute version because if we're doing > lightning talks, then that's a lightning talk. ;) > > I'd personally love to see git as a longer talk (unless anyone complains), > so I'd propose the git talk as the "feature talk" for the evening, with > lightning talks either before or after. > > Cheerio, > > Paul > From ddick at aapt.net.au Thu May 1 02:25:10 2008 From: ddick at aapt.net.au (David Dick) Date: Thu, 01 May 2008 19:25:10 +1000 Subject: [Melbourne-pm] Next meeting: 14th May. LIGHTNING TALKS In-Reply-To: <48192965.7030601@perltraining.com.au> References: <48192965.7030601@perltraining.com.au> Message-ID: <48198C76.1000801@aapt.net.au> Jacinta Richardson wrote: > The next Melbourne Perl Mongers meeting will be held: > > 6:30pm 14th May > Level 1 > 172 Flinders St > (just opposite Federation Square) > > David Dick and AAPT have kindly volunteered to host us ummm.... actually the sponsor is Remasys Pty Ltd. :) Everyone will get confused if they show up looking for the AAPT offices... :) From jarich at perltraining.com.au Thu May 1 18:43:48 2008 From: jarich at perltraining.com.au (Jacinta Richardson) Date: Fri, 02 May 2008 11:43:48 +1000 Subject: [Melbourne-pm] Next meeting: 14th May. LIGHTNING TALKS In-Reply-To: <48198C76.1000801@aapt.net.au> References: <48192965.7030601@perltraining.com.au> <48198C76.1000801@aapt.net.au> Message-ID: <481A71D4.2050604@perltraining.com.au> David Dick wrote: > Jacinta Richardson wrote: >> The next Melbourne Perl Mongers meeting will be held: >> >> 6:30pm 14th May >> Level 1 >> 172 Flinders St >> (just opposite Federation Square) >> >> David Dick and AAPT have kindly volunteered to host us > ummm.... actually the sponsor is Remasys Pty Ltd. :) Everyone will get > confused if they show up looking for the AAPT offices... :) My mistake, thankyou for the clarification. J -- ("`-''-/").___..--''"`-._ | Jacinta Richardson | `6_ 6 ) `-. ( ).`-.__.`) | Perl Training Australia | (_Y_.)' ._ ) `._ `. ``-..-' | +61 3 9354 6001 | _..`--'_..-_/ /--'_.' 
,' | contact at perltraining.com.au | (il),-'' (li),' ((!.-' | www.perltraining.com.au | From guy at alchemy.com.au Thu May 1 19:35:33 2008 From: guy at alchemy.com.au (Guy Morton) Date: Fri, 2 May 2008 12:35:33 +1000 Subject: [Melbourne-pm] Amazon S3 Message-ID: <2396534B-4B34-4C61-B5A9-416E771B5870@alchemy.com.au> Hello perlers Anyone here had experience using perl and Amazon::S3 to do mysql database backups to S3? I've tried this guy's script as a way to get started, but it no workee: http://dparrish.com/2008/02/mysql-backup-to-amazon-s3/ It seems to die on the add_bucket command - fails with a file not found error...which I don't really understand. Anyone here got any ideas or pointers? TIA Guy -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.pm.org/pipermail/melbourne-pm/attachments/20080502/7a51d84e/attachment.html From scottp at dd.com.au Sat May 3 17:31:26 2008 From: scottp at dd.com.au (Scott Penrose) Date: Sun, 04 May 2008 10:31:26 +1000 Subject: [Melbourne-pm] $SIG{CHLD} Message-ID: <481D03DE.7090401@dd.com.au> Hey Dudes To capture exit values of forked daemons and not end up with a set of zombie processes, we need to set $SIG{CHLD} to either 'ignore' or do it fully. However once you do you loose the ability to capture the return value of a 'system' call - unless you do it the hard way (record in a hash the value by process id and then use that and remove it after your system call). Anyway to get this all written down I wrote it on my site, but also as partly an open question - is there a better way of doing 'system' which does not depend on changes to $SIG{CHLD} or other solutions: http://scott.dd.com.au/wiki/SIG_CHLD So anyone know of one? Scott From daniel at rimspace.net Sun May 4 01:23:54 2008 From: daniel at rimspace.net (Daniel Pittman) Date: Sun, 04 May 2008 18:23:54 +1000 Subject: [Melbourne-pm] $SIG{CHLD} In-Reply-To: <481D03DE.7090401@dd.com.au> (Scott Penrose's message of "Sun, 04 May 2008 10:31:26 +1000") References: <481D03DE.7090401@dd.com.au> Message-ID: <87iqxuqxyd.fsf@rimspace.net> Scott Penrose writes: > To capture exit values of forked daemons and not end up with a set of > zombie processes, we need to set $SIG{CHLD} to either 'ignore' or do > it fully. However once you do you loose the ability to capture the > return value of a 'system' call - unless you do it the hard way > (record in a hash the value by process id and then use that and remove > it after your system call). > > Anyway to get this all written down I wrote it on my site, but also as > partly an open question - is there a better way of doing 'system' > which does not depend on changes to $SIG{CHLD} or other solutions: > > http://scott.dd.com.au/wiki/SIG_CHLD > > So anyone know of one? Well, my very strong preference for doing /anything/ related to child processes is to use the IPC::Run module. This wraps up a whole bunch of stuff from a dead simple 'run this' through to a complex 'write to and read from a filter, looking for specific out' and 'build a pipeline' stuff. The interface is sensible, light-weight, and the tool scales very well from start to finish. It also, on investigation, handles $? appropriately internally so that it does the right stuff as far as I can tell. 
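A minimal sketch of the usage Daniel describes, with a hypothetical command and variable names (IPC::Run's run() returns true when the child exits with status zero and leaves the raw status in $?):

  use strict;
  use warnings;
  use IPC::Run qw(run timeout);

  my @cmd = ('ls', '-l', '/tmp');       # hypothetical command
  my ($in, $out, $err) = ('', '', '');

  # run() forks, wires up the handles and waits for the child itself,
  # so the exit status survives without a hand-rolled waitpid loop.
  run(\@cmd, \$in, \$out, \$err, timeout(10))
      or die "command failed with exit value " . ($? >> 8) . "\n";

  print $out;
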
You may want to look into it, although it isn't always perfect: http://www.perlmonks.org/?node_id=674306 Also, not always playing nice with SIG{CHLD} handlers, although this is very much in the "point gun at foot, pull trigger" style: http://www.depesz.com/index.php/2008/02/07/failing-ls/ (Answer for those who don't want to read the code below the cut) Anyway, it should play nicely with existing SIG{CHLD} handlers that are written such that they don't break random library code and the like, and certainly beats hand-coding everything. Regards, Daniel I have not actually tried the Perl co-process support, but everything else seems solid enough. The answer is that the install SIG{CHLD} handler will wait for and collect the exit status from *everything*, which means that my the time the IPC::Run code in _cleanup (IPC/Run.pm:3157) is called the exit status is already gone as is the zombie process. See the manual page (and error code) for the waitpid system call, From scottp at dd.com.au Sun May 4 05:11:04 2008 From: scottp at dd.com.au (Scott Penrose) Date: Sun, 04 May 2008 22:11:04 +1000 Subject: [Melbourne-pm] $SIG{CHLD} In-Reply-To: <87iqxuqxyd.fsf@rimspace.net> References: <481D03DE.7090401@dd.com.au> <87iqxuqxyd.fsf@rimspace.net> Message-ID: <481DA7D8.7010504@dd.com.au> Daniel Pittman wrote: > Scott Penrose writes: > > >> To capture exit values of forked daemons and not end up with a set of >> zombie processes, we need to set $SIG{CHLD} to either 'ignore' or do >> it fully. However once you do you loose the ability to capture the >> return value of a 'system' call - unless you do it the hard way >> (record in a hash the value by process id and then use that and remove >> it after your system call). >> >> Anyway to get this all written down I wrote it on my site, but also as >> partly an open question - is there a better way of doing 'system' >> which does not depend on changes to $SIG{CHLD} or other solutions: >> >> http://scott.dd.com.au/wiki/SIG_CHLD >> >> So anyone know of one? >> > > Well, my very strong preference for doing /anything/ related to child > processes is to use the IPC::Run module. This wraps up a whole bunch of > stuff from a dead simple 'run this' through to a complex 'write to and > read from a filter, looking for specific out' and 'build a pipeline' > stuff. > > The interface is sensible, light-weight, and the tool scales very well > from start to finish. > > It also, on investigation, handles $? appropriately internally so that > it does the right stuff as far as I can tell. > > You may want to look into it, although it isn't always perfect: > > http://www.perlmonks.org/?node_id=674306 > > Also, not always playing nice with SIG{CHLD} handlers, although this is > very much in the "point gun at foot, pull trigger" style: > > http://www.depesz.com/index.php/2008/02/07/failing-ls/ > > (Answer for those who don't want to read the code below the cut) > > Anyway, it should play nicely with existing SIG{CHLD} handlers that are > written such that they don't break random library code and the like, and > certainly beats hand-coding everything. > > Regards, > Daniel > > I have not actually tried the Perl co-process support, but everything > else seems solid enough. > > > > The answer is that the install SIG{CHLD} handler will wait for and > collect the exit status from *everything*, which means that my the time > the IPC::Run code in _cleanup (IPC/Run.pm:3157) is called the exit > status is already gone as is the zombie process. > THAT'S IT ! 
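For plain system() calls, the gotcha described just above (a global CHLD reaper collecting every exit status before anyone else can) can also be sidestepped without a module, by restoring the default handler for the duration of the call. This is a sketch only, with hypothetical names, and with the caveat that a daemon exiting inside that window sits as a zombie until the next SIGCHLD fires the handler again:

  use strict;
  use warnings;
  use POSIX ":sys_wait_h";

  my %daemon_status;    # exit statuses of forked daemons, keyed by pid

  $SIG{CHLD} = sub {
      # Reap every dead child and remember its status for later.
      while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
          $daemon_status{$pid} = $?;
      }
  };

  sub run_system {
      my @cmd = @_;
      local $SIG{CHLD} = 'DEFAULT';   # let system() reap its own child
      system(@cmd);
      return $? >> 8;                 # exit value is meaningful again
  }

  my $exit = run_system('true');      # hypothetical command
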
I knew I had seen a module around that I had used before, and I could not find it on CPAN, or remember it - sometimes getting the name right is tricky (better search for CPAN is another topic). Have you ever spent a day writing a module that does not exist on CPAN, only to find at the end of that day you gained enough knowledge to find the module that did exist on CPAN :-) Thanks Scott -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.pm.org/pipermail/melbourne-pm/attachments/20080504/cec68c11/attachment.html From daniel at rimspace.net Sun May 4 18:18:32 2008 From: daniel at rimspace.net (Daniel Pittman) Date: Mon, 05 May 2008 11:18:32 +1000 Subject: [Melbourne-pm] $SIG{CHLD} In-Reply-To: <481DA7D8.7010504@dd.com.au> (Scott Penrose's message of "Sun, 04 May 2008 22:11:04 +1000") References: <481D03DE.7090401@dd.com.au> <87iqxuqxyd.fsf@rimspace.net> <481DA7D8.7010504@dd.com.au> Message-ID: <87zlr5im53.fsf@rimspace.net> Scott Penrose writes: > Daniel Pittman wrote: > Scott Penrose writes: > > To capture exit values of forked daemons and not end up with a set of > zombie processes, we need to set $SIG{CHLD} to either 'ignore' or do > it fully. [...] > Well, my very strong preference for doing /anything/ related to child > processes is to use the IPC::Run module. [...] > THAT'S IT ! I knew I had seen a module around that I had used before, > and I could not find it on CPAN, or remember it - sometimes getting > the name right is tricky (better search for CPAN is another topic). Mmmm. For the audience there is also IPC::Run3, which aims to be IPC::Run without the complexity; I don't advise it because the complexity doesn't really slow you up or show up 'til you need it in the larger module. > Have you ever spent a day writing a module that does not exist on > CPAN, only to find at the end of that day you gained enough knowledge > to find the module that did exist on CPAN :-) Oh, sure. Happens all the time: I went looking for finance related modules in CPAN just the other day, couldn't find what I wanted, and only worked out which root to search when I read the documentation for writing a plugin to another module... Daniel From tconnors at astro.swin.edu.au Tue May 6 21:11:23 2008 From: tconnors at astro.swin.edu.au (Tim Connors) Date: Wed, 7 May 2008 14:11:23 +1000 (EST) Subject: [Melbourne-pm] case insensitive REs Message-ID: G'day. I want the user to be able to supply a -i flag to my program to make global case insensitive searching. Except that when I actually go to perform the RE operation in perl, it only takes /i as a modifier. I can't simply say, where $case contains either "i" or "": while (/($re)/g$case) { ... } since perl complains that I am not allowed to put a variable there: Scalar found where operator expected at /home/ssi/tconnors/bin/phrasegrep line 131, near "/($re)/g$case" (Missing operator before $case?) I would have expected perhaps a global variable in perhaps perlvar(1) telling me I could force a global case insensitive match. The only way around this that I can see is the butt ugly: if ($case) { while (/($re)/gi) { ... } } else while (/($re)/g) { ... } } (or perhaps an eval -- ick?) And obviously, this would be thouroughly stupid. So what am I doing wrong? 
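The replies below cover the embedded (?i) modifier; another common idiom, sketched here with made-up values, is to compile the pattern once with qr// and choose the flags up front:

  use strict;
  use warnings;

  my $case = 1;                 # set from the user's -i switch
  my $re   = 'foo bar';         # made-up pattern

  my $compiled = $case ? qr/$re/i : qr/$re/;

  $_ = "Foo Bar foo bar";
  while (/($compiled)/g) {
      print "matched: $1\n";
  }

An interpolated qr// object keeps its flags, and because the pattern is only compiled once this also covers the compile-once concern raised later in the thread.
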
-- Tim Connors From jarich at perltraining.com.au Tue May 6 21:16:09 2008 From: jarich at perltraining.com.au (Jacinta Richardson) Date: Wed, 07 May 2008 14:16:09 +1000 Subject: [Melbourne-pm] case insensitive REs In-Reply-To: References: Message-ID: <48212D09.3050209@perltraining.com.au> Tim Connors wrote: > G'day. > > I want the user to be able to supply a -i flag to my program to make > global case insensitive searching. > > Except that when I actually go to perform the RE operation in perl, it > only takes /i as a modifier. I can't simply say, where $case contains > either "i" or "": > > while (/($re)/g$case) { > ... > } You can put modifiers inside regular expressions too. while(m/(?i:($re)/g) { ... } Ever wondered why non-capturing braces look so ugly? (?: ... ) now you know! Here's a test: my $foo = "ABC"; my $bar = "abc"; foreach ($foo, $bar) { print "matched $_\n" if m/(?i:A)/; } matched ABC matched abc All the best, J -- ("`-''-/").___..--''"`-._ | Jacinta Richardson | `6_ 6 ) `-. ( ).`-.__.`) | Perl Training Australia | (_Y_.)' ._ ) `._ `. ``-..-' | +61 3 9354 6001 | _..`--'_..-_/ /--'_.' ,' | contact at perltraining.com.au | (il),-'' (li),' ((!.-' | www.perltraining.com.au | From wigs at stirfried.org Tue May 6 21:20:22 2008 From: wigs at stirfried.org (wigs at stirfried.org) Date: Wed, 7 May 2008 14:20:22 +1000 Subject: [Melbourne-pm] case insensitive REs In-Reply-To: References: Message-ID: <20080507042022.GA9733@stirfried.org> On Wed, May 07, 2008 at 02:11:23PM +1000, Tim Connors wrote: > I want the user to be able to supply a -i flag to my program to make > global case insensitive searching. > > Except that when I actually go to perform the RE operation in perl, it > only takes /i as a modifier. I can't simply say, where $case contains > either "i" or "": > > while (/($re)/g$case) { > ... > } You can include matching operators inside the regex; for example: my $pattern = "(?i)foobar"; if ( /$pattern/ ) { } This example is taken straight from perldoc perlre, under the 'Extended Patterns' section. Cheers, -- Aaron From cas at taz.net.au Tue May 6 22:23:58 2008 From: cas at taz.net.au (Craig Sanders) Date: Wed, 7 May 2008 15:23:58 +1000 Subject: [Melbourne-pm] case insensitive REs In-Reply-To: References: Message-ID: <20080507052358.GA14155@taz.net.au> On Wed, May 07, 2008 at 02:11:23PM +1000, Tim Connors wrote: > I want the user to be able to supply a -i flag to my program to make > global case insensitive searching. > > [...] > > Scalar found where operator expected at /home/ssi/tconnors/bin/phrasegrep > line 131, near "/($re)/g$case" > (Missing operator before $case?) > > I would have expected perhaps a global variable in perhaps perlvar(1) > telling me I could force a global case insensitive match. > > The only way around this that I can see is the butt ugly: > > if ($case) { > while (/($re)/gi) { > ... > } > } else > while (/($re)/g) { > ... > } > } NOTE: the following is "Untested but it should work because i've done similar stuff before and the docs say so too". remember that trademark, it's your non-guarantee of quality :) $re = '(?i)' . $re if ($case); while (/($re)/g) { ... } alternatively: $mods = 'g'; $mods = 'i' . $mods if ($case); $re = "(?$mods)$re"; while (/($re)/g) { ... } from perlre(1): "(?imsx-imsx)" One or more embedded pattern-match modifiers, to be turned on (or turned off, if preceded by "-") for the remainder of the pattern or the remainder of the enclosing pattern group (if any). 
This is particularly useful for dynamic patterns, such as those read in from a configuration file, read in as an argument, are specified in a table somewhere, etc. Consider the case that some of which want to be case sensitive and some do not. The case insensitive ones need to include merely "(?i)" at the front of the pattern. For example: $pattern = "foobar"; if ( /$pattern/i ) { } # more flexible: $pattern = "(?i)foobar"; if ( /$pattern/ ) { } These modifiers are restored at the end of the enclosing group. For example, ( (?i) blah ) \s+ \1 will match a repeated (including the case!) word "blah" in any case, assuming "x" modifier, and no "i" modifier outside this group. also remember: if $re is never going to change during the life of the program, then you can gain a significant performance boost by using the "/o" modifier. this compiles the regexp only once, which is very useful if you're matching the same regexp repeatedly in a loop. (digression: i just noticed that the /o modifier isn't mentioned in my perlre man page, but it is discussed in the perlretut man page. odd. perl v5.8.8) so: $re = '(?i)' . $re if ($case); while (/($re)/go) { ... } or: $mods = 'go'; $mods = 'i' . $mods if ($case); $re = "(?$mods)$re"; while (/($re)/g) { ... } see also perlretut(1). search for the section "Embedding comments and modifiers in a regular expression". and just above that is a section on compiling and saving regexps (i.e. the /o modifier). craig -- craig sanders BOFH excuse #451: astropneumatic oscillations in the water-cooling From cas at taz.net.au Tue May 6 22:31:05 2008 From: cas at taz.net.au (Craig Sanders) Date: Wed, 7 May 2008 15:31:05 +1000 Subject: [Melbourne-pm] case insensitive REs In-Reply-To: <20080507052358.GA14155@taz.net.au> References: <20080507052358.GA14155@taz.net.au> Message-ID: <20080507053105.GB14155@taz.net.au> On Wed, May 07, 2008 at 03:23:58PM +1000, Craig Sanders wrote: > $mods = 'go'; > $mods = 'i' . $mods if ($case); > $re = "(?$mods)$re"; > > while (/($re)/g) { > ... > } doh! bad cut/paste/edit. change that final '/g' on the while line to just '/': while (/($re)/) { repeat for both my examples. craig -- craig sanders Ninety percent of the politicians give the other ten percent a bad reputation. -- Henry Kissinger From tconnors at astro.swin.edu.au Wed May 7 03:35:59 2008 From: tconnors at astro.swin.edu.au (Tim Connors) Date: Wed, 7 May 2008 20:35:59 +1000 (EST) Subject: [Melbourne-pm] grep independant of newlines (Was Re: case insensitive REs) In-Reply-To: References: Message-ID: On Wed, 7 May 2008, Tim Connors wrote: > G'day. > > I want the user to be able to supply a -i flag to my program to make > global case insensitive searching. Yeehaw. My day job always seems to come back to LaTeX code. grepping for stuff that has been nicely folded at the 72 column mark is a pain, because grep usually looks at just the one line. The sed & awk book had a recipe for phrasegrep, looking over two consequetive lines at once. But had a few bugs that I worked around over the years, and if your regexp ought to match things over 3 lines, you were out of luck. 
Well, it now works :) Feel free to appropriate as you choose (or this is where I usually get told what program I should have been using instead of reinventing the wheel :-): #!/usr/bin/perl -w # -*- Mode: perl -*- # $Revision: 1.10 $ $Date: 2008/05/07 10:27:35 $ # $Id: phrasegrep,v 1.10 2008/05/07 10:27:35 tconnors Exp $ # $Header: /home/ssi/tconnors/cvsroot/bin/phrasegrep,v 1.10 2008/05/07 10:27:35 tconnors Exp $ # $RCSfile: phrasegrep,v $ # greps for a re in files without regards for newlines. use strict; use warnings; use Carp::Assert; use Getopt::Long; Getopt::Long::Configure ("bundling"); use Pod::Usage; my $verbose=0; my $debug=0; my $colour="tty"; my $case=0; my $greedy=0; my $VERSION='$Revision: 1.10 $'; $VERSION=~s/\$[R]evision: ([^ ].*[^ ]) *\$/$1/; my $DATE='$Date: 2008/05/07 10:27:35 $'; $DATE=~s/\$[D]ate: ([^ ].*[^ ]) *\$/$1/; my $FILE='$RCSfile: phrasegrep,v $'; $FILE=~s/\$[R]CSfile: ([^ ].*[^ ]),v *\$/$1/; my $WHAT="greps for a re in files without regards for newlines"; my (@SAVEARGV)=@ARGV; sub isNum($) { ($_[0] =~ /^[+-]?\d+$/); } my $getOptVerbose = sub { my ($junk, $v)=(@_); $v=$verbose+1 if ($v eq ""); die "verbosity level is not a number: $v\n" if (!isNum $v); $verbose=$v; }; my $getOptDebug = sub { my ($junk, $d)=(@_); $d=$debug+1 if ($d eq ""); die "debug level is not a number: $d\n" if (!isNum $d); $debug=$d; }; my $getOptColour = sub { my ($junk, $c)=(@_); $c=1 if ($c eq ""); #could also be "tty" $colour=$c; }; my ($opt_help, $opt_man, $opt_version); my $result = GetOptions ('colour:s' => $getOptColour, 'debug:s' => $getOptDebug, 'verbose:s' => $getOptVerbose, 'c' => sub { $colour=1 }, 'd' => sub { $debug++}, 'v' => sub { $verbose++ }, 'nocolour' => sub { $colour = 0 }, 'i|case!' => \$case, 'g|greedy!' => \$greedy, 'help|?|h' => \$opt_help, 'man' => \$opt_man, 'version|V' => \$opt_version, ) || pod2usage(2); pod2usage(1) if ($opt_help); pod2usage(-verbose => 2) if ($opt_man); #pod2usage(-verbose => 0) if ($opt_version); if ($opt_version) { print "$FILE ($WHAT) $VERSION ($DATE)\n"; print "Copyright Tim Connors (2002-2008)\n"; print "License: GPL\n"; print "Author(s): Tim Connors 1); @ARGV='-' if (!@ARGV); my $colopen=""; my $colclose=""; if ($colour eq "tty") { if (-t STDOUT) { $colour=1 ; } else { $colour=0; } } if ($colour) { $colopen="\033[1;31m"; $colclose="\033[0m"; } print STDERR "transforming match re from '$re' to " if $verbose; $re =~ s/ /\\s+/g; #spaces in the match always get # transformed into whitespace matches $re =~ s/([*+])/$1?/g if !$greedy; #use non greedy matches by default $re = "(?i)$re" if !$case; #case insensitive by default $re = "($re)"; print STDERR "'$re'\n" if $verbose; foreach my $file (@ARGV) { my $incfilename= $manyfiles ? 
"$file:" : ""; if (!open(FH, $file)) { warn "can't open $file for read"; next; } local $/; undef $/; #slurp input files my $input = ; $_=$input; #to be able to match occurences on overlapping lines, log the start #and end of the line where each match occurs, as well as, for #colouring purposes, where the matches themselves start and end my @nlmatch=(); my @eolmatch=(); my @startmatch=(); my @endmatch=(); while (/$re/goms) { #man perlretut(1): "@-" and "@+" push @startmatch, $-[0]; push @endmatch, $+[0]; my $curpos=$-[0]; while ($curpos > 0) { if ((substr $input, $curpos, 1) eq "\n") { $curpos++; last; } $curpos--; } push @nlmatch, $curpos; $curpos=$+[0]; while ($curpos < length($input)-1) { if ((substr $input, $curpos, 1) eq "\n") { $curpos--; last; } $curpos++; } push @eolmatch, $curpos; } print "nl=@nlmatch\n" if $verbose; print "eol=@eolmatch\n" if $verbose; print "s=@startmatch\n" if $verbose; print "e=@endmatch\n" if $verbose; my $curpos; my $length; $curpos=$nlmatch[0]; foreach my $i (0.. at nlmatch) { #iterate through each of the matches of the regexp, and if a new #line, then print the start of the line to the start of the next #re, print the colours and that re, then print the line to the #next re if same line... if (($i>0) && (($i==@nlmatch) || ($nlmatch[$i] != $nlmatch[$i-1]))) { print STDERR "new line: $i " if $verbose; $length = $eolmatch[$i-1] - $endmatch[$i-1] + 2; #+1 to get the nl print STDERR "length: $length\n" if $verbose; print substr($input, $curpos, $length); $curpos=$nlmatch[$i]; } last if ($i == @nlmatch); $length = $startmatch[$i] - $curpos; print substr($input, $curpos, $length); $curpos += $length; print $colopen; $length = $endmatch[$i] - $startmatch[$i]; print substr($input, $curpos, $length); $curpos += $length; print $colclose; } ##/m -- ^/$ becomes start/end of any line ##/s may also be necessary #previous attempts: # #this doesn't yet match multiple occurences on overlapping sets of # #lines. This makes me sad. The third bracket somehow has to # #exclude the second # while (/(\n?[^\n]*?)($re)([^\n]*?\n?)/msg) { #$case # # while (/^([^\n]*?)($re)([^\n]*?)$/msg) { #$case # my $match = "$incfilename$1$colopen$2$colclose$3"; # print "$match"; # } } # $Log: phrasegrep,v $ # Revision 1.10 2008/05/07 10:27:35 tconnors # licence information # # Revision 1.9 2008/05/07 10:24:33 tconnors # non-greedy match by default # # Revision 1.8 2008/05/07 10:14:20 tconnors # no need to transform \n in a temporary string -- already knew about matching /sm modifiers, but in this iteration of the code, couldnt quite see what I was doing # # Revision 1.7 2008/05/07 06:54:06 tconnors # port to perl, and suck in the entire files at once so can compare over more than 2 lines at a time # __END__ =head1 NAME phrasegrep - greps for a re in files without regards for newlines =head1 SYNOPSIS phrasegrep [options] Options: [--help|-?|-h] [--man] [--version|-V] [--colour |--nocolour|-c] [--debug |-d] [--verbose |-v] [--case|-i] [--greedy|-g] =head1 OPTIONS =over 8 =item B<--help|-h|-?> Print a brief help message and exits. =item B<--man> Prints the manual page and exits. =item B<--version|-V> Prints version information and exits. =item B<--colour {yes|auto|no}|--nocolour|-c> STDIO uses colour always, only when STDOUT is a terminal, or never =item B<--debug {level}|-d> Sets or increments the debug level. Current level is 1 =item B<--verbose {level}|-d> Sets or increments the verbosity level. 
Current level is 1 =item B<--case|-i> Performs a case sensetive regexp search =item B<--greedy|-g> Performs the default perl greedy match instead of non greedy =back =head1 DESCRIPTION B greps for a re in files without regards for newlines =cut -- Tim Connors From jarich at perltraining.com.au Mon May 12 05:53:19 2008 From: jarich at perltraining.com.au (Jacinta Richardson) Date: Mon, 12 May 2008 22:53:19 +1000 Subject: [Melbourne-pm] SAGE-AU Victorian IT Symposium - Friday 30th May 2008 Message-ID: <48283DBF.6010107@perltraining.com.au> The SAGE-AU Victorian IT Symposium - Friday 30th May 2008 ========================================================= Hotel Grand Chancellor 131 Lonsdale Street Melbourne Friday 30th May 2008, 9am - 5pm Book before Friday (16th May) to take advantage of our early bird offer! The SAGE-AU Victorian IT Symposium is a one day technical conference held in Melbourne. It is organised by the SAGE-AU Victorian Chapter and aims to provide an educational forum for systems and network administrators, system managers, developers and other technical professionals to meet and share their knowledge and experiences. This is the fifth year running for this event, focusing on a providing a fast paced stream of technical presentations. Morning and afternoon teas, and lunch will be provided. Come and spend a day with your peers and share your knowledge! Register: * Early bird registrations until 16th May 2008 * Register online at: http://www.sage-au.org.au/display/2008VIC/Registrations Programme: * Evolution of Storage - Cameron Huysmans (Total RISC Technology) * EMC Next Generation Products - Shane Moore (EMC) * Routing and Security Platforms - Lachlan Kidd (Cisco) * Life-cycle Management of Red Hat Enterprise Linux - Michael Wahren (Red Hat) * Apple Technology Update - Joseph Cox (Apple) * An Illustrated History of Software Failure - Paul Fenwick (Perl Training Australia) The SAGE-AU Victorian IT Symposium is proudly supported by our Gold Sponsors Red Hat, EMC Corporation and Total RISC Technology. You can find out more details at: http://www.sage-au.org.au/display/2008VIC/Home From pjf at perltraining.com.au Mon May 12 22:24:15 2008 From: pjf at perltraining.com.au (Paul Fenwick) Date: Tue, 13 May 2008 15:24:15 +1000 Subject: [Melbourne-pm] Reminder: Meeting Wednesday (tomorrow) night! Message-ID: <482925FF.5060806@perltraining.com.au> G'day Everyone, It's that time again! Tomorrow night is Melbourne Perl Mongers night! When: Wednesday, 14th May (tomorrow) 6:30pm Where: Remasys Level 1 172 Flinders St (Opposite Deferation Square) Talk: Toby Corkindale - How awesome is git[1]? After discovering that all revision control software sucks, Linus Torvalds, inventor of Linux, created the git source control system. Supporting distributed development, incredible branching tools, amazing support tools, and more distribution mechanisms that you can poke a stick at. Git is not only used for source control of the Linux project, but also the new source control system for the Perl 5 core. Toby will reveal the secrets of how git solved his source control headaches, toned his muscles, and gave him a full head of hair[2]! After: Lightning talks, news, announcements. Drinks and dinner for those hungry and/or thirsty. Looking forward to seeing you all there! Paul [1] I didn't actually have a real abstract from Toby, so I made it up. However it is definitely about git. [2] Actual results may vary. 
-- Paul Fenwick | http://perltraining.com.au/ Director of Training | Ph: +61 3 9354 6001 Perl Training Australia | Fax: +61 3 9354 2681 From toby.corkindale at rea-group.com Tue May 13 20:58:21 2008 From: toby.corkindale at rea-group.com (Toby Corkindale) Date: Wed, 14 May 2008 13:58:21 +1000 Subject: [Melbourne-pm] MacBook DVI or VGA adaptor? Message-ID: <482A635D.2090108@rea-group.com> I've just realised I left the magic video-out adaptor for my MacBook at home, but I was going to use it for a talk at the meeting tonight. >.< Does anyone have one who is coming to the meeting? (PS. Is DVI OK for the data projector?) Otherwise I'll try and work something else out - I could use another laptop, if I may borrow one and ssh out from it, if there's internet available. Or I just run home first and come back into town, and just arrive late. Not the end of the world really. Toby -- Toby Corkindale Software developer w: www.rea-group.com REA Group refers to realestate.com.au Ltd (ASX:REA) Warning - This e-mail transmission may contain confidential information. If you have received this transmission in error, please notify us immediately on (61 3) 9897 1121 or by reply email to the sender. You must destroy the e-mail immediately and not use, copy, distribute or disclose the contents. From toby.corkindale at rea-group.com Tue May 13 21:01:58 2008 From: toby.corkindale at rea-group.com (Toby Corkindale) Date: Wed, 14 May 2008 14:01:58 +1000 Subject: [Melbourne-pm] MacBook DVI or VGA adaptor? In-Reply-To: <482A635D.2090108@rea-group.com> References: <482A635D.2090108@rea-group.com> Message-ID: <482A6436.5040601@rea-group.com> Toby Corkindale wrote: > I've just realised I left the magic video-out adaptor for my MacBook at > home, but I was going to use it for a talk at the meeting tonight. > >.< > > Does anyone have one who is coming to the meeting? Woah. Perlmongers to the rescue in record time! I now have a borrowed MacBook->VGA (analog, not DVI) adaptor on my desk. cheers! :D From wjmoore at gmail.com Wed May 14 03:07:00 2008 From: wjmoore at gmail.com (Wesley Moore) Date: Wed, 14 May 2008 21:07:00 +1100 Subject: [Melbourne-pm] Lego USB Flash Drive Message-ID: <664f64be0805140307n69678db7oc538b0697fc200d1@mail.gmail.com> This is a review of the Lego USB flash drives that are being sold by a Melbourne company that I mentioned at the meeting tonight. http://forums.mactalk.com.au/20/48480-zip-zip-lego-usb-drive-review.html From bjdean at bjdean.id.au Wed May 14 04:54:32 2008 From: bjdean at bjdean.id.au (Bradley Dean) Date: Wed, 14 May 2008 12:54:32 +0100 Subject: [Melbourne-pm] Amazon S3 In-Reply-To: <2396534B-4B34-4C61-B5A9-416E771B5870@alchemy.com.au> References: <2396534B-4B34-4C61-B5A9-416E771B5870@alchemy.com.au> Message-ID: <20080514115432.GI3704@bjdean.id.au> Greetings, On Fri, May 02, 2008 at 12:35:33PM +1000, Guy Morton wrote: > Hello perlers > > Anyone here had experience using perl and Amazon::S3 to do mysql database > backups to S3? > > I've tried this guy's script as a way to get started, but it no workee: > > http://dparrish.com/2008/02/mysql-backup-to-amazon-s3/ > > It seems to die on the add_bucket command - fails with a file not found > error...which I don't really understand. Amazon S3 has fairly restrictive rules on bucket names (including that they cannot contain upper-case letters). That script tries to create a bucket called: $aws_access_key_id. 
'-mysql-$hostname' An access key usually has uppercase characters so this won't work - incidentally there's not much point naming a bucket with the access key given that the bucket will be created inside the account defined by that access key. It's also part of the account credentials so logs containing the name of buckets will now have half of your login. Try changing the bucket name to lc('mysql-' . hostname()) and see if that helps. Here's the bucket naming restriction docs: http://docs.amazonwebservices.com/AmazonS3/2006-03-01/BucketRestrictions.html Cheerio, Brad > > Anyone here got any ideas or pointers? > > TIA > > Guy > _______________________________________________ > Melbourne-pm mailing list > Melbourne-pm at pm.org > http://mail.pm.org/mailman/listinfo/melbourne-pm -- Bradley Dean Software Engineer - http://bjdean.id.au/ Email: bjdean at bjdean.id.au Skype: skype at bjdean.id.au Mobile(Aus): +61-413014395 Mobile(UK): +44-7846895073 From pat at patspam.com Wed May 14 05:37:13 2008 From: pat at patspam.com (Patrick Donelan) Date: Wed, 14 May 2008 22:37:13 +1000 Subject: [Melbourne-pm] Lego USB Flash Drive In-Reply-To: <664f64be0805140307n69678db7oc538b0697fc200d1@mail.gmail.com> References: <664f64be0805140307n69678db7oc538b0697fc200d1@mail.gmail.com> Message-ID: <42321ee20805140537w519a11a5k81f16c048e00e28@mail.gmail.com> And here's the link to the John Resig's port of the Processing visualization language to JavaScript, using the Canvas element, as discussed on the southern end of the dinner table - as of today the project is being hosted on github, which dovetails nicely with Toby's presentation :) Patrick On Wed, May 14, 2008 at 8:07 PM, Wesley Moore wrote: > This is a review of the Lego USB flash drives that are being sold by a > Melbourne company that I mentioned at the meeting tonight. > > http://forums.mactalk.com.au/20/48480-zip-zip-lego-usb-drive-review.html > _______________________________________________ > Melbourne-pm mailing list > Melbourne-pm at pm.org > http://mail.pm.org/mailman/listinfo/melbourne-pm > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.pm.org/pipermail/melbourne-pm/attachments/20080514/b2892902/attachment.html From tjc at wintrmute.net Wed May 14 18:41:12 2008 From: tjc at wintrmute.net (Toby Corkindale) Date: Thu, 15 May 2008 11:41:12 +1000 Subject: [Melbourne-pm] Git Message-ID: <20080515014112.GB2391@roseberry> Some links relating to the talk last night: Git's official home is: http://git.or.cz/ Gui tool screenshots, of a better tool than the one I didn't demonstrate well last night: http://sourceforge.net/project/screenshots.php?group_id=139897 http://sourceforge.net/project/screenshots.php?group_id=139897&ssid=33925 GitWeb in action: http://git.kernel.org/?p=git/git.git;a=summary Toby From jarich at perltraining.com.au Thu May 15 23:05:13 2008 From: jarich at perltraining.com.au (Jacinta Richardson) Date: Fri, 16 May 2008 16:05:13 +1000 Subject: [Melbourne-pm] OSDC 2008 Sydney (1-5 Dec 2008) - Call for Papers Message-ID: <482D2419.1050404@perltraining.com.au> Sorry if this results in a duplicate. I just haven't seen it around as much as I'd like ------------------------------------------------------------------------- Call for Papers Open Source Developers' Conference 2008 1st - 5th December 2008, Sydney, Australia The Open Source Developers' Conference 2008 is a conference run by open source developers, for developers and business people. 
It covers numerous programming languages across a range of operating systems, and related topics such as business processes, licensing, and strategy. Talks vary from introductory pieces through to the deeply technical. It is a great opportunity to meet, share, and learn with like-minded individuals. This year, the conference will be held in Sydney, Australia during the first week of December (1st - 5th). If you are an Open Source maintainer, developer or user, the organising committee would encourage you to submit a talk proposal on open source tools, solutions, languages or technologies you are working with. For more details and to submit your proposal(s), go to: http://osdc.com.au/2008/papers/cfp.html If you have any questions or require assistance with your submission, please don't hesitate to ask! We recognise the importance of Open Source in providing a medium for collaboration between individuals, researchers, business and government. In recognition of this and ensure a high standard of presentations, we intend to peer-review all submitted papers. OSDC 2008 Sydney (Australia) - Key Program Dates: 30 Jun - Initial proposals (short abstract) due 21 Jul - Proposal acceptance 15 Sep - Accepted paper submissions 13 Oct - Reviews completed 27 Oct - Final paper submission cut-off For all information, contacts and updates, see the OSDC conference web site at http://osdc.com.au/2008/ Also if you are interested in sponsoring, please see: http://www.osdc.com.au/2008/sponsors/opportunities.html Regards Mark Rees OSDC 2008 Marketing Co-ordinator From pjf at perltraining.com.au Sat May 17 21:12:01 2008 From: pjf at perltraining.com.au (Paul Fenwick) Date: Sun, 18 May 2008 14:12:01 +1000 Subject: [Melbourne-pm] White Camel nominations are now open Message-ID: <482FAC91.7020500@perltraining.com.au> ---------- Forwarded Message: ---------- Subject: [pm_groups] White Camel nominations are now open Date: Saturday 17 May 2008 From: "Jos? Castro" Every year, at OSCON, the White Camels are presented. If you look at the previous winners [1], you'll notice that these are mostly unsung heroes, like previous awardee Eric Cholet, the human moderator of so many Perl mailing lists, or Jay Hannah, one of the people running pm.org [2] (if you ever created/maintained a pm group, chances are that Jay walked you through the process). Some of these people may be well known, like Allison Randal or Randal Schwartz, while others may be complete strangers to at least part of the globe, like Josh McAdams or Jay. Some of them may be extreme Perl hackers who created the original JAPH, but they actually received this award as a recognition for their community contributions to Perl. That's not to say a great hacker can't receive the award, but you don't have to be one in order to be eligible. That being said, the nomination process for the 2008 White Camels is now open. If you think there's someone who deserves a White Camel, this is the time for you to send in your nominations. Send them to jose at pm.org, if possible with a subject along the lines of "White Camel Nomination :: $name". Make sure you properly identify the nominee and tell us why you think that's a worthy nomination. Don't go thinking "nah, somebody else will do it" because: a) everybody else may be thinking the same, and b) you may state your case differently than the next person. We'll be receiving nominations until June 11, 2008, by midnight, but don't wait up or you'll forget. Do it now! Regards, jac PS: Please forward as you see fit. 
[1] - http://www.perl.org/advocacy/white_camel/ [2] - http://pm.org/ -- Jos? Castro TPF Community Relations Leader ------------------------------------------------------- From scottp at dd.com.au Thu May 22 05:58:18 2008 From: scottp at dd.com.au (Scott Penrose) Date: Thu, 22 May 2008 22:58:18 +1000 Subject: [Melbourne-pm] Ahhh... so close Message-ID: <48356DEA.70308@dd.com.au> Hey Guys I have been working on getting a new module written with another perl programmer for our gliding club. I decided to do it all the way of Perl Best Practice. And it worked beautifully. My tests passed everywhere. My friend works on Windows using Active State and his code installed and tested ok too. But where did it fall down - IO::Prompt !!! Normally I would just use something like my $in = to get basic input, maybe put it in a loop to make sure you get the data you want. But I thought no, lets do the PBP and use prompt. Since it is a recommended PBP and all the other code we have tried has compiled and worked beautifully cross-platform - it seems a shame to have this one let us down. So... do you think we could do a little re-write to make it a little more friendly for Win32? Anyone up for it? Scott From jarich at perltraining.com.au Fri May 23 00:55:33 2008 From: jarich at perltraining.com.au (Jacinta Richardson) Date: Fri, 23 May 2008 17:55:33 +1000 Subject: [Melbourne-pm] The 2008 SAGE-AU Victorian IT Symposium - 1 Week Left Message-ID: <48367875.2050004@perltraining.com.au> The 2008 SAGE-AU Victorian IT Symposium - Friday 30th May 2008 ========================================================= Hotel Grand Chancellor 131 Lonsdale Street Melbourne Friday 30th May 2008, 9am - 5pm The System Administrators Guild of Australia (SAGE-AU) 2008 Victorian IT Symposium is a one day technical conference held in Melbourne. It is organised by the SAGE-AU Victorian Chapter and aims to provide an educational forum for systems and network administrators, system managers, developers and other technical professionals to meet and share their knowledge and experiences. This is the fifth year running for this event, focusing on a providing a fast paced stream of technical presentations. Morning and afternoon teas, and lunch will be provided. Come and spend a day with your peers and share your knowledge! Register online at: http://www.sage-au.org.au/display/2008VIC/Registrations Programme: * Evolution of Storage - Cameron Huysmans (Total RISC Technology) * Backup Innovations - Shane Moore (EMC) * Routing and Security Platforms - Lachlan Kidd (Cisco) * Life-cycle Management of Red Hat Enterprise Linux - Michael Wahren (Red Hat) * Apple Technology Update - Joseph Cox (Apple) * An Illustrated History of Software Failure - Paul Fenwick (Perl Training Australia) The SAGE-AU Victorian 2008 IT Symposium is proudly supported by our Gold Sponsors Red Hat, EMC Corporation and Total RISC Technology. You can find out more details at: http://www.sage-au.org.au/display/2008VIC/Home From sisyphus1 at optusnet.com.au Fri May 23 02:38:52 2008 From: sisyphus1 at optusnet.com.au (Sisyphus) Date: Fri, 23 May 2008 19:38:52 +1000 Subject: [Melbourne-pm] Ahhh... so close In-Reply-To: <48356DEA.70308@dd.com.au> References: <48356DEA.70308@dd.com.au> Message-ID: ----- Original Message ----- From: "Scott Penrose" . . > > But where did it fall down - IO::Prompt !!! . . > So... do you think we could do a little re-write to make it a little > more friendly for Win32? > Hmmm ... would our (my) re-write have to conform to the recommendations of PBP ? ... 
or can we (I) just write our (my) usual crap code ? > > Anyone up for it? > Sounds a little bit interesting - though I'm not a big fan of PBP (and, undoubtedly, have reams of code to prove it :-) Maybe just post a demo of the problem, and see where that leads. Is there anything at http://rt.cpan.org/Public/Dist/Display.html?Name=IO-Prompt that raises the problem you found ? (Better still, is there anything there that solves the problem ?) Cheers, Rob From rob at cataclysm.cx Fri May 23 04:35:38 2008 From: rob at cataclysm.cx (Robert Norris) Date: Fri, 23 May 2008 21:35:38 +1000 Subject: [Melbourne-pm] Ahhh... so close In-Reply-To: <48356DEA.70308@dd.com.au> References: <48356DEA.70308@dd.com.au> Message-ID: <20080523113538.GA29214@plastic.home> Hi Scott, > But where did it fall down - IO::Prompt !!! I guess thats my cue to come out of the wordwork. A couple of years ago I wrote a patch[1] to add completion and history support to IO::Prompt. I spoke to Damian about it later and he had a pile of comments about it. I volunteered to take on maintenance of the module. Shortly after that I got sidetracked and didn't touch it again until I saw your email this morning. Amongst other things, I'm sitting on a patch from a Thomas Glaesser to make IO::Prompt work on Win32. Its in pretty poor shape though. It has a pile of control code handling and such that really belongs in Term::ReadKey. The first thing I've been trying to do is get a test suite in place, which is kinda hard as the whole thing is terminal-centric. I've been writing a module, Test::MockTerm, that fakes a terminal, but its a real mess at the moment. I should have it into some sort of shape in the next few days. Once the test suite is there, work can begin on new features. I want to shift all the knowledge of how to open, read from and write to the console on different platforms out to another module. I'm not sure if thats Term::ReadKey or a whole new package. Once it exists IO::Prompt can be modified to use it and then Win32 support can happen. Anyway I'm getting the code into my git repositories[2]. Again, its all a bit all over the place but it shouldn't take long to get it into some kind of shape. All help gratefully received :) Cheers, Rob. [1] http://rt.cpan.org/Ticket/Display.html?id=21055 [2] http://cataclysm.cx/git/ From thogard at abnormal.com Tue May 27 22:54:09 2008 From: thogard at abnormal.com (Tim Hogard) Date: Wed, 28 May 2008 05:54:09 +0000 (UTC) Subject: [Melbourne-pm] An intermittent problem with open for append Message-ID: <200805280554.m4S5s9jB083498@v.abnormal.com> Hi, I've got a CGI program that has a problem every once in a while. The problem code looks like: open OUT,">>/home/foo/que/$ip" || push @error, "Cant save details"; print OUT "$ip:t:date=",scalar localtime,"\n"; ... then it prints to OUT all the rest of ${ENV} and CGI vars. Sometimes apache will record 2 hits on the page (a double click?) and most of the time I get two sets of all the data however sometimes while running perl 5.8.8 I only get the first or second sometimes. This never happens with perl 5.005_02. Can anyone explain why perl 5.005 works yet 5.8.8 doesn't? I was under the impresson that the ">>" means tell the OS to open in append mode, any data written should go in the file and not just end up lost. This is perl, v5.8.8 built for sun4-solaris This is perl, version 5.005_02 built for sun4-solaris Solaris 5.5.1 is the OS. I'm not even which direction to try to debug this problem is its it only happens once in a million times or so. 
I guess I could write a program to produce 3 children and have each of them open a file and append their PID and hunt for errors or maybe even trace that the append flag is in fact on (is there an easy way to get that info?) or maybe its a singal problem where its getting an odd signal. Any ideas? -tim From pjf at perltraining.com.au Tue May 27 23:14:44 2008 From: pjf at perltraining.com.au (Paul Fenwick) Date: Wed, 28 May 2008 16:14:44 +1000 Subject: [Melbourne-pm] An intermittent problem with open for append In-Reply-To: <200805280554.m4S5s9jB083498@v.abnormal.com> References: <200805280554.m4S5s9jB083498@v.abnormal.com> Message-ID: <483CF854.40601@perltraining.com.au> G'day Tim, Tim Hogard wrote: > open OUT,">>/home/foo/que/$ip" || push @error, "Cant save details"; > print OUT "$ip:t:date=",scalar localtime,"\n"; > ... then it prints to OUT all the rest of ${ENV} and CGI vars. Well, if there's a problem opening the file, then I expect you have something in @error that may tell you what's wrong, but I'll assume that if it was that simple youd' know about it. So... > Can anyone explain why perl 5.005 works yet 5.8.8 doesn't? > I was under the impresson that the ">>" means tell the OS to > open in append mode, any data written should go in the file > and not just end up lost. It absolutely does mean it should append. My guess is that you may be seeing a buffering issue; if something later causes your program to exit unexpectedly, it may not have finished writing to the file. I'd throw a: use IO::Handle; at the top of your code, and a: OUT->flush or die "Can't flush OUT: $!"; when you've finished writing a record to your file. ->flush will force the data to be written, and will return false (and should set $!) if there's any problems. > open a file and append their PID and hunt for errors or maybe even trace > that the append flag is in fact on (is there an easy way to get that info?) > or maybe its a singal problem where its getting an odd signal. If you're using strace, you should be able to see the file open with O_APPEND as one of the options. If you have an existing filehandle, you can test for O_APPEND using fcntl: use Fcntl; my $flags = fcntl(MYFILE, F_GETFL, 0); print( ($flags & O_APPEND) ? "append" : "not append"); Cheerio, Paul -- Paul Fenwick | http://perltraining.com.au/ Director of Training | Ph: +61 3 9354 6001 Perl Training Australia | Fax: +61 3 9354 2681 From tjc at wintrmute.net Tue May 27 23:22:35 2008 From: tjc at wintrmute.net (Toby Corkindale) Date: Wed, 28 May 2008 16:22:35 +1000 Subject: [Melbourne-pm] An intermittent problem with open for append In-Reply-To: <200805280554.m4S5s9jB083498@v.abnormal.com> References: <200805280554.m4S5s9jB083498@v.abnormal.com> Message-ID: <20080528062234.GE16797@roseberry> On Wed, May 28, 2008 at 05:54:09AM +0000, Tim Hogard wrote: > > Hi, > > I've got a CGI program that has a problem every once in a while. > > The problem code looks like: > > open OUT,">>/home/foo/que/$ip" || push @error, "Cant save details"; > print OUT "$ip:t:date=",scalar localtime,"\n"; > ... then it prints to OUT all the rest of ${ENV} and CGI vars. > > Sometimes apache will record 2 hits on the page (a double click?) > and most of the time I get two sets of all the data however sometimes > while running perl 5.8.8 I only get the first or second sometimes. > This never happens with perl 5.005_02. > > Can anyone explain why perl 5.005 works yet 5.8.8 doesn't? 
> I was under the impresson that the ">>" means tell the OS to > open in append mode, any data written should go in the file > and not just end up lost. I can't explain why it works on 5.005 and not 5.8.8, but since you have mentioned it is a very rare occurence, it is possible that it /would/ occur on 5.005 eventually. Maybe the code just runs slower or faster and flukily avoids a race condition as a result? Also, it's worth noting that append isn't always safe for use by multiple processes - it works by seeking to the end of the file before writing, but according to the man page, this doesn't work reliably on networked file systems like NFS. Also - I think Apache will send a signal to the CGIs running, to kill them if the connection dies - is it is simply a case that when someone double-clicked, one of the cgi instances was killed before it could write to the logfile? cheers, Toby From mathew.robertson at netratings.com.au Wed May 28 01:58:06 2008 From: mathew.robertson at netratings.com.au (Mathew Robertson) Date: Wed, 28 May 2008 18:58:06 +1000 Subject: [Melbourne-pm] An intermittent problem with open for append In-Reply-To: <20080528062234.GE16797@roseberry> References: <200805280554.m4S5s9jB083498@v.abnormal.com> <20080528062234.GE16797@roseberry> Message-ID: <483D1E9E.8090905@netratings.com.au> >> I've got a CGI program that has a problem every once in a while. >> >> The problem code looks like: >> >> open OUT,">>/home/foo/que/$ip" || push @error, "Cant save details"; >> print OUT "$ip:t:date=",scalar localtime,"\n"; >> ... then it prints to OUT all the rest of ${ENV} and CGI vars. >> >> Sometimes apache will record 2 hits on the page (a double click?) >> and most of the time I get two sets of all the data however sometimes >> while running perl 5.8.8 I only get the first or second sometimes. >> This never happens with perl 5.005_02. >> >> Can anyone explain why perl 5.005 works yet 5.8.8 doesn't? >> I was under the impresson that the ">>" means tell the OS to >> open in append mode, any data written should go in the file >> and not just end up lost. >> > > I can't explain why it works on 5.005 and not 5.8.8, but since you have > mentioned it is a very rare occurence, it is possible that it /would/ occur on > 5.005 eventually. Maybe the code just runs slower or faster and flukily avoids > a race condition as a result? > > Also, it's worth noting that append isn't always safe for use by multiple > processes - it works by seeking to the end of the file before writing, but > according to the man page, this doesn't work reliably on networked file systems > like NFS. > I suspect this is root of the problem, irrespective of NFS -> the webserver is using two instances of the script, to execute the request. If two processes open the same file for append, they will both succeed. Both processes will move their file pointer to the "end of the file" - which both happens to be at the same byte offset. One starts "print"ing... then the other "print"s -> the second write will clobber the first write. This applies to both mod_perl and CGI environments. If you want cooperative access to a "shared resource, aka the $ip file, then you need locking (or something similar). > Also - I think Apache will send a signal to the CGIs running, to kill them if > the connection dies - is it is simply a case that when someone double-clicked, > one of the cgi instances was killed before it could write to the logfile > regards, Mathew -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.pm.org/pipermail/melbourne-pm/attachments/20080528/264862f4/attachment.html From ddick at aapt.net.au Wed May 28 03:03:50 2008 From: ddick at aapt.net.au (David Dick) Date: Wed, 28 May 2008 20:03:50 +1000 Subject: [Melbourne-pm] An intermittent problem with open for append In-Reply-To: <483D1E9E.8090905@netratings.com.au> References: <200805280554.m4S5s9jB083498@v.abnormal.com> <20080528062234.GE16797@roseberry> <483D1E9E.8090905@netratings.com.au> Message-ID: <483D2E06.7090208@aapt.net.au> Mathew Robertson wrote: > I suspect this is root of the problem, irrespective of NFS -> the > webserver is using two instances of the script, to execute the request. > > If two processes open the same file for append, they will both > succeed. Both processes will move their file pointer to the "end of > the file" - which both happens to be at the same byte offset. One > starts "print"ing... then the other "print"s -> the second write will > clobber the first write. no. actually, NFS is the important factor for appending Over nfs (at least for older versions), O_APPEND is unreliable. On a local (modern) unix filesystem it is a guarantee. concept is explained by W.R. Stevens in Advanced Programming in the UNIX Environment viewable at http://www.informit.com/articles/article.aspx?p=99706&seqNum=11 From guy at alchemy.com.au Tue May 27 23:21:01 2008 From: guy at alchemy.com.au (Guy Morton) Date: Wed, 28 May 2008 16:21:01 +1000 Subject: [Melbourne-pm] An intermittent problem with open for append In-Reply-To: <483CF854.40601@perltraining.com.au> References: <200805280554.m4S5s9jB083498@v.abnormal.com> <483CF854.40601@perltraining.com.au> Message-ID: <46565315-B2C5-41CA-9C1E-5C36FF2EDE4E@alchemy.com.au> aren't you supposed to use "or" instead of "||" after an open, due to operator precedence? http://perl.plover.com/FAQs/Precedence.html#Precedence_Traps_and_Surprises On 28/05/2008, at 4:14 PM, Paul Fenwick wrote: > G'day Tim, > > Tim Hogard wrote: > >> open OUT,">>/home/foo/que/$ip" || push @error, "Cant save details"; >> print OUT "$ip:t:date=",scalar localtime,"\n"; >> ... then it prints to OUT all the rest of ${ENV} and CGI vars. > > Well, if there's a problem opening the file, then I expect you have > something in @error that may tell you what's wrong, but I'll assume > that if > it was that simple youd' know about it. So... > >> Can anyone explain why perl 5.005 works yet 5.8.8 doesn't? >> I was under the impresson that the ">>" means tell the OS to >> open in append mode, any data written should go in the file >> and not just end up lost. > > It absolutely does mean it should append. My guess is that you may be > seeing a buffering issue; if something later causes your program to > exit > unexpectedly, it may not have finished writing to the file. > > I'd throw a: > > use IO::Handle; > > at the top of your code, and a: > > OUT->flush or die "Can't flush OUT: $!"; > > when you've finished writing a record to your file. ->flush will > force the > data to be written, and will return false (and should set $!) if > there's any > problems. > >> open a file and append their PID and hunt for errors or maybe even >> trace >> that the append flag is in fact on (is there an easy way to get >> that info?) >> or maybe its a singal problem where its getting an odd signal. > > If you're using strace, you should be able to see the file open with > O_APPEND as one of the options. 
If you have an existing filehandle, > you can > test for O_APPEND using fcntl: > > use Fcntl; > > my $flags = fcntl(MYFILE, F_GETFL, 0); > print( ($flags & O_APPEND) ? "append" : "not append"); > > Cheerio, > > Paul > > -- > Paul Fenwick | http://perltraining.com.au/ > Director of Training | Ph: +61 3 9354 6001 > Perl Training Australia | Fax: +61 3 9354 2681 > _______________________________________________ > Melbourne-pm mailing list > Melbourne-pm at pm.org > http://mail.pm.org/mailman/listinfo/melbourne-pm From scottp at dd.com.au Wed May 28 04:36:03 2008 From: scottp at dd.com.au (Scott Penrose) Date: Wed, 28 May 2008 21:36:03 +1000 Subject: [Melbourne-pm] An intermittent problem with open for append In-Reply-To: <46565315-B2C5-41CA-9C1E-5C36FF2EDE4E@alchemy.com.au> References: <200805280554.m4S5s9jB083498@v.abnormal.com> <483CF854.40601@perltraining.com.au> <46565315-B2C5-41CA-9C1E-5C36FF2EDE4E@alchemy.com.au> Message-ID: <9E6C5977-5132-44B8-919C-0558A9B7BEB5@dd.com.au> There has been comment on is append safe or not. NFS - Absolutely not. You will need to consider locking (which also has issues) UNIX - Yes no problem BUT you must be under the internal buffer on the system, and it is line bound. So multi line insert will not be in order, but single lines will. This means you are totally safe doing an append with single line log files, locally. Using the Sync and Buffer changes Paul suggested won't improve the situation or make it any safer. This is because you may have your two scripts hit the file the same time - even if exactly the same time, the OS will put both lines in without garbaling it - UNLESS you go over the buffer size (not sure what that is, but 512 bytes would probably be safe guess). Windows - Anyone know? Scott From jarich at perltraining.com.au Wed May 28 05:04:00 2008 From: jarich at perltraining.com.au (Jacinta Richardson) Date: Wed, 28 May 2008 22:04:00 +1000 Subject: [Melbourne-pm] An intermittent problem with open for append In-Reply-To: <46565315-B2C5-41CA-9C1E-5C36FF2EDE4E@alchemy.com.au> References: <200805280554.m4S5s9jB083498@v.abnormal.com> <483CF854.40601@perltraining.com.au> <46565315-B2C5-41CA-9C1E-5C36FF2EDE4E@alchemy.com.au> Message-ID: <483D4A30.1030606@perltraining.com.au> Guy Morton wrote: > aren't you supposed to use "or" instead of "||" after an open, due to > operator precedence? > > http://perl.plover.com/FAQs/Precedence.html#Precedence_Traps_and_Surprises This is correct. Tim's program will be interpreted as: open OUT, (">>/home/foo/que/$ip" || push @error, "Cant save details"); print OUT "$ip:t:date=",scalar localtime,"\n"; which means that the push will only occur if ">>/home/foo/que/$ip" is false - which it won't be. The correct file will be opened for appending however. Since the program isn't dying on an error, this just means that Tim's diagnostics will be ignored. Since he's then going to try printing to the possibly not-opened file handle ANYWAY, I suspect he doesn't care too much. In this kind of instance, I'd recommend: if(open OUT, ">>/home/foo/que/$ip") { print OUT "$ip:t:date=",scalar localtime,"\n"; } else { push @error, "Can't save details"; } as this ensures both the correct precedence and removes the warning (you have warnings turned on right?) about printing to an unopened filehandle in the case of an error. I don't think this is the cause of Tim's current problem, but it could be the cause of an error in the future. All the best, J -- ("`-''-/").___..--''"`-._ | Jacinta Richardson | `6_ 6 ) `-. 
( ).`-.__.`) | Perl Training Australia | (_Y_.)' ._ ) `._ `. ``-..-' | +61 3 9354 6001 | _..`--'_..-_/ /--'_.' ,' | contact at perltraining.com.au | (il),-'' (li),' ((!.-' | www.perltraining.com.au | From pjf at perltraining.com.au Wed May 28 05:30:03 2008 From: pjf at perltraining.com.au (Paul Fenwick) Date: Wed, 28 May 2008 22:30:03 +1000 Subject: [Melbourne-pm] An intermittent problem with open for append In-Reply-To: <9E6C5977-5132-44B8-919C-0558A9B7BEB5@dd.com.au> References: <200805280554.m4S5s9jB083498@v.abnormal.com> <483CF854.40601@perltraining.com.au> <46565315-B2C5-41CA-9C1E-5C36FF2EDE4E@alchemy.com.au> <9E6C5977-5132-44B8-919C-0558A9B7BEB5@dd.com.au> Message-ID: <483D504B.80309@perltraining.com.au> G'day Scott/Tim/MPM, Scott Penrose wrote: > UNIX - Yes no problem BUT you must be under the internal buffer on the > system, and it is line bound. So multi line insert will not be in > order, but single lines will. I do agree that O_APPEND on a local unix filesystem is atomic provided you're within the relevant limit for block IO. I beg to disagree that it has anything to do with *lines*. As far as your OS and filesystem is concerned, a file is just a bunch of bytes. If you write a 40MB "line" to that file, you can be pretty sure it won't be an atomic write. If you write ten "lines" of six characters each, you can be pretty certain it *will* be atomic. The preferred size for block IO for your filesystem can be found in the 11th field from Perl's stat() function. On most systems that corresponds to the size of a block on the filesystem, and is typically about 4k on ext2/ext3. AFAIK, it should also correspond to the smallest atomic write on your system. > Using the Sync and Buffer changes Paul suggested won't improve the > situation or make it any safer. My suggestion of forcing writes after we've written a logical record was to catch three possible problems: 1) If the data was completely missing from the file, it could be because the process is being zapped by a signal. This could be the case if the web-server zaps processes if the connection goes away, as Toby suggested earlier in this thread. Perl doesn't usually flush its buffers when dying to a signal, and so we can lose the write. You can observe this with a simple program like: use Fatal qw(open); open (my $fh, '>>', '/tmp/myfile.log'); while (<STDIN>) { print {$fh} $_; } Type a few lines, and then hit CTRL-C. You'll discover that myfile.log ends up empty. Tim indicated that he was *missing* data, and being zapped by a signal is a possible culprit[1]. That's less likely now that Tim has indicated he's unbuffering the whole filehandle (provided this is done before it's written to). 2) If we're writing a lot of records, and we're leaving the flushing up to stdio, then stdio is free to flush data that intersects a record boundary. In this case we can end up with our record being mangled. You can see this in action by taking the above script, and repeatedly pasting a bunch of data into it while doing a 'tail -f' on myfile.log. When your data *does* get written to the file, you'll notice that the end of the data written doesn't correspond to the end of the data that's been pasted (unless you're pasting in blocks which are an exact multiple of your buffer-size). The last part of the data will be written when perl closes its filehandles (after we've hit CTRL-D to indicate end-of-input). This can particularly be a problem with long-running processes that are writing to a shared logfile.
3) If we completely unbuffer the filehandle, and then use multiple print()s to write our data, then the data from other processes can become intermingled with ours, since we'll be flushing after every print(). If we're manually calling ->flush() then we can ensure all our data is kept together, provided it fits within a single IO block. > Windows - Anyone know? Windows append isn't atomic, it's emulated by perl. It seeks, and then writes, meaning you can quite happily end up with race conditions and corrupted data if you don't take steps to avoid it (such as locking). Cheerio, Paul -- Paul Fenwick | http://perltraining.com.au/ Director of Training | Ph: +61 3 9354 6001 Perl Training Australia | Fax: +61 3 9354 2681 From scottp at dd.com.au Wed May 28 05:41:21 2008 From: scottp at dd.com.au (Scott Penrose) Date: Wed, 28 May 2008 22:41:21 +1000 Subject: [Melbourne-pm] An intermittent problem with open for append In-Reply-To: <483D504B.80309@perltraining.com.au> References: <200805280554.m4S5s9jB083498@v.abnormal.com> <483CF854.40601@perltraining.com.au> <46565315-B2C5-41CA-9C1E-5C36FF2EDE4E@alchemy.com.au> <9E6C5977-5132-44B8-919C-0558A9B7BEB5@dd.com.au> <483D504B.80309@perltraining.com.au> Message-ID: <239C0A7E-2693-43A6-884C-EAC298DEFEB4@dd.com.au> On 28/05/2008, at 10:30 PM, Paul Fenwick wrote: > G'day Scott/Tim/MPM, > > Scott Penrose wrote: > >> UNIX - Yes no problem BUT you must be under the internal buffer on >> the system, and it is line bound. So multi line insert will not be >> in order, but single lines will. > > I do agree that O_APPEND on a local unix filesystem is atomic > provided you're within the relevant limit for block IO. I beg to > disagree that it has anything to do with *lines*. As far as your OS > and filesystem is concerned, a file is just a bunch of bytes. If > you write a 40MB "line" to that file, you can be pretty sure it > won't be an atomic write. If you write ten "lines" of six > characters each, you can be pretty certain it *will* be atomic. Quite right. It is the block that matters, what I meant is if you write multiple lines you may pass that block size. So you see this often works: print OUT "Some Error line\n"; and this often does not print OUT join("\n", @all_my_errors); Sorry about that. > 2) If we're writing a lot of records, and we're leaving the flushing > up to stdio, then stdio is free to flush data that intersects a > record boundary. In this case we can end up with our record being > mangled. You can see this in action by taking the above script, and > repeatedly pasting a bunch of data into it while doing a 'tail -f' > on myfile.log. When your data *does* get written to the file, > you'll notice that the end of the data written doesn't correspond to > the end of the data that's been pasted (unless you're pasting in > blocks which are an exact multiple of your buffer-size). The last > part of the data will be written when perl closes its filehandles > (after we've hit CTRL-D to indicate end-of-input). Sorry no, the record will still be mangled. Flushing does not fix that. If you are writing something greater than the buffer size the only answer is locking, nothing else works. Your answer above works, only if there is one script writing to the log and then you are fixing the internal flusing of the data. > Windows append isn't atomic, it's emulated by perl. It seeks, and > then writes, meaning you can quite happily end up with race > conditions and corrupted data if you don't take steps to avoid it > (such as locking). 
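To stay inside a single atomic append on a local filesystem, one approach that follows from the points above is to assemble the whole record first and write it with a single print; a rough sketch (the path, the record format and the 4k fallback are assumptions):

    use strict;
    use warnings;
    use IO::Handle;

    my $logfile = '/tmp/myfile.log';
    my $record  = "$$: " . scalar(localtime) . "\n";     # build the complete record first

    # Field 11 of stat() is the filesystem's preferred block size for I/O.
    my $blksize = (stat $logfile)[11] || 4096;
    warn "record is larger than one I/O block - locking would be needed\n"
        if length($record) > $blksize;

    open my $fh, '>>', $logfile or die "Can't open $logfile for append: $!";
    print {$fh} $record;                                 # one print per record
    $fh->flush or die "Can't flush $logfile: $!";
    close $fh  or die "Can't close $logfile: $!";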
Typical, I expected that :) Then again it uses threads to emulate forks, so maybe not as big a problem :-) Scott From pjf at perltraining.com.au Wed May 28 05:47:54 2008 From: pjf at perltraining.com.au (Paul Fenwick) Date: Wed, 28 May 2008 22:47:54 +1000 Subject: [Melbourne-pm] An intermittent problem with open for append In-Reply-To: <239C0A7E-2693-43A6-884C-EAC298DEFEB4@dd.com.au> References: <200805280554.m4S5s9jB083498@v.abnormal.com> <483CF854.40601@perltraining.com.au> <46565315-B2C5-41CA-9C1E-5C36FF2EDE4E@alchemy.com.au> <9E6C5977-5132-44B8-919C-0558A9B7BEB5@dd.com.au> <483D504B.80309@perltraining.com.au> <239C0A7E-2693-43A6-884C-EAC298DEFEB4@dd.com.au> Message-ID: <483D547A.8080006@perltraining.com.au> G'day Scott/MPM, Scott Penrose wrote: > Sorry no, the record will still be mangled. Flushing does not fix that. > If you are writing something greater than the buffer size the only > answer is locking, nothing else works. Oops, I meant to qualify that with "provided your records are less than the atomic buffer size". You're quite right that if we hit records bigger than our atomic buffer, we have to move to locking. Cheerio, Paul -- Paul Fenwick | http://perltraining.com.au/ Director of Training | Ph: +61 3 9354 6001 Perl Training Australia | Fax: +61 3 9354 2681 From ddick at aapt.net.au Wed May 28 15:20:54 2008 From: ddick at aapt.net.au (David Dick) Date: Thu, 29 May 2008 08:20:54 +1000 Subject: [Melbourne-pm] An intermittent problem with open for append In-Reply-To: <483D547A.8080006@perltraining.com.au> References: <200805280554.m4S5s9jB083498@v.abnormal.com> <483CF854.40601@perltraining.com.au> <46565315-B2C5-41CA-9C1E-5C36FF2EDE4E@alchemy.com.au> <9E6C5977-5132-44B8-919C-0558A9B7BEB5@dd.com.au> <483D504B.80309@perltraining.com.au> <239C0A7E-2693-43A6-884C-EAC298DEFEB4@dd.com.au> <483D547A.8080006@perltraining.com.au> Message-ID: <483DDAC6.40408@aapt.net.au> Paul Fenwick wrote: > Oops, I meant to qualify that with "provided your records are less than the > atomic buffer size". You're quite right that if we hit records bigger than > our atomic buffer, we have to move to locking. > Very interesting thread. I had no idea that the kernel can mangle the output based on block size. However, at least in my tests, there will be no data lost, but it may be mangled? From cas at taz.net.au Wed May 28 16:08:45 2008 From: cas at taz.net.au (Craig Sanders) Date: Thu, 29 May 2008 09:08:45 +1000 Subject: [Melbourne-pm] An intermittent problem with open for append In-Reply-To: <483D4A30.1030606@perltraining.com.au> References: <200805280554.m4S5s9jB083498@v.abnormal.com> <483CF854.40601@perltraining.com.au> <46565315-B2C5-41CA-9C1E-5C36FF2EDE4E@alchemy.com.au> <483D4A30.1030606@perltraining.com.au> Message-ID: <20080528230845.GC14155@taz.net.au> On Wed, May 28, 2008 at 10:04:00PM +1000, Jacinta Richardson wrote: > In this kind of instance, I'd recommend: > > if(open OUT, ">>/home/foo/que/$ip") { > print OUT "$ip:t:date=",scalar localtime,"\n"; > } > else { > push @error, "Can't save details"; > } in this instance, i'd recommend something very similar, but more like this: my $logdir='/home/foo/que'; my $outfile="$logdir/$ip"; if(open(OUT,'>>',$outfile)) { print OUT "$ip:t:date=",scalar localtime,"\n"; } else { push @error, "Can't open $outfile for append: $!"; } advantages: 1. 3-argument open() is better practice, especially if there's a chance that the filename is based on user input. always using the 3-arg form of open() is a good habit to get into. 2. 
"Can't save details" is a useless error message. I've updated it to say specifically what the problem was - including the filename and "$!" aka $OS_ERROR, which is the actual error message returned by the operating system. 3. hard-coding directory names is bad. it's always good to make things easy for yourself - or your successor - in case you/they need to move things around later. put stuff like $logdir in a "configuration" or "constants" section at the top of the script to make them easy to find and change later. more general comments: at a guess, i'd say that "$ip" is probably the IP address of the remote client and that the OP wants to have a separate log file per IP address. it's hard to imagine why that would or could be a good idea. IMO, it's better to write to just one log file and include sufficient information in the log entries that you can extract whatever you need from it later with grep or some post-processing script. hundreds or thousands of little log files just makes for clutter, and makes management of the log files (e.g. daily or weekly rotation) more difficult. also, on some filesystem, it seriously impacts performance because having thousands of little files in one directory slows down all file access in that directory. craig -- craig sanders From scottp at dd.com.au Wed May 28 17:17:06 2008 From: scottp at dd.com.au (Scott Penrose) Date: Thu, 29 May 2008 10:17:06 +1000 Subject: [Melbourne-pm] Data::Token References: Message-ID: <11A7771B-466B-4FFF-8787-C4859309E651@dd.com.au> Hey Guys Do you find you have to create unique and secure tokens? I keep finding that. The conflict we face is that unique tokens are easy with Data::UUID but they are predictable and therefore no good for authentication or other secure tokens. So the usual practice is to add a secret and take an MD5 of that number. The down side of that is they are no longer guaranteed unique (although my understanding of MD5 is that the closer the original string the further away the MD5). Anyway, the point is the algorithm you use tends to be simple, but often repeated, and may change as one learns issues (such as what to use as a secret seed, or better alternatives to MD5 etc). So I have created Data::Token, which you can run like this: perl -MData::Token -e 'print token, qq{\n}' Could you guys have a review of the module and give me some feedback before I stick it on CPAN. Ta Scott -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Data-Token-0.0.3.tar.gz Type: application/x-gzip Size: 3488 bytes Desc: not available Url : http://mail.pm.org/pipermail/melbourne-pm/attachments/20080529/d8abf9c7/attachment.gz -------------- next part -------------- From jarich at perltraining.com.au Wed May 28 17:22:29 2008 From: jarich at perltraining.com.au (Jacinta Richardson) Date: Thu, 29 May 2008 10:22:29 +1000 Subject: [Melbourne-pm] An intermittent problem with open for append In-Reply-To: <20080528230845.GC14155@taz.net.au> References: <200805280554.m4S5s9jB083498@v.abnormal.com> <483CF854.40601@perltraining.com.au> <46565315-B2C5-41CA-9C1E-5C36FF2EDE4E@alchemy.com.au> <483D4A30.1030606@perltraining.com.au> <20080528230845.GC14155@taz.net.au> Message-ID: <483DF745.4060501@perltraining.com.au> Craig Sanders wrote: > in this instance, I'd recommend something very similar, but more like this: > > my $logdir='/home/foo/que'; > my $outfile="$logdir/$ip"; > > if(open(OUT,'>>',$outfile)) { > print OUT "$ip:t:date=",scalar localtime,"\n"; > } > else { > push @error, "Can't open $outfile for append: $!"; > } All good points and I agree entirely. All the best, Jacinta -- ("`-''-/").___..--''"`-._ | Jacinta Richardson | `6_ 6 ) `-. ( ).`-.__.`) | Perl Training Australia | (_Y_.)' ._ ) `._ `. ``-..-' | +61 3 9354 6001 | _..`--'_..-_/ /--'_.' ,' | contact at perltraining.com.au | (il),-'' (li),' ((!.-' | www.perltraining.com.au | From pjf at perltraining.com.au Wed May 28 18:01:30 2008 From: pjf at perltraining.com.au (Paul Fenwick) Date: Thu, 29 May 2008 11:01:30 +1000 Subject: [Melbourne-pm] Data::Token In-Reply-To: <11A7771B-466B-4FFF-8787-C4859309E651@dd.com.au> References: <11A7771B-466B-4FFF-8787-C4859309E651@dd.com.au> Message-ID: <483E006A.7070902@perltraining.com.au> G'day Scott, Hashing ======= I notice that Data::Token is using MD5. Unfortunately, we're starting to get very good at engineering MD5 collisions, with http://th.informatik.uni-mannheim.de/People/lucks/HashCollisions/ as a striking example of this. For Data::Token this could be considered a non-issue, as we just want our tokens to be hard-to-guess, rather than using them as hash of a real documentation. Even so, I'd tend towards SHA1 as a hashing algorithm with less flaws. Randomness ========== Unfortunately, rand(time) isn't very random. When Perl sees the use of rand it will first try to seed its pseudo-random number generate (PRNG) with a good source of entropy, typically from /dev/urandom on modern unixes. On most systems, this gives you at most 32 bits of entropy, since that's all the random seed will take. rand(time) then generates a floating point number between 0 and the seconds from the epoch. This number can be predicted based upon the current time, and our original 32 bits of entropy (which we can brute force). Uniqueness ========== MD5 doesn't guarantee that its output is unique, even though the input has been generated from unique identifiers. It's *very* unlikely that we'll see a collision, but it's still a possibility. Suggestion ========== Rather than pushing our UUID and our random number through MD5, I would suggest a simple concatenation. The UUID guarantees that our resulting string will be unique, and our random number (appropriately encoded) will ensure that it's hard to guess. I would allow the user to supply an argument specifying how many bits of randomness they want, and possibly an argument to specify the quality of that randomness (are we willing to block for good randomness?). 
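A rough sketch of that concatenation approach, using Data::UUID for the unique part and random octets for the hard-to-guess part; the function name, the 128-bit default and the joining format are assumptions, not Data::Token's actual interface:

    use strict;
    use warnings;
    use Data::UUID;
    use Crypt::Random qw(makerandom_octet);

    sub make_token {
        my ($bits) = @_;
        $bits ||= 128;                                   # how much randomness to append

        my $uuid = Data::UUID->new->create_str;          # guarantees uniqueness
        my $rand = unpack 'H*',
            makerandom_octet(Length => $bits / 8, Strength => 0);   # non-blocking source

        return "$uuid-$rand";                            # unique AND hard to guess
    }

    print make_token(), "\n";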
I recommend using Crypt::Random from CPAN as a way to get your random numbers. It does the hard work of finding an appropriate source of randomness, including hooking into /dev/u?random, asking PARI, or talking to the entropy gathering daemon (if installed). It also takes size and strength arguments, which can be passed straight through from the user. Further reading =============== I discuss the troubles with generating good random numbers in Perl in chapter 10 of "Perl Security", available from http://perltraining.com.au/notes.html . Feedback and comments appreciated. Cheerio, Paul -- Paul Fenwick | http://perltraining.com.au/ Director of Training | Ph: +61 3 9354 6001 Perl Training Australia | Fax: +61 3 9354 2681 From scottp at dd.com.au Wed May 28 18:21:44 2008 From: scottp at dd.com.au (Scott Penrose) Date: Thu, 29 May 2008 11:21:44 +1000 Subject: [Melbourne-pm] Data::Token In-Reply-To: <483E006A.7070902@perltraining.com.au> References: <11A7771B-466B-4FFF-8787-C4859309E651@dd.com.au> <483E006A.7070902@perltraining.com.au> Message-ID: On 29/05/2008, at 11:01 AM, Paul Fenwick wrote: > G'day Scott, > > Hashing > ======= > > I notice that Data::Token is using MD5. Unfortunately, we're > starting to get very good at engineering MD5 collisions, with http://th.informatik.uni-mannheim.de/People/lucks/HashCollisions/ > as a striking example of this. For Data::Token this could be > considered a non-issue, as we just want our tokens to be hard-to- > guess, rather than using them as hash of a real documentation. Even > so, I'd tend towards SHA1 as a hashing algorithm with less flaws. Ta I will look at using SHA1 instead. > Randomness > ========== > > Unfortunately, rand(time) isn't very random. When Perl sees the use > of rand it will first try to seed its pseudo-random number generate > (PRNG) with a good source of entropy, typically from /dev/urandom on > modern unixes. On most systems, this gives you at most 32 bits of > entropy, since that's all the random seed will take. rand(time) > then generates a floating point number between 0 and the seconds > from the epoch. This number can be predicted based upon the current > time, and our original 32 bits of entropy (which we can brute force). Most of the algorithms around use a simple text string - "MySecret". This is how things tokens are generated for apache cookies and examples for tokens in PHP and on Perl Monks - but that is silly in a CPAN module, so I thought a bit of randomness. I am open to better random numbers, but even just adding time would be enough, after a hashing to make it different. All systems using a token are always open for brute force attack, and you must still protect against that, by blocking IPs, increased timeout on failed requests etc. This system does just one thing, generate the token, it does not protect it, nor at least in some parts protect against duplicates. The randomness is there to help you not guess the next free number, or at least take 1000s of attempts to do so. Preferably lots more. It is a sad fact that most of the Token code on CPAN and in the wile use things like Database ID, Time stamp or similar to set the token for a cookie :-) Ahhh I see you have a suggestion below, I will try that then. > Uniqueness > ========== > MD5 doesn't guarantee that its output is unique, even though the > input has been generated from unique identifiers. It's *very* > unlikely that we'll see a collision, but it's still a possibility. 
I assume that SHA1 would be the same, but I think mainly the issue is we are taking a HASH, therefore we are always gong to have a chance of being collision. In the end, I think if you are generating a token it should be checked against the existing ones before returning (I imagine in a life time we would never see a collision, but better safe than sorry). > Suggestion > ========== > Rather than pushing our UUID and our random number through MD5, I > would suggest a simple concatenation. The UUID guarantees that our > resulting string will be unique, and our random number > (appropriately encoded) will ensure that it's hard to guess. I > would allow the user to supply an argument specifying how many bits > of randomness they want, and possibly an argument to specify the > quality of that randomness (are we willing to block for good > randomness?). > > I recommend using Crypt::Random from CPAN as a way to get your > random numbers. It does the hard work of finding an appropriate > source of randomness, including hooking into /dev/u?random, asking > PARI, or talking to the entropy gathering daemon (if installed). It > also takes size and strength arguments, which can be passed straight > through from the user. Good one thanks. I think the module should try and do well with zero input (DWIM) - so I will look at Crypt::Random. But we can always allow input into the function for increased random by passing straight through. Quick question on right format though... the normal case, for most users would be just print token, "\n"; To pass in the higher level of randomness (which I think 999/1000 is unnecessary) what is the best way: * On the line "use Data::Token" * Passed into token "token(...)"; * Set variables - $Data::Token::strength (ok this one sux) * Call methods - Data::Token::strength(...); Thoughts? > Further reading > =============== > I discuss the troubles with generating good random numbers in Perl > in chapter 10 of "Perl Security", available from http://perltraining.com.au/notes.html > . Feedback and comments appreciated. Thanks, I will have a look. Thanks for all your input Paul. I think making it stronger by default is the right approach. It is unlikely this needs to be fast as it is only for generating unique tokens, not for reading them. I think I will also add in a few references, in particular to security talks. And most importantly I should add some comments on checking for uniquness in a token system AND even more important to protect against bruit force attack. Just out of interest, how many people have had to create these tokens and do the same research as above? From the feedback here I guess that this is a worth while module so that the next person does not have to do the same again :-) Scott From daniel at rimspace.net Wed May 28 19:51:01 2008 From: daniel at rimspace.net (Daniel Pittman) Date: Thu, 29 May 2008 12:51:01 +1000 Subject: [Melbourne-pm] Data::Token In-Reply-To: (Scott Penrose's message of "Thu, 29 May 2008 11:21:44 +1000") References: <11A7771B-466B-4FFF-8787-C4859309E651@dd.com.au> <483E006A.7070902@perltraining.com.au> Message-ID: <87tzghx1pm.fsf@rimspace.net> Scott Penrose writes: > On 29/05/2008, at 11:01 AM, Paul Fenwick wrote: > >> G'day Scott, >> >> Hashing >> ======= >> >> I notice that Data::Token is using MD5. Unfortunately, we're >> starting to get very good at engineering MD5 collisions, with >> http://th.informatik.uni-mannheim.de/People/lucks/HashCollisions/ >> as a striking example of this. 
For Data::Token this could be >> considered a non-issue, as we just want our tokens to be hard-to- >> guess, rather than using them as hash of a real documentation. Even >> so, I'd tend towards SHA1 as a hashing algorithm with less flaws. > > Ta I will look at using SHA1 instead. SHA1 and MD5 are in the same family, and successful attacks on (full) SHA1 have reduced collision generation to 2^69 trials from 2^80. Plan on replacing SHA1 everywhere within the next ten years, and on needing to step up to SHA256 or SHA512 in the interim, at the very least. [...] > Most of the algorithms around use a simple text string - "MySecret". > This is how things tokens are generated for apache cookies and > examples for tokens in PHP and on Perl Monks - but that is silly in a > CPAN module, so I thought a bit of randomness. [...] > It is a sad fact that most of the Token code on CPAN and in the wile > use things like Database ID, Time stamp or similar to set the token > for a cookie :-) ...I agree that your model is substantially better, but I would generally encourage building secure first, then looking at allowing the protection to be weakened later. That way you fail safe rather than depending on programmers to actually have an notion of how to effectively secure the system. [...] > Good one thanks. I think the module should try and do well with zero > input (DWIM) - so I will look at Crypt::Random. But we can always > allow input into the function for increased random by passing straight > through. Allowing the end user to pass in "random" data to increase entropy will, in many cases, result in less entropy included because, frankly, most people don't really understand how to generate that. :/ However, Crypt::Random is a blocking module, and your web server is likely to be fairly entropy constrained[1], so you want to be careful to set the strength of the input to low (Strength => 0) when setting it up. [...] > Thanks for all your input Paul. I think making it stronger by default > is the right approach. It is unlikely this needs to be fast as it is > only for generating unique tokens, not for reading them. Good randomness shouldn't need to be slow, and if you really care seeding a good PRNG (the Mersenne Twister, in Math::Random::MT::*) from Crypt::Random would be fast and effective. (Seeding rand() probably isn't good enough, since it isn't a terribly high quality PRNG in many cases.) > I think I will also add in a few references, in particular to security > talks. And most importantly I should add some comments on checking > for uniquness in a token system AND even more important to protect > against bruit force attack. If you were extending this I would consider an implementation that can answer the key question "Is this my token" in a cryptographically secure fashion, ensuring that you don't need to store the token anywhere. Something like: base64(encrypt(key2, join(':', token, random, key1)), ":", token) You can then verify that the secret part decrypts, contains key1, and matches the public token, without needing to store anything. key1 and key2 can be randomly generated and only need to be stable for the life of the tokens; adding a date to the outside can also help. > Just out of interest, how many people have had to create these tokens > and do the same research as above? 
From the feedback here I guess that > this is a worth while module so that the next person does not have to > do the same again :-) If there was a good, portable module to produce something like the above, for arbitrary values of 'token', and optionally without exposing token at all, I would be happy. I don't know that is the use case for your module, though, but rather the current module is a component of that larger system. Regards, Daniel ...and now I wait to be pointed at the existing module that does all that for me, because you always learn it exists afterwards. Footnotes: [1] There is very, very little true entropy on a headless server, and very little support for effectively using and *trusting* entropy from a hardware RNG, even if one is present. From scottp at dd.com.au Wed May 28 20:41:55 2008 From: scottp at dd.com.au (Scott Penrose) Date: Thu, 29 May 2008 13:41:55 +1000 Subject: [Melbourne-pm] Data::Token In-Reply-To: <87tzghx1pm.fsf@rimspace.net> References: <11A7771B-466B-4FFF-8787-C4859309E651@dd.com.au> <483E006A.7070902@perltraining.com.au> <87tzghx1pm.fsf@rimspace.net> Message-ID: > SHA1 and MD5 are in the same family, and successful attacks on (full) > SHA1 have reduced collision generation to 2^69 trials from 2^80. > > Plan on replacing SHA1 everywhere within the next ten years, and on > needing to step up to SHA256 or SHA512 in the interim, at the very > least. All the above is correct but not quite for this case. MD5 and SHA1 and up all just decrease how likely collisions are to help against bruit force attack - but for signatures against text. Remember that this is just a way of hiding the secret. What it needs to do is make it so that you need 1000s or more of guesses to get the next entry. Where as doing time (or as shown even rand(time)) is predictable. One of the reasons Cryptography is so hard is you can't apply one rule to another. The MD5 birthday attack scenarios are useful only against documents you are signing. Where as this is just a one way hashing algorithm I need. I could probably use crypt :-) (not really). > [...] > >> Most of the algorithms around use a simple text string - "MySecret". >> This is how things tokens are generated for apache cookies and >> examples for tokens in PHP and on Perl Monks - but that is silly in a >> CPAN module, so I thought a bit of randomness. > > [...] > >> It is a sad fact that most of the Token code on CPAN and in the wile >> use things like Database ID, Time stamp or similar to set the token >> for a cookie :-) > > ...I agree that your model is substantially better, but I would > generally encourage building secure first, then looking at allowing > the > protection to be weakened later. > > That way you fail safe rather than depending on programmers to > actually > have an notion of how to effectively secure the system. Agreed. > > [...] > >> Good one thanks. I think the module should try and do well with zero >> input (DWIM) - so I will look at Crypt::Random. But we can always >> allow input into the function for increased random by passing >> straight >> through. > > Allowing the end user to pass in "random" data to increase entropy > will, > in many cases, result in less entropy included because, frankly, most > people don't really understand how to generate that. :/ > > However, Crypt::Random is a blocking module, and your web server is > likely to be fairly entropy constrained[1], so you want to be > careful to > set the strength of the input to low (Strength => 0) when setting it > up. 
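For what it's worth, the textbook construction for hiding a secret behind a digest is a keyed HMAC rather than a bare hash of secret-plus-data; a small sketch of that variant (the secret and the function name are placeholders, and this is not what Data::Token currently does):

    use strict;
    use warnings;
    use Data::UUID;
    use Digest::HMAC_SHA1 qw(hmac_sha1_hex);

    my $secret = 'MySecret';                 # placeholder; a real secret should be random

    sub hidden_token {
        my $uuid = Data::UUID->new->create_str;
        return hmac_sha1_hex($uuid, $secret);    # the keyed digest hides the secret
    }

    print hidden_token(), "\n";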
We don't need to create the secret every time, that can be generated once and kept in memory (yes that is safe, it is not a crypt key, just a means for making the token unpredictable). However that would only work if you are using mod_perl or similar. But as for inputs - I intend to not give the user any inputs, but do it to the security good enough. Rather than provide a flexible module that does everything, this will just do one thing well. Then as issues arise, SHA-1 becomes no good, better randomness is required - I just change it. > If you were extending this I would consider an implementation that can > answer the key question "Is this my token" in a cryptographically > secure > fashion, ensuring that you don't need to store the token anywhere. That is a great idea, but not for this module I think. I will consider though a way of supporting it. The problem is of course you must keep your secret. A long time secret is vulnerable. In the end though, a token really needs to be stored, so you can always just look it up. Nice idea though, good for form processing. On another topic - Security of using MD5 - it seems that every module I find on the net from Java to PHP to Python to Perl are using what I originally wrote - MD5 of a random string (usually time) against a unique number (often just generated with a sequence, time or combination of time, ip etc). The most common PHP code is $token = md5(uniqid(rand(), TRUE)); uniqid is equiv to Data::UUID (different way of calculating). Even the praised Apache::Session and CGI::Session just use: md5_hex($$, time(), rand(time)); I can't find a single reference on the net that says this is insecure as has been documented in this thread. Some people raise in threads that you should use SHA1 and in each case it is said not to be required. So the question is: 1) Am I missing the threads on the net 2) Are we jumping to the wrong conclusion because we are mixing document signature faking with unpredictability 3) Is this really a problem and we are the first to really solve it. My gut is now telling me (2). If it is not then almost every single site on the internet is now vulnerable. Note also that the PHP, Apache::Session, CGI::Session. Even Apache::AuthCookie just uses md5_hex($date, $PID, $PAC); I can't find a single example on the net that does not use MD5, except the insecure ones. Scott From pat at patspam.com Wed May 28 21:23:06 2008 From: pat at patspam.com (Patrick Donelan) Date: Thu, 29 May 2008 14:23:06 +1000 Subject: [Melbourne-pm] Internationali[sz]ation Message-ID: <42321ee20805282123m78402adcgf78dafe4f4d307e6@mail.gmail.com> Hello my fellow monguers, I've been designing the API for some code I'm contributing to an open source project (WebGUI), and I've been mulling over the use of English alternative spelling in my code/documentation/file names/etc.. The majority of developers on this project are based in America, and while I'm no zealot when it comes to preserving the Queen's English I do find it takes a certain amount of effort to not start convulsing whenever I encounter the word "instanciate" in the source. So, I'm wondering what to do in my code. My heart just isn't in the 'z' and I miss the absent 'u's, but then again I've long since gotten used to writing "color" tags in html, so should I just bite the bullet and name my "authorization" methods accordingly? What do you do when you are involved in international projects? Should I just shut my eyes to it and think of the developers from non English-speaking backgrounds? 
Or just shut my eyes and think about Engla.. oh wait that's not right. Patrick http://patspam.com -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.pm.org/pipermail/melbourne-pm/attachments/20080529/4ee3eaa9/attachment.html From pjf at perltraining.com.au Wed May 28 21:53:16 2008 From: pjf at perltraining.com.au (Paul Fenwick) Date: Thu, 29 May 2008 14:53:16 +1000 Subject: [Melbourne-pm] Data::Token In-Reply-To: References: <11A7771B-466B-4FFF-8787-C4859309E651@dd.com.au> <483E006A.7070902@perltraining.com.au> <87tzghx1pm.fsf@rimspace.net> Message-ID: <483E36BC.6020806@perltraining.com.au> G'day Scott/MPM, Scott Penrose wrote: [much snippage, apologies, I've got a deadline today] > 1) Am I missing the threads on the net > 2) Are we jumping to the wrong conclusion because we are mixing document > signature faking with unpredictability > 3) Is this really a problem and we are the first to really solve it. > My gut is now telling me (2). If it is not then almost every single site > on the internet is now vulnerable. (2a). The ability to engineer collisions with MD5 can be considered a non-issue because we're not signing documents, the only requirement is that the hash is *hard to guess*. In this sense, we're using MD5 as a way to distribute our entropy throughout a reasonably long string. MD5 (or SHA1, or ROT13) won't increase the entropy that we have, but it can increase the work an attacker needs to do, and make it less obvious with regards to the data we're using to generate the hash to begin with. The result is that the hashes are "good enough" for most applications. Yes, all the hash algorithms can result in collisions, but the possibility of such a collision coming out of our random session generator is vanishingly small. With regards to the entropy problem, we may have a session hash that has perhaps 32 bits of entropy, perhaps from a /dev/urandom seed. It's possible for an attacker to walk through all these values, push them through our hash function, generate a potential session ID, and present it to our server. However: 1) It would be obvious an attack is taking place, with up to 2^32 requests being presented to our server. 2) It would take a long time. Even if an attacker could present 100 hashes per second, it would take almost 500 days to walk the entire keyspace, although for a service with many active sessions, a collision could occur much sooner. 3) They need to hit a hash that's valid at the time it's presented. If sessions time out rapidly, then even walking through the entire keyspace may not result in a hit. 4) The session the attacker gains access to may not be very valuable, as it will almost always be a random user. 5) The service may still require a password before revealing credit card details, transferring money, changing delivery addresses, etc. 6) The service may invalidate a session if it sees the IP address, browser string, etc change, even though the session is active. 7) In most cases, it's much easier to just sniff a hash off the wire if not encrypted, or use other exploits to compromise the user. It's worth noting that tokens with poor randomness stop being "good enough" when you start having lots of sessions, or sessions which are active for a long time, or a very valuable prize for breaking a session. I'd expect the session generation for on-line banking to contain significantly more entropy, and be significantly more paranoid than the session generation for my delicious bookmarks. 
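A back-of-envelope check of those figures (the guess rate and the number of live sessions are assumptions):

    my $keyspace = 2 ** 32;                   # 32 bits of seed entropy
    my $rate     = 100;                       # guesses per second
    printf "full walk: ~%.0f days\n", $keyspace / $rate / 86_400;       # ~497 days

    my $active = 10_000;                      # hypothetical number of live sessions
    printf "expected first hit: ~%.0f minutes\n",
        $keyspace / ($rate * $active) / 60;                             # ~72 minutes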
Heck, even eBay wants your password via https whenever you do something that an attacker may even find modestly valuable (selling/buying/changing details). Having said all that, we're going to generate tokens, and we have the stated goals of wanting them to be unique, and wanting them to be hard to guess. I don't see there being much harm in making sure they're absolutely unique, and *really* hard to guess if that doesn't cost us very much[1]. Cheerio, Paul [2] As Daniel has pointed out, blocking for entropy is likely to be costing us too much, so asking Crypt::Random to be non-blocking is a great default. -- Paul Fenwick | http://perltraining.com.au/ Director of Training | Ph: +61 3 9354 6001 Perl Training Australia | Fax: +61 3 9354 2681 From simon at unisolve.com.au Wed May 28 22:16:01 2008 From: simon at unisolve.com.au (Simon Taylor) Date: Thu, 29 May 2008 15:16:01 +1000 Subject: [Melbourne-pm] Internationali[sz]ation In-Reply-To: <42321ee20805282123m78402adcgf78dafe4f4d307e6@mail.gmail.com> References: <42321ee20805282123m78402adcgf78dafe4f4d307e6@mail.gmail.com> Message-ID: <483E3C11.6060908@unisolve.com.au> Hello Patrick, > Hello my fellow monguers, ;-) > I've been designing the API for some code I'm contributing to an open > source project (WebGUI), and I've been mulling over the use of English > alternative spelling in my code/documentation/file names/etc.. > > The majority of developers on this project are based in America, and > while I'm no zealot when it comes to preserving the Queen's English I > do find it takes a certain amount of effort to not start convulsing > whenever I encounter the word "instanciate" in the source. > > So, I'm wondering what to do in my code. My heart just isn't in the > 'z' and I miss the absent 'u's, but then again I've long since gotten > used to writing "color" tags in html, so should I just bite the bullet > and name my "authorization" methods accordingly? What do you do when > you are involved in international projects? Should I just shut my eyes > to it and think of the developers from non English-speaking > backgrounds? Or just shut my eyes and think about Engla.. oh wait > that's not right. I have thought about this long and hard and my 10c worth is that our cultural inclination to feel protective of British spelling is mis-placed. Of course it *is* right to do all we can to stop US culture rampaging across the things we hold dear, whether it's our local films, the businesses we buy from, the authors we read or our football. But I'm firmly of the view that we could switch to US English tomorrow and not miss out on a single cultural thing that matters. Culture is substrate-neutral, and the things that make our culture better, (IMHO), don't rely on spelling to have the effect they do. It's a peculiar quirk of history that British English has ended up being the odd cousin, with it's French influences and quirky spellings, whilst US English is by far cleaner and more rationale. We moved effortlessly to the metric system because of our culture, (even if we spell 'metre' the French way), and the US has not manged this transition because of theirs. But no matter how you dice it, their spelling is better.... 
- Simon From peter at machell.net Wed May 28 22:37:16 2008 From: peter at machell.net (Peter Machell) Date: Thu, 29 May 2008 15:37:16 +1000 Subject: [Melbourne-pm] Internationali[sz]ation In-Reply-To: <483E3C11.6060908@unisolve.com.au> References: <42321ee20805282123m78402adcgf78dafe4f4d307e6@mail.gmail.com> <483E3C11.6060908@unisolve.com.au> Message-ID: On 29/05/2008, at 3:16 PM, Simon Taylor wrote: > We moved effortlessly to the metric system because of our culture, > (even > if we spell 'metre' the French way), and the US > has not manged this transition because of theirs. I don't understand this argument. Nor do I think we moved effortlessly. I'm 36 and was taught metric at school, but still think of small distances in feet and inches and long ones in kilometres, a result of the culture I was raised in. > But no matter how you dice it, their spelling is better.... My opinion is better than yours? Color and Mom are horrible and don't make phonetic sense without the US accent (not that their aren't lots of similar English examples). I can't help but correct Program every time I see it, not to mention almost anything with a z in it. Anyway I agree that our culture wouldn't suffer much if we all submitted to the US way, but isn't the ultimate end of that line of thinking complete Americanizzzation? regards, Peter. From thogard at abnormal.com Wed May 28 22:38:31 2008 From: thogard at abnormal.com (Tim Hogard) Date: Thu, 29 May 2008 05:38:31 +0000 (UTC) Subject: [Melbourne-pm] An intermittent problem with open for append In-Reply-To: <483DDAC6.40408@aapt.net.au> Message-ID: <200805290538.m4T5cVcd052738@v.abnormal.com> > > Paul Fenwick wrote: > > Oops, I meant to qualify that with "provided your records are less than the > > atomic buffer size". You're quite right that if we hit records bigger than > > our atomic buffer, we have to move to locking. > > > Very interesting thread. I had no idea that the kernel can mangle the > output based on block size. However, at least in my tests, there will > be no data lost, but it may be mangled? Early Unix systems had 2 atomic file system calls, one was "open exclusive" and the other was "this data always gets appended". The second needs to be guaranteed for system logs where you will have several things writing to a common log file. In that case the order isn't critical but getting the data in the file is. My solaris internal book describes a write with the O_APPEND bit set as simply setting the "where to write the next block pointer" to the save value as the "file length" before the write. While this is like seeking to the end before a write, it is in the atomic operation section of write so it was a trivial way to guarantee that it works properly every time. What I'm doing about the original problem is: 1) fixing the || vs or vs () bug. 2) logging a failed open >> to someplace else. 3) checking for unexpected signals that might be showing up. 4) rebuilding perl 5.8.8 without the perl's IO abstraction layer. We have two CGIs, one uses /usr/local/bin/perl (5.8.8) and the other uses /usr/bin/perl (5.003) and the only other difference in the perl CGI scrips is one includes the base64 code (which is built into 5.8.8) and uses that function. The one has been running for at least 9 years and millions of times with out every showing this problem, yet the other one get hit about 20,000 times and has had this problem 4 times in the last month. 
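On the "checking for unexpected signals" item, one way to see whether the CGI is being hit by a signal is to install handlers that log to the Apache error log before exiting; a small sketch (the signal list and the message format are illustrative):

    for my $sig (qw(HUP INT PIPE TERM)) {
        $SIG{$sig} = sub {
            print STDERR "audit CGI caught SIG$_[0] at ", scalar localtime, "\n";
            exit 1;
        };
    }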
They are both called from the very same apache and the working one still gets used far more often than the broken one. The only major difference is version of perl and other things that are not in scope. I think there is a problem in perlapio. The comments so far seem to fit into one of the following groups: 1) the open || report error The failed open won't report an error... at that location but will later down where the code with || doesn't have any commas in it. 2) >> is overwriting This is to a local Solaris 5 ufs file system. I don't think thats a problem or lots of other people would have all sorts of odd problems. If it was NFS, then I could see it being a problem or if the records were out of order. But since this is just and audit log, all it means is someone has to unscramble the data. (on the working one, I've never seen the data scrambled and it includes about 2k worth of data) 3) Apache signals 4) program not flushing These two might be the case but I do set the $| to flush and it doesn't happen with perl 5.003. It still could be a race condition. And it doesn't rule out those issues working differently in the new perl io abstration layer. Thanks for everyones help. -tim From guy at alchemy.com.au Wed May 28 23:11:57 2008 From: guy at alchemy.com.au (Guy Morton) Date: Thu, 29 May 2008 16:11:57 +1000 Subject: [Melbourne-pm] Internationali[sz]ation In-Reply-To: References: <42321ee20805282123m78402adcgf78dafe4f4d307e6@mail.gmail.com> <483E3C11.6060908@unisolve.com.au> Message-ID: ize is arguably more correct, and is not really an americanisation. That said, I favour ise, probably for the reasons outlined here: http://www.askoxford.com/asktheexperts/faq/aboutspelling/ize Guy On 29/05/2008, at 3:37 PM, Peter Machell wrote: > On 29/05/2008, at 3:16 PM, Simon Taylor wrote: > >> We moved effortlessly to the metric system because of our culture, >> (even >> if we spell 'metre' the French way), and the US >> has not manged this transition because of theirs. > > I don't understand this argument. Nor do I think we moved > effortlessly. I'm 36 and was taught metric at school, but still think > of small distances in feet and inches and long ones in kilometres, a > result of the culture I was raised in. > >> But no matter how you dice it, their spelling is better.... > > My opinion is better than yours? Color and Mom are horrible and don't > make phonetic sense without the US accent (not that their aren't lots > of similar English examples). I can't help but correct Program every > time I see it, not to mention almost anything with a z in it. > > Anyway I agree that our culture wouldn't suffer much if we all > submitted to the US way, but isn't the ultimate end of that line of > thinking complete Americanizzzation? > > regards, > Peter. > _______________________________________________ > Melbourne-pm mailing list > Melbourne-pm at pm.org > http://mail.pm.org/mailman/listinfo/melbourne-pm From daniel at rimspace.net Wed May 28 23:36:40 2008 From: daniel at rimspace.net (Daniel Pittman) Date: Thu, 29 May 2008 16:36:40 +1000 Subject: [Melbourne-pm] Data::Token In-Reply-To: (Scott Penrose's message of "Thu, 29 May 2008 13:41:55 +1000") References: <11A7771B-466B-4FFF-8787-C4859309E651@dd.com.au> <483E006A.7070902@perltraining.com.au> <87tzghx1pm.fsf@rimspace.net> Message-ID: <878wxtvcp3.fsf@rimspace.net> Scott Penrose writes: >> SHA1 and MD5 are in the same family, and successful attacks on (full) >> SHA1 have reduced collision generation to 2^69 trials from 2^80. 
>> >> Plan on replacing SHA1 everywhere within the next ten years, and on >> needing to step up to SHA256 or SHA512 in the interim, at the very >> least. > > All the above is correct but not quite for this case. MD5 and SHA1 and > up all just decrease how likely collisions are to help against bruit > force attack - but for signatures against text. I am not quite convinced that your response is correct. The issue is that finding two inputs that generate colliding outputs. The document signature case is a situation where the signed document can be replaced with a colliding document and the signature will still validate. > Remember that this is just a way of hiding the secret. What it needs > to do is make it so that you need 1000s or more of guesses to get the > next entry. Where as doing time (or as shown even rand(time)) is > predictable. I guess it depends on what you are using the token for, as Paul correctly pointed out -- MD5 and SHA1 distribute the entropy and make it harder to guess the next item in the sequence, but they don't add any entropy. time, or rand(time), has very, very little entropy, and can often be trivially determined for a network server. > One of the reasons Cryptography is so hard is you can't apply one rule > to another. The MD5 birthday attack scenarios are useful only against > documents you are signing. Where as this is just a one way hashing > algorithm I need. I could probably use crypt :-) (not really). As far as I can tell your design is vulnerable to token forgery -- if someone can mint tokens at will they can abuse your service, correct? Ah. Wait. You are storing generated tokens, so only something that was both generated on the server *and* recorded will be valid, right? Yes, on that basis this isn't a threat: tokens that might be valid but are not minted by your server are not going to grant any access. If you /didn't/ store the token information[1] then you are vulnerable to collisions, on the basis that: 1. Your UUID is (sufficiently) predictable, or you would just use that. 2. Your token comprises sha1(uuid . secret) 3. The attacker can read the source code and determine the model you are using for generating tokens.[2] On this basis we can assume that the attacker can successfully forge UUID generation from your site, then they can find any value secret' such that: sha1(uuid . secret) == sha1(uuid . secret') At that stage they can mint new tokens and abuse your services at will. Hrm. Even with token recording that means they could potentially abuse your service by speculatively generating tokens and then submitting input in the hope that a genuine matching token will be generated. It would probably be easier to just fetch tokens from your system though. :) [...] > On another topic - Security of using MD5 - it seems that every module > I find on the net from Java to PHP to Python to Perl are using what I > originally wrote - MD5 of a random string (usually time) against a > unique number (often just generated with a sequence, time or > combination of time, ip etc). > > The most common PHP code is > $token = md5(uniqid(rand(), TRUE)); > > uniqid is equiv to Data::UUID (different way of calculating). > > Even the praised Apache::Session and CGI::Session just use: > > md5_hex($$, time(), rand(time)); > > I can't find a single reference on the net that says this is insecure > as has been documented in this thread. Security is relative: it would be much easier for me to predict the Apache::Session session ID value than your Data::Token value. 
It is almost certainly easier to find some other security hole, though, than to brute force that. Social engineering, paying pennies per spam to humans in inexpensive locations, and other technical threats are much more profitable than hacking cryptography today. > Some people raise in threads that you should use SHA1 and in each case > it is said not to be required. Well, I just read checked the code for Apache::AuthCookie to make sure it is insecure, and it is vulnerable to exactly the risk here: It authenticates the values in the cookie with a secret, where the secret is absolutely vulnerable to the generation of collisions. > So the question is: > > 1) Am I missing the threads on the net > 2) Are we jumping to the wrong conclusion because we are mixing document > signature faking with unpredictability > 3) Is this really a problem and we are the first to really solve it. > > My gut is now telling me (2). If it is not then almost every single > site on the internet is now vulnerable. The answer is kind of 3: it is really a problem, with a caveat, and we are absolutely not the first people to solve it.[3] However... [...] Paul Fenwick writes: > (2a). The ability to engineer collisions with MD5 can be considered a > non-issue because we're not signing documents, the only requirement is that > the hash is *hard to guess*. ...this is sometimes the case, and sometimes it isn't. When it isn't (Apache::AuthCookie) then the site really is vulnerable, but. Again, the but is "in the real world...", where the cost of exploiting the MD5 weakness is much higher than exploiting some other weakness. So, yeah. In some cases this doesn't matter, for this reason, but in others it /does/ matter theoretically, but not practically for some years yet. > In this sense, we're using MD5 as a way to distribute our entropy > throughout a reasonably long string. MD5 (or SHA1, or ROT13) won't > increase the entropy that we have, but it can increase the work an > attacker needs to do, and make it less obvious with regards to the > data we're using to generate the hash to begin with. For Data::Token this is probably enough, as Paul says. [...] > It's worth noting that tokens with poor randomness stop being "good enough" > when you start having lots of sessions, or sessions which are active for a > long time, or a very valuable prize for breaking a session. I'd expect the > session generation for on-line banking to contain significantly more entropy, > and be significantly more paranoid than the session generation for my > delicious bookmarks. You would hope, eh? My online banking, which is some of the best I have seen, uses an unsalted SHA1 transformation, making my password vulnerable to a "rainbow table" attack if the SSL protection ever fails. Oh, well. I guess they didn't attend classes the day that the risks of that were discussed. [...] > Having said all that, we're going to generate tokens, and we have the > stated goals of wanting them to be unique, and wanting them to be hard > to guess. I don't see there being much harm in making sure they're > absolutely unique, and *really* hard to guess if that doesn't cost us > very much[1]. For the use case this is probably a more reasonable approach than my more secure comments. Regards, Daniel Footnotes: [1] Which, to my eye, looks like an invitation to an attacker to consume unbounded storage on your server, baring other limitations, but you did note that you address that threat outside the token system in a previous post. 
[2] This is probably the most unlikely part of this threat model, but
    essential if you want to consider any real uniqueness from the
    token.

[3] My knowledge of this comes from cryptographic literature, and I
    didn't design my own security protocol, because I am not /that/
    knowledgeable in the area.

From scottp at dd.com.au Thu May 29 03:50:34 2008
From: scottp at dd.com.au (Scott Penrose)
Date: Thu, 29 May 2008 20:50:34 +1000
Subject: [Melbourne-pm] Data::Token
In-Reply-To: <878wxtvcp3.fsf@rimspace.net>
References: <11A7771B-466B-4FFF-8787-C4859309E651@dd.com.au> <483E006A.7070902@perltraining.com.au> <87tzghx1pm.fsf@rimspace.net> <878wxtvcp3.fsf@rimspace.net>
Message-ID: <0774BDC7-E00B-4FFD-810C-33085CD7EC10@dd.com.au>

> At that stage they can mint new tokens and abuse your services at
> will.

You are no longer talking about tokens. Tokens are unpredictable
numbers used for things like authentication and session tracking. They
must be stored. What you are talking about is encrypting cookies, or
encoding other data into the data you send back. What we want here is
just an ID such that if someone tries, say, 100,000 guesses they would
still fail.

As for your entropy comments, you are repeating what I said: the
MD5/SHA1 is just a way of hiding the secret. Again, look at every
implementation on the net from CGI::Session to PHP - they pretty much
all use md5_hex(time, rand(time));

Now I agree, that is predictable, so we fix the randomness and use a
more unique key. But using SHA1 instead of MD5 does not provide any
greater security for tokens - except that they are just longer. But I
think I will use it anyway just to make it a little safer.

> Hrm. Even with token recording that means they could potentially abuse
> your service by speculatively generating tokens and then submitting
> input in the hope that a genuine matching token will be generated.

Sorry to ask this, Daniel, but have you read any of my previous
replies? This has all been discussed and pointed out. Any token scheme
in the world suffers from the above. Sure you can keep making the space
bigger and harder to hit, but in the end you really must push back on
failed lookups. The easiest way to do this is as old as password entry,
and that is just to add more and more delay. Remember there is no data
in this token - it is just a pointer to the local data.

> Security is relative: it would be much easier for me to predict the
> Apache::Session session ID value than your Data::Token value.

Yeah, and Apache::Session (likewise Apache::AuthCookie) is used for
authentication.

> It is almost certainly easier to find some other security hole,
> though, than to brute force that. Social engineering, paying pennies
> per spam to humans in inexpensive locations, and other technical
> threats are much more profitable than hacking cryptography today.

Sounds like you are arguing in circles :-)

>
>> Some people raise in threads that you should use SHA1 and in each
>> case it is said not to be required.
>
> Well, I just checked the code for Apache::AuthCookie to see whether it
> is insecure, and it is vulnerable to exactly the risk here:

Yeah, it seems just about everything is using at best a process ID +
time + rand(time), then taking the MD5 - not great.

> It authenticates the values in the cookie with a secret, where the
> secret is absolutely vulnerable to the generation of collisions.

Yeah. So what we are discussing here is great. We are making a better
token generator; then we will encourage Apache::AuthCookie,
CGI::Session and Apache::Session to use it.
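As a loose illustration of that "push back on failed lookups" idea --
the %failures hash and the lookup_token() call below are made-up
placeholders, not anything from Data::Token -- the back-off might look
roughly like this:

    my %failures;          # failed guesses per client, kept in memory

    sub check_token {
        my ($client, $token) = @_;
        my $record = lookup_token($token);   # however the real store works
        if ($record) {
            delete $failures{$client};       # a good token clears the slate
            return $record;
        }
        # Each bad guess from the same client makes the next one slower.
        my $count = ++$failures{$client};
        sleep(2 ** ($count > 6 ? 6 : $count));   # cap the delay at 64 seconds
        return;
    }

The point is only that guessing becomes more expensive than asking the
server for a genuine token, which is what the thread keeps coming back
to.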
>> So the question is:
>>
>> 1) Am I missing the threads on the net
>> 2) Are we jumping to the wrong conclusion because we are mixing
>>    document signature faking with unpredictability
>> 3) Is this really a problem and we are the first to really solve it.
>>
>> My gut is now telling me (2). If it is not then almost every single
>> site on the internet is now vulnerable.
>
> The answer is kind of 3: it is really a problem, with a caveat, and we
> are absolutely not the first people to solve it.[3]

:-)

What I really need to do now is capture this discussion into my docs so
future people can understand the reasoning. I might need some help,
especially from you Daniel and Paul, because I didn't attend those
lectures :-) so I may miss something important.

In the meantime I think it is worth adding SHA1 and better random
secret generators as discussed. That means the ID is 160 bits (40 hex
characters) though, but I still think that is reasonable.

Thanks

Scott

From daniel at rimspace.net Thu May 29 03:56:30 2008
From: daniel at rimspace.net (Daniel Pittman)
Date: Thu, 29 May 2008 20:56:30 +1000
Subject: [Melbourne-pm] Data::Token
In-Reply-To: <0774BDC7-E00B-4FFD-810C-33085CD7EC10@dd.com.au> (Scott Penrose's message of "Thu, 29 May 2008 20:50:34 +1000")
References: <11A7771B-466B-4FFF-8787-C4859309E651@dd.com.au> <483E006A.7070902@perltraining.com.au> <87tzghx1pm.fsf@rimspace.net> <878wxtvcp3.fsf@rimspace.net> <0774BDC7-E00B-4FFD-810C-33085CD7EC10@dd.com.au>
Message-ID: <87prr5750h.fsf@rimspace.net>

Scott Penrose writes:

>> At that stage they can mint new tokens and abuse your services at
>> will.
>
> You are no longer talking about tokens. Tokens are unpredictable
> numbers used for things like authentication and session tracking.

That is a fair point.

[...]

> Sorry to ask this, Daniel, but have you read any of my previous
> replies?

Unfortunately my fairly persistent cold seems to be acting up again, so
the odds of my having missed the point seem high. Sorry. It wasn't my
intention to waste your time.

Regards,
        Daniel

From scottp at dd.com.au Thu May 29 04:01:50 2008
From: scottp at dd.com.au (Scott Penrose)
Date: Thu, 29 May 2008 21:01:50 +1000
Subject: [Melbourne-pm] Data::Token
In-Reply-To: <87prr5750h.fsf@rimspace.net>
References: <11A7771B-466B-4FFF-8787-C4859309E651@dd.com.au> <483E006A.7070902@perltraining.com.au> <87tzghx1pm.fsf@rimspace.net> <878wxtvcp3.fsf@rimspace.net> <0774BDC7-E00B-4FFD-810C-33085CD7EC10@dd.com.au> <87prr5750h.fsf@rimspace.net>
Message-ID: <537E5370-1873-40E7-A144-1DE6B5ED7F9B@dd.com.au>

On 29/05/2008, at 8:56 PM, Daniel Pittman wrote:
>
> Unfortunately my fairly persistent cold seems to be acting up again,
> so the odds of my having missed the point seem high. Sorry. It wasn't
> my intention to waste your time.

Your feedback has been great. And I hope that I can get you to review
my documentation when I update it.
Ta

Scott

From daniel at rimspace.net Thu May 29 04:06:41 2008
From: daniel at rimspace.net (Daniel Pittman)
Date: Thu, 29 May 2008 21:06:41 +1000
Subject: [Melbourne-pm] Data::Token
In-Reply-To: <537E5370-1873-40E7-A144-1DE6B5ED7F9B@dd.com.au> (Scott Penrose's message of "Thu, 29 May 2008 21:01:50 +1000")
References: <11A7771B-466B-4FFF-8787-C4859309E651@dd.com.au> <483E006A.7070902@perltraining.com.au> <87tzghx1pm.fsf@rimspace.net> <878wxtvcp3.fsf@rimspace.net> <0774BDC7-E00B-4FFD-810C-33085CD7EC10@dd.com.au> <87prr5750h.fsf@rimspace.net> <537E5370-1873-40E7-A144-1DE6B5ED7F9B@dd.com.au>
Message-ID: <87lk1t74ji.fsf@rimspace.net>

Scott Penrose writes:

> On 29/05/2008, at 8:56 PM, Daniel Pittman wrote:
>>
>> Unfortunately my fairly persistent cold seems to be acting up again, so
>> the odds of my having missed the point seem high. Sorry. It wasn't my
>> intention to waste your time.
>
> Your feedback has been great. And I hope that I can get you to review
> my documentation when I update it.

I do my best. I still feel that I am missing something, probably in how
the tokens are going to be used, that makes them less security critical
than I perceive.

Hopefully updated documentation will make that clear, though, by
discussing the sort of role where they are applicable. It certainly
wouldn't /hurt/ compared to many of the available modules, which simply
don't discuss that.

Regards,
        Daniel

From scottp at dd.com.au Thu May 29 06:14:12 2008
From: scottp at dd.com.au (Scott Penrose)
Date: Thu, 29 May 2008 23:14:12 +1000
Subject: [Melbourne-pm] Data::Token documented
Message-ID: <8A6A111E-2B19-49E5-8465-2BF6B076403F@dd.com.au>

I have had a go documenting the discussion and outcomes:

http://scott.dd.com.au/wiki/Data-Token

Some of it I will put into the module directly, but there is too much
there for the whole thing.

It is my first attempt, but feel free to feed back any changes,
directly or on list.

Scott

From scottp at dd.com.au Thu May 29 17:08:54 2008
From: scottp at dd.com.au (Scott Penrose)
Date: Fri, 30 May 2008 10:08:54 +1000
Subject: [Melbourne-pm] Alternatives to Crypt::Random ?
Message-ID: <23DB1E95-785A-4AC9-9C30-B6FBDB48A684@dd.com.au>

Hey Guys

After all our discussion about using better randomness, I am having
major issues with Crypt::Random. It says in the doc it does not depend
on Math::Pari, but it does. Unfortunately I can't get Math::Pari to
install.

This unfortunately moves the module from useful and usable into too
difficult for the average person to install.

Ah, what is worse, the Crypt::Random on CPAN requires a version of
Math::Pari that is not on CPAN.

Scott

From akievsky at yahoo.com.au Thu May 29 17:14:22 2008
From: akievsky at yahoo.com.au (Andres Kievsky)
Date: Thu, 29 May 2008 17:14:22 -0700 (PDT)
Subject: [Melbourne-pm] Data::Token documented
Message-ID: <467016.14431.qm@web63201.mail.re1.yahoo.com>

> I have had a go documenting the discussion and outcomes:
>
> http://scott.dd.com.au/wiki/Data-Token
>
> Some of it I will put into the module directly, but there is too much
> there for the whole thing.
>
> It is my first attempt, but feel free to feed back any changes,
> directly or on list.

The documentation is excellent. I wish I had it years ago.

"You can also change the token on each request. This is extreme and has
quite a bit of overhead but useful. Alternatives may also be to change
it over short periods, like 5 minutes."

I wholeheartedly agree with that practice :)

Regards,
- Andres Kievsky.

Get the name you always wanted with the new y7mail email address.
www.yahoo7.com.au/mail
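For a sense of what the quoted advice about changing the token over
short periods might look like in practice, here is a rough sketch; the
hash-based session store and the new_token() generator are placeholders,
not anything from Scott's wiki page:

    use constant ROTATE_AFTER => 5 * 60;    # rotate tokens every five minutes

    sub maybe_rotate {
        my ($store, $token) = @_;
        my $session = $store->{$token} or return $token;
        return $token if time() - $session->{issued} < ROTATE_AFTER;

        my $fresh = new_token();                       # hypothetical generator
        $session->{issued} = time();
        $store->{$fresh} = delete $store->{$token};    # old token stops working
        return $fresh;                                 # hand the new one back
    }

Rotating per request works the same way, just without the age check, at
the cost of a store update on every hit.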
From daniel at rimspace.net Fri May 30 02:50:03 2008
From: daniel at rimspace.net (Daniel Pittman)
Date: Fri, 30 May 2008 19:50:03 +1000
Subject: [Melbourne-pm] Alternatives to Crypt::Random ?
In-Reply-To: <23DB1E95-785A-4AC9-9C30-B6FBDB48A684@dd.com.au> (Scott Penrose's message of "Fri, 30 May 2008 10:08:54 +1000")
References: <23DB1E95-785A-4AC9-9C30-B6FBDB48A684@dd.com.au>
Message-ID: <8763sw3yus.fsf@rimspace.net>

Scott Penrose writes:

> After all our discussion about using better randomness, I am having
> major issues with Crypt::Random. It says in the doc it does not depend
> on Math::Pari, but it does. Unfortunately I can't get Math::Pari to
> install.
>
> This unfortunately moves the module from useful and usable into too
> difficult for the average person to install.
>
> Ah, what is worse, the Crypt::Random on CPAN requires a version of
> Math::Pari that is not on CPAN.

Joy!

I presume you are not satisfied with the prerequisites being in the
various common distributions; I wouldn't blame you.

Math::TrulyRandom looks like an acceptable fallback if you can't read
from /dev/random on your platform, although it involves XS code, and is
bound to be fairly slow.

Math::Random::MT::Auto looks likely to be the best choice, as it
provides a wide range of initialization functions as well as a good
PRNG that will deliver considerably higher quality results than the
built-in code.

Otherwise, looks like you get to write it yourself. Yay.

Regards,
        Daniel

From jarich at perltraining.com.au Fri May 30 04:48:51 2008
From: jarich at perltraining.com.au (Jacinta Richardson)
Date: Fri, 30 May 2008 21:48:51 +1000
Subject: [Melbourne-pm] Data::Token
In-Reply-To:
References: <11A7771B-466B-4FFF-8787-C4859309E651@dd.com.au> <483E006A.7070902@perltraining.com.au> <87tzghx1pm.fsf@rimspace.net>
Message-ID: <483FE9A3.6010707@perltraining.com.au>

Scott Penrose wrote:

> So the question is:
>
> 1) Am I missing the threads on the net
> 2) Are we jumping to the wrong conclusion because we are mixing
>    document signature faking with unpredictability
> 3) Is this really a problem and we are the first to really solve it.

I think it's 3, insofar as many of these modules were written before
17th August 2004 (which is when Xiaoyun Wang, Dengguo Feng, Xuejia Lai,
and Hongbo Yu announced collisions for the full MD5 space; their
analytical attack was reported to take only one hour on an IBM p690
cluster). Prior to this, the general assumption seemed to be that
engineering a collision would be really hard, and finding a collision
by accident would be next to impossible.

Since not everyone keeps up with cryptography news, people continue to
use md5 despite its issues. This is not necessarily because it's a good
idea. It may even be as simple as this: when people think of hashing
algorithms, the first one that comes to mind is md5.

I expect that for the purposes of generating tokens, particularly with
the use of a salt, these issues aren't really a problem. However, if
you do so you are choosing to provide a less secure token than you
could otherwise. I think in general, using md5 for anything to do with
security, or with anything which might even be vaguely connected with
the idea of security, is looking like a bad idea.

Regarding SHA1 and SHA2, "the security of SHA-1 has been somewhat
compromised by cryptography researchers.
Although no attacks have yet been reported on the SHA-2 variants, they
are algorithmically similar to SHA-1 and so efforts are underway to
develop improved alternative hashing algorithms."
( http://en.wikipedia.org/wiki/SHA_hash_functions )

All the best,

     J

From daniel at rimspace.net Fri May 30 06:42:03 2008
From: daniel at rimspace.net (Daniel Pittman)
Date: Fri, 30 May 2008 23:42:03 +1000
Subject: [Melbourne-pm] Data::Token
In-Reply-To: <483FE9A3.6010707@perltraining.com.au> (Jacinta Richardson's message of "Fri, 30 May 2008 21:48:51 +1000")
References: <11A7771B-466B-4FFF-8787-C4859309E651@dd.com.au> <483E006A.7070902@perltraining.com.au> <87tzghx1pm.fsf@rimspace.net> <483FE9A3.6010707@perltraining.com.au>
Message-ID: <87iqwv3o44.fsf@rimspace.net>

Jacinta Richardson writes:

> Scott Penrose wrote:
>
>> So the question is:
>>
>> 1) Am I missing the threads on the net
>> 2) Are we jumping to the wrong conclusion because we are mixing
>>    document signature faking with unpredictability
>> 3) Is this really a problem and we are the first to really solve it.
>
> I think it's 3, insofar as many of these modules were written before
> 17th August 2004 (which is when Xiaoyun Wang, Dengguo Feng, Xuejia
> Lai, and Hongbo Yu announced collisions for the full MD5 space; their
> analytical attack was reported to take only one hour on an IBM p690
> cluster). Prior to this, the general assumption seemed to be that
> engineering a collision would be really hard, and finding a collision
> by accident would be next to impossible.
>
> Since not everyone keeps up with cryptography news, people continue to
> use md5 despite its issues. This is not necessarily because it's a
> good idea. It may even be as simple as this: when people think of
> hashing algorithms, the first one that comes to mind is md5.
>
> I expect that for the purposes of generating tokens, particularly with
> the use of a salt, these issues aren't really a problem. However, if
> you do so you are choosing to provide a less secure token than you
> could otherwise. I think in general, using md5 for anything to do with
> security, or with anything which might even be vaguely connected with
> the idea of security, is looking like a bad idea.

Mmmm. I am still trying to work out how to respond to the documentation
Scott wrote, but my general feeling is that these tokens *are* used in
a security sensitive context, and that token forgery is a genuine risk.

As I said previously, though, it probably isn't a significant risk
compared to other threats to your deployment: breaking an MD5 session
token hash isn't (yet) an economically viable way for most attackers to
abuse available services.

On that basis the continued use of (compromised) MD5 or (soon to be
compromised) SHA1 for the tokens is probably not sufficiently worrying
to have to rush into changing them... yet.

Like Jacinta, I also expect that Data::Token will be used in security
related areas -- Apache::AuthCookie, for example -- even if the
documentation *explicitly* states that it isn't suitable.

On that basis planning for MD5 and SHA1 cracking being economically
viable[1] one day, and having the module cope, is probably a good move.

Regards,
        Daniel

Footnotes:
[1] If breaking CAPTCHA images is economically viable then stealing
    sessions by brute-force (or worse) attacks on the token identifying
    them is going to happen one of these days. One resource the
    attackers have in spades is CPU time.
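One way of reading "having the module cope" is to keep the digest
pluggable, so SHA-1 can be swapped out on the day it stops being enough.
A speculative sketch only -- the function names below are assumptions,
not Data::Token's real interface:

    use Digest::SHA qw(sha1_hex sha256_hex);

    my $digest = \&sha1_hex;       # today
    # $digest = \&sha256_hex;      # the day SHA-1 is no longer good enough

    sub make_token {
        my ($uuid, $secret) = @_;
        return $digest->($uuid . $secret);
    }

Callers never name the algorithm, so switching it later changes one line
rather than every piece of code that mints a token.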
From thogard at abnormal.com Fri May 30 22:11:43 2008
From: thogard at abnormal.com (Tim Hogard)
Date: Sat, 31 May 2008 05:11:43 +0000 (UTC)
Subject: [Melbourne-pm] Data::Token
In-Reply-To: <87iqwv3o44.fsf@rimspace.net>
Message-ID: <200805310511.m4V5BhXo069679@v.abnormal.com>

I think the real problem is everyone is missing the point of tokens.
They are a risk mitigation device, which means the question becomes
"what is the risk?" I don't care if you're using sha-1, sha-2, md5,
md4, md1, crypt or crc, as none of them change the security of the
token, they only change the level of risk.

Key aspects of tokens are:
1) they must appear random (as in test 100,000 of them for randomness)
2) they must not be guessable (this bit is hard)
3) there must be a process in place to lock out users who attempt to
   offer bad tokens.

Number 3 is the key. If I can give out tokens that use a crc-16 as a
hash, then I can offer one of 65,536 random numbers as a hash to your
system and will have a 1 in 65,536 (about 0.0015%) chance of getting
in, if my other details are ok. If your system lets me send an average
of 32,000 hashes and then lets me in, you have a major problem.

Another issue you need to be concerned with is looking at the
one-to-many relationship the other way around. Assume your token ends
up being a 4 digit PIN-like number. It would take an average of 5,000
guesses to get your PIN, but if I was guessing 0001 today and 0002
tomorrow, for many users at once, that may change the game. Think about
hijacking a grocery store's PIN pad system and just trying 0001 for
everyone the first day and 0002 the next and so on... If they get 5,000
customers, how many valid PINs will you have by the end of the month?
Now consider that problem in the coordinated many-to-many DDOS
attack... each of a million hosts is offering 3 bad tokens to your
system. What are the odds then? The solution is that the user needs to
hand you a token and other ID with every transaction.

Even if you're hitting high value bank accounts, what is the cost risk
if a valid token is hit? Once you figure in costs of insurance, odds of
reversing transactions and time wasted, it doesn't justify the level of
hashes typically used from an actuarial point of view, and the only
reason it's so secure is that big number crypto is cheap.

If your token system doesn't have good odds of keeping people out when
its hash just mirrors the input data, you need to find a better way.

-tim

>
> Jacinta Richardson writes:
> > Scott Penrose wrote:
> >
> >> So the question is:
> >>
> >> 1) Am I missing the threads on the net
> >> 2) Are we jumping to the wrong conclusion because we are mixing
> >> document signature faking with unpredictability
> >> 3) Is this really a problem and we are the first to really solve it.
> >
> > I think it's 3 in so far that many of these modules were written
> > before 17th August 2004 (which is when Xiaoyun Wang,Dengguo Feng,
> > Xuejia Lai, and Hongbo Yu announced collisions for the full MD5 space
> > (Their analytical attack was reported to take only one hour on an IBM
> > p690 cluster.)). Prior to this, the general assumption seemed to be
> > that engineering a collision would be really hard, and finding a
> > collision by accident would be next to impossible.
> >
> > Since not everyone keeps up with cryptography news, people continue to
> > use md5 despite its issues. This is not necessarily because it's a
> > good idea. It may even be as simple as when people think of hashing
> > algorithms the first one that comes to mind is md5.
> > > > I expect that for the purposes of generating tokens, particularly with > > the use of a salt, that these issues aren't really a problem. > > However, if you do so you are choosing to provide a less secure token > > than you could otherwise. I think in general, using md5 for anything > > to do with security or with anything which might even be vaguely > > connected with the idea of security, is looking like a bad idea. > > Mmmm. I am still trying to work out how to respond to the documentation > Scott wrote, but my general feeling is that these tokens *are* used in a > security sensitive context, and that token forgery is a genuine risk. > > As I said previously, though, it probably isn't a significant risk > compared to other threats to your deployment: breaking an MD5 session > token hash isn't (yet) an economically viable way for most attackers to > abuse available services. > > On that basis the continued use of (compromised) MD5 or (soon to be > compromised) SHA1 for the tokens is probably not sufficiently worrying > to have to rush into changing them... yet. > > > Like Jacinta, I also expect that Data::Token will be used in security > related areas -- Apache::AuthCookie, for example -- even if the > documentation *explicitly* states that it isn't suitable. > > On that basis planning for MD5 and SHA1 cracking being economically > viable[1] on day, and having the module cope, is probably a good move. > > Regards, > Daniel > > Footnotes: > [1] If breaking CAPTCHA images is economically viable then stealing > sessions by brute-force (or worse) attacks on the token identifying > them is going to happen one of these days. One resource the > attackers have in spades is CPU time. > > _______________________________________________ > Melbourne-pm mailing list > Melbourne-pm at pm.org > http://mail.pm.org/mailman/listinfo/melbourne-pm >