From alec.clews at gmail.com Thu May 1 02:24:38 2008 From: alec.clews at gmail.com (Alec Clews) Date: Thu, 01 May 2008 19:24:38 +1000 Subject: [Melbourne-pm] Next meeting: 14th May. LIGHTNING TALKS In-Reply-To: <48195B2B.6030700@perltraining.com.au> References: <48192965.7030601@perltraining.com.au> <48193686.9040700@rea-group.com> <48195B2B.6030700@perltraining.com.au> Message-ID: <1209633878.6358.1.camel@seven> If possible I'd love a quick look at how to use git as a client for Subversion? On Thu, 2008-05-01 at 15:54 +1000, Paul Fenwick wrote: > G'day Toby / MPM, > > Toby Corkindale wrote: > > > It's not strictly Perl, nor a 5 min talk, but I mentioned to Paul at the > > last meeting that I could do a quick talk on using Git for source > > control, with a quick run-through of how to use it. > > Oh goodness! You did indeed! Talking about git could be considered a bit > Perlish, since the Perl 5 source is moving to git as its source-control > system. You could make it especially Perlish if you added git-fu on how to > run Perl::Critic or something similar before code gets committed. > > > I'm still happy to do a cut-down version in five minutes though, if > > people are interested? > > You immediately qualify on the 5-minute version because if we're doing > lightning talks, then that's a lightning talk. ;) > > I'd personally love to see git as a longer talk (unless anyone complains), > so I'd propose the git talk as the "feature talk" for the evening, with > lightning talks either before or after. > > Cheerio, > > Paul > From ddick at aapt.net.au Thu May 1 02:25:10 2008 From: ddick at aapt.net.au (David Dick) Date: Thu, 01 May 2008 19:25:10 +1000 Subject: [Melbourne-pm] Next meeting: 14th May. LIGHTNING TALKS In-Reply-To: <48192965.7030601@perltraining.com.au> References: <48192965.7030601@perltraining.com.au> Message-ID: <48198C76.1000801@aapt.net.au> Jacinta Richardson wrote: > The next Melbourne Perl Mongers meeting will be held: > > 6:30pm 14th May > Level 1 > 172 Flinders St > (just opposite Federation Square) > > David Dick and AAPT have kindly volunteered to host us ummm.... actually the sponsor is Remasys Pty Ltd. :) Everyone will get confused if they show up looking for the AAPT offices... :) From jarich at perltraining.com.au Thu May 1 18:43:48 2008 From: jarich at perltraining.com.au (Jacinta Richardson) Date: Fri, 02 May 2008 11:43:48 +1000 Subject: [Melbourne-pm] Next meeting: 14th May. LIGHTNING TALKS In-Reply-To: <48198C76.1000801@aapt.net.au> References: <48192965.7030601@perltraining.com.au> <48198C76.1000801@aapt.net.au> Message-ID: <481A71D4.2050604@perltraining.com.au> David Dick wrote: > Jacinta Richardson wrote: >> The next Melbourne Perl Mongers meeting will be held: >> >> 6:30pm 14th May >> Level 1 >> 172 Flinders St >> (just opposite Federation Square) >> >> David Dick and AAPT have kindly volunteered to host us > ummm.... actually the sponsor is Remasys Pty Ltd. :) Everyone will get > confused if they show up looking for the AAPT offices... :) My mistake, thankyou for the clarification. J -- ("`-''-/").___..--''"`-._ | Jacinta Richardson | `6_ 6 ) `-. ( ).`-.__.`) | Perl Training Australia | (_Y_.)' ._ ) `._ `. ``-..-' | +61 3 9354 6001 | _..`--'_..-_/ /--'_.' 
,' | contact at perltraining.com.au | (il),-'' (li),' ((!.-' | www.perltraining.com.au | From guy at alchemy.com.au Thu May 1 19:35:33 2008 From: guy at alchemy.com.au (Guy Morton) Date: Fri, 2 May 2008 12:35:33 +1000 Subject: [Melbourne-pm] Amazon S3 Message-ID: <2396534B-4B34-4C61-B5A9-416E771B5870@alchemy.com.au> Hello perlers Anyone here had experience using perl and Amazon::S3 to do mysql database backups to S3? I've tried this guy's script as a way to get started, but it no workee: http://dparrish.com/2008/02/mysql-backup-to-amazon-s3/ It seems to die on the add_bucket command - fails with a file not found error...which I don't really understand. Anyone here got any ideas or pointers? TIA Guy -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.pm.org/pipermail/melbourne-pm/attachments/20080502/7a51d84e/attachment.html From scottp at dd.com.au Sat May 3 17:31:26 2008 From: scottp at dd.com.au (Scott Penrose) Date: Sun, 04 May 2008 10:31:26 +1000 Subject: [Melbourne-pm] $SIG{CHLD} Message-ID: <481D03DE.7090401@dd.com.au> Hey Dudes To capture exit values of forked daemons and not end up with a set of zombie processes, we need to set $SIG{CHLD} to either 'ignore' or do it fully. However once you do you loose the ability to capture the return value of a 'system' call - unless you do it the hard way (record in a hash the value by process id and then use that and remove it after your system call). Anyway to get this all written down I wrote it on my site, but also as partly an open question - is there a better way of doing 'system' which does not depend on changes to $SIG{CHLD} or other solutions: http://scott.dd.com.au/wiki/SIG_CHLD So anyone know of one? Scott From daniel at rimspace.net Sun May 4 01:23:54 2008 From: daniel at rimspace.net (Daniel Pittman) Date: Sun, 04 May 2008 18:23:54 +1000 Subject: [Melbourne-pm] $SIG{CHLD} In-Reply-To: <481D03DE.7090401@dd.com.au> (Scott Penrose's message of "Sun, 04 May 2008 10:31:26 +1000") References: <481D03DE.7090401@dd.com.au> Message-ID: <87iqxuqxyd.fsf@rimspace.net> Scott Penrose writes: > To capture exit values of forked daemons and not end up with a set of > zombie processes, we need to set $SIG{CHLD} to either 'ignore' or do > it fully. However once you do you loose the ability to capture the > return value of a 'system' call - unless you do it the hard way > (record in a hash the value by process id and then use that and remove > it after your system call). > > Anyway to get this all written down I wrote it on my site, but also as > partly an open question - is there a better way of doing 'system' > which does not depend on changes to $SIG{CHLD} or other solutions: > > http://scott.dd.com.au/wiki/SIG_CHLD > > So anyone know of one? Well, my very strong preference for doing /anything/ related to child processes is to use the IPC::Run module. This wraps up a whole bunch of stuff from a dead simple 'run this' through to a complex 'write to and read from a filter, looking for specific out' and 'build a pipeline' stuff. The interface is sensible, light-weight, and the tool scales very well from start to finish. It also, on investigation, handles $? appropriately internally so that it does the right stuff as far as I can tell. 
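A minimal sketch of the usage Daniel describes, with a hypothetical command and variable names (IPC::Run's run() returns true when the child exits with status zero and leaves the raw status in $?):

  use strict;
  use warnings;
  use IPC::Run qw(run timeout);

  my @cmd = ('ls', '-l', '/tmp');       # hypothetical command
  my ($in, $out, $err) = ('', '', '');

  # run() forks, wires up the handles and waits for the child itself,
  # so the exit status survives without a hand-rolled waitpid loop.
  run(\@cmd, \$in, \$out, \$err, timeout(10))
      or die "command failed with exit value " . ($? >> 8) . "\n";

  print $out;
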
You may want to look into it, although it isn't always perfect: http://www.perlmonks.org/?node_id=674306 Also, not always playing nice with SIG{CHLD} handlers, although this is very much in the "point gun at foot, pull trigger" style: http://www.depesz.com/index.php/2008/02/07/failing-ls/ (Answer for those who don't want to read the code below the cut) Anyway, it should play nicely with existing SIG{CHLD} handlers that are written such that they don't break random library code and the like, and certainly beats hand-coding everything. Regards, Daniel I have not actually tried the Perl co-process support, but everything else seems solid enough. The answer is that the install SIG{CHLD} handler will wait for and collect the exit status from *everything*, which means that my the time the IPC::Run code in _cleanup (IPC/Run.pm:3157) is called the exit status is already gone as is the zombie process. See the manual page (and error code) for the waitpid system call, From scottp at dd.com.au Sun May 4 05:11:04 2008 From: scottp at dd.com.au (Scott Penrose) Date: Sun, 04 May 2008 22:11:04 +1000 Subject: [Melbourne-pm] $SIG{CHLD} In-Reply-To: <87iqxuqxyd.fsf@rimspace.net> References: <481D03DE.7090401@dd.com.au> <87iqxuqxyd.fsf@rimspace.net> Message-ID: <481DA7D8.7010504@dd.com.au> Daniel Pittman wrote: > Scott Penrose writes: > > >> To capture exit values of forked daemons and not end up with a set of >> zombie processes, we need to set $SIG{CHLD} to either 'ignore' or do >> it fully. However once you do you loose the ability to capture the >> return value of a 'system' call - unless you do it the hard way >> (record in a hash the value by process id and then use that and remove >> it after your system call). >> >> Anyway to get this all written down I wrote it on my site, but also as >> partly an open question - is there a better way of doing 'system' >> which does not depend on changes to $SIG{CHLD} or other solutions: >> >> http://scott.dd.com.au/wiki/SIG_CHLD >> >> So anyone know of one? >> > > Well, my very strong preference for doing /anything/ related to child > processes is to use the IPC::Run module. This wraps up a whole bunch of > stuff from a dead simple 'run this' through to a complex 'write to and > read from a filter, looking for specific out' and 'build a pipeline' > stuff. > > The interface is sensible, light-weight, and the tool scales very well > from start to finish. > > It also, on investigation, handles $? appropriately internally so that > it does the right stuff as far as I can tell. > > You may want to look into it, although it isn't always perfect: > > http://www.perlmonks.org/?node_id=674306 > > Also, not always playing nice with SIG{CHLD} handlers, although this is > very much in the "point gun at foot, pull trigger" style: > > http://www.depesz.com/index.php/2008/02/07/failing-ls/ > > (Answer for those who don't want to read the code below the cut) > > Anyway, it should play nicely with existing SIG{CHLD} handlers that are > written such that they don't break random library code and the like, and > certainly beats hand-coding everything. > > Regards, > Daniel > > I have not actually tried the Perl co-process support, but everything > else seems solid enough. > > > > The answer is that the install SIG{CHLD} handler will wait for and > collect the exit status from *everything*, which means that my the time > the IPC::Run code in _cleanup (IPC/Run.pm:3157) is called the exit > status is already gone as is the zombie process. > THAT'S IT ! 
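For plain system() calls, the gotcha described just above (a global CHLD reaper collecting every exit status before anyone else can) can also be sidestepped without a module, by restoring the default handler for the duration of the call. This is a sketch only, with hypothetical names, and with the caveat that a daemon exiting inside that window sits as a zombie until the next SIGCHLD fires the handler again:

  use strict;
  use warnings;
  use POSIX ":sys_wait_h";

  my %daemon_status;    # exit statuses of forked daemons, keyed by pid

  $SIG{CHLD} = sub {
      # Reap every dead child and remember its status for later.
      while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
          $daemon_status{$pid} = $?;
      }
  };

  sub run_system {
      my @cmd = @_;
      local $SIG{CHLD} = 'DEFAULT';   # let system() reap its own child
      system(@cmd);
      return $? >> 8;                 # exit value is meaningful again
  }

  my $exit = run_system('true');      # hypothetical command
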
I knew I had seen a module around that I had used before, and I could not find it on CPAN, or remember it - sometimes getting the name right is tricky (better search for CPAN is another topic). Have you ever spent a day writing a module that does not exist on CPAN, only to find at the end of that day you gained enough knowledge to find the module that did exist on CPAN :-) Thanks Scott -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.pm.org/pipermail/melbourne-pm/attachments/20080504/cec68c11/attachment.html From daniel at rimspace.net Sun May 4 18:18:32 2008 From: daniel at rimspace.net (Daniel Pittman) Date: Mon, 05 May 2008 11:18:32 +1000 Subject: [Melbourne-pm] $SIG{CHLD} In-Reply-To: <481DA7D8.7010504@dd.com.au> (Scott Penrose's message of "Sun, 04 May 2008 22:11:04 +1000") References: <481D03DE.7090401@dd.com.au> <87iqxuqxyd.fsf@rimspace.net> <481DA7D8.7010504@dd.com.au> Message-ID: <87zlr5im53.fsf@rimspace.net> Scott Penrose writes: > Daniel Pittman wrote: > Scott Penrose writes: > > To capture exit values of forked daemons and not end up with a set of > zombie processes, we need to set $SIG{CHLD} to either 'ignore' or do > it fully. [...] > Well, my very strong preference for doing /anything/ related to child > processes is to use the IPC::Run module. [...] > THAT'S IT ! I knew I had seen a module around that I had used before, > and I could not find it on CPAN, or remember it - sometimes getting > the name right is tricky (better search for CPAN is another topic). Mmmm. For the audience there is also IPC::Run3, which aims to be IPC::Run without the complexity; I don't advise it because the complexity doesn't really slow you up or show up 'til you need it in the larger module. > Have you ever spent a day writing a module that does not exist on > CPAN, only to find at the end of that day you gained enough knowledge > to find the module that did exist on CPAN :-) Oh, sure. Happens all the time: I went looking for finance related modules in CPAN just the other day, couldn't find what I wanted, and only worked out which root to search when I read the documentation for writing a plugin to another module... Daniel From tconnors at astro.swin.edu.au Tue May 6 21:11:23 2008 From: tconnors at astro.swin.edu.au (Tim Connors) Date: Wed, 7 May 2008 14:11:23 +1000 (EST) Subject: [Melbourne-pm] case insensitive REs Message-ID: G'day. I want the user to be able to supply a -i flag to my program to make global case insensitive searching. Except that when I actually go to perform the RE operation in perl, it only takes /i as a modifier. I can't simply say, where $case contains either "i" or "": while (/($re)/g$case) { ... } since perl complains that I am not allowed to put a variable there: Scalar found where operator expected at /home/ssi/tconnors/bin/phrasegrep line 131, near "/($re)/g$case" (Missing operator before $case?) I would have expected perhaps a global variable in perhaps perlvar(1) telling me I could force a global case insensitive match. The only way around this that I can see is the butt ugly: if ($case) { while (/($re)/gi) { ... } } else while (/($re)/g) { ... } } (or perhaps an eval -- ick?) And obviously, this would be thouroughly stupid. So what am I doing wrong? 
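The replies below cover the embedded (?i) modifier; another common idiom, sketched here with made-up values, is to compile the pattern once with qr// and choose the flags up front:

  use strict;
  use warnings;

  my $case = 1;                 # set from the user's -i switch
  my $re   = 'foo bar';         # made-up pattern

  my $compiled = $case ? qr/$re/i : qr/$re/;

  $_ = "Foo Bar foo bar";
  while (/($compiled)/g) {
      print "matched: $1\n";
  }

An interpolated qr// object keeps its flags, and because the pattern is only compiled once this also covers the compile-once concern raised later in the thread.
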
-- Tim Connors From jarich at perltraining.com.au Tue May 6 21:16:09 2008 From: jarich at perltraining.com.au (Jacinta Richardson) Date: Wed, 07 May 2008 14:16:09 +1000 Subject: [Melbourne-pm] case insensitive REs In-Reply-To: References: Message-ID: <48212D09.3050209@perltraining.com.au> Tim Connors wrote: > G'day. > > I want the user to be able to supply a -i flag to my program to make > global case insensitive searching. > > Except that when I actually go to perform the RE operation in perl, it > only takes /i as a modifier. I can't simply say, where $case contains > either "i" or "": > > while (/($re)/g$case) { > ... > } You can put modifiers inside regular expressions too. while(m/(?i:($re)/g) { ... } Ever wondered why non-capturing braces look so ugly? (?: ... ) now you know! Here's a test: my $foo = "ABC"; my $bar = "abc"; foreach ($foo, $bar) { print "matched $_\n" if m/(?i:A)/; } matched ABC matched abc All the best, J -- ("`-''-/").___..--''"`-._ | Jacinta Richardson | `6_ 6 ) `-. ( ).`-.__.`) | Perl Training Australia | (_Y_.)' ._ ) `._ `. ``-..-' | +61 3 9354 6001 | _..`--'_..-_/ /--'_.' ,' | contact at perltraining.com.au | (il),-'' (li),' ((!.-' | www.perltraining.com.au | From wigs at stirfried.org Tue May 6 21:20:22 2008 From: wigs at stirfried.org (wigs at stirfried.org) Date: Wed, 7 May 2008 14:20:22 +1000 Subject: [Melbourne-pm] case insensitive REs In-Reply-To: References: Message-ID: <20080507042022.GA9733@stirfried.org> On Wed, May 07, 2008 at 02:11:23PM +1000, Tim Connors wrote: > I want the user to be able to supply a -i flag to my program to make > global case insensitive searching. > > Except that when I actually go to perform the RE operation in perl, it > only takes /i as a modifier. I can't simply say, where $case contains > either "i" or "": > > while (/($re)/g$case) { > ... > } You can include matching operators inside the regex; for example: my $pattern = "(?i)foobar"; if ( /$pattern/ ) { } This example is taken straight from perldoc perlre, under the 'Extended Patterns' section. Cheers, -- Aaron From cas at taz.net.au Tue May 6 22:23:58 2008 From: cas at taz.net.au (Craig Sanders) Date: Wed, 7 May 2008 15:23:58 +1000 Subject: [Melbourne-pm] case insensitive REs In-Reply-To: References: Message-ID: <20080507052358.GA14155@taz.net.au> On Wed, May 07, 2008 at 02:11:23PM +1000, Tim Connors wrote: > I want the user to be able to supply a -i flag to my program to make > global case insensitive searching. > > [...] > > Scalar found where operator expected at /home/ssi/tconnors/bin/phrasegrep > line 131, near "/($re)/g$case" > (Missing operator before $case?) > > I would have expected perhaps a global variable in perhaps perlvar(1) > telling me I could force a global case insensitive match. > > The only way around this that I can see is the butt ugly: > > if ($case) { > while (/($re)/gi) { > ... > } > } else > while (/($re)/g) { > ... > } > } NOTE: the following is "Untested but it should work because i've done similar stuff before and the docs say so too". remember that trademark, it's your non-guarantee of quality :) $re = '(?i)' . $re if ($case); while (/($re)/g) { ... } alternatively: $mods = 'g'; $mods = 'i' . $mods if ($case); $re = "(?$mods)$re"; while (/($re)/g) { ... } from perlre(1): "(?imsx-imsx)" One or more embedded pattern-match modifiers, to be turned on (or turned off, if preceded by "-") for the remainder of the pattern or the remainder of the enclosing pattern group (if any). 
This is particularly useful for dynamic patterns, such as those read in from a configuration file, read in as an argument, are specified in a table somewhere, etc. Consider the case that some of which want to be case sensitive and some do not. The case insensitive ones need to include merely "(?i)" at the front of the pattern. For example: $pattern = "foobar"; if ( /$pattern/i ) { } # more flexible: $pattern = "(?i)foobar"; if ( /$pattern/ ) { } These modifiers are restored at the end of the enclosing group. For example, ( (?i) blah ) \s+ \1 will match a repeated (including the case!) word "blah" in any case, assuming "x" modifier, and no "i" modifier outside this group. also remember: if $re is never going to change during the life of the program, then you can gain a significant performance boost by using the "/o" modifier. this compiles the regexp only once, which is very useful if you're matching the same regexp repeatedly in a loop. (digression: i just noticed that the /o modifier isn't mentioned in my perlre man page, but it is discussed in the perlretut man page. odd. perl v5.8.8) so: $re = '(?i)' . $re if ($case); while (/($re)/go) { ... } or: $mods = 'go'; $mods = 'i' . $mods if ($case); $re = "(?$mods)$re"; while (/($re)/g) { ... } see also perlretut(1). search for the section "Embedding comments and modifiers in a regular expression". and just above that is a section on compiling and saving regexps (i.e. the /o modifier). craig -- craig sanders BOFH excuse #451: astropneumatic oscillations in the water-cooling From cas at taz.net.au Tue May 6 22:31:05 2008 From: cas at taz.net.au (Craig Sanders) Date: Wed, 7 May 2008 15:31:05 +1000 Subject: [Melbourne-pm] case insensitive REs In-Reply-To: <20080507052358.GA14155@taz.net.au> References: <20080507052358.GA14155@taz.net.au> Message-ID: <20080507053105.GB14155@taz.net.au> On Wed, May 07, 2008 at 03:23:58PM +1000, Craig Sanders wrote: > $mods = 'go'; > $mods = 'i' . $mods if ($case); > $re = "(?$mods)$re"; > > while (/($re)/g) { > ... > } doh! bad cut/paste/edit. change that final '/g' on the while line to just '/': while (/($re)/) { repeat for both my examples. craig -- craig sanders Ninety percent of the politicians give the other ten percent a bad reputation. -- Henry Kissinger From tconnors at astro.swin.edu.au Wed May 7 03:35:59 2008 From: tconnors at astro.swin.edu.au (Tim Connors) Date: Wed, 7 May 2008 20:35:59 +1000 (EST) Subject: [Melbourne-pm] grep independant of newlines (Was Re: case insensitive REs) In-Reply-To: References: Message-ID: On Wed, 7 May 2008, Tim Connors wrote: > G'day. > > I want the user to be able to supply a -i flag to my program to make > global case insensitive searching. Yeehaw. My day job always seems to come back to LaTeX code. grepping for stuff that has been nicely folded at the 72 column mark is a pain, because grep usually looks at just the one line. The sed & awk book had a recipe for phrasegrep, looking over two consequetive lines at once. But had a few bugs that I worked around over the years, and if your regexp ought to match things over 3 lines, you were out of luck. 
Well, it now works :) Feel free to appropriate as you choose (or this is where I usually get told what program I should have been using instead of reinventing the wheel :-): #!/usr/bin/perl -w # -*- Mode: perl -*- # $Revision: 1.10 $ $Date: 2008/05/07 10:27:35 $ # $Id: phrasegrep,v 1.10 2008/05/07 10:27:35 tconnors Exp $ # $Header: /home/ssi/tconnors/cvsroot/bin/phrasegrep,v 1.10 2008/05/07 10:27:35 tconnors Exp $ # $RCSfile: phrasegrep,v $ # greps for a re in files without regards for newlines. use strict; use warnings; use Carp::Assert; use Getopt::Long; Getopt::Long::Configure ("bundling"); use Pod::Usage; my $verbose=0; my $debug=0; my $colour="tty"; my $case=0; my $greedy=0; my $VERSION='$Revision: 1.10 $'; $VERSION=~s/\$[R]evision: ([^ ].*[^ ]) *\$/$1/; my $DATE='$Date: 2008/05/07 10:27:35 $'; $DATE=~s/\$[D]ate: ([^ ].*[^ ]) *\$/$1/; my $FILE='$RCSfile: phrasegrep,v $'; $FILE=~s/\$[R]CSfile: ([^ ].*[^ ]),v *\$/$1/; my $WHAT="greps for a re in files without regards for newlines"; my (@SAVEARGV)=@ARGV; sub isNum($) { ($_[0] =~ /^[+-]?\d+$/); } my $getOptVerbose = sub { my ($junk, $v)=(@_); $v=$verbose+1 if ($v eq ""); die "verbosity level is not a number: $v\n" if (!isNum $v); $verbose=$v; }; my $getOptDebug = sub { my ($junk, $d)=(@_); $d=$debug+1 if ($d eq ""); die "debug level is not a number: $d\n" if (!isNum $d); $debug=$d; }; my $getOptColour = sub { my ($junk, $c)=(@_); $c=1 if ($c eq ""); #could also be "tty" $colour=$c; }; my ($opt_help, $opt_man, $opt_version); my $result = GetOptions ('colour:s' => $getOptColour, 'debug:s' => $getOptDebug, 'verbose:s' => $getOptVerbose, 'c' => sub { $colour=1 }, 'd' => sub { $debug++}, 'v' => sub { $verbose++ }, 'nocolour' => sub { $colour = 0 }, 'i|case!' => \$case, 'g|greedy!' => \$greedy, 'help|?|h' => \$opt_help, 'man' => \$opt_man, 'version|V' => \$opt_version, ) || pod2usage(2); pod2usage(1) if ($opt_help); pod2usage(-verbose => 2) if ($opt_man); #pod2usage(-verbose => 0) if ($opt_version); if ($opt_version) { print "$FILE ($WHAT) $VERSION ($DATE)\n"; print "Copyright Tim Connors (2002-2008)\n"; print "License: GPL\n"; print "Author(s): Tim Connors 1); @ARGV='-' if (!@ARGV); my $colopen=""; my $colclose=""; if ($colour eq "tty") { if (-t STDOUT) { $colour=1 ; } else { $colour=0; } } if ($colour) { $colopen="\033[1;31m"; $colclose="\033[0m"; } print STDERR "transforming match re from '$re' to " if $verbose; $re =~ s/ /\\s+/g; #spaces in the match always get # transformed into whitespace matches $re =~ s/([*+])/$1?/g if !$greedy; #use non greedy matches by default $re = "(?i)$re" if !$case; #case insensitive by default $re = "($re)"; print STDERR "'$re'\n" if $verbose; foreach my $file (@ARGV) { my $incfilename= $manyfiles ? 
"$file:" : ""; if (!open(FH, $file)) { warn "can't open $file for read"; next; } local $/; undef $/; #slurp input files my $input = ; $_=$input; #to be able to match occurences on overlapping lines, log the start #and end of the line where each match occurs, as well as, for #colouring purposes, where the matches themselves start and end my @nlmatch=(); my @eolmatch=(); my @startmatch=(); my @endmatch=(); while (/$re/goms) { #man perlretut(1): "@-" and "@+" push @startmatch, $-[0]; push @endmatch, $+[0]; my $curpos=$-[0]; while ($curpos > 0) { if ((substr $input, $curpos, 1) eq "\n") { $curpos++; last; } $curpos--; } push @nlmatch, $curpos; $curpos=$+[0]; while ($curpos < length($input)-1) { if ((substr $input, $curpos, 1) eq "\n") { $curpos--; last; } $curpos++; } push @eolmatch, $curpos; } print "nl=@nlmatch\n" if $verbose; print "eol=@eolmatch\n" if $verbose; print "s=@startmatch\n" if $verbose; print "e=@endmatch\n" if $verbose; my $curpos; my $length; $curpos=$nlmatch[0]; foreach my $i (0.. at nlmatch) { #iterate through each of the matches of the regexp, and if a new #line, then print the start of the line to the start of the next #re, print the colours and that re, then print the line to the #next re if same line... if (($i>0) && (($i==@nlmatch) || ($nlmatch[$i] != $nlmatch[$i-1]))) { print STDERR "new line: $i " if $verbose; $length = $eolmatch[$i-1] - $endmatch[$i-1] + 2; #+1 to get the nl print STDERR "length: $length\n" if $verbose; print substr($input, $curpos, $length); $curpos=$nlmatch[$i]; } last if ($i == @nlmatch); $length = $startmatch[$i] - $curpos; print substr($input, $curpos, $length); $curpos += $length; print $colopen; $length = $endmatch[$i] - $startmatch[$i]; print substr($input, $curpos, $length); $curpos += $length; print $colclose; } ##/m -- ^/$ becomes start/end of any line ##/s may also be necessary #previous attempts: # #this doesn't yet match multiple occurences on overlapping sets of # #lines. This makes me sad. The third bracket somehow has to # #exclude the second # while (/(\n?[^\n]*?)($re)([^\n]*?\n?)/msg) { #$case # # while (/^([^\n]*?)($re)([^\n]*?)$/msg) { #$case # my $match = "$incfilename$1$colopen$2$colclose$3"; # print "$match"; # } } # $Log: phrasegrep,v $ # Revision 1.10 2008/05/07 10:27:35 tconnors # licence information # # Revision 1.9 2008/05/07 10:24:33 tconnors # non-greedy match by default # # Revision 1.8 2008/05/07 10:14:20 tconnors # no need to transform \n in a temporary string -- already knew about matching /sm modifiers, but in this iteration of the code, couldnt quite see what I was doing # # Revision 1.7 2008/05/07 06:54:06 tconnors # port to perl, and suck in the entire files at once so can compare over more than 2 lines at a time # __END__ =head1 NAME phrasegrep - greps for a re in files without regards for newlines =head1 SYNOPSIS phrasegrep [options] Options: [--help|-?|-h] [--man] [--version|-V] [--colour |--nocolour|-c] [--debug |-d] [--verbose |-v] [--case|-i] [--greedy|-g] =head1 OPTIONS =over 8 =item B<--help|-h|-?> Print a brief help message and exits. =item B<--man> Prints the manual page and exits. =item B<--version|-V> Prints version information and exits. =item B<--colour {yes|auto|no}|--nocolour|-c> STDIO uses colour always, only when STDOUT is a terminal, or never =item B<--debug {level}|-d> Sets or increments the debug level. Current level is 1 =item B<--verbose {level}|-d> Sets or increments the verbosity level. 
Current level is 1 =item B<--case|-i> Performs a case sensetive regexp search =item B<--greedy|-g> Performs the default perl greedy match instead of non greedy =back =head1 DESCRIPTION B greps for a re in files without regards for newlines =cut -- Tim Connors From jarich at perltraining.com.au Mon May 12 05:53:19 2008 From: jarich at perltraining.com.au (Jacinta Richardson) Date: Mon, 12 May 2008 22:53:19 +1000 Subject: [Melbourne-pm] SAGE-AU Victorian IT Symposium - Friday 30th May 2008 Message-ID: <48283DBF.6010107@perltraining.com.au> The SAGE-AU Victorian IT Symposium - Friday 30th May 2008 ========================================================= Hotel Grand Chancellor 131 Lonsdale Street Melbourne Friday 30th May 2008, 9am - 5pm Book before Friday (16th May) to take advantage of our early bird offer! The SAGE-AU Victorian IT Symposium is a one day technical conference held in Melbourne. It is organised by the SAGE-AU Victorian Chapter and aims to provide an educational forum for systems and network administrators, system managers, developers and other technical professionals to meet and share their knowledge and experiences. This is the fifth year running for this event, focusing on a providing a fast paced stream of technical presentations. Morning and afternoon teas, and lunch will be provided. Come and spend a day with your peers and share your knowledge! Register: * Early bird registrations until 16th May 2008 * Register online at: http://www.sage-au.org.au/display/2008VIC/Registrations Programme: * Evolution of Storage - Cameron Huysmans (Total RISC Technology) * EMC Next Generation Products - Shane Moore (EMC) * Routing and Security Platforms - Lachlan Kidd (Cisco) * Life-cycle Management of Red Hat Enterprise Linux - Michael Wahren (Red Hat) * Apple Technology Update - Joseph Cox (Apple) * An Illustrated History of Software Failure - Paul Fenwick (Perl Training Australia) The SAGE-AU Victorian IT Symposium is proudly supported by our Gold Sponsors Red Hat, EMC Corporation and Total RISC Technology. You can find out more details at: http://www.sage-au.org.au/display/2008VIC/Home From pjf at perltraining.com.au Mon May 12 22:24:15 2008 From: pjf at perltraining.com.au (Paul Fenwick) Date: Tue, 13 May 2008 15:24:15 +1000 Subject: [Melbourne-pm] Reminder: Meeting Wednesday (tomorrow) night! Message-ID: <482925FF.5060806@perltraining.com.au> G'day Everyone, It's that time again! Tomorrow night is Melbourne Perl Mongers night! When: Wednesday, 14th May (tomorrow) 6:30pm Where: Remasys Level 1 172 Flinders St (Opposite Deferation Square) Talk: Toby Corkindale - How awesome is git[1]? After discovering that all revision control software sucks, Linus Torvalds, inventor of Linux, created the git source control system. Supporting distributed development, incredible branching tools, amazing support tools, and more distribution mechanisms that you can poke a stick at. Git is not only used for source control of the Linux project, but also the new source control system for the Perl 5 core. Toby will reveal the secrets of how git solved his source control headaches, toned his muscles, and gave him a full head of hair[2]! After: Lightning talks, news, announcements. Drinks and dinner for those hungry and/or thirsty. Looking forward to seeing you all there! Paul [1] I didn't actually have a real abstract from Toby, so I made it up. However it is definitely about git. [2] Actual results may vary. 
-- Paul Fenwick | http://perltraining.com.au/ Director of Training | Ph: +61 3 9354 6001 Perl Training Australia | Fax: +61 3 9354 2681 From toby.corkindale at rea-group.com Tue May 13 20:58:21 2008 From: toby.corkindale at rea-group.com (Toby Corkindale) Date: Wed, 14 May 2008 13:58:21 +1000 Subject: [Melbourne-pm] MacBook DVI or VGA adaptor? Message-ID: <482A635D.2090108@rea-group.com> I've just realised I left the magic video-out adaptor for my MacBook at home, but I was going to use it for a talk at the meeting tonight. >.< Does anyone have one who is coming to the meeting? (PS. Is DVI OK for the data projector?) Otherwise I'll try and work something else out - I could use another laptop, if I may borrow one and ssh out from it, if there's internet available. Or I just run home first and come back into town, and just arrive late. Not the end of the world really. Toby -- Toby Corkindale Software developer w: www.rea-group.com REA Group refers to realestate.com.au Ltd (ASX:REA) Warning - This e-mail transmission may contain confidential information. If you have received this transmission in error, please notify us immediately on (61 3) 9897 1121 or by reply email to the sender. You must destroy the e-mail immediately and not use, copy, distribute or disclose the contents. From toby.corkindale at rea-group.com Tue May 13 21:01:58 2008 From: toby.corkindale at rea-group.com (Toby Corkindale) Date: Wed, 14 May 2008 14:01:58 +1000 Subject: [Melbourne-pm] MacBook DVI or VGA adaptor? In-Reply-To: <482A635D.2090108@rea-group.com> References: <482A635D.2090108@rea-group.com> Message-ID: <482A6436.5040601@rea-group.com> Toby Corkindale wrote: > I've just realised I left the magic video-out adaptor for my MacBook at > home, but I was going to use it for a talk at the meeting tonight. > >.< > > Does anyone have one who is coming to the meeting? Woah. Perlmongers to the rescue in record time! I now have a borrowed MacBook->VGA (analog, not DVI) adaptor on my desk. cheers! :D From wjmoore at gmail.com Wed May 14 03:07:00 2008 From: wjmoore at gmail.com (Wesley Moore) Date: Wed, 14 May 2008 21:07:00 +1100 Subject: [Melbourne-pm] Lego USB Flash Drive Message-ID: <664f64be0805140307n69678db7oc538b0697fc200d1@mail.gmail.com> This is a review of the Lego USB flash drives that are being sold by a Melbourne company that I mentioned at the meeting tonight. http://forums.mactalk.com.au/20/48480-zip-zip-lego-usb-drive-review.html From bjdean at bjdean.id.au Wed May 14 04:54:32 2008 From: bjdean at bjdean.id.au (Bradley Dean) Date: Wed, 14 May 2008 12:54:32 +0100 Subject: [Melbourne-pm] Amazon S3 In-Reply-To: <2396534B-4B34-4C61-B5A9-416E771B5870@alchemy.com.au> References: <2396534B-4B34-4C61-B5A9-416E771B5870@alchemy.com.au> Message-ID: <20080514115432.GI3704@bjdean.id.au> Greetings, On Fri, May 02, 2008 at 12:35:33PM +1000, Guy Morton wrote: > Hello perlers > > Anyone here had experience using perl and Amazon::S3 to do mysql database > backups to S3? > > I've tried this guy's script as a way to get started, but it no workee: > > http://dparrish.com/2008/02/mysql-backup-to-amazon-s3/ > > It seems to die on the add_bucket command - fails with a file not found > error...which I don't really understand. Amazon S3 has fairly restrictive rules on bucket names (including that they cannot contain upper-case letters). That script tries to create a bucket called: $aws_access_key_id. 
'-mysql-$hostname' An access key usually has uppercase characters so this won't work - incidentally there's not much point naming a bucket with the access key given that the bucket will be created inside the account defined by that access key. It's also part of the account credentials so logs containing the name of buckets will now have half of your login. Try changing the bucket name to lc('mysql-' . hostname()) and see if that helps. Here's the bucket naming restriction docs: http://docs.amazonwebservices.com/AmazonS3/2006-03-01/BucketRestrictions.html Cheerio, Brad > > Anyone here got any ideas or pointers? > > TIA > > Guy > _______________________________________________ > Melbourne-pm mailing list > Melbourne-pm at pm.org > http://mail.pm.org/mailman/listinfo/melbourne-pm -- Bradley Dean Software Engineer - http://bjdean.id.au/ Email: bjdean at bjdean.id.au Skype: skype at bjdean.id.au Mobile(Aus): +61-413014395 Mobile(UK): +44-7846895073 From pat at patspam.com Wed May 14 05:37:13 2008 From: pat at patspam.com (Patrick Donelan) Date: Wed, 14 May 2008 22:37:13 +1000 Subject: [Melbourne-pm] Lego USB Flash Drive In-Reply-To: <664f64be0805140307n69678db7oc538b0697fc200d1@mail.gmail.com> References: <664f64be0805140307n69678db7oc538b0697fc200d1@mail.gmail.com> Message-ID: <42321ee20805140537w519a11a5k81f16c048e00e28@mail.gmail.com> And here's the link to the John Resig's port of the Processing visualization language to JavaScript, using the Canvas element, as discussed on the southern end of the dinner table - as of today the project is being hosted on github, which dovetails nicely with Toby's presentation :) Patrick On Wed, May 14, 2008 at 8:07 PM, Wesley Moore wrote: > This is a review of the Lego USB flash drives that are being sold by a > Melbourne company that I mentioned at the meeting tonight. > > http://forums.mactalk.com.au/20/48480-zip-zip-lego-usb-drive-review.html > _______________________________________________ > Melbourne-pm mailing list > Melbourne-pm at pm.org > http://mail.pm.org/mailman/listinfo/melbourne-pm > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.pm.org/pipermail/melbourne-pm/attachments/20080514/b2892902/attachment.html From tjc at wintrmute.net Wed May 14 18:41:12 2008 From: tjc at wintrmute.net (Toby Corkindale) Date: Thu, 15 May 2008 11:41:12 +1000 Subject: [Melbourne-pm] Git Message-ID: <20080515014112.GB2391@roseberry> Some links relating to the talk last night: Git's official home is: http://git.or.cz/ Gui tool screenshots, of a better tool than the one I didn't demonstrate well last night: http://sourceforge.net/project/screenshots.php?group_id=139897 http://sourceforge.net/project/screenshots.php?group_id=139897&ssid=33925 GitWeb in action: http://git.kernel.org/?p=git/git.git;a=summary Toby From jarich at perltraining.com.au Thu May 15 23:05:13 2008 From: jarich at perltraining.com.au (Jacinta Richardson) Date: Fri, 16 May 2008 16:05:13 +1000 Subject: [Melbourne-pm] OSDC 2008 Sydney (1-5 Dec 2008) - Call for Papers Message-ID: <482D2419.1050404@perltraining.com.au> Sorry if this results in a duplicate. I just haven't seen it around as much as I'd like ------------------------------------------------------------------------- Call for Papers Open Source Developers' Conference 2008 1st - 5th December 2008, Sydney, Australia The Open Source Developers' Conference 2008 is a conference run by open source developers, for developers and business people. 
It covers numerous programming languages across a range of operating systems, and related topics such as business processes, licensing, and strategy. Talks vary from introductory pieces through to the deeply technical. It is a great opportunity to meet, share, and learn with like-minded individuals. This year, the conference will be held in Sydney, Australia during the first week of December (1st - 5th). If you are an Open Source maintainer, developer or user, the organising committee would encourage you to submit a talk proposal on open source tools, solutions, languages or technologies you are working with. For more details and to submit your proposal(s), go to: http://osdc.com.au/2008/papers/cfp.html If you have any questions or require assistance with your submission, please don't hesitate to ask! We recognise the importance of Open Source in providing a medium for collaboration between individuals, researchers, business and government. In recognition of this and ensure a high standard of presentations, we intend to peer-review all submitted papers. OSDC 2008 Sydney (Australia) - Key Program Dates: 30 Jun - Initial proposals (short abstract) due 21 Jul - Proposal acceptance 15 Sep - Accepted paper submissions 13 Oct - Reviews completed 27 Oct - Final paper submission cut-off For all information, contacts and updates, see the OSDC conference web site at http://osdc.com.au/2008/ Also if you are interested in sponsoring, please see: http://www.osdc.com.au/2008/sponsors/opportunities.html Regards Mark Rees OSDC 2008 Marketing Co-ordinator From pjf at perltraining.com.au Sat May 17 21:12:01 2008 From: pjf at perltraining.com.au (Paul Fenwick) Date: Sun, 18 May 2008 14:12:01 +1000 Subject: [Melbourne-pm] White Camel nominations are now open Message-ID: <482FAC91.7020500@perltraining.com.au> ---------- Forwarded Message: ---------- Subject: [pm_groups] White Camel nominations are now open Date: Saturday 17 May 2008 From: "Jos? Castro" Every year, at OSCON, the White Camels are presented. If you look at the previous winners [1], you'll notice that these are mostly unsung heroes, like previous awardee Eric Cholet, the human moderator of so many Perl mailing lists, or Jay Hannah, one of the people running pm.org [2] (if you ever created/maintained a pm group, chances are that Jay walked you through the process). Some of these people may be well known, like Allison Randal or Randal Schwartz, while others may be complete strangers to at least part of the globe, like Josh McAdams or Jay. Some of them may be extreme Perl hackers who created the original JAPH, but they actually received this award as a recognition for their community contributions to Perl. That's not to say a great hacker can't receive the award, but you don't have to be one in order to be eligible. That being said, the nomination process for the 2008 White Camels is now open. If you think there's someone who deserves a White Camel, this is the time for you to send in your nominations. Send them to jose at pm.org, if possible with a subject along the lines of "White Camel Nomination :: $name". Make sure you properly identify the nominee and tell us why you think that's a worthy nomination. Don't go thinking "nah, somebody else will do it" because: a) everybody else may be thinking the same, and b) you may state your case differently than the next person. We'll be receiving nominations until June 11, 2008, by midnight, but don't wait up or you'll forget. Do it now! Regards, jac PS: Please forward as you see fit. 
[1] - http://www.perl.org/advocacy/white_camel/ [2] - http://pm.org/ -- Jos? Castro TPF Community Relations Leader ------------------------------------------------------- From scottp at dd.com.au Thu May 22 05:58:18 2008 From: scottp at dd.com.au (Scott Penrose) Date: Thu, 22 May 2008 22:58:18 +1000 Subject: [Melbourne-pm] Ahhh... so close Message-ID: <48356DEA.70308@dd.com.au> Hey Guys I have been working on getting a new module written with another perl programmer for our gliding club. I decided to do it all the way of Perl Best Practice. And it worked beautifully. My tests passed everywhere. My friend works on Windows using Active State and his code installed and tested ok too. But where did it fall down - IO::Prompt !!! Normally I would just use something like my $in = to get basic input, maybe put it in a loop to make sure you get the data you want. But I thought no, lets do the PBP and use prompt. Since it is a recommended PBP and all the other code we have tried has compiled and worked beautifully cross-platform - it seems a shame to have this one let us down. So... do you think we could do a little re-write to make it a little more friendly for Win32? Anyone up for it? Scott From jarich at perltraining.com.au Fri May 23 00:55:33 2008 From: jarich at perltraining.com.au (Jacinta Richardson) Date: Fri, 23 May 2008 17:55:33 +1000 Subject: [Melbourne-pm] The 2008 SAGE-AU Victorian IT Symposium - 1 Week Left Message-ID: <48367875.2050004@perltraining.com.au> The 2008 SAGE-AU Victorian IT Symposium - Friday 30th May 2008 ========================================================= Hotel Grand Chancellor 131 Lonsdale Street Melbourne Friday 30th May 2008, 9am - 5pm The System Administrators Guild of Australia (SAGE-AU) 2008 Victorian IT Symposium is a one day technical conference held in Melbourne. It is organised by the SAGE-AU Victorian Chapter and aims to provide an educational forum for systems and network administrators, system managers, developers and other technical professionals to meet and share their knowledge and experiences. This is the fifth year running for this event, focusing on a providing a fast paced stream of technical presentations. Morning and afternoon teas, and lunch will be provided. Come and spend a day with your peers and share your knowledge! Register online at: http://www.sage-au.org.au/display/2008VIC/Registrations Programme: * Evolution of Storage - Cameron Huysmans (Total RISC Technology) * Backup Innovations - Shane Moore (EMC) * Routing and Security Platforms - Lachlan Kidd (Cisco) * Life-cycle Management of Red Hat Enterprise Linux - Michael Wahren (Red Hat) * Apple Technology Update - Joseph Cox (Apple) * An Illustrated History of Software Failure - Paul Fenwick (Perl Training Australia) The SAGE-AU Victorian 2008 IT Symposium is proudly supported by our Gold Sponsors Red Hat, EMC Corporation and Total RISC Technology. You can find out more details at: http://www.sage-au.org.au/display/2008VIC/Home From sisyphus1 at optusnet.com.au Fri May 23 02:38:52 2008 From: sisyphus1 at optusnet.com.au (Sisyphus) Date: Fri, 23 May 2008 19:38:52 +1000 Subject: [Melbourne-pm] Ahhh... so close In-Reply-To: <48356DEA.70308@dd.com.au> References: <48356DEA.70308@dd.com.au> Message-ID: ----- Original Message ----- From: "Scott Penrose" . . > > But where did it fall down - IO::Prompt !!! . . > So... do you think we could do a little re-write to make it a little > more friendly for Win32? > Hmmm ... would our (my) re-write have to conform to the recommendations of PBP ? ... 
or can we (I) just write our (my) usual crap code ? > > Anyone up for it? > Sounds a little bit interesting - though I'm not a big fan of PBP (and, undoubtedly, have reams of code to prove it :-) Maybe just post a demo of the problem, and see where that leads. Is there anything at http://rt.cpan.org/Public/Dist/Display.html?Name=IO-Prompt that raises the problem you found ? (Better still, is there anything there that solves the problem ?) Cheers, Rob From rob at cataclysm.cx Fri May 23 04:35:38 2008 From: rob at cataclysm.cx (Robert Norris) Date: Fri, 23 May 2008 21:35:38 +1000 Subject: [Melbourne-pm] Ahhh... so close In-Reply-To: <48356DEA.70308@dd.com.au> References: <48356DEA.70308@dd.com.au> Message-ID: <20080523113538.GA29214@plastic.home> Hi Scott, > But where did it fall down - IO::Prompt !!! I guess thats my cue to come out of the wordwork. A couple of years ago I wrote a patch[1] to add completion and history support to IO::Prompt. I spoke to Damian about it later and he had a pile of comments about it. I volunteered to take on maintenance of the module. Shortly after that I got sidetracked and didn't touch it again until I saw your email this morning. Amongst other things, I'm sitting on a patch from a Thomas Glaesser to make IO::Prompt work on Win32. Its in pretty poor shape though. It has a pile of control code handling and such that really belongs in Term::ReadKey. The first thing I've been trying to do is get a test suite in place, which is kinda hard as the whole thing is terminal-centric. I've been writing a module, Test::MockTerm, that fakes a terminal, but its a real mess at the moment. I should have it into some sort of shape in the next few days. Once the test suite is there, work can begin on new features. I want to shift all the knowledge of how to open, read from and write to the console on different platforms out to another module. I'm not sure if thats Term::ReadKey or a whole new package. Once it exists IO::Prompt can be modified to use it and then Win32 support can happen. Anyway I'm getting the code into my git repositories[2]. Again, its all a bit all over the place but it shouldn't take long to get it into some kind of shape. All help gratefully received :) Cheers, Rob. [1] http://rt.cpan.org/Ticket/Display.html?id=21055 [2] http://cataclysm.cx/git/ From thogard at abnormal.com Tue May 27 22:54:09 2008 From: thogard at abnormal.com (Tim Hogard) Date: Wed, 28 May 2008 05:54:09 +0000 (UTC) Subject: [Melbourne-pm] An intermittent problem with open for append Message-ID: <200805280554.m4S5s9jB083498@v.abnormal.com> Hi, I've got a CGI program that has a problem every once in a while. The problem code looks like: open OUT,">>/home/foo/que/$ip" || push @error, "Cant save details"; print OUT "$ip:t:date=",scalar localtime,"\n"; ... then it prints to OUT all the rest of ${ENV} and CGI vars. Sometimes apache will record 2 hits on the page (a double click?) and most of the time I get two sets of all the data however sometimes while running perl 5.8.8 I only get the first or second sometimes. This never happens with perl 5.005_02. Can anyone explain why perl 5.005 works yet 5.8.8 doesn't? I was under the impresson that the ">>" means tell the OS to open in append mode, any data written should go in the file and not just end up lost. This is perl, v5.8.8 built for sun4-solaris This is perl, version 5.005_02 built for sun4-solaris Solaris 5.5.1 is the OS. I'm not even which direction to try to debug this problem is its it only happens once in a million times or so. 
I guess I could write a program to produce 3 children and have each of them open a file and append their PID and hunt for errors or maybe even trace that the append flag is in fact on (is there an easy way to get that info?) or maybe its a singal problem where its getting an odd signal. Any ideas? -tim From pjf at perltraining.com.au Tue May 27 23:14:44 2008 From: pjf at perltraining.com.au (Paul Fenwick) Date: Wed, 28 May 2008 16:14:44 +1000 Subject: [Melbourne-pm] An intermittent problem with open for append In-Reply-To: <200805280554.m4S5s9jB083498@v.abnormal.com> References: <200805280554.m4S5s9jB083498@v.abnormal.com> Message-ID: <483CF854.40601@perltraining.com.au> G'day Tim, Tim Hogard wrote: > open OUT,">>/home/foo/que/$ip" || push @error, "Cant save details"; > print OUT "$ip:t:date=",scalar localtime,"\n"; > ... then it prints to OUT all the rest of ${ENV} and CGI vars. Well, if there's a problem opening the file, then I expect you have something in @error that may tell you what's wrong, but I'll assume that if it was that simple youd' know about it. So... > Can anyone explain why perl 5.005 works yet 5.8.8 doesn't? > I was under the impresson that the ">>" means tell the OS to > open in append mode, any data written should go in the file > and not just end up lost. It absolutely does mean it should append. My guess is that you may be seeing a buffering issue; if something later causes your program to exit unexpectedly, it may not have finished writing to the file. I'd throw a: use IO::Handle; at the top of your code, and a: OUT->flush or die "Can't flush OUT: $!"; when you've finished writing a record to your file. ->flush will force the data to be written, and will return false (and should set $!) if there's any problems. > open a file and append their PID and hunt for errors or maybe even trace > that the append flag is in fact on (is there an easy way to get that info?) > or maybe its a singal problem where its getting an odd signal. If you're using strace, you should be able to see the file open with O_APPEND as one of the options. If you have an existing filehandle, you can test for O_APPEND using fcntl: use Fcntl; my $flags = fcntl(MYFILE, F_GETFL, 0); print( ($flags & O_APPEND) ? "append" : "not append"); Cheerio, Paul -- Paul Fenwick | http://perltraining.com.au/ Director of Training | Ph: +61 3 9354 6001 Perl Training Australia | Fax: +61 3 9354 2681 From tjc at wintrmute.net Tue May 27 23:22:35 2008 From: tjc at wintrmute.net (Toby Corkindale) Date: Wed, 28 May 2008 16:22:35 +1000 Subject: [Melbourne-pm] An intermittent problem with open for append In-Reply-To: <200805280554.m4S5s9jB083498@v.abnormal.com> References: <200805280554.m4S5s9jB083498@v.abnormal.com> Message-ID: <20080528062234.GE16797@roseberry> On Wed, May 28, 2008 at 05:54:09AM +0000, Tim Hogard wrote: > > Hi, > > I've got a CGI program that has a problem every once in a while. > > The problem code looks like: > > open OUT,">>/home/foo/que/$ip" || push @error, "Cant save details"; > print OUT "$ip:t:date=",scalar localtime,"\n"; > ... then it prints to OUT all the rest of ${ENV} and CGI vars. > > Sometimes apache will record 2 hits on the page (a double click?) > and most of the time I get two sets of all the data however sometimes > while running perl 5.8.8 I only get the first or second sometimes. > This never happens with perl 5.005_02. > > Can anyone explain why perl 5.005 works yet 5.8.8 doesn't? 
> I was under the impresson that the ">>" means tell the OS to > open in append mode, any data written should go in the file > and not just end up lost. I can't explain why it works on 5.005 and not 5.8.8, but since you have mentioned it is a very rare occurence, it is possible that it /would/ occur on 5.005 eventually. Maybe the code just runs slower or faster and flukily avoids a race condition as a result? Also, it's worth noting that append isn't always safe for use by multiple processes - it works by seeking to the end of the file before writing, but according to the man page, this doesn't work reliably on networked file systems like NFS. Also - I think Apache will send a signal to the CGIs running, to kill them if the connection dies - is it is simply a case that when someone double-clicked, one of the cgi instances was killed before it could write to the logfile? cheers, Toby From mathew.robertson at netratings.com.au Wed May 28 01:58:06 2008 From: mathew.robertson at netratings.com.au (Mathew Robertson) Date: Wed, 28 May 2008 18:58:06 +1000 Subject: [Melbourne-pm] An intermittent problem with open for append In-Reply-To: <20080528062234.GE16797@roseberry> References: <200805280554.m4S5s9jB083498@v.abnormal.com> <20080528062234.GE16797@roseberry> Message-ID: <483D1E9E.8090905@netratings.com.au> >> I've got a CGI program that has a problem every once in a while. >> >> The problem code looks like: >> >> open OUT,">>/home/foo/que/$ip" || push @error, "Cant save details"; >> print OUT "$ip:t:date=",scalar localtime,"\n"; >> ... then it prints to OUT all the rest of ${ENV} and CGI vars. >> >> Sometimes apache will record 2 hits on the page (a double click?) >> and most of the time I get two sets of all the data however sometimes >> while running perl 5.8.8 I only get the first or second sometimes. >> This never happens with perl 5.005_02. >> >> Can anyone explain why perl 5.005 works yet 5.8.8 doesn't? >> I was under the impresson that the ">>" means tell the OS to >> open in append mode, any data written should go in the file >> and not just end up lost. >> > > I can't explain why it works on 5.005 and not 5.8.8, but since you have > mentioned it is a very rare occurence, it is possible that it /would/ occur on > 5.005 eventually. Maybe the code just runs slower or faster and flukily avoids > a race condition as a result? > > Also, it's worth noting that append isn't always safe for use by multiple > processes - it works by seeking to the end of the file before writing, but > according to the man page, this doesn't work reliably on networked file systems > like NFS. > I suspect this is root of the problem, irrespective of NFS -> the webserver is using two instances of the script, to execute the request. If two processes open the same file for append, they will both succeed. Both processes will move their file pointer to the "end of the file" - which both happens to be at the same byte offset. One starts "print"ing... then the other "print"s -> the second write will clobber the first write. This applies to both mod_perl and CGI environments. If you want cooperative access to a "shared resource, aka the $ip file, then you need locking (or something similar). > Also - I think Apache will send a signal to the CGIs running, to kill them if > the connection dies - is it is simply a case that when someone double-clicked, > one of the cgi instances was killed before it could write to the logfile > regards, Mathew -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.pm.org/pipermail/melbourne-pm/attachments/20080528/264862f4/attachment.html From ddick at aapt.net.au Wed May 28 03:03:50 2008 From: ddick at aapt.net.au (David Dick) Date: Wed, 28 May 2008 20:03:50 +1000 Subject: [Melbourne-pm] An intermittent problem with open for append In-Reply-To: <483D1E9E.8090905@netratings.com.au> References: <200805280554.m4S5s9jB083498@v.abnormal.com> <20080528062234.GE16797@roseberry> <483D1E9E.8090905@netratings.com.au> Message-ID: <483D2E06.7090208@aapt.net.au> Mathew Robertson wrote: > I suspect this is root of the problem, irrespective of NFS -> the > webserver is using two instances of the script, to execute the request. > > If two processes open the same file for append, they will both > succeed. Both processes will move their file pointer to the "end of > the file" - which both happens to be at the same byte offset. One > starts "print"ing... then the other "print"s -> the second write will > clobber the first write. no. actually, NFS is the important factor for appending Over nfs (at least for older versions), O_APPEND is unreliable. On a local (modern) unix filesystem it is a guarantee. concept is explained by W.R. Stevens in Advanced Programming in the UNIX Environment viewable at http://www.informit.com/articles/article.aspx?p=99706&seqNum=11 From guy at alchemy.com.au Tue May 27 23:21:01 2008 From: guy at alchemy.com.au (Guy Morton) Date: Wed, 28 May 2008 16:21:01 +1000 Subject: [Melbourne-pm] An intermittent problem with open for append In-Reply-To: <483CF854.40601@perltraining.com.au> References: <200805280554.m4S5s9jB083498@v.abnormal.com> <483CF854.40601@perltraining.com.au> Message-ID: <46565315-B2C5-41CA-9C1E-5C36FF2EDE4E@alchemy.com.au> aren't you supposed to use "or" instead of "||" after an open, due to operator precedence? http://perl.plover.com/FAQs/Precedence.html#Precedence_Traps_and_Surprises On 28/05/2008, at 4:14 PM, Paul Fenwick wrote: > G'day Tim, > > Tim Hogard wrote: > >> open OUT,">>/home/foo/que/$ip" || push @error, "Cant save details"; >> print OUT "$ip:t:date=",scalar localtime,"\n"; >> ... then it prints to OUT all the rest of ${ENV} and CGI vars. > > Well, if there's a problem opening the file, then I expect you have > something in @error that may tell you what's wrong, but I'll assume > that if > it was that simple youd' know about it. So... > >> Can anyone explain why perl 5.005 works yet 5.8.8 doesn't? >> I was under the impresson that the ">>" means tell the OS to >> open in append mode, any data written should go in the file >> and not just end up lost. > > It absolutely does mean it should append. My guess is that you may be > seeing a buffering issue; if something later causes your program to > exit > unexpectedly, it may not have finished writing to the file. > > I'd throw a: > > use IO::Handle; > > at the top of your code, and a: > > OUT->flush or die "Can't flush OUT: $!"; > > when you've finished writing a record to your file. ->flush will > force the > data to be written, and will return false (and should set $!) if > there's any > problems. > >> open a file and append their PID and hunt for errors or maybe even >> trace >> that the append flag is in fact on (is there an easy way to get >> that info?) >> or maybe its a singal problem where its getting an odd signal. > > If you're using strace, you should be able to see the file open with > O_APPEND as one of the options. 
If you have an existing filehandle, > you can > test for O_APPEND using fcntl: > > use Fcntl; > > my $flags = fcntl(MYFILE, F_GETFL, 0); > print( ($flags & O_APPEND) ? "append" : "not append"); > > Cheerio, > > Paul > > -- > Paul Fenwick | http://perltraining.com.au/ > Director of Training | Ph: +61 3 9354 6001 > Perl Training Australia | Fax: +61 3 9354 2681 > _______________________________________________ > Melbourne-pm mailing list > Melbourne-pm at pm.org > http://mail.pm.org/mailman/listinfo/melbourne-pm From scottp at dd.com.au Wed May 28 04:36:03 2008 From: scottp at dd.com.au (Scott Penrose) Date: Wed, 28 May 2008 21:36:03 +1000 Subject: [Melbourne-pm] An intermittent problem with open for append In-Reply-To: <46565315-B2C5-41CA-9C1E-5C36FF2EDE4E@alchemy.com.au> References: <200805280554.m4S5s9jB083498@v.abnormal.com> <483CF854.40601@perltraining.com.au> <46565315-B2C5-41CA-9C1E-5C36FF2EDE4E@alchemy.com.au> Message-ID: <9E6C5977-5132-44B8-919C-0558A9B7BEB5@dd.com.au> There has been comment on is append safe or not. NFS - Absolutely not. You will need to consider locking (which also has issues) UNIX - Yes no problem BUT you must be under the internal buffer on the system, and it is line bound. So multi line insert will not be in order, but single lines will. This means you are totally safe doing an append with single line log files, locally. Using the Sync and Buffer changes Paul suggested won't improve the situation or make it any safer. This is because you may have your two scripts hit the file the same time - even if exactly the same time, the OS will put both lines in without garbaling it - UNLESS you go over the buffer size (not sure what that is, but 512 bytes would probably be safe guess). Windows - Anyone know? Scott From jarich at perltraining.com.au Wed May 28 05:04:00 2008 From: jarich at perltraining.com.au (Jacinta Richardson) Date: Wed, 28 May 2008 22:04:00 +1000 Subject: [Melbourne-pm] An intermittent problem with open for append In-Reply-To: <46565315-B2C5-41CA-9C1E-5C36FF2EDE4E@alchemy.com.au> References: <200805280554.m4S5s9jB083498@v.abnormal.com> <483CF854.40601@perltraining.com.au> <46565315-B2C5-41CA-9C1E-5C36FF2EDE4E@alchemy.com.au> Message-ID: <483D4A30.1030606@perltraining.com.au> Guy Morton wrote: > aren't you supposed to use "or" instead of "||" after an open, due to > operator precedence? > > http://perl.plover.com/FAQs/Precedence.html#Precedence_Traps_and_Surprises This is correct. Tim's program will be interpreted as: open OUT, (">>/home/foo/que/$ip" || push @error, "Cant save details"); print OUT "$ip:t:date=",scalar localtime,"\n"; which means that the push will only occur if ">>/home/foo/que/$ip" is false - which it won't be. The correct file will be opened for appending however. Since the program isn't dying on an error, this just means that Tim's diagnostics will be ignored. Since he's then going to try printing to the possibly not-opened file handle ANYWAY, I suspect he doesn't care too much. In this kind of instance, I'd recommend: if(open OUT, ">>/home/foo/que/$ip") { print OUT "$ip:t:date=",scalar localtime,"\n"; } else { push @error, "Can't save details"; } as this ensures both the correct precedence and removes the warning (you have warnings turned on right?) about printing to an unopened filehandle in the case of an error. I don't think this is the cause of Tim's current problem, but it could be the cause of an error in the future. All the best, J -- ("`-''-/").___..--''"`-._ | Jacinta Richardson | `6_ 6 ) `-. 
( ).`-.__.`) | Perl Training Australia | (_Y_.)' ._ ) `._ `. ``-..-' | +61 3 9354 6001 | _..`--'_..-_/ /--'_.' ,' | contact at perltraining.com.au | (il),-'' (li),' ((!.-' | www.perltraining.com.au | From pjf at perltraining.com.au Wed May 28 05:30:03 2008 From: pjf at perltraining.com.au (Paul Fenwick) Date: Wed, 28 May 2008 22:30:03 +1000 Subject: [Melbourne-pm] An intermittent problem with open for append In-Reply-To: <9E6C5977-5132-44B8-919C-0558A9B7BEB5@dd.com.au> References: <200805280554.m4S5s9jB083498@v.abnormal.com> <483CF854.40601@perltraining.com.au> <46565315-B2C5-41CA-9C1E-5C36FF2EDE4E@alchemy.com.au> <9E6C5977-5132-44B8-919C-0558A9B7BEB5@dd.com.au> Message-ID: <483D504B.80309@perltraining.com.au> G'day Scott/Tim/MPM, Scott Penrose wrote: > UNIX - Yes no problem BUT you must be under the internal buffer on the > system, and it is line bound. So multi line insert will not be in > order, but single lines will. I do agree that O_APPEND on a local unix filesystem is atomic provided you're within the relevant limit for block IO. I beg to disagree that it has anything to do with *lines*. As far as your OS and filesystem is concerned, a file is just a bunch of bytes. If you write a 40MB "line" to that file, you can be pretty sure it won't be an atomic write. If you write ten "lines" of six characters each, you can be pretty certain it *will* be atomic. The preferred size for block IO for your filesystem can be found in the 11th field from Perl's stat() function. On most systems that corresponds to the size of a block on the filesystem, and is typically about 4k on ext2/ext3. AFAIK, it should also correspond to the smallest atomic write on your system. > Using the Sync and Buffer changes Paul suggested won't improve the > situation or make it any safer. My suggestion of forcing writes after we've written a logical record was to catch three possible problems: 1) If the data was completely missing from the file, it could be because the process is being zapped by a signal. This could be the case if the web-server zaps processes if the connection goes away, as Toby suggested earlier in this thread. Perl doesn't usually flush its buffers when dying to a signal, and so we can lose the write. You can observe this with a simple program like: use Fatal qw(open); open (my $fh, '>>', '/tmp/myfile.log'); while (<STDIN>) { print {$fh} $_; } Type a few lines, and then hit CTRL-C. You'll discover that myfile.log ends up empty. Tim indicated that he was *missing* data, and being zapped by a signal is a possible culprit[1]. That's less likely now that Tim has indicated he's unbuffering the whole filehandle (provided this is done before it's written to). 2) If we're writing a lot of records, and we're leaving the flushing up to stdio, then stdio is free to flush data that intersects a record boundary. In this case we can end up with our record being mangled. You can see this in action by taking the above script, and repeatedly pasting a bunch of data into it while doing a 'tail -f' on myfile.log. When your data *does* get written to the file, you'll notice that the end of the data written doesn't correspond to the end of the data that's been pasted (unless you're pasting in blocks which are an exact multiple of your buffer-size). The last part of the data will be written when perl closes its filehandles (after we've hit CTRL-D to indicate end-of-input). This can particularly be a problem with long-running processes that are writing to a shared logfile.
3) If we completely unbuffer the filehandle, and then use multiple print()s to write our data, then the data from other processes can become intermingled with ours, since we'll be flushing after every print(). If we're manually calling ->flush() then we can ensure all our data is kept together, provided it fits within a single IO block. > Windows - Anyone know? Windows append isn't atomic, it's emulated by perl. It seeks, and then writes, meaning you can quite happily end up with race conditions and corrupted data if you don't take steps to avoid it (such as locking). Cheerio, Paul -- Paul Fenwick | http://perltraining.com.au/ Director of Training | Ph: +61 3 9354 6001 Perl Training Australia | Fax: +61 3 9354 2681 From scottp at dd.com.au Wed May 28 05:41:21 2008 From: scottp at dd.com.au (Scott Penrose) Date: Wed, 28 May 2008 22:41:21 +1000 Subject: [Melbourne-pm] An intermittent problem with open for append In-Reply-To: <483D504B.80309@perltraining.com.au> References: <200805280554.m4S5s9jB083498@v.abnormal.com> <483CF854.40601@perltraining.com.au> <46565315-B2C5-41CA-9C1E-5C36FF2EDE4E@alchemy.com.au> <9E6C5977-5132-44B8-919C-0558A9B7BEB5@dd.com.au> <483D504B.80309@perltraining.com.au> Message-ID: <239C0A7E-2693-43A6-884C-EAC298DEFEB4@dd.com.au> On 28/05/2008, at 10:30 PM, Paul Fenwick wrote: > G'day Scott/Tim/MPM, > > Scott Penrose wrote: > >> UNIX - Yes no problem BUT you must be under the internal buffer on >> the system, and it is line bound. So multi line insert will not be >> in order, but single lines will. > > I do agree that O_APPEND on a local unix filesystem is atomic > provided you're within the relevant limit for block IO. I beg to > disagree that it has anything to do with *lines*. As far as your OS > and filesystem is concerned, a file is just a bunch of bytes. If > you write a 40MB "line" to that file, you can be pretty sure it > won't be an atomic write. If you write ten "lines" of six > characters each, you can be pretty certain it *will* be atomic. Quite right. It is the block that matters, what I meant is if you write multiple lines you may pass that block size. So you see this often works: print OUT "Some Error line\n"; and this often does not print OUT join("\n", @all_my_errors); Sorry about that. > 2) If we're writing a lot of records, and we're leaving the flushing > up to stdio, then stdio is free to flush data that intersects a > record boundary. In this case we can end up with our record being > mangled. You can see this in action by taking the above script, and > repeatedly pasting a bunch of data into it while doing a 'tail -f' > on myfile.log. When your data *does* get written to the file, > you'll notice that the end of the data written doesn't correspond to > the end of the data that's been pasted (unless you're pasting in > blocks which are an exact multiple of your buffer-size). The last > part of the data will be written when perl closes its filehandles > (after we've hit CTRL-D to indicate end-of-input). Sorry no, the record will still be mangled. Flushing does not fix that. If you are writing something greater than the buffer size the only answer is locking, nothing else works. Your answer above works, only if there is one script writing to the log and then you are fixing the internal flusing of the data. > Windows append isn't atomic, it's emulated by perl. It seeks, and > then writes, meaning you can quite happily end up with race > conditions and corrupted data if you don't take steps to avoid it > (such as locking). 
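To stay inside a single atomic append on a local filesystem, one approach that follows from the points above is to assemble the whole record first and write it with a single print; a rough sketch (the path, the record format and the 4k fallback are assumptions):

    use strict;
    use warnings;
    use IO::Handle;

    my $logfile = '/tmp/myfile.log';
    my $record  = "$$: " . scalar(localtime) . "\n";     # build the complete record first

    # Field 11 of stat() is the filesystem's preferred block size for I/O.
    my $blksize = (stat $logfile)[11] || 4096;
    warn "record is larger than one I/O block - locking would be needed\n"
        if length($record) > $blksize;

    open my $fh, '>>', $logfile or die "Can't open $logfile for append: $!";
    print {$fh} $record;                                 # one print per record
    $fh->flush or die "Can't flush $logfile: $!";
    close $fh  or die "Can't close $logfile: $!";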
Typical, I expected that :) Then again it uses threads to emulate forks, so maybe not as big a problem :-) Scott From pjf at perltraining.com.au Wed May 28 05:47:54 2008 From: pjf at perltraining.com.au (Paul Fenwick) Date: Wed, 28 May 2008 22:47:54 +1000 Subject: [Melbourne-pm] An intermittent problem with open for append In-Reply-To: <239C0A7E-2693-43A6-884C-EAC298DEFEB4@dd.com.au> References: <200805280554.m4S5s9jB083498@v.abnormal.com> <483CF854.40601@perltraining.com.au> <46565315-B2C5-41CA-9C1E-5C36FF2EDE4E@alchemy.com.au> <9E6C5977-5132-44B8-919C-0558A9B7BEB5@dd.com.au> <483D504B.80309@perltraining.com.au> <239C0A7E-2693-43A6-884C-EAC298DEFEB4@dd.com.au> Message-ID: <483D547A.8080006@perltraining.com.au> G'day Scott/MPM, Scott Penrose wrote: > Sorry no, the record will still be mangled. Flushing does not fix that. > If you are writing something greater than the buffer size the only > answer is locking, nothing else works. Oops, I meant to qualify that with "provided your records are less than the atomic buffer size". You're quite right that if we hit records bigger than our atomic buffer, we have to move to locking. Cheerio, Paul -- Paul Fenwick | http://perltraining.com.au/ Director of Training | Ph: +61 3 9354 6001 Perl Training Australia | Fax: +61 3 9354 2681 From ddick at aapt.net.au Wed May 28 15:20:54 2008 From: ddick at aapt.net.au (David Dick) Date: Thu, 29 May 2008 08:20:54 +1000 Subject: [Melbourne-pm] An intermittent problem with open for append In-Reply-To: <483D547A.8080006@perltraining.com.au> References: <200805280554.m4S5s9jB083498@v.abnormal.com> <483CF854.40601@perltraining.com.au> <46565315-B2C5-41CA-9C1E-5C36FF2EDE4E@alchemy.com.au> <9E6C5977-5132-44B8-919C-0558A9B7BEB5@dd.com.au> <483D504B.80309@perltraining.com.au> <239C0A7E-2693-43A6-884C-EAC298DEFEB4@dd.com.au> <483D547A.8080006@perltraining.com.au> Message-ID: <483DDAC6.40408@aapt.net.au> Paul Fenwick wrote: > Oops, I meant to qualify that with "provided your records are less than the > atomic buffer size". You're quite right that if we hit records bigger than > our atomic buffer, we have to move to locking. > Very interesting thread. I had no idea that the kernel can mangle the output based on block size. However, at least in my tests, there will be no data lost, but it may be mangled? From cas at taz.net.au Wed May 28 16:08:45 2008 From: cas at taz.net.au (Craig Sanders) Date: Thu, 29 May 2008 09:08:45 +1000 Subject: [Melbourne-pm] An intermittent problem with open for append In-Reply-To: <483D4A30.1030606@perltraining.com.au> References: <200805280554.m4S5s9jB083498@v.abnormal.com> <483CF854.40601@perltraining.com.au> <46565315-B2C5-41CA-9C1E-5C36FF2EDE4E@alchemy.com.au> <483D4A30.1030606@perltraining.com.au> Message-ID: <20080528230845.GC14155@taz.net.au> On Wed, May 28, 2008 at 10:04:00PM +1000, Jacinta Richardson wrote: > In this kind of instance, I'd recommend: > > if(open OUT, ">>/home/foo/que/$ip") { > print OUT "$ip:t:date=",scalar localtime,"\n"; > } > else { > push @error, "Can't save details"; > } in this instance, i'd recommend something very similar, but more like this: my $logdir='/home/foo/que'; my $outfile="$logdir/$ip"; if(open(OUT,'>>',$outfile)) { print OUT "$ip:t:date=",scalar localtime,"\n"; } else { push @error, "Can't open $outfile for append: $!"; } advantages: 1. 3-argument open() is better practice, especially if there's a chance that the filename is based on user input. always using the 3-arg form of open() is a good habit to get into. 2. 
"Can't save details" is a useless error message. I've updated it to say specifically what the problem was - including the filename and "$!" aka $OS_ERROR, which is the actual error message returned by the operating system. 3. hard-coding directory names is bad. it's always good to make things easy for yourself - or your successor - in case you/they need to move things around later. put stuff like $logdir in a "configuration" or "constants" section at the top of the script to make them easy to find and change later. more general comments: at a guess, i'd say that "$ip" is probably the IP address of the remote client and that the OP wants to have a separate log file per IP address. it's hard to imagine why that would or could be a good idea. IMO, it's better to write to just one log file and include sufficient information in the log entries that you can extract whatever you need from it later with grep or some post-processing script. hundreds or thousands of little log files just makes for clutter, and makes management of the log files (e.g. daily or weekly rotation) more difficult. also, on some filesystem, it seriously impacts performance because having thousands of little files in one directory slows down all file access in that directory. craig -- craig sanders From scottp at dd.com.au Wed May 28 17:17:06 2008 From: scottp at dd.com.au (Scott Penrose) Date: Thu, 29 May 2008 10:17:06 +1000 Subject: [Melbourne-pm] Data::Token References: Message-ID: <11A7771B-466B-4FFF-8787-C4859309E651@dd.com.au> Hey Guys Do you find you have to create unique and secure tokens? I keep finding that. The conflict we face is that unique tokens are easy with Data::UUID but they are predictable and therefore no good for authentication or other secure tokens. So the usual practice is to add a secret and take an MD5 of that number. The down side of that is they are no longer guaranteed unique (although my understanding of MD5 is that the closer the original string the further away the MD5). Anyway, the point is the algorithm you use tends to be simple, but often repeated, and may change as one learns issues (such as what to use as a secret seed, or better alternatives to MD5 etc). So I have created Data::Token, which you can run like this: perl -MData::Token -e 'print token, qq{\n}' Could you guys have a review of the module and give me some feedback before I stick it on CPAN. Ta Scott -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Data-Token-0.0.3.tar.gz Type: application/x-gzip Size: 3488 bytes Desc: not available Url : http://mail.pm.org/pipermail/melbourne-pm/attachments/20080529/d8abf9c7/attachment.gz -------------- next part -------------- From jarich at perltraining.com.au Wed May 28 17:22:29 2008 From: jarich at perltraining.com.au (Jacinta Richardson) Date: Thu, 29 May 2008 10:22:29 +1000 Subject: [Melbourne-pm] An intermittent problem with open for append In-Reply-To: <20080528230845.GC14155@taz.net.au> References: <200805280554.m4S5s9jB083498@v.abnormal.com> <483CF854.40601@perltraining.com.au> <46565315-B2C5-41CA-9C1E-5C36FF2EDE4E@alchemy.com.au> <483D4A30.1030606@perltraining.com.au> <20080528230845.GC14155@taz.net.au> Message-ID: <483DF745.4060501@perltraining.com.au> Craig Sanders wrote: > in this instance, I'd recommend something very similar, but more like this: > > my $logdir='/home/foo/que'; > my $outfile="$logdir/$ip"; > > if(open(OUT,'>>',$outfile)) { > print OUT "$ip:t:date=",scalar localtime,"\n"; > } > else { > push @error, "Can't open $outfile for append: $!"; > } All good points and I agree entirely. All the best, Jacinta -- ("`-''-/").___..--''"`-._ | Jacinta Richardson | `6_ 6 ) `-. ( ).`-.__.`) | Perl Training Australia | (_Y_.)' ._ ) `._ `. ``-..-' | +61 3 9354 6001 | _..`--'_..-_/ /--'_.' ,' | contact at perltraining.com.au | (il),-'' (li),' ((!.-' | www.perltraining.com.au | From pjf at perltraining.com.au Wed May 28 18:01:30 2008 From: pjf at perltraining.com.au (Paul Fenwick) Date: Thu, 29 May 2008 11:01:30 +1000 Subject: [Melbourne-pm] Data::Token In-Reply-To: <11A7771B-466B-4FFF-8787-C4859309E651@dd.com.au> References: <11A7771B-466B-4FFF-8787-C4859309E651@dd.com.au> Message-ID: <483E006A.7070902@perltraining.com.au> G'day Scott, Hashing ======= I notice that Data::Token is using MD5. Unfortunately, we're starting to get very good at engineering MD5 collisions, with http://th.informatik.uni-mannheim.de/People/lucks/HashCollisions/ as a striking example of this. For Data::Token this could be considered a non-issue, as we just want our tokens to be hard-to-guess, rather than using them as hash of a real documentation. Even so, I'd tend towards SHA1 as a hashing algorithm with less flaws. Randomness ========== Unfortunately, rand(time) isn't very random. When Perl sees the use of rand it will first try to seed its pseudo-random number generate (PRNG) with a good source of entropy, typically from /dev/urandom on modern unixes. On most systems, this gives you at most 32 bits of entropy, since that's all the random seed will take. rand(time) then generates a floating point number between 0 and the seconds from the epoch. This number can be predicted based upon the current time, and our original 32 bits of entropy (which we can brute force). Uniqueness ========== MD5 doesn't guarantee that its output is unique, even though the input has been generated from unique identifiers. It's *very* unlikely that we'll see a collision, but it's still a possibility. Suggestion ========== Rather than pushing our UUID and our random number through MD5, I would suggest a simple concatenation. The UUID guarantees that our resulting string will be unique, and our random number (appropriately encoded) will ensure that it's hard to guess. I would allow the user to supply an argument specifying how many bits of randomness they want, and possibly an argument to specify the quality of that randomness (are we willing to block for good randomness?). 
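A rough sketch of that concatenation approach, using Data::UUID for the unique part and random octets for the hard-to-guess part; the function name, the 128-bit default and the joining format are assumptions, not Data::Token's actual interface:

    use strict;
    use warnings;
    use Data::UUID;
    use Crypt::Random qw(makerandom_octet);

    sub make_token {
        my ($bits) = @_;
        $bits ||= 128;                                   # how much randomness to append

        my $uuid = Data::UUID->new->create_str;          # guarantees uniqueness
        my $rand = unpack 'H*',
            makerandom_octet(Length => $bits / 8, Strength => 0);   # non-blocking source

        return "$uuid-$rand";                            # unique AND hard to guess
    }

    print make_token(), "\n";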
I recommend using Crypt::Random from CPAN as a way to get your random numbers. It does the hard work of finding an appropriate source of randomness, including hooking into /dev/u?random, asking PARI, or talking to the entropy gathering daemon (if installed). It also takes size and strength arguments, which can be passed straight through from the user. Further reading =============== I discuss the troubles with generating good random numbers in Perl in chapter 10 of "Perl Security", available from http://perltraining.com.au/notes.html . Feedback and comments appreciated. Cheerio, Paul -- Paul Fenwick | http://perltraining.com.au/ Director of Training | Ph: +61 3 9354 6001 Perl Training Australia | Fax: +61 3 9354 2681 From scottp at dd.com.au Wed May 28 18:21:44 2008 From: scottp at dd.com.au (Scott Penrose) Date: Thu, 29 May 2008 11:21:44 +1000 Subject: [Melbourne-pm] Data::Token In-Reply-To: <483E006A.7070902@perltraining.com.au> References: <11A7771B-466B-4FFF-8787-C4859309E651@dd.com.au> <483E006A.7070902@perltraining.com.au> Message-ID: On 29/05/2008, at 11:01 AM, Paul Fenwick wrote: > G'day Scott, > > Hashing > ======= > > I notice that Data::Token is using MD5. Unfortunately, we're > starting to get very good at engineering MD5 collisions, with http://th.informatik.uni-mannheim.de/People/lucks/HashCollisions/ > as a striking example of this. For Data::Token this could be > considered a non-issue, as we just want our tokens to be hard-to- > guess, rather than using them as hash of a real documentation. Even > so, I'd tend towards SHA1 as a hashing algorithm with less flaws. Ta I will look at using SHA1 instead. > Randomness > ========== > > Unfortunately, rand(time) isn't very random. When Perl sees the use > of rand it will first try to seed its pseudo-random number generate > (PRNG) with a good source of entropy, typically from /dev/urandom on > modern unixes. On most systems, this gives you at most 32 bits of > entropy, since that's all the random seed will take. rand(time) > then generates a floating point number between 0 and the seconds > from the epoch. This number can be predicted based upon the current > time, and our original 32 bits of entropy (which we can brute force). Most of the algorithms around use a simple text string - "MySecret". This is how things tokens are generated for apache cookies and examples for tokens in PHP and on Perl Monks - but that is silly in a CPAN module, so I thought a bit of randomness. I am open to better random numbers, but even just adding time would be enough, after a hashing to make it different. All systems using a token are always open for brute force attack, and you must still protect against that, by blocking IPs, increased timeout on failed requests etc. This system does just one thing, generate the token, it does not protect it, nor at least in some parts protect against duplicates. The randomness is there to help you not guess the next free number, or at least take 1000s of attempts to do so. Preferably lots more. It is a sad fact that most of the Token code on CPAN and in the wile use things like Database ID, Time stamp or similar to set the token for a cookie :-) Ahhh I see you have a suggestion below, I will try that then. > Uniqueness > ========== > MD5 doesn't guarantee that its output is unique, even though the > input has been generated from unique identifiers. It's *very* > unlikely that we'll see a collision, but it's still a possibility. 
I assume that SHA1 would be the same, but I think mainly the issue is we are taking a HASH, therefore we are always gong to have a chance of being collision. In the end, I think if you are generating a token it should be checked against the existing ones before returning (I imagine in a life time we would never see a collision, but better safe than sorry). > Suggestion > ========== > Rather than pushing our UUID and our random number through MD5, I > would suggest a simple concatenation. The UUID guarantees that our > resulting string will be unique, and our random number > (appropriately encoded) will ensure that it's hard to guess. I > would allow the user to supply an argument specifying how many bits > of randomness they want, and possibly an argument to specify the > quality of that randomness (are we willing to block for good > randomness?). > > I recommend using Crypt::Random from CPAN as a way to get your > random numbers. It does the hard work of finding an appropriate > source of randomness, including hooking into /dev/u?random, asking > PARI, or talking to the entropy gathering daemon (if installed). It > also takes size and strength arguments, which can be passed straight > through from the user. Good one thanks. I think the module should try and do well with zero input (DWIM) - so I will look at Crypt::Random. But we can always allow input into the function for increased random by passing straight through. Quick question on right format though... the normal case, for most users would be just print token, "\n"; To pass in the higher level of randomness (which I think 999/1000 is unnecessary) what is the best way: * On the line "use Data::Token" * Passed into token "token(...)"; * Set variables - $Data::Token::strength (ok this one sux) * Call methods - Data::Token::strength(...); Thoughts? > Further reading > =============== > I discuss the troubles with generating good random numbers in Perl > in chapter 10 of "Perl Security", available from http://perltraining.com.au/notes.html > . Feedback and comments appreciated. Thanks, I will have a look. Thanks for all your input Paul. I think making it stronger by default is the right approach. It is unlikely this needs to be fast as it is only for generating unique tokens, not for reading them. I think I will also add in a few references, in particular to security talks. And most importantly I should add some comments on checking for uniquness in a token system AND even more important to protect against bruit force attack. Just out of interest, how many people have had to create these tokens and do the same research as above? From the feedback here I guess that this is a worth while module so that the next person does not have to do the same again :-) Scott From daniel at rimspace.net Wed May 28 19:51:01 2008 From: daniel at rimspace.net (Daniel Pittman) Date: Thu, 29 May 2008 12:51:01 +1000 Subject: [Melbourne-pm] Data::Token In-Reply-To: (Scott Penrose's message of "Thu, 29 May 2008 11:21:44 +1000") References: <11A7771B-466B-4FFF-8787-C4859309E651@dd.com.au> <483E006A.7070902@perltraining.com.au> Message-ID: <87tzghx1pm.fsf@rimspace.net> Scott Penrose writes: > On 29/05/2008, at 11:01 AM, Paul Fenwick wrote: > >> G'day Scott, >> >> Hashing >> ======= >> >> I notice that Data::Token is using MD5. Unfortunately, we're >> starting to get very good at engineering MD5 collisions, with >> http://th.informatik.uni-mannheim.de/People/lucks/HashCollisions/ >> as a striking example of this. 
For Data::Token this could be >> considered a non-issue, as we just want our tokens to be hard-to- >> guess, rather than using them as hash of a real documentation. Even >> so, I'd tend towards SHA1 as a hashing algorithm with less flaws. > > Ta I will look at using SHA1 instead. SHA1 and MD5 are in the same family, and successful attacks on (full) SHA1 have reduced collision generation to 2^69 trials from 2^80. Plan on replacing SHA1 everywhere within the next ten years, and on needing to step up to SHA256 or SHA512 in the interim, at the very least. [...] > Most of the algorithms around use a simple text string - "MySecret". > This is how things tokens are generated for apache cookies and > examples for tokens in PHP and on Perl Monks - but that is silly in a > CPAN module, so I thought a bit of randomness. [...] > It is a sad fact that most of the Token code on CPAN and in the wile > use things like Database ID, Time stamp or similar to set the token > for a cookie :-) ...I agree that your model is substantially better, but I would generally encourage building secure first, then looking at allowing the protection to be weakened later. That way you fail safe rather than depending on programmers to actually have an notion of how to effectively secure the system. [...] > Good one thanks. I think the module should try and do well with zero > input (DWIM) - so I will look at Crypt::Random. But we can always > allow input into the function for increased random by passing straight > through. Allowing the end user to pass in "random" data to increase entropy will, in many cases, result in less entropy included because, frankly, most people don't really understand how to generate that. :/ However, Crypt::Random is a blocking module, and your web server is likely to be fairly entropy constrained[1], so you want to be careful to set the strength of the input to low (Strength => 0) when setting it up. [...] > Thanks for all your input Paul. I think making it stronger by default > is the right approach. It is unlikely this needs to be fast as it is > only for generating unique tokens, not for reading them. Good randomness shouldn't need to be slow, and if you really care seeding a good PRNG (the Mersenne Twister, in Math::Random::MT::*) from Crypt::Random would be fast and effective. (Seeding rand() probably isn't good enough, since it isn't a terribly high quality PRNG in many cases.) > I think I will also add in a few references, in particular to security > talks. And most importantly I should add some comments on checking > for uniquness in a token system AND even more important to protect > against bruit force attack. If you were extending this I would consider an implementation that can answer the key question "Is this my token" in a cryptographically secure fashion, ensuring that you don't need to store the token anywhere. Something like: base64(encrypt(key2, join(':', token, random, key1)), ":", token) You can then verify that the secret part decrypts, contains key1, and matches the public token, without needing to store anything. key1 and key2 can be randomly generated and only need to be stable for the life of the tokens; adding a date to the outside can also help. > Just out of interest, how many people have had to create these tokens > and do the same research as above? 
From the feedback here I guess that > this is a worth while module so that the next person does not have to > do the same again :-) If there was a good, portable module to produce something like the above, for arbitrary values of 'token', and optionally without exposing token at all, I would be happy. I don't know that is the use case for your module, though, but rather the current module is a component of that larger system. Regards, Daniel ...and now I wait to be pointed at the existing module that does all that for me, because you always learn it exists afterwards. Footnotes: [1] There is very, very little true entropy on a headless server, and very little support for effectively using and *trusting* entropy from a hardware RNG, even if one is present. From scottp at dd.com.au Wed May 28 20:41:55 2008 From: scottp at dd.com.au (Scott Penrose) Date: Thu, 29 May 2008 13:41:55 +1000 Subject: [Melbourne-pm] Data::Token In-Reply-To: <87tzghx1pm.fsf@rimspace.net> References: <11A7771B-466B-4FFF-8787-C4859309E651@dd.com.au> <483E006A.7070902@perltraining.com.au> <87tzghx1pm.fsf@rimspace.net> Message-ID: > SHA1 and MD5 are in the same family, and successful attacks on (full) > SHA1 have reduced collision generation to 2^69 trials from 2^80. > > Plan on replacing SHA1 everywhere within the next ten years, and on > needing to step up to SHA256 or SHA512 in the interim, at the very > least. All the above is correct but not quite for this case. MD5 and SHA1 and up all just decrease how likely collisions are to help against bruit force attack - but for signatures against text. Remember that this is just a way of hiding the secret. What it needs to do is make it so that you need 1000s or more of guesses to get the next entry. Where as doing time (or as shown even rand(time)) is predictable. One of the reasons Cryptography is so hard is you can't apply one rule to another. The MD5 birthday attack scenarios are useful only against documents you are signing. Where as this is just a one way hashing algorithm I need. I could probably use crypt :-) (not really). > [...] > >> Most of the algorithms around use a simple text string - "MySecret". >> This is how things tokens are generated for apache cookies and >> examples for tokens in PHP and on Perl Monks - but that is silly in a >> CPAN module, so I thought a bit of randomness. > > [...] > >> It is a sad fact that most of the Token code on CPAN and in the wile >> use things like Database ID, Time stamp or similar to set the token >> for a cookie :-) > > ...I agree that your model is substantially better, but I would > generally encourage building secure first, then looking at allowing > the > protection to be weakened later. > > That way you fail safe rather than depending on programmers to > actually > have an notion of how to effectively secure the system. Agreed. > > [...] > >> Good one thanks. I think the module should try and do well with zero >> input (DWIM) - so I will look at Crypt::Random. But we can always >> allow input into the function for increased random by passing >> straight >> through. > > Allowing the end user to pass in "random" data to increase entropy > will, > in many cases, result in less entropy included because, frankly, most > people don't really understand how to generate that. :/ > > However, Crypt::Random is a blocking module, and your web server is > likely to be fairly entropy constrained[1], so you want to be > careful to > set the strength of the input to low (Strength => 0) when setting it > up. 
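For what it's worth, the textbook construction for hiding a secret behind a digest is a keyed HMAC rather than a bare hash of secret-plus-data; a small sketch of that variant (the secret and the function name are placeholders, and this is not what Data::Token currently does):

    use strict;
    use warnings;
    use Data::UUID;
    use Digest::HMAC_SHA1 qw(hmac_sha1_hex);

    my $secret = 'MySecret';                 # placeholder; a real secret should be random

    sub hidden_token {
        my $uuid = Data::UUID->new->create_str;
        return hmac_sha1_hex($uuid, $secret);    # the keyed digest hides the secret
    }

    print hidden_token(), "\n";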
We don't need to create the secret every time, that can be generated once and kept in memory (yes that is safe, it is not a crypt key, just a means for making the token unpredictable). However that would only work if you are using mod_perl or similar. But as for inputs - I intend to not give the user any inputs, but do it to the security good enough. Rather than provide a flexible module that does everything, this will just do one thing well. Then as issues arise, SHA-1 becomes no good, better randomness is required - I just change it. > If you were extending this I would consider an implementation that can > answer the key question "Is this my token" in a cryptographically > secure > fashion, ensuring that you don't need to store the token anywhere. That is a great idea, but not for this module I think. I will consider though a way of supporting it. The problem is of course you must keep your secret. A long time secret is vulnerable. In the end though, a token really needs to be stored, so you can always just look it up. Nice idea though, good for form processing. On another topic - Security of using MD5 - it seems that every module I find on the net from Java to PHP to Python to Perl are using what I originally wrote - MD5 of a random string (usually time) against a unique number (often just generated with a sequence, time or combination of time, ip etc). The most common PHP code is $token = md5(uniqid(rand(), TRUE)); uniqid is equiv to Data::UUID (different way of calculating). Even the praised Apache::Session and CGI::Session just use: md5_hex($$, time(), rand(time)); I can't find a single reference on the net that says this is insecure as has been documented in this thread. Some people raise in threads that you should use SHA1 and in each case it is said not to be required. So the question is: 1) Am I missing the threads on the net 2) Are we jumping to the wrong conclusion because we are mixing document signature faking with unpredictability 3) Is this really a problem and we are the first to really solve it. My gut is now telling me (2). If it is not then almost every single site on the internet is now vulnerable. Note also that the PHP, Apache::Session, CGI::Session. Even Apache::AuthCookie just uses md5_hex($date, $PID, $PAC); I can't find a single example on the net that does not use MD5, except the insecure ones. Scott From pat at patspam.com Wed May 28 21:23:06 2008 From: pat at patspam.com (Patrick Donelan) Date: Thu, 29 May 2008 14:23:06 +1000 Subject: [Melbourne-pm] Internationali[sz]ation Message-ID: <42321ee20805282123m78402adcgf78dafe4f4d307e6@mail.gmail.com> Hello my fellow monguers, I've been designing the API for some code I'm contributing to an open source project (WebGUI), and I've been mulling over the use of English alternative spelling in my code/documentation/file names/etc.. The majority of developers on this project are based in America, and while I'm no zealot when it comes to preserving the Queen's English I do find it takes a certain amount of effort to not start convulsing whenever I encounter the word "instanciate" in the source. So, I'm wondering what to do in my code. My heart just isn't in the 'z' and I miss the absent 'u's, but then again I've long since gotten used to writing "color" tags in html, so should I just bite the bullet and name my "authorization" methods accordingly? What do you do when you are involved in international projects? Should I just shut my eyes to it and think of the developers from non English-speaking backgrounds? 
Or just shut my eyes and think about Engla.. oh wait that's not right. Patrick http://patspam.com -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.pm.org/pipermail/melbourne-pm/attachments/20080529/4ee3eaa9/attachment.html From pjf at perltraining.com.au Wed May 28 21:53:16 2008 From: pjf at perltraining.com.au (Paul Fenwick) Date: Thu, 29 May 2008 14:53:16 +1000 Subject: [Melbourne-pm] Data::Token In-Reply-To: References: <11A7771B-466B-4FFF-8787-C4859309E651@dd.com.au> <483E006A.7070902@perltraining.com.au> <87tzghx1pm.fsf@rimspace.net> Message-ID: <483E36BC.6020806@perltraining.com.au> G'day Scott/MPM, Scott Penrose wrote: [much snippage, apologies, I've got a deadline today] > 1) Am I missing the threads on the net > 2) Are we jumping to the wrong conclusion because we are mixing document > signature faking with unpredictability > 3) Is this really a problem and we are the first to really solve it. > My gut is now telling me (2). If it is not then almost every single site > on the internet is now vulnerable. (2a). The ability to engineer collisions with MD5 can be considered a non-issue because we're not signing documents, the only requirement is that the hash is *hard to guess*. In this sense, we're using MD5 as a way to distribute our entropy throughout a reasonably long string. MD5 (or SHA1, or ROT13) won't increase the entropy that we have, but it can increase the work an attacker needs to do, and make it less obvious with regards to the data we're using to generate the hash to begin with. The result is that the hashes are "good enough" for most applications. Yes, all the hash algorithms can result in collisions, but the possibility of such a collision coming out of our random session generator is vanishingly small. With regards to the entropy problem, we may have a session hash that has perhaps 32 bits of entropy, perhaps from a /dev/urandom seed. It's possible for an attacker to walk through all these values, push them through our hash function, generate a potential session ID, and present it to our server. However: 1) It would be obvious an attack is taking place, with up to 2^32 requests being presented to our server. 2) It would take a long time. Even if an attacker could present 100 hashes per second, it would take almost 500 days to walk the entire keyspace, although for a service with many active sessions, a collision could occur much sooner. 3) They need to hit a hash that's valid at the time it's presented. If sessions time out rapidly, then even walking through the entire keyspace may not result in a hit. 4) The session the attacker gains access to may not be very valuable, as it will almost always be a random user. 5) The service may still require a password before revealing credit card details, transferring money, changing delivery addresses, etc. 6) The service may invalidate a session if it sees the IP address, browser string, etc change, even though the session is active. 7) In most cases, it's much easier to just sniff a hash off the wire if not encrypted, or use other exploits to compromise the user. It's worth noting that tokens with poor randomness stop being "good enough" when you start having lots of sessions, or sessions which are active for a long time, or a very valuable prize for breaking a session. I'd expect the session generation for on-line banking to contain significantly more entropy, and be significantly more paranoid than the session generation for my delicious bookmarks. 
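A back-of-envelope check of those figures (the guess rate and the number of live sessions are assumptions):

    my $keyspace = 2 ** 32;                   # 32 bits of seed entropy
    my $rate     = 100;                       # guesses per second
    printf "full walk: ~%.0f days\n", $keyspace / $rate / 86_400;       # ~497 days

    my $active = 10_000;                      # hypothetical number of live sessions
    printf "expected first hit: ~%.0f minutes\n",
        $keyspace / ($rate * $active) / 60;                             # ~72 minutes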
Heck, even eBay wants your password via https whenever you do something that an attacker may even find modestly valuable (selling/buying/changing details). Having said all that, we're going to generate tokens, and we have the stated goals of wanting them to be unique, and wanting them to be hard to guess. I don't see there being much harm in making sure they're absolutely unique, and *really* hard to guess if that doesn't cost us very much[1]. Cheerio, Paul [2] As Daniel has pointed out, blocking for entropy is likely to be costing us too much, so asking Crypt::Random to be non-blocking is a great default. -- Paul Fenwick | http://perltraining.com.au/ Director of Training | Ph: +61 3 9354 6001 Perl Training Australia | Fax: +61 3 9354 2681 From simon at unisolve.com.au Wed May 28 22:16:01 2008 From: simon at unisolve.com.au (Simon Taylor) Date: Thu, 29 May 2008 15:16:01 +1000 Subject: [Melbourne-pm] Internationali[sz]ation In-Reply-To: <42321ee20805282123m78402adcgf78dafe4f4d307e6@mail.gmail.com> References: <42321ee20805282123m78402adcgf78dafe4f4d307e6@mail.gmail.com> Message-ID: <483E3C11.6060908@unisolve.com.au> Hello Patrick, > Hello my fellow monguers, ;-) > I've been designing the API for some code I'm contributing to an open > source project (WebGUI), and I've been mulling over the use of English > alternative spelling in my code/documentation/file names/etc.. > > The majority of developers on this project are based in America, and > while I'm no zealot when it comes to preserving the Queen's English I > do find it takes a certain amount of effort to not start convulsing > whenever I encounter the word "instanciate" in the source. > > So, I'm wondering what to do in my code. My heart just isn't in the > 'z' and I miss the absent 'u's, but then again I've long since gotten > used to writing "color" tags in html, so should I just bite the bullet > and name my "authorization" methods accordingly? What do you do when > you are involved in international projects? Should I just shut my eyes > to it and think of the developers from non English-speaking > backgrounds? Or just shut my eyes and think about Engla.. oh wait > that's not right. I have thought about this long and hard and my 10c worth is that our cultural inclination to feel protective of British spelling is mis-placed. Of course it *is* right to do all we can to stop US culture rampaging across the things we hold dear, whether it's our local films, the businesses we buy from, the authors we read or our football. But I'm firmly of the view that we could switch to US English tomorrow and not miss out on a single cultural thing that matters. Culture is substrate-neutral, and the things that make our culture better, (IMHO), don't rely on spelling to have the effect they do. It's a peculiar quirk of history that British English has ended up being the odd cousin, with it's French influences and quirky spellings, whilst US English is by far cleaner and more rationale. We moved effortlessly to the metric system because of our culture, (even if we spell 'metre' the French way), and the US has not manged this transition because of theirs. But no matter how you dice it, their spelling is better.... 
- Simon From peter at machell.net Wed May 28 22:37:16 2008 From: peter at machell.net (Peter Machell) Date: Thu, 29 May 2008 15:37:16 +1000 Subject: [Melbourne-pm] Internationali[sz]ation In-Reply-To: <483E3C11.6060908@unisolve.com.au> References: <42321ee20805282123m78402adcgf78dafe4f4d307e6@mail.gmail.com> <483E3C11.6060908@unisolve.com.au> Message-ID: On 29/05/2008, at 3:16 PM, Simon Taylor wrote: > We moved effortlessly to the metric system because of our culture, > (even > if we spell 'metre' the French way), and the US > has not manged this transition because of theirs. I don't understand this argument. Nor do I think we moved effortlessly. I'm 36 and was taught metric at school, but still think of small distances in feet and inches and long ones in kilometres, a result of the culture I was raised in. > But no matter how you dice it, their spelling is better.... My opinion is better than yours? Color and Mom are horrible and don't make phonetic sense without the US accent (not that their aren't lots of similar English examples). I can't help but correct Program every time I see it, not to mention almost anything with a z in it. Anyway I agree that our culture wouldn't suffer much if we all submitted to the US way, but isn't the ultimate end of that line of thinking complete Americanizzzation? regards, Peter. From thogard at abnormal.com Wed May 28 22:38:31 2008 From: thogard at abnormal.com (Tim Hogard) Date: Thu, 29 May 2008 05:38:31 +0000 (UTC) Subject: [Melbourne-pm] An intermittent problem with open for append In-Reply-To: <483DDAC6.40408@aapt.net.au> Message-ID: <200805290538.m4T5cVcd052738@v.abnormal.com> > > Paul Fenwick wrote: > > Oops, I meant to qualify that with "provided your records are less than the > > atomic buffer size". You're quite right that if we hit records bigger than > > our atomic buffer, we have to move to locking. > > > Very interesting thread. I had no idea that the kernel can mangle the > output based on block size. However, at least in my tests, there will > be no data lost, but it may be mangled? Early Unix systems had 2 atomic file system calls, one was "open exclusive" and the other was "this data always gets appended". The second needs to be guaranteed for system logs where you will have several things writing to a common log file. In that case the order isn't critical but getting the data in the file is. My solaris internal book describes a write with the O_APPEND bit set as simply setting the "where to write the next block pointer" to the save value as the "file length" before the write. While this is like seeking to the end before a write, it is in the atomic operation section of write so it was a trivial way to guarantee that it works properly every time. What I'm doing about the original problem is: 1) fixing the || vs or vs () bug. 2) logging a failed open >> to someplace else. 3) checking for unexpected signals that might be showing up. 4) rebuilding perl 5.8.8 without the perl's IO abstraction layer. We have two CGIs, one uses /usr/local/bin/perl (5.8.8) and the other uses /usr/bin/perl (5.003) and the only other difference in the perl CGI scrips is one includes the base64 code (which is built into 5.8.8) and uses that function. The one has been running for at least 9 years and millions of times with out every showing this problem, yet the other one get hit about 20,000 times and has had this problem 4 times in the last month. 
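On the "checking for unexpected signals" item, one way to see whether the CGI is being hit by a signal is to install handlers that log to the Apache error log before exiting; a small sketch (the signal list and the message format are illustrative):

    for my $sig (qw(HUP INT PIPE TERM)) {
        $SIG{$sig} = sub {
            print STDERR "audit CGI caught SIG$_[0] at ", scalar localtime, "\n";
            exit 1;
        };
    }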
They are both called from the very same apache and the working one still gets used far more often than the broken one. The only major difference is version of perl and other things that are not in scope. I think there is a problem in perlapio. The comments so far seem to fit into one of the following groups: 1) the open || report error The failed open won't report an error... at that location but will later down where the code with || doesn't have any commas in it. 2) >> is overwriting This is to a local Solaris 5 ufs file system. I don't think thats a problem or lots of other people would have all sorts of odd problems. If it was NFS, then I could see it being a problem or if the records were out of order. But since this is just and audit log, all it means is someone has to unscramble the data. (on the working one, I've never seen the data scrambled and it includes about 2k worth of data) 3) Apache signals 4) program not flushing These two might be the case but I do set the $| to flush and it doesn't happen with perl 5.003. It still could be a race condition. And it doesn't rule out those issues working differently in the new perl io abstration layer. Thanks for everyones help. -tim From guy at alchemy.com.au Wed May 28 23:11:57 2008 From: guy at alchemy.com.au (Guy Morton) Date: Thu, 29 May 2008 16:11:57 +1000 Subject: [Melbourne-pm] Internationali[sz]ation In-Reply-To: References: <42321ee20805282123m78402adcgf78dafe4f4d307e6@mail.gmail.com> <483E3C11.6060908@unisolve.com.au> Message-ID: ize is arguably more correct, and is not really an americanisation. That said, I favour ise, probably for the reasons outlined here: http://www.askoxford.com/asktheexperts/faq/aboutspelling/ize Guy On 29/05/2008, at 3:37 PM, Peter Machell wrote: > On 29/05/2008, at 3:16 PM, Simon Taylor wrote: > >> We moved effortlessly to the metric system because of our culture, >> (even >> if we spell 'metre' the French way), and the US >> has not manged this transition because of theirs. > > I don't understand this argument. Nor do I think we moved > effortlessly. I'm 36 and was taught metric at school, but still think > of small distances in feet and inches and long ones in kilometres, a > result of the culture I was raised in. > >> But no matter how you dice it, their spelling is better.... > > My opinion is better than yours? Color and Mom are horrible and don't > make phonetic sense without the US accent (not that their aren't lots > of similar English examples). I can't help but correct Program every > time I see it, not to mention almost anything with a z in it. > > Anyway I agree that our culture wouldn't suffer much if we all > submitted to the US way, but isn't the ultimate end of that line of > thinking complete Americanizzzation? > > regards, > Peter. > _______________________________________________ > Melbourne-pm mailing list > Melbourne-pm at pm.org > http://mail.pm.org/mailman/listinfo/melbourne-pm From daniel at rimspace.net Wed May 28 23:36:40 2008 From: daniel at rimspace.net (Daniel Pittman) Date: Thu, 29 May 2008 16:36:40 +1000 Subject: [Melbourne-pm] Data::Token In-Reply-To: (Scott Penrose's message of "Thu, 29 May 2008 13:41:55 +1000") References: <11A7771B-466B-4FFF-8787-C4859309E651@dd.com.au> <483E006A.7070902@perltraining.com.au> <87tzghx1pm.fsf@rimspace.net> Message-ID: <878wxtvcp3.fsf@rimspace.net> Scott Penrose writes: >> SHA1 and MD5 are in the same family, and successful attacks on (full) >> SHA1 have reduced collision generation to 2^69 trials from 2^80. 
>> >> Plan on replacing SHA1 everywhere within the next ten years, and on >> needing to step up to SHA256 or SHA512 in the interim, at the very >> least. > > All the above is correct but not quite for this case. MD5 and SHA1 and > up all just decrease how likely collisions are to help against bruit > force attack - but for signatures against text. I am not quite convinced that your response is correct. The issue is that finding two inputs that generate colliding outputs. The document signature case is a situation where the signed document can be replaced with a colliding document and the signature will still validate. > Remember that this is just a way of hiding the secret. What it needs > to do is make it so that you need 1000s or more of guesses to get the > next entry. Where as doing time (or as shown even rand(time)) is > predictable. I guess it depends on what you are using the token for, as Paul correctly pointed out -- MD5 and SHA1 distribute the entropy and make it harder to guess the next item in the sequence, but they don't add any entropy. time, or rand(time), has very, very little entropy, and can often be trivially determined for a network server. > One of the reasons Cryptography is so hard is you can't apply one rule > to another. The MD5 birthday attack scenarios are useful only against > documents you are signing. Where as this is just a one way hashing > algorithm I need. I could probably use crypt :-) (not really). As far as I can tell your design is vulnerable to token forgery -- if someone can mint tokens at will they can abuse your service, correct? Ah. Wait. You are storing generated tokens, so only something that was both generated on the server *and* recorded will be valid, right? Yes, on that basis this isn't a threat: tokens that might be valid but are not minted by your server are not going to grant any access. If you /didn't/ store the token information[1] then you are vulnerable to collisions, on the basis that: 1. Your UUID is (sufficiently) predictable, or you would just use that. 2. Your token comprises sha1(uuid . secret) 3. The attacker can read the source code and determine the model you are using for generating tokens.[2] On this basis we can assume that the attacker can successfully forge UUID generation from your site, then they can find any value secret' such that: sha1(uuid . secret) == sha1(uuid . secret') At that stage they can mint new tokens and abuse your services at will. Hrm. Even with token recording that means they could potentially abuse your service by speculatively generating tokens and then submitting input in the hope that a genuine matching token will be generated. It would probably be easier to just fetch tokens from your system though. :) [...] > On another topic - Security of using MD5 - it seems that every module > I find on the net from Java to PHP to Python to Perl are using what I > originally wrote - MD5 of a random string (usually time) against a > unique number (often just generated with a sequence, time or > combination of time, ip etc). > > The most common PHP code is > $token = md5(uniqid(rand(), TRUE)); > > uniqid is equiv to Data::UUID (different way of calculating). > > Even the praised Apache::Session and CGI::Session just use: > > md5_hex($$, time(), rand(time)); > > I can't find a single reference on the net that says this is insecure > as has been documented in this thread. Security is relative: it would be much easier for me to predict the Apache::Session session ID value than your Data::Token value. 
It is almost certainly easier to find some other security hole, though, than to brute force that. Social engineering, paying pennies per spam to humans in inexpensive locations, and other technical threats are much more profitable than hacking cryptography today. > Some people raise in threads that you should use SHA1 and in each case > it is said not to be required. Well, I just read checked the code for Apache::AuthCookie to make sure it is insecure, and it is vulnerable to exactly the risk here: It authenticates the values in the cookie with a secret, where the secret is absolutely vulnerable to the generation of collisions. > So the question is: > > 1) Am I missing the threads on the net > 2) Are we jumping to the wrong conclusion because we are mixing document > signature faking with unpredictability > 3) Is this really a problem and we are the first to really solve it. > > My gut is now telling me (2). If it is not then almost every single > site on the internet is now vulnerable. The answer is kind of 3: it is really a problem, with a caveat, and we are absolutely not the first people to solve it.[3] However... [...] Paul Fenwick writes: > (2a). The ability to engineer collisions with MD5 can be considered a > non-issue because we're not signing documents, the only requirement is that > the hash is *hard to guess*. ...this is sometimes the case, and sometimes it isn't. When it isn't (Apache::AuthCookie) then the site really is vulnerable, but. Again, the but is "in the real world...", where the cost of exploiting the MD5 weakness is much higher than exploiting some other weakness. So, yeah. In some cases this doesn't matter, for this reason, but in others it /does/ matter theoretically, but not practically for some years yet. > In this sense, we're using MD5 as a way to distribute our entropy > throughout a reasonably long string. MD5 (or SHA1, or ROT13) won't > increase the entropy that we have, but it can increase the work an > attacker needs to do, and make it less obvious with regards to the > data we're using to generate the hash to begin with. For Data::Token this is probably enough, as Paul says. [...] > It's worth noting that tokens with poor randomness stop being "good enough" > when you start having lots of sessions, or sessions which are active for a > long time, or a very valuable prize for breaking a session. I'd expect the > session generation for on-line banking to contain significantly more entropy, > and be significantly more paranoid than the session generation for my > delicious bookmarks. You would hope, eh? My online banking, which is some of the best I have seen, uses an unsalted SHA1 transformation, making my password vulnerable to a "rainbow table" attack if the SSL protection ever fails. Oh, well. I guess they didn't attend classes the day that the risks of that were discussed. [...] > Having said all that, we're going to generate tokens, and we have the > stated goals of wanting them to be unique, and wanting them to be hard > to guess. I don't see there being much harm in making sure they're > absolutely unique, and *really* hard to guess if that doesn't cost us > very much[1]. For the use case this is probably a more reasonable approach than my more secure comments. Regards, Daniel Footnotes: [1] Which, to my eye, looks like an invitation to an attacker to consume unbounded storage on your server, baring other limitations, but you did note that you address that threat outside the token system in a previous post. 
[2] This is probably the most unlikely part of this threat model, but
    essential if you want to consider any real uniqueness from the
    token.

[3] My knowledge of this comes from cryptographic literature, and I
    didn't design my own security protocol, because I am not /that/
    knowledgeable in the area.

From scottp at dd.com.au Thu May 29 03:50:34 2008
From: scottp at dd.com.au (Scott Penrose)
Date: Thu, 29 May 2008 20:50:34 +1000
Subject: [Melbourne-pm] Data::Token
In-Reply-To: <878wxtvcp3.fsf@rimspace.net>
References: <11A7771B-466B-4FFF-8787-C4859309E651@dd.com.au> <483E006A.7070902@perltraining.com.au> <87tzghx1pm.fsf@rimspace.net> <878wxtvcp3.fsf@rimspace.net>
Message-ID: <0774BDC7-E00B-4FFD-810C-33085CD7EC10@dd.com.au>

> At that stage they can mint new tokens and abuse your services at
> will.

You are no longer talking about tokens. Tokens are unpredictable
numbers used for things like authentication and session tracking. They
must be stored. What you are talking about is encrypting cookies, or
encoding other data into the data you send back. What we want here is
just an ID such that if someone tries, say, 100,000 guesses they would
still fail.

As for your entropy comments, you are repeating what I said: the
MD5/SHA1 is just a way of hiding the secret. Again, look at every
implementation on the net from CGI::Session to PHP - they pretty much
all use md5_hex(time, rand(time));

Now I agree, that is predictable, so we fix the randomness and use a
more unique key. But using SHA1 instead of MD5 does not provide any
greater security for tokens - except that they are just longer. But I
think I will use it anyway just to make it a little safer.

> Hrm. Even with token recording that means they could potentially abuse
> your service by speculatively generating tokens and then submitting
> input in the hope that a genuine matching token will be generated.

Sorry to ask this, Daniel, but have you read any of my previous
replies? This has all been discussed and pointed out. Any token scheme
in the world suffers from the above. Sure you can keep making the space
bigger and harder to hit, but in the end you really must push back on
failed lookups. The easiest way to do this is as old as password entry,
and that is just to add more and more delay. Remember there is no data
in this token - it is just a pointer to the local data.

> Security is relative: it would be much easier for me to predict the
> Apache::Session session ID value than your Data::Token value.

Yeah, and Apache::Session (likewise Apache::AuthCookie) is used for
authentication.

> It is almost certainly easier to find some other security hole,
> though, than to brute force that. Social engineering, paying pennies
> per spam to humans in inexpensive locations, and other technical
> threats are much more profitable than hacking cryptography today.

Sounds like you are arguing in circles :-)

>
>> Some people raise in threads that you should use SHA1 and in each
>> case it is said not to be required.
>
> Well, I just checked the code for Apache::AuthCookie to see whether it
> is insecure, and it is vulnerable to exactly the risk here:

Yeah, it seems just about everything is using at best a process ID +
time + rand(time), then taking the MD5 - not great.

> It authenticates the values in the cookie with a secret, where the
> secret is absolutely vulnerable to the generation of collisions.

Yeah. So what we are discussing here is great. We are making a better
token generator; then we will encourage Apache::AuthCookie,
CGI::Session and Apache::Session to use it.
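As a loose illustration of that "push back on failed lookups" idea --
the %failures hash and the lookup_token() call below are made-up
placeholders, not anything from Data::Token -- the back-off might look
roughly like this:

    my %failures;          # failed guesses per client, kept in memory

    sub check_token {
        my ($client, $token) = @_;
        my $record = lookup_token($token);   # however the real store works
        if ($record) {
            delete $failures{$client};       # a good token clears the slate
            return $record;
        }
        # Each bad guess from the same client makes the next one slower.
        my $count = ++$failures{$client};
        sleep(2 ** ($count > 6 ? 6 : $count));   # cap the delay at 64 seconds
        return;
    }

The point is only that guessing becomes more expensive than asking the
server for a genuine token, which is what the thread keeps coming back
to.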
>> So the question is:
>>
>> 1) Am I missing the threads on the net
>> 2) Are we jumping to the wrong conclusion because we are mixing
>>    document signature faking with unpredictability
>> 3) Is this really a problem and we are the first to really solve it.
>>
>> My gut is now telling me (2). If it is not then almost every single
>> site on the internet is now vulnerable.
>
> The answer is kind of 3: it is really a problem, with a caveat, and we
> are absolutely not the first people to solve it.[3]

:-)

What I really need to do now is capture this discussion into my docs so
future people can understand the reasoning. I might need some help,
especially from you Daniel and Paul, because I didn't attend those
lectures :-) so I may miss something important.

In the meantime I think it is worth adding SHA1 and better random
secret generators as discussed. That means the ID is 160 bits (40 hex
characters) though, but I still think that is reasonable.

Thanks

Scott

From daniel at rimspace.net Thu May 29 03:56:30 2008
From: daniel at rimspace.net (Daniel Pittman)
Date: Thu, 29 May 2008 20:56:30 +1000
Subject: [Melbourne-pm] Data::Token
In-Reply-To: <0774BDC7-E00B-4FFD-810C-33085CD7EC10@dd.com.au> (Scott Penrose's message of "Thu, 29 May 2008 20:50:34 +1000")
References: <11A7771B-466B-4FFF-8787-C4859309E651@dd.com.au> <483E006A.7070902@perltraining.com.au> <87tzghx1pm.fsf@rimspace.net> <878wxtvcp3.fsf@rimspace.net> <0774BDC7-E00B-4FFD-810C-33085CD7EC10@dd.com.au>
Message-ID: <87prr5750h.fsf@rimspace.net>

Scott Penrose writes:

>> At that stage they can mint new tokens and abuse your services at
>> will.
>
> You are no longer talking about tokens. Tokens are unpredictable
> numbers used for things like authentication and session tracking.

That is a fair point.

[...]

> Sorry to ask this, Daniel, but have you read any of my previous
> replies?

Unfortunately my fairly persistent cold seems to be acting up again, so
the odds of my having missed the point seem high. Sorry. It wasn't my
intention to waste your time.

Regards,
        Daniel

From scottp at dd.com.au Thu May 29 04:01:50 2008
From: scottp at dd.com.au (Scott Penrose)
Date: Thu, 29 May 2008 21:01:50 +1000
Subject: [Melbourne-pm] Data::Token
In-Reply-To: <87prr5750h.fsf@rimspace.net>
References: <11A7771B-466B-4FFF-8787-C4859309E651@dd.com.au> <483E006A.7070902@perltraining.com.au> <87tzghx1pm.fsf@rimspace.net> <878wxtvcp3.fsf@rimspace.net> <0774BDC7-E00B-4FFD-810C-33085CD7EC10@dd.com.au> <87prr5750h.fsf@rimspace.net>
Message-ID: <537E5370-1873-40E7-A144-1DE6B5ED7F9B@dd.com.au>

On 29/05/2008, at 8:56 PM, Daniel Pittman wrote:
>
> Unfortunately my fairly persistent cold seems to be acting up again,
> so the odds of my having missed the point seem high. Sorry. It wasn't
> my intention to waste your time.

Your feedback has been great. And I hope that I can get you to review
my documentation when I update it.
Ta

Scott

From daniel at rimspace.net Thu May 29 04:06:41 2008
From: daniel at rimspace.net (Daniel Pittman)
Date: Thu, 29 May 2008 21:06:41 +1000
Subject: [Melbourne-pm] Data::Token
In-Reply-To: <537E5370-1873-40E7-A144-1DE6B5ED7F9B@dd.com.au> (Scott Penrose's message of "Thu, 29 May 2008 21:01:50 +1000")
References: <11A7771B-466B-4FFF-8787-C4859309E651@dd.com.au> <483E006A.7070902@perltraining.com.au> <87tzghx1pm.fsf@rimspace.net> <878wxtvcp3.fsf@rimspace.net> <0774BDC7-E00B-4FFD-810C-33085CD7EC10@dd.com.au> <87prr5750h.fsf@rimspace.net> <537E5370-1873-40E7-A144-1DE6B5ED7F9B@dd.com.au>
Message-ID: <87lk1t74ji.fsf@rimspace.net>

Scott Penrose writes:

> On 29/05/2008, at 8:56 PM, Daniel Pittman wrote:
>>
>> Unfortunately my fairly persistent cold seems to be acting up again, so
>> the odds of my having missed the point seem high. Sorry. It wasn't my
>> intention to waste your time.
>
> Your feedback has been great. And I hope that I can get you to review
> my documentation when I update it.

I do my best. I still feel that I am missing something, probably in how
the tokens are going to be used, that makes them less security critical
than I perceive.

Hopefully updated documentation will make that clear, though, by
discussing the sort of role where they are applicable. It certainly
wouldn't /hurt/ compared to many of the available modules, which simply
don't discuss that.

Regards,
        Daniel

From scottp at dd.com.au Thu May 29 06:14:12 2008
From: scottp at dd.com.au (Scott Penrose)
Date: Thu, 29 May 2008 23:14:12 +1000
Subject: [Melbourne-pm] Data::Token documented
Message-ID: <8A6A111E-2B19-49E5-8465-2BF6B076403F@dd.com.au>

I have had a go documenting the discussion and outcomes:

http://scott.dd.com.au/wiki/Data-Token

Some of it I will put into the module directly, but there is too much
there for the whole thing.

It is my first attempt, but feel free to feed back any changes,
directly or on list.

Scott

From scottp at dd.com.au Thu May 29 17:08:54 2008
From: scottp at dd.com.au (Scott Penrose)
Date: Fri, 30 May 2008 10:08:54 +1000
Subject: [Melbourne-pm] Alternatives to Crypt::Random ?
Message-ID: <23DB1E95-785A-4AC9-9C30-B6FBDB48A684@dd.com.au>

Hey Guys

After all our discussion about using better randomness, I am having
major issues with Crypt::Random. It says in the doc it does not depend
on Math::Pari, but it does. Unfortunately I can't get Math::Pari to
install.

This unfortunately moves the module from useful and usable into too
difficult for the average person to install.

Ah, what is worse, the Crypt::Random on CPAN requires a version of
Math::Pari that is not on CPAN.

Scott

From akievsky at yahoo.com.au Thu May 29 17:14:22 2008
From: akievsky at yahoo.com.au (Andres Kievsky)
Date: Thu, 29 May 2008 17:14:22 -0700 (PDT)
Subject: [Melbourne-pm] Data::Token documented
Message-ID: <467016.14431.qm@web63201.mail.re1.yahoo.com>

> I have had a go documenting the discussion and outcomes:
>
> http://scott.dd.com.au/wiki/Data-Token
>
> Some of it I will put into the module directly, but there is too much
> there for the whole thing.
>
> It is my first attempt, but feel free to feed back any changes,
> directly or on list.

The documentation is excellent. I wish I had it years ago.

"You can also change the token on each request. This is extreme and has
quite a bit of overhead but useful. Alternatives may also be to change
it over short periods, like 5 minutes."

I wholeheartedly agree with that practice :)

Regards,
- Andres Kievsky.

Get the name you always wanted with the new y7mail email address.
www.yahoo7.com.au/mail
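For a sense of what the quoted advice about changing the token over
short periods might look like in practice, here is a rough sketch; the
hash-based session store and the new_token() generator are placeholders,
not anything from Scott's wiki page:

    use constant ROTATE_AFTER => 5 * 60;    # rotate tokens every five minutes

    sub maybe_rotate {
        my ($store, $token) = @_;
        my $session = $store->{$token} or return $token;
        return $token if time() - $session->{issued} < ROTATE_AFTER;

        my $fresh = new_token();                       # hypothetical generator
        $session->{issued} = time();
        $store->{$fresh} = delete $store->{$token};    # old token stops working
        return $fresh;                                 # hand the new one back
    }

Rotating per request works the same way, just without the age check, at
the cost of a store update on every hit.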
From daniel at rimspace.net Fri May 30 02:50:03 2008
From: daniel at rimspace.net (Daniel Pittman)
Date: Fri, 30 May 2008 19:50:03 +1000
Subject: [Melbourne-pm] Alternatives to Crypt::Random ?
In-Reply-To: <23DB1E95-785A-4AC9-9C30-B6FBDB48A684@dd.com.au> (Scott Penrose's message of "Fri, 30 May 2008 10:08:54 +1000")
References: <23DB1E95-785A-4AC9-9C30-B6FBDB48A684@dd.com.au>
Message-ID: <8763sw3yus.fsf@rimspace.net>

Scott Penrose writes:

> After all our discussion about using better randomness, I am having
> major issues with Crypt::Random. It says in the doc it does not depend
> on Math::Pari, but it does. Unfortunately I can't get Math::Pari to
> install.
>
> This unfortunately moves the module from useful and usable into too
> difficult for the average person to install.
>
> Ah, what is worse, the Crypt::Random on CPAN requires a version of
> Math::Pari that is not on CPAN.

Joy!

I presume you are not satisfied with the prerequisites being in the
various common distributions; I wouldn't blame you.

Math::TrulyRandom looks like an acceptable fallback if you can't read
from /dev/random on your platform, although it involves XS code, and is
bound to be fairly slow.

Math::Random::MT::Auto looks likely to be the best choice, as it
provides a wide range of initialization functions as well as a good
PRNG that will deliver considerably higher quality results than the
built-in code.

Otherwise, looks like you get to write it yourself. Yay.

Regards,
        Daniel

From jarich at perltraining.com.au Fri May 30 04:48:51 2008
From: jarich at perltraining.com.au (Jacinta Richardson)
Date: Fri, 30 May 2008 21:48:51 +1000
Subject: [Melbourne-pm] Data::Token
In-Reply-To:
References: <11A7771B-466B-4FFF-8787-C4859309E651@dd.com.au> <483E006A.7070902@perltraining.com.au> <87tzghx1pm.fsf@rimspace.net>
Message-ID: <483FE9A3.6010707@perltraining.com.au>

Scott Penrose wrote:

> So the question is:
>
> 1) Am I missing the threads on the net
> 2) Are we jumping to the wrong conclusion because we are mixing
>    document signature faking with unpredictability
> 3) Is this really a problem and we are the first to really solve it.

I think it's 3, insofar as many of these modules were written before
17th August 2004 (which is when Xiaoyun Wang, Dengguo Feng, Xuejia Lai,
and Hongbo Yu announced collisions for the full MD5 space; their
analytical attack was reported to take only one hour on an IBM p690
cluster). Prior to this, the general assumption seemed to be that
engineering a collision would be really hard, and finding a collision
by accident would be next to impossible.

Since not everyone keeps up with cryptography news, people continue to
use md5 despite its issues. This is not necessarily because it's a good
idea. It may even be as simple as this: when people think of hashing
algorithms, the first one that comes to mind is md5.

I expect that for the purposes of generating tokens, particularly with
the use of a salt, these issues aren't really a problem. However, if
you do so you are choosing to provide a less secure token than you
could otherwise. I think in general, using md5 for anything to do with
security, or with anything which might even be vaguely connected with
the idea of security, is looking like a bad idea.

Regarding SHA1 and SHA2, "the security of SHA-1 has been somewhat
compromised by cryptography researchers.
Although no attacks have yet been reported on the SHA-2 variants, they
are algorithmically similar to SHA-1 and so efforts are underway to
develop improved alternative hashing algorithms."
( http://en.wikipedia.org/wiki/SHA_hash_functions )

All the best,

     J

From daniel at rimspace.net Fri May 30 06:42:03 2008
From: daniel at rimspace.net (Daniel Pittman)
Date: Fri, 30 May 2008 23:42:03 +1000
Subject: [Melbourne-pm] Data::Token
In-Reply-To: <483FE9A3.6010707@perltraining.com.au> (Jacinta Richardson's message of "Fri, 30 May 2008 21:48:51 +1000")
References: <11A7771B-466B-4FFF-8787-C4859309E651@dd.com.au> <483E006A.7070902@perltraining.com.au> <87tzghx1pm.fsf@rimspace.net> <483FE9A3.6010707@perltraining.com.au>
Message-ID: <87iqwv3o44.fsf@rimspace.net>

Jacinta Richardson writes:

> Scott Penrose wrote:
>
>> So the question is:
>>
>> 1) Am I missing the threads on the net
>> 2) Are we jumping to the wrong conclusion because we are mixing
>>    document signature faking with unpredictability
>> 3) Is this really a problem and we are the first to really solve it.
>
> I think it's 3, insofar as many of these modules were written before
> 17th August 2004 (which is when Xiaoyun Wang, Dengguo Feng, Xuejia
> Lai, and Hongbo Yu announced collisions for the full MD5 space; their
> analytical attack was reported to take only one hour on an IBM p690
> cluster). Prior to this, the general assumption seemed to be that
> engineering a collision would be really hard, and finding a collision
> by accident would be next to impossible.
>
> Since not everyone keeps up with cryptography news, people continue to
> use md5 despite its issues. This is not necessarily because it's a
> good idea. It may even be as simple as this: when people think of
> hashing algorithms, the first one that comes to mind is md5.
>
> I expect that for the purposes of generating tokens, particularly with
> the use of a salt, these issues aren't really a problem. However, if
> you do so you are choosing to provide a less secure token than you
> could otherwise. I think in general, using md5 for anything to do with
> security, or with anything which might even be vaguely connected with
> the idea of security, is looking like a bad idea.

Mmmm. I am still trying to work out how to respond to the documentation
Scott wrote, but my general feeling is that these tokens *are* used in
a security sensitive context, and that token forgery is a genuine risk.

As I said previously, though, it probably isn't a significant risk
compared to other threats to your deployment: breaking an MD5 session
token hash isn't (yet) an economically viable way for most attackers to
abuse available services.

On that basis the continued use of (compromised) MD5 or (soon to be
compromised) SHA1 for the tokens is probably not sufficiently worrying
to have to rush into changing them... yet.

Like Jacinta, I also expect that Data::Token will be used in security
related areas -- Apache::AuthCookie, for example -- even if the
documentation *explicitly* states that it isn't suitable.

On that basis planning for MD5 and SHA1 cracking being economically
viable[1] one day, and having the module cope, is probably a good move.

Regards,
        Daniel

Footnotes:
[1] If breaking CAPTCHA images is economically viable then stealing
    sessions by brute-force (or worse) attacks on the token identifying
    them is going to happen one of these days. One resource the
    attackers have in spades is CPU time.
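One way of reading "having the module cope" is to keep the digest
pluggable, so SHA-1 can be swapped out on the day it stops being enough.
A speculative sketch only -- the function names below are assumptions,
not Data::Token's real interface:

    use Digest::SHA qw(sha1_hex sha256_hex);

    my $digest = \&sha1_hex;       # today
    # $digest = \&sha256_hex;      # the day SHA-1 is no longer good enough

    sub make_token {
        my ($uuid, $secret) = @_;
        return $digest->($uuid . $secret);
    }

Callers never name the algorithm, so switching it later changes one line
rather than every piece of code that mints a token.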
From thogard at abnormal.com Fri May 30 22:11:43 2008
From: thogard at abnormal.com (Tim Hogard)
Date: Sat, 31 May 2008 05:11:43 +0000 (UTC)
Subject: [Melbourne-pm] Data::Token
In-Reply-To: <87iqwv3o44.fsf@rimspace.net>
Message-ID: <200805310511.m4V5BhXo069679@v.abnormal.com>

I think the real problem is everyone is missing the point of tokens.
They are a risk mitigation device, which means the question becomes
"what is the risk?" I don't care if you're using sha-1, sha-2, md5,
md4, md1, crypt or crc, as none of them change the security of the
token, they only change the level of risk.

Key aspects of tokens are:
1) they must appear random (as in test 100,000 of them for randomness)
2) they must not be guessable (this bit is hard)
3) there must be a process in place to lock out users who attempt to
   offer bad tokens.

Number 3 is the key. If I can give out tokens that use a crc-16 as a
hash, then I can offer one of 65,536 random numbers as a hash to your
system and will have a 1 in 65,536 (about 0.0015%) chance of getting
in, if my other details are ok. If your system lets me send an average
of 32,000 hashes and then lets me in, you have a major problem.

Another issue you need to be concerned with is looking at the
one-to-many relationship the other way around. Assume your token ends
up being a 4 digit PIN-like number. It would take an average of 5,000
guesses to get your PIN, but if I was guessing 0001 today and 0002
tomorrow, for many users at once, that may change the game. Think about
hijacking a grocery store's PIN pad system and just trying 0001 for
everyone the first day and 0002 the next and so on... If they get 5,000
customers, how many valid PINs will you have by the end of the month?
Now consider that problem in the coordinated many-to-many DDOS
attack... each of a million hosts is offering 3 bad tokens to your
system. What are the odds then? The solution is that the user needs to
hand you a token and other ID with every transaction.

Even if you're hitting high value bank accounts, what is the cost risk
if a valid token is hit? Once you figure in costs of insurance, odds of
reversing transactions and time wasted, it doesn't justify the level of
hashes typically used from an actuarial point of view, and the only
reason it's so secure is that big number crypto is cheap.

If your token system doesn't have good odds of keeping people out when
its hash just mirrors the input data, you need to find a better way.

-tim

>
> Jacinta Richardson writes:
> > Scott Penrose wrote:
> >
> >> So the question is:
> >>
> >> 1) Am I missing the threads on the net
> >> 2) Are we jumping to the wrong conclusion because we are mixing
> >> document signature faking with unpredictability
> >> 3) Is this really a problem and we are the first to really solve it.
> >
> > I think it's 3 in so far that many of these modules were written
> > before 17th August 2004 (which is when Xiaoyun Wang,Dengguo Feng,
> > Xuejia Lai, and Hongbo Yu announced collisions for the full MD5 space
> > (Their analytical attack was reported to take only one hour on an IBM
> > p690 cluster.)). Prior to this, the general assumption seemed to be
> > that engineering a collision would be really hard, and finding a
> > collision by accident would be next to impossible.
> >
> > Since not everyone keeps up with cryptography news, people continue to
> > use md5 despite its issues. This is not necessarily because it's a
> > good idea. It may even be as simple as when people think of hashing
> > algorithms the first one that comes to mind is md5.
> > > > I expect that for the purposes of generating tokens, particularly with > > the use of a salt, that these issues aren't really a problem. > > However, if you do so you are choosing to provide a less secure token > > than you could otherwise. I think in general, using md5 for anything > > to do with security or with anything which might even be vaguely > > connected with the idea of security, is looking like a bad idea. > > Mmmm. I am still trying to work out how to respond to the documentation > Scott wrote, but my general feeling is that these tokens *are* used in a > security sensitive context, and that token forgery is a genuine risk. > > As I said previously, though, it probably isn't a significant risk > compared to other threats to your deployment: breaking an MD5 session > token hash isn't (yet) an economically viable way for most attackers to > abuse available services. > > On that basis the continued use of (compromised) MD5 or (soon to be > compromised) SHA1 for the tokens is probably not sufficiently worrying > to have to rush into changing them... yet. > > > Like Jacinta, I also expect that Data::Token will be used in security > related areas -- Apache::AuthCookie, for example -- even if the > documentation *explicitly* states that it isn't suitable. > > On that basis planning for MD5 and SHA1 cracking being economically > viable[1] on day, and having the module cope, is probably a good move. > > Regards, > Daniel > > Footnotes: > [1] If breaking CAPTCHA images is economically viable then stealing > sessions by brute-force (or worse) attacks on the token identifying > them is going to happen one of these days. One resource the > attackers have in spades is CPU time. > > _______________________________________________ > Melbourne-pm mailing list > Melbourne-pm at pm.org > http://mail.pm.org/mailman/listinfo/melbourne-pm >