[Melbourne-pm] Data::Token

Wed May 28 18:21:44 PDT 2008

On 29/05/2008, at 11:01 AM, Paul Fenwick wrote:

> G'day Scott,
>
> Hashing
> =======
>
> I notice that Data::Token is using MD5.  Unfortunately, we're
> starting to get very good at engineering MD5 collisions, with
> http://th.informatik.uni-mannheim.de/People/lucks/HashCollisions/
> as a striking example of this.  For Data::Token this could be
> considered a non-issue, as we just want our tokens to be hard to
> guess, rather than using them as a hash of a real document.  Even
> so, I'd tend towards SHA1 as a hashing algorithm with fewer flaws.

Ta, I will look at using SHA1 instead.
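
A minimal sketch of what that swap could look like, assuming the module currently builds one input string and hashes it. The input string here is a made-up placeholder, not what Data::Token actually uses. Digest::SHA ships with Perl 5.10 and later.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Hypothetical input: stands in for whatever string is currently fed to MD5.
my $input = 'example-uuid' . 'example-random-part';

# sha1_hex returns a 40-character lowercase hex digest (160 bits),
# compared with 32 characters (128 bits) from md5_hex.
my $token = sha1_hex($input);
print "$token\n";
```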

> Randomness
> ==========
>
> Unfortunately, rand(time) isn't very random.  When Perl sees the use  
> of rand it will first try to seed its pseudo-random number generator  
> (PRNG) with a good source of entropy, typically from /dev/urandom on  
> modern unixes.  On most systems, this gives you at most 32 bits of  
> entropy, since that's all the random seed will take.  rand(time)  
> then generates a floating point number between 0 and the seconds  
> from the epoch.  This number can be predicted based upon the current  
> time, and our original 32 bits of entropy (which we can brute force).

Most of the algorithms around use a simple text string - "MySecret".
This is how tokens are generated for Apache cookies and in the
examples for tokens in PHP and on Perl Monks - but that is silly in a
CPAN module, so I thought I would add a bit of randomness.

I am open to better random numbers, but even just adding the time would
be enough, after hashing, to make each token different.

All systems using a token are open to brute force attack, and
you must still protect against that by blocking IPs, increasing the
timeout on failed requests, etc. This module does just one thing: it
generates the token. It does not protect it, nor (at least in some
parts) does it protect against duplicates.

The randomness is there to stop you guessing the next free number, or
at least to make it take 1000s of attempts to do so. Preferably lots more.
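
To illustrate Paul's point about rand(time), here is a small sketch showing that once the seed and the timestamp are known, or guessed, the "random" value can be reproduced exactly. Both values below are hypothetical stand-ins for what an attacker would have to brute force.

```perl
use strict;
use warnings;

# Hypothetical values an attacker might guess: the PRNG seed and the timestamp.
my $seed = 12345;
my $when = 1212024104;

srand($seed);
my $first = rand($when);

srand($seed);                # same seed again
my $second = rand($when);    # so the same "random" number comes out again

print "reproduced\n" if $first == $second;
```

With only about 32 bits of seed, trying every seed is feasible, which is why a stronger source of randomness helps.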

It is a sad fact that most of the token code on CPAN and in the wild
uses things like a database ID, timestamp or similar to set the token
for a cookie :-)

Ahhh I see you have a suggestion below, I will try that then.

> Uniqueness
> ==========
> MD5 doesn't guarantee that its output is unique, even though the  
> input has been generated from unique identifiers.  It's *very*  
> unlikely that we'll see a collision, but it's still a possibility.

I assume that SHA1 would be the same, but I think the main issue is
that we are taking a hash, therefore we are always going to have a
chance of a collision.

In the end, I think if you are generating a token it should be checked
against the existing ones before returning (I imagine in a lifetime
we would never see a collision, but better safe than sorry).
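
Something along these lines is what I have in mind for the check. It is only a sketch: %issued is a hypothetical in-memory store (a real system would consult its database), and unique_token is a made-up name, not part of Data::Token.

```perl
use strict;
use warnings;

my %issued;    # hypothetical store of tokens already handed out

# Takes a code reference that returns a candidate token and retries
# until the candidate has not been issued before.
sub unique_token {
    my ($generate, $max_tries) = @_;
    $max_tries ||= 10;
    for (1 .. $max_tries) {
        my $candidate = $generate->();
        next if exists $issued{$candidate};
        $issued{$candidate} = 1;
        return $candidate;
    }
    die "could not generate a unique token in $max_tries attempts\n";
}

# Usage with a stand-in generator (not suitable for real tokens):
my $token = unique_token(sub { sprintf '%08x', int(rand(2**32)) });
print "$token\n";
```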

> Suggestion
> ==========
> Rather than pushing our UUID and our random number through MD5, I  
> would suggest a simple concatenation.  The UUID guarantees that our  
> resulting string will be unique, and our random number  
> (appropriately encoded) will ensure that it's hard to guess.  I  
> would allow the user to supply an argument specifying how many bits  
> of randomness they want, and possibly an argument to specify the  
> quality of that randomness (are we willing to block for good  
> randomness?).
>
> I recommend using Crypt::Random from CPAN as a way to get your  
> random numbers.  It does the hard work of finding an appropriate  
> source of randomness, including hooking into /dev/u?random, asking  
> PARI, or talking to the entropy gathering daemon (if installed).  It  
> also takes size and strength arguments, which can be passed straight  
> through from the user.

Good one, thanks. I think the module should try to do well with zero
input (DWIM) - so I will look at Crypt::Random. But we can always
allow input into the function for increased randomness by passing it
straight through.
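
A rough sketch of the concatenation idea, under some assumptions: the unique ID is supplied by the caller, the random part is read directly from /dev/urandom as a stand-in for Crypt::Random, and the names random_hex and make_token and the 128-bit default are all made up for illustration.

```perl
use strict;
use warnings;

# Read $bits of randomness (rounded up to whole bytes) from /dev/urandom
# and return it hex-encoded.
sub random_hex {
    my ($bits) = @_;
    my $bytes = int(($bits + 7) / 8);
    open my $fh, '<:raw', '/dev/urandom'
        or die "cannot open /dev/urandom: $!";
    my $read = read($fh, my $buf, $bytes);
    close $fh;
    die "short read from /dev/urandom\n"
        unless defined $read && $read == $bytes;
    return unpack 'H*', $buf;
}

# Concatenate a caller-supplied unique ID with the random part.
sub make_token {
    my ($uuid, $bits) = @_;
    $bits = 128 unless defined $bits;    # assumed default, not from the thread
    return $uuid . '-' . random_hex($bits);
}

# The UUID string here is just an example value.
print make_token('4162F712-1DD2-11B2-B17E-C09EFE1DC403'), "\n";
```

The unique part keeps tokens distinct and the random part keeps them hard to guess, so no hash is needed in between.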

Quick question on the right format though... the normal case for most
users would be just

print token, "\n";

To pass in the higher level of randomness (which I think is unnecessary
999 times out of 1000), what is the best way:

* On the line "use Data::Token"
* Passed into token "token(...)";
* Set variables - $Data::Token::strength (ok this one sux)
* Call methods - Data::Token::strength(...);

Thoughts?
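
For comparison, the second option (arguments passed into token) might handle its options roughly like this. The option names bits and strength, their defaults, and the helper name token_options are assumptions for the sake of the sketch.

```perl
use strict;
use warnings;

# Assumed option names and defaults, purely for illustration.
my %DEFAULTS = (bits => 128, strength => 0);

# Merge caller-supplied named arguments over the defaults,
# rejecting anything that is not a known option.
sub token_options {
    my %args = @_;
    for my $key (keys %args) {
        die "unknown option '$key'\n" unless exists $DEFAULTS{$key};
    }
    return (%DEFAULTS, %args);
}

my %plain  = token_options();                           # the zero-input case
my %strong = token_options(bits => 256, strength => 1); # the rare case
print "default: $plain{bits} bits, requested: $strong{bits} bits\n";
```

The normal case stays a bare call with no arguments, and the stronger case is visible at the call site.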

> Further reading
> ===============
> I discuss the troubles with generating good random numbers in Perl
> in chapter 10 of "Perl Security", available from
> http://perltraining.com.au/notes.html .  Feedback and comments appreciated.

Thanks, I will have a look.

Thanks for all your input Paul. I think making it stronger by default  
is the right approach. It is unlikely this needs to be fast as it is  
only for generating unique tokens, not for reading them. I think I  
will also add in a few references, in particular to security talks.  
And most importantly I should add some comments on checking for
uniqueness in a token system AND, even more important, on protecting
against brute force attack.

Just out of interest, how many people have had to create these tokens
and do the same research as above? From the feedback here I guess that
this is a worthwhile module so that the next person does not have to
do the same again :-)

Scott