[Melbourne-pm] Data::Token

Wed May 28 19:51:01 PDT 2008

Scott Penrose <scottp at dd.com.au> writes:
> On 29/05/2008, at 11:01 AM, Paul Fenwick wrote:
>
>> G'day Scott,
>>
>> Hashing
>> =======
>>
>> I notice that Data::Token is using MD5.  Unfortunately, we're
>> starting to get very good at engineering MD5 collisions, with
>> http://th.informatik.uni-mannheim.de/People/lucks/HashCollisions/
>> as a striking example of this.  For Data::Token this could be
>> considered a non-issue, as we just want our tokens to be
>> hard-to-guess, rather than using them as a hash of a real document.
>> Even so, I'd tend towards SHA1 as a hashing algorithm with fewer flaws.
>
> Ta, I will look at using SHA1 instead.

SHA1 and MD5 are in the same family, and successful attacks on (full)
SHA1 have reduced collision generation to 2^69 trials from 2^80.

Plan on replacing SHA1 everywhere within the next ten years and, at the
very least, on needing to step up to SHA256 or SHA512 in the interim.
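For illustration only, a minimal Python sketch (hashlib standing in for Perl's Digest::MD5 / Digest::SHA, whose sha256_hex does the same job): keeping the algorithm name a parameter makes the eventual move off SHA1 a one-line change. The helper name token_digest is hypothetical, not part of Data::Token.

```python
import hashlib
import os


def token_digest(data: bytes, algorithm: str = "sha256") -> str:
    """Hypothetical helper: hex digest of `data` with a named algorithm."""
    # hashlib.new() accepts algorithm names such as "sha1", "sha256", "sha512".
    return hashlib.new(algorithm, data).hexdigest()


seed = os.urandom(32)  # random input to hash into a token
for name in ("sha1", "sha256", "sha512"):
    # Each hex character carries 4 bits: prints 160, 256 and 512 bits.
    print(name, len(token_digest(seed, name)) * 4, "bits")
```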

[...]

> Most of the algorithms around use a simple text string, "MySecret".
> This is how tokens are generated for Apache cookies, and in the
> examples for tokens in PHP and on Perl Monks, but that is silly in a
> CPAN module, so I thought I'd add a bit of randomness.

[...]

> It is a sad fact that most of the token code on CPAN and in the wild
> uses things like a database ID, a timestamp or similar to set the
> token for a cookie :-)

...I agree that your model is substantially better, but I would
generally encourage building it secure first, then looking at allowing
the protection to be weakened later.

That way you fail safe, rather than depending on programmers to actually
have a notion of how to effectively secure the system.

[...]

> Good one, thanks. I think the module should try to do well with zero
> input (DWIM), so I will look at Crypt::Random. But we can always
> allow input into the function for increased randomness, passing it
> straight through.

Allowing the end user to pass in "random" data to increase entropy will,
in many cases, leave less entropy in the result because, frankly, most
people don't really understand how to generate good random data. :/

However, Crypt::Random can block, and your web server is likely to be
fairly entropy-constrained[1], so be careful to pass Strength => 0 when
calling it; that makes it read the non-blocking /dev/urandom instead of
/dev/random.
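A minimal Python sketch of the "secure by default" shape argued for here, assuming Python's secrets module as a stand-in for Crypt::Random: the hypothetical new_token() accepts no user-supplied entropy at all, and refuses to produce tokens shorter than 128 bits.

```python
import secrets

MIN_BYTES = 16  # 128 bits: refuse anything weaker (fail safe)


def new_token(nbytes: int = 32) -> str:
    """Return a URL-safe random token (hypothetical helper).

    Deliberately takes no caller-supplied "entropy", so a poor seed cannot
    weaken it. secrets draws from os.urandom(), the OS CSPRNG, which does
    not wait on /dev/random's entropy estimate (at most it may wait once,
    at early boot, for the kernel pool to be initialised).
    """
    if nbytes < MIN_BYTES:
        raise ValueError(f"token must be at least {MIN_BYTES} bytes")
    return secrets.token_urlsafe(nbytes)


print(new_token())  # 32 random bytes -> 43 base64url characters
```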

[...]

> Thanks for all your input, Paul. I think making it stronger by default
> is the right approach. It is unlikely this needs to be fast, as it is
> only for generating unique tokens, not for reading them.

Good randomness shouldn't need to be slow, and if you really care,
seeding a good PRNG (the Mersenne Twister, in Math::Random::MT::*) from
Crypt::Random would be fast and effective.

(Seeding rand() probably isn't good enough, since it isn't a terribly
 high-quality PRNG in many cases.)
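Sketched in Python (CPython's random.Random is itself an MT19937 generator, so it stands in for Math::Random::MT here; seeded_mt() is a hypothetical name): pay for the strong seed once, then draw cheaply.

```python
import os
import random


def seeded_mt(seed_bytes: int = 32) -> random.Random:
    """Return a Mersenne Twister generator seeded once from the OS CSPRNG."""
    # random.Random is CPython's MT19937; an int seed of any size is accepted.
    seed = int.from_bytes(os.urandom(seed_bytes), "big")
    return random.Random(seed)


rng = seeded_mt()
# After the one-off seeding, every draw is a cheap in-process computation.
values = [rng.getrandbits(128) for _ in range(4)]
print([format(v, "032x") for v in values])
```

One design caveat: MT19937's internal state can be recovered from 624 consecutive 32-bit outputs, so if raw generator output is handed out as tokens, drawing each token straight from the CSPRNG (as secrets does) avoids that exposure.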

> I think I will also add in a few references, in particular to security
> talks. And most importantly I should add some comments on checking
> for uniqueness in a token system AND, even more important, on
> protecting against brute-force attacks.

If you were extending this, I would consider an implementation that can
answer the key question "Is this my token?" in a cryptographically
secure fashion, ensuring that you don't need to store the token anywhere.

Something like:

  join(':', base64(encrypt(key2, join(':', token, random, key1))), token)

You can then verify that the secret part decrypts, contains key1, and
matches the public token, without needing to store anything.  key1 and
key2 can be randomly generated and only need to be stable for the life
of the tokens; adding a date to the outside can also help.
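The scheme above uses encryption; Python's standard library has no block cipher, so this sketch swaps in HMAC-SHA256, a keyed MAC. That is enough to answer "is this my token?" without storing anything, but unlike the encrypted form it hides nothing, it only authenticates. A single hypothetical key stands in for key1/key2, issue()/verify() are illustrative names, and the issue time is placed under the MAC so it cannot be altered.

```python
import base64
import hashlib
import hmac
import os
import time
from typing import Optional

NONCE_LEN = 16              # random bytes mixed into each issued value
HEADER_LEN = NONCE_LEN + 8  # nonce + 8-byte big-endian issue time
TAG_LEN = 32                # HMAC-SHA256 output size in bytes


def _tag(key: bytes, header: bytes, token: str) -> bytes:
    # The MAC covers the nonce, the issue time and the public token.
    return hmac.new(key, header + token.encode("utf-8"), hashlib.sha256).digest()


def issue(token: str, key: bytes, now: Optional[int] = None) -> str:
    """Return '<base64 secret part>:<token>'; nothing needs to be stored."""
    issued = int(time.time()) if now is None else now
    header = os.urandom(NONCE_LEN) + issued.to_bytes(8, "big")
    secret = base64.urlsafe_b64encode(header + _tag(key, header, token))
    return secret.decode("ascii") + ":" + token


def verify(value: str, key: bytes, max_age: int = 86400,
           now: Optional[int] = None) -> Optional[str]:
    """Return the public token if `value` was issued with `key` and is no
    older than `max_age` seconds; otherwise return None."""
    # The base64url alphabet has no ':', so the first ':' ends the secret part.
    secret, sep, token = value.partition(":")
    if not sep:
        return None
    try:
        raw = base64.urlsafe_b64decode(secret.encode("ascii"))
    except ValueError:  # covers binascii.Error and non-ASCII input
        return None
    if len(raw) != HEADER_LEN + TAG_LEN:
        return None
    header, tag = raw[:HEADER_LEN], raw[HEADER_LEN:]
    if not hmac.compare_digest(tag, _tag(key, header, token)):
        return None
    issued = int.from_bytes(header[NONCE_LEN:], "big")
    current = int(time.time()) if now is None else now
    if current - issued > max_age:
        return None
    return token
```

Hiding the token entirely would need a real cipher (for example an authenticated-encryption mode from a third-party library) in place of the bare MAC.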

> Just out of interest, how many people have had to create these tokens
> and do the same research as above? From the feedback here I guess that
> this is a worthwhile module, so that the next person does not have to
> do the same again :-)

If there were a good, portable module to produce something like the
above, for arbitrary values of 'token', and optionally without exposing
token at all, I would be happy.

I'm not sure that is the use case for your module, though; rather, the
current module looks like one component of that larger system.

Regards,
        Daniel

...and now I wait to be pointed at the existing module that does all
that for me, because you always learn it exists afterwards.

Footnotes: 
[1]  There is very, very little true entropy on a headless server, and
     very little support for effectively using and *trusting* entropy
     from a hardware RNG, even if one is present.