[Melbourne-pm] Data::Token
Scott Penrose
scottp at dd.com.au
Wed May 28 18:21:44 PDT 2008
On 29/05/2008, at 11:01 AM, Paul Fenwick wrote:
> G'day Scott,
>
> Hashing
> =======
>
> I notice that Data::Token is using MD5. Unfortunately, we're
> starting to get very good at engineering MD5 collisions, with
> http://th.informatik.uni-mannheim.de/People/lucks/HashCollisions/
> as a striking example of this. For Data::Token this could be
> considered a non-issue, as we just want our tokens to be hard to
> guess, rather than using them as a hash of a real document. Even
> so, I'd tend towards SHA1 as a hashing algorithm with fewer flaws.
Ta, I will look at using SHA1 instead.
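
For reference, a minimal sketch of what swapping the digest might look like, using the core Digest::MD5 and Digest::SHA modules. The input string is a hypothetical stand-in; it is not what Data::Token actually hashes.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Digest::SHA qw(sha1_hex);

# Hypothetical input: stands in for whatever the module feeds to the digest.
my $input = "example-unique-id" . "example-random-part";

my $md5_token  = md5_hex($input);    # 32 hex characters (128 bits)
my $sha1_token = sha1_hex($input);   # 40 hex characters (160 bits)

print "MD5:  $md5_token\n";
print "SHA1: $sha1_token\n";
```

The calling code barely changes; only the token length differs, which matters if tokens are stored in a fixed-width column.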
> Randomness
> ==========
>
> Unfortunately, rand(time) isn't very random. When Perl sees the use
> of rand it will first try to seed its pseudo-random number generator
> (PRNG) with a good source of entropy, typically from /dev/urandom on
> modern unixes. On most systems, this gives you at most 32 bits of
> entropy, since that's all the random seed will take. rand(time)
> then generates a floating point number between 0 and the seconds
> from the epoch. This number can be predicted based upon the current
> time, and our original 32 bits of entropy (which we can brute force).
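
A small illustration of the point above, as a sketch only: the seed and timestamp here are made-up values fixed for the demonstration. Once both are known (or guessed), rand(time) gives the same answer every time.

```perl
use strict;
use warnings;

# Illustration only: both values are invented for this example.
my $seed = 12345;          # stands in for the seed an attacker enumerates
my $when = 1212024104;     # stands in for the time() when the token was made

srand($seed);
my $first = rand($when);

srand($seed);              # same seed ...
my $second = rand($when);  # ... so the same "random" number comes back

print "identical\n" if $first == $second;
```

An attacker does not know the seed in advance, but with only about 32 bits to try, and the time roughly known, every candidate can be tested.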
Most of the algorithms around use a simple text string - "MySecret".
This is how tokens are generated for Apache cookies, and in the
examples for tokens in PHP and on Perl Monks - but that is silly in a
CPAN module, so I thought I would add a bit of randomness.

I am open to better random numbers, but even just adding the time would
be enough, after hashing, to make it different.

All systems using a token are always open to brute force attack, and
you must still protect against that by blocking IPs, increasing the
timeout on failed requests etc. This system does just one thing:
generate the token. It does not protect it, nor (at least in some
parts) protect against duplicates.

The randomness is there to help you not guess the next free number, or
at least take 1000s of attempts to do so. Preferably lots more.

It is a sad fact that most of the Token code on CPAN and in the wild
uses things like a database ID, time stamp or similar to set the token
for a cookie :-)

Ahhh I see you have a suggestion below, I will try that then.
> Uniqueness
> ==========
> MD5 doesn't guarantee that its output is unique, even though the
> input has been generated from unique identifiers. It's *very*
> unlikely that we'll see a collision, but it's still a possibility.
I assume that SHA1 would be the same, but I think the main issue is
that we are taking a hash, therefore we are always going to have a
chance of a collision.

In the end, I think if you are generating a token it should be checked
against the existing ones before returning (I imagine in a lifetime
we would never see a collision, but better safe than sorry).
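
A rough sketch of that check, assuming an in-memory hash as the store of issued tokens. The names %issued and unique_token are hypothetical; a real system would consult its own database.

```perl
use strict;
use warnings;

my %issued;   # hypothetical in-memory store of tokens already handed out

# $generate is any code reference that returns a candidate token string.
sub unique_token {
    my ($generate, $max_tries) = @_;
    $max_tries ||= 10;
    for (1 .. $max_tries) {
        my $candidate = $generate->();
        next if exists $issued{$candidate};   # collision: try again
        $issued{$candidate} = 1;
        return $candidate;
    }
    die "could not generate an unused token after $max_tries attempts\n";
}
```

With several processes sharing one store, the check and the insert would need to be a single atomic step (for example a unique constraint in the database), otherwise two processes could still issue the same token.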
> Suggestion
> ==========
> Rather than pushing our UUID and our random number through MD5, I
> would suggest a simple concatenation. The UUID guarantees that our
> resulting string will be unique, and our random number
> (appropriately encoded) will ensure that it's hard to guess. I
> would allow the user to supply an argument specifying how many bits
> of randomness they want, and possibly an argument to specify the
> quality of that randomness (are we willing to block for good
> randomness?).
>
> I recommend using Crypt::Random from CPAN as a way to get your
> random numbers. It does the hard work of finding an appropriate
> source of randomness, including hooking into /dev/u?random, asking
> PARI, or talking to the entropy gathering daemon (if installed). It
> also takes size and strength arguments, which can be passed straight
> through from the user.
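
To make the suggestion concrete, here is a sketch of the concatenation approach. It is not Crypt::Random: to stay free of CPAN dependencies it reads /dev/urandom directly as a stand-in, and it takes the unique part as an argument rather than generating a UUID. The name make_token and the "-" separator are assumptions for illustration.

```perl
use strict;
use warnings;

# Assumption: $unique_id comes from a UUID generator (not shown here).
# Assumption: /dev/urandom exists, as on most modern Unix systems.
sub make_token {
    my ($unique_id, $bits) = @_;
    $bits ||= 128;
    my $bytes = int(($bits + 7) / 8);          # round up to whole bytes

    open(my $fh, '<:raw', '/dev/urandom')
        or die "cannot open /dev/urandom: $!\n";
    my $got = read($fh, my $random, $bytes);
    close($fh);
    die "short read from /dev/urandom\n"
        unless defined $got && $got == $bytes;

    # Hex-encode the random bytes and append them to the unique part.
    return $unique_id . '-' . unpack('H*', $random);
}

print make_token('00000000-0000-0000-0000-000000000000', 64), "\n";
```

The unique part keeps tokens distinct and the random part makes them hard to guess, so no hash is needed in between.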
Good one, thanks. I think the module should try to do well with zero
input (DWIM) - so I will look at Crypt::Random. But we can always
allow arguments to the function for increased randomness, passing them
straight through.

Quick question on the right format though... the normal case, for most
users, would be just

    print token, "\n";

To pass in the higher level of randomness (which I think is unnecessary
999 times out of 1000), what is the best way:
* On the line "use Data::Token"
* Passed into token "token(...)";
* Set variables - $Data::Token::strength (ok this one sux)
* Call methods - Data::Token::strength(...);
Thoughts?
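
One way the first two options could coexist, as a sketch only: a default set on the "use" line, overridden per call. The package name My::Token, the helper options_for and the "bits" key are all hypothetical and not part of Data::Token.

```perl
use strict;
use warnings;

package My::Token;   # hypothetical package name, not Data::Token itself

my %default = (bits => 128);   # DWIM default when the caller says nothing

# Called by "use My::Token bits => 256;" to change the default.
sub import {
    my ($class, %opt) = @_;
    %default = (%default, %opt);
}

# Per-call arguments override the defaults.
sub options_for {
    my (%arg) = @_;
    return (%default, %arg);
}

package main;

my %plain  = My::Token::options_for();              # nothing passed in
My::Token->import(bits => 256);                     # what "use" would do
my %raised = My::Token::options_for();              # picks up the new default
my %once   = My::Token::options_for(bits => 512);   # one-off override
print "$plain{bits} $raised{bits} $once{bits}\n";
```

This keeps the zero-argument call working while avoiding a package variable that callers have to set by hand.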
> Further reading
> ===============
> I discuss the troubles with generating good random numbers in Perl
> in chapter 10 of "Perl Security", available from
> http://perltraining.com.au/notes.html
> Feedback and comments appreciated.
Thanks, I will have a look.
Thanks for all your input Paul. I think making it stronger by default
is the right approach. It is unlikely this needs to be fast as it is
only for generating unique tokens, not for reading them. I think I
will also add in a few references, in particular to security talks.
And most importantly I should add some comments on checking for
uniqueness in a token system AND, even more important, on protecting
against brute force attack.

Just out of interest, how many people have had to create these tokens
and do the same research as above? From the feedback here I guess that
this is a worthwhile module so that the next person does not have to
do the same again :-)

Scott