[Melbourne-pm] Data::Token

Wed May 28 23:36:40 PDT 2008

Scott Penrose <scottp at dd.com.au> writes:

>> SHA1 and MD5 are in the same family, and successful attacks on (full)
>> SHA1 have reduced collision generation to 2^69 trials from 2^80.
>>
>> Plan on replacing SHA1 everywhere within the next ten years, and on
>> needing to step up to SHA256 or SHA512 in the interim, at the very
>> least.
>
> All the above is correct but not quite for this case. MD5 and SHA1 and
> up all just decrease how likely collisions are to help against brute
> force attack - but for signatures against text. 

I am not quite convinced that your response is correct.  The issue is
finding two inputs that generate colliding outputs.

The document signature case is a situation where the signed document can
be replaced with a colliding document and the signature will still
validate.

> Remember that this is just a way of hiding the secret. What it needs
> to do is make it so that you need 1000s or more of guesses to get the
> next entry. Whereas doing time (or as shown even rand(time)) is
> predictable.

I guess it depends on what you are using the token for, as Paul
correctly pointed out -- MD5 and SHA1 distribute the entropy and make it
harder to guess the next item in the sequence, but they don't add any
entropy.

time, or rand(time), has very, very little entropy, and can often be
trivially determined for a network server.
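
A rough Python sketch of why hashing a timestamp does not help.  The
weak_token scheme and the one-hour window are assumptions made up for
illustration, not anything taken from Data::Token: the point is only
that an attacker who knows roughly when a token was minted has a few
thousand candidates to hash, whatever digest sits on top.

```python
import hashlib
import time


def weak_token(timestamp: int) -> str:
    """Hypothetical weak scheme: MD5 of the server clock in whole seconds."""
    return hashlib.md5(str(timestamp).encode()).hexdigest()


def guess_timestamp(token: str, around: int, window: int = 3600):
    """Hash every second within +/- window of `around`; return the match or None."""
    for candidate in range(around - window, around + window + 1):
        if weak_token(candidate) == token:
            return candidate
    return None


# The server minted a token 42 seconds ago; the attacker only knows "about now".
issued_at = int(time.time()) - 42
token = weak_token(issued_at)

recovered = guess_timestamp(token, around=int(time.time()))
print(recovered == issued_at)  # the search covers at most 7201 candidates
```

The digest spreads the input over 128 bits of output, but the search
space is still only as large as the set of plausible inputs.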

> One of the reasons Cryptography is so hard is you can't apply one rule
> to another. The MD5 birthday attack scenarios are useful only against
> documents you are signing. Whereas this is just a one way hashing
> algorithm I need. I could probably use crypt :-) (not really).

As far as I can tell your design is vulnerable to token forgery -- if
someone can mint tokens at will they can abuse your service, correct?

Ah.  Wait.  You are storing generated tokens, so only something that was
both generated on the server *and* recorded will be valid, right?

Yes, on that basis this isn't a threat: tokens that might be valid but
are not minted by your server are not going to grant any access.
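
As a minimal sketch of that stored-token model, in Python: the
TokenStore class and its method names are hypothetical, and
secrets.token_hex stands in for whatever generator the real service
uses.  Validity here is purely a question of whether the server
recorded the token, not of what the token looks like.

```python
import secrets


class TokenStore:
    """Hypothetical in-memory store; a real service would persist and expire tokens."""

    def __init__(self):
        self._issued = set()

    def mint(self) -> str:
        """Generate a token and record it, so it can be recognised later."""
        token = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
        self._issued.add(token)
        return token

    def is_valid(self, token: str) -> bool:
        """A token is valid only if this store minted and recorded it."""
        return token in self._issued


store = TokenStore()
genuine = store.mint()
print(store.is_valid(genuine))   # True
print(store.is_valid("0" * 32))  # well-formed, but never minted here
```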

If you /didn't/ store the token information[1] then you are vulnerable
to collisions, on the basis that:

1. Your UUID is (sufficiently) predictable, or you would just use that.
2. Your token comprises sha1(uuid . secret)
3. The attacker can read the source code and determine the model you are
   using for generating tokens.[2]

On this basis we can assume that the attacker can successfully forge
UUID generation from your site.  They can then find any value secret'
such that:

    sha1(uuid . secret) == sha1(uuid . secret')

At that stage they can mint new tokens and abuse your services at will.  
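
The stateless variant can be sketched in Python as follows.  The mint
and verify names, the sample UUID string and the sample secret are all
made up for illustration; the only thing taken from the thread is the
sha1(uuid . secret) construction.  With no record of what was issued,
verification reduces to recomputing the hash, so anything that
reproduces the same digest for a predictable UUID is accepted.

```python
import hashlib
import hmac


def mint(uuid: str, secret: str) -> str:
    """Token as described above: SHA1 over the UUID concatenated with the secret."""
    return hashlib.sha1((uuid + secret).encode()).hexdigest()


def verify(uuid: str, token: str, secret: str) -> bool:
    """Stateless check: recompute the token and compare in constant time."""
    expected = mint(uuid, secret)
    return hmac.compare_digest(expected.encode(), token.encode())


secret = "server-side secret"  # hypothetical value
predicted_uuid = "uuid-0001"   # stands in for a UUID the attacker can predict

# The server cannot tell who computed this; it only checks that the digest matches.
print(verify(predicted_uuid, mint(predicted_uuid, secret), secret))         # True
print(verify(predicted_uuid, mint(predicted_uuid, "wrong guess"), secret))  # False
```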

Hrm.  Even with token recording that means they could potentially abuse
your service by speculatively generating tokens and then submitting
input in the hope that a genuine matching token will be generated.

It would probably be easier to just fetch tokens from your system
though. :)

[...]

> On another topic - Security of using MD5 - it seems that every module
> I find on the net from Java to PHP to Python to Perl are using what I
> originally wrote - MD5 of a random string (usually time) against a
> unique number (often just generated with a sequence, time or
> combination of time, ip etc).
>
> The most common PHP code is
> 	$token = md5(uniqid(rand(), TRUE));
>
> uniqid is equiv to Data::UUID (different way of calculating).
>
> Even the praised Apache::Session and CGI::Session just use:
>
> 	md5_hex($$, time(), rand(time));
>
> I can't find a single reference on the net that says this is insecure
> as has been documented in this thread. 

Security is relative: it would be much easier for me to predict the
Apache::Session session ID value than your Data::Token value.  

It is almost certainly easier to find some other security hole, though,
than to brute force that.  Social engineering, paying pennies per spam
to humans in inexpensive locations, and other technical threats are much
more profitable than hacking cryptography today.

> Some people raise in threads that you should use SHA1 and in each case
> it is said not to be required.

Well, I just checked the code for Apache::AuthCookie to make sure
it is insecure, and it is vulnerable to exactly the risk here:

It authenticates the values in the cookie with a secret, where the
secret is absolutely vulnerable to the generation of collisions.

> So the question is:
>
> 1) Am I missing the threads on the net
> 2) Are we jumping to the wrong conclusion because we are mixing document
>    signature faking with unpredictability
> 3) Is this really a problem and we are the first to really solve it.
>
> My gut is now telling me (2). If it is not then almost every single
> site on the internet is now vulnerable.

The answer is kind of 3: it is really a problem, with a caveat, and we
are absolutely not the first people to solve it.[3]

However...

[...]

Paul Fenwick <pjf at perltraining.com.au> writes:

> (2a).  The ability to engineer collisions with MD5 can be considered a
> non-issue because we're not signing documents, the only requirement is that
> the hash is *hard to guess*.  

...this is sometimes the case, and sometimes it isn't.

When it isn't (Apache::AuthCookie) then the site really is vulnerable,
but again, the "but" is "in the real world...": the cost of exploiting
the MD5 weakness is much higher than exploiting some other weakness.

So, yeah.  In some cases this doesn't matter, for this reason, but in
others it /does/ matter theoretically, but not practically for some
years yet.

> In this sense, we're using MD5 as a way to distribute our entropy
> throughout a reasonably long string.  MD5 (or SHA1, or ROT13) won't
> increase the entropy that we have, but it can increase the work an
> attacker needs to do, and make it less obvious with regards to the
> data we're using to generate the hash to begin with.

For Data::Token this is probably enough, as Paul says.

[...]

> It's worth noting that tokens with poor randomness stop being "good enough"
> when you start having lots of sessions, or sessions which are active for a
> long time, or a very valuable prize for breaking a session.  I'd expect the
> session generation for on-line banking to contain significantly more entropy,
> and be significantly more paranoid than the session generation for my
> delicious bookmarks.  

You would hope, eh?  My online banking, which is some of the best I have
seen, uses an unsalted SHA1 transformation, making my password
vulnerable to a "rainbow table" attack if the SSL protection ever fails.

Oh, well.  I guess they didn't attend classes the day that the risks of
that were discussed.
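
To illustrate the risk with a small Python sketch: the password list
and helper names are invented for the example, and a plain dictionary
stands in for a real rainbow table (which compresses the same idea with
hash chains).  Without a salt, one precomputed table works against
every account; with a per-password random salt, the precomputed entries
no longer match.

```python
import hashlib
import os


def unsalted(password: str) -> str:
    """SHA1 of the bare password: identical passwords give identical hashes."""
    return hashlib.sha1(password.encode()).hexdigest()


def salted(password: str, salt: bytes) -> str:
    """SHA1 of salt + password: the same password hashes differently per salt."""
    return hashlib.sha1(salt + password.encode()).hexdigest()


# Built once by the attacker, then reused against any leaked unsalted hash.
common_passwords = ["password", "letmein", "123456"]
lookup = {unsalted(p): p for p in common_passwords}

leaked = unsalted("letmein")
print(lookup.get(leaked))  # letmein

salt = os.urandom(16)  # stored alongside the hash, unique per password
leaked_salted = salted("letmein", salt)
print(lookup.get(leaked_salted))  # None
```

A salt only defeats precomputation; a weak password can still be
guessed directly once the salt and hash are known.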

[...]

> Having said all that, we're going to generate tokens, and we have the
> stated goals of wanting them to be unique, and wanting them to be hard
> to guess.  I don't see there being much harm in making sure they're
> absolutely unique, and *really* hard to guess if that doesn't cost us
> very much[1].

For the use case this is probably a more reasonable approach than my
more security-focused comments.

Regards,
        Daniel

Footnotes: 
[1]  Which, to my eye, looks like an invitation to an attacker to
     consume unbounded storage on your server, barring other limitations,
     but you did note that you address that threat outside the token
     system in a previous post.

[2]  This is probably the most unlikely part of this threat model, but
     essential if you want to consider any real uniqueness from the
     token.

[3]  My knowledge of this comes from cryptographic literature, and 
     I didn't design my own security protocol, because I am not /that/
     knowledgeable in the area.