[Melbourne-pm] Data::Token

Wed May 28 21:53:16 PDT 2008

G'day Scott/MPM,

Scott Penrose wrote:

[much snippage, apologies, I've got a deadline today]

> 1) Am I missing the threads on the net
> 2) Are we jumping to the wrong conclusion because we are mixing document 
> signature faking with unpredictability
> 3) Is this really a problem and we are the first to really solve it.

> My gut is now telling me (2). If it is not then almost every single site 
> on the internet is now vulnerable.

(2a).  The ability to engineer collisions with MD5 can be considered a 
non-issue because we're not signing documents; the only requirement is that 
the hash is *hard to guess*.  In this sense, we're using MD5 as a way to 
distribute our entropy throughout a reasonably long string.  MD5 (or SHA1, 
or ROT13) won't increase the entropy that we have, but it can increase the 
work an attacker needs to do, and make it less obvious what data we're 
using to generate the hash to begin with.

The result is that the hashes are "good enough" for most applications.  Yes, 
all the hash algorithms can result in collisions, but the possibility of 
such a collision coming out of our random session generator is vanishingly 
small.
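
To make the "hashing won't increase the entropy" point concrete, here is a 
minimal sketch in Python (chosen only for illustration).  The 8-bit seed size 
and the name token_from_seed are my own assumptions, picked so the whole seed 
space can be enumerated instantly; nothing here is taken from Data::Token. 
However long the digest looks, there can only be as many distinct tokens as 
there are seeds.

```python
import hashlib

SEED_BITS = 8  # deliberately tiny, hypothetical seed size for the demo


def token_from_seed(seed: int) -> str:
    """Spread a small integer seed across a 128-bit MD5 hex digest."""
    return hashlib.md5(seed.to_bytes(4, "big")).hexdigest()


# Enumerate every possible seed, exactly as an attacker could.
all_tokens = {token_from_seed(s) for s in range(2 ** SEED_BITS)}

sample = next(iter(all_tokens))
print("bits of output per token:", len(sample) * 4)
print("distinct tokens possible:", len(all_tokens))
```

The tokens look like 128 bits of noise, but an attacker who knows how they 
are built only has to try each seed once.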

With regard to the entropy problem, we may have a session hash that has 
perhaps 32 bits of entropy, perhaps from a /dev/urandom seed.  It's possible 
for an attacker to walk through all these values, push them through our hash 
function, generate a potential session ID, and present it to our server. 
However:

	1) It would be obvious an attack is taking place, with up
	   to 2^32 requests being presented to our server.

	2) It would take a long time.  Even if an attacker could
	   present 100 hashes per second, it would take almost 500
	   days to walk the entire keyspace, although for a service
	   with many active sessions, a hit on some live session
	   could occur much sooner.

	3) They need to hit a hash that's valid at the time it's presented.
	   If sessions time out rapidly, then even walking through the
	   entire keyspace may not result in a hit.

	4) The session the attacker gains access to may not be very
	   valuable, as it will almost always be a random user.

	5) The service may still require a password before revealing
	   credit card details, transferring money, changing delivery
	   addresses, etc.

	6) The service may invalidate a session if it sees the IP address,
	   browser string, etc. change, even though the session is active.

	7) In most cases, it's much easier to just sniff a hash off
	   the wire if not encrypted, or use other exploits to compromise
	   the user.
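
The arithmetic behind point 2 can be sketched as follows (Python, purely for 
illustration).  The function names and the figure of 10,000 active sessions 
are my own assumptions, not numbers from the thread.  The second function 
also assumes the live tokens are spread uniformly over the keyspace and that 
none of them expire while the attacker is walking it, so treat it as a rough 
estimate only.

```python
SECONDS_PER_DAY = 86_400


def days_to_walk(entropy_bits: int, guesses_per_second: float) -> float:
    """Days needed to try every value in a 2**entropy_bits keyspace."""
    return 2 ** entropy_bits / guesses_per_second / SECONDS_PER_DAY


def expected_days_to_first_hit(
    entropy_bits: int, guesses_per_second: float, active_sessions: int
) -> float:
    """Rough expected days until a guess matches any one live session.

    Walking N values that hide K fixed, uniformly placed targets takes
    (N + 1) / (K + 1) guesses on average to reach the first target.
    """
    keyspace = 2 ** entropy_bits
    expected_guesses = (keyspace + 1) / (active_sessions + 1)
    return expected_guesses / guesses_per_second / SECONDS_PER_DAY


full_walk = days_to_walk(32, 100)
busy_site = expected_days_to_first_hit(32, 100, 10_000)  # hypothetical load

print(f"full walk of 2^32 at 100/s: {full_walk:.1f} days")
print(f"first hit with 10,000 live sessions: {busy_site * 24:.1f} hours")
```

At 100 guesses per second the full walk comes out at roughly 497 days, while 
under the assumed 10,000 live sessions the first hit is expected after a 
little over an hour, which is why the number of active sessions matters so 
much.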

It's worth noting that tokens with poor randomness stop being "good enough" 
when you start having lots of sessions, or sessions which are active for a 
long time, or a very valuable prize for breaking a session.  I'd expect the 
session generation for on-line banking to contain significantly more 
entropy, and be significantly more paranoid than the session generation for 
my delicious bookmarks.  Heck, even eBay wants your password via https 
whenever you do something that an attacker may even find modestly valuable 
(selling/buying/changing details).

Having said all that, we're going to generate tokens, and we have the stated 
goals of wanting them to be unique, and wanting them to be hard to guess.  I 
don't see there being much harm in making sure they're absolutely unique, 
and *really* hard to guess if that doesn't cost us very much[1].
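
As a rough illustration of "really hard to guess at little cost", here is a 
minimal Python sketch using the standard library's secrets module.  It is not 
how Data::Token or Crypt::Random work, and the 128-bit size and the name 
new_session_token are my own assumptions.  On typical systems the underlying 
OS generator does not block once it has been initialised, but check your own 
platform before relying on that.

```python
import secrets

TOKEN_BYTES = 16  # assumed size: 128 bits drawn from the OS random source


def new_session_token() -> str:
    """Return a hex token backed by the operating system's CSPRNG."""
    return secrets.token_hex(TOKEN_BYTES)


token = new_session_token()
print(token, f"({len(token) * 4} bits)")
```

With 128 random bits there is no smaller seed space to walk, so the 
brute-force estimate grows from days to astronomically long.  Note that this 
makes duplicates vanishingly unlikely rather than strictly impossible.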

Cheerio,

	Paul

[1] As Daniel has pointed out, blocking for entropy is likely to be costing 
us too much, so asking Crypt::Random to be non-blocking is a great default.

-- 
Paul Fenwick <pjf at perltraining.com.au> | http://perltraining.com.au/
Director of Training                   | Ph:  +61 3 9354 6001
Perl Training Australia                | Fax: +61 3 9354 2681