SPUG:Re: Appropriateness of MD5

Brian Hatch spug at ifokr.org
Thu Mar 20 12:01:14 CST 2003



> Problems with MD5 can crop up when checksums are being generated for
> millions of entries, which increases the odds that two different strings
> share the same checksum.  This problem can be reduced by using a digest
> algorithm that uses more bits; CPAN has several modules under the Digest
> area that offer lower chances of collisions at higher CPU costs.
> 
> That being said, MD5 is likely to have a large enough bitspace for
> your needs.

Since you're not actively trying to make two different texts generate
the same MD5 hash, you're asking "what are the chances I'll have
two end up with the same hash" which is the birtday paradox.

A good estimate says "if you have X possible things, then if you
randomly pick 1.2 * sqrt(X) things, there's a 50/50 chance that
two will be the same".  That's probably pretty applicable to this
situation.

So, what would that number be?

	1.2 * sqrt(2^128) == 1.2 * 2^64 == 22136092888451461939

So, if you happen to have 22136092888451461939 records that you're
checking out, it's even money that you will have one collision.

This assumes that MD5 generates it's output 'evenly' (each hash value
is equally likely) which was one of the goals when it was created.


I now return you to your normally scheduled stick riddle.




--
Brian Hatch                  Hang up and drive.
   Systems and
   Security Engineer
http://www.ifokr.org/bri/

Every message PGP signed
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://mail.pm.org/pipermail/spug-list/attachments/20030320/fe82f9ce/attachment.bin


More information about the spug-list mailing list