SPUG:Re: Appropriateness of MD5

Brian Ingerson ingy at ttul.org
Thu Mar 20 22:32:13 CST 2003


On 20/03/03 10:01 -0800, Brian Hatch wrote:
> 
> 
> > Problems with MD5 can crop up when checksums are being generated for
> > millions of entries, which increases the odds that two different strings
> > share the same checksum.  This problem can be reduced by using a digest
> > algorithm that uses more bits; CPAN has several modules under the Digest
> > area that offer lower chances of collisions at higher CPU costs.
> > 
> > That being said, MD5 is likely to have a large enough bitspace for
> > your needs.
> 
> Since you're not actively trying to make two different texts generate
> the same MD5 hash, you're asking "what are the chances I'll have
> two end up with the same hash" which is the birtday paradox.
> 
> A good estimate says "if you have X possible things, then if you
> randomly pick 1.2 * sqrt(X) things, there's a 50/50 chance that
> two will be the same".  That's probably pretty applicable to this
> situation.
> 
> So, what would that number be?
> 
> 	1.2 * sqrt(2^128) == 1.2 * 2^64 == 22136092888451461939
> 
> So, if you happen to have 22136092888451461939 records that you're
> checking out, it's even money that you will have one collision.

Those odds are good enough for me.

BUt I've always wondered if anybody has ever found two strings (let
alone two text strings) (let alone two meaningful text strings) that
digested to the same MD5 value. What we do if we did find a match.

I think I'd just patch the MD5 library to look for the special case ;)

Cheers, Brian

> 
> This assumes that MD5 generates it's output 'evenly' (each hash value
> is equally likely) which was one of the goals when it was created.
> 
> 
> I now return you to your normally scheduled stick riddle.
> 
> 
> 
> 
> --
> Brian Hatch                  Hang up and drive.
>    Systems and
>    Security Engineer
> http://www.ifokr.org/bri/
> 
> Every message PGP signed



More information about the spug-list mailing list