SPUG:Re: Appropriateness of MD5
spug at ifokr.org
Thu Mar 20 12:01:14 CST 2003
> Problems with MD5 can crop up when checksums are being generated for
> millions of entries, which increases the odds that two different strings
> share the same checksum. This problem can be reduced by using a digest
> algorithm that uses more bits; CPAN has several modules under the Digest
> area that offer lower chances of collisions at higher CPU costs.
> That being said, MD5 is likely to have a large enough bitspace for
> your needs.
Since you're not actively trying to make two different texts generate
the same MD5 hash, you're asking "what are the chances I'll have
two end up with the same hash" which is the birtday paradox.
A good estimate says "if you have X possible things, then if you
randomly pick 1.2 * sqrt(X) things, there's a 50/50 chance that
two will be the same". That's probably pretty applicable to this
So, what would that number be?
1.2 * sqrt(2^128) == 1.2 * 2^64 == 22136092888451461939
So, if you happen to have 22136092888451461939 records that you're
checking out, it's even money that you will have one collision.
This assumes that MD5 generates it's output 'evenly' (each hash value
is equally likely) which was one of the goals when it was created.
I now return you to your normally scheduled stick riddle.
Brian Hatch Hang up and drive.
Every message PGP signed
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Size: 189 bytes
Desc: not available
Url : http://mail.pm.org/pipermail/spug-list/attachments/20030320/fe82f9ce/attachment.bin
More information about the spug-list