Re: MD5 collision

From: Alex Rousskov <rousskov@dont-contact.us>
Date: Fri, 27 Oct 2000 07:36:25 -0600 (MDT)

On Thu, 26 Oct 2000, Henrik Nordstrom wrote:

> Junfeng Yang wrote:
> >
> > Hi, dear Squid guy, I have a question about Squid. how do you deal with
> > the MD5 collision? for example, two urls may have the same message digest.
>
> Then they both "fight" for the cache. The last requested is the one that
> will be in the cache. If it happens to be another object in the MD5
> digest slot when an object is stored then the other object is evicted
> from the cache.

... and, as log analysis shows, MD5 collisions for URLs are
practically impossible:

        Date: Tue, 27 Jul 1999 17:09:44 -0600 (MDT)
        From: Alex Rousskov <rousskov@ircache.net>
        To: adrian@creative.net.au
        Cc: squid-dev@ircache.net
        Subject: Re: Collisions in URLs using MD5

On Wed, 28 Jul 1999 adrian@creative.net.au wrote:

> Does anyone have figures for collisions of URL names when md5'ed ?
> I'm curious to know what it is like in the real world ..

I did some experiments in June 1997 using URLs from our SV cache. I
varied the length/size of an MD5 digest (in bytes) and varied the number
of days in the access log.

 trace length, number of number of MD5 collisions for
     days unique URLs a given URL digest length
                                 4 5 6 16
 ------------- ----------- ------ --- --- ---
             1 375066 13 0 0 0
             5 1494774 257 1 0 0
            10 2619168 817 2 0 0

Thus, for six byte and longer URL digests, there were no collisions in
the given set. A four byte URL digest gives negligible number of
collisions (817 or 0.04% for a 10 day trace). The standard MD5 digest
length is 16 bytes.

Alex.

--
To unsubscribe, see http://www.squid-cache.org/mailing-lists.html
Received on Fri Oct 27 2000 - 07:39:32 MDT

This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 16:56:00 MST