Re: conflict of interest

From: Michael O'Reilly <michael@dont-contact.us>
Date: 05 Nov 1997 12:36:36 +0800

David Luyer <luyer@ucs.uwa.edu.au> writes:

> Hmmm. If we're using a 128-bit hash, storing it in 128-bits would be
> optimal, but it would be nice if we could also support a 'version number'
> if we think we might want to support further headers on the object.

Good idea. Just nick a byte. Someone did a paper looking at the number
of hash collisions for MD5, and even with only 8 bytes of the hash,
the number of collisions was in the highly improbable range.
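
A rough back-of-the-envelope check of that claim, not taken from the paper
itself. It assumes the truncated digest behaves like a uniformly random
value and uses the usual birthday approximation; the function names and the
cache size of about 2.1 million objects are just for illustration.

```python
import math


def expected_colliding_pairs(n_objects: int, hash_bits: int) -> float:
    """Expected number of colliding pairs among n uniformly random hashes."""
    pairs = n_objects * (n_objects - 1) / 2
    return pairs / 2.0 ** hash_bits


def collision_probability(n_objects: int, hash_bits: int) -> float:
    """Birthday approximation: P(any collision) ~= 1 - exp(-expected pairs)."""
    return -math.expm1(-expected_colliding_pairs(n_objects, hash_bits))


if __name__ == "__main__":
    n = 2_112_526  # illustrative cache size
    # 64 bits = 8 bytes of the hash, 120 bits = full MD5 minus one byte
    # nicked for a version number, 128 bits = full MD5.
    for bits in (64, 120, 128):
        print(bits, collision_probability(n, bits))
```

With 8 bytes of hash this comes out on the order of 1e-7 for a cache of that
size, so giving up one byte of a 16-byte digest costs nothing that matters.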
 
> > 2. Fixed size == much easier memory management for the cache index.
>
> The big advantage I believe - also that the cache log could be transformed
> into a (much more compact) binary file since it would be entirely
> fixed-length records. The memory to store URL strings is getting very
> significant in some (very large) caches, the reason URL compression was
> originally mentioned.

From cachemgr:

 StoreEntry 2112526 x 52 bytes = 107276 KB
 URL strings = 100803 KB

i.e. an average of about 48.9 bytes per URL. Storing a 16-byte MD5 digest
per entry in place of the URL string will thus save around 60-70 meg of RAM
on this cache.
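
The arithmetic behind that estimate, as a small sketch. It assumes cachemgr's
KB means 1024 bytes and ignores allocator overhead on the URL strings, so
treat the result as approximate; the constant and function names are mine.

```python
STORE_ENTRIES = 2_112_526   # StoreEntry count from the cachemgr output
URL_STRINGS_KB = 100_803    # "URL strings" line from the cachemgr output
MD5_DIGEST_BYTES = 16       # a 128-bit digest


def avg_url_bytes(entries: int, url_kb: int) -> float:
    """Average bytes of URL string per cached object."""
    return url_kb * 1024 / entries


def savings_kb(entries: int, url_kb: int, key_bytes: int) -> float:
    """KB saved if each URL string is replaced by a fixed-size key."""
    return url_kb - entries * key_bytes / 1024


if __name__ == "__main__":
    print("average URL bytes:", avg_url_bytes(STORE_ENTRIES, URL_STRINGS_KB))
    saved = savings_kb(STORE_ENTRIES, URL_STRINGS_KB, MD5_DIGEST_BYTES)
    print("saved KB:", saved, "saved MB:", saved / 1024)
```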

 
> To answer someone else's suggestion that SHA is slow - isn't SHA fast
> enough for the Linux kernel to do on syn/recv cookies, etc?
> (drivers/char/random.c has USE_SHA by default)

I think the suggestion was slow 'compared to md5'. No idea which would
be faster.
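
One way to settle it is simply to time both on URL-sized inputs. The sketch
below uses Python's hashlib, with "sha1" standing in for SHA (an assumption
on my part); the helper name and sample URL are made up, and the numbers it
prints only describe the machine and library it runs on.

```python
import hashlib
import timeit


def time_digest(name: str, data: bytes, repeats: int = 20000) -> float:
    """Seconds taken to hash `data` `repeats` times with algorithm `name`."""
    return timeit.timeit(lambda: hashlib.new(name, data).digest(),
                         number=repeats)


if __name__ == "__main__":
    # Hypothetical URL of roughly the average length seen in the cache.
    url = b"http://www.example.com/some/path/index.html?x=1"
    for name in ("md5", "sha1"):
        print(name, time_digest(name, url))
```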

> I think the first line of the file it's in would be a good idea. Then
> we can check it when we open the file (for TCP requests ONLY!) (fixes any
> possible 'wrong swapfile' bugs and/or provides stats on the theoretically
> improbable hash collision).
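
A minimal sketch of that check, reading "it" in the paragraph above as the
object's URL (my assumption). The file layout and the function names are
hypothetical and are not the actual swap file format.

```python
import os
import tempfile


def write_swap_file(path: str, url: str, body: bytes) -> None:
    """Write the URL as the first line, followed by the object body."""
    with open(path, "wb") as f:
        f.write(url.encode("utf-8") + b"\n")
        f.write(body)


def open_checked(path: str, expected_url: str) -> bytes:
    """Return the body if the stored first line matches the requested URL."""
    with open(path, "rb") as f:
        stored = f.readline().rstrip(b"\n")
        if stored != expected_url.encode("utf-8"):
            # Either the wrong swap file or a genuine hash collision.
            raise ValueError("swap file holds %r, wanted %r"
                             % (stored, expected_url))
        return f.read()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "00000001")
    write_swap_file(path, "http://www.example.com/", b"object data")
    print(open_checked(path, "http://www.example.com/"))
```

A mismatch here is where the 'wrong swapfile' and collision statistics would
be counted.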
 
> Accessing this on a purge isn't a big problem; you have to bring the inode
> data into RAM to unlink the file to begin with, it's just one more block
> to read off disk for the first block of the file and a few more syscalls to
> access it.

Be careful here. The unlink() is currently done in a separate
process. It's also 3 disk seeks instead of 2, and worse if readahead
kicks in.

Michael.
Received on Tue Jul 29 2003 - 13:15:44 MDT
