Direct-mapped-like caching

From: Andres Kroonmaa <andre@dont-contact.us>
Date: Mon, 6 Nov 2000 23:09:12 +0200

 I've been wondering sometimes: what if we dropped LRU and the other
 replacement policies, and implemented something like what is used in
 CPU caching, i.e. direct-mapped and set-associative caches?

 For Squid, the MD5 hash has been tested to give a very small collision
 frequency (I can't find it anywhere, but if I recall right, Duane did
 a test with a few million URLs), even at quite low bit counts.
 We have a fixed number of files possible for a given disk space, and
 the filemap bitmap is usually a power of 2.

 Now, what if we mask the MD5 hash to drop all bits higher than what
 we can handle within our file bitmap, and use the resulting number as
 a direct pointer to a cachedir swapfile, or even better, a disk block?
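
 A minimal sketch of that masking step, just to make it concrete (the
 names N_SLOTS and url_to_slot are mine, and MD5() here is OpenSSL's
 libcrypto call; this is not proposed Squid code):

    #include <stdint.h>
    #include <string.h>
    #include <openssl/md5.h>

    #define N_SLOTS   (1U << 21)        /* 2M slots; must be a power of 2 */
    #define SLOT_MASK (N_SLOTS - 1)

    /* Hash the URL and keep only the low bits that fit our filemap. */
    static uint32_t url_to_slot(const char *url)
    {
        unsigned char digest[MD5_DIGEST_LENGTH];
        uint32_t low;

        MD5((const unsigned char *) url, strlen(url), digest);
        memcpy(&low, digest, sizeof(low));  /* any 32 bits of the digest */
        return low & SLOT_MASK;             /* direct file/block number */
    }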

 Let's assume for a moment that we can handle 2M objects.
 First, we can use a simple array[0..2M] of "something" (pointers?)
 related to the corresponding StoreEntrys. Each array item can be used
 to determine the existence of an object in the cache (a pointer to
 the actual StoreEntry, for example, or a refcount, expiry time, or
 object size), as sketched below. The StoreEntry database could also
 be a simple linear array, with no need for (doubly) linked lists. As
 we map MD5 directly to disk, there's no need to manipulate this db in
 RAM. We could even omit the StoreEntry db altogether and keep only
 pending requests in RAM.
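
 For example, the per-slot bookkeeping could be as small as this (the
 field choice is only illustrative, picked from the options above):

    #include <time.h>
    #include <stdint.h>

    #define N_SLOTS (1U << 21)              /* one entry per on-disk object */

    struct slot_info {
        time_t   expires;                   /* 0 = slot believed empty */
        uint32_t obj_size;                  /* or a pointer/refcount instead */
    };

    static struct slot_info slots[N_SLOTS]; /* one fixed, flat array in RAM */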

 Cache hit/miss detection is very straightforward and fast.
 We need not keep a separate MD5->filename translation.
 We need not even keep full MD5 keys in RAM, as we accept
 occasional collisions with the MD5 hash of on-disk objects.

 We need not rebuild the swap state; we simply know how many files
 can fit into the cachedir and assume that all files are in use
 (or sequentially stat all the files to be sure). On a false hit we
 invalidate the array item, or more probably simply overwrite it with
 the newly requested object.
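
 The lookup logic would then be roughly this (reusing the slot_info
 sketch above; on_disk_key_matches is a hypothetical helper that
 compares the requested URL against whatever key is stored inside the
 on-disk object):

    #include <time.h>
    #include <stdint.h>

    struct slot_info { time_t expires; uint32_t obj_size; };  /* as above */

    /* hypothetical helper: read the key stored with the on-disk object
     * and compare it against the requested URL */
    int on_disk_key_matches(const struct slot_info *s, const char *url);

    enum lookup_result { CACHE_MISS, CACHE_HIT, FALSE_HIT };

    static enum lookup_result check_slot(struct slot_info *s, const char *url)
    {
        if (s->expires == 0)
            return CACHE_MISS;              /* nothing stored here yet */
        if (!on_disk_key_matches(s, url)) {
            s->expires = 0;                 /* false hit: invalidate, or just */
            return FALSE_HIT;               /* overwrite with the new object  */
        }
        return CACHE_HIT;
    }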

 If we accept a partial store db in RAM, as has been talked about
 in relation to reiserfs, then we can even skip loading timestamp
 data into RAM, and as soon as we have verified the existence of the
 disk file, we can assume a hit.
 We can bypass the FS altogether and map MD5 directly onto a raw
 disk block or fragment.
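
 Bypassing the filesystem could then be as simple as a pread() at a
 fixed offset (the block size and the device name are assumptions for
 illustration only):

    #include <fcntl.h>
    #include <unistd.h>
    #include <stdint.h>

    #define BLOCK_SIZE 4096                 /* one fixed-size block per object */

    /* raw_fd: an open raw device, e.g. open("/dev/rsd0c", O_RDWR) */
    static ssize_t read_object(int raw_fd, uint32_t slot, void *buf)
    {
        off_t offset = (off_t) slot * BLOCK_SIZE;
        return pread(raw_fd, buf, BLOCK_SIZE, offset);  /* one atomic read */
    }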

 I rely here on MD5's property of spreading bitmap usage evenly
 (which is basically what makes collisions rare), so I hope that
 the disk space would eventually be utilised quite well.
 I also assume that collisions will very rarely happen in quick
 succession, i.e. before a collision happens, an object will have
 stayed in the cache for some time and thus perhaps served its
 purpose.
 If we assume a 10% collision rate, with 100% of the disk space being
 cycled through the cache in 24h, then it seems reasonable to assume
 that, on average, any object will stay in the cache for a few hours
 before a collision occurs on it.
 In fact, we rely on collisions and need them, as they are our only
 replacement policy. With a 25% collision rate we would expect the
 full cache to be replaced in about 4 days. So collisions are our
 friend.

 Of course, a lot of restrictions are imposed. It is assumed that
 disk storage is divided into fixed-size blocks, one per object.
 No replacement policy can be implemented other than either
 overwriting the old object or dropping the new one.
 Lots of disk space may stay unused. Some sets of URLs may
 constantly compete for the same disk slot due to collisions.
 It would be very difficult to maintain cache digests. For ICP
 hit/miss we could return a hit based on our bitmasked lookup, but
 this may lead to quite a high false-hit rate.

 But we could maintain an enormous number of objects with very
 little RAM usage. To cope with variable-sized web objects, we
 could implement cachedir selection based on object size (a sketch
 follows below). Or we could make the array more complex by allowing
 consecutive disk blocks to be allocated to the same object.
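
 Size-based cachedir selection could look something like this (the
 size classes and the struct are invented for illustration):

    #include <stddef.h>

    struct cachedir {
        size_t block_size;              /* each dir stores fixed-size blocks */
        /* ... plus its own slot array, raw fd, etc. ... */
    };

    static struct cachedir dirs[] = {
        { 512 }, { 4096 }, { 32768 }, { 262144 },
    };

    /* Pick the cachedir with the smallest block that still fits. */
    static struct cachedir *pick_cachedir(size_t object_size)
    {
        size_t i;
        for (i = 0; i < sizeof(dirs) / sizeof(dirs[0]); i++)
            if (object_size <= dirs[i].block_size)
                return &dirs[i];
        return NULL;                    /* too big: use the ordinary store */
    }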

 There are an awful lot of web objects under 512 bytes in size. We
 could fit 70M of these on a 36GB disk. There's no way we can handle
 that number of objects with current Squid memory usage, unless we
 have pretty expensive hardware.

 What seems attractive to me in this approach is that we don't
 need to bother about replacement overhead, lookup overhead,
 FS overhead, memory overhead, or startup overhead (even dirty
 startups, as most disk ops are atomic). We can consider disks
 just a bunch of spindles, cheap and massive compared to anything
 else in the system.

 If all this sounds disgusting, beat me ;)
 
PS.
 To quickly check what disk utilisation and collision rate I'd
 get, I took the access.log from one of the recent days, which had
 4070655 URLs, of which 2012485 were found to be unique, and ran a
 few tests with different array sizes and URL sets. (In the output
 below, "hits" is the number of URLs that mapped to a slot and
 "count" is how many slots received that many URLs.)
 These are probably not very good samples, as MD5 is known to get
 fewer collisions as the bitmask grows, and I have too few URLs to
 fill all the arrays. But it seems that it might work OK starting
 from 2M objects in the array.
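
 For reference, the test boils down to something along these lines
 (a simplified sketch, assuming OpenSSL's MD5() and one unique URL
 per line on stdin; the real run also printed percentage columns):

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <openssl/md5.h>

    #define ARRAY_SIZE (1U << 21)       /* 2097152; change per test run */

    int main(void)
    {
        static uint32_t count[ARRAY_SIZE];  /* URLs mapped to each slot */
        uint32_t hist[32] = { 0 };
        char url[8192];
        uint32_t i, slot, urls = 0;

        while (fgets(url, sizeof(url), stdin)) {
            unsigned char d[MD5_DIGEST_LENGTH];
            url[strcspn(url, "\r\n")] = '\0';   /* strip the newline */
            MD5((unsigned char *) url, strlen(url), d);
            memcpy(&slot, d, sizeof(slot));
            count[slot & (ARRAY_SIZE - 1)]++;
            urls++;
        }
        printf("MD5 test suite. Array size: %u\n", ARRAY_SIZE);
        printf("Done Reading URLS: %u\n", urls);
        for (i = 0; i < ARRAY_SIZE; i++)        /* histogram: URLs per slot */
            hist[count[i] < 31 ? count[i] : 31]++;
        for (i = 0; i < 32; i++)
            if (hist[i])
                printf("%5u - %u\n", i, hist[i]);
        return 0;
    }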

MD5 test suite. Array size: 4194304
Done Reading URLS: 1000000
    0 - 3304705 78.79% of array unused

 hits - count urls%
    1 - 787675 78.77%
    2 - 93937 9.39%
    3 - 7521 0.75%
    4 - 445 0.04%
    5 - 18 0.00%
    6 - 3 0.00%

MD5 test suite. Array size: 2097152
Done Reading URLS: 1000000
    0 - 1301847 62.08% of array unused

 hits - count urls%
    1 - 620829 62.08%
    2 - 147766 14.78%
    3 - 23526 2.35%
    4 - 2887 0.29%
    5 - 271 0.03%
    6 - 24 0.00%
    7 - 2 0.00%

MD5 test suite. Array size: 1048576
Done Reading URLS: 1000000
    0 - 403796 38.51% of array unused

 hits - count urls%
    1 - 385997 38.60%
    2 - 183258 18.33%
    3 - 58314 5.83%
    4 - 14029 1.40%
    5 - 2715 0.27%
    6 - 419 0.04%
    7 - 44 0.00%
    8 - 4 0.00%

MD5 test suite. Array size: 4194304
Done Reading URLS: 2012129
    0 - 2596049 61.89% of array unused

 hits - count urls%
    1 - 1245622 61.91%
    2 - 298400 14.83%
    3 - 47851 2.38%
    4 - 5801 0.29%
    5 - 540 0.03%
    6 - 39 0.00%
    7 - 1 0.00%
    9 - 1 0.00%

MD5 test suite. Array size: 2097152
Done Reading URLS: 2012129
    0 - 802939 38.29% of array unused

 hits - count urls%
    1 - 771855 38.36%
    2 - 369207 18.35%
    3 - 118271 5.88%
    4 - 28500 1.42%
    5 - 5379 0.27%
    6 - 873 0.04%
    7 - 113 0.01%
    8 - 12 0.00%
    9 - 3 0.00%

MD5 test suite. Array size: 1048576
Done Reading URLS: 2012129
    0 - 153429 14.63% of array unused

 hits - count urls%
    1 - 295569 14.69%
    2 - 283591 14.09%
    3 - 181341 9.01%
    4 - 86995 4.32%
    5 - 33383 1.66%
    6 - 10475 0.52%
    7 - 2931 0.15%
    8 - 700 0.03%
    9 - 136 0.01%
   10 - 19 0.00%
   11 - 5 0.00%
   12 - 2 0.00%

------------------------------------
 Andres Kroonmaa <andre@online.ee>
 Delfi Online
 Tel: 6501 731, Fax: 6501 708
 Pärnu mnt. 158, Tallinn,
 11317 Estonia