Re: Direct mapped like caching

From: Bert Driehuis <driehuis@dont-contact.us>
Date: Tue, 7 Nov 2000 00:04:51 +0100 (CET)

On Mon, 6 Nov 2000, Andres Kroonmaa wrote:

> I've sometimes been wondering: what if we drop LRU and the other
> replacement policies, and implement something like what is used in
> CPU caching: direct-mapped and set-associative caches?

> Now, what if we mask the MD5 key to drop all bits higher than what
> we can handle within our file bitmap, and use the resulting number as
> a direct pointer to a cachedir swapfile, or even better, a disk block?

This is a tantalising idea... especially if it can be limited to the
small files, where memory overhead is much more expensive (say, all data
that can fit in one 4K block, including metadata).
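
For concreteness, the index calculation could be as small as the sketch
below (plain C; none of these names are real Squid identifiers, and
N_SLOTS is assumed to be a power of two sized to match the cachedir):

    /* Sketch only: map an MD5 key onto a fixed-size slot array by
     * masking off the high bits.  Slot number times block size then
     * gives the byte offset into the cachedir (or raw partition)
     * directly, with no lookup table in between. */
    #include <stdint.h>
    #include <string.h>

    #define N_SLOTS (1u << 23)              /* e.g. 8M slots x 4K = 32G */

    static uint32_t
    md5_to_slot(const unsigned char md5[16])
    {
        uint32_t low;
        memcpy(&low, md5, sizeof(low));     /* any 32 bits of the digest do */
        return low & (N_SLOTS - 1);         /* drop bits we can't address */
    }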

> Cache hit/miss detection is very straightforward and fast.
> We need not keep a separate MD5->filename translation.
> We need not even keep full MD5 keys in ram, as we accept
> occasional collisions with the MD5 hash found in ondisk objects.

Very tangible advantages...
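
To spell out what the lookup might look like (again just my sketch,
assuming the full 16-byte digest sits at the front of each on-disk
block and reusing md5_to_slot() from above; fd would be the cachedir
file or raw partition):

    /* Sketch: go straight to the block the masked digest points at and
     * verify the key stored there.  A mismatch is simply a miss (or a
     * false hit to be overwritten); no MD5->filename table anywhere. */
    #include <string.h>
    #include <sys/types.h>
    #include <unistd.h>

    static int
    is_hit(int fd, const unsigned char md5[16], char *buf, size_t blksz)
    {
        off_t off = (off_t) md5_to_slot(md5) * (off_t) blksz;
        if (pread(fd, buf, blksz, off) != (ssize_t) blksz)
            return 0;                       /* unreadable: treat as a miss */
        return memcmp(buf, md5, 16) == 0;   /* key on disk must match */
    }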

> We need not rebuild the swapstate; we simply know how many files
> can fit into a cachedir, and assume that all files are in use
> (or sequentially stat all files to be sure). On a false hit we
> invalidate the array item, or most probably simply overwrite it with
> the newly requested object.

This of course hinges on a really low collision frequency. For small
objects, some loss of caching (i.e., occasionally failing to hang on to
a cached object long enough to avoid frequent reloads) is acceptable,
and probably desirable because it brings overhead down and increases
overall capacity.
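
Purely back-of-envelope, with numbers invented for illustration: with
2^23 slots and a hot set of 100,000 small objects, any given pair of hot
objects shares a slot with probability 1/2^23, so we expect about
100,000^2 / (2 * 2^23), i.e. roughly 600 colliding pairs. That is on the
order of 1% of the hot set contending for a slot; the cold tail just
overwrites itself, which is the cheap behaviour we want here.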

> If we accept a partial storedb in ram, as has been talked about
> in relation to reiserfs, then we can even skip loading timestamp
> data into ram, and as soon as we have verified the existence of the
> disk file, we can assume a hit.
> We can bypass the FS altogether and map MD5 directly onto a raw
> disk block or fragment.

I think metadata should be inside the block and be used to verify a
cache hit.
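
Something like the following at the front of each block would do; the
field set is only my guess at a minimum, not an existing Squid
structure:

    /* Sketch of per-block on-disk metadata: everything needed to trust
     * the object lives with the object, so a dirty startup costs
     * nothing.  Comparing 'key' against the request digest catches slot
     * collisions; the freshness fields avoid loading timestamps into
     * ram at all. */
    #include <stdint.h>

    struct block_hdr {
        unsigned char key[16];      /* full MD5 of the URL, checked on read */
        uint32_t      object_len;   /* payload bytes following the header */
        uint32_t      timestamp;    /* when the object was stored */
        uint32_t      expires;      /* server or heuristic expiry */
        uint32_t      lastmod;      /* Last-Modified, for revalidation */
    };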

        [snip]
> But we could maintain an enormous amount of objects with very
> little ram usage. To solve for variable-sized web objects, we
> can implement cachedir selection based on object size. Or we
> can increase the complexity of the array by allowing consecutive disk
> blocks to be allocated to the same object.

I think there is no way of getting away from separate queue
dirs. Ditching one big file (say, our beloved McAfee 1.5M virus
scanner update) can blow away a lot of bandwidth, and the added
complexity of allowing consecutive blocks just doesn't seem worth it.
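
Picking the dir by size class is cheap enough, something along these
lines (cut-offs are arbitrary examples, not a recommendation):

    /* Sketch: route an object to a cachedir by size class, so each dir
     * can run one fixed block size, direct-mapped for the small classes
     * and the traditional store for everything big. */
    #include <stddef.h>

    static int
    pick_cachedir(size_t object_size)
    {
        if (object_size <= 512)    return 0;    /* 512-byte blocks */
        if (object_size <= 4096)   return 1;    /* 4K blocks */
        if (object_size <= 32768)  return 2;    /* 32K blocks */
        return 3;                               /* large: traditional store */
    }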

> There are an awful lot of web objects under 512 bytes in size. We can
> fit 70M of these on a 36G disk. There's no way we can handle
> this amount of objects with current Squid mem usage, unless
> we have pretty expensive hardware.

Well, add the metadata to the object size and the picture changes
slightly (52 bytes currently, right?). I think pegging the block size at
2K or 4K would be better; disk is real cheap and memory is real
expensive (particularly if you run into motherboard maxima :-)
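
To put numbers on that (taking 36G as 36 * 10^9 bytes and the 52 bytes
of metadata as given): 36e9 / 512 is indeed about 70M slots, but a
512-byte block then leaves only 460 bytes of payload; at 2K blocks the
same disk holds about 17.6M objects and at 4K about 8.8M, which is still
far more than current Squid memory usage lets us track comfortably,
while the per-slot RAM cost here can stay near zero.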

> What seems attractive to me in this approach, is that we don't
> need to bother about replacement overhead, lookup overhead,
> FS overhead, memory overhead, startup overhead, (even dirty
> startups, as most disk ops are atomic). We can consider disks
> just a bunch of spindles, cheap and massive compared to anything
> else in a system.
>
> If all this sounds disgusting, beat me ;)

No, it sounds good. I think we can't get away from traditional caching for
larger objects, simply because of the same economics in both bandwidth
and complexity. But the tremendous overhead of real small objects would
be reduced significantly, at little expense in complexity. And, the
space expense is known in advance.

Cheers,

                                -- Bert

Bert Driehuis -- driehuis@playbeing.org -- +31-20-3116119
If the only tool you've got is an axe, every problem looks like fun!