Re: [RFC] cache architecture

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Tue, 24 Jan 2012 12:21:07 -0700

On 01/24/2012 01:51 AM, Amos Jeffries wrote:
> On 24/01/2012 6:16 p.m., Alex Rousskov wrote:
>> On 01/23/2012 07:24 PM, Amos Jeffries wrote:
>>> Ideal Architecture;
>>>
>>> Squid starts with assumption of not caching.

>> I believe you wanted to say something like "Squid starts serving request
>> possibly before Squid loads some or all of the cache contents, if any".
>> Caching includes storing, loading, and serving hits. An ideal
>> architecture would not preclude storing and serving even if nothing was
>> loaded from disks yet. I believe you already document that below, but
>> the above sentence looks confusing/contradicting to me.

> I means a bit more extreme than that. Squid being prepared to serve
> requests before possibly even getting to the async call which
> initializes the first cache area.

Yes, that matches my understanding.

> We often think of cache_mem as being always present when any caching is
> done, but there really is no such guarantee.

We had to think that way before Rock because the intransit space and
cache space were the same thing. I agree that we should not, ideally,
assume that any cache (memory and/or disk) is present at any given time.
This helps with startup, shutdown, and when dealing with disk failures.

> Admin can already configure
> several cache_dir and "cache_mem 0". The problem is just that todays
> Squid do some horrible things when configured that way as side effects
> of our current design assumption.

The assumption is no longer there, but there is still code that uses it.
This is one of the reasons why I consider finishing deleting store_table
very important.

>>> 3) possibly multiple cache_mem. A traditional non-shared cache_mem, a
>>> shared memory space, and an in-transit unstructured space.
>> In-transit space is not a cache so we should not mix it and cache_mem in
>> an "ideal design" blueprint. Collapsed forwarding requires caching and
>> has to go through cache_mem, not in-transit space.
>
> So proposals for collapsables which are too large for cache_mem?

If I understand what collapsable requests are, cache admission,
allocation, or eviction policy can treat collapsables specially. This is
a lower-level issue, IMO, than the blueprint you are discussing though.

> or when "cache_mem 0"?

Collapsables can go through disk. If disk caching is also disabled, we
cannot collapse requests because the admin told us not to cache at all.
If the issue is important for some, it would be possible to implement a
caching policy that reserves space for collapsables and caches nothing
else, of course.

>> Please keep in mind that any non-shared cache would
>> violate HTTP in an SMP case.
>
> You have yet to convince me that the behavious *is* a violation. Yes the
> objects coming back are not identical to the pattern of a traditional
> Squid. But the new pattern is still within HTTP semantics IMO, in the
> same way that two proxies on anycast dont violate HTTP. The cases
> presented so far have been about side effects of already bad behaviour
> getting worse, or bad testing assumptions.

If "MUST purge" is just a MAY because things will usually work even if
the cache does not purge, then this is a question for HTTP WG because we
would not be able to reach an agreement on this issue here; IMHO, a
burden of proof that ignoring MUST is OK should be on those who want to
ignore it. I do not see a point in those MUSTs if we can violate them at
will by deploying a proxy with two out-of-sync caches.

>>> 8) cache_dir scan should account for externally added files. Regardless
>>> of CLEAN/DIRTY algorithm being used.
>>> by this I mean check for and handle (accept or erase) cache_dir
>>> entries not accounted for by the swap.state or equivalent meta data.
>>> + allows reporting what action was taken about the extra files. Be it
>>> erase or import and any related errors.
>> I think this should be left to individual Stores. Each may have their
>> own way of adding entries. For example, with Rock Store, you can add
>> entries even at runtime, but you need to update the shared maps
>> appropriately.
>
> How does rock recover from a third-party insertion of a record at the
> correct place in the backing DB followed by a shutdown?
> erase the slot? overwrite with something later? load the object details
> during restart and use it?

Since there is no swap.state, it is the latter: load the object details
during restart and use it.

> For now it is perfectly possible to inject entries into UFS and COSS (at
> least) provided one knows the storage structure and is willing to cope
> with a DIRTY restart.

For UFS, you need a DIRTY DIRTY restart (no swap.state at all) if you
only add a file. You do not need a DIRTY restart if you also update
swap.state* logs, of course.

For COSS, you may not need a DIRTY restart because, *IIRC*, Squid COSS
does not use swap.state although it does create it.

For Rock, I think you may safely import files without a restart.

Cheers,

Alex.
Received on Tue Jan 24 2012 - 19:21:49 MST

This archive was generated by hypermail 2.2.0 : Wed Jan 25 2012 - 12:00:11 MST