Re: [PATCH] reply_from_cache and reply_to_cache

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Tue, 15 Oct 2013 09:36:39 -0600

On 10/14/2013 10:42 PM, Amos Jeffries wrote:
> On 15/10/2013 5:09 p.m., Alex Rousskov wrote:
>> I forgot to mention that we can also try to do here what we did for
>> ssl_bump. That is, enlarging the set of actions from the default
>> allow/deny to allow/deny/ignore-miss/ignore-hit/store-miss/send-hit:
>>
>> cache deny foo # same as cache deny foo
>> cache send-hit baz # same as reply_from_cache allow baz
>> cache ignore-miss bar # same as reply_to_cache deny bar
>> ...
>>
>> I think that might be better than "store-miss allow" and friends because
>> this scheme follows the traditional "first matching rule wins" approach,
>> but I am not sure it is better than reply_to/from_cache.

> It does not solve the issue of using reply details in the ACLs though.

It can. Actions that need reply details can be ignored until those
details are available (that is what we tried with Peek and Splice as I
ranted earlier). Another solution (within the same overall approach) is
to add a ReplyAvailable ACL that can be used to protect rules with
reply-only ACLs.
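A minimal sketch of the ssl_bump-style idea above, written in Python for brevity. Everything in it is hypothetical: Transaction, Rule, first_match(), the rule list and the action sets are illustrations, not Squid code. Rules are checked in order and the first match wins. A rule whose action does not apply at the current decision point is skipped until a later point. reply_available() stands in for the proposed ReplyAvailable ACL that guards a reply-only check.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Set


@dataclass
class Transaction:
    """Hypothetical per-request state; not Squid's real data structures."""
    url: str
    reply_status: Optional[int] = None  # unknown until a reply or hit is loaded


def reply_available(txn: Transaction) -> bool:
    # Stand-in for the proposed ReplyAvailable ACL.
    return txn.reply_status is not None


@dataclass
class Rule:
    action: str                         # e.g. "deny", "send-hit", "ignore-hit"
    acl: Callable[[Transaction], bool]  # all ACLs on the rule line, ANDed


def first_match(rules: Iterable[Rule], txn: Transaction,
                actions_here: Set[str]) -> Optional[str]:
    """First matching rule wins, but rules whose action does not apply
    at the current decision point are ignored (deferred to a later point)."""
    for rule in rules:
        if rule.action not in actions_here:
            continue
        if rule.acl(txn):
            return rule.action
    return None


# Hypothetical rule list; ACLs are expressed as Python predicates.
RULES = [
    Rule("deny", lambda t: t.url.endswith(".iso")),
    # Reply-only check, protected by the ReplyAvailable stand-in.
    Rule("ignore-hit", lambda t: reply_available(t) and t.reply_status == 302),
    Rule("send-hit", lambda t: True),
]

BEFORE_LOOKUP = {"deny"}             # decision point before hit/miss is known
ON_HIT = {"ignore-hit", "send-hit"}  # decision point once a hit is loaded
```

With these rules, a request for an .iso URL is denied before any lookup. Any other request passes that point. A loaded 302 hit is then refused through ignore-hit, and other hits are sent.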

>> Current decision points are:
>>
>> * before hit/miss is detected (the current cache directive)
>> * when a hit is detected (proposed reply_from_cache)
>> * when a miss is being received (proposed reply_to_cache)
>
> How do you see 1 and 2 on that list being different?

I am looking at the code, and they are different there. The existing "cache"
directive is processed _before_ we even try to check whether there is a
hit. In fact, it prevents us from looking up the cache contents at all. That
is point 1. Point 2 happens much later, after we have started loading [what
could become] a hit from the cache.
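To make the ordering concrete, here is a rough sketch of where the three decision points would sit in request processing. It assumes a plain dict as the cache and caller-supplied check functions. handle_request() and its parameters are hypothetical names, not Squid's API. As with the existing "cache deny", a denial at point 1 skips both the lookup and storing the reply.

```python
from typing import Callable, Dict, Tuple


def handle_request(url: str,
                   cache: Dict[str, dict],
                   fetch: Callable[[str], dict],
                   cache_check: Callable[[str], bool],
                   reply_from_cache: Callable[[dict], bool],
                   reply_to_cache: Callable[[dict], bool]) -> Tuple[str, dict]:
    """Return ("hit", reply) or ("miss", reply)."""
    # Point 1 (existing "cache" directive): only request details are known.
    # A denial here means we never look at the cache contents at all.
    use_store = cache_check(url)
    if use_store:
        cached = cache.get(url)
        # Point 2 (proposed reply_from_cache): a hit candidate is loaded,
        # so its status and headers can be inspected before serving it.
        if cached is not None and reply_from_cache(cached):
            return "hit", cached
    # Everything else needs the server (the patch counts refreshes as
    # misses too; refreshes are not modeled separately here).
    reply = fetch(url)
    # Point 3 (proposed reply_to_cache): decide whether to store the reply.
    if use_store and reply_to_cache(reply):
        cache[url] = reply
    return "miss", reply
```

The sketch shows why points 1 and 2 cannot be merged. Point 1 runs before any cache I/O and can avoid it entirely. Point 2 can see reply details, but only after the lookup has already been paid for.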

> The old cache directive makes no decision about whether the stores are
> involved or not. It just determines between converting a HIT into a MISS

No. It prevents a HIT from happening before it knows whether there is a
HIT. That is a big difference from both the semantics/ACL and the
performance points of view. We can make a backwards-incompatible change and
essentially remove the "cache" directive functionality from Squid, but
that is not what the proposed patch does.

I am happy to discuss whether backwards-incompatible changes would be
better in this case. I do not know the answer to that question, but
before we even start discussing backwards-incompatible changes, I would
like to make sure that you actually want [to consider] them.

> and (wrongly) causes invalidation of any stored content.

which is another difference from the proposed directives: they do not
invalidate stored content (note to self: fix that aspect in the updated
descriptions; I suspect some of my invalidation tests were flawed!).

> We need to change that decision point to being a decision whether store
> is not-involved or is-involved.

Why do you think we need to make this backwards-incompatible change? I
am not trying to argue against that change right now, just asking why
you think it should be done.

> * If the store is not involved, no HIT is possible, but invalidation
> and revalidation also do not take place on already stored content.
> * if the store is involved, it may HIT, revalidate or invalidate stored
> content

> The 3rd decision point is the only completely new one here: it would
> make a local store behave as if CC:no-store was received from the server.
> * If store writing is denied, CC:no-store makes no statement about
> existing content (invalidation does not have to happen).
> * If store writing is allowed, then existing content gets
> invalidated/revalidated as per normal HTTP requirements.
>
> NP: we have been talking in terms of HIT/MISS so far, but for the MISS
> checks we also need to consider REFRESH/revalidate backend requests.
> ** In the event that Squid is performing a REFRESH to the server, do we
> want the store-write denial case to prevent updating of the cached
> content, or to treat that somehow differently?

In the proposed patch, "miss" is interpreted as "contact with the server
is needed", which includes refresh requests. Reply_to_cache will affect
refresh requests like any other miss request. I think that is the best
default (because it is simple and clear). In the future, we can add ACLs
to detect refresh transactions so that admins can exclude them from
reply_to_cache scope as needed.

> Overall I am inclined to scope these ACL access checks in terms of
> read/write access to the store rather than HIT/MISS on stored contents.

The proposed patch uses the same approach. When describing the options,
it uses "serve from cache" and "stored in cache" terminology rather than
"hit" or "miss" terminology. The examples and hints use "hit" in a few
places. Perhaps those can be polished further to minimize questions
about "refresh hits".

> Doing so makes the criteria much more simple:
> * First decision point is simply whether to involve store lookups
> yes/no. This can only be made on request details (current "cache"
> decision point with new semantics)
> * Second decision point being whether to write any new information found
> (regardless of MISS/REFRESH states) back to cache. This can be made
> after receiving reply.

The first decision point is insufficient, as Store ID loops illustrate:
you want to prevent already cached 302s from being served from the
cache. You cannot do that at decision point 1. You need my decision
point 2 (missing from your list above; your decision point 2 is my
decision point 3).
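Here is a hypothetical sketch of the Store ID case. It assumes a Store ID helper that maps both the original URL and its redirect target to one Store ID. store_id(), serve() and no_cached_redirects() are made-up names for illustration. A request-only rule at point 1 could avoid the loop only by refusing lookups for every URL with that Store ID, which also throws away legitimate 200 hits. A check at point 2 sees the loaded reply and can reject just the cached 302.

```python
from typing import Callable, Dict, Optional


def store_id(url: str) -> str:
    # Hypothetical normalization: every URL for video 123 shares one Store ID,
    # including the target of a redirect that was itself cached under it.
    return "video-123" if "/video/123" in url else url


def serve(url: str, cache: Dict[str, dict],
          hit_allowed: Callable[[dict], bool]) -> Optional[dict]:
    """Return the cached reply if it may be served, else None (a miss)."""
    cached = cache.get(store_id(url))
    if cached is not None and hit_allowed(cached):
        return cached
    return None


def no_cached_redirects(reply: dict) -> bool:
    # Decision point 2: the status is known only once the hit is loaded.
    return reply["status"] != 302
```

Without the point-2 check, a request for the redirect target is answered with the same cached 302, so the client loops. With no_cached_redirects(), that request becomes a miss, and a later 200 stored under the same Store ID is still served from the cache.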

Going forward, I think we need to decide:

A) Whether altering the existing "cache" directive semantics is
desirable. If it is a good idea, we can remove or deprecate that option
and ignore its [end-of-life] existence when deciding how to structure
the new directives.

B) Whether we should keep decision point 1 (an action taken before the
hit/miss determination is made and before any response is partially loaded).

I am not sure about (A), but I suspect we may want to keep the "fast"
decision point in (B), if for no other reason than the performance of
simple cases.

Cheers,

Alex.
Received on Tue Oct 15 2013 - 15:36:47 MDT
