Re: [PATCH] support parameters for no-cache and private

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Mon, 18 Feb 2013 23:58:20 -0700

On 02/18/2013 08:20 PM, Amos Jeffries wrote:

> The HTTP semantic meaning of these three cases is very different between
> themselves but the same syntax<->meaning pattern for both private and
> no-cache.
>
> CC:private ... means *whole* response is private, MUST NOT cache.
>
> CC:private="foo" ... means the specific listed *headers* are private,
> response itself MAY be cached.

Agreed. And until Squid learns to delete private headers, the whole
response is private in both cases (otherwise we will be "leaking"
private headers and violating HTTP).

> CC:private=^%%$ ... (or other breakage) means invalid options and SHOULD
> be ignored.

Debatable. If the breakage is in our code, then we should not ignore
CC:private. If the breakage is in senders code, the correct action
cannot be determined, so it is best to be on the safe side and assume
private. Thus, in both cases, we should act as if CC:private was received.

Yes, RFC 2616 talks about ignoring CC directives, but only in the
context of _extensions_ (Section 14.9.6), not in the context of a
standard directive that we recognize but somehow cannot fully parse.

> CC:private, private="foo" ... is treated in HTTP as an explicit hack by
> the sender - such that caches coping with private="foo" will obey
> CC:private="foo" and older 1.0 caches will do "CC:private" handling.

Is that an HTTPbis change? AFAICT, RFC 2616 says CC:private applies to
the whole response and does not have any language (that I can find)
which would imply that CC:private="foo" overwrites that CC:private
meaning. I am looking at the following RFC 2616 text:

> When a directive appears without any 1#field-name parameter, the
> directive applies to the entire request or response. When such a
> directive appears with a 1#field-name parameter, it applies only to
> the named field or fields, and not to the rest of the request or
> response.

The text is sloppy, but I think the above considers two cases (just
CC:private and just CC:private="foo"), without explicitly covering the
combination case. My interpretation of the combination case does not
contradict the above rules. If overwriting was the intent in the
combination case, it should have been documented explicitly. Does
HTTPbis do a better job at documenting this?

Note that the above applies to CC:no-cache as well.

There is another RFC 2616 paragraph that shows how an _extension_
directive can be combined with CC:private to alter CC:private meaning
for the recipients in-the-know, but that example is different from what
we are discussing here because RFC never says that CC:private="foo"
alters another CC:private meaning.

I wonder whether RFC 2616 authors even considered a
parameter+parameterless combination for the same CC directive.

> This patch is supposed to pull the parameters into a string for later
> re-use and treat it as CC:private (MUST NOT cache).

I think the patch does the above for CC:private (except for corner cases
discussed elsewhere) but not for CC:no-cache. Here are a few places
where CC parameters change the behavior in various ways even though we
do not support them yet:

> - } else if (rep->cache_control->noCache() && ...
> + } else if (rep->cache_control->hasNoCache() && !rep->cache_control->noCache().defined() && ...

> - rep->cache_control->noCache() ||
> + const bool CcNoCacheNoParams = (rep->cache_control->hasNoCache() && rep->cache_control->noCache().undefined());

>> Also, I think the code will mishandle this case:
>>
>> Foo: 1
>> Cache-Control: private
>> Cache-Control: private="bar"
>> Bar: 2
>>
>> because it will treat the above as the following less restrictive case:
>>
>> Foo: 1
>> Cache-Control: private="bar"
>> Bar: 2
>>
>> instead of treating it as
>>
>> Foo: 1
>> Cache-Control: private
>> Bar: 2
>>
>> Does HTTP say that [later] less restrictive CC directives overwrite
>> [earlier] more restrictive ones?
>
> Yes. These two directives directly contradict each other.

I do not think they do. One is simply "larger" than the other. Nowhere
HTTP says that CC:private="foo" means that the request can be cached [no
matter what other headers it may have] so there is no contradiction,
just duplication of information.

>>> // NP: request CC:no-cache only means cache READ is
>>> forbidden. STORE is permitted.
>>> + if (rep->cache_control->hasNoCache() &&
>>> rep->cache_control->noCache().defined()) {
>>> + /* TODO: we are allowed to cache when no-cache= has
>>> parameters.
>>> + * Provided we strip away any of the listed headers
>>> unless they are revalidated
>>> + * successfully (ie, must revalidate AND these headers
>>> are prohibited on stale replies).
>>> + * That is a bit tricky for squid right now so we avoid
>>> caching entirely.
>>> + */
>>> + debugs(22, 3, HERE << "NO because server reply
>>> Cache-Control:no-cache has parameters");
>>> + return 0;
>>> + }
>> The new comment and behavior seems inconsistent, especially in light of
>> the old "NP" comment that is left intact: It seems like we should store
>> (or not) regardless of the header names presence. Why does presence of
>> header names make a difference in this case?
>
> The above matches the RFC no-cache definition...
>
> Response CC:no-cache="foo" requires header manipulation Squid cannot do
> yet. ==> Do not cache.

CC:no-cache:
    Requires revalidation. OK to STORE.

CC:no-cache="foo":
    Requires revalidation if foo is not stripped. OK to STORE.

Squid does not strip. Thus, the requirements for CC:no-cache and
CC:no-cache="foo" are identical in the current or patched Squid context.
Yet, the patched code will start to treat the two cases differently. Why?

> You are missing the meaning. Their presence alters the entire no-cache
> from being scoped to the whole response to be scoped to those headers
> alone.

I understand that part. Yes, the scope becomes narrower if we wish to
optimize, but until we have code that actually performs that
optimization, the scope should not become narrower. Yet, the patch seems
to assume that Squid is already optimizing and narrows the scope.

> Handling them properly involves a lot of tricky if-exists
> post-processing on the old reply and revalidation response [...]

Agreed.

> It is far easier to simply
> treat no-cache="foo" as a no-store option and make this a "Precondition
> failed" as Co-Advisor says instead of "HTTP violation".

It is even easier to simply treat no-cache="foo" as no-cache. Doing so
would not violate HTTP and will be sometimes better than not storing the
response at all, right?

> RFC specification has an "unless" clause on both these response
> directives. This whole patch is about making that "unless" case become
> non-cacheable again. As Dmitry pointed out it was broken by the earlier
> no-cache storge update.
> For the no-cache option we are reverting that behaviour change in the
> specific case of parameters being present.

OK, that may explain some of the confusion: The proposed patch tries to
fix some bug, so the old code was wrong and we should not preserve it.
This should be explicitly documented in the commit message IMO. I was
reviewing the patch under the assumption that the code it changes was
correct since nothing in the patch description indicated that this is a
bug fix.

However, why do we restrict the fix to that specific case? If Squid
revalidates, it should be OK to serve headers specified in no-cache. And
if Squid does not revalidate, then it should not be OK to serve the
response from cache. In both cases, the parameters seem to be irrelevant.

> For the private option we are storing the parameters in a String, but
> otherwise not changing the current behaviour.

Yes, except where we cannot parse the parameter.

Thank you,

Alex.
Received on Tue Feb 19 2013 - 06:58:34 MST

This archive was generated by hypermail 2.2.0 : Wed Feb 20 2013 - 12:00:06 MST