Re: What is The logic of Vary Headers cachiness?

From: Henrik Nordström <henrik_at_henriknordstrom.net>
Date: Fri, 26 Jul 2013 12:48:55 +0200

On Thu 2013-07-25 at 14:20 -0600, Alex Rousskov wrote:

> > The variant of a URL is identified by its ETag value.
>
> Are ETags required by HTTP in a Vary context? Or is it just what you see
> some origin servers implementing?

To identify the variant, yes. Without an ETag the variant has no identity.
Variants with no identity cannot be shared among different request
header combinations (unless Key is implemented as well).

> > Caches use If-None-Match on unknown requests to ask which variant among
> > the known set of cached variants is the right response for this request.
> > The result of an If-None-Match is a 304 listing the ETag of the variant
> > to respond with + updated response headers.
> >
> > What you end up with is a map of
> >
> > * Request headers as indicated by Vary
> > * Which variant (etag value) is the right response for the matching
> > request.
> > * Possibly updated headers for the response (Date, Last-Modified,
> > Set-Cookie, X-Whatever) carried in a 304 response to If-None-Match.
>
> Can you express the above in an n:m map form you mentioned earlier? What
> maps to what?

In the ideal world it is an N:1 map where multiple requests map to the
same variant.
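For illustration, here is a minimal sketch of the If-None-Match step that builds up this N:1 map. It assumes the cache already holds a list of known ETags for the URL and that response headers arrive in a dict keyed exactly as "ETag"; the function names are hypothetical, and the exact string match ignores the weak-comparison rules HTTP applies to If-None-Match.

```python
from typing import Mapping, Optional


def build_if_none_match(known_etags: list) -> str:
    """Combine the ETags of all cached variants into one If-None-Match value."""
    return ", ".join(known_etags)


def select_variant(status: int, headers: Mapping[str, str],
                   known_etags: list) -> Optional[str]:
    """Return the ETag of the cached variant the origin picked, or None.

    A 304 names the chosen variant in its ETag header. Any other status
    (e.g. a 200 with a body) means none of the known variants applies.
    """
    if status != 304:
        return None
    # Assumes header names are already normalized to "ETag" (real HTTP
    # header lookup is case-insensitive) and compares ETags as exact strings.
    etag = headers.get("ETag")
    # Only accept an ETag we actually hold; otherwise treat it as a miss.
    return etag if etag in known_etags else None
```

After a successful 304, the cache would remember that this request-header combination maps to the returned ETag; repeated for different requests, that is how many requests come to share one variant.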

Basic mechanism:

Vary-indicated request headers map to an ETag.

The ETag maps to (or is) the variant identifier.

If there is no ETag, the request headers map to the variant directly.
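A minimal sketch of that basic mechanism, assuming one index per URL and a single Vary header list; the class and the tuple-shaped variant identifiers are hypothetical. With an ETag, several request keys can point at the same variant; without one, the request key itself becomes the identity, so nothing is shared.

```python
from typing import Any, Dict, Mapping, Optional, Tuple

# A variant is identified by ("etag", <etag>) when the response had an ETag,
# or by ("request", <request key>) when it did not.
VariantId = Tuple[str, Any]


class VaryIndex:
    """Hypothetical per-URL index from Vary-indicated request headers to variants."""

    def __init__(self, vary: list):
        # Header names from the response's Vary header, lower-cased.
        self.vary = [h.strip().lower() for h in vary]
        self.variants: Dict[tuple, VariantId] = {}

    def request_key(self, request_headers: Mapping[str, str]) -> tuple:
        lowered = {k.lower(): v for k, v in request_headers.items()}
        # An absent header becomes None, so "not sent" is a distinct value.
        return tuple(lowered.get(h) for h in self.vary)

    def record(self, request_headers: Mapping[str, str],
               etag: Optional[str] = None) -> VariantId:
        key = self.request_key(request_headers)
        if etag is not None:
            variant: VariantId = ("etag", etag)  # shareable across keys (N:1)
        else:
            variant = ("request", key)  # no identity beyond this key
        self.variants[key] = variant
        return variant

    def lookup(self, request_headers: Mapping[str, str]) -> Optional[VariantId]:
        return self.variants.get(self.request_key(request_headers))
```

For example, recording `{"Accept-Encoding": "gzip"}` and `{"Accept-Encoding": "gzip, deflate"}` with the same ETag leaves two keys pointing at one variant.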

The list of known responses MAY have different sets of Vary headers,
making the same request match multiple entries.
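A small sketch of how that happens, assuming each cached entry keeps its own Vary list and the request header values it was stored under; the `Entry` type and variant names are hypothetical. Because every entry is checked only against its own Vary headers, a request can satisfy several entries at once.

```python
from typing import List, Mapping, NamedTuple


class Entry(NamedTuple):
    vary: List[str]               # header names from this response's Vary
    stored: Mapping[str, str]     # lower-cased request header values seen
    variant: str                  # hypothetical variant identifier


def matching_entries(entries: List[Entry],
                     request_headers: Mapping[str, str]) -> List[str]:
    """Return every cached variant whose own Vary headers match the request."""
    lowered = {k.lower(): v for k, v in request_headers.items()}
    matches = []
    for entry in entries:
        # Each entry is checked against its own Vary list, so entries with
        # different Vary sets can all match the same request.
        if all(lowered.get(h.lower()) == entry.stored.get(h.lower())
               for h in entry.vary):
            matches.append(entry.variant)
    return matches
```

With one entry varying on Accept-Encoding only and another on Accept-Encoding plus User-Agent, a request that agrees with both returns both variants, and the cache still has to pick one.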

In a distributed cache you may also run into multiple, differently aged
entries with different outcomes.

In squid-2 there is a special-case extension of this to work around
mod_deflate and a couple of others mismanaging ETag. When this gets
triggered, the variant identifier is extended with Accept-Encoding
related details to separate gzip from non-gzip variants even when they
both have the same ETag.
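A rough sketch of the idea, not squid-2's actual code: it assumes the "Accept-Encoding related details" can be reduced to the response's content coding, and `etag_untrusted` is a hypothetical flag standing in for whatever condition triggers the workaround.

```python
from typing import Optional


def variant_id(etag: str, content_encoding: Optional[str],
               etag_untrusted: bool) -> str:
    """Build a variant identifier, optionally split by content coding.

    etag_untrusted is a hypothetical flag; when it is set, gzip and
    non-gzip responses that share an ETag still get distinct identifiers.
    """
    if not etag_untrusted:
        return etag
    # A missing Content-Encoding is treated as the identity coding.
    coding = (content_encoding or "identity").strip().lower()
    return f"{etag};coding={coding}"
```

With the flag set, `variant_id('"x"', "gzip", True)` and `variant_id('"x"', None, True)` differ even though the origin sent the same ETag for both.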

> I suspect your n:m map is a side effect of adding ETag optimization to
> "straight" Vary support. In the straight implementation, there is no
> If-None-Match optimization on top, just a hit (if Vary-controlled
> headers match one of the fully cached responses) or a miss (if they do not).

It's sufficient to think of it as an N:1 map. But if you do a
trivial list-based implementation with fall-through to the next entry on
a store miss, then it becomes N:M by implementation.
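A minimal sketch of that trivial list-based implementation, assuming entries are (request key, ETag) pairs kept in insertion order and the object store is a dict keyed by ETag; all names are hypothetical. Because an older and a newer entry can share a request key, one key can reach more than one variant.

```python
from typing import Dict, List, Optional, Tuple


def list_lookup(entries: List[Tuple[tuple, str]], request_key: tuple,
                store: Dict[str, bytes]) -> Optional[bytes]:
    """Walk (request key, ETag) entries in order, skipping store misses.

    Several entries may share a request key (e.g. an older and a newer
    ETag), so one key can reach several variants: N:M by implementation.
    """
    for key, etag in entries:
        if key != request_key:
            continue
        body = store.get(etag)
        if body is not None:
            return body
        # Store miss: the object for this ETag is gone, try the next entry.
    return None
```

If the object for the first matching ETag has been evicted, the lookup silently falls through to the next entry with the same key and serves that variant instead.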

> I also do not see why updated response headers belong to x-vary-object.
> Would not Squid receive them in the origin server response or load them
> from the cache? Why store them in x-vary-object?

I am not saying that they go into the X-Vary object. But they are part
of the process and need to be included in the response.

A simple and scalable implementation that would limit the X-Vary churn
is to cache the 304 responses separately, keyed on the request headers.
This gives a store with 3 kinds of objects:

key : object

Base URL : X-Vary, containing Vary and a list of known ETag values.

URL + Vary-indicated request headers : 304 response with updated
headers + ETag.

URL + ETag : actual object.
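A minimal sketch of a lookup against this three-object store. The tuple key layout and the dict shapes of the X-Vary and 304 objects are assumptions made for illustration, not an existing Squid API.

```python
from typing import Any, Dict, Mapping, Optional, Tuple

# Hypothetical key layout for the three kinds of objects:
#   ("xvary", url)                -> {"vary": [...], "etags": [...]}
#   ("304", url, request_key)     -> {"etag": ..., "headers": {...}}
#   ("object", url, etag)         -> response body
Store = Dict[tuple, Any]


def request_key(vary, request_headers: Mapping[str, str]) -> tuple:
    lowered = {k.lower(): v for k, v in request_headers.items()}
    return tuple(lowered.get(h.lower()) for h in vary)


def lookup(store: Store, url: str,
           request_headers: Mapping[str, str]) -> Optional[Tuple[dict, bytes]]:
    xvary = store.get(("xvary", url))
    if xvary is None:
        return None  # nothing cached for this URL
    key = request_key(xvary["vary"], request_headers)
    not_modified = store.get(("304", url, key))
    if not_modified is None:
        # Unknown header combination: revalidate with If-None-Match using
        # xvary["etags"], then store the resulting 304 under this key.
        return None
    body = store.get(("object", url, not_modified["etag"]))
    if body is None:
        return None  # variant body evicted: refetch
    return not_modified["headers"], body
```

In this layout a new request-header combination only adds a "304" entry; the X-Vary object changes only when a new ETag shows up, which is where the reduced churn comes from.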

We might also use a similar design to handle header updates in general
until the store can handle them.

The "304 object" should probably carry a full set of headers, which
makes lookup of the full object optional in conditional requests and
HEAD.
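A small sketch of why a full header set in the "304 object" helps: HEAD and a matching conditional GET can be answered without touching the body. The key layout is hypothetical, the HEAD branch assumes the cached variant was a 200 response, and ETags are compared as exact strings rather than with HTTP's weak comparison.

```python
from typing import Any, Collection, Dict, Optional, Tuple

# Hypothetical layout: ("304", url, request_key) -> {"etag": ..., "headers": {...}},
# where "headers" is the full response header set, not only the updated ones.


def answer_without_body(store: Dict[tuple, Any], url: str, request_key: tuple,
                        method: str, client_etags: Collection[str] = ()
                        ) -> Optional[Tuple[int, dict]]:
    """Answer HEAD or a matching conditional GET from the 304 object alone."""
    entry = store.get(("304", url, request_key))
    if entry is None:
        return None
    if method == "HEAD":
        # Assumes the cached variant was a 200 response.
        return 200, entry["headers"]
    if entry["etag"] in client_etags:
        # Client already has this variant: 304 with the stored headers.
        return 304, entry["headers"]
    return None  # a full body is needed, so the object must be loaded
```

Only when neither case applies does the cache need to fetch the "URL + ETag" object itself.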

Regards
Henrik
Received on Fri Jul 26 2013 - 10:49:48 MDT
