Re: vary stuff from Henrik Nordstrom on 2007-02-12 (squid-dev)

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Mon, 12 Feb 2007 22:03:06 +0100

On Mon 2007-02-12 at 20:46 +0800, Adrian Chadd wrote:
> I'm about to start implementing replacement memory-only store client
> primitives and I'm not fully on top of how the vary code abuses
> store objects to do its thing in store.c.

Heh, "abuses" is a good description.

It actually doesn't do very much with the store objects as such; most of
the magic currently takes place in the request and is used by the
storeLookupByRequestMethod call.

> Would you mind if the Vary support was culled out of the storage work
> branch until I've tidied up the storage manager layer somewhat?

No problem. It's not really that tricky to support. The tricky part was
getting it into Squid-2 without a suitable store interface or even an
intermediary layer.

The things you need to remember about Vary:ing objects and HTTP caching
in general:

0. The caching specifications in HTTP are primarily concerned with GET
requests resulting in 200 OK or derived responses (i.e. 206/304), and
with variants of that 200 OK: N variants per URI on the server, each
uniquely identified by ETag and/or Content-Location. There are some odd
twists, like POST, which may return a cachable 200 OK suitable for later
GET requests of the same URI (I doubt this is used anywhere, btw).

1. The client->intermediary "lookup" API needs to be async to be able
to do the vary dance. It may need multiple store lookups and possibly a
conditional upstream request to find the correct response.

2. In the optimal world each variant has a unique ETag identifying the
response entity (body + entity headers). Such objects may be shared by
multiple requests thanks to If-None-Match 304 replies building up
knowledge of the Vary logic in the cache. Responses without an ETag are
identified by the request headers selected by Vary and are "unique" for
that request header combination.

3. The vary dance has two different but related results:

a) On a cache "hit" (matching request found), the result is the
matching response entity (headers + body), based on previously seen
request headers and Vary responses and the object (ETag or unique) they
map to.

b) On a cache miss, where no matching request headers + Vary response
header pair is found, one needs to find the list of ETags of the
currently cached variants of the URI (fresh and expired alike). This
list is used to build an If-None-Match conditional request to find out
which cached variant (if any) is valid for this request.

A twist here is that many server implementers of (mainly dynamic) gzip
content-encoding, which really, really should be done as
transfer-encoding, don't understand HTTP that well and mess up ETag and
Content-Location. Because of this we need a blacklist where ETag alone
isn't trusted but must be combined with the Accept-Encoding request
header to identify the variants of the URI. The Content-Location problem
will bite us the day we start to follow the RFC and correctly invalidate
variants on changes, and I have not yet identified whether a similar
workaround is possible.
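The blacklist workaround can be sketched like this. It is a hypothetical illustration only: `ETAG_BLACKLIST`, the host name in it, and `variant_id` are invented names, the blacklist is assumed to be keyed by host, and header names are assumed to be lower-case. For a blacklisted host the variant identity becomes the pair (ETag, Accept-Encoding) instead of the ETag alone.

```python
from urllib.parse import urlsplit

# Hypothetical blacklist: hosts known to send the same ETag for gzip and
# identity encodings of the same URI.
ETAG_BLACKLIST = {"broken.example.com"}


def variant_id(url, response_headers, request_headers, blacklist=ETAG_BLACKLIST):
    """Identify a cached variant of url (header names assumed lower-case)."""
    etag = response_headers.get("etag")
    if etag is None:
        # Caller falls back to the Vary-selected request headers.
        return None
    if urlsplit(url).hostname in blacklist:
        # ETag alone is not trusted here; qualify it with Accept-Encoding.
        return (etag, request_headers.get("accept-encoding", ""))
    return etag
```

With this, a gzip and an identity response from a blacklisted host that both carry `"x"` end up as two separate variants, while well-behaved hosts keep the plain ETag as identity.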

Some words on ETag vs Content-Location:

This whole dance is based on the "server-driven content negotiation"
scheme, thought of as a server having multiple variants of the same
object, differing in format (e.g. gif/jpeg/png), language (e.g.
sv/en/de) or encoding (e.g. identity/gzip/deflate), each stored as a
separate file in the HTTP directory of the server and each separately
accessible by its own URI.

Content-Location defines the exact origin of the response. ETag
identifies the exact version of the response.

An ETag is guaranteed to be unique across all variants (and, for a
strong ETag, across all versions of the URI), so the protocol focuses on
ETag when mapping relations between requests and responses.

Content-Location is mainly used in invalidations to make sure all users
get the most recently seen version of a variant.
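A minimal sketch of the Content-Location-based invalidation described above, assuming a hypothetical per-URI store (`VariantStore`, an invented name) that remembers each cached variant's Content-Location by ETag. When a response with a new ETag arrives for a Content-Location we already hold, the older variants from that location are dropped so everyone gets the most recently seen version; responses without Content-Location invalidate nothing. This is an illustration, not Squid's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class VariantStore:
    """Hypothetical per-URI store: ETag -> Content-Location (or None)."""
    by_etag: dict = field(default_factory=dict)

    def store(self, etag, content_location):
        """Store a new response; return the ETags invalidated by it."""
        stale = []
        if content_location is not None:
            # Same origin (Content-Location), different version (ETag): stale.
            stale = [e for e, loc in self.by_etag.items()
                     if loc == content_location and e != etag]
        for e in stale:
            del self.by_etag[e]
        self.by_etag[etag] = content_location
        return stale
```

Storing `"v2-en"` from `/doc.en.html` would evict `"v1-en"` from the same location while leaving the Swedish variant from `/doc.sv.html` untouched. The ETag still identifies the variant; Content-Location only decides what a newer version replaces.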

There are still some small details regarding freshness of Vary:ing
objects where I have not fully understood how it's supposed to work. In
the worst case we may need to maintain freshness separately per request
header combination.

Regards
Henrik


Received on Mon Feb 12 2007 - 14:03:17 MST
