Re: Content-Encoding and storage format

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Thu, 26 Feb 2004 10:23:19 +0100 (CET)

On Thu, 26 Feb 2004, Jon Kay wrote:

> So far, there is one difficult and thought-provoking question about
> this design: object storage format. Should it be stored encoded once
> an encoding is done (using Vary headers to figure out circumstances,
> as in the spec)? Should it stay decoded like today, and always be
> reencoded during transfer? Or should both formats be available once
> created?

My vote is to have both available once created, with proper Vary
and ETag type headers, plus some internal information to connect the two
together to allow proper cache refreshes etc. Note that Vary is not
entirely suitable here and you want to use something which more closely
represents your decision logic for choosing which encoding to use.

While the object is read, have it cached like usual. Then, when you recode
the object with another encoding, stream the resulting object back to the
cache as another object.
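
A rough sketch of that flow, in Python for brevity rather than in the
proxy's own code. Everything here is hypothetical: ToyCache, the
(url, coding) key and fetch_encoded() only illustrate the idea of keeping
the recoded result as a separate cache object next to the original.

```python
import gzip


class ToyCache:
    """Hypothetical in-memory cache keyed by (url, content coding)."""

    def __init__(self):
        self.objects = {}

    def store(self, url, coding, body):
        self.objects[(url, coding)] = body

    def lookup(self, url, coding):
        return self.objects.get((url, coding))


def fetch_encoded(cache, url, coding):
    """Return the body in the requested coding, recoding on demand.

    The recoded result is stored as its own object, so the original
    ("identity") object stays in the cache untouched.
    """
    hit = cache.lookup(url, coding)
    if hit is not None:
        return hit
    original = cache.lookup(url, "identity")
    if original is None:
        return None  # a real proxy would fetch from the origin here
    if coding != "gzip":
        return None  # only gzip is sketched
    recoded = gzip.compress(original)
    cache.store(url, coding, recoded)
    return recoded
```

After one gzip request both (url, "identity") and (url, "gzip") exist as
separate objects in the cache.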

> Now, storing both encoded and decoded does present the challenge of
> linking for synchronization purposes all encodings of a particular
> object.

Indeed.

> Joe says you guys had already done something like that in
> implementing Vary functionality.

Not exactly.

Different negotiations result in different entities, each having its own
life. Neither Vary nor ETag has any such connection between the different
entities of the same URI. The closest thing they have is invalidation of
the variant index if a change in what the variance is based on is detected.

> Are there ideas / code along these lines that I can glom onto?

When implementing caching of content recodings (and also cached
transfer-encodings if considered) the criteria are different from normal
caching and everything must be synced with the original object. I would
propose to solve this by storing a sufficient amount of information in the
recoded object to be able to verify that it matches the original object on
cache hits. You do not need to automatically purge or redo encodings when
the original changes, but you must make sure this is done at the latest
before giving it out as a cache hit.
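
A minimal sketch of such a lazy check, again in Python and under stated
assumptions: the recoded object remembers the ETag of the original it was
derived from (the source_etag field is a made-up name), and a stale
recoding is only redone when a hit is about to be served. A digest or
Last-Modified of the original could serve the same purpose.

```python
import gzip
from dataclasses import dataclass


@dataclass
class Original:
    body: bytes
    etag: str


@dataclass
class Recoded:
    body: bytes
    coding: str
    source_etag: str  # validator of the original this was derived from


def get_gzip(original, recoded):
    """Return (recoded_object, reused).

    The cached recoding is reused only if it was derived from the
    current original; otherwise it is redone at hit time.
    """
    if recoded is not None and recoded.source_etag == original.etag:
        return recoded, True
    fresh = Recoded(gzip.compress(original.body), "gzip", original.etag)
    return fresh, False
```

Nothing is purged when the original changes. The mismatch is simply
noticed on the next hit and the recoding is redone then.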

Please note that messing with Content-Encoding in a proxy is somewhat
outside the HTTP specifications, and some care is needed to do it correctly
or you risk causing major headaches for the HTTP/1.1 content negotiation
and equality criteria, and even risk causing object corruption at the
clients. By applying Content-Encoding you technically create a new entity
(which is supposed to only be done by origin servers). This new entity
must use a different ETag to differentiate it from the original so as not
to mess up equality conditions at the clients, and any cache revalidations
etc. must be based on the original, not your recoded version, to make sure
there is no confusion between the recoding proxy and the origin server as
to the cached object's freshness.

Regards
Henrik
Received on Thu Feb 26 2004 - 02:23:24 MST