Re: Content-Encoding and storage format

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Sun, 29 Feb 2004 12:08:38 +0100 (CET)

On Sun, 29 Feb 2004, Jon Kay wrote:

> ...regardless of that last sentence, virtually all browsers appear to
> use Content-Encoding to indicate message encoding rather than entity
> encoding. Virtually none support Transfer-Encoding of any sort.

Content-Encoding is what a web server would generally use, as it is a lot
cheaper for the web server:

- The web server can precompute different content-encodings. The
simplest way is simply to store the object in multiple encodings.

- In addition, Content-Encoding works reasonably well with HTTP/1.0 (TE
requires HTTP/1.1).
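The "store the object in multiple encodings" approach above can be sketched as
follows. This is a hypothetical helper, not anything from a real server: it
assumes a layout where each stored encoding of an object is recorded in a map,
and picks a precomputed variant based on the client's Accept-Encoding header.

```python
def choose_representation(variants, accept_encoding):
    """Pick a precomputed variant based on the client's Accept-Encoding.

    `variants` maps an encoding token to a stored file, e.g.
    {"identity": "page.html", "gzip": "page.html.gz"} (an assumed,
    illustrative on-disk layout).
    """
    # Split "gzip, deflate;q=0.5" into the set of accepted tokens,
    # ignoring quality values for simplicity.
    accepted = {t.split(";")[0].strip() for t in accept_encoding.split(",")}
    if "gzip" in accepted and "gzip" in variants:
        # Serve the precomputed gzip copy: no per-request compression cost,
        # which is the "cheaper on the web server" point above.
        return variants["gzip"], {"Content-Encoding": "gzip",
                                  "Vary": "Accept-Encoding"}
    return variants["identity"], {"Vary": "Accept-Encoding"}
```

Note that either way the response should carry "Vary: Accept-Encoding" so
downstream caches keep the variants apart.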

But at the same time it is a sad truth that things like mod_gzip really
do not fit well within the Content-Encoding scheme of things at all.
mod_gzip is well suited as a TE engine, but in most installations it
violates several aspects of HTTP when used as a Content-Encoding engine.

It is currently a chicken-and-egg problem. No servers support TE because
no browsers support TE because no servers support TE... which is also an
effect of TE requiring HTTP/1.1, while Content-Encoding has been around
for a very long time, basically since the beginning of time.

Applying Content-Encoding in an accelerator makes sense, and can be done
reasonably well. Applying Content-Encoding in a general purpose Internet
proxy is a different beast and you then need to be very careful.

> This means that to do anything of much use (especially since very few
> sites use gzip coding), we will have to bend standards. This goes
> beyond simply replacing the strings "TE" and "Transfer-Encoding" with
> "Allow-Encoding" and "Content-Encoding". RFC2626 notes in 13.5.1 that
> Transfer-Encoding is a hop-by-hop header, and Content-Encoding is
> not, and you point out the Etag consequences.

Yes.

But the standard also incorporates a definition called "a [semantically]
non-transparent proxy", which means you are in fact allowed by the RFC to
do this type of content transformation. See chapter 13 paragraph 3, and
for Content-Encoding specifically see 13.5.2.

> my proposal is to treat Content-Encoding and Allow-Encoding as though
> they were Transfer-Encoding and TE. That is, a string replacement
> at the standards level instead of at the input/output text level.

There is no need to modify the standard. All you need is already there.

All that is needed is great care when implementing a non-transparent
proxy, as you all of a sudden cause the client to see different behaviour.

> Under this proposal, we would consider content-encoding to be
> hop-by-hop, and content-encoded objects to be the identical entities
> to unencoded ones.

The encoded and unencoded entities may be able to share the same weak
ETag, as they can perhaps be considered to share the same semantics, but
they must not have the same strong ETag as they are binary different. If
the original does not have an ETag then things get messy. But honestly I
would not recommend preserving the object identity (ETag).

If you can, it is recommended to use strong ETags, as these allow for
Range requests and other nice features.

The reason for the above is the requirements of conditional and sub-range
requests. A strong ETag indicates the objects are binary equivalent, and
multiple ranges of the same entity may be merged into one. A weak ETag
indicates the entities are semantically equivalent and may be used
interchangeably, but may be binary different. A weak ETag is sufficient
for cache revalidation but disallows range merging, as it does not
guarantee the two are binary identical.
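The strong/weak comparison rules described above can be sketched as a small
illustration (the comparison semantics follow RFC 2616 section 13.3.3; the
function names here are my own, not from any particular implementation):

```python
def strong_compare(a, b):
    """Strong ETag comparison: both tags must be strong (no W/ prefix)
    and byte-identical. Only this guarantees binary equivalence."""
    return not a.startswith("W/") and not b.startswith("W/") and a == b

def weak_compare(a, b):
    """Weak ETag comparison: the opaque tags match after dropping any
    W/ prefix. Sufficient for cache revalidation only."""
    strip = lambda t: t[2:] if t.startswith("W/") else t
    return strip(a) == strip(b)

def may_merge_ranges(etag_a, etag_b):
    # Range pieces may only be joined when the entities are binary
    # identical, which only a strong match guarantees.
    return strong_compare(etag_a, etag_b)
```

So two responses carrying W/"v1" revalidate against each other, but a cache
must not stitch their byte ranges together into one object.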

A recoded object such as a gzip encoding can be regarded as semantically
equivalent provided the user-agent knows how to decode gzip, but it is
obviously not binary equivalent to the non-encoded entity. If you are
100% certain that all user-agents ever accessing content from this server
accept gzip content-encoding, then you may use the same weak ETag for
both the original and the encoded variant. But if there are ever cases
where clients should get the original then you must not, as doing so
instructs downstream caches that the gzip and original versions are
equivalent regardless of what the client accepts.
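One safe way out of the trap described above is to give each encoded variant
its own ETag rather than sharing one. The suffix convention below is just an
assumed, illustrative scheme (some servers use something similar), not a
requirement of the RFC:

```python
def etags_for_variants(base_etag):
    """Derive a distinct strong ETag per encoding variant.

    Hypothetical convention: suffix the opaque tag with the encoding
    name, so downstream caches can never confuse the gzip variant with
    the identity variant, whatever the client accepts.
    """
    tag = base_etag.strip('"')
    return {
        "identity": '"%s"' % tag,
        "gzip": '"%s-gzip"' % tag,
    }
```

Combined with "Vary: Accept-Encoding", each variant then revalidates and
range-merges only against itself.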

Regards
Henrik
Received on Sun Feb 29 2004 - 04:40:05 MST
