Re: gzip support

From: Robert Collins <robertc@dont-contact.us>
Date: 16 Dec 2002 21:17:23 +1100

On Mon, 2002-12-16 at 08:06, Stephen Sprunk wrote:
> Thus spake Henrik Nordstrom:

> Is there any functional difference between:
>
> Content-Encoding: gzip
> Transfer-Encoding: identity
>
> and:
>
> Content-Encoding: identity
> Transfer-Encoding: gzip
>
> I don't see a difference, but you guys know the RFC a lot better than
> I do :) I ask specifically because there's (at least) one browser
> that requests both gzip C-E and gzip T-E. If they are, in fact,
> identical, then most of the RFC's objections to munging the Content-
> Encoding are unnecessary.

Which browser?

And they are very different.
CE is end to end - i.e. I might send a tarball gzip content-encoded, and
expect the browser to preserve that encoding when saving it to disk.
(i.e. Content-Type application/x-tar, Content-Encoding gzip (yes, I
haven't looked up the actual MIME types - this is a thought
experiment)).

TE is hop by hop - server and client software never see TE, just like
they never see Van Jacobson header compression in TCP. (For both, I mean
in a general sense. There are APIs to access such information, but it's
not a concern to the general programmer.)

In other words, CE matters to browser writers and web site authors.
TE matters to HTTP transport writers.
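
To make the end-to-end vs hop-by-hop split concrete, here is a rough
sketch of what a proxy does with headers before passing a message on.
The header list is the hop-by-hop set from RFC 2616 section 13.5.1; the
function name and the sample response are made up for illustration and
are not taken from the squid source.

```python
# Hop-by-hop headers as listed in RFC 2616 section 13.5.1.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailers", "transfer-encoding", "upgrade",
}

def end_to_end_headers(headers):
    """Return the headers an intermediary would forward (hypothetical helper)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    # Anything named in the Connection header is hop-by-hop as well.
    named = {t.strip().lower()
             for t in lowered.get("connection", "").split(",") if t.strip()}
    drop = HOP_BY_HOP | named
    return {name: value for name, value in headers.items()
            if name.lower() not in drop}

# Illustrative response: CE describes the entity, TE describes this hop only.
response = {
    "Content-Type": "application/x-tar",
    "Content-Encoding": "gzip",
    "Transfer-Encoding": "gzip, chunked",
    "ETag": '"abc"',
}
forwarded = end_to_end_headers(response)
```

Content-Encoding and ETag survive the hop; Transfer-Encoding does not,
and the next hop is free to pick its own transfer coding.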

CE alters the length of the entity. This 'breaks' range requests. (well,
it makes it *damn* hard to predict what you'll get).
TE does not alter the length of the entity - this preserves range
requests, and thus allows compression that CE does not.
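
A small sketch of the range point, using Python's gzip module as a
stand-in for the coding. The sample entity, the byte_range() helper and
the offsets are invented for illustration.

```python
import gzip

# A made-up identity entity: 1000 lines of 10 bytes each.
entity = b"".join(b"line %04d\n" % i for i in range(1000))

def byte_range(data, first, last):
    """Bytes first..last inclusive, as in 'Range: bytes=first-last'."""
    return data[first:last + 1]

# Content-Encoding: the gzip'd bytes are what gets sent as the entity, so
# the range indexes into compressed data of a different length.
ce_entity = gzip.compress(entity)
ce_slice = byte_range(ce_entity, 100, 199)

# Transfer-Encoding: the range is taken from the identity entity, then
# this hop compresses the result and the next hop undoes it.
te_slice = byte_range(entity, 100, 199)
on_the_wire = gzip.compress(te_slice)
received = gzip.decompress(on_the_wire)
```

With TE the recipient ends up with exactly bytes 100-199 of the
original; with CE it gets a fragment of a gzip stream whose relation to
the original offsets is hard to predict.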

There is a separate RFC on content transformations that seeks to address
this separation and a bunch of related weaknesses in HTTP. Don't recall
its name off-hand.
 
> Is the ETag supposed to be different if you get a chunked-encoded vs
> identity-encoded version of the same file? If not, why should a
> gzip-encoded version get a different ETag?

TE preserves all end-to-end headers, so there is no change in ETag for TE.
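
A sketch of why that is, assuming (purely for illustration) an ETag
derived from a hash of the entity bytes. Real servers are free to
generate ETags any way they like; etag() here is hypothetical.

```python
import gzip
import hashlib

def etag(entity_bytes):
    # Hypothetical validator: a quoted SHA-1 of the entity bytes.
    return '"%s"' % hashlib.sha1(entity_bytes).hexdigest()

identity = b"hello, world\n" * 100

# TE: coded for one hop and decoded at the other end of it, so the entity
# the recipient ends up with is byte-for-byte the original.
after_te = gzip.decompress(gzip.compress(identity))

# CE: the compressed bytes are themselves what gets delivered and stored.
after_ce = gzip.compress(identity)

same_for_te = etag(after_te) == etag(identity)
same_for_ce = etag(after_ce) == etag(identity)
```

Here same_for_te is true and same_for_ce is false: the TE round trip
leaves nothing for a validator to notice, while the CE variant is a
different sequence of bytes.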

> If Vary is not needed for Transfer-Encoding correctness, I don't see
> it being necessary for Content-Encoding correctness either. Of
> course, it's still a good idea :)

Vary is a must for CE, because CE depends on things like Accept-Encoding
values, where a different Accept-Encoding value results in a different
content coding. TE is hop by hop and therefore won't break cache
behaviour by being different every time.

Example of CE without Vary:

Client A requests Foo, A-E includes gzip.
Foo response gets cached in the LAN squid cache, with no Vary, in gzip form.
Client B requests Foo, A-E includes gzip, but User-Agent is a known
'faulty' browser, so the server *would have* sent an identity object, if
it got the request...
But, the squid cache returns the cached object, gzip'd, and client B
breaks. (i.e. displays 5 squares in the top right, as IE does.)
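
The same scenario as a toy simulation. The origin(), Cache and
user-agent strings are invented for the example; this is not how squid
stores objects, just the minimum needed to show the effect of Vary on
the cache key.

```python
def origin(request, send_vary):
    """Hypothetical origin: gzip unless the browser is known to be faulty."""
    wants_gzip = "gzip" in request.get("Accept-Encoding", "")
    faulty = request.get("User-Agent") == "FaultyBrowser/1.0"
    coding = "gzip" if wants_gzip and not faulty else "identity"
    response = {"Content-Encoding": coding}
    if send_vary:
        response["Vary"] = "Accept-Encoding, User-Agent"
    return response

class Cache:
    def __init__(self, send_vary):
        self.send_vary = send_vary  # whether the origin behind us sends Vary
        self.vary = {}              # url -> request header names from Vary
        self.store = {}             # cache key -> stored response

    def _key(self, url, request):
        # Without Vary the key is the URL alone; with Vary, the values of
        # the named request headers become part of the key.
        names = self.vary.get(url, ())
        return (url,) + tuple(request.get(n, "") for n in names)

    def fetch(self, url, request):
        key = self._key(url, request)
        if key not in self.store:
            response = origin(request, self.send_vary)
            names = response.get("Vary", "")
            self.vary[url] = [n.strip() for n in names.split(",") if n.strip()]
            key = self._key(url, request)
            self.store[key] = response
        return self.store[key]

client_a = {"Accept-Encoding": "gzip", "User-Agent": "GoodBrowser/1.0"}
client_b = {"Accept-Encoding": "gzip", "User-Agent": "FaultyBrowser/1.0"}

no_vary = Cache(send_vary=False)
no_vary.fetch("/Foo", client_a)
b_without_vary = no_vary.fetch("/Foo", client_b)["Content-Encoding"]

with_vary = Cache(send_vary=True)
with_vary.fetch("/Foo", client_a)
b_with_vary = with_vary.fetch("/Foo", client_b)["Content-Encoding"]
```

Without Vary, client B is handed A's gzip'd object from the cache; with
Vary, B's request misses, goes to the origin, and gets identity.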

> So, you agree the idea would fly if we limited it to Transfer-Encoding
> support and ignored the (apparently) functionally identical Content-
> Encoding field?

See the te branch on SourceForge. And the te-modules branch for the
gzip, compress, etc. coding types.

It's idle at the moment while we fix the internal request path to allow
range requests to work properly with squid+TE.

But, it worked really, really well for non-range requests in my testing
(IIRC nearly 2 years ago now).

Rob

Received on Mon Dec 16 2002 - 03:17:26 MST