Re: Antwort: Re: Antwort: [Mod_gzip] Vary: header and mod_gzip

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Mon, 26 Aug 2002 21:09:56 +0200

Michael.Schroepl@telekurs.de wrote:

> But mod_gzip bases its decisions on information
> that Squid cannot ever have a clue about.
> Some of it is not in HTTP headers at all, but is Apache-internal
> information, or it is in HTTP response headers, not request
> headers (Content-Length, Content-Type, ...).

Content-Length, Content-Type etc. are things Squid does not need to care about
at all in this context. What Squid needs to care about is how mod_gzip
responds to different requests for the exact same URL. If the object changes,
new rules might obviously apply, but that is beside the purpose of Vary (for
such changes Expires and Cache-Control: max-age= are the proper mechanisms for
controlling caching).

What you need to care about are the rules for THIS object content for a
specific URL based on the request headers or other external input. Static
rules based on the actual response object do not need to be mentioned,
nor do you need to mention "random" rules depending on internal server
state independent of the user unless you really want to (see below). A
threshold rule saying that all responses above a certain size may be
compressed is a typical static rule that does not need to be mentioned. For
the same object the rule will always trigger in the same manner.
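
As a rough illustration of which rules end up in Vary, here is a minimal
Python sketch. The rule representation (a list of (source, name) pairs) and
the function name vary_from_rules are hypothetical and not part of mod_gzip;
only rules keyed on request headers contribute to the header value.

```python
from typing import List, Optional, Tuple


def vary_from_rules(rules: List[Tuple[str, str]]) -> Optional[str]:
    """Build a Vary value from (source, name) rule descriptions.

    source is "request" for rules keyed on a request header; anything
    else (for example "response" for a size threshold) is treated as a
    static rule and left out of Vary.
    """
    names: List[str] = []
    for source, name in rules:
        # Only request-header rules select between variants of one URL.
        if source == "request" and name not in names:
            names.append(name)
    return ", ".join(names) if names else None


rules = [("response", "Content-Length"), ("request", "Accept-Encoding")]
print(vary_from_rules(rules))
```

The size threshold is listed among the rules but never appears in the
result, since it always triggers the same way for the same object.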

If your server has dynamic rules that might give different responses for the
exact same request and URL with no changes in the content, then you should
include a "Vary: *" header to indicate that special content negotiation rules
apply that cannot be expressed in terms of HTTP, and that the server must
therefore always be queried about which response entity is the correct one for
this user. I don't think this really applies to mod_gzip. In such a case you
really SHOULD support ETag and If-None-Match, or else caching in shared caches
is kind of pointless as the cached content can then never be reused.
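
To make the ETag / If-None-Match point concrete, here is a minimal Python
sketch of a server answering a revalidation request for a "Vary: *" object.
The function name respond and the ETag values are hypothetical. It is
simplified: weak validators and ETags containing commas are not handled.

```python
from typing import Dict, Tuple


def respond(selected_etag: str, if_none_match: str = "") -> Tuple[int, Dict[str, str]]:
    """Answer a request for the variant the server picked internally.

    selected_etag identifies the entity chosen by server-side rules;
    if_none_match is the raw If-None-Match value sent by the cache.
    """
    offered = {tag.strip() for tag in if_none_match.split(",") if tag.strip()}
    headers = {"Vary": "*", "ETag": selected_etag}
    if selected_etag in offered:
        # The cache already holds this entity and may reuse its copy.
        return 304, headers
    # Unknown entity: a full reply has to be sent.
    return 200, headers


status, _ = respond('"page-gzip"', '"page-plain", "page-gzip"')
print(status)
```

Without the ETag the cache has no way to learn that one of its stored
entities is the right one, so every request would need a full reply.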

The minimum requirement of Vary is to include enough information to tell
caches who may receive this kind of reply. For mod_gzip the minimal
requirement is that compressed content must never be sent to user-agents not
supporting compression, and this can easily be expressed in terms of Vary
(see below).
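
For illustration, this is roughly how a cache can use Vary to keep a
compressed reply away from clients that did not ask for it. It is a minimal
Python sketch, not Squid's implementation: the class VaryAwareCache is
hypothetical, and request headers are assumed to be passed as a dict with
lower-case names.

```python
from typing import Dict, List, Optional, Tuple


class VaryAwareCache:
    """Toy cache that keys stored variants on the headers named by Vary."""

    def __init__(self) -> None:
        self.vary_for_url: Dict[str, List[str]] = {}
        self.variants: Dict[Tuple[str, Tuple[str, ...]], str] = {}

    def _key(self, url: str, request_headers: Dict[str, str]) -> Tuple[str, Tuple[str, ...]]:
        names = self.vary_for_url.get(url, [])
        return (url, tuple(request_headers.get(n, "") for n in names))

    def store(self, url: str, request_headers: Dict[str, str], vary: str, body: str) -> None:
        # Remember which request headers select the variant for this URL.
        self.vary_for_url[url] = [n.strip().lower() for n in vary.split(",") if n.strip()]
        self.variants[self._key(url, request_headers)] = body

    def lookup(self, url: str, request_headers: Dict[str, str]) -> Optional[str]:
        if url not in self.vary_for_url:
            return None
        return self.variants.get(self._key(url, request_headers))


cache = VaryAwareCache()
cache.store("/page", {"accept-encoding": "gzip"}, "Accept-Encoding", "gzip body")
print(cache.lookup("/page", {"accept-encoding": "gzip"}))
print(cache.lookup("/page", {}))
```

A request without Accept-Encoding produces a different key, so the stored
compressed body is not returned for it.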

> mod_gzip will only serve one of two possible formats: compressed
> and uncompressed.
> Data will never be compressed if the client didn't send
> "Accept-Encoding: gzip"; but there may be _many_ cases when the client
> asks for compressed data and will still get uncompressed content.
>
> Am I right to think that "Vary: Accept-Encoding" for the compressed
> content and no "Vary:" header at all will be the best choice
> in this case? This is what the two published patches are doing.

I would suggest:

Alternative 1: (default)

"Vary: Accept-Encoding" if the reply is such that it might be compressed.

"Vary: Accept-Encoding, User-Agent" if you also want to use the User-Agent
header to determine if compression might be applied. (optional, default to
uncompressed if not enabled and no Accept-Encoding)

This applies to both compressed and uncompressed replies. If the reply is such
that mod_gzip might compress the reply for certain browsers/users then you
should include a Vary header.

If the reply is such that mod_gzip would never compress the reply no matter
who requested it then no Vary header should be included. Likewise if the
configuration is such that mod_gzip would always compress the reply no matter
who requested it.
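
The decision in Alternative 1 can be sketched in Python as follows. The
function and parameter names are hypothetical and do not correspond to any
mod_gzip directive; they only restate the three cases above.

```python
from typing import Optional


def vary_for_alternative_1(may_compress: bool,
                           always_compress: bool = False,
                           uses_user_agent: bool = False) -> Optional[str]:
    """Return the Vary value for a reply, or None if no Vary is needed.

    may_compress: the reply could be compressed for at least some requests.
    always_compress: the configuration compresses regardless of requester.
    uses_user_agent: User-Agent also takes part in the decision.
    """
    if not may_compress or always_compress:
        # The reply is identical for every requester, so there is no variance.
        return None
    if uses_user_agent:
        return "Accept-Encoding, User-Agent"
    return "Accept-Encoding"


print(vary_for_alternative_1(may_compress=True))
print(vary_for_alternative_1(may_compress=True, uses_user_agent=True))
print(vary_for_alternative_1(may_compress=False))
```

Note that the result does not depend on whether this particular reply was
actually compressed, only on whether it could have been for some requester.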

Alternative 2: (optional, not the default configuration)

"Vary: Accept-Encoding" on any compressed replies, and no Vary: header on
uncompressed replies.

Alternative 1 is the "correct" one, telling caches exactly what to do, and it
provides the optimal hit ratio if the HTTP server and cache are capable of
ETag and If-None-Match.

Alternative 2 is a best-effort tradeoff for caches that know about Vary but
are not capable of making use of it. In this alternative such caches will
hopefully not cache any compressed results, but still cache uncompressed
replies that might be shared by all users. Once an uncompressed reply has been
seen by the cache, it will be sent in response to future requests (until it
expires).
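
The effect of Alternative 2 on a cache that refuses to store anything
carrying a Vary header can be simulated with a short Python sketch. Both
origin_alt2 and NoVaryCache are hypothetical stand-ins, and expiry is left
out for brevity.

```python
from typing import Dict, Optional, Tuple


def origin_alt2(accept_encoding: str) -> Tuple[Optional[str], str]:
    """Hypothetical origin following Alternative 2: (Vary value, body)."""
    if "gzip" in accept_encoding:
        return ("Accept-Encoding", "compressed")
    return (None, "uncompressed")


class NoVaryCache:
    """Toy cache that recognises Vary but never stores such replies."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}

    def fetch(self, url: str, accept_encoding: str) -> str:
        if url in self.store:
            # A stored reply is served to everyone, whatever they accept.
            return self.store[url]
        vary, body = origin_alt2(accept_encoding)
        if vary is None:
            self.store[url] = body
        return body


cache = NoVaryCache()
print(cache.fetch("/page", "gzip"))  # passed through, not stored
print(cache.fetch("/page", ""))      # stored by the cache
print(cache.fetch("/page", "gzip"))  # served from the stored copy
```

The third request asks for gzip but receives the stored uncompressed copy,
which is safe for every client but gives up the benefit of compression.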

> This is the reason why there should be a discussion which Squid
> version would like which mod_gzip/Apache behaviour most.

Isn't that what we are having right now?

Squid does not like to give out incorrect data to its users, and therefore
wants servers to indicate via the Vary header whenever there is server-side
content negotiation taking place. This applies to all Squid versions.

Giving out incorrect data is much worse than not being able to cache. If a
cache administrator gets irritated at not being able to cache, then the
correct place to address that is Squid, not mod_gzip bending the HTTP rules.
You are welcome to redirect any mod_gzip flames caused by Squid not caching
Vary objects to me <hno@squid-cache.org> if you like.

> There have been examples in the past where "mod_gzip_item_exclude
> reqheader" has been used to detect proxy servers that are known
> to unconditionally store compressed content ...

Squid DOES NOT unconditionally cache compressed content unless told to, and
never has (not in the Squid-2.X series anyhow, i.e. during the last 4
years). Neither does it compress/uncompress any content-encodings (disallowed
for proxies by RFC2616). Squid-2.4 and earlier unconditionally do NOT cache
content having a Vary header, which a server-negotiated compression SHOULD
have.

> Or resulting in the mod_gzip configuration adding the proxy to a
> non-compression blacklist, denying compressing for all requests
> coming from this direction - if the proxy tells who it is.

Sorry, I do not see the point here in this discussion. Squid is doing the best
it can. mod_gzip has intentionally chosen to tell Squid and other caches to
do the wrong thing; why then should mod_gzip users blacklist Squid and other
caches rather than give them correct information?

> Another possibility that I experienced myself: A proxy that is
> filtering out "Accept-Encoding" headers from forwarded requests,
> as to be sure it may cache each and every response.
> This one even must be a Squid 2.4, if I read my HTTP header
> traces correctly ...

Probably an administrator who has enabled a bit too aggressive request
anonymization, choosing not to reveal to your server what kind of browser
the user is using. Definitely not done in a default configuration.

What you should do in such a case is fall back on the failsafe approach and
send back an uncompressed reply. Do not build "whitelists" of browsers known
to send Accept-Encoding: gzip; for such browsers you should rely on
Accept-Encoding exclusively. You do not know why the "Accept-Encoding" header
has been removed.
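
A minimal Python sketch of this failsafe, deciding on Accept-Encoding alone
and ignoring User-Agent entirely. The function name accepts_gzip is
hypothetical, and the parsing is deliberately conservative: a "*" coding is
not treated as permission to compress.

```python
from typing import Dict


def accepts_gzip(request_headers: Dict[str, str]) -> bool:
    """Return True only if the request explicitly accepts gzip."""
    value = ""
    for name, header_value in request_headers.items():
        if name.lower() == "accept-encoding":
            value = header_value
    for item in value.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() not in ("gzip", "x-gzip"):
            continue
        quality = 1.0
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0  # unparsable q-value: stay uncompressed
        return quality > 0
    # Header missing or gzip not listed: fall back to uncompressed.
    return False


print(accepts_gzip({"Accept-Encoding": "gzip, deflate"}))
print(accepts_gzip({"User-Agent": "Mozilla/4.0"}))
```

A request whose Accept-Encoding header was stripped by a proxy gets the
uncompressed reply, whatever its User-Agent claims.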

Note: Squid-2.X always sends a Via: header in the request unless intentionally
disabled by the cache administrator for privacy reasons. Not that I
personally think this is something you should make use of in mod_gzip, but
you asked.

Regards
Henrik
Received on Mon Aug 26 2002 - 13:10:00 MDT
