Re: Antwort: Re: Antwort: Re: Antwort: [Mod_gzip] Vary: header and mod_gzip

From: Henrik Nordström <hno@dont-contact.us>
Date: Wed, 28 Aug 2002 05:24:06 +0200 (CEST)

On Wed, 28 Aug 2002 Michael.Schroepl@telekurs.de wrote:

> _But_ there is a big caveat. I stated that a
> constant configuration _and_ file set would be
> a prerequisite for the behaviour of the "file"
> and "uri" rules. Let me explain why.
> The problem is URL translations inside Apache.
[snip]
>
> Now let some user remove the "index.html" file
> and replace it with some "index.shtml" file.
[snip]

New content. New rules. New ETags (if any). Vary is not about this.

If, however, index.shtml responds differently depending on the type of
request, then it should include a Vary header, and mod_gzip would add to
this existing header.
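
To illustrate what "add to this existing header" could look like, here is
a minimal Python sketch. The function name add_vary is hypothetical and
this is not mod_gzip code; it only assumes that Vary is a comma-separated
list of case-insensitive header names, where "*" already covers everything.

```python
def add_vary(existing, field):
    """Return a Vary value that names `field` exactly once.

    `existing` is the Vary value already set by another module (or None).
    """
    if existing is None or not existing.strip():
        return field
    names = [n.strip() for n in existing.split(",") if n.strip()]
    if "*" in names:
        # "Vary: *" already says the reply varies on everything.
        return "*"
    if field.lower() in (n.lower() for n in names):
        # Header names are case-insensitive; do not list it twice.
        return ", ".join(names)
    return ", ".join(names + [field])
```

For example, a reply that already carries "Vary: User-Agent" would leave
with "Vary: User-Agent, Accept-Encoding" after compression is negotiated.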

> mod_gzip has a configuration directive to entitle
> it to collect these chunks, remove the chunking
> information, compress the whole packet and send it
> to the client. This is additive to the whole rule
> set - compressing a SSI content requires both the
> "dechunk" option being activated _and_ some
> configuration rule to explicitly accept this object
> for compression.

Irrelevant. Transfer-encoded or not, it is still the same entity. For the
purposes of mod_gzip, SSI or CGI generated content is a reply entity, just
like a static file. The fact that the entity was internally
transfer-encoded before reaching mod_gzip is an implementation detail of
the inner workings of Apache that has no implication for HTTP whatsoever.

> So even the knowledge of the _whole_ item rule
> set would not be enough to decide whether a
> request will be compressed always or never.

Why not?

> The effective meaning of the "/" URL can change
> any time without any change within the mod_gzip
> configuration, because Apache's URL mapping is
> dynamic anyway, which may well put the same
> URL from a file that will be compressed always
> to a file that will be compressed never or
> anything else.

mod_gzip does not need to care about dynamics outside of mod_gzip. It is
the responsibility of those other modules to care in such case and to emit
the relevant Vary headers where they apply.

> Right now I doubt whether there may be a solution
> even if you evaluate the whole Apache configuration
> knowledge, because just removing or adding a file
> in the document tree is enough to change the
> effective meaning of some URL and change its
> "compression behaviour class".

Not relevant. You only need to care about "this reply" and how you would
have responded in case the request was different, not any possible future
replies.

Hints about the future suitability of a reply are given via
Expires/Cache-Control, and that is an entirely different question.

> One may be able to find out _whether_ such a
> translation has taken place or not (I guess
> this knowledge to be available in some Apache
> request representation record), but _if_ it
> has taken place there will probably be _no_
> way to find out whether a "Vary:" header
> _needs_ to be sent or not. (And this applies
> to each and every request for an URL that is
> ending in a "/"!)

If a Vary: header needs to be sent due to such request translations it
should have been added by the module doing the translation. Not the
responsibility of mod_gzip.

> And then, there are configuration options that
> are based on the combination of the features
> of mod_gzip and Apache.
> You can write some Apache configuration section
> like
> <LocationMatch "\.html$">
> mod_gzip_on no
> </LocationMatch>
>
> This _might_ be semantically identical to
>
> mod_gzip_item_exclude uri \.html$

True, but neither of these two examples has any request header
dependencies, so both are safe.

> So don't be too optimistic about mod_gzip telling
> you that a request will never be compressed - this
> isn't easy to find out if you want to detect all
> of these cases. mod_gzip is simply too powerful.
> (But detecting _some_ of them would already help
> - in fact each case that will end up without a
> "Vary:" header would help.)

> > Likewise if the configuration is
> > such that mod_gzip would always compress the reply
> > no matter who requested it.
>
> I am afraid it is even _more_ difficult to find out
> about this one.
> Again the validate function is looking for _at_least_
> one include rule to fire, and when it has found one,
> it doesn't care about other rules.

And it does not need to either.

If your rule processing finds a "static" include rule and has not seen any
"request header" exclude rules prior to this rule, then you know beyond
reasonable doubt that the reply will always be compressed.
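
A minimal Python sketch of that check follows. The rule representation
(ordered tuples of action, kind, and whether the rule matched this
request) and the name always_compressed are assumptions made for
illustration; they are not mod_gzip's real data structures or evaluation
order. "uri" and "file" are treated as the static rule kinds, as in the
level 1 class proposed below.

```python
STATIC_KINDS = {"uri", "file"}  # rule kinds with no request-header dependency

def always_compressed(rules):
    """Scan rules in order for the current request.

    `rules` is a list of (action, kind, matched) tuples, where action is
    "include" or "exclude" and matched says whether the rule fired.
    Returns True only if a static include rule fires before any
    request-header exclude rule has been seen.
    """
    for action, kind, matched in rules:
        if action == "exclude" and kind == "reqheader":
            # A different request could have hit this rule, so the
            # outcome depends on request headers even if it did not fire.
            return False
        if action == "include" and kind in STATIC_KINDS and matched:
            return True
    return False
```

Anything for which this returns False is simply "not known to be always
compressed"; the sketch does not try to classify those cases further.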

> Maybe some priority list model would help:
> a) classify the mod_gzip rule types (level 1: "uri"
> and "file"; level 2: "reqheader"; level 3: the
> remaining three rule classes)

Yes.

> b) make the validate routine scan for them in some
> order so that the maximum information usable for
> the "Vary:" header to be created later can be
> stored in some data structure
>
> c) use this information when the time has come to
> _decide_ whether a "Vary:" header should be
> sent and which information it should contain.
> But as shown above, this will not be enough to even
> reliably find out whether a "Vary:" header (beyond
> "Accept-Encoding", that is) will be needed or not.

You can quite likely put some responsibility on the server administrator
to configure things in a reasonably rational manner. I don't see the need
to attempt to cover 100% of all possible cases.

> At the moment, I am not able to discuss the ETag and
> If-None-Match issues. I understand the concept of
> ETag (because I know exactly one browser that is
> sending ETag HTTP headers: Opera 6), but I am not
> sure what mod_gzip would have to do in the
> "If-None-Match" area.
> Please forgive me if I concentrate on the "Vary:"
> issue first and learn the rest later. ;-)

Vary and ETag are very close friends when it comes to caching.

The only implication of mod_gzip for ETag is that mod_gzip must alter the
ETag of the responses it compresses, to differentiate them from
uncompressed reply entities for the same URI. A compressed and an
uncompressed reply entity are two different objects in terms of HTTP and
must have different ETags if they share the same URI (or alternatively no
ETag at all).
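
One way to derive a distinct ETag for the compressed variant is sketched
below in Python. The "-gzip" suffix and the function name compressed_etag
are assumptions chosen for illustration; any scheme works as long as the
two entities never end up with the same tag.

```python
def compressed_etag(etag, suffix="-gzip"):
    """Derive a distinct ETag for the compressed variant of an entity.

    Accepts a strong ("abc") or weak (W/"abc") entity tag and returns a
    tag of the same strength with `suffix` placed inside the quotes.
    Returns None if the original reply had no ETag.
    """
    if etag is None:
        return None
    weak = etag.startswith("W/")
    opaque = etag[2:] if weak else etag
    if len(opaque) < 2 or not (opaque.startswith('"') and opaque.endswith('"')):
        raise ValueError("not a quoted entity tag: %r" % etag)
    new_tag = opaque[:-1] + suffix + '"'
    return ("W/" if weak else "") + new_tag
```

With this, an uncompressed reply tagged "abc" and its compressed
counterpart tagged "abc-gzip" can coexist in a cache for the same URI.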

If-None-Match simply lets the HTTP server tell whether any of the ETags
listed in If-None-Match is identical to that of the current object entity
as it would have been returned for this request, and if so, which of
them. If-None-Match is used by caches to ask the server which (if any) of
the previously cached reply entities for the URI may be sent in response
to the request. It is the successor to If-Modified-Since and solves all
the issues of If-Modified-Since.
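
As a rough Python sketch of that lookup: the function name
match_if_none_match is hypothetical, tags are compared while ignoring any
W/ prefix, and the header is split on commas, which is a simplification
that would break on an entity tag containing a comma.

```python
def match_if_none_match(header, current_etag):
    """Return the listed ETag that matches current_etag, or None.

    `header` is the raw If-None-Match value; `current_etag` is the tag of
    the entity that would be returned for this request (for a compressed
    reply, that is the compressed entity's tag).
    """
    def opaque(tag):
        tag = tag.strip()
        return tag[2:] if tag.startswith("W/") else tag

    if header.strip() == "*":
        # "*" matches any current entity.
        return current_etag
    for candidate in header.split(","):
        candidate = candidate.strip()
        if candidate and opaque(candidate) == opaque(current_etag):
            return candidate
    return None
```

A non-None result means the cache already holds the right entity and the
server can answer 304 naming that ETag; None means a full reply is needed.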

> And even if I did, I would be aware of the fact that browsers
> that are not asking for gzipped content currently refuse to
> understand it, even if they have the decompression code
> implemented.
> So sending compressed content if no "Accept-Encoding: gzip"
> arrives at mod_gzip is simply not an option anyway.

Good.

> Do you consider mod_gzip to behave like a cache in this area?
> Would this help anyone?

Not sure what you mean.

> I am asking mostly for other reasons:
>
> I myself have an implementation of a compressing HTTP cache
> (a Perl CGI script) that is to be embedded into Apache via
> the "AddHandler" hook.
> (And yes, I do send the "Vary" header. ;-)

If this is meant to run on an HTTP server then it isn't a cache in terms
of HTTP, as it sits before HTTP. The internals of Apache may resemble
HTTP, but the HTTP endpoint is still Apache as a whole.

> and I would be glad to learn about everything this program
> is doing right or wrong about handling the HTTP protocol ...

Provided you send correct Vary headers and make sure to modify the ETag
when you modify the reply entity, there isn't much you can do wrong, but I
cannot speak for any ill effects in interactions with other features of
Apache.

Things to watch out for:

  - Range requests
  - If-XXX HTTP conditions (If-None-Match, If-Modified-Since, If-Match,
    ...)

In terms of HTTP the compressed object is a unique object of its own,
separate from the uncompressed original. Care must be taken to ensure the
two are clearly identified (ETag) and never intermixed. A Range request on
a compressed object requests a range of the compressed entity, not of the
original uncompressed one. An ETag identifies a specific reply entity, not
the original object. And so on.
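
The Range point can be seen in a small Python sketch. The helper
serve_range is hypothetical and handles only a single inclusive byte
range; the sample data is made up.

```python
import gzip

def serve_range(entity, first, last):
    """Return bytes first..last (inclusive) of the given reply entity."""
    return entity[first:last + 1]

original = b"hello world " * 50
compressed = gzip.compress(original)

# The same "bytes=0-1" request yields different bytes depending on which
# entity the reply carries: the gzip stream starts with its magic number,
# the original starts with its own text.
range_of_compressed = serve_range(compressed, 0, 1)
range_of_original = serve_range(original, 0, 1)
```

Here range_of_compressed is the two-byte gzip magic number while
range_of_original is b"he", so the two must never be stitched together.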

Regards
Henrik
Received on Tue Aug 27 2002 - 21:24:09 MDT