Re: [Mod_gzip] Vary: header and mod_gzip

From: <Michael.Schroepl@dont-contact.us>
Date: Wed, 28 Aug 2002 20:56:38 +0200

Hi Henrik,

>> Now let some user remove the "index.html" file
>> and replace it with some "index.shtml" file.
> [snip]
> New content. New rules. New ETags (if any).
> Vary is not about this.

I think I understand. This will help a lot.

This makes "file" rules static ones. It also
lets a firing "exclude file" rule, and likewise
an "exclude uri" rule, be detected as a reason
for never sending a "Vary:" header.

>> So even the knowledge of the _whole_ item rule
>> set would not be enough to decide whether a
>> request will be compressed always or never.
> Why not?

Because even if the complete filter rule set
(i. e. "mod_gzip_**clude_item ...", which is
the one to be parsed by the validate function)
would now allow for compression, the
     mod_gzip_dechunk No
would still overrule this.
And this depends on the result of Apache's
URL translation. So mod_gzip might accept the
request in phase one, wait for the result,
inspect it, find a transfer encoding on it,
check whether the admin has allowed it to
remove the transfer encoding, and then decline
to compress the item if this option hasn't
been activated.
The "dechunk" flag works much like the "mime"
rules do. But the two are treated separately
in the mod_gzip architecture: if mod_gzip can
decline a compression because it found chunked
content and was not allowed to de-chunk it,
then it can skip evaluating its complete rule
set.
So the rule checking is scattered across the
whole mod_gzip code, and any of these decisions
may influence whether a "Vary:" header is
necessary, and which one.

But as you stated above, we have to consider
only the request as it _has_ been handled, not
as it _might_ have been handled had there been
a different translation result.
If so, then the "decline" can be considered
unconditional, and a "Vary:" header would not
even be appropriate in this case.
The "dechunk" configuration would have to be
part of the decision whether to send a "Vary:"
header or not, like those other configuration
directives, such as
- mod_gzip_minimum_file_size and
- mod_gzip_maximum_file_size

There is another option to make mod_gzip not
compress:
# ---------------------------------------------------------------------
# Required HTTP version of the client
# Possible values: 1000 = HTTP/1.0 1001 = HTTP/1.1, ...
# This directive uses the same numeric protocol values as Apache internally
  mod_gzip_min_http 1000
# (By using this directive you may exclude old browsers, search engines etc.
# from the compression procedure: if the user agent doesn't declare itself
# capable of understanding at least the HTTP level specified here, only
# uncompressed data will be delivered - no matter what else it claims to
# be able to. The value of '1001' will especially exclude Netscape 4.x.
# and a lot of proxy servers.)
# ---------------------------------------------------------------------

What about this one?
It obviously isn't a static case, as it depends
on the HTTP version of the incoming request.
How would I express this in a "Vary:" header?

> The hinting on future suitability of a reply is
> indicated via Expires/Cache-Control, and is an
> entirely different question.

Okay, this makes things much clearer to me.

>> > Likewise if the configuration is
>> > such that mod_gzip would always compress the reply
>> > no matter who requested it.
>>
>> I am afraid it is even _more_ difficult to find out
>> about this one.
>> Again the validate function is looking for _at_least_
>> one include rule to fire, and when it has found one,
>> it doesn't care about other rules.
> And it does not need to either.

I think it does, because the firing rule might
be one that will not fire for a different request
to the same resource (one using a different HTTP
protocol version, for example).

> If your rule processing finds a "static" include rule
> and haven't seen any "request header" exclude rules
> prior to this rule then you know within reasonable
> doubt that the reply will always be compressed.

Right. In the end, the level 1 and level 3 rules
tend to be of the same kind: they are the
static ones, but we need to add file size and
dechunking configuration to this set.
("Minimum HTTP level" still seems an open issue.)
Only the level 2 rules, the "reqheader" items,
may change for different requests.

Therefore:
- If some non-level 2 exclusion rule fires,
  then the content will never be compressed,
  so don't send a "Vary:" header.
  reqheader rules are irrelevant in this case.
- else
  if at least one non-level 2 inclusion rule
  fires and no reqheader exclusion rules are
  there at all, the content will always be
  compressed, so don't send a "Vary:" header.
- else
  collect the HTTP header names of all reqheader
  rules valid for this request and build up a
  Vary: header, no matter whether the request
  will be compressed or not.
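
The three cases above could be sketched roughly as follows. This is
Python rather than mod_gzip's C, and the `Rule` record and the
`vary_header` function are hypothetical names made up for this
illustration; they are not part of mod_gzip. "Accept-Encoding" is
left out on purpose, just as in the list above, and the sketch simply
treats every configured reqheader rule as valid for the request.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    kind: str         # hypothetical labels: "file", "uri", "reqheader", "mime", ...
    include: bool     # True for an include rule, False for an exclude rule
    fires: bool       # whether the rule matched the current request
    header: str = ""  # header name, only meaningful for "reqheader" rules


def vary_header(rules: List[Rule]) -> Optional[str]:
    """Return the Vary: value to send, or None if no header is needed."""
    static = [r for r in rules if r.kind != "reqheader"]
    reqheader = [r for r in rules if r.kind == "reqheader"]

    # Case 1: a non-level-2 exclusion fires, so the content is never compressed.
    if any(r.fires and not r.include for r in static):
        return None

    # Case 2: a non-level-2 inclusion fires and there are no reqheader
    # exclusion rules at all, so the content is always compressed.
    static_include_fired = any(r.fires and r.include for r in static)
    has_reqheader_exclude = any(not r.include for r in reqheader)
    if static_include_fired and not has_reqheader_exclude:
        return None

    # Case 3: collect the header names of the reqheader rules, whether or
    # not this particular request ends up being compressed.
    names: List[str] = []
    for r in reqheader:
        if r.header and r.header not in names:
            names.append(r.header)
    return ", ".join(names) if names else None
```

With one firing "file" include rule and one "reqheader" exclude rule
on User-Agent, `vary_header` returns "User-Agent"; without the
reqheader exclusion it returns None.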

mod_gzip currently doesn't know what "static"
rules are - it will parse the list of rules
without bothering too much about their type.
This would have to be replaced by some more
clever parsing routine, and maybe the rules
should already be stored separately for each
class, to make the class-wise parsing easier.
So the data structures for the rules might be
subject to change.

Maybe it would even suffice to sort the rules
when creating the configuration data structure,
i. e. while Apache is parsing its configuration
and invokes the module's hook function to build
up its configuration records, so that the
"reqheader" rules always come last.
In this case the strategy of aborting as soon as
it is clear whether the request should be
compressed or not may come very close to also
providing information about _why_ this decision
has been made. But this will still require some
additional storage element somewhere, which
would have to contain the "Vary:" list or an
indicator that no "Vary:" header is to be
created.
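
The sorting idea might look like the following sketch. It is Python
for brevity, and the tuple layout (kind, action, pattern) is an
assumption made for this example, not mod_gzip's real configuration
record. A stable sort keeps the admin's order within each group and
only moves the "reqheader" rules to the end.

```python
from typing import List, Tuple

# Hypothetical rule record: (kind, action, pattern)
RuleTuple = Tuple[str, str, str]


def sort_rules(rules: List[RuleTuple]) -> List[RuleTuple]:
    """Move "reqheader" rules to the end, keeping the relative order otherwise.

    sorted() is stable, and False sorts before True, so all static rules
    come first in their original configuration order.
    """
    return sorted(rules, key=lambda rule: rule[0] == "reqheader")
```

A parser that walks the sorted list and aborts on the first decisive
static rule then knows that no "reqheader" rule has been consulted.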

>> Maybe some priority list model would help:
>> a) classify the mod_gzip rule types (level 1: "uri"
>> and "file"; level 2: "reqheader"; level 3: the
>> remaining three rule classes)
> Yes.

And add here:
level 3: content size restrictions and dechunking.

>> c) use this information when the time has come to
>> _decide_ whether a "Vary:" header should be
>> sent and which information it should contain.
>> But as shown above, this will not be enough to even
>> reliably find out whether a "Vary:" header (beyond
>> "Accept-Encoding", that is) will be needed or not.
> You can quite likely put some responsibility on the
> server administrator to configure things in a
> reasonably rational manner. I don't see the need
> to attempt to cover to 100% all possible cases.

I agree, but to even tell the server administrator
what "reasonable" is, I have to understand all the
implications.
I think we have now understood that using "reqheader"
rules makes things more complicated, as with them
the 1.3.19.1b patch does _not_ solve the problem.
My next question is whether using the "minimum HTTP
level" requirement (see above) is of the same type,
and what to do with this one.

> The mod_gzip implications on ETag is only that mod_gzip
> must alter the ETag of returned responses to differentiate
> them from uncompressed reply entities for the same URI.
> A compressed and uncompressed reply entity is two
> different objects in terms of HTTP and must have
> different ETag:s if they share the same URI (or
> alternatively no ETag at all).

Is there any documentation about _how_ to create an
ETag or alter it, so as to make it unique and different
from other ETags? I had a short glance at RFC 2616
but didn't find anything like an algorithm there.
Pointing me to a URL would be great.
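
Since an ETag is an opaque quoted string to the client, one possible
approach (an assumption on my side, not something prescribed by the
RFC) would be to derive the tag of the gzipped variant from the tag
of the uncompressed one by appending a fixed suffix inside the
quotes. The sketch below is Python; `base_etag`, `variant_etag` and
the "-gzip" suffix are made-up names, and building the base tag from
size and modification time is just one convenient choice.

```python
def base_etag(size: int, mtime: int) -> str:
    """Build a strong ETag from file size and modification time (both in hex)."""
    return f'"{size:x}-{mtime:x}"'


def variant_etag(etag: str, suffix: str = "-gzip") -> str:
    """Derive a distinct ETag for the compressed variant of the same URI.

    Keeps an optional weak prefix (W/) and inserts the suffix before the
    closing quote, so the two variants can never share a tag.
    """
    weak = etag.startswith("W/")
    opaque = etag[2:] if weak else etag
    if len(opaque) < 2 or not (opaque.startswith('"') and opaque.endswith('"')):
        raise ValueError(f"not a quoted entity tag: {etag!r}")
    tagged = '"' + opaque[1:-1] + suffix + '"'
    return ("W/" if weak else "") + tagged
```

The uncompressed reply would then carry `base_etag(...)` and the
gzipped reply `variant_etag(base_etag(...))`.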

> If-None-Match is simply that the HTTP server will be able
> to tell if any of the ETag:s in the If-None-Match is
> identical to the current object entity as it would have
> been returned for this request, and if so which of them.
> If-None-Match is used by caches to query the server which
> (if any) of the previously cached reply entities for the
> URI may be sent in response to the request. It is the
> successor to If-Modified-Since and solves all issues of
> If-Modified-Since.

mod_gzip can create exactly two variants of a content,
a gzipped one and an uncompressed one.

>> I myself have an implementation of a compressing HTTP cache
>> (a Perl CGI script) that is to be embedded into Apache via
>> the "AddHandler" hook.
>> (And yes, I do send the "Vary" header. ;-)
> If this is meant to run on a HTTP server then it isn't a
> cache in terms of HTTP as it occurs before HTTP. The
> internals of Apache may resemble HTTP, but still the HTTP
> endpoint is Apache as a whole.

But gzip_cnc is doing negotiation, based on the
"Accept-Encoding: gzip" header (and nothing else,
I keep things as simple as possible for now).

>> and I would be glad to learn about everything this program
>> is doing right or wrong about handling the HTTP protocol ...
> Provided you send correct Vary headers

I hope so, it seems to be easy in my case.

> and make sure to modify ETag when you modify the reply
> entity

Same as above: How should I do it? As for now,
gzip_cnc doesn't even know about sending an ETag
at all.

> - Range requests

Uh. gzip_cnc will serve wrong data as of now, I'm
afraid, but with a status 200, not a 206. (How can
I test it? I have no tool at hand to send range
requests.)
Is there some proper HTTP status to at least reject
these requests, until I may implement supporting them?
I have never seen range requests in real life ...
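
For testing, a range request can be built with a few lines of script.
The following is a minimal sketch using Python's standard
`urllib.request` module; the helper name `build_range_request` is
made up for this example, and the URL passed in would have to point
at a server running gzip_cnc.

```python
import urllib.request


def build_range_request(url: str, first: int, last: int) -> urllib.request.Request:
    """Build a GET request for bytes first..last (inclusive) of the gzipped entity."""
    return urllib.request.Request(
        url,
        headers={
            "Accept-Encoding": "gzip",
            "Range": f"bytes={first}-{last}",
        },
    )


# To actually send it (needs a reachable server):
#   with urllib.request.urlopen(build_range_request(url, 0, 99)) as reply:
#       print(reply.status, reply.headers.get("Content-Range"))
```

A reply with status 206 and a Content-Range header means the range
was honoured; a status 200 means the full entity was sent instead.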

> - If-XXX HTTP conditions (If-None-Match, If-Modified-Since,
> If-Match, ...)

None of these implemented, not even handled as of
now. (Actually, I don't know whether Apache would
handle part of these, the documentation about what
things a handler has to cope with is a little short.)

I considered at least handling If-Modified-Since,
as this one is a frequent situation, but postponed
it to a later release.
I have not yet tried to cope with If-Match and
If-None-Match, but I'll put these on my list, thanks.

> In terms of HTTP the compressed object is a unique object
> on its own, separate from the uncompressed original.
> Care must be taken to ensure the two are clearly identified
> (ETag) and never intermixed.

In the gzip_cnc area, there is no HTTP request before
me. gzip_cnc is doing it all alone (and thus can only
compress static content, as it doesn't implement HTTP
subrequests).
So the only one who could create an ETag would be
gzip_cnc itself - and if I understand how to do this,
I will happily add ETags to each and every response.

> A Range request on a compressed object is requesting a
> range of the compressed entity, not the original
> uncompressed one.

Ah, this seems logical, as this will allow several
ranges to be combined, as they are parts of the same
total, regardless which ranges they cover.

So I can simply read the compressed file from the
cache and only submit the part that has been requested
as a range ... and add the proper HTTP headers, of
course.
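
Serving a single byte range from the cached compressed file could be
sketched like this. It is Python, not the Perl that gzip_cnc is
written in, and `serve_range` with its return shape (status, headers,
body) is a hypothetical helper for illustration only. The offsets
refer to the compressed bytes, as described above.

```python
from typing import Dict, Tuple


def serve_range(compressed: bytes, first: int, last: int) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for bytes first..last of the gzipped entity."""
    total = len(compressed)

    # A range starting beyond the end of the entity cannot be satisfied.
    if first < 0 or first > last or first >= total:
        return 416, {"Content-Range": f"bytes */{total}"}, b""

    # Clamp the end of the range to the last available byte.
    last = min(last, total - 1)
    body = compressed[first:last + 1]
    headers = {
        "Content-Encoding": "gzip",
        "Content-Range": f"bytes {first}-{last}/{total}",
        "Content-Length": str(len(body)),
        "Vary": "Accept-Encoding",
    }
    return 206, headers, body
```

Two adjacent ranges served this way concatenate to the same
compressed entity, which is what allows a client to combine them.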

Greetings, Michael
Received on Wed Aug 28 2002 - 15:48:49 MDT
