Antwort: Re: Antwort: Re: Antwort: Re: Antwort: Re: Antwort: [Mod _gzip] Vary: header and mod_gzip

From: <Michael.Schroepl@dont-contact.us>
Date: Wed, 28 Aug 2002 22:48:03 +0200

Hi Hendryk,

>> I am only afraid that already "Vary: UserAgent" would
>> be a thing we don't like to get, as this might cause
>> a very large number of variants to be stored in the proxy
>> cache, for the sake of a simple Netscape 4 exclusion ...
> Well, you don't really have a choice. It is all or nothing
> here. If you do not include User-Agent then caches will not
> have a chance of knowing.

Agreed.

My point is that from this situation I tend to derive
the warning: "mod_gzip users, be aware that if you
use "reqheader" rules matching the User-Agent header,
then you will cause Squid 2.5 proxies to store lots
of different entries for what is potentially the same
resource, so check whether you _really_ need this
type of configuration rule".

After all, the Apache admin running mod_gzip has a
choice: he can either
a) use the "reqheader" directive to support the broken
   User-Agents by sending them uncompressed content, or
b) ignore the broken User-Agents and serve content that
   works best for caching proxies, with a short "Vary"
   header, thus supporting the correctly working
   User-Agents.
The fewer broken browsers are around, the more likely
it is that scenario b) will work best.
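For illustration, option a) might look like the following
configuration fragment (the directive names follow the mod_gzip 1.3
documentation; the exact User-Agent pattern is only an example):

```apache
# Option a): exclude a known-broken browser via a "reqheader" rule.
# The response now depends on the User-Agent header, so caches must
# be told about it - i.e. "Vary: Accept-Encoding, User-Agent".
mod_gzip_on            Yes
mod_gzip_item_exclude  reqheader  "User-Agent: Mozilla/4\.0[678]"

# Option b): no "reqheader" rules at all - the response depends only
# on Accept-Encoding, so a short "Vary: Accept-Encoding" suffices.
```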

> The excessive caching will be remedied in Squid-2.6,
> provided a mod_gzip enabled Apache is capable of correct
> ETag and If-None-Match processing.

If I understand correctly, the requirements Squid 2.6
imposes on mod_gzip and Apache are such that if
compression via negotiation is in use, so that a URI
is generally mapped to a set of (exactly two)
entities with different ETags, then the whole HTTP
handling of Apache would have to take that into
consideration.

Any "If-None-Match" processing seems to first of all
require Apache to compute both of these possible ETag
values and to check which of the cases in
     http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.26
applies. And this would have to happen even before
the request itself is handled, although the output of
that request only much later becomes input to
mod_gzip, which inspects it to decide whether it
should be compressed or not.

I may be wrong, but as far as my understanding of the
source code goes, mod_gzip doesn't try to modify
anything in Apache's general request handling.
All that mod_gzip currently seems to be interested
in is to
- take a content buffer and a set of HTTP headers,
- decide whether to compress,
- if so, compress and add some HTTP headers,
and not much more.
By that time, Apache's request handling has already
been completed to the extent that all of the HTTP
response headers and even an HTTP status code exist.

One basic idea of mod_gzip is: "there may be many
different HTTP status codes, but the only one I am
going to understand is status 200, because other
HTTP status codes are rare anyway and/or don't carry
an HTTP body that would be subject to compression.
So I am going to decline to compress anything but
HTTP 200 responses."

Source code for this, mod_gzip.c line 6200ff:

 if ( resp_code != 200 )
   {
    #ifdef MOD_GZIP_DEBUG1
    mod_gzip_printf( "%s: resp_code is NOT '200'...",cn);
    mod_gzip_printf( "%s: Issuing send_as_is++",cn);
    #endif

    send_as_is++;

    #ifdef MOD_GZIP_USES_APACHE_LOGS
    mod_gzip_strcat( lbuf, ":NO_200");
    #endif
   }

From this moment on, mod_gzip has decided to send the
content "as is", i.e. not to compress it.
But this is all that mod_gzip 1.3.19.1* does with the
HTTP status. In particular, it will never change it.

All the decisions about complex HTTP protocol issues
seem to have been handled a long time earlier during
the request handling. mod_gzip steps in when all this
has finally been settled, checks whether it should
compress and does what it believes it has to do. It
doesn't rewrite the whole HTTP processing of Apache.

So if the requirement for _Apache_ is to handle
If-None-Match, then I believe the places in the code
to do so would be far away from the code that
currently makes up mod_gzip.
I am not even sure that mod_gzip would technically be
able to get control at the right point in the request
processing chain. One might ask the Apache Group
about these things.

On the other hand, Apache 1.3.26 _does_ already have
some "If-None-Match" support.
The source code file "main/http_protocol.c" contains
a lot of stuff that seems to handle the "If-None-Match"
thing.
There is a function named
     ap_meets_conditions(request_rec *r)
that has stuff like

    /* If an If-None-Match request-header field was given
     * AND the field value is "*" (meaning match anything)
     * OR our ETag matches any of the entity tags in that field, fail.
     *
     * If the request method was GET or HEAD, failure means the server
     * SHOULD respond with a 304 (Not Modified) response.
     * For all other request methods, failure means the server MUST
     * respond with a status of 412 (Precondition Failed).
     *
     * GET or HEAD allow weak etag comparison, all other methods require
     * strong comparison. We can only use weak if it's not a range
     * request.
     */
    if_nonematch = ap_table_get(r->headers_in, "If-None-Match");
    if (if_nonematch != NULL) {
        if (r->method_number == M_GET) {
            if (if_nonematch[0] == '*')
                return HTTP_NOT_MODIFIED;
            if (etag != NULL) {
                if (ap_table_get(r->headers_in, "Range")) {
                    if (etag[0] != 'W' &&
                        ap_find_list_item(r->pool, if_nonematch, etag)) {
                        return HTTP_NOT_MODIFIED;
                    }
                }
                else if (strstr(if_nonematch, etag)) {
                    return HTTP_NOT_MODIFIED;
                }
            }
        }
        else if (if_nonematch[0] == '*' ||
                 (etag != NULL &&
                  ap_find_list_item(r->pool, if_nonematch, etag))) {
            return HTTP_PRECONDITION_FAILED;
        }
    }

But of course this Apache core function has no clue
whether some third-party external compression module
is installed, or even willing to compress anything,
thereby causing the existence of at least two
different potential ETags for the resource being
requested.

So unless I have missed some important point, my
conclusion would be: if the correct use of
Content-Encoding according to HTTP/1.1 requires
Apache to change its basic HTTP behaviour, i.e. to
handle "If-None-Match" headers differently and to be
aware of two separate entities with separate ETags,
then it might be rather difficult to do this outside
the Apache core in an add-on module. In this case the
whole Apache module plug-in API would not provide any
solution that satisfies the needs of Squid 2.6.
It might imply that mod_gzip (or whichever compression
module) would have to be embedded into the Apache
core, as an existing Apache core function would
probably never be allowed to depend upon some
non-core functionality.
Or it might imply that some future mod_gzip version
trying to meet the requirements of Squid 2.6 would
need to patch the Apache core.
In that case I don't see a solution for Apache
installations that can only load third-party modules
as DSOs and cannot recompile the whole Apache source
code - which covers a whole lot of installations, and
even more once Apache is used more frequently on
Windows.

I am surprised to arrive at such a conclusion. The
Apache Group gave me the impression that they
introduced their "filtering scheme" (allowing content
to be easily forwarded from handler to handler)
mainly for purposes like compression, which currently
indeed is not much more than a filter at the back end
of the processing chain.
But if I now learn that the whole Content-Encoding
business is much more than just negotiating an
"Accept-Encoding" header and compressing a buffer,
and instead needs the whole HTTP processing to know
about the existence of two related entities, then I
no longer understand how the Apache Group ever
planned to solve issues like the one we are
discussing. After all, they are shipping a
mod_deflate themselves.

I really _hope_ that I have failed to understand
something on the way to this point ...

Greetings, Michael
Received on Wed Aug 28 2002 - 18:50:04 MDT