Re: Caching dynamic pages (CGI) for one service/server

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Thu, 13 May 1999 07:35:52 +0200

Nottingham, Mark (Australia) wrote:

> Where do you see this? I can't find any reference to the cacheability of
> queries. The first paragraph of 13.4 has a bit to say about this situation.

Last paragraph of section 13.9.

   We note one exception to this rule: since some applications have
   traditionally used GETs and HEADs with query URLs (those containing a
   "?" in the rel_path part) to perform operations with significant side
   effects, caches MUST NOT treat responses to such URIs as fresh unless
   the server provides an explicit expiration time. This specifically
   means that responses from HTTP/1.0 servers for such URIs SHOULD NOT
   be taken from a cache. See section 9.1.1 for related information.
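
As a rough illustration of the quoted rule, here is a minimal Python sketch. The function name and its parameters are hypothetical and this is not Squid code; it only assumes that the caller already knows whether the response carried an explicit expiration time (Expires or max-age).

```python
def heuristic_freshness_allowed(url, has_explicit_expiry):
    """Hypothetical helper: may a cache treat this response as fresh?

    Illustrates the exception in RFC 2616 section 13.9: responses to
    query URLs must not be treated as fresh unless the server supplied
    an explicit expiration time.
    """
    # Ignore any fragment, then look for "?" in what remains.
    without_fragment = url.split("#", 1)[0]
    is_query_url = "?" in without_fragment
    return (not is_query_url) or has_explicit_expiry
```

For example, `heuristic_freshness_allowed("http://example.com/cgi-bin/search?q=squid", False)` returns `False`, while the same URL with explicit expiry information returns `True`.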

> > If there are control directives that deny caching, the request type
> > indicates an uncacheable reply, the response code is defined as
> > uncacheable, or the object requires immediate revalidation
> > (refresh_pattern/expire says it is stale), then the object is not cached.
>
> If I read you correctly, I don't think that's quite true (or at least it
> shouldn't be); suppose you've got a generated object that requires
> revalidation on every hit, but can be validated successfully? The way to do

I was a bit unclear. The criterion is "the object requires
immediate revalidation and has no last-modified time". This is based on the
assumption that if the server does not send a Last-Modified header, then
it will likely not support If-Modified-Since either.
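
The storage criterion described above can be sketched in a few lines of Python. The function and parameter names are hypothetical; the real decision in Squid involves more inputs than these two flags.

```python
def should_store(requires_immediate_revalidation, has_last_modified):
    """Hypothetical sketch of the storage criterion from the text.

    An object that is stale on arrival and carries no Last-Modified
    time is assumed to be impossible to revalidate with
    If-Modified-Since, so storing it would gain nothing.
    """
    return not (requires_immediate_revalidation and not has_last_modified)
```

With this rule, an object with `max-age=0, must-revalidate` and a Last-Modified header is still stored, since it can be validated on each request.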

> Expires: [sometime in the past]
> Cache-Control: max-age=0, must-revalidate
> Last-Modified: [now]
>
> This object *should* be stored, but validated on each subsequent request.

It is. Except that Squid does not currently fully understand the priority
order of Expires and max-age. Not that it matters in this specific example,
or in many other practical examples for that matter.

> This is a first cut of how I think it may want to be happening...

Phew.. I intentionally did not include all directives, only those that users
(i.e. Squid admins) need to know about. But let's try to define a "complete"
algorithm.

> # weed out uncacheable requests
> if (CLIENT_MAX_AGE)
> if (AGE > CLIENT_MAX_AGE)
> return STALE

Fine.

> if (CLIENT_CC_NO_CACHE)
> return STALE

The CC_NO_CACHE directive is strong and requires a completely fresh copy to
be fetched (unless it is a conditional request). It is not a matter of
stale/fresh. But we have options for overriding client no-cache, so this
check does belong here as well.

> # weed out uncacheable responses
> if (UNCACHEABLE_METHOD or \
> UNCACHEABLE_STATUS \
> or CC_NO_CACHE)
> return STALE

The "cacheable response" check also does not belong here. It belongs to
the definition of a cacheable object. Objects that do not meet that
definition should not be seen in the cache in the first place.

> if (AGE > OVERRIDE_MAX_AGE)
> return STALE

Fine. There should be some means of specifying an upper limit on the age,
regardless of what the server says.

> # Think about expiration
> if (CC_MAX_AGE or CC_S_MAXAGE or EXPIRES) {
> if (! CC_MUST_REVALIDATE and \
> ! CC_PROXY_REVALIDATE and \
> AGE <= OVERRIDE_MIN_AGE)
> return FRESH
> if (CC_MAX_AGE or CC_S_MAXAGE) {
> if (AGE > CC_MAX_AGE or AGE > CC_S_MAXAGE)
> return STALE
> else
> return FRESH
> }
> if (EXPIRES <= NOW)
> return STALE
> else
> return FRESH
> }

The *_REVALIDATE options require the object to be revalidated on their
own, regardless of age. It is wrong to have these inside the maxage/expires
check.

The above should probably read

    # Server requested revalidation
    if (CC_MUST_REVALIDATE or CC_PROXY_REVALIDATE or CC_NO_CACHE)
        return STALE

    # Think about expiration
    if (CC_MAX_AGE or CC_S_MAXAGE or EXPIRES) {
        if (AGE < OVERRIDE_MIN_AGE)
            return FRESH
        if (CC_MAX_AGE or CC_S_MAXAGE)
            if (AGE > CC_MAX_AGE or AGE > CC_S_MAXAGE)
                return STALE
            else
                return FRESH
        else if (EXPIRES <= NOW)
            return STALE
        else
            return FRESH
    }
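
For readers who want to experiment, the revised pseudocode above can be rendered as a small runnable Python sketch. The `Reply` class and its field names are hypothetical stand-ins for the upper-case names in the pseudocode, times are plain integers in seconds, and an unset max-age or s-maxage is simply skipped in the comparison, which is one possible reading of the original.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reply:
    """Hypothetical container for the values used in the pseudocode."""
    age: int                         # AGE, in seconds
    max_age: Optional[int] = None    # CC_MAX_AGE
    s_maxage: Optional[int] = None   # CC_S_MAXAGE
    expires: Optional[int] = None    # EXPIRES, absolute time in seconds
    must_revalidate: bool = False    # CC_MUST_REVALIDATE
    proxy_revalidate: bool = False   # CC_PROXY_REVALIDATE
    no_cache: bool = False           # CC_NO_CACHE


def expiration_check(r, now, override_min_age=0):
    """Return "FRESH", "STALE", or None to fall through to later checks."""
    # Server requested revalidation
    if r.must_revalidate or r.proxy_revalidate or r.no_cache:
        return "STALE"

    # Think about expiration
    has_expiry = (r.max_age is not None or r.s_maxage is not None
                  or r.expires is not None)
    if not has_expiry:
        return None
    if r.age < override_min_age:
        return "FRESH"
    limits = [v for v in (r.max_age, r.s_maxage) if v is not None]
    if limits:
        return "STALE" if any(r.age > v for v in limits) else "FRESH"
    return "STALE" if r.expires <= now else "FRESH"
```

An object sent with `max-age=0, must-revalidate` comes out as STALE here, so it is stored but revalidated on every request, as discussed earlier in the thread.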

>
> # Think about validation (no ETags yet)
> if (LAST_MODIFIED) {
> if (LM_FACTOR < PERCENT)
> return FRESH
> }

There should be an "else return STALE" above.

> # Dateless origin servers
> if (! DATE) {
> if (AGE <= OVERRIDE_MIN_AGE)
> return FRESH
> }

I think we can ignore dateless servers for now. But we do need a way to
specify the cacheability of objects without any expiry information, to
enable selective caching of search engine queries and other "read-only"
scripts which do not provide any expiry information.

    if (AGE < FORCED_DEFAULT_MIN_AGE)
        return FRESH

>
> # fallthrough
> return STALE
>
> I haven't considered Cache-Control: min-fresh and max-stale request headers;
> maybe these can be taken care of by fiddling the age? Hmm.
> Just noticed something... upon reading 14.9.3, end of first paragraph, a CC:
> max-age implies CC: public. Wow.

Yes, I believe so.

Probably the whole section should begin with a data collection phase,
gathering what data is available.

Also this may be greatly simplified if the first step when an object is
seen is to calculate the expiry time for the object. Then we get a clear
separation of request and response values, and a clear definition of
freshness/staleness where min-fresh and max-stale easily can be supported
as well as the other client cache-control directives.
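
A minimal Python sketch of that two-step idea follows. Both function names are hypothetical, times are plain integers in seconds, and only the value-carrying forms of min-fresh and max-stale are handled (a bare max-stale with no value is not). Per RFC 2616 section 14.9.3, max-age overrides Expires when both are present.

```python
def expiry_time(date, max_age=None, expires=None):
    """Response side: compute one absolute expiry time when the object is seen.

    Returns None when the server gave no expiry information.
    """
    if max_age is not None:
        return date + max_age   # max-age takes priority over Expires
    if expires is not None:
        return expires
    return None


def acceptable(expiry, now, min_fresh=0, max_stale=0):
    """Request side: apply the client's min-fresh / max-stale to the expiry."""
    if expiry is None:
        return False
    remaining = expiry - now    # zero or negative once the object has expired
    if remaining > 0:
        return remaining >= min_fresh
    # Expired: only acceptable if the client tolerates this much staleness.
    return max_stale > 0 and -remaining <= max_stale
```

With the expiry precomputed, each client directive becomes a simple comparison against one number instead of another branch in the freshness algorithm.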

--
Henrik Nordstrom
Spare time Squid hacker
Received on Tue Jul 29 2003 - 13:15:58 MDT
