RE: Caching dynamic pages (CGI) for one service/server

From: Nottingham, Mark (Australia) <mark_nottingham@dont-contact.us>
Date: Thu, 13 May 1999 16:49:48 +1000

> Last paragraph of section 13.9.

Hmm. It's a bit ambiguous (big surprise). First it says that query URLs
shouldn't be treated as fresh unless they have an explicit expiration time,
and then it goes on to specify that responses from HTTP/1.0 servers for these
URLs should not be taken from the cache.

"Treated as fresh" and "taken from the cache" are two different things;
"taken from the cache" implies that the cached copy shouldn't even be used
for validation.

I know this is splitting hairs, but I have a specific application in mind.
If someone has implemented validation on an object with a query string, and
the server advertises itself as 1.0, they lose unless they specify
expiration. Ah, well, it's a matter for the WG, perhaps. At least they don't
say anything about /cgi-bin/.
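To make the difference between the two readings concrete, here is a minimal sketch of how a cache might handle a stored response for a query URL. The function name, the `Action` values and the `strict_reading` switch are all hypothetical; the sketch also assumes the cache already knows whether the response carried an explicit expiration time and a validator.

```python
from enum import Enum


class Action(Enum):
    SERVE = "serve the cached copy without contacting the origin"
    REVALIDATE = "send a conditional request based on the cached copy"
    FORWARD = "ignore any cached copy and fetch a new response"


def query_url_action(*, has_explicit_expiry: bool, expiry_passed: bool,
                     server_is_http10: bool, has_validator: bool,
                     strict_reading: bool) -> Action:
    """Handle a cached response whose request URL contains '?'.

    strict_reading=True: responses from HTTP/1.0 servers are not taken
    from the cache at all. strict_reading=False: they are merely never
    treated as fresh, so the cached copy may still be revalidated.
    """
    if has_explicit_expiry:
        # The query-URL restriction only covers responses without explicit expiry.
        if not expiry_passed:
            return Action.SERVE
        return Action.REVALIDATE if has_validator else Action.FORWARD
    # No explicit expiry: never treat the cached copy as fresh.
    if strict_reading and server_is_http10:
        return Action.FORWARD  # the cached copy is not used, not even to validate
    return Action.REVALIDATE if has_validator else Action.FORWARD
```

For the case above (an HTTP/1.0 server, a `Last-Modified` validator, no `Expires:`), the strict reading yields `Action.FORWARD` while the looser one yields `Action.REVALIDATE`; that difference is exactly what the validating application loses.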

In practical terms, IMHO a cache shouldn't just mark all queries uncacheable
willy-nilly; it discourages people from working to make them cacheable. I
was very heartened to see that change in Squid 2.

> CC_NO_CACHE directive is strong and requires a completely fresh copy to
> be fetched (unless it is a conditional request). It is not a matter of
> stale/fresh. But we have options for overriding client no-cache, so
> this check does belong here as well.

Yes. It's unfortunate that no-cache means such different things depending on
whether it appears in a request or in a response.
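On the request side, the behaviour described in the quote can be sketched roughly as follows. The tiny Cache-Control parser ignores quoted-string subtleties, and `override_client_no_cache` is a hypothetical name standing in for whatever operator option a cache offers for overriding client no-cache.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Minimal parser: 'no-cache, max-age=60' -> {'no-cache': None, 'max-age': '60'}.

    Does not handle commas inside quoted strings.
    """
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def request_action(request_cc: str, cached_is_fresh: bool,
                   override_client_no_cache: bool = False) -> str:
    """Request-side no-cache: the stored copy may not satisfy this request."""
    cc = parse_cache_control(request_cc)
    if "no-cache" in cc and not override_client_no_cache:
        return "forward"  # reload from the origin; freshness is irrelevant here
    return "serve" if cached_is_fresh else "revalidate"
```

The point of the sketch is that request no-cache short-circuits the freshness logic entirely, unless the operator has chosen to override it.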

> The "cachable response" check also does not belong here. It belongs to
> the definition of a cachable object. Objects not meeting it
> should not be seen in the cache in the first place.

Yup, with the exception of CC_NO_CACHE (response header): it can be cached,
it just can't be considered FRESH.
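A minimal sketch of that split, assuming a shared cache and parsed Cache-Control directives held in a dict: storability is decided once when the response arrives, freshness on every hit, and response no-cache only affects the latter. The set of status codes treated as cacheable by default is an assumption of this sketch, and the function names are hypothetical.

```python
from typing import Dict, Optional

Directives = Dict[str, Optional[str]]

# Status codes this sketch treats as cacheable by default (an assumption).
CACHEABLE_STATUSES = {200, 203, 300, 301, 410}


def is_storable(status: int, response_cc: Directives) -> bool:
    """Decided once, when the response arrives (shared cache assumed)."""
    if "no-store" in response_cc or "private" in response_cc:
        return False
    # "no-cache" is deliberately not a reason to refuse storage.
    return status in CACHEABLE_STATUSES


def is_fresh(response_cc: Directives, current_age: float,
             freshness_lifetime: float) -> bool:
    """Decided on every hit; a stored no-cache response is never fresh."""
    if "no-cache" in response_cc:
        return False
    return current_age < freshness_lifetime
```

A response carrying `Cache-Control: no-cache` therefore passes `is_storable` but always fails `is_fresh`, so every use of it goes through revalidation.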

> The *_VALIDATE options require the object to be revalidated by themselves
> alone. It is wrong to have these inside the maxage/expires check.

See the previous thread about the meaning of *-revalidate. IMHO it does belong
here.
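One way to see why must-revalidate fits inside the max-age/Expires check: it has no effect while the object is fresh and only matters once it is stale, where it stops the cache from honouring a client's max-stale allowance. The sketch below assumes ages in seconds and uses hypothetical names; proxy-revalidate is treated the same way but only for a shared cache.

```python
from typing import Dict, Optional


def hit_decision(current_age: float, freshness_lifetime: float,
                 response_cc: Dict[str, Optional[str]],
                 max_stale: Optional[float], shared_cache: bool = True) -> str:
    """max_stale is None when the client sent no max-stale directive."""
    if current_age < freshness_lifetime:
        return "serve"  # *-revalidate has no effect while the object is fresh
    must_revalidate = "must-revalidate" in response_cc or (
        shared_cache and "proxy-revalidate" in response_cc)
    staleness = current_age - freshness_lifetime
    if not must_revalidate and max_stale is not None and staleness <= max_stale:
        return "serve-stale"  # client explicitly accepts this much staleness
    return "revalidate"
```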

> > # Think about validation (no ETags yet)
> > if (LAST_MODIFIED) {
> >     if (LM_FACTOR < PERCENT)
> >         return FRESH
> > }
>
> There should be an else STALE above.

Yes.
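For reference, the quoted pseudocode with the agreed `else STALE` might look like this in Python. The definition of LM_FACTOR is an assumption here: time spent in the cache since `Date`, divided by the age the resource already had at `Date` (`Date - Last-Modified`). `percent` is a fraction such as 0.2, and returning `None` stands for "no Last-Modified, fall through to the next rule", mirroring the outer `if` in the pseudocode.

```python
from typing import Optional


def lm_heuristic(now: float, date: float, last_modified: Optional[float],
                 percent: float) -> Optional[str]:
    """Last-modified factor heuristic; times are seconds since the epoch."""
    if last_modified is None:
        return None  # no Last-Modified: this rule does not apply
    resource_age = date - last_modified
    if resource_age <= 0:
        return "STALE"  # modified at or after Date: no basis for a guess
    lm_factor = (now - date) / resource_age
    if lm_factor < percent:
        return "FRESH"
    else:
        return "STALE"
```

For example, an object last modified 1000 seconds before its `Date` stays FRESH for 200 seconds with `percent=0.2`.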

> But we do need a way to specify cachability of objects
> without any expiry information, to enable
> selective caching of search engine queries and other
> "read-only" scripts which do not provide any expiry information.

If it doesn't have a validator, and it doesn't have any age hints, how can
you? Matching a regex against the URL isn't going to catch all of the truly
dynamic ones, and users getting incoherent results will IMHO cause a lot of
trouble and bad press for caching.

I'm very heartened by http://www.alltheweb.com/ - it's very cacheable.
(Their results don't support validation, but they do carry an Expires: header.)

> Probably the whole section should begin with a data collection phase,
> gathering what data is available.

Ahh, *implementation* -- this is where I step out <0.6 wink>!

> Also this may be greatly simplified if the first step when an
> object is seen is to calculate the expiry time for the
> object. Then we get a clear separation of request and
> response values, and a clear definition of
> freshness/staleness where min-fresh and max-stale can easily
> be supported, as well as the other client cache-control directives.
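A minimal sketch of the "compute expiry first" idea, assuming times in seconds and ignoring the `Age` header and clock skew for brevity: the expiry is fixed once when the response is seen, and each request's min-fresh and max-stale then become simple comparisons against it. All names are hypothetical, and must-revalidate is left out of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedObject:
    received_at: float
    expires_at: float  # computed once, when the response is first seen


def compute_expiry(received_at: float, max_age: Optional[float],
                   expires: Optional[float], date: Optional[float],
                   heuristic_lifetime: float) -> float:
    """max-age wins over Expires; otherwise fall back to a heuristic lifetime."""
    if max_age is not None:
        return received_at + max_age
    if expires is not None and date is not None:
        return received_at + (expires - date)  # Expires relative to the server's Date
    return received_at + heuristic_lifetime


def acceptable_for_client(obj: CachedObject, now: float,
                          min_fresh: Optional[float] = None,
                          max_stale: Optional[float] = None) -> bool:
    remaining = obj.expires_at - now  # negative once the object is stale
    if min_fresh is not None and remaining < min_fresh:
        return False  # client wants at least min_fresh seconds of freshness left
    if remaining > 0:
        return True
    # Stale: acceptable only within the client's max-stale allowance.
    return max_stale is not None and -remaining <= max_stale
```

For instance, an object stored at time 0 with `max-age=60` is acceptable at time 30, rejected at time 30 by a client sending `min-fresh=40`, and still acceptable at time 70 to a client sending `max-stale=15`.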

As long as that information isn't used to determine whether the object
should be cached as well -- this is what NetCache does (or at least did),
and it caused no end of problems. We couldn't give the cache a stale object
that was capable of being validated.
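The problem can be illustrated by comparing two hypothetical storage policies. This does not reproduce NetCache's actual logic; it only contrasts a policy of the kind described (store only what is still fresh on arrival) with one that also keeps stale objects that carry a validator. Header names are assumed to be lower-cased, and other storability checks (no-store and so on) are omitted.

```python
from typing import Mapping


def has_validator(headers: Mapping[str, str]) -> bool:
    return "last-modified" in headers or "etag" in headers


def store_if_fresh(expires_at: float, now: float) -> bool:
    """Expiry-driven policy: refuses anything already stale on arrival."""
    return expires_at > now


def store_if_useful(headers: Mapping[str, str], expires_at: float, now: float) -> bool:
    """Also keeps stale objects, as long as they can be revalidated later."""
    return expires_at > now or has_validator(headers)
```

With the second policy, a stale object that carries `Last-Modified` is kept, and the next request can turn it into a cheap conditional request (often answered with 304 Not Modified) rather than a full fetch.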
Received on Tue Jul 29 2003 - 13:15:58 MDT
