Re: default cache_stoplist question

From: WWW server manager <webadm@dont-contact.us>
Date: Fri, 17 Oct 1997 14:15:45 +0100 (BST)

Jon Peatfield wrote:
>
> > squid doesn't cache cgi-bin and ? because these are the most common
> > methods of showing that something is a dynamically-generated page in
> > the URL. Other methods using HTTP headers exist, but are more
> > difficult to use, and aren't portable between different brands of web
> > server, so users tend not to use them.
>
> dynamically-generated does not imply uncacheable. For those pages which
> shouldn't be cached the script ought to insert headers to mark them as such.

Well, HTTP 1.0 had no way for a server to mark documents explicitly as
uncacheable, so expecting them to do so is not really an option (and just
because Apache supports HTTP 1.1 doesn't mean everyone will update their
scripts to send additional HTTP 1.1 headers!).
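
(For the record, the "additional HTTP 1.1 headers" in question would be
something along these lines in the script's response - illustrative only,
I haven't tested how Squid treats each of them:

    Cache-Control: no-cache    (HTTP 1.1: revalidate before any reuse)
    Cache-Control: no-store    (HTTP 1.1: don't store the response at all)

The nearest HTTP 1.0 workaround I know of is an Expires: date in the past,
which again depends on the script author bothering to send it.)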

However, assuming (as should be the case) that Squid will not cache any
document that has neither a Last-Modified: header nor an Expires: header, it
*ought* to be safe to allow caching of URLs mentioning cgi-bin and ?. It
would be a bit bizarre for a script to include a Last-Modified: header
unless its response was stable enough over time for a last-modification
time to be meaningful; and if Expires: is included, that is as strong an
indication as for any other document (i.e. no guarantees!) that caching up
to that time (but possibly checking sooner) is reasonable.
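
(To put that concretely, the only script responses I'd expect Squid to
cache under that assumption are ones carrying headers along the lines of

    Last-Modified: Fri, 17 Oct 1997 09:00:00 GMT
    Expires: Sat, 18 Oct 1997 09:00:00 GMT

- the dates are made up for illustration - and a script which goes to the
trouble of sending those presumably means them.)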

Note that many scripts on the web are run from URLs that don't mention
cgi-bin, and I don't recall hearing howls of protest about their responses
being cached inappropriately.

Perhaps more of an issue for "?" is that while in some cases (e.g. a link
from another page with a hard-wired query string) many people may well
request the same URL, in other cases - even though the response to a
particular query could be cached safely - the chances of anyone else ever
submitting the same query are minimal.

If I remember correctly, Netscape's proxy server has a configuration option
allowing you to specify the maximum length of query string for which a
response will be cached. For example, you could allow URLs with query
strings up to 16 characters to be cached, subject to the other cacheability
checks, but not longer queries (the likelihood of repeat requests typically
decreasing with query string length).

At a quick glance, I can't see any way to tell Squid that URLs with query
strings beyond a certain length are not worth caching; should it have one?
Then again, perhaps so few query responses are cacheable (at present) that
caching a few more documents which will never be requested again before
they are discarded hardly matters.
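
(If Squid had a regex-based variant of cache_stoplist - I haven't checked
whether anything of the sort exists, so treat this as speculation - the
Netscape-style limit could presumably be expressed with a pattern such as

    \?.{17,}

i.e. a "?" followed by more than 16 characters, marking long-query URLs as
not worth caching.)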

> I've turned off the cgi-bin and ? entries in our Squid, and I'll see if I get
> lots of complaints. Personally I think that there will be few problems.

I haven't done that yet, but had been thinking about doing it.

Is there perhaps a case for dropping the cache_stoplist entry but
retaining the hierarchy_stoplist entry, on the grounds that most (or many)
such URLs will have uncacheable responses, so that bypassing parents
is sensible, while caching locally anything which was - atypically -
cacheable from such a URL?
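
Concretely - and untested, assuming the stock default entries - what I have
in mind would look something like this in squid.conf:

    # stock defaults (from memory):
    #   cache_stoplist      cgi-bin ?
    #   hierarchy_stoplist  cgi-bin ?
    #
    # Drop the cache_stoplist entry so that (rarely) cacheable script
    # output can be cached locally, but keep hierarchy_stoplist so such
    # requests still go direct rather than via parent caches.
    hierarchy_stoplist cgi-bin ?

Whether that is actually a win in practice is, of course, the question.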

                                John Line

-- 
University of Cambridge WWW manager account (usually John Line)
Send general WWW-related enquiries to webmaster@ucs.cam.ac.uk
Received on Fri Oct 17 1997 - 06:28:47 MDT
