"Uncacheable" documents getting cached...

From: as web server manager <webadm@dont-contact.us>
Date: Mon, 9 Nov 1998 14:47:36 +0000 (GMT)

[Summary: if the origin server's clock is running fast (or it generates
incorrect timestamps as though it were), documents which should be
uncacheable (no last-mod or expires timestamps, etc.) may be cached and cause
problems in consequence.]

Investigation of a problem reported by a user of our cache found that the
culprit was a dynamically-generated document at a remote server which was
being cached when it should not have been. As a result, multiple users
received the same version of the page, with the same link URL containing a
token which was clearly a session ID of some sort. More importantly, the
document stayed cached long enough that people could receive it after the
origin server had timed out the session ID and no longer deemed it valid,
giving an error; it broke things for the users, not just for session-tracking
by the server's operators. Without that situation to prompt investigation,
it's the sort of problem that could easily go unnoticed...

The document concerned did not have a Last-Modified: header but did have an
Expires: header, though that was syntactically invalid and was correctly
ignored by Squid (it actually read "Expires: content", which does not
correspond to anything I can find in the HTTP/1.0 or HTTP/1.1
specifications). Thus it should have behaved like most dynamic documents and
been deemed stale any time it was referenced subsequently, but with both
1.NOVM.22 and 2.0PATCH2 it was cached by our server (and indeed, the copy
that caused the initial problem was fetched from a parent cache as a hit
there). I tried a CGI-generated page on another server (again with no
Last-Modified: or Expires: headers, nor anything else to actively suppress
caching) and that behaved as expected, so I don't think it's just me getting
confused about what should happen...

The explanation was that the origin server was quoting the wrong time in its
Date: headers, an hour ahead of the true GMT time; our cache uses NTP and
its idea of the time should be very close to correct. The effect was that,
for the problem page, the test "age <= min" (with min=0) was succeeding
because the age was negative. For another page on that server, one which did
carry a last-modification timestamp, the debug 22,3 output showed the same
overall effect but with differences of detail between Squid versions:
1.NOVM.22 declared it fresh because age was less than min, while 2.0PATCH2
did so because the (negative) LM factor was less than the cutoff percentage.
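
For illustration, a rough sketch along these lines (not Squid's actual code;
the variable names are made up) shows how a fast origin clock produces a
negative age that slips past the test:

    /* Illustrative sketch only -- not Squid source.  Shows how a Date:
     * header an hour in the future makes the computed age negative, so
     * the "age <= min" test (with min = 0) declares the object fresh. */
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        time_t now = time(NULL);        /* the cache's (NTP-disciplined) clock */
        time_t date_hdr = now + 3600;   /* origin's Date: header, an hour fast */
        long age = (long) difftime(now, date_hdr);  /* comes out as -3600      */
        long min_fresh = 0;             /* the "min" in the refresh rules      */

        if (age <= min_fresh)
            printf("fresh: age %ld <= min %ld, served from cache\n",
                   age, min_fresh);
        else
            printf("stale: revalidate with the origin server\n");
        return 0;
    }

With a correct Date: the age would be zero or positive and would only grow
while the object sat in the cache, so on any later reference the test would
fail and the page would be treated as stale, as intended.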

The HTTP/1.1 specification says nothing (in its section on heuristic expiry)
about how to handle obviously bogus Date: (or Last-Modified:) timestamps,
but it does make clear that you have to use the origin's timestamp rather
than the time at which you received the document (most obviously so that the
case where the document has already been cached elsewhere for some period of
time is handled sensibly). It assumes that clocks will be reasonably closely
synchronised but, beyond suggesting that web servers and caches should use
NTP or similar, dodges the issue.

So: if a web server erroneously serves documents with a time
["significantly"] in the future (compared to the Squid cache server's idea
of time), they are likely to be cached when they should (and maybe are
required to) be uncacheable, or to be considered fresh for longer than
should be the case; the former is more likely to cause problems.

Is it (a) reasonable and (b) feasible for Squid to declare documents "bogus"
(more specifically, uncacheable) if, at the time they are received, they
have a Date: (or Last-Modified:) header which is in the future by more than
a trivial amount (a few seconds, or minutes at the most)? The alternative
would be to save the cache server's current date/time instead of the Date:
header value, but that's potentially bad in a variety of ways.
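
For the sake of discussion, the check I have in mind would amount to
something like this (purely a sketch: the function name and the 300-second
allowance are invented here, not anything that exists in Squid):

    /* Sketch of the proposed sanity check -- illustrative only.  Refuse to
     * cache a reply whose Date: or Last-Modified: header is further in the
     * future than we are prepared to blame on ordinary clock skew. */
    #include <stdio.h>
    #include <time.h>

    #define MAX_FUTURE_SKEW 300  /* seconds of skew to forgive; needs tuning */

    /* Return 1 if the reply's Date: or Last-Modified: header (either may be
     * absent, passed as (time_t) -1) is implausibly far in the future
     * relative to our own clock, i.e. the reply should not be cached. */
    int
    timestamps_look_bogus(time_t date_hdr, time_t last_mod_hdr, time_t now)
    {
        if (date_hdr != (time_t) -1 && date_hdr > now + MAX_FUTURE_SKEW)
            return 1;
        if (last_mod_hdr != (time_t) -1 && last_mod_hdr > now + MAX_FUTURE_SKEW)
            return 1;
        return 0;
    }

    int main(void)
    {
        time_t now = time(NULL);
        /* e.g. a Date: an hour in the future, and no Last-Modified: at all */
        printf("%s\n", timestamps_look_bogus(now + 3600, (time_t) -1, now)
               ? "bogus: do not cache" : "plausible: cache as usual");
        return 0;
    }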

The downside is that you then have to guess how much "fuzz" to allow for
routine variation in clock settings on unsynchronised systems, given that
other systems' idea of the time will inevitably vary a little even when they
are "close enough" to being correct. And of course, if the Squid cache
system is not using NTP and *its* clock has drifted noticeably on the slow
side, anything up to 100% of documents might be deemed uncacheable.

Tricky... any ideas on how this could be tackled, other than by saying
"tough luck" to anyone affected by origin servers with a bizarre idea of the
time? [And putting up with the time wasted investigating problems which turn
out to have this as their cause.]

                                John Line

-- 
University of Cambridge WWW manager account (usually John Line)
Send general WWW-related enquiries to webmaster@ucs.cam.ac.uk