Re: "Uncacheable" documents getting cached...

From: Duane Wessels <wessels@dont-contact.us>
Date: Wed, 11 Nov 1998 17:32:06 -0700

John,

Thanks for the great summary.

Regarding cacheability: if the server wants the page to be uncacheable,
it should use Cache-control: no-cache, or (God forbid) Pragma: no-cache,
or Expires: 0. Thus, I don't think we should necessarily make objects
with bad timestamps uncacheable.
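
For illustration, an explicitly-uncacheable reply might carry headers
like these (the status line and Date: value are just examples):

    HTTP/1.0 200 OK
    Date: Thu, 12 Nov 1998 00:30:00 GMT
    Cache-control: no-cache
    Pragma: no-cache
    Expires: 0

Any one of those three headers should do the job on its own.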

This issue is also complicated because the Date: header isn't necessarily
generated at the current time. Of course it should never represent a
time in the future, but it will often be a time in the past, especially
for hits from neighbor caches. Even more worrisome is Date headers
from transparent caches when we think we talked directly to
the origin server.

HTTP/1.1 is also supposed to make things better by getting people
to use the Age: header. I guess it's not very widely supported
yet.
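
For reference, my rough paraphrase of the HTTP/1.1 age calculation
(a paraphrase from memory, not a quote from the spec):

    apparent_age  = max(0, time_response_received - Date_value)
    corrected_age = max(apparent_age, Age_header_value)

The Age: header lets a cache correct for time an object has spent in
upstream caches, instead of relying on Date: arithmetic alone.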

Would it be sufficient to check for age < 0 in refreshCheck() and
then call it stale?
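
Something like this, perhaps (a minimal sketch only; the names are
illustrative, not Squid's actual code):

    #include <time.h>

    /* Treat an object whose apparent age is negative as stale.
     * 'object_date' is the response's Date: value; 'check_time' is
     * the cache's clock at the moment freshness is evaluated.
     */
    static int
    stale_if_negative_age(time_t object_date, time_t check_time)
    {
        /* A negative apparent age means Date: is in our future. */
        return difftime(check_time, object_date) < 0;
    }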

Duane W.

>[Summary: if the origin server's clock is running fast (or it generates
>incorrect timestamps as though it were), documents which should be
>uncacheable (no last-mod or expires timestamps, etc.) may be cached and cause
>problems in consequence.]
>
>Investigation of a problem reported by a user of our cache found that the
>problem was a dynamically-generated document at a remote server which was
>being cached when it shouldn't have been. As a result, multiple users
>received the same version of the page, with the same link URL containing a
>token which was clearly a session ID of some sort. More importantly, the
>document was cached long enough that people could receive it after the
>origin server had timed out the session ID and no longer deemed it valid,
>giving an error; it broke things for the users, not just for
>session-tracking by the server's operators. Without that situation to
>prompt investigation, it's the sort of problem that could easily go
>unnoticed...
>
>The document concerned did not have a Last-Modified: header but did have an
>Expires: header, though that was syntactically invalid and was correctly
>ignored by Squid (it actually read "Expires: content", which I can't see in
>the HTTP 1.0 or 1.1 specifications). Thus, it should have behaved like most
>dynamic documents and been deemed stale any time it was referenced
>subsequently, but with both 1.NOVM.22 and 2.0PATCH2 it was cached by our
>server (and indeed, the copy that caused the initial problem was fetched
>from a parent cache as a hit there). I tried a CGI-generated page on another
>server (no last-mod or expires headers etc. to suppress caching actively)
>and that behaved as expected, so I don't think it's just me getting confused
>about what should happen...
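>
>(For comparison, a syntactically valid Expires: header carries an
>HTTP-date, e.g.
>
>    Expires: Thu, 01 Dec 1994 16:00:00 GMT
>
>so "Expires: content" is plainly garbage, and Squid was right to
>ignore it.)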
>
>The explanation was that the origin server was quoting the wrong time in its
>Date: headers, an hour ahead of the true GMT time; our cache uses NTP and
>its idea of time should be very close to correct. The effect was that for
>the problem page, the test "age <= min" (with min=0) was succeeding because
>age was negative; for another page on that server, which did have a last
>modification timestamp, the debug 22,3 output showed the same effect but
>with differences of detail between Squid versions (the page was declared
>fresh because age was less than min in Squid 1.NOVM.22, and because the
>(negative) LM factor was less than the cutoff percentage in 2.0PATCH2).
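>
>(To make the arithmetic concrete: suppose the cache fetched the page at
>17:30:00 GMT while the server's Date: header claimed 18:30:00 GMT. Then
>
>    age = now - date_value = 17:30:00 - 18:30:00 = -3600 seconds
>
>and -3600 <= 0, so the "age <= min" test succeeds. With a Last-Modified:
>timestamp the LM factor (roughly, age divided by the interval between
>Last-Modified: and Date:) goes negative the same way, and so stays below
>any cutoff percentage.)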
>
>The HTTP 1.1 specification says nothing (in the section about heuristic
>expiry) about how to handle obviously bogus Date: (or Last-Modified:)
>timestamps, but does make clear that you have to use that timestamp rather
>than when you received the document (most obviously to handle sensibly the
>case where the document had been cached elsewhere for some period of time).
>It makes the assumption that clocks will be reasonably closely synchronised,
>but beyond suggesting web servers and caches should use NTP or similar,
>dodges the issue.
>
>So: if a web server erroneously serves documents with a time
>["significantly"] in the future (compared to the Squid cache server's idea
>of time), they are likely to be cached when they should (and maybe are
>required to) be uncacheable, or to be considered fresh for longer than
>should be the case; the former is more likely to cause problems.
>
>Is it (a) reasonable and (b) feasible for Squid to declare "bogus" (more
>specifically, uncacheable) any document which, at the time it is received,
>has a Date: (or Last-Modified:) header in the future by more than a
>trivial amount (a few seconds, minutes at the most)? The alternative would
>be to save the cache server's current date/time instead of the Date: header
>value, but that's potentially bad in a variety of ways.
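>
>Sketched in C, such a check might look like this (a sketch only; the
>names and the fuzz value are illustrative, not anything Squid does
>today):
>
>    #include <time.h>
>
>    #define CLOCK_FUZZ 300    /* tolerated clock skew, in seconds */
>
>    /* Nonzero if a Date: or Last-Modified: value is implausibly far
>     * in the future relative to the cache's own clock. */
>    static int
>    timestamp_is_bogus(time_t header_time, time_t now)
>    {
>        return difftime(header_time, now) > CLOCK_FUZZ;
>    }
>
>A document failing that test would simply be marked uncacheable.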
>
>The downside is that you then have to guess how much "fuzz" to allow for
>routine variation in clock settings on unsynchronised systems, given that
>other systems' idea of the time will inevitably vary a little even when they
>are "close enough" to being correct. And of course, if the Squid cache
>system is not using NTP and *its* clock has drifted noticeably on the slow
>side, anything up to 100% of documents might be deemed uncacheable.
>
>Tricky... any ideas how this could be tackled, other than by saying "tough
>luck" to any problems caused by origin servers with a bizarre idea of time?
>[And putting up with the time wasted investigating problems which turn out
>to have this problem as their cause.]
>
> John Line
>--
>University of Cambridge WWW manager account (usually John Line)
>Send general WWW-related enquiries to webmaster@ucs.cam.ac.uk