Re: Bad interaction between max_stale and negative caching (2.HEAD)

From: Mark Nottingham <mnot_at_yahoo-inc.com>
Date: Thu, 18 Sep 2008 22:45:07 +1000

Will do tomorrow.

On 18/09/2008, at 10:39 PM, Amos Jeffries wrote:

> Mark Nottingham wrote:
>> I've got a user who's running a pair of peered accelerators, using
>> both stale-while-revalidate and max_stale.
>>
>> Occasionally, they see extremely old content being served; e.g., if
>> Cache-Control: max-age is 60s, they might see something go by which is
>> 1000-3000 seconds old (but still within the max_stale window).
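>>
>> To make the numbers concrete, the moving parts look roughly like
>> this (illustrative values only, not their exact configuration):
>>
>>     # squid.conf on the accelerators
>>     max_stale 1 hour
>>
>>     # origin response headers
>>     Cache-Control: max-age=60, stale-while-revalidate=30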
>>
>> The pattern that appears to trigger this is a resource with an
>> in-cache 200 response starting to return 404s; when this happens,
>> Squid starts returning TCP_NEGATIVE_HIT/200s. E.g. (traffic
>> driven by squidclient):
>>
>> 1221713703.815 0 127.0.0.1 TCP_STALE_HIT/200 5234 GET http://server1//5012904 - NONE/- application/json
>> 1221713703.979 164 0.0.0.0 TCP_ASYNC_MISS/404 193 GET http://server1/5012904 - FIRST_UP_PARENT/back-end-server1 text/plain
>> 1221713711.431 0 127.0.0.1 TCP_NEGATIVE_HIT/200 5234 GET http://server1/5012904 - NONE/- application/json
>> 1221713720.978 0 127.0.0.1 TCP_NEGATIVE_HIT/200 5234 GET http://server1/5012904 - NONE/- application/json
>> 1221713723.483 0 127.0.0.1 TCP_NEGATIVE_HIT/200 5234 GET http://server1/5012904 - NONE/- application/json
>>
>> As you can see, stale-while-revalidate kicks in, and the async
>> refresh brings back a 404, but that 404 doesn't get stored properly.
>>
>> Looking at the code, I *think* the culprit is storeNegativeCache(),
>> which, when max_stale is set (either in squid.conf or in response
>> headers), blocks the new response from updating the cache -- no
>> matter what its status code is.
>>
>> It makes sense to do this for 5xx status codes, because they're
>> often transient and reflect server-side problems. It makes much
>> less sense for 4xx status codes, which reflect client-side issues.
>> In those cases, you always want to update the cache with the most
>> recent response (and potentially negative-cache it, if the server
>> is silly enough not to put a freshness lifetime on it).
>>
>> The interesting thing, BTW, is that this only happens when collapsed
>> forwarding is on, because this check in httpReplyProcessHeader():
>>
>>     if (neighbors_do_private_keys && !Config.onoff.collapsed_forwarding)
>>         httpMaybeRemovePublic(entry, reply);
>>
>> otherwise masks the behaviour by removing the old public entry when
>> the 404 comes back.
>>
>> Thoughts? I'm not 100% sure of this diagnosis, as the use of peering
>> and stale-while-revalidate makes things considerably more complex,
>> but I've had pretty good luck reproducing it... I'm happy to attempt
>> a fix, but wanted input on what approach people prefer. Left to my
>> own devices, I'd add another condition to this test in
>> storeNegativeCache():
>>
>>     if (oe && !EBIT_TEST(oe->flags, KEY_PRIVATE) && !EBIT_TEST(oe->flags, ENTRY_REVALIDATE))
>>
>> to limit it to 5xx responses.
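>>
>> Something like this, perhaps (an untested sketch; I'm assuming here
>> that the new entry is 'e' and that its status is reachable via
>> e->mem_obj->reply->sline.status, so check those names against the
>> actual code):
>>
>>     /* Only let the existing non-private, non-revalidating entry
>>      * block the cache update when the new reply is a 5xx; a 4xx
>>      * should replace the cached object instead. */
>>     if (oe && !EBIT_TEST(oe->flags, KEY_PRIVATE) &&
>>         !EBIT_TEST(oe->flags, ENTRY_REVALIDATE) &&
>>         e->mem_obj->reply->sline.status / 100 == 5)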
>
> I'd agree with you based on that analysis. Can you add a bugzilla
> entry with a patch that does it?
>
> Amos
> --
> Please use Squid 2.7.STABLE4 or 3.0.STABLE9

--
Mark Nottingham       mnot_at_yahoo-inc.com
Received on Thu Sep 18 2008 - 12:45:44 MDT

This archive was generated by hypermail 2.2.0 : Thu Sep 18 2008 - 12:00:04 MDT