Re: Bad interaction between max_stale and negative caching (2.HEAD)

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Fri, 19 Sep 2008 00:39:38 +1200

Mark Nottingham wrote:
> I've got a user who's running a pair of peered accelerators, using both
> stale-while-revalidate and max_stale.
>
> Occasionally, they see extremely old content being served; e.g., if CC:
> max-age is 60s, they might see something go by which is 1000-3000
> seconds old (but still within the max_stale window).
>
> The pattern that appears to trigger this is when a resource with an
> in-cache 200 response starts returning 404s; when this happens, Squid
> will start returning TCP_NEGATIVE_HIT/200's. E.g. (traffic driven by
> squidclient),
>
> 1221713703.815 0 127.0.0.1 TCP_STALE_HIT/200 5234 GET
> http://server1/5012904 - NONE/- application/json
> 1221713703.979 164 0.0.0.0 TCP_ASYNC_MISS/404 193 GET
> http://server1/5012904 - FIRST_UP_PARENT/back-end-server1 text/plain
> 1221713711.431 0 127.0.0.1 TCP_NEGATIVE_HIT/200 5234 GET
> http://server1/5012904 - NONE/- application/json
> 1221713720.978 0 127.0.0.1 TCP_NEGATIVE_HIT/200 5234 GET
> http://server1/5012904 - NONE/- application/json
> 1221713723.483 0 127.0.0.1 TCP_NEGATIVE_HIT/200 5234 GET
> http://server1/5012904 - NONE/- application/json
>
> As you can see, stale-while-revalidate kicks in and the async refresh
> brings back a 404, but the 404 never replaces the cached 200.
>
> Looking at the code, I *think* the culprit is storeNegativeCache():
> when max_stale is set (either in squid.conf or in response headers), it
> blocks the new response from updating the cache -- no matter what that
> response's status code is.
>
> It makes sense to do this for 5xx status codes, because they're often
> transient and reflect server-side problems. It makes less sense for
> 4xx status codes, which reflect client-side issues. In those cases,
> you always want to update the cache with the most recent response
> (and potentially negative-cache it, if the server is silly enough
> not to put a freshness lifetime on it).
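>
> To illustrate with directives along these lines (values are made up
> for illustration, not the user's actual settings):
>
>     negative_ttl 60 seconds
>     max_stale 1 hour
>
> the 404 that replaces the stale 200 should itself be cached for at
> most 60 seconds, rather than the old 200 being served for up to an
> hour.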
>
> The interesting thing, BTW, is that this only happens when collapsed
> forwarding is on, because this check in httpReplyProcessHeader:
>
>     if (neighbors_do_private_keys && !Config.onoff.collapsed_forwarding)
>         httpMaybeRemovePublic(entry, reply);
>
> masks the behaviour: with collapsed forwarding off,
> httpMaybeRemovePublic() is called and (I presume) drops the stale
> public entry as soon as the 404 comes back.
>
> Thoughts? I'm not 100% sure of this diagnosis, as the use of peering
> and stale-while-revalidate makes things considerably more complex, but
> I've had pretty good luck reproducing it... I'm happy to attempt a fix,
> but wanted input on what approach people preferred. Left to my own
> devices, I'd add another condition to this test in storeNegativeCache():
>
>     if (oe && !EBIT_TEST(oe->flags, KEY_PRIVATE) &&
>         !EBIT_TEST(oe->flags, ENTRY_REVALIDATE))
>
> to limit it to 5xx responses.
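>
> Roughly like this (a sketch only -- the reply/status accessors are my
> guess at the 2.HEAD structures, so treat the exact names as
> assumptions):
>
>     /* Preserve the stale entry only for transient server-side (5xx)
>      * errors; a 4xx reply falls through and replaces the cached
>      * object, getting negatively cached as usual. */
>     if (oe && !EBIT_TEST(oe->flags, KEY_PRIVATE) &&
>         !EBIT_TEST(oe->flags, ENTRY_REVALIDATE) &&
>         e->mem_obj->reply->sline.status >= HTTP_INTERNAL_SERVER_ERROR) {
>         ...
>     }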
>

I'd agree with you based on that analysis. Can you add a Bugzilla entry
with a patch that does it?

Amos

-- 
Please use Squid 2.7.STABLE4 or 3.0.STABLE9