Re: max_stale broken?

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Wed, 27 Apr 2011 15:54:23 +1200

 On Tue, 26 Apr 2011 21:31:41 -0600, Alex Rousskov wrote:
> On 04/26/2011 05:23 PM, Amos Jeffries wrote:
>> On Tue, 26 Apr 2011 10:18:50 -0600, Alex Rousskov wrote:
>>> Hello,
>>>
>>> Can somebody double check the following code, please?
>>>
>>>> /*
>>>> * At this point the response is stale, unless one of
>>>> * the override options kicks in.
>>>> * NOTE: max-stale config blocks the overrides.
>>>> */
>>>> int max_stale = (R->max_stale >= 0 ? R->max_stale :
>>>> Config.maxStale);
>>>> if ( max_stale >= 0 && staleness < max_stale) {
>>>> debugs(22, 3, "refreshCheck: YES: max-stale limit");
>>>> if (request)
>>>> request->flags.fail_on_validation_err = 1;
>>>> return STALE_MAX_STALE;
>>>> }
>>>
>>>
>>> It makes no sense to me, and I see 60-second stale objects being
>>> declared STALE_MAX_STALE with the default configuration where
>>> max-stale
>>> is 1 week. The code declares the object stale if the object
>>> "staleness"
>>> is _less_ than the configured maximum. I would expect the opposite!
>>>
>>> It is not clear from the squid.conf documentation what the exact
>>> intent
>>> of the max-stale option is with the relation to other checks such
>>> as
>>> refresh_pattern.max, but at the minimum, should not the comparison
>>> be
>>> reversed?
>>
>>
>> Yes. This appears to be wrong. Frankly I'm a little confused as to
>> how
>> it got that way. I did test this rather thoroughly since it was a
>> tricky
>> directive when I started the portage.
>>
>>
>> "max_stale" directive is a global default. "refresh_pattern
>> max-stale=N"
>> is a specific override of the global. Cache-Control weighs in
>> occasionally *if* it is smaller than either of the config values.
>>
>> All of them are "up to" limits in accordance with RFC 2616. Where
>> the
>> client MAY receive a stale object UP TO an unspecified configured
>> limit.
>> (Thus we can avoid violations and Cache-Control weighing in if the
>> local
>> admin wants shorter staleness).
>
> IMO, the priority of these limits is not clear from the
> documentation.
> We have, at least:
>
> 1. global max_stale
> 2. refresh_pattern max-stale
> 3. Cache-Control: max-stale
>
> It is clear that the refresh_pattern max-stale overwrites global
> max-stale, but other possible combinations are not documented. Your
> "minimum value wins" explanation above should be morphed into
> squid.conf
> comments, I guess, but it does not match the code either, so there is
> more work to do than just updating the docs.
>
> Also, there are at least two different questions we must answer here:
>
> A) What happens when the staleness does not exceed the winning
> max-stale
> value? Your answer above is, essentially, the object is considered
> FRESH
> in that case (unless the checks before the above code overwrite
> that?).
>
> B) What happens when the staleness exceeds than the winning max-stale
> value? It is not clear whether the object must be considered stale
> (i.e., the winning max-stale overwrites all other checks below it) OR
> the object may still be considered fresh if some other check below it
> says so.

 By the time these tests are runs the object is already declared stale.
 It is just a matter of by how much and whether the stale object may be
 returned.
>
>> They all have no effect on 1xx-4xx server responses or fresh HITs.
>> Only
>> on 5xx when revalidating a stale HIT.
>
> That is possible, but in the particular case that triggered this
> thread,
> the code in question was the reported reason for why the response was
> considered stale beyond repair and, hence, not cachable:
>
>> refreshCheck: Matched '. 0 50%% 604800'
>> age: 60
>> check_time: Tue, 26 Apr 2011 08:44:08 GMT
>> entry->timestamp: Tue, 26 Apr 2011 08:43:08 GMT
>> STALE: age 60 >= min 0
>> Staleness = 60
>> refreshCheck: YES: max-stale limit
>> refreshIsCachable() returned non-cacheable..
>> StoreEntry::expireNow: '8D46C584D5269CF1B213D425C8F8944F'
>> StoreEntry::setReleaseFlag: '8D46C584D5269CF1B213D425C8F8944F'
>
>
> It is possible that bugs elsewhere led to that, of course, but on the
> surface of things, it looks like the refreshCheck code was
> responsible
> for 0% hit ratio in that specific environment (I know that we do get
> hits under different conditions where incoming responses are not
> stale).

 Interesting, okay. So the incoming object was already too much "stale"
 according to the broken logic (<). Fixing that operator is the right
 fix. You want to do the honours?

 Amos
Received on Wed Apr 27 2011 - 03:54:28 MDT

This archive was generated by hypermail 2.2.0 : Wed Apr 27 2011 - 12:00:05 MDT