Re: max_stale broken?

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Tue, 26 Apr 2011 23:02:24 -0600

On 04/26/2011 09:54 PM, Amos Jeffries wrote:
> On Tue, 26 Apr 2011 21:31:41 -0600, Alex Rousskov wrote:
>> On 04/26/2011 05:23 PM, Amos Jeffries wrote:
>>> On Tue, 26 Apr 2011 10:18:50 -0600, Alex Rousskov wrote:
>>>> Hello,
>>>>
>>>> Can somebody double check the following code, please?
>>>>
>>>>> /*
>>>>> * At this point the response is stale, unless one of
>>>>> * the override options kicks in.
>>>>> * NOTE: max-stale config blocks the overrides.
>>>>> */
>>>>> int max_stale = (R->max_stale >= 0 ? R->max_stale :
>>>>> Config.maxStale);
>>>>> if ( max_stale >= 0 && staleness < max_stale) {
>>>>> debugs(22, 3, "refreshCheck: YES: max-stale limit");
>>>>> if (request)
>>>>> request->flags.fail_on_validation_err = 1;
>>>>> return STALE_MAX_STALE;
>>>>> }
>>>>
>>>>
>>>> It makes no sense to me, and I see 60-second stale objects being
>>>> declared STALE_MAX_STALE with the default configuration where max-stale
>>>> is 1 week. The code declares the object stale if the object "staleness"
>>>> is _less_ than the configured maximum. I would expect the opposite!
>>>>
>>>> It is not clear from the squid.conf documentation what the exact intent
>>>> of the max-stale option is with the relation to other checks such as
>>>> refresh_pattern.max, but at the minimum, should not the comparison be
>>>> reversed?
>>>
>>>
>>> Yes. This appears to be wrong. Frankly I'm a little confused as to how
>>> it got that way. I did test this rather thoroughly since it was a tricky
>>> directive when I started the portage.
>>>
>>>
>>> "max_stale" directive is a global default. "refresh_pattern max-stale=N"
>>> is a specific override of the global. Cache-Control weighs in
>>> occasionally *if* it is smaller than either of the config values.
>>>
>>> All of them are "up to" limits in accordance with RFC 2616. Where the
>>> client MAY receive a stale object UP TO an unspecified configured limit.
>>> (Thus we can avoid violations and Cache-Control weighing in if the local
>>> admin wants shorter staleness).
>>
>> IMO, the priority of these limits is not clear from the documentation.
>> We have, at least:
>>
>> 1. global max_stale
>> 2. refresh_pattern max-stale
>> 3. Cache-Control: max-stale
>>
>> It is clear that the refresh_pattern max-stale overwrites global
>> max-stale, but other possible combinations are not documented. Your
>> "minimum value wins" explanation above should be morphed into squid.conf
>> comments, I guess, but it does not match the code either, so there is
>> more work to do than just updating the docs.
>>
>> Also, there are at least two different questions we must answer here:
>>
>> A) What happens when the staleness does not exceed the winning max-stale
>> value? Your answer above is, essentially, the object is considered FRESH
>> in that case (unless the checks before the above code overwrite that?).
>>
>> B) What happens when the staleness exceeds than the winning max-stale
>> value? It is not clear whether the object must be considered stale
>> (i.e., the winning max-stale overwrites all other checks below it) OR
>> the object may still be considered fresh if some other check below it
>> says so.
>
> By the time these tests are runs the object is already declared stale.
> It is just a matter of by how much and whether the stale object may be
> returned.

The object is indeed "pre-declared stale" inside refreshCheck(), but
that internal declaration is invisible to the caller. If something
overwrites that internal declaration (e.g., we return
FRESH_OVERRIDE_EXPIRES), then the object is treated as cachable:

> if (reason < STALE_MUST_REVALIDATE)
> /* Does not need refresh. This is certainly cachable */
> return 1;
>

>>> They all have no effect on 1xx-4xx server responses or fresh HITs. Only
>>> on 5xx when revalidating a stale HIT.
>>
>> That is possible, but in the particular case that triggered this thread,
>> the code in question was the reported reason for why the response was
>> considered stale beyond repair and, hence, not cachable:
>>
>>> refreshCheck: Matched '. 0 50%% 604800'
>>> age: 60
>>> check_time: Tue, 26 Apr 2011 08:44:08 GMT
>>> entry->timestamp: Tue, 26 Apr 2011 08:43:08 GMT
>>> STALE: age 60 >= min 0
>>> Staleness = 60
>>> refreshCheck: YES: max-stale limit
>>> refreshIsCachable() returned non-cacheable..
>>> StoreEntry::expireNow: '8D46C584D5269CF1B213D425C8F8944F'
>>> StoreEntry::setReleaseFlag: '8D46C584D5269CF1B213D425C8F8944F'
>>
>>
>> It is possible that bugs elsewhere led to that, of course, but on the
>> surface of things, it looks like the refreshCheck code was responsible
>> for 0% hit ratio in that specific environment (I know that we do get
>> hits under different conditions where incoming responses are not stale).
>
> Interesting, okay. So the incoming object was already too much "stale"
> according to the broken logic (<). Fixing that operator is the right
> fix. You want to do the honours?

I do not think I should: I still think there are other problems with the
related code and/or documentation so it would be difficult for me to
explain exactly what problem the change from "<" to ">" would fix.
Please do that if you are comfortable.

Thank you,

Alex.
Received on Wed Apr 27 2011 - 05:02:52 MDT

This archive was generated by hypermail 2.2.0 : Wed Apr 27 2011 - 12:00:05 MDT