Re: max_stale broken?

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Fri, 09 Sep 2011 10:43:10 -0600

On 04/26/2011 11:02 PM, Alex Rousskov wrote:
> On 04/26/2011 09:54 PM, Amos Jeffries wrote:
>> On Tue, 26 Apr 2011 21:31:41 -0600, Alex Rousskov wrote:
>>> On 04/26/2011 05:23 PM, Amos Jeffries wrote:
>>>> On Tue, 26 Apr 2011 10:18:50 -0600, Alex Rousskov wrote:
>>>>> Hello,
>>>>>
>>>>> Can somebody double check the following code, please?
>>>>>
>>>>>> /*
>>>>>> * At this point the response is stale, unless one of
>>>>>> * the override options kicks in.
>>>>>> * NOTE: max-stale config blocks the overrides.
>>>>>> */
>>>>>> int max_stale = (R->max_stale >= 0 ? R->max_stale :
>>>>>> Config.maxStale);
>>>>>> if ( max_stale >= 0 && staleness < max_stale) {
>>>>>> debugs(22, 3, "refreshCheck: YES: max-stale limit");
>>>>>> if (request)
>>>>>> request->flags.fail_on_validation_err = 1;
>>>>>> return STALE_MAX_STALE;
>>>>>> }
>>>>>
>>>>>
>>>>> It makes no sense to me, and I see 60-second stale objects being
>>>>> declared STALE_MAX_STALE with the default configuration where max-stale
>>>>> is 1 week. The code declares the object stale if the object "staleness"
>>>>> is _less_ than the configured maximum. I would expect the opposite!
>>>>>
>>>>> It is not clear from the squid.conf documentation what the exact intent
>>>>> of the max-stale option is with the relation to other checks such as
>>>>> refresh_pattern.max, but at the minimum, should not the comparison be
>>>>> reversed?
>>>>
>>>>
>>>> Yes. This appears to be wrong. Frankly I'm a little confused as to how
>>>> it got that way. I did test this rather thoroughly since it was a tricky
>>>> directive when I started the portage.
>>>>
>>>>
>>>> "max_stale" directive is a global default. "refresh_pattern max-stale=N"
>>>> is a specific override of the global. Cache-Control weighs in
>>>> occasionally *if* it is smaller than either of the config values.
>>>>
>>>> All of them are "up to" limits in accordance with RFC 2616. Where the
>>>> client MAY receive a stale object UP TO an unspecified configured limit.
>>>> (Thus we can avoid violations and Cache-Control weighing in if the local
>>>> admin wants shorter staleness).
>>>
>>> IMO, the priority of these limits is not clear from the documentation.
>>> We have, at least:
>>>
>>> 1. global max_stale
>>> 2. refresh_pattern max-stale
>>> 3. Cache-Control: max-stale
>>>
>>> It is clear that the refresh_pattern max-stale overwrites global
>>> max-stale, but other possible combinations are not documented. Your
>>> "minimum value wins" explanation above should be morphed into squid.conf
>>> comments, I guess, but it does not match the code either, so there is
>>> more work to do than just updating the docs.
>>>
>>> Also, there are at least two different questions we must answer here:
>>>
>>> A) What happens when the staleness does not exceed the winning max-stale
>>> value? Your answer above is, essentially, the object is considered FRESH
>>> in that case (unless the checks before the above code overwrite that?).
>>>
>>> B) What happens when the staleness exceeds than the winning max-stale
>>> value? It is not clear whether the object must be considered stale
>>> (i.e., the winning max-stale overwrites all other checks below it) OR
>>> the object may still be considered fresh if some other check below it
>>> says so.
>>
>> By the time these tests are runs the object is already declared stale.
>> It is just a matter of by how much and whether the stale object may be
>> returned.
>
> The object is indeed "pre-declared stale" inside refreshCheck(), but
> that internal declaration is invisible to the caller. If something
> overwrites that internal declaration (e.g., we return
> FRESH_OVERRIDE_EXPIRES), then the object is treated as cachable:
>
>> if (reason < STALE_MUST_REVALIDATE)
>> /* Does not need refresh. This is certainly cachable */
>> return 1;
>>
>
>
>
>>>> They all have no effect on 1xx-4xx server responses or fresh HITs. Only
>>>> on 5xx when revalidating a stale HIT.
>>>
>>> That is possible, but in the particular case that triggered this thread,
>>> the code in question was the reported reason for why the response was
>>> considered stale beyond repair and, hence, not cachable:
>>>
>>>> refreshCheck: Matched '. 0 50%% 604800'
>>>> age: 60
>>>> check_time: Tue, 26 Apr 2011 08:44:08 GMT
>>>> entry->timestamp: Tue, 26 Apr 2011 08:43:08 GMT
>>>> STALE: age 60 >= min 0
>>>> Staleness = 60
>>>> refreshCheck: YES: max-stale limit
>>>> refreshIsCachable() returned non-cacheable..
>>>> StoreEntry::expireNow: '8D46C584D5269CF1B213D425C8F8944F'
>>>> StoreEntry::setReleaseFlag: '8D46C584D5269CF1B213D425C8F8944F'
>>>
>>>
>>> It is possible that bugs elsewhere led to that, of course, but on the
>>> surface of things, it looks like the refreshCheck code was responsible
>>> for 0% hit ratio in that specific environment (I know that we do get
>>> hits under different conditions where incoming responses are not stale).
>>
>> Interesting, okay. So the incoming object was already too much "stale"
>> according to the broken logic (<). Fixing that operator is the right
>> fix. You want to do the honours?
>
> I do not think I should: I still think there are other problems with the
> related code and/or documentation so it would be difficult for me to
> explain exactly what problem the change from "<" to ">" would fix.
> Please do that if you are comfortable.

The max-stale limit check fix has been committed as trunk r11721 per
Amos request during Rock Store review.

Cheers,

Alex.
Received on Fri Sep 09 2011 - 16:43:41 MDT

This archive was generated by hypermail 2.2.0 : Sat Sep 10 2011 - 12:00:02 MDT