Re: max_stale broken?

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Tue, 26 Apr 2011 21:31:41 -0600

On 04/26/2011 05:23 PM, Amos Jeffries wrote:
> On Tue, 26 Apr 2011 10:18:50 -0600, Alex Rousskov wrote:
>> Hello,
>>
>> Can somebody double check the following code, please?
>>
>>> /*
>>> * At this point the response is stale, unless one of
>>> * the override options kicks in.
>>> * NOTE: max-stale config blocks the overrides.
>>> */
>>> int max_stale = (R->max_stale >= 0 ? R->max_stale :
>>> Config.maxStale);
>>> if ( max_stale >= 0 && staleness < max_stale) {
>>> debugs(22, 3, "refreshCheck: YES: max-stale limit");
>>> if (request)
>>> request->flags.fail_on_validation_err = 1;
>>> return STALE_MAX_STALE;
>>> }
>>
>>
>> It makes no sense to me, and I see 60-second stale objects being
>> declared STALE_MAX_STALE with the default configuration where max-stale
>> is 1 week. The code declares the object stale if the object "staleness"
>> is _less_ than the configured maximum. I would expect the opposite!
>>
>> It is not clear from the squid.conf documentation what the exact intent
>> of the max-stale option is with the relation to other checks such as
>> refresh_pattern.max, but at the minimum, should not the comparison be
>> reversed?
>
>
> Yes. This appears to be wrong. Frankly I'm a little confused as to how
> it got that way. I did test this rather thoroughly since it was a tricky
> directive when I started the portage.
>
>
> "max_stale" directive is a global default. "refresh_pattern max-stale=N"
> is a specific override of the global. Cache-Control weighs in
> occasionally *if* it is smaller than either of the config values.
>
> All of them are "up to" limits in accordance with RFC 2616. Where the
> client MAY receive a stale object UP TO an unspecified configured limit.
> (Thus we can avoid violations and Cache-Control weighing in if the local
> admin wants shorter staleness).

IMO, the priority of these limits is not clear from the documentation.
We have, at least:

  1. global max_stale
  2. refresh_pattern max-stale
  3. Cache-Control: max-stale

It is clear that the refresh_pattern max-stale overwrites global
max-stale, but other possible combinations are not documented. Your
"minimum value wins" explanation above should be morphed into squid.conf
comments, I guess, but it does not match the code either, so there is
more work to do than just updating the docs.

Also, there are at least two different questions we must answer here:

A) What happens when the staleness does not exceed the winning max-stale
value? Your answer above is, essentially, the object is considered FRESH
in that case (unless the checks before the above code overwrite that?).

B) What happens when the staleness exceeds than the winning max-stale
value? It is not clear whether the object must be considered stale
(i.e., the winning max-stale overwrites all other checks below it) OR
the object may still be considered fresh if some other check below it
says so.

> They all have no effect on 1xx-4xx server responses or fresh HITs. Only
> on 5xx when revalidating a stale HIT.

That is possible, but in the particular case that triggered this thread,
the code in question was the reported reason for why the response was
considered stale beyond repair and, hence, not cachable:

> refreshCheck: Matched '. 0 50%% 604800'
> age: 60
> check_time: Tue, 26 Apr 2011 08:44:08 GMT
> entry->timestamp: Tue, 26 Apr 2011 08:43:08 GMT
> STALE: age 60 >= min 0
> Staleness = 60
> refreshCheck: YES: max-stale limit
> refreshIsCachable() returned non-cacheable..
> StoreEntry::expireNow: '8D46C584D5269CF1B213D425C8F8944F'
> StoreEntry::setReleaseFlag: '8D46C584D5269CF1B213D425C8F8944F'

It is possible that bugs elsewhere led to that, of course, but on the
surface of things, it looks like the refreshCheck code was responsible
for 0% hit ratio in that specific environment (I know that we do get
hits under different conditions where incoming responses are not stale).

>> if (max_stale >= 0 && staleness > max_stale)
>> ...
>> return STALE_...
>>
>> Please note that since Config.maxStale defaults to one week, the
>> max_stale variable is not negative and the above check kicks in for all
>> stale objects by default, no matter what refresh_pattern max is! Again,
>> it is not clear to me whether that is intentional. If it is, we should
>> document this surprising effect better.
>
> The limit is min(config, Cache-Control: max-stale). The above if() is
> supposed to be a shortcut for min().

I do not think that the "if" guard or the code below it is equivalent to
min(), even after the comparison is fixed. The current code is just
"refresh_pattern max-stale overwrites global max-stale", I think.

Cheers,

Alex.

> "1 week" is an arbitrary value copied from squid-2. It could be up to a
> year, but IMO having a server 5xx'ing for a whole week before anyone
> notices is already a bad situation.
> FWIW there was no limit on staleness prior to the portage. 3.0 and 3.1
> will simply emit stale HITs *forever* or until the validation succeeds.
>
>
>>
>> Another option would be to make the check specific to sf.max cases:
>>
>> if (sf.max && max_stale >= 0 && staleness <= max_stale)
>> ...
>> return FRESH_...
>>
>> but that has problems of its own (why would we need max_stale suboption
>> in refresh_pattern then?).
>
>
> ie with the config:
> max_stale 3600 second
> refresh_pattern *.html 0 100% 4320 max-stale=6000
>
> When cache-control *does not* get involved object /index.html should be
> stale HIT for 6000 seconds. Whereas /image.jpg could be stale HIT for up
> to 3600 seconds.
>
>
> When cache-control *does* get involved things get a bit tricky and
> dodgy. The idea is that min(config, Cache-Control) limit is used, but
> not strictly since the refresh_pattern estimation may not be applied.
>
>
> Amos
Received on Wed Apr 27 2011 - 03:32:10 MDT

This archive was generated by hypermail 2.2.0 : Wed Apr 27 2011 - 12:00:05 MDT