Re: refresh_pattern review

From: Doug Dixon <doug.dixon@dont-contact.us>
Date: Tue, 20 Jun 2006 17:40:51 +1200

I know this isn't the most hotly debated topic of recent years :) but
one other thing I've just thought of:

Suppose you've got a response with the following headers:

      Cache-Control: max-age=0
      Last-Modified: [10 days ago]

And suppose it matches the following refresh_pattern:

      refresh_pattern . 0 50% 252900 ignore-no-cache

Currently this won't be cached, despite the ignore-no-cache option.
I think we should give the Last-Modified heuristic a chance in these
circumstances (simple fix).
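
To make the arithmetic concrete, here is a minimal Python sketch of the scenario above. It is not the logic in refresh.cc: the function names, the use of minutes for the heuristic, and the idea that ignore-no-cache should let max-age=0 fall through to the Last-Modified heuristic are all assumptions made for illustration.

```python
def lm_heuristic_minutes(minutes_since_modified, percent, max_minutes):
    """Last-Modified factor heuristic: a percentage of the time since the
    object was last modified, capped at the configured Max."""
    return min(minutes_since_modified * percent / 100, max_minutes)


def proposed_lifetime_minutes(max_age_seconds, minutes_since_modified,
                              percent, max_minutes, ignore_no_cache):
    """Hypothetical reading of the proposal: with ignore-no-cache set,
    max-age=0 no longer wins outright and the heuristic gets a chance."""
    if (max_age_seconds == 0 and ignore_no_cache
            and minutes_since_modified is not None):
        return lm_heuristic_minutes(minutes_since_modified, percent, max_minutes)
    if max_age_seconds is not None:
        # Explicit expiry from the origin server is used as-is.
        return max_age_seconds / 60
    if minutes_since_modified is not None:
        return lm_heuristic_minutes(minutes_since_modified, percent, max_minutes)
    return 0


# Values from the example: modified 10 days ago, "refresh_pattern . 0 50% 252900"
TEN_DAYS = 10 * 24 * 60  # 14400 minutes
with_option = proposed_lifetime_minutes(0, TEN_DAYS, 50, 252900, ignore_no_cache=True)
without_option = proposed_lifetime_minutes(0, TEN_DAYS, 50, 252900, ignore_no_cache=False)
```

With these numbers the heuristic gives 7200 minutes (5 days), well under the 252900-minute cap, whereas honouring max-age=0 gives a lifetime of zero.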

On 15 Jun 2006, at 23:33, Doug Dixon wrote:

> Following some IRC chat, I thought I'd start a discussion on a
> possible improvement of refresh_pattern in Squid3.
>
> The starting point for this discussion is the fact that
> refresh_pattern is a source of confusion for many users, even
> expert admins. It's not obvious what it does, how to achieve
> certain things, or under what circumstances different bits of it
> apply or don't apply.
>
> Currently refresh_pattern means different things depending on how
> the response freshness was calculated: whether by explicit header
> set by the origin server (Cache-Control, Expires), by invoking the
> Last-Modified algorithm (if it had a Last-Modified header), or
> whether it could not calculate a freshness by either of these methods.
>
> It's quite complicated. I don't know what the right answer is.
>
> Here is an idea though:
>
> We could separate the configuration out into "standard" and "HTTP
> violating" parts. Let us define "standard" as the two mechanisms
> that are most semantically transparent:
>
> 1. Explicit expiration set by server (Cache-Control, Expires)
> 2. Heuristic expiration based on Last-Modified
>
> And let's define "HTTP violating" as anything that either overrides
> these, or anything that enforces cacheability in the absence of any
> of these headers.
>
> What configuration options do we need for each of these two
> categories?
>
> For the "standard" configuration:
> We don't need any options for the explicit expiry mechanism, as
> it's... explicit :)
> However, we do need a couple of global options for the
> Last-Modified factor algorithm:
>
> TAG: refresh_lastmod_factor (percent)
> Default: 20
>
> TAG: refresh_lastmod_max (minutes)
> Default: 10080
>
> These, then, are the only refresh options I propose for a
> non-HTTP-violating setup.
>
>
> Now for the "HTTP violating" overrides, which are more complicated.
>
> Defaults are set first:
>
> TAG: refresh_override_default options
> Default: none
>
> These can be refined by regex:
>
> TAG: refresh_override_match [-i] pattern options
> Default: none
>
> where options can be any of:
>     min=xxx
>         minimum amount of time this object will be considered fresh
>     max=xxx
>         maximum amount of time this object will be considered fresh
>     ignore-reload=on|off
>         ignore all client headers that prevent serving a cached
>         response
>     reload-into-ims=on|off
>         client reload is downgraded from unconditional to
>         conditional GET
>     ignore-no-cache=on|off
>         ignore all server headers that prevent caching a response
>     ignore-no-store=on|off
>         ignore "Cache-Control: no-store" server header
>     ignore-private=on|off
>         ignore "Cache-Control: private" server header
>     ignore-auth=on|off
>         cache authorized responses, even if server didn't specify
>         "Cache-Control: public"
>     refresh-ims=on|off
>         always pass client IMS requests through to the origin,
>         even if we think our copy is fresh
>
> For example:
> refresh_override_default max=4320 reload-into-ims=on
>
> refresh_override_match http://host/ ignore-reload=on ignore-no-cache=on ignore-no-store=on
> refresh_override_match /path/ reload-into-ims=off
> refresh_override_match \.jpe?g$ min=1440
> refresh_override_match \.css$ max=60
>
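
One way to read the example above is sketched below in Python. The proposal does not say whether only the first matching refresh_override_match applies or every matching line does; this sketch assumes every matching line is applied in order on top of the default. The helper names are hypothetical.

```python
import re


def parse_options(text):
    """Turn 'max=4320 reload-into-ims=on' into {'max': '4320', ...}."""
    options = {}
    for token in text.split():
        key, _, value = token.partition("=")
        options[key] = value
    return options


# The proposed example configuration, transcribed as data.
DEFAULT = parse_options("max=4320 reload-into-ims=on")
MATCHES = [
    (r"http://host/", parse_options(
        "ignore-reload=on ignore-no-cache=on ignore-no-store=on")),
    (r"/path/", parse_options("reload-into-ims=off")),
    (r"\.jpe?g$", parse_options("min=1440")),
    (r"\.css$", parse_options("max=60")),
]


def effective_options(url):
    """Start from the default, then layer on every matching override.
    Assumption: all matching lines apply, later lines winning on conflict."""
    options = dict(DEFAULT)
    for pattern, overrides in MATCHES:
        if re.search(pattern, url):
            options.update(overrides)
    return options
```

Under that assumption, effective_options("http://other/x.jpg") keeps max=4320 and reload-into-ims=on from the default and adds min=1440, which is the "override rather than fall back" behaviour the proposal is after.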
>
> Main differences in usage:
>
> 1. The overrides would always apply, regardless of how the
> expiration time was arrived at - whether by explicit headers or
> last-modified algorithm heuristics. Currently the Min, Max and
> Percent settings only apply in different specific circumstances,
> e.g. Max and Percent only apply to responses with L-M, Min only
> applies in the absence of L-M, Expires and CC max-age.
>
> 2. The refresh_override_default would always apply (although its
> options may be overridden by those of a refresh_override_match).
> Currently the default refresh_pattern only applies if no patterns
> match the request, meaning you can't ever override default
> behaviour, you can only fall back to it.
>
> 3. There is no way of setting the Last-Modified factor percentage
> by regex! This is perhaps a big problem, and it could be added as
> an option. But then it would be the only non-HTTP-violating
> directive possible in the option... and so would spoil it slightly.
>
> 4. No need for global counterparts of refresh_pattern directives,
> e.g. refresh_all_ims and reload_into_ims.
>
> 5. Frequently used override options could be stated once in the
> default instead of being repeated on every subsequent line.
>
>
> This may be completely the wrong way of looking at it, or it may be
> just going too far. A smaller, but still helpful, step might be to
> introduce a refresh_pattern_default whose values would be inherited
> by any subsequent refresh_pattern match.
>
>
> Any help or input into this would be very welcome indeed
>
> Doug
>
>
> On 1 Jun 2006, at 20:06, Doug Dixon wrote:
>
>> Hi
>>
>> I'm fixing bug 1202 (it's a simple fix) and am cleaning up
>> refresh.cc at the same time.
>>
>> I'd like to review the various refresh_pattern options, as some of
>> them are mutually exclusive in practice (although you can
>> configure all of them) and it's not clear from the documentation
>> what they all mean. They're quite hard to understand and use
>> correctly.
>>
>>
>> 1. reload-into-ims
>>
>> The following is legal:
>>
>> refresh_pattern html$ 5 20% 60 ignore-reload reload-into-ims
>>
>> but reload-into-ims will not have any effect. You could argue that
>> this is obvious, but I think it should be caught at parse time.
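
A parse-time check of this kind could look roughly like the Python sketch below. It is only an illustration of the idea, not the squid.conf parser: the function names and the warning text are made up, and the line layout assumed is "refresh_pattern [-i] regex min percent max [options]".

```python
def parse_refresh_pattern(line):
    """Split a refresh_pattern line into (regex, min, percent, max, options)."""
    tokens = line.split()
    if not tokens or tokens[0] != "refresh_pattern":
        raise ValueError("not a refresh_pattern line")
    tokens = tokens[1:]
    if tokens and tokens[0] == "-i":
        tokens = tokens[1:]  # case-insensitive flag, not needed for this check
    regex, min_minutes, percent, max_minutes = tokens[:4]
    options = set(tokens[4:])
    return regex, int(min_minutes), int(percent.rstrip("%")), int(max_minutes), options


def conflicting_options(options):
    """Return warnings for option combinations where one option is dead."""
    warnings = []
    if {"ignore-reload", "reload-into-ims"} <= options:
        warnings.append("reload-into-ims has no effect when ignore-reload is set")
    return warnings


parsed = parse_refresh_pattern(
    "refresh_pattern html$ 5 20% 60 ignore-reload reload-into-ims")
warnings = conflicting_options(parsed[4])
```

For the line quoted above this yields a single warning, which a parser could report or treat as a fatal configuration error.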
>>
>> 2. As an aside - but I want to mention it here - we need to make
>> it clearer that if an object does specify an expiry time, the Min,
>> Percent and Max values in refresh_pattern will be completely
>> ignored, but the options won't be. I'll change cf.data.pre
>> accordingly
>>
>> 3. override-expire
>>
>> override-expire enforces min age even if the server
>> sent a Expires: header. Doing this VIOLATES the HTTP
>> standard. Enabling this feature could make you liable
>> for problems which it causes.
>>
>> If you do want to modify the behaviour of blindly obeying the
>> server's explicit expiry time, you can - to an extent.
>>
>> The override-expire option enforces the Min time in cache, even if
>> the origin stated it should expire before then.
>> But it ignores the Max time (surprising!), and the L-M factor
>> (more expected - not obvious what this would do anyway)
>>
>> It's not very intuitive. I think we should probably make this
>> option enforce the Max time as well. Possibly even ignore the
>> explicit expiry of the object altogether and fall back to
>> last-modified factor??
>>
>> It could be a naming thing... override-expire doesn't really say
>> what it does. enforce-min might be better. But then you've already
>> stated a min and might expect it to be already enforced.
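
The difference between the behaviour described in this item and the suggested change can be sketched in Python as follows. This models the description only, not refresh.cc: the function, the minute units, and the enforce_max flag are hypothetical.

```python
def explicit_lifetime_minutes(server_minutes, min_minutes, max_minutes,
                              override_expire=False, enforce_max=False):
    """Freshness lifetime for an object with an explicit expiry time.

    override_expire=False: the server's value is used untouched.
    override_expire=True:  Min is enforced, Max is ignored (as described).
    enforce_max=True:      hypothetical change so Max is enforced as well.
    """
    if not override_expire:
        return server_minutes
    lifetime = max(server_minutes, min_minutes)
    if enforce_max:
        lifetime = min(lifetime, max_minutes)
    return lifetime


# With Min=5 and Max=60 minutes:
short_lived = explicit_lifetime_minutes(1, 5, 60, override_expire=True)
long_lived = explicit_lifetime_minutes(600, 5, 60, override_expire=True)
long_lived_capped = explicit_lifetime_minutes(600, 5, 60, override_expire=True,
                                              enforce_max=True)
```

Here an object the server expires after 1 minute is held for 5, but one the server allows for 600 minutes is only cut back to 60 if Max is enforced too.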
>>
>> 4. override-lastmod
>>
>> override-lastmod enforces min age even on objects
>> that were modified recently.
>>
>> The Min time isn't enforced even when the last-modified factor
>> algorithm does kick in. If the object was only just modified and
>> the L-M factor algorithm results in a figure lower than the Min,
>> it will be considered fresh for less than the configured Min.
>>
>> This isn't what I would expect. I know that the override-lastmod
>> option exists to let you do this, but it's really non-intuitive.
>> I think the Min should always be enforced if we're using the L-M
>> factor algorithm, and that we should therefore lose the
>> override-lastmod option. Can't see the point in the default (null)
>> behaviour of Min otherwise.
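
A small Python sketch of the Min versus L-M factor interaction described in this item, using the "5 20% 60" values from the earlier refresh_pattern example. The function is hypothetical and works in minutes; it models the behaviour as described here rather than reproducing refresh.cc.

```python
def lm_factor_lifetime_minutes(minutes_since_modified, min_minutes, percent,
                               max_minutes, override_lastmod=False):
    """Lifetime from the L-M factor algorithm, capped at Max.
    Min is only applied as a floor when override-lastmod is set."""
    lifetime = min(minutes_since_modified * percent / 100, max_minutes)
    if override_lastmod:
        lifetime = max(lifetime, min_minutes)
    return lifetime


# Object modified 10 minutes ago, pattern values Min=5, Percent=20, Max=60.
without_override = lm_factor_lifetime_minutes(10, 5, 20, 60)
with_override = lm_factor_lifetime_minutes(10, 5, 20, 60, override_lastmod=True)
```

Without the option the object is fresh for only 2 minutes despite Min being 5; dropping override-lastmod as suggested would amount to making the floor unconditional.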
>>
>>
>> Thoughts?
>>
>> Doug
>>
>
Received on Tue Jun 20 2006 - 00:07:19 MDT
