Re: refresh_pattern review

From: Doug Dixon <doug.dixon@dont-contact.us>
Date: Thu, 15 Jun 2006 23:33:05 +1200

Following some IRC chat, I thought I'd start a discussion on a
possible improvement of refresh_pattern in Squid3.

The starting point for this discussion is the fact that
refresh_pattern is a source of confusion for many users, even expert
admins. It's not obvious what it does, how to achieve certain things,
or under what circumstances different bits of it apply or don't apply.

Currently refresh_pattern means different things depending on how the
response freshness was calculated: whether by explicit header set by
the origin server (Cache-Control, Expires), by invoking the Last-
Modified algorithm (if it had a Last-Modified header), or whether it
could not calculate a freshness by either of these methods.

It's quite complicated. I don't know what the right answer is.

Here is an idea though:

We could separate the configuration out into "standard" and "HTTP
violating" parts. Let us define "standard" as the two mechanisms that
are most semantically transparent:

1. Explicit expiration set by server (Cache-Control, Expires)
2. Heuristic expiration based on Last-Modified

And let's define "HTTP violating" as anything that either overrides
these, or anything that enforces cacheability in the absence of any
of these headers.

What configuration options do we need for each of these two categories?

For the "standard" configuration:
We don't need any options for the explicit expiry mechanism, as
it's... explicit :)
However, we do need a couple of global options for the Last-Modified
factor algorithm:

      TAG: refresh_lastmod_factor (percent)
      Default: 20

      TAG: refresh_lastmod_max (minutes)
      Default: 10080

These, then, are the only refresh options I propose for a non-HTTP-
violating setup.
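
To make the effect of these two options concrete, here is a minimal
sketch of the Last-Modified heuristic as I understand it. This is not
code from refresh.cc; the function name and the use of Python datetimes
are my own assumptions, and only the two defaults (20 percent, 10080
minutes) come from the proposal above.

```python
from datetime import datetime, timedelta, timezone


def lastmod_freshness_lifetime(date, last_modified,
                               factor_percent=20, max_minutes=10080):
    """Heuristic freshness lifetime for a response with Last-Modified.

    date            when the origin generated the response (Date header)
    last_modified   value of the Last-Modified header
    factor_percent  hypothetical refresh_lastmod_factor
    max_minutes     hypothetical refresh_lastmod_max
    """
    age_at_fetch = date - last_modified
    if age_at_fetch < timedelta(0):
        # Last-Modified in the future: treat as having no heuristic lifetime.
        return timedelta(0)
    # Integer arithmetic keeps the result exact for whole-percent factors.
    lifetime = age_at_fetch * factor_percent / 100
    return min(lifetime, timedelta(minutes=max_minutes))


fetched = datetime(2006, 6, 15, tzinfo=timezone.utc)
ten_days_old = lastmod_freshness_lifetime(fetched, fetched - timedelta(days=10))
one_year_old = lastmod_freshness_lifetime(fetched, fetched - timedelta(days=365))
```

With the defaults, an object last modified ten days before it was
fetched gets 20 percent of ten days, and one last modified a year
earlier is capped at 10080 minutes.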

Now for the "HTTP violating" overrides, which are more complicated.

Defaults are set first:
        
      TAG: refresh_override_default options
      Default: none

These can be refined by regex:

      TAG: refresh_override_match [-i] pattern options
      Default: none

where options can be any of:
      min=xxx
           minimum amount of time this object will be considered fresh
      max=xxx
           maximum amount of time this object will be considered fresh
      ignore-reload=on|off
           ignore all client headers that prevent serving a cached
           response
      reload-into-ims=on|off
           client reload is downgraded from unconditional to
           conditional GET
      ignore-no-cache=on|off
           ignore all server headers that prevent caching a response
      ignore-no-store=on|off
           ignore "Cache-Control: no-store" server header
      ignore-private=on|off
           ignore "Cache-Control: private" server header
      ignore-auth=on|off
           cache authorized responses, even if server didn't specify
           "Cache-Control: public"
      refresh-ims=on|off
           always pass client IMS requests through to the origin,
           even if we think our copy is fresh

For example:
      refresh_override_default max=4320 reload-into-ims=on

      refresh_override_match http://host/ ignore-reload=on ignore-no-cache=on ignore-no-store=on
      refresh_override_match /path/ reload-into-ims=off
      refresh_override_match \.jpe?g$ min=1440
      refresh_override_match \.css$ max=60
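
A rough sketch of how the effective options for a URL might be
resolved from the example above. The proposal does not say what
happens when several patterns match, so this assumes "first match
wins", as refresh_pattern does today. It also assumes min/max are in
minutes. The function and variable names are purely illustrative.

```python
import re

# refresh_override_default from the example above
DEFAULT = {"max": 4320, "reload-into-ims": "on"}

# refresh_override_match lines from the example above, in config order
MATCHES = [
    (r"http://host/", {"ignore-reload": "on",
                       "ignore-no-cache": "on",
                       "ignore-no-store": "on"}),
    (r"/path/", {"reload-into-ims": "off"}),
    (r"\.jpe?g$", {"min": 1440}),
    (r"\.css$", {"max": 60}),
]


def effective_options(url, default=DEFAULT, matches=MATCHES):
    """Start from the default, then let the first matching pattern override it."""
    options = dict(default)
    for pattern, overrides in matches:
        if re.search(pattern, url):
            options.update(overrides)
            break  # assumption: first match wins, later patterns are ignored
    return options


css = effective_options("http://example.org/style.css")
jpg = effective_options("http://example.org/photo.jpg")
other = effective_options("http://example.org/index.html")
```

Here the .css URL keeps reload-into-ims=on from the default but gets
max=60, the .jpg URL gains min=1440 on top of the default, and a URL
that matches nothing still gets the default options.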

Main differences in usage:

1. The overrides would always apply, regardless of how the expiration
time was arrived at - whether by explicit headers or last-modified
algorithm heuristics. Currently the Min, Max and Percent settings
each apply only in specific circumstances: Max and Percent apply
only to responses with Last-Modified, and Min applies only in the
absence of L-M, Expires and CC max-age.

2. The refresh_override_default would always apply (although its
options may be overridden by those of a refresh_override_match).
Currently the default refresh_pattern only applies if no patterns
match the request, meaning you can't ever override default behaviour,
you can only fall back to it.

3. There is no way of setting the Last-Modified factor percentage by
regex! This is perhaps a big problem, and it could be added as an
option. But then it would be the only non-HTTP-violating directive
possible in the option... and so would spoil it slightly.

4. No need for global counterparts of refresh_pattern directives,
e.g. refresh_all_ims and reload_into_ims.

5. Frequently used override options could be stated once in the
default instead of on every subsequent line.
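
To illustrate point 1, a small sketch of what "always apply" could
mean for min and max: whatever lifetime was computed (explicit,
heuristic, or none at all) is clamped by the override options. The
function name, the use of minutes, and the choice that max wins if
min is larger than max are all my assumptions, not part of the
proposal.

```python
def apply_overrides(lifetime_minutes, options):
    """Clamp a freshness lifetime by the min/max override options.

    lifetime_minutes  lifetime from explicit headers or the Last-Modified
                      heuristic, or None if neither was available
    options           dict of effective override options for the URL
    """
    # Assumption: with no headers at all, the object starts out not fresh.
    lifetime = 0 if lifetime_minutes is None else lifetime_minutes
    if "min" in options:
        lifetime = max(lifetime, options["min"])
    if "max" in options:
        # Applied last, so max wins if the two options conflict.
        lifetime = min(lifetime, options["max"])
    return lifetime


short_expiry = apply_overrides(30, {"min": 1440})
long_expiry = apply_overrides(10000, {"max": 4320})
no_headers = apply_overrides(None, {"min": 1440, "max": 4320})
```

Unlike today's refresh_pattern, the same clamp is applied in all three
cases, so the admin does not have to know which mechanism produced the
lifetime.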

This may be completely the wrong way of looking at it, or it may be
just going too far. A smaller, but still helpful, step might be to
introduce a refresh_pattern_default whose values would be inherited
by any subsequent refresh_pattern match.

Any help or input into this would be very welcome indeed.

Doug

On 1 Jun 2006, at 20:06, Doug Dixon wrote:

> Hi
>
> I'm fixing bug 1202 (it's a simple fix) and am cleaning up
> refresh.cc at the same time.
>
> I'd like to review the various refresh_pattern options, as some of
> them are mutually exclusive in practice (although you can configure
> all of them) and it's not clear from the documentation what they
> all mean. They're quite hard to understand and use correctly.
>
>
> 1. reload-into-ims
>
> The following is legal:
>
> refresh_pattern html$ 5 20% 60 ignore-reload reload-into-ims
>
> but reload-into-ims will not have any effect. You could argue that
> this is obvious, but I think it should be caught at parse time.
>
> 2. As an aside - but I want to mention it here - we need to make it
> clearer that if an object does specify an expiry time, the Min,
> Percent and Max values in refresh_pattern will be completely
> ignored, but the options won't be. I'll change cf.data.pre accordingly.
>
> 3. override-expire
>
> override-expire enforces min age even if the server
> sent an Expires: header. Doing this VIOLATES the HTTP
> standard. Enabling this feature could make you liable
> for problems which it causes.
>
> If you do want to modify the behaviour of blindly obeying the
> server's explicit expiry time, you can - to an extent.
>
> The override-expire option enforces the Min time in cache, even if
> the origin stated it should expire before then.
> But it ignores the Max time (surprising!), and the L-M factor (more
> expected - not obvious what this would do anyway)
>
> It's not very intuitive. I think we should probably make this
> option enforce the Max time as well. Possibly even ignore the
> explicit expiry of the object altogether and fall back to last-
> modified factor??
>
> It could be a naming thing... override-expire doesn't really say
> what it does. enforce-min might be better. But then you've already
> stated a min and might expect it to be already enforced.
>
> 4. override-lastmod
>
> override-lastmod enforces min age even on objects
> that were modified recently.
>
> The Min time isn't enforced even when the last-modified factor
> algorithm does kick in. If the object was only just modified and
> the L-M factor algorithm results in a figure lower than the Min, it
> will be considered fresh for less than the configured Min.
>
> This isn't what I would expect. I know that the override-lastmod
> exists to let you do this, but it's really non-intuitive. I think
> the Min should always be enforced if we're using L-M factor
> algorithm, and that we should therefore lose the override-lastmod
> option. Can't see the point in the default (null) behaviour of Min
> otherwise.
>
>
> Thoughts?
>
> Doug
>
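
For reference, the current behaviour described in the quoted message
can be summarised in a short sketch. It is reconstructed only from
the description above, not from refresh.cc, so the function, its
parameters and the exact comparisons are assumptions. All times are
in the same unit (minutes here).

```python
def is_fresh(age, min_age, percent, max_age, *,
             expires_in=None, lm_age=None,
             override_expire=False, override_lastmod=False):
    """Freshness decision as described for the current refresh_pattern.

    age         time since the response was received
    expires_in  explicit lifetime from the origin (Expires / CC max-age),
                or None if the origin gave none
    lm_age      Date minus Last-Modified at fetch time, or None
    """
    if expires_in is not None:
        # Explicit expiry: Min, Percent and Max are ignored ...
        if age < expires_in:
            return True
        # ... except that override-expire enforces Min (but not Max).
        return override_expire and age < min_age
    if lm_age is not None:
        # Last-Modified heuristic: Max and Percent apply here.
        if age > max_age:
            return False
        if age < lm_age * percent / 100:
            return True
        # Min is only enforced on L-M objects with override-lastmod.
        return override_lastmod and age < min_age
    # Neither explicit expiry nor Last-Modified: only Min applies.
    return age <= min_age


# refresh_pattern ... 5 20% 60, object modified 10 minutes before fetch
recently_modified = is_fresh(3, 5, 20, 60, lm_age=10)
with_override = is_fresh(3, 5, 20, 60, lm_age=10, override_lastmod=True)
```

The two calls show the surprise from item 4: a recently modified
object can go stale before the configured Min unless override-lastmod
is set.
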
Received on Thu Jun 15 2006 - 05:33:13 MDT
