Re: [squid-users] refresh_pattern dynamic content doubts?

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sun, 20 May 2012 19:57:27 +1200

On 20/05/2012 4:52 p.m., Beto Moreno wrote:
> Hi.
>
> I have read in the doc that squid default setup is using the old way
> to handle dynamic content:
>
> case A
> hierarchy_stoplist cgi-bin ?
> acl QUERY urlpath_regex cgi-bin \?
> cache deny QUERY
>
> And for the new way for this is using the next settings:
> case B
> refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
> refresh_pattern . 0 20% 4320
>
> Some sites I had seen they use things like:
> case C
> refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
> refresh_pattern -i \.index.(html|htm)$ 1440 90% 40320
> refresh_pattern -i \.(html|htm|css|js)$ 1440 90% 40320
> refresh_pattern . 0 20% 4320
>
> the old way in your experience is no longer the right way for this?

There is no right/wrong here.

HTTP/1.0 specification is clear that dynamic content created by CGI
scripts is *very likely* unsafe to cache *unless* the script emits
Cache-Control headers.

The "old" way was to simply not cache anything which came from a dynamic
script generator.

The refresh_pattern rules are only used for objects which have no
cache-control (i.e. the unsafe ones), and "-i (/cgi-bin/|\?) 0 0% 0" is
a heuristic rule crafted specifically to match the "dynamic content"
criteria and prevent that unsafe content from being cached.

The new way permits caching whenever the dynamic responses created by
modern script languages send cache-controls. All the modern dynamic
websites are cacheable (their script engines emit cache-control) and use
"?" in their URLs, so the old way would prevent caching them, leaving
ISPs with <20% cache HIT ratios. Moving to the new rule gains a few % in
HIT ratio without much risk.
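
As a rough illustration of that difference (a simplified Python sketch,
not Squid code): the regexes are the ones quoted above, while the
function names, the has_cache_control flag and the example.com URL are
made up for the example. It also assumes that any cache-control the
server sends actually permits caching, and it matches against the whole
URL rather than just the path.

```python
import re

# Old way: acl QUERY urlpath_regex cgi-bin \?   (case-sensitive)
QUERY_ACL = re.compile(r"cgi-bin|\?")
# New way: refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
DYNAMIC_RULE = re.compile(r"(/cgi-bin/|\?)", re.IGNORECASE)


def old_way_allows_caching(url: str, has_cache_control: bool) -> bool:
    """'cache deny QUERY' blocks matching URLs whatever the headers say."""
    return QUERY_ACL.search(url) is None


def new_way_allows_caching(url: str, has_cache_control: bool) -> bool:
    """Server cache-control decides first; the 0 0% 0 rule is the fallback."""
    if has_cache_control:
        return True  # simplification: assumes the header permits caching
    return DYNAMIC_RULE.search(url) is None


url = "http://example.com/forum/view.php?topic=42"  # hypothetical URL
print(old_way_allows_caching(url, has_cache_control=True))   # False
print(new_way_allows_caching(url, has_cache_control=True))   # True
print(new_way_allows_caching(url, has_cache_control=False))  # False
```

The only case where the two approaches disagree is a "?" or cgi-bin URL
whose response carries cache-control, which is exactly the traffic the
new rule lets through.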

> What is the different between case B and case C?
> which is better?

There is no "better". Everything in refresh_pattern is relative to the
specific traffic pattern going through a specific proxy.

You can tune it perfectly for today's traffic, and then a new website
which uses entirely different patterns becomes popular tomorrow. Or the
popular website you are trying to cache changes its headers.

> for dynamic content is the only settings we have?(I don't care about
> youtube or streaming).

The thing to understand is that to Squid there is no distinction between
"dynamic" and "static" content. It is all just content. *Individual*
objects have headers (or not) which indicate their *individual*
cacheability.

"refresh_pattern" directive is a blunt-object regex pattern applied
universally to all requests to estimate a cacheability time for objects
whose server sent no specific mention of lifetime.
The "cache" directive is a sledgehammer to prevent caching of particular
ACL-matching requests.
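
To make the "blunt regex" point concrete, here is a small Python sketch
of how a list of refresh_pattern lines is consulted: the rules are
tried in configuration order and the first regex that matches wins. The
rules are case C from the question; pick_rule and the example.com URLs
are hypothetical, and Python's re module stands in for Squid's regex
library, which is an assumption that holds for these simple patterns.

```python
import re

# (regex, min minutes, percent, max minutes), in configuration order.
CASE_C = [
    (r"(/cgi-bin/|\?)", 0, 0, 0),
    (r"\.index.(html|htm)$", 1440, 90, 40320),
    (r"\.(html|htm|css|js)$", 1440, 90, 40320),
    (r".", 0, 20, 4320),
]


def pick_rule(url, rules=CASE_C):
    """Return the first rule whose regex matches the URL.

    Every rule is matched case-insensitively, as with -i; that makes
    no difference for the final "." catch-all.
    """
    for pattern, min_minutes, percent, max_minutes in rules:
        if re.search(pattern, url, re.IGNORECASE):
            return pattern, min_minutes, percent, max_minutes
    return None


for url in ("http://example.com/search?q=squid",
            "http://example.com/index.html",
            "http://example.com/logo.png"):
    print(url, "->", pick_rule(url))
```

In this sketch /index.html is picked up by the third rule, not the
second, because "\.index." requires a literal dot in front of "index".
Anything with a "?" stops at the first rule regardless of its file
extension.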

>
> exist a formula to setup min/max percent?

No. They are the *input* values to a formula for calculating expiry
time. They are how long *you want* to store any object which matches the
regex.
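
For reference, a minimal Python sketch of how those three inputs feed
the freshness decision, following the order summarised in the
squid.conf documentation for refresh_pattern (explicit expiry first,
then max, then the Last-Modified factor, then min). The function and
parameter names are invented for the example, all times are in minutes,
and real Squid also takes override options and request headers into
account, which are left out here.

```python
def is_fresh(age, min_minutes, percent, max_minutes,
             lastmod_age=None, expires_in=None):
    """Heuristic freshness check; all times are in minutes.

    age:         how long the object has been in the cache
    lastmod_age: (Date - Last-Modified) at fetch time, if the server sent it
    expires_in:  minutes until an explicit expiry, if the server sent one
    """
    if expires_in is not None:
        return expires_in > 0          # explicit lifetime wins
    if age > max_minutes:
        return False                   # older than max: stale
    if lastmod_age:                    # None or 0: no usable Last-Modified
        return age / lastmod_age < percent / 100.0
    return age < min_minutes           # no Last-Modified: only min applies


# Rule ". 0 20% 4320", object last modified 10 days (14400 min) before fetch:
print(is_fresh(1000, 0, 20, 4320, lastmod_age=14400))  # True  (about 7% < 20%)
print(is_fresh(3000, 0, 20, 4320, lastmod_age=14400))  # False (about 21% >= 20%)
print(is_fresh(5000, 0, 20, 4320, lastmod_age=14400))  # False (age > max)
# Rule "(/cgi-bin/|\?) 0 0% 0": never fresh without an explicit lifetime.
print(is_fresh(0, 0, 0, 0, lastmod_age=14400))         # False
```

With 20% and an object that was 10 days old when fetched, the heuristic
lifetime works out to 2 days (2880 minutes), capped by the max of 4320.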

Amos
Received on Sun May 20 2012 - 07:57:31 MDT
