Re: [squid-users] refresh_pattern dynamic content doubts?

From: Beto Moreno <pamrtj_at_gmail.com>
Date: Tue, 22 May 2012 15:51:18 -0700

On Sun, May 20, 2012 at 12:57 AM, Amos Jeffries <squid3_at_treenet.co.nz> wrote:
> On 20/05/2012 4:52 p.m., Beto Moreno wrote:
>>
>> Hi.
>>
>> I have read in the docs that the default Squid setup uses the old way
>> to handle dynamic content:
>>
>> case A
>> hierarchy_stoplist cgi-bin ?
>> acl QUERY urlpath_regex cgi-bin \?
>> cache deny QUERY
>>
>> And the new way to do this uses the following settings:
>> case B
>> refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
>> refresh_pattern .            0 20% 4320
>>
>> Some sites I have seen use things like:
>> case C
>> refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
>> refresh_pattern -i \.index.(html|htm)$ 1440 90% 40320
>> refresh_pattern -i \.(html|htm|css|js)$ 1440 90% 40320
>> refresh_pattern .            0 20% 4320
>>
>> In your experience, is the old way no longer the right way to do this?
>
>
> There is no right/wrong here.
>
> The HTTP/1.0 specification is clear that dynamic content created by CGI
> scripts is *very likely* unsafe to cache *unless* the script emits
> Cache-Control headers.
>
> The "old" way was to simply not cache anything which came from a dynamic
> script generator.
>
> The refresh_pattern rules are only used for objects which have no
> Cache-Control (i.e. the unsafe requests), and "-i (/cgi-bin/|\?) 0 0% 0" is a
> heuristic rule crafted specifically to match the "dynamic content" criteria
> and prevent that unsafe content from being cached.
>
> The new way permits caching whenever the dynamic responses created by modern
> script languages send Cache-Control headers. Most modern dynamic websites are
> cacheable (their script engines emit Cache-Control) and use "?" in their
> URLs, so the old way would prevent caching them, leaving ISPs with <20% cache
> HIT ratios. Moving to the new rule gains a few % in HIT ratio without much
> risk.
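A minimal Python sketch contrasting case A and case B for a single response. The function names, the `max_age` parameter (standing in for a lifetime sent via Cache-Control) and the outcome strings are illustrative assumptions; "normal rules" stands for whatever the remaining refresh_pattern lines decide. Python's `re` stands in for Squid's POSIX regex, which behaves the same for these simple patterns.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Case A: "acl QUERY urlpath_regex cgi-bin \?" plus "cache deny QUERY".
QUERY_ACL = [re.compile(r"cgi-bin"), re.compile(r"\?")]
# Case B: "refresh_pattern -i (/cgi-bin/|\?) 0 0% 0" (-i = case-insensitive).
DYNAMIC_RULE = re.compile(r"(/cgi-bin/|\?)", re.IGNORECASE)


def url_path(url: str) -> str:
    """Path plus query string: the part of the URL that urlpath_regex examines."""
    parts = urlsplit(url)
    return parts.path + ("?" + parts.query if parts.query else "")


def case_a(url: str, max_age: Optional[int]) -> str:
    # max_age is deliberately ignored: "cache deny" does not look at headers,
    # so a matching response is never stored.
    if any(acl.search(url_path(url)) for acl in QUERY_ACL):
        return "not stored"
    return "normal rules"


def case_b(url: str, max_age: Optional[int]) -> str:
    # A lifetime sent by the server (e.g. Cache-Control: max-age) is honoured first.
    if max_age is not None:
        return f"fresh for {max_age}s"
    # No lifetime and a "dynamic" URL: the 0 0% 0 heuristic makes it stale at once.
    if DYNAMIC_RULE.search(url):
        return "stale immediately"
    return "normal rules"
```

With `url = "http://example.com/page.php?id=1"`, `case_a(url, 3600)` gives "not stored" while `case_b(url, 3600)` gives "fresh for 3600s"; only when the script sends no lifetime does `case_b(url, None)` fall back to "stale immediately". That difference is where the extra HIT ratio comes from.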
>
>
>> What is the difference between case B and case C?
>> Which is better?
>
>
> There is no "better". Everything in refresh_pattern is relative to the
> specific traffic pattern going through a specific proxy.
>
> You can tune it perfectly for today's traffic, and then a new website becomes
> popular tomorrow that uses entirely different patterns. Or the popular
> website you are trying to cache changes its headers.
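To make the case B / case C difference concrete, here is a rough Python sketch of first-match rule selection: the `refresh_pattern` lines are checked in squid.conf order and the first regex that matches the URL supplies min, percent and max. The `first_match` helper, the tuple layout and the sample URLs are hypothetical; Python's `re` stands in for Squid's POSIX regex.

```python
import re
from typing import List, Optional, Tuple

# (regex, case_insensitive, min_minutes, percent, max_minutes), in squid.conf order.
Rule = Tuple[str, bool, int, int, int]

CASE_B: List[Rule] = [
    (r"(/cgi-bin/|\?)", True, 0, 0, 0),
    (r".", False, 0, 20, 4320),
]

CASE_C: List[Rule] = [
    (r"(/cgi-bin/|\?)", True, 0, 0, 0),
    (r"\.index.(html|htm)$", True, 1440, 90, 40320),
    (r"\.(html|htm|css|js)$", True, 1440, 90, 40320),
    (r".", False, 0, 20, 4320),
]


def first_match(rules: List[Rule], url: str) -> Optional[Tuple[int, int, int]]:
    """Return (min, percent, max) from the first rule whose regex matches the URL."""
    for pattern, icase, min_minutes, percent, max_minutes in rules:
        flags = re.IGNORECASE if icase else 0
        if re.search(pattern, url, flags):
            return (min_minutes, percent, max_minutes)
    return None  # Squid would fall back to its built-in default rule here


if __name__ == "__main__":
    for url in ["http://example.com/style.css",
                "http://example.com/app.js?v=2",
                "http://example.com/index.html"]:
        print(url, first_match(CASE_B, url), first_match(CASE_C, url))
```

For `style.css`, case B falls through to the catch-all `0 20% 4320` line while case C picks `1440 90% 40320`; a query string such as `app.js?v=2` hits the `0 0% 0` line in both, because that rule comes first. Note that `\.index.(html|htm)$` needs a literal dot before "index", so `/index.html` skips it and is caught by the next line, which has the same values anyway.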
>
>
>
>> For dynamic content, are these the only settings we have? (I don't care
>> about YouTube or streaming.)
>
>
> The thing to understand is that to Squid there is no distinction between
> "dynamic" and "static" content. It is all just content. *Individual* objects
> have headers (or not) which indicate their *individual* cacheability.
>
> The "refresh_pattern" directive is a blunt-instrument regex pattern applied
> universally to all requests to estimate a cacheability time for objects
> which have no specific lifetime sent by the server.
> The "cache" directive is a sledgehammer to prevent caching of particular
> ACL-matching requests.
>
>
>
>>
>> Is there a formula to set up the min/max/percent values?
>
>
> No. They are the *input* values to a formula for calculating expiry time.
> They are how long *you want* to store any object which matches the regex.
>
>
> Amos
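A simplified Python sketch of how min, percent and max feed into the freshness decision, following the evaluation order described for refresh_pattern in squid.conf.documented (FRESH if explicitly unexpired; STALE if age > max; FRESH if the LM-factor is below percent; FRESH if age < min; else STALE). The `is_fresh` name and its parameters are hypothetical, and the sketch ignores the override/ignore options, client request headers and Squid's own age corrections.

```python
from typing import Optional


def is_fresh(age: int, min_minutes: int, percent: int, max_minutes: int,
             lm_age: Optional[int] = None,
             explicit_lifetime: Optional[int] = None) -> bool:
    """Simplified refresh_pattern freshness check.

    age               -- seconds the object has been cached
    min/max_minutes   -- refresh_pattern min and max, in minutes as in squid.conf
    percent           -- refresh_pattern percent (20 for "20%")
    lm_age            -- seconds between Last-Modified and the fetch (None if absent)
    explicit_lifetime -- seconds from Expires / Cache-Control: max-age, if sent
    """
    # 1. An explicit expiry from the server wins; the heuristic is not consulted.
    if explicit_lifetime is not None:
        return age < explicit_lifetime
    # 2. Older than max: STALE.
    if age > max_minutes * 60:
        return False
    # 3. With Last-Modified: FRESH while age stays below percent% of lm_age.
    if lm_age is not None and lm_age > 0:
        return age < lm_age * percent / 100
    # 4. Otherwise FRESH only while age is below min.
    return age < min_minutes * 60


DAY = 86400
# ". 0 20% 4320", object last modified 10 days before it was fetched:
# 20% of 10 days = 2 days of heuristic freshness.
print(is_fresh(1 * DAY, 0, 20, 4320, lm_age=10 * DAY))
print(is_fresh(int(2.5 * DAY), 0, 20, 4320, lm_age=10 * DAY))
```

As a worked example: under `1440 90% 40320`, an object last modified 100 days before fetching would get 90 days from the percent term, but max caps it at 40320 minutes (28 days), so at 30 days it is stale. Under `0 0% 0`, an object with no lifetime headers is stale immediately, while one sent with `max-age=3600` is still fresh at 10 minutes because step 1 applies first.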

Thanks for your great explanation. I'm working on these settings to see
which of them gets the most out of my cache. Thanks again!!!
Received on Tue May 22 2012 - 22:51:25 MDT

This archive was generated by hypermail 2.2.0 : Wed May 23 2012 - 12:00:04 MDT