Re: [squid-users] How to filter response in squid-3.1.x?

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Thu, 20 Oct 2011 21:11:53 +1300

On 20/10/11 20:11, Kaiwang Chen wrote:
> 2011/10/20 Amos Jeffries<squid3_at_treenet.co.nz>:
>> On Thu, 20 Oct 2011 00:39:32 +0800, Kaiwang Chen wrote:
>>>
>>> 2011/10/19 Amos Jeffries:
>>>>
>>>> On Wed, 19 Oct 2011 05:15:22 +0800, Kaiwang Chen wrote:
>>
>> <snip>
>>>>
>>>> To only change the HTTP headers, there are some tricks you can do with
>>>> the "must-revalidate" and/or "proxy-revalidate" cache controls. These
>>>> controls cause the surrogate to contact the origin web server on every
>>>> request. The origin can send back new headers on a 304 Not Modified
>>>> response, meaning the headers get changed per-response but the cached
>>>> body gets sent only when actually changed. This retains most of the
>>>> bandwidth and performance benefits of caching.
>>>
>>> So, the possible solution could be injecting a "Cache-Control:
>>> must-revalidate" header by some eCap reqmod_precache service, then
>>> Squid will revalidate the response on every request carrying new
>>> request headers, then the origin server has its chance to set new
>>> response headers? A little counter-intuitive workaround for class 4
>>> adaption. Not perfect, since revalidation occurs only when the
>>> response is stale,
>>
>> That would be 'normal' revalidation operation. Which is why the control
>> exists and is called must-revalidate. To override the normal operation and
>> force revalidation on every request.
>>
>> You could set it in a filter module altering the headers. And repeat the
>> setup on every proxy surrogate as you expand the CDN. It is far easier to
>> send it from the origin, which is designed to set these controls very
>> efficiently and scales perfectly.
>>
>
> So which header forces revalidation on every request, is it cache
> response directive "Cache-Control: max-age=0, must-revalidate"?
>
> Referring to this section of rfc2616,
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9,
> 'Cache-Control: must-revalidate' is a cache-response-directive, and as
> cited:
>
> When the must-revalidate directive is present in a response
> received by a cache, that cache MUST NOT use the entry after it
> becomes stale to respond to a subsequent request without first
> revalidating it with the origin server. ... In all circumstances an
> HTTP/1.1 cache MUST obey the must-revalidate directive; in particular,
> if the cache cannot reach the origin server for any reason, it MUST
> generate a 504 (Gateway Timeout) response.
>
> Well, I have some trouble understanding the following transaction,
> where the cached response was stale from the client's perspective,
> Squid really did revalidation and got 304(success, isn't it?),
> however, the client still got "Revalidation failed" warning... The
> Squid was configured
>
> refresh_pattern . 0 20% 4320
>
> which estimated a relatively long freshness period. And when the origin
> server specifies "Cache-Control: max-age=0, must-revalidate", Squid
> revalidates on each request and warns the client with "Revalidation
> failed".
>
> //============= client -> surrogate
> GET /cgi-bin/index.php HTTP/1.0
> User-Agent: Wget/1.10.2 (Red Hat modified)
> Accept: */*
> Host: my.example.com
> Connection: Keep-Alive
> Cache-Control: max-age=10
>
> //============= surrogate -> origin server
> GET /cgi-bin/index.php HTTP/1.1
> If-Modified-Since: Wed, 19 Oct 2011 11:00:00 GMT
> User-Agent: Wget/1.10.2 (Red Hat modified)
> Accept: */*
> Host: my.example.com
> Via: 1.0 s0.example.com (squid/3.1.16)
> X-Forwarded-For: x.x.x.x
> Cache-Control: max-age=10
> Connection: keep-alive
>
> //============= origin server -> surrogate
> HTTP/1.1 304 Not Modified
> Date: Thu, 20 Oct 2011 05:26:28 GMT
> Server: Apache/2.2.3 (CentOS)
> Connection: close
> Cache-Control: must-revalidate
>
> //============= Surrogate -> Client
> HTTP/1.0 200 OK
> X-Powered-By: PHP/5.1.6
> Last-Modified: Wed, 19 Oct 2011 11:00:00 GMT
> Content-Length: 66
> Content-Type: text/html; charset=UTF-8
> Date: Thu, 20 Oct 2011 05:26:28 GMT
> Server: Apache/2.2.3 (CentOS)
> Cache-Control: must-revalidate
> Warning: 110 squid/3.1.16 "Response is stale"
> Warning: 111 squid/3.1.16 "Revalidation failed"

These warnings being present is a bug. The rest of the result is correct.

  The max-age=10 requirement ("nothing more than 10 seconds stale")
forces it to revalidate since the object it has is around 24hrs old.

  The origin's must-revalidate also forces revalidation.

  The reply to the client should not have the warnings, since the origin
has indicated that the object is currently valid (304).

refresh_pattern is not relevant, since there is a Cache-Control header
present. No estimations need to be made.
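
As a rough illustration of the two triggers described above, here is a minimal Python sketch. It is not Squid's code: the function name and parameters are hypothetical, and it models must-revalidate the way this thread describes it (revalidate on every request).

```python
from typing import Optional


def needs_revalidation(age_s: int,
                       request_max_age_s: Optional[int] = None,
                       must_revalidate: bool = False) -> bool:
    """Decide whether a cached object must be checked with the origin.

    Hypothetical helper for illustration only, not Squid's algorithm.
    age_s             -- how old the cached object is, in seconds
    request_max_age_s -- value of the client's Cache-Control: max-age, if any
    must_revalidate   -- True if the stored reply carried must-revalidate
    """
    # The client refuses anything older than its max-age.
    if request_max_age_s is not None and age_s > request_max_age_s:
        return True
    # Modelled as described in this thread: the origin's
    # must-revalidate forces a check with the origin.
    if must_revalidate:
        return True
    return False


# The transaction quoted above: object around 24 hours old, client
# sends max-age=10, origin sent must-revalidate.
print(needs_revalidation(24 * 3600, request_max_age_s=10, must_revalidate=True))
```

Either condition alone is enough here, which is why refresh_pattern never enters the decision in this sketch.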

>>
>>> while what I am looking for is adapting every
>>> response before it leaves Squid for the client. 'Cache-Control:
>>> max-age=0' will force revalidation every response, though.
>>
>> Otherwise known as "force reload".
>> Forces full erasure and a full new fetch on every request. Not
>> revalidation.
>
> Let's make it clear. Is it 'Cache-Control: max-age=0' as a request
> header that forces full erasure,

No. From the client it simply means revalidate immediately. AND pass on
the max-age=0 to origin.

Erasure is a side effect of Squid receiving a 200 reply from the
revalidation check. Nothing more. It is very likely to change when
multiple variants are cached.
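
A minimal Python sketch of that outcome handling, under stated assumptions: `CachedEntry` and `apply_revalidation` are hypothetical names, header names are lower-cased for simplicity, and this is not how Squid stores objects internally.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CachedEntry:
    """Hypothetical stand-in for a stored reply."""
    headers: Dict[str, str]
    body: bytes


def apply_revalidation(entry: CachedEntry, status: int,
                       new_headers: Dict[str, str],
                       new_body: bytes = b"") -> CachedEntry:
    """Return the entry to serve after a revalidation reply from the origin."""
    fresh = {k.lower(): v for k, v in new_headers.items()}
    if status == 304:
        # Not modified: keep the cached body, let the origin's
        # headers override the stored ones.
        merged = {k.lower(): v for k, v in entry.headers.items()}
        merged.update(fresh)
        return CachedEntry(merged, entry.body)
    if status == 200:
        # Full reply: the old object is replaced, which is the
        # "erasure" side effect mentioned above.
        return CachedEntry(fresh, new_body)
    raise ValueError(f"unhandled revalidation status: {status}")


old = CachedEntry({"Content-Length": "66", "Cache-Control": "max-age=0"}, b"x" * 66)
updated = apply_revalidation(old, 304, {"Cache-Control": "must-revalidate"})
print(updated.headers["cache-control"], len(updated.body))
```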

> while 'Cache-Control: max-age=0' as a
> response header simply marks pre-expiration, and Squid feels free to
> store a pre-expired response and validate it later when serving the next
> request?

That is correct.
You will just need to check the Squid release is one of the recent ones
which cache pre-expired content. Some earlier ones did not.

>
> Looks like a "Surrogate-Control: max-age=0, revalidate" header
> eliminates the need for a filter module in this case? Not sure about
> "Surrogate-Control: revalidate", since it is not listed in the Edge
> Architecture Specification, http://www.w3.org/TR/edge-arch, referred to
> by http://wiki.squid-cache.org/Features/Surrogate.

Squid ignores unknown ones presently. If you need, it can be extended.
Although, if you go with max-age=0, revalidate is redundant.
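
To illustrate what "ignores unknown ones" means in practice, here is a small Python sketch. It is an assumption-laden illustration, not Squid's parser: the function name is hypothetical, the set of known directives is my reading of the Edge Architecture Specification rather than a list taken from Squid, and per-device targeting (the ";target" suffix) is not handled.

```python
from typing import Dict, Optional

# Assumed set of recognised directives (from the Edge Architecture
# Specification as I read it, not copied from Squid's source).
KNOWN_DIRECTIVES = {"no-store", "no-store-remote", "max-age", "content"}


def parse_surrogate_control(value: str) -> Dict[str, Optional[str]]:
    """Parse a Surrogate-Control value, silently dropping unknown directives."""
    result: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        name = name.strip().lower()
        if name not in KNOWN_DIRECTIVES:
            continue  # unknown directive such as "revalidate": ignored
        result[name] = arg.strip() or None
    return result


print(parse_surrogate_control("max-age=0, revalidate"))
```

With the header from the question, only max-age survives, so "revalidate" has no effect either way.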

>>>
>>> I also happened to read about ESI, which really resembles class 4
>>> adaption with limited capability, in that it only modifies the response
>>> body. Looks like it is incapable of doing custom complex calculation. So
>>> Squid does not support class 4 adaption in general? Any other alternative?
>>
>> ESI, yes, is good for personalization of the body. It does not exactly do
>> calculations. It does widget insertion into pages for personalization at
>> the gateway machine. Allowing caching of the page template and widgets
>> separately within a CDN.
>>
>> You were talking about personalizing Cookies etc, which are not part of the
>> body content.
>
> Sure. A side question: when a surrogate fetches an ESI widget, will it
> carry request headers from the client (assuming the widget is in the same
> domain as the page) and inject response headers before the page is
> served to the client?
>

I don't think so. It is just a form of body/object macro-expansion. With
some fancy bits for determining which widget to insert.

>
> So Squid without the adapter will cache one copy of responses in only
> one encoding.

Yes.

> Will the "Vary: Accept-Encoding" request header enable
> multiple copies?

No. It tells Squid there are multiple variants with the same URL, and to
check the Accept-Encoding header against the one stored already when
deciding if it is a HIT.
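
A minimal Python sketch of that HIT check, assuming a single stored copy. The function name and arguments are hypothetical, and the comparison is a plain string match on the headers named by Vary, which is simpler than what a real cache does.

```python
from typing import Dict


def is_variant_hit(stored_vary: str,
                   stored_request_headers: Dict[str, str],
                   new_request_headers: Dict[str, str]) -> bool:
    """Check whether the stored copy can answer the new request.

    stored_vary            -- Vary value from the stored reply
    stored_request_headers -- headers of the request that produced it
    new_request_headers    -- headers of the incoming request
    """
    stored = {k.lower(): v.strip() for k, v in stored_request_headers.items()}
    new = {k.lower(): v.strip() for k, v in new_request_headers.items()}
    names = [n.strip().lower() for n in stored_vary.split(",") if n.strip()]
    if "*" in names:
        return False  # "Vary: *" can never be matched from cache
    # Every header named by Vary must match the stored request.
    return all(stored.get(n, "") == new.get(n, "") for n in names)


print(is_variant_hit("Accept-Encoding",
                     {"Accept-Encoding": "gzip"},
                     {"Accept-Encoding": "gzip"}))
```

A request with a different Accept-Encoding value is a miss against the stored copy in this sketch.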

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE9 or 3.1.16
   Beta testers wanted for 3.2.0.13
Received on Thu Oct 20 2011 - 08:12:05 MDT
