Re: [squid-users] How to filter response in squid-3.1.x?

From: Kaiwang Chen <kaiwang.chen_at_gmail.com>
Date: Thu, 20 Oct 2011 15:11:18 +0800

2011/10/20 Amos Jeffries <squid3_at_treenet.co.nz>:
> On Thu, 20 Oct 2011 00:39:32 +0800, Kaiwang Chen wrote:
>>
>> 2011/10/19 Amos Jeffries:
>>>
>>> On Wed, 19 Oct 2011 05:15:22 +0800, Kaiwang Chen wrote:
>
> <snip>
>>>
>>> To only change the HTTP headers, there are some tricks you can do with
>>> the "must-revalidate" and/or "proxy-revalidate" cache controls. These
>>> controls cause the surrogate to contact the origin web server on every
>>> request. The origin can send back new headers on a 304 Not Modified
>>> response, meaning the headers get changed per-response but the cached
>>> body gets sent only when actually changed. This retains most of the
>>> bandwidth and performance benefits of caching.
>>
>> So, the possible solution could be injecting a "Cache-Control:
>> must-revalidate" header by some eCap reqmod_precache service, then
>> Squid will revalidate the response on every request carrying new
>> request headers, then the origin server has its chance to set new
>> response headers? A slightly counter-intuitive workaround for class 4
>> adaptation. Not perfect, since revalidation occurs only when the
>> response is stale,
>
> That would be 'normal' revalidation operation. Which is why the control
> exists and is called must-revalidate. To override the normal operation and
> force revalidation on every request.
>
> You could set it in a filter module altering the headers, and repeat the
> setup on every proxy surrogate as you expand the CDN. It is far easier to
> send it from the origin, which is designed to set these controls very
> efficiently and scales perfectly.
>

So which header forces revalidation on every request? Is it the cache
response directive "Cache-Control: max-age=0, must-revalidate"?

Referring to this section of RFC 2616,
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9,
'Cache-Control: must-revalidate' is a cache-response-directive, and as
cited:

    When the must-revalidate directive is present in a response
received by a cache, that cache MUST NOT use the entry after it
becomes stale to respond to a subsequent request without first
revalidating it with the origin server. ... In all circumstances an
HTTP/1.1 cache MUST obey the must-revalidate directive; in particular,
if the cache cannot reach the origin server for any reason, it MUST
generate a 504 (Gateway Timeout) response.
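To check my reading of that paragraph, here is a minimal Python sketch of the decision a cache would make. It is only my interpretation of the quoted RFC text, not Squid's actual code; the function name, its parameters, and the returned strings are all made up for illustration.

```python
def cache_decision(current_age, freshness_lifetime, must_revalidate,
                   origin_reachable):
    """Hypothetical sketch of the RFC 2616 must-revalidate rule.

    Ages and lifetimes are in seconds. This is not Squid's implementation.
    """
    # A fresh entry may be served without contacting the origin.
    if current_age < freshness_lifetime:
        return "serve cached copy"
    # A stale entry is normally revalidated when the origin can be reached.
    if origin_reachable:
        return "revalidate with origin"
    # Stale, origin unreachable: must-revalidate forbids serving stale.
    if must_revalidate:
        return "504 Gateway Timeout"
    # Without must-revalidate a cache may fall back to the stale copy.
    return "serve stale copy with Warning"


# With max-age=0 the freshness lifetime is 0, so the entry is never fresh
# and every request ends up in the revalidation branch.
print(cache_decision(0, 0, must_revalidate=True, origin_reachable=True))
```

If this reading is right, "max-age=0" is what makes the entry stale immediately, and "must-revalidate" only removes the option of serving it stale when the origin cannot be reached.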

Well, I have some trouble understanding the following transaction,
where the cached response was stale from the client's perspective.
Squid really did revalidate and got a 304 (success, isn't it?);
however, the client still got a "Revalidation failed" warning... Squid
was configured with

refresh_pattern . 0 20% 4320

which heuristically yields a relatively long freshness period. And when
the origin server specifies "Cache-Control: max-age=0, must-revalidate",
Squid revalidates on each request and warns the client with
"Revalidation failed".
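To show what I mean by a long freshness period, here is a rough Python sketch of the last-modified heuristic as I understand it for "refresh_pattern . 0 20% 4320". It is a simplification, not Squid's real algorithm: the function name is invented, and I use the Date of the 304 response in the trace below as a stand-in for the original fetch time, which the trace does not show.

```python
from email.utils import parsedate_to_datetime


def heuristic_lifetime(date_hdr, last_modified_hdr,
                       min_minutes=0, percent=20, max_minutes=4320):
    """Hypothetical, simplified LM-factor estimate in seconds."""
    date = parsedate_to_datetime(date_hdr)
    last_modified = parsedate_to_datetime(last_modified_hdr)
    # How old the object already was when the origin sent it.
    age_at_fetch = (date - last_modified).total_seconds()
    # The object is assumed to stay fresh for a percentage of that age.
    lifetime = age_at_fetch * percent / 100
    # Clamp between the min and max columns of refresh_pattern (minutes).
    lifetime = max(lifetime, min_minutes * 60)
    lifetime = min(lifetime, max_minutes * 60)
    return lifetime


seconds = heuristic_lifetime("Thu, 20 Oct 2011 05:26:28 GMT",
                             "Wed, 19 Oct 2011 11:00:00 GMT")
print(round(seconds / 3600, 2), "hours")
```

With those two header values the estimate comes out at about 3.7 hours, far longer than the "max-age=10" the client asked for.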

//============= client -> surrogate
GET /cgi-bin/index.php HTTP/1.0
User-Agent: Wget/1.10.2 (Red Hat modified)
Accept: */*
Host: my.example.com
Connection: Keep-Alive
Cache-Control: max-age=10

//============= surrogate -> origin server
GET /cgi-bin/index.php HTTP/1.1
If-Modified-Since: Wed, 19 Oct 2011 11:00:00 GMT
User-Agent: Wget/1.10.2 (Red Hat modified)
Accept: */*
Host: my.example.com
Via: 1.0 s0.example.com (squid/3.1.16)
X-Forwarded-For: x.x.x.x
Cache-Control: max-age=10
Connection: keep-alive

//============= origin server -> surrogate
HTTP/1.1 304 Not Modified
Date: Thu, 20 Oct 2011 05:26:28 GMT
Server: Apache/2.2.3 (CentOS)
Connection: close
Cache-Control: must-revalidate

//============= Surrogate -> Client
HTTP/1.0 200 OK
X-Powered-By: PHP/5.1.6
Last-Modified: Wed, 19 Oct 2011 11:00:00 GMT
Content-Length: 66
Content-Type: text/html; charset=UTF-8
Date: Thu, 20 Oct 2011 05:26:28 GMT
Server: Apache/2.2.3 (CentOS)
Cache-Control: must-revalidate
Warning: 110 squid/3.1.16 "Response is stale"
Warning: 111 squid/3.1.16 "Revalidation failed"
X-Cache: HIT from s0.example.com
X-Cache-Lookup: HIT from s0.example.com:80
Via: 1.0 s0.example.com (squid/3.1.16)
Connection: keep-alive

<h1>It works!</h1><pre>Last-Modified: Wed, 19 Oct 2011 11:00:00GMT

>
>> while what I am looking for is adapting every
>> response before it leaves Squid for the client. 'Cache-Control:
>> max-age=0' will force revalidation of every response, though.
>
> Otherwise known as "force reload".
> Forces full erasure and a full new fetch on every request. Not
> revalidation.

Let's make it clear: is it 'Cache-Control: max-age=0' as a request
header that forces full erasure, while 'Cache-Control: max-age=0' as a
response header simply marks pre-expiration, so Squid feels free to
store a pre-expired response and validate it later when serving the
next request?

Looks like a "Surrogate-Control: max-age=0, revalidate" header
eliminates the need for a filter module in this case? I am not sure
about 'Surrogate-Control: revalidate', since it is not listed in the
Edge Architecture Specification, http://www.w3.org/TR/edge-arch,
referred to by http://wiki.squid-cache.org/Features/Surrogate.

>
>>
>> I also happened to read about ESI, which really resembles class 4
>> adaptation with limited capability, in that it only modifies the
>> response body. It looks incapable of doing custom complex calculation.
>> So Squid does not support class 4 adaptation in general? Any other
>> alternative?
>
> ESI, yes, is good for personalization of the body. It does not exactly do
> calculations. It does widget insertion into pages for personalization at
> the gateway machine, allowing caching of the page template and widgets
> separately within a CDN.
>
> You were talking about personalizing Cookies etc, which are not part of the
> body content.

Sure. A side question: when a surrogate fetches an ESI widget, will it
carry request headers from the client (assuming the widget is in the
same domain as the page) and inject response headers before the page is
served to the client?

>
>>
>>>
>>>  NP: this trick with 304 is only possible for headers which do not update
>>> details about the particular body object. ie you can use it for altering
>>> Cookie values per-request, but not for changing the apparent
>>> Content-Encoding from gzip to deflate. For things affecting the body you
>>> use the normal 200 response and send the updated body as well.
>>
>> Sure.
>>
>> BTW, I tried the gzip compression adapter from
>> http://code.google.com/p/squid-ecap-gzip/, and found that after a
>> request carrying "Accept-Encoding: gzip", Squid always passes back a
>> gzip'ed response to the client, even if the request no longer carries
>> that header, because the object is not modified. A request without gzip
>> support and with 'Cache-Control: no-cache' refreshes the cache so that
>> it always returns plain text responses.  Does it imply that Squid only
>> caches one copy of a response, rather than one per encoding? How to make
>> it serve an encoding different from the cached one?
>
> Sounds like the adapter is not working. What you describe is normal Squid
> behaviour without the adapter.
>
> IIRC the module was supposed to update the background requests to prefer
> gzipped, and itself do the un-zipping when an identity encoded response was
> required by the client.

So Squid without the adapter will cache one copy of a response in only
one encoding. Will a "Vary: Accept-Encoding" response header enable
multiple copies?
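What I have in mind is the generic Vary mechanism from RFC 2616, roughly as in the Python sketch below. This is only how I imagine a cache could key its variants; I do not know whether Squid does it this way, and the function name and key layout are invented for illustration.

```python
def variant_key(url, vary_header, request_headers):
    """Hypothetical cache key built from the URL plus the request
    headers that the response's Vary header names."""
    names = sorted(n.strip().lower()
                   for n in vary_header.split(",") if n.strip())
    # Header names are case-insensitive, so normalize before lookup.
    request = {k.lower(): v.strip() for k, v in request_headers.items()}
    # A header missing from the request counts as an empty value.
    return (url,) + tuple((n, request.get(n, "")) for n in names)


gzip_key = variant_key("http://my.example.com/cgi-bin/index.php",
                       "Accept-Encoding", {"Accept-Encoding": "gzip"})
plain_key = variant_key("http://my.example.com/cgi-bin/index.php",
                        "Accept-Encoding", {"Accept": "*/*"})
# Different keys mean the two encodings could be stored side by side.
print(gzip_key != plain_key)
```

With such a key a gzip-capable client and a plain one would map to different cache entries instead of overwriting each other.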

>
>
> Amos
>

Thanks,
Kaiwang
Received on Thu Oct 20 2011 - 07:11:26 MDT
