Re: [squid-users] How to filter response in squid-3.1.x?

From: Kaiwang Chen <kaiwang.chen_at_gmail.com>
Date: Fri, 21 Oct 2011 03:44:25 +0800

2011/10/20 Amos Jeffries <squid3_at_treenet.co.nz>:
> On 20/10/11 20:11, Kaiwang Chen wrote:
>>
>> 2011/10/20 Amos Jeffries<squid3_at_treenet.co.nz>:
>>>
>>> On Thu, 20 Oct 2011 00:39:32 +0800, Kaiwang Chen wrote:
>>>>
>>>> 2011/10/19 Amos Jeffries:
>>>>>
>>>>> On Wed, 19 Oct 2011 05:15:22 +0800, Kaiwang Chen wrote:
>>>
>>> <snip>
>>>>>
>>>>> To only change the HTTP headers, there are some tricks you can do with the
>>>>> "must-revalidate" and/or "proxy-revalidate" cache controls. These controls
>>>>> cause the surrogate to contact the origin web server on every request. The
>>>>> origin can send back new headers on a 304 not-modified response, meaning the
>>>>> headers get changed per-response, but the cached body gets sent only when
>>>>> actually changed. This retains most of the bandwidth and performance
>>>>> benefits of caching.
>>>>
>>>> So, the possible solution could be injecting a "Cache-Control:
>>>> must-revalidate" header by some eCap reqmod_precache service, then
>>>> Squid will revalidate the response on every request carrying new
>>>> request headers, then the origin server has its chance to set new
>>>> response headers? A slightly counter-intuitive workaround for class 4
>>>> adaptation. Not perfect, since revalidation occurs only when the
>>>> response is stale,
>>>
>>> That would be 'normal' revalidation operation. Which is why the control
>>> exists and is called must-revalidate. To override the normal operation and
>>> force revalidation on every request.
>>>
>>> You could set it in a filter module altering the headers, and repeat the
>>> setup on every proxy surrogate as you expand the CDN. It is far easier to
>>> send it from the origin, which is designed to set these controls very
>>> efficiently and scales perfectly.
>>>
>>
>> So which header forces revalidation on every request, is it cache
>> response directive "Cache-Control: max-age=0, must-revalidate"?
>>
>> Referring to this section of rfc2616,
>> http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9,
>> 'Cache-Control: must-revalidate' is a cache-response-directive, and as
>> cited:
>>
>>     When the must-revalidate directive is present in a response
>> received by a cache, that cache MUST NOT use the entry after it
>> becomes stale to respond to a subsequent request without first
>> revalidating it with the origin server. ... In all circumstances an
>> HTTP/1.1 cache MUST obey the must-revalidate directive; in particular,
>> if the cache cannot reach the origin server for any reason, it MUST
>> generate a 504 (Gateway Timeout) response.
>>
>> Well, I have some trouble understanding the following transaction,
>> where the cached response was stale from the client's perspective.
>> Squid really did revalidate and got a 304 (success, isn't it?);
>> however, the client still got a "Revalidation failed" warning...
>> Squid was configured with
>>
>> refresh_pattern .               0       20%     4320
>>
>> which estimates a relatively long freshness period. And when the origin
>> server specifies "Cache-Control: max-age=0, must-revalidate", Squid
>> revalidates on each request and warns the client with "Revalidation
>> failed".
>>
>> //============= client ->  surrogate
>> GET /cgi-bin/index.php HTTP/1.0
>> User-Agent: Wget/1.10.2 (Red Hat modified)
>> Accept: */*
>> Host: my.example.com
>> Connection: Keep-Alive
>> Cache-Control: max-age=10
>>
>> //============= surrogate ->  origin server
>> GET /cgi-bin/index.php HTTP/1.1
>> If-Modified-Since: Wed, 19 Oct 2011 11:00:00 GMT
>> User-Agent: Wget/1.10.2 (Red Hat modified)
>> Accept: */*
>> Host: my.example.com
>> Via: 1.0 s0.example.com (squid/3.1.16)
>> X-Forwarded-For: x.x.x.x
>> Cache-Control: max-age=10
>> Connection: keep-alive
>>
>> //============= origin server ->  surrogate
>> HTTP/1.1 304 Not Modified
>> Date: Thu, 20 Oct 2011 05:26:28 GMT
>> Server: Apache/2.2.3 (CentOS)
>> Connection: close
>> Cache-Control: must-revalidate
>>
>> //============= Surrogate ->  Client
>> HTTP/1.0 200 OK
>> X-Powered-By: PHP/5.1.6
>> Last-Modified: Wed, 19 Oct 2011 11:00:00 GMT
>> Content-Length: 66
>> Content-Type: text/html; charset=UTF-8
>> Date: Thu, 20 Oct 2011 05:26:28 GMT
>> Server: Apache/2.2.3 (CentOS)
>> Cache-Control: must-revalidate
>> Warning: 110 squid/3.1.16 "Response is stale"
>> Warning: 111 squid/3.1.16 "Revalidation failed"
>
> These warnings being present is a bug. The rest of the result is correct.
>
>  The max-age=10 requirement ("nothing more than 10 seconds stale") forces it
> to revalidate since the object it has is around 24hrs old.

No.. max-age has nothing to do with the roughly 24-hour "age" (the resource
age: the amount of time since the resource was created or modified, that
is, since Wed, 19 Oct 2011 11:00:00 GMT); instead, it is compared to the
response age (the amount of time since the origin server served the
response, that is, since Thu, 20 Oct 2011 05:26:28 GMT). Although
the object (resource) was around 24 hours old, the response could be
fresh as long as it had left the origin server within the last 10 seconds.
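
To make the distinction concrete, here is a rough Python sketch using the
two timestamps from the trace above. It is a simplification and not what
Squid does internally: it takes the response age as "now minus the Date
header" and ignores the Age header and the transit-delay corrections from
RFC 2616 section 13.2.3. The function names are made up for illustration.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def age_seconds(since, now):
    """Whole seconds elapsed between two datetimes, never negative."""
    return max(0, int((now - since).total_seconds()))


def satisfies_request_max_age(date_header, now, max_age):
    """True if the response age (time since the Date header) is within max_age.

    Simplified: the Last-Modified value plays no part in this check.
    """
    return age_seconds(parsedate_to_datetime(date_header), now) <= max_age


# Header values copied from the trace in this message.
LAST_MODIFIED = "Wed, 19 Oct 2011 11:00:00 GMT"
DATE = "Thu, 20 Oct 2011 05:26:28 GMT"

served_at = parsedate_to_datetime(DATE)

# Resource age: how long ago the resource itself was modified.
resource_age = age_seconds(parsedate_to_datetime(LAST_MODIFIED), served_at)

# Response age against the client's "Cache-Control: max-age=10".
fresh_after_5s = satisfies_request_max_age(DATE, served_at + timedelta(seconds=5), 10)
fresh_after_11s = satisfies_request_max_age(DATE, served_at + timedelta(seconds=11), 10)
```

However large resource_age is, only the time elapsed since the Date header
decides whether the request's max-age=10 is satisfied.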

>
>  The origins must-revalidate also forces revalidation.
>
>  The reply to the client should not have the warnings, since the origin has
> indicated that the object is currently valid (304).
>
> refresh_pattern is not relevant. Since there is a Cache-Control header
> present. No estimations need to be made.

Clear.

>
>>>
>>>> while what I am looking for is adapting every
>>>> response before it leaves Squid for the client. 'Cache-Control:
>>>> max-age=0' will force revalidation of every response, though.
>>>
>>> Otherwise known as "force reload".
>>> Forces full erasure and a full new fetch on every request. Not
>>> revalidation.
>>
>> Let's make it clear.. Is it 'Cache-Control: max-age=0' as a request
>> header that forces full erasure,
>
> No. From the client it simply means revalidate immediately. AND pass on the
> max-age=0 to origin.
>
> Erasure is a side effect of Squid receiving a 200 reply from the
> revalidation check. Nothing more. It is very likely to change when multiple
> variants are cached.

Great! It's clear now! Cited from
http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.2.6

  We would like the client to use the most recently generated
response, even if older responses are still apparently fresh.

>
>
>> while 'Cache-Control: max-age=0' as a
>> response header simply marks pre-expiration, and Squid feels free to
>> store a pre-expired response and validate it later when serving the next
>> request?
>
> That is correct.
> You will just need to check the Squid release is one of the recent ones
> which cache pre-expired content. Some earlier ones did not.

So the latest stable release should cache pre-expired content? Do you have
any idea in which release that behavior was introduced? I have yet to
verify this thoroughly, but it looks like some versions of Resin send
"Cache-Control: no-cache" with each response. It is said that
"Cache-Control: no-cache" is equivalent to "Cache-Control: max-age=0"; I
guess it would be better to send "Cache-Control: no-store" instead, to avoid
polluting the disk cache and to eliminate that I/O, assuming the response
is dynamic.
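
As a rough model of how I understand these three response directives (my
reading of RFC 2616 section 14.9, not a description of what any particular
Squid release does), here is a small Python sketch. The function names and
the returned labels are invented for illustration, and the field-name form
of no-cache (no-cache="Set-Cookie") is treated the same as the bare
directive for simplicity.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"')
    return directives


def storage_policy(cache_control):
    """Return (may the cache store it?, when may it be reused?).

    Simplified model: only no-store, no-cache and max-age=0 are considered.
    """
    directives = parse_cache_control(cache_control)
    if "no-store" in directives:
        # Never written to the cache, so no disk I/O and no revalidation.
        return ("do not store", "n/a")
    if "no-cache" in directives or directives.get("max-age") == "0":
        # Storable, but must be checked with the origin before each reuse.
        return ("may store", "revalidate before every reuse")
    return ("may store", "reuse while fresh")


for header in ("no-store", "no-cache", "max-age=0", "max-age=100"):
    print(header, "->", storage_policy(header))
```

Under this model only no-store keeps the response out of the cache
entirely, which is why it looks like the better choice for purely dynamic
content.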

>
>
>>
>> Looks like a "Surrogate-Control: max-age=0, revalidate" header
>> eliminates the need for a filter module in this case? Not sure about
>> "Surrogate-Control: revalidate", since it is not listed in the Edge
>> Architecture Specification, http://www.w3.org/TR/edge-arch, referred to
>> by http://wiki.squid-cache.org/Features/Surrogate.
>
> Squid ignores unknown ones presently. If you need, it can be extended.
> Although, if you go with max-age=0, revalidate is redundant.

How do I configure Squid-3.1.16 to behave as a surrogate conforming to the
Edge Architecture Specification, in particular with "Surrogate-Control"
overriding "Cache-Control"? I believe only the following directives
in squid.conf are relevant:

http_port 80 vhost
httpd_accel_surrogate_id proxy123.example.com

I made the origin server send a response carrying both
"Surrogate-Control: max-age=61" and "Cache-Control: max-age=100", and
found that Squid revalidated only when the response was already 100
seconds old, rather than 61 seconds old. A packet capture shows that it
was not acting as a surrogate, because "Surrogate-Control: max-age=61"
leaked to the client. What am I missing?

//============== client -> surrogate
GET /cgi-bin/index.php HTTP/1.0
User-Agent: Wget/1.10.2 (Red Hat modified)
Accept: */*
Host: www.zongheng.com
Connection: Keep-Alive
Cache-Control: no-cache

//=============== surrogate -> origin server
GET /cgi-bin/index.php HTTP/1.1
User-Agent: Wget/1.10.2 (Red Hat modified)
Accept: */*
Host: www.example.com
Via: 1.0 s0.example.com (squid/3.1.16)
Surrogate-Capability: proxy123.example.com="Surrogate/1.0 ESI/1.0"
X-Forwarded-For: x.x.x.x
Cache-Control: no-cache
Connection: keep-alive

//================ origin server -> surrogate
HTTP/1.1 200 OK
Date: Thu, 20 Oct 2011 19:29:55 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: PHP/5.1.6
Surrogate-Control: max-age=61
Cache-Control: max-age=100
Last-Modified: Wed, 19 Oct 2011 11:00:00 GMT
Content-Length: 66
Connection: close
Content-Type: text/html; charset=UTF-8

<h1>It works!</h1><pre>Last-Modified: Wed, 19 Oct 2011 11:00:00GMT

//=============== surrogate -> client
HTTP/1.0 200 OK
Date: Thu, 20 Oct 2011 19:29:55 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: PHP/5.1.6
Surrogate-Control: max-age=61
Cache-Control: max-age=100
Last-Modified: Wed, 19 Oct 2011 11:00:00 GMT
Content-Length: 66
Content-Type: text/html; charset=UTF-8
X-Cache: MISS from s0.example.com
X-Cache-Lookup: HIT from s0.example.com:80
Via: 1.0 s0.example.com (squid/3.1.16)
Connection: keep-alive

<h1>It works!</h1><pre>Last-Modified: Wed, 19 Oct 2011 11:00:00GMT
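
For reference, the behavior I expected from a surrogate can be sketched in
Python as follows. This only models my reading of the Edge Architecture
Specification (Surrogate-Control max-age taking precedence over
Cache-Control, and the header not being passed on to the client); it is not
Squid code. All function names are hypothetical, and directive targeting
(the ";target" suffix) is ignored.

```python
def get_header(headers, name):
    """Case-insensitive header lookup; returns None when absent."""
    for key, value in headers.items():
        if key.lower() == name.lower():
            return value
    return None


def max_age_of(value):
    """Extract max-age in seconds from a control header value, or None."""
    for part in (value or "").split(","):
        name, _, arg = part.strip().partition("=")
        if name.lower() == "max-age" and arg:
            # Surrogate-Control allows "max-age=600+600"; keep the first number.
            return int(arg.split("+")[0])
    return None


def freshness_lifetime(headers, acting_as_surrogate):
    """Freshness lifetime: Surrogate-Control wins only for a surrogate."""
    if acting_as_surrogate:
        lifetime = max_age_of(get_header(headers, "Surrogate-Control"))
        if lifetime is not None:
            return lifetime
    return max_age_of(get_header(headers, "Cache-Control"))


def headers_for_client(headers, acting_as_surrogate):
    """Expected client-facing headers: a surrogate drops Surrogate-Control."""
    if not acting_as_surrogate:
        return dict(headers)
    return {k: v for k, v in headers.items() if k.lower() != "surrogate-control"}


# Control headers from the origin response in the trace above.
origin_headers = {
    "Surrogate-Control": "max-age=61",
    "Cache-Control": "max-age=100",
}

expected = freshness_lifetime(origin_headers, acting_as_surrogate=True)
observed = freshness_lifetime(origin_headers, acting_as_surrogate=False)
```

With acting_as_surrogate=True the lifetime comes out as 61 seconds, which is
what I expected; the observed 100 seconds and the leaked header match the
acting_as_surrogate=False path instead.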

>
>>>>
>>>> I also happened to read about ESI, which really resembles class 4
>>>> adaptation with limited capability that only modifies the response body.
>>>> Looks like it is incapable of doing custom complex calculation. So Squid
>>>> does not support class 4 adaptation in general? Any other alternative?
>>>
>>> ESI, yes is good for personalization of the body. It does not exactly do
>>> calculations. It does widget insertion in to pages for personalization at
>>> the gateway machine. Allowing caching of the page template and widgets
>>> separately within a CDN.
>>>
>>> You were talking about personalizing Cookies etc, which are not part of
>>> the body content.
>>
>> Sure. A side question: when a surrogate fetches ESI widget, will it
>> carry request headers from client(assuming widget is in same domain to
>> that of the page) and inject response headers before the page is
>> served to client?
>>
>
> I don't think so. It is just a form of body/object macro-expansion. With
> some fancy bits for determining which widget to insert.

Clear.

>
>>
>> So Squid without the adapter will cache one copy of responses in only
>> one encoding.
>
> Yes.
>
>> Will a "Vary: Accept-Encoding" request header enable multiple copies?
>
> No. It tells Squid there are multiple variants with the same URL, and to
> check the Accept-Encoding header against the one stored already when
> deciding if it is a HIT.

Clear.

>
>
> Amos
> --
> Please be using
>  Current Stable Squid 2.7.STABLE9 or 3.1.16
>  Beta testers wanted for 3.2.0.13
>

Thanks,
Kaiwang
Received on Thu Oct 20 2011 - 19:44:34 MDT
