Re: [squid-users] Forcing TCP_REFRESH_HIT to be answered from cache from Amos Jeffries on 2010-07-15 (squid-users)

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Thu, 15 Jul 2010 20:36:45 +1200

dererk_at_mail.buenosaireslibre.org wrote:
> Hi everyone!
>
> I'm running a reverse proxy (1) to help my httpd to serve content fast and
> avoid going to the origin as much as possible.
> Doing that, I found I made a _lot_ of TCP_REFRESH_HIT requests to

First off, let's get the terminology clear:

REFRESH_HIT means the cached copy was sent to the client, but only after
the origin was checked (an IMS, If-Modified-Since, request) to make sure
the copy was still correct.

REFRESH_MISS means the above was tried but during the IMS check the
web server pushed a new copy out to be sent to the client.

Things to check which can cause this to happen a lot:

* The web server sending Cache-Control: must-revalidate. Check for and
remove it where possible. It forces every single request to be a
REFRESH_* instead of a nice cache HIT.

* Invalid date formats in the HTTP reply headers. They break the
staleness checks and cause the affected headers to be ignored. Some (e.g.
Expires) are required to be interpreted as already stale when invalid.

* Client Cache-Control headers. There is little that can be done to
avoid these. The refresh_pattern reload-into-ims option is about all
there is, I think.
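
As a rough aid for the first two checks above, here is a minimal Python
sketch. The function name audit_reply_headers and the plain dict of
headers are illustrative assumptions, not anything Squid provides. It
only flags must-revalidate and date headers that fail to parse, and the
standard-library parser it uses is more lenient than a strict HTTP date
parser, so a clean result is not proof that Squid will accept the date.

```python
from email.utils import parsedate_to_datetime

# Reply headers that carry dates used in staleness checks.
DATE_HEADERS = ("Date", "Expires", "Last-Modified")


def audit_reply_headers(headers):
    """Return a list of warnings for one reply's headers.

    `headers` is a plain dict of header name -> value (hypothetical
    input; collect it however you like, e.g. from `wget -S` output).
    """
    warnings = []

    cache_control = headers.get("Cache-Control", "")
    directives = [d.strip().lower() for d in cache_control.split(",")]
    if "must-revalidate" in directives:
        warnings.append("Cache-Control: must-revalidate forces revalidation")

    for name in DATE_HEADERS:
        value = headers.get(name)
        if value is None:
            continue
        try:
            parsedate_to_datetime(value)
        except (TypeError, ValueError):
            # Older Pythons raise TypeError, newer ones ValueError.
            warnings.append(f"{name} is not a valid HTTP date: {value!r}")

    return warnings
```

Feeding it the headers of a handful of your most-requested objects
should show quickly whether the web server is the source of the
revalidations.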

Having skipped ahead and read your problem, this is what I think you
need to start with:

* Send an Expires: header just under a year in advance (e.g. 364 days 23
hours). Then make sure your proxy caches obey the Expires: header (i.e.
remove all copies of override-expire).
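
For illustration only, this is one way the web application could compute
such a header value in Python. The function name expires_header and its
parameters are assumptions made for this sketch; most web servers have
their own configuration setting that does the same job.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expires_header(now=None, lifetime=timedelta(days=364, hours=23)):
    """Return an Expires header value `lifetime` after `now` (UTC)."""
    if now is None:
        now = datetime.now(timezone.utc)
    # usegmt=True produces the "GMT" suffix that HTTP dates use.
    return format_datetime(now + lifetime, usegmt=True)
```

Called with a fixed `now` of 15 Jul 2010 08:36:45 UTC it returns
"Fri, 15 Jul 2011 07:36:45 GMT".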

The rest depends on which version of Squid you have:

With Squid-2.7 you have stale-while-revalidate and stale-if-error
options. These can be sent by the web server to permit your proxy to give
a fast but potentially stale response to the clients. An IMS refresh
will still happen, but will be done in the background without affecting
any of the clients' response times.

Squid-3 brings the Surrogate-Control feature for the web server to
send a completely different set of Cache-Control options to your
reverse-proxy. But the stale-* features are not yet ported to Squid-3,
and I think those will be of more use in meeting your stated requirements.

You can fine-tune further with the ignore-* settings to ignore the
headers sent by clients which reduce the HIT ratio. I'm not quite savvy
enough to point at the full set.
reload-into-ims is the first one to start with. I'd recommend looking
at the reply headers sent by the web server and removing any expiry
related settings which will cause problems. Then look at the client
request headers coming in to Squid and see what can be done on that front.

> origin, although I've an insane 10-year-long expiration date set on my
> http response headers back to squid.

That might be part of the problem. RFC 2616 defines a limit of 1 year
offset for valid expiry dates. Your date may be discarded from
consideration due to its insanity.
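
A small Python sketch of that sanity check follows. The function name
expires_within_one_year is hypothetical, and treating "one year" as 365
days is an assumption of the sketch. It only compares the two header
values; whether a given Squid version discards or clamps an over-long
date is not something this code can tell you.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime

ONE_YEAR = timedelta(days=365)


def expires_within_one_year(date_value, expires_value):
    """True if Expires is no more than one year after Date."""
    sent = parsedate_to_datetime(date_value)
    expires = parsedate_to_datetime(expires_value)
    return expires - sent <= ONE_YEAR
```

A reply dated "Thu, 15 Jul 2010 08:36:45 GMT" passes with an Expires of
"Fri, 15 Jul 2011 07:36:45 GMT" and fails with one ten years out.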

>
> Although I did verify that, using wget -S and some fancies tcpdump lines,
> I wanted to get rid of any TCP_REFRESH_HIT request, main reason is
> because there's no way some objects change, so requesting for freshness
> has no sense moreover increases server load (1/7 are refresh_hit's).

These types of objects are what Expires: exists for. See above.

If you continue to use the freshness algorithms then make darn sure the
objects never get their on-disk timestamps touched. The IMS algorithm
percentage extends the period between freshness checks on an exponential
scale from the time of the Last-Modified header, in powers of the pct%
setting.

The values only matter when freshness needs to be estimated.
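
To make the estimation concrete, here is a simplified Python sketch of a
freshness decision of this kind. It is not Squid's code: the function
name is_fresh, the parameter names and the exact order of the checks are
assumptions based on how the refresh_pattern rules are commonly
described, and min_age/max_age are in seconds here although squid.conf
takes minutes.

```python
def is_fresh(now, fetched, expires=None, last_modified=None,
             min_age=0, pct=0.2, max_age=259200):
    """Simplified freshness decision; all times are Unix seconds."""
    age = now - fetched

    if expires is not None:
        # An explicit Expires decides the matter; nothing is estimated.
        return expires > now

    if age > max_age:
        return False

    if last_modified is not None and fetched > last_modified:
        # Last-Modified factor: the object stays fresh for pct of the
        # age it already had when it was fetched or last revalidated.
        return age < (fetched - last_modified) * pct

    # No usable headers at all: fall back to the min value.
    return age <= min_age
```

In this sketch each successful revalidation moves `fetched` forward, so
an object whose Last-Modified never changes is checked at ever longer
intervals, while touching the file resets Last-Modified and brings the
checks back to short intervals.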

>
> I used refresh_pattern with override-expire and extremely high values
> for min and max values, with absolutely no effect.

Note that the config values are in minutes and are converted to seconds
for use. There is a point where insanely high values are accepted as
valid signed minutes but wrap and become negative when multiplied into
seconds. This happens from about 7* *** *** and is not checked by Squid
beyond negatives being rounded up to zero.

The values only matter when freshness needs to be estimated.
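
The wrap can be illustrated with a few lines of Python. This sketch
assumes the seconds value is held in a signed 32-bit integer, which the
text above does not state, so the exact point at which a particular
Squid build wraps may differ. Both function names are hypothetical.

```python
def minutes_to_seconds_int32(minutes):
    """Multiply by 60 and wrap the result like a signed 32-bit integer."""
    seconds = (minutes * 60) & 0xFFFFFFFF
    if seconds >= 0x80000000:
        seconds -= 0x100000000
    return seconds


def effective_seconds(minutes):
    """Apply the wrap, then round negative results up to zero."""
    return max(0, minutes_to_seconds_int32(minutes))
```

Under that 32-bit assumption, 35791394 minutes still converts to a
positive number of seconds, while 35791395 minutes wraps negative and
ends up as zero, i.e. no minimum or maximum age at all.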

>
> For the record, If I use offline_mode I obtain partially what I wanted,
> unfortunately I loose the flexibility of the regex capacity that
> refresh_pattern has, which I need for avoiding special objects.

offline_mode is badly named. It means aggressive caching.

There has been a lot of work done in making that type of caching normal.
The very latest 2.7 and 3.1 releases go a long way towards it, but for
even more caching capability you want the development code in 2.HEAD or
3.HEAD.

>
> I've enabled debug for a blink of an eye, and got a request that goes as
> TCP_REFRESH_HIT, and as for what I understand, appears to be answered as
> being stale and requested back to origin.
>
> 2010/07/14 13:35:58| parseHttpRequest: Complete request received
> 2010/07/14 13:35:58| removing 1462 bytes; conn->in.offset = 0
> 2010/07/14 13:35:58| clientSetKeepaliveFlag: http_ver = 1.0
> 2010/07/14 13:35:58| clientSetKeepaliveFlag: method = GET
> 2010/07/14 13:35:58| clientRedirectStart: 'http://foobar.com/object'
> 2010/07/14 13:35:58| clientRedirectDone: 'http://foobar.com/object' result=NULL
> 2010/07/14 13:35:58| clientInterpretRequestHeaders: REQ_NOCACHE = NOT SET
> 2010/07/14 13:35:58| clientInterpretRequestHeaders: REQ_CACHABLE = SET
> 2010/07/14 13:35:58| clientInterpretRequestHeaders: REQ_HIERARCHICAL = SET
> 2010/07/14 13:35:58| clientProcessRequest: GET 'http://foobar.com/object'
> 2010/07/14 13:35:58| clientProcessRequest2: storeGet() MISS
> 2010/07/14 13:35:58| clientProcessRequest: TCP_MISS for 'http://foobar.com/object'
> 2010/07/14 13:35:58| clientProcessMiss: 'GET http://foobar.com/object'
> 2010/07/14 13:35:58| clientCacheHit: http://foobar.com/object = 200
> 2010/07/14 13:35:58| clientCacheHit: refreshCheckHTTPStale returned 1
> 2010/07/14 13:35:58| clientCacheHit: in refreshCheck() block
> 2010/07/14 13:35:58| clientProcessExpired: 'http://foobar.com/object'
> 2010/07/14 13:35:58| clientProcessExpired: lastmod -1
> 2010/07/14 13:35:58| clientReadRequest: FD 84: reading request...
> 2010/07/14 13:35:58| parseHttpRequest: Method is 'GET'
> 2010/07/14 13:35:58| parseHttpRequest: URI is '/object'
>
> In the way of checking anything to get some effect, I also gived a try
> to ignore-stale-while-revalidate override-lastmod override-expire
> ignore-reload ignore-no-cache, pushed refresh_stale_hit high in the sky,
> and again, no effects :-(

stale-while-revalidate is an HTTP control which the web server sends to
permit your proxy to do the refresh part in the background, without
that "block" slowing down the user's response, even if they get
a slightly stale object for a short while.

Setting ignore-stale-while-revalidate does the opposite of what you
say you want.

FYI: stale-if-error is its twin and keeps the proxy serving data from
cache if the web server dies completely or starts sending back fatal 5xx
replies. Both good things to have on a reverse proxy.
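
As an illustration of what the web server would send, this Python sketch
assembles a Cache-Control value carrying both controls. The function
name cache_control_value is hypothetical and the numbers in the usage
note are arbitrary examples, not recommendations.

```python
def cache_control_value(max_age, stale_while_revalidate=None,
                        stale_if_error=None):
    """Build a Cache-Control header value; all arguments are seconds."""
    parts = [f"max-age={max_age}"]
    if stale_while_revalidate is not None:
        parts.append(f"stale-while-revalidate={stale_while_revalidate}")
    if stale_if_error is not None:
        parts.append(f"stale-if-error={stale_if_error}")
    return ", ".join(parts)
```

For example, cache_control_value(600, 30, 86400) gives
"max-age=600, stale-while-revalidate=30, stale-if-error=86400".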

>
> What I'm doing wrong? Is there any other way to avoid REFRESH_HITs from
> being performed?

Only the extreme: don't use a proxy. That way all requests are direct
client requests and there is no cache to be updated with new info.

Correct use and handling of Expires: is designed for your type of
very-long-aged objects.

stale-while-revalidate is designed for shorter more dynamic objects
which also need to be served without a blocking lag.

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE9 or 3.1.5

Received on Thu Jul 15 2010 - 08:36:55 MDT
