Re: [squid-users] An example to squid cache affecting user-agents(Firefox,Chrome,wget\curl)

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Fri, 12 Jul 2013 02:03:20 +1200

On 11/07/2013 11:40 p.m., Eliezer Croitoru wrote:
> I have been testing some URLs for cachability for quite some time.
> It seems like there are different methods of requesting the same file which
> lead to different reactions in squid, and I want to be 100% sure of the
> cause of the *problem* before I run to a conclusion, since
> I am not 100% sure.
> Please take your *free* time to read it and see if there is something I
> probably missed, in the hope of understanding the issue at hand.

As you probably noticed from the final diagnosis of the last weird case
you brought up, it can be important to consider both pairs of
request/reply: client->squid and squid->server. Either side of
Squid can affect the overall transaction behaviour...

Can you state exactly what the problem is up front? That is a little
unclear from your text.

> Thanks Ahead,
> Eliezer
>
> I have tried using wget\curl versus firefox and chrome, which get a
> different reaction from squid, and I want to be sure of the cause.
> Using simple wget for two requests I am getting:
> 1373541195.850 743 192.168.10.124 TCP_MISS/200 85865 GET
> http://image.slidesharecdn.com/glusterorgwebinarant-120126131226-phpapp01/95/slide-29-728.jpg?132986699
> - HIER_DIRECT/88.221.156.163 image/jpeg
> 1373541220.437 4 192.168.10.124 TCP_MEM_HIT/200 85737 GET
> http://image.slidesharecdn.com/glusterorgwebinarant-120126131226-phpapp01/95/slide-29-728.jpg?132986699
> - HIER_NONE/- image/jpeg
>
>
> which is a successful cache HIT.
> In this request the headers are:
> ---------
> GET
> http://image.slidesharecdn.com/glusterorgwebinarant-120126131226-phpapp01/95/slide-11-728.jpg?1329866994
> HTTP/1.1
> User-Agent: Wget/1.14 (linux-gnu)
> Accept: */*
> Host: image.slidesharecdn.com
> Connection: Close
> Proxy-Connection: Keep-Alive
>

Bug #1 (in the client): Proxy-Connection is an obsolete header but in no
sane system should it ever directly contradict the Connection header
like that.
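
Just to illustrate that contradiction, here is a rough Python sketch (a
hypothetical check, not anything wget or Squid actually runs) which flags
a header set like the one above:

# Rough sketch (hypothetical helper, not wget/Squid code): flag a request
# that sends contradictory persistence hints, e.g. "Connection: Close"
# alongside the obsolete "Proxy-Connection: Keep-Alive".
def contradictory_persistence(headers):
    # headers: dict of header name -> value
    lower = {k.lower(): v.strip().lower() for k, v in headers.items()}
    conn = lower.get("connection")
    pconn = lower.get("proxy-connection")
    if conn is None or pconn is None:
        return False
    return (conn == "close") != (pconn == "close")

# The wget request headers shown above trigger it:
print(contradictory_persistence({
    "Connection": "Close",
    "Proxy-Connection": "Keep-Alive",
}))  # True -> Bug #1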

> ----------
> The response is:
> ---------
> HTTP/1.1 200 OK
> x-amz-id-2: wQGOvCvBOH4nVmOEbu1UMJ+Kxv4a4v/9oGpyWnIYy8WRtBL6ZAx2yQtZ0T5u3sfr
> x-amz-request-id: 2F83F33589002A74
> Last-Modified: Wed, 08 Aug 2012 08:30:58 GMT
> x-amz-version-id: _9hthq6oqnMYSuZCVxGCF1sN5VJtYebW
> ETag: "cd5970b95914bd43a88a021b78d2f67b"
> Content-Type: image/jpeg
> Server: AmazonS3
> Cache-Control: max-age=31536000
> Date: Thu, 11 Jul 2013 11:18:48 GMT
> X-Cache: MISS from www1.home
> X-Cache-Lookup: MISS from www1.home:3128
> Transfer-Encoding: chunked
> Connection: keep-alive

Is this the Squid response sent to the client after the above request?
If so, that would be Bug #2: the client explicitly sent "Connection: close"
and Squid should be obeying that.
If it is the server->squid response there is no bug, as the connection
persistence on that hop is separate.
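
If you want to pin down which hop that reply came from, a quick probe of
the client-facing side is enough. A rough test sketch in Python, assuming
the proxy is reachable as 127.0.0.1:3128 (adjust for your www1.home box):

# Send a wget-style request through the proxy with "Connection: close"
# and print what the client-facing reply says about the connection.
import socket

PROXY = ("127.0.0.1", 3128)   # assumption: local Squid on port 3128
URL = ("http://image.slidesharecdn.com/"
       "glusterorgwebinarant-120126131226-phpapp01/95/"
       "slide-29-728.jpg?132986699")

request = ("GET %s HTTP/1.1\r\n"
           "Host: image.slidesharecdn.com\r\n"
           "Connection: close\r\n"
           "\r\n" % URL)

with socket.create_connection(PROXY, timeout=10) as s:
    s.sendall(request.encode("ascii"))
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = s.recv(4096)
        if not chunk:
            break
        data += chunk

for line in data.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1").split("\r\n"):
    if line.lower().startswith("connection:"):
        print(line)   # "Connection: keep-alive" here would confirm Bug #2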

> ----------
> and on the second time
> ---------
> HTTP/1.1 200 OK
> x-amz-id-2: wQGOvCvBOH4nVmOEbu1UMJ+Kxv4a4v/9oGpyWnIYy8WRtBL6ZAx2yQtZ0T5u3sfr
> x-amz-request-id: 2F83F33589002A74
> Last-Modified: Wed, 08 Aug 2012 08:30:58 GMT
> x-amz-version-id: _9hthq6oqnMYSuZCVxGCF1sN5VJtYebW
> ETag: "cd5970b95914bd43a88a021b78d2f67b"
> Content-Type: image/jpeg
> Server: AmazonS3
> Cache-Control: max-age=31536000
> Date: Thu, 11 Jul 2013 11:18:48 GMT
> Age: 347
> X-Cache: HIT from www1.home
> X-Cache-Lookup: HIT from www1.home:3128
> Transfer-Encoding: chunked
> Connection: keep-alive
>
>
> ----------
> Which makes it a HIT.
> Then there is nothing that seems wrong with the application server's way
> of doing things or with basic squid internals.
> With chrome and firefox, however, there is something different in the request:
> ---------
> GET
> /glusterorgwebinarant-120126131226-phpapp01/95/slide-4-728.jpg?1329866994 HTTP/1.1
> Host: image.slidesharecdn.com
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML,
> like Gecko) Chrome/28.0.1500.71 Safari/537.36
> Accept-Encoding: gzip,deflate,sdch
> Accept-Language: en-US,en;q=0.8
> Cache-Control: max-age=4794000
> Connection: keep-alive
>
>
> ----------
> that results in this response:
> ---------
> HTTP/1.1 200 OK
> x-amz-id-2: VmcmoZnkiG7I/OEc+VJxJJKS7fnsu+BCqEw4NqVuMC7ckHl+DEYidi4P1d1vflRK
> x-amz-request-id: BC59D681FF091B4E
> Last-Modified: Wed, 08 Aug 2012 08:30:56 GMT
> x-amz-version-id: kCNUG8l6HMz03fgYIbYHlsGJmzD3CplD
> ETag: "4a351b56fb96496224d67ae752c75386"
> Accept-Ranges: bytes
> Content-Type: image/jpeg
> Server: AmazonS3
> Vary: Accept-Encoding

Bug #3 (in the server): the Vary header has suddenly appeared. It should be
sent on all responses to this URL regardless of whether the variant
headers existed in the client request.

This object will be cached under the store location:
hash(URL)+hash("gzip,deflate,sdch")
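
To make that concrete, here is a rough Python sketch of the idea (not
Squid's actual key code): the variant is stored under the URL plus the
request's values for the headers named in Vary, so a request with no
Accept-Encoding and the Chrome request end up in different slots:

# Rough sketch of the Vary store-key idea, not Squid's real algorithm.
import hashlib

def store_key(url, vary_headers, request_headers):
    # vary_headers: header names taken from the reply's Vary
    # request_headers: dict of lowercase request header name -> value
    key = hashlib.md5(url.encode("utf-8"))
    for name in sorted(n.lower() for n in vary_headers):
        value = request_headers.get(name, "")
        key.update(("\n%s: %s" % (name, value)).encode("utf-8"))
    return key.hexdigest()

url = ("http://image.slidesharecdn.com/"
       "glusterorgwebinarant-120126131226-phpapp01/95/"
       "slide-4-728.jpg?1329866994")
# No Accept-Encoding at all (like the wget request):
print(store_key(url, ["Accept-Encoding"], {}))
# Chrome's "Accept-Encoding: gzip,deflate,sdch" -> a different variant slot:
print(store_key(url, ["Accept-Encoding"],
                {"accept-encoding": "gzip,deflate,sdch"}))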

> Content-Encoding: gzip
> Cache-Control: max-age=31536000
> Date: Thu, 11 Jul 2013 11:26:53 GMT
> Content-Length: 48511
> X-Cache: MISS from www1.home
> X-Cache-Lookup: MISS from www1.home:3128
> Connection: keep-alive
>
>
> ---------
>
> while the next chrome request is answered with a 304:
> ---------
> HTTP/1.1 304 Not Modified
> Content-Type: image/jpeg
> Last-Modified: Wed, 08 Aug 2012 08:30:56 GMT
> ETag: "4a351b56fb96496224d67ae752c75386"
> Cache-Control: max-age=31536000
> Date: Thu, 11 Jul 2013 11:29:00 GMT
> Connection: keep-alive
> Vary: Accept-Encoding

The object is cacheable for a while and Chrome is requesting with
"Accept-Encoding: gzip,deflate,sdch", which allows Squid to locate the
Vary-cached object at hash(URL)+hash("gzip,deflate,sdch").

There is no "Server:" header in the 304 to indicate which service
produced the 304 reply.

>
> ----------
> <...>
> HTTP Client REPLY:
> ---------
> HTTP/1.1 304 Not Modified
> Content-Type: image/jpeg
> Last-Modified: Wed, 08 Aug 2012 08:30:56 GMT
> ETag: "4a351b56fb96496224d67ae752c75386"
> Cache-Control: max-age=31536000
> Date: Thu, 11 Jul 2013 11:29:00 GMT
> Vary: Accept-Encoding
> X-Cache: MISS from www1.home
> X-Cache-Lookup: MISS from www1.home:3128
> Connection: keep-alive
>
>
> ----------
> So the application server responds with the 304... and not squid, since
> squid is obligated to respond with a valid HTTP response.
> So chrome verifies that its local cache is valid and it's fine.
>
> The next scenario is when chrome forces a no-cache in the Cache-Control
> header.
> ---------
> GET
> /glusterorgwebinarant-120126131226-phpapp01/95/slide-4-728.jpg?1329866994 HTTP/1.1
> Host: image.slidesharecdn.com
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Pragma: no-cache
> User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML,
> like Gecko) Chrome/28.0.1500.71 Safari/537.36
> Accept-Encoding: gzip,deflate,sdch
> Accept-Language: en-US,en;q=0.8
> Cache-Control: no-cache
> Connection: keep-alive
>
>
> ----------
> <...>
> HTTP Server REPLY:
> ---------
> HTTP/1.1 200 OK
> x-amz-id-2: VmcmoZnkiG7I/OEc+VJxJJKS7fnsu+BCqEw4NqVuMC7ckHl+DEYidi4P1d1vflRK
> x-amz-request-id: BC59D681FF091B4E
> Last-Modified: Wed, 08 Aug 2012 08:30:56 GMT
> x-amz-version-id: kCNUG8l6HMz03fgYIbYHlsGJmzD3CplD
> ETag: "4a351b56fb96496224d67ae752c75386"
> Accept-Ranges: bytes
> Content-Type: image/jpeg
> Server: AmazonS3
> Vary: Accept-Encoding
> Content-Encoding: gzip
> Content-Length: 48511
> Cache-Control: max-age=31536000
> Date: Thu, 11 Jul 2013 11:34:03 GMT
> Connection: keep-alive
>
> ----------
> I am not sure but I want to debug this issue if there is one.
> The request should have been served from cache since the refresh_pattern
> is pretty explicit about it.

I explicitly made the "ignore-no-cache" refresh_pattern option obsolete when
upgrading the no-cache support in Squid-3.2 and later.

For several reasons:
1) It never actually applied to a request-header no-cache like that one.
The http_port "ignore-cc" option, available to reverse proxy
installations, does that, with varied success or problems
depending on the site behaviour.

2) no-cache in the reply means revalidate with the server before using
the cached copy. Revalidation ensures accurate content is delivered
without much bandwidth usage when the server has IMS request support
(there is a rough sketch of this behaviour change after the list below).

3) no-cache permits responses fetched by authenticated users to be cached.
Ignoring the no-cache revalidation requirement in that situation is a
bit dangerous. Ignoring the no-cache and making those authenticated
responses uncacheable again contradicts the common usage of
"ignore-no-cache" to increase caching of objects.

4) The refresh_pattern option is widely used on older Squid to enable
caching of responses with "no-cache" in them, since those versions would
treat it as an alternative to "no-store". Now that Squid treats it as an
alternative to "must-revalidate", the benefit to those caches is gone.
The only behaviour change they would see is the enhanced dangerous
side effects of (3).
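
A rough sketch of the behaviour change described in (2) and (4), purely
illustrative and not Squid source:

# What a cache does with a stored reply whose Cache-Control contains
# "no-cache", under the old treatment versus the Squid-3.2+ treatment.
def action_for_cached_reply(directives, modern=True):
    # directives: set of lowercase Cache-Control directive names from the reply
    if "no-store" in directives:
        return "never store"
    if "no-cache" in directives:
        if modern:
            # Squid-3.2+: treated like must-revalidate
            return "store, but revalidate (IMS/If-None-Match) before each reuse"
        # older Squid (without ignore-no-cache): treated like no-store
        return "treat as uncacheable, always refetch"
    return "store and serve while fresh"

print(action_for_cached_reply({"no-cache", "max-age"}, modern=True))
print(action_for_cached_reply({"no-cache", "max-age"}, modern=False))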

> I know how to read the headers and what they are supposed to say, but I am
> a bit confused and unable to reach the right conclusion about the root of
> why squid treats the wget request differently from the chrome requests.
> Any new point of view will help me.

Probably the server's lack of Vary on the wget responses. If not that, then
the max-age requirement sent in by Chrome on its fetches.
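
For the max-age theory, the check involved is plain freshness arithmetic: a
request-side max-age caps how old a cached reply may be, on top of the
reply's own max-age. A simplified sketch (not the full RFC rules):

# Simplified freshness check: a request "Cache-Control: max-age=N"
# limits the acceptable age of a cached reply.
def fresh_enough(current_age, reply_max_age, request_max_age=None):
    limit = reply_max_age
    if request_max_age is not None:
        limit = min(limit, request_max_age)
    return current_age <= limit

# Numbers drawn from the captures above (Age from the wget HIT, max-age
# values from the Chrome trace):
print(fresh_enough(347, 31536000, 4794000))   # True

Comparing the actual Age of the cached entry against the request directive
in your trace will show whether that requirement is the cause.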

It could also be Bug #4 in Squid: the Vary being seen as a MISS in recent
Squid releases, which we have not yet dug out of the sources.

Amos