Re: [squid-users] Whis this dosn't cache??

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Tue, 09 Jul 2013 14:55:51 +1200

On 9/07/2013 7:37 a.m., Eliezer Croitoru wrote:
> Sorry it took me so much time to reply but it seems like either a log
> bug or something else.
> I am unsure about it yet.
> the line in the access.log is:
> 1373311134.462 38 192.168.10.124 TCP_MISS/304 332 GET
> http://image.slidesharecdn.com/glusterorgwebinarant-120126131226-phpapp01/95/slide-13-728.jpg?1329866994
> - HIER_DIRECT/88.221.156.163 image/jpeg
>
> Which is kind of not real since it's a TCP_MISS but still a 304.

The log code would be TCP_CLIENT_IMS_MISS or something like that once we
have cleaned the log code types up, for now there is no such code and
Squid still uses TCP_MISS.

The client sent in IMS. There was no cached entry (MISS) and the server
was contacted using the clients IMS headers. The serve replied 304 which
Squid relayed to the client. There is no change to squid cache since; a)
there was no cached entry to begin with, and b) 304 does not provide a
payload object to add to storage.

> If I do a calculation it's like I said before for somebody who asked here.
> There is a TCP_MISS instead of TCP_IMS/304.
> There are the ALL,2 cache.log output:
> ##start
>
> HTTP Client REQUEST:
> ---------
> GET
> http://image.slidesharecdn.com/glusterorgwebinarant-120126131226-phpapp01/95/slide-13-728.jpg?1329866994
> HTTP/1.1
> Host: image.slidesharecdn.com
> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:22.0) Gecko/20100101
> Firefox/22.0
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Accept-Language: en-US,en;q=0.5
> Accept-Encoding: gzip, deflate
> Connection: keep-alive
> If-Modified-Since: Wed, 08 Aug 2012 08:30:54 GMT
> If-None-Match: "ff40d19dc3187ec5643c7c4a2ffcd0a9"
> Cache-Control: max-age=0
>
>
> ----------
> HTTP Server REQUEST:
> ---------
> GET
> /glusterorgwebinarant-120126131226-phpapp01/95/slide-13-728.jpg?1329866994
> HTTP/1.1
> Host: image.slidesharecdn.com
> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:22.0) Gecko/20100101
> Firefox/22.0
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Accept-Language: en-US,en;q=0.5
> Accept-Encoding: gzip, deflate
> If-Modified-Since: Wed, 08 Aug 2012 08:30:54 GMT
> If-None-Match: "ff40d19dc3187ec5643c7c4a2ffcd0a9"
> Cache-Control: max-age=0
> Connection: keep-alive
>
>
> ----------
> HTTP Server REPLY:
> ---------
> HTTP/1.1 304 Not Modified
> Content-Type: image/jpeg
> Last-Modified: Wed, 08 Aug 2012 08:30:54 GMT
> ETag: "ff40d19dc3187ec5643c7c4a2ffcd0a9"
> Cache-Control: max-age=31536000
> Date: Mon, 08 Jul 2013 19:19:18 GMT
> Connection: keep-alive
> Vary: Accept-Encoding
>
>
> ----------
> HTTP Client REPLY:
> ---------
> HTTP/1.1 304 Not Modified
> Content-Type: image/jpeg
> Last-Modified: Wed, 08 Aug 2012 08:30:54 GMT
> ETag: "ff40d19dc3187ec5643c7c4a2ffcd0a9"
> Cache-Control: max-age=31536000
> Date: Mon, 08 Jul 2013 19:19:18 GMT
> Vary: Accept-Encoding
> X-Cache: MISS from www1.home
> X-Cache-Lookup: MISS from www1.home:3128
> Connection: keep-alive
>
>
> ----------
>
> ##end
> Which shows that the request is IMS from the client with a ETAG on it.
> The request is served from local cache rather then from squid cache
> since the client didn't asked for a response.
> With basic wget and curl I am getting a full TCP_HIT or TCP_IMS_HIT when
> doing the same request over and over again.
> But it seems like the browser is smarter then plain wget and obeys the
> "rules" of cache and tried to verify the ETAG and Last-Modified.

Yes. The browser has a cache, wget and curl do not. That is all.

> Now the refresh_pattern is:
> refresh_pattern ^http://slidesharecdn\.squid\.internal/.* 10080 80%
> 79900 override-expire override-lastmod ignore-no-cache ignore-private
> ignore-reload ignore-must-revalidate ignore-no-store store-stale
> ignore-must-revalidate

There is nothing in the Squid cache and no payload body arriving on a
304. Whether or not it is tested refresh_pattern has nothing to work with.

> Which I am unsure if squid suppose to actually ignore-no-cache or to
> convert it to IMS.

NP: ignore-no-cache is obsolete and not supported by the latest 3.2+
versions of Squid. You should be getting a WARNING out of -k parse about
that.
It is irrelevant to your query though.

> The next log is when I use firefox and do a full refresh which is using
> SHIFT+F5 which is not the same as F5 that I have used and have been seen
> on the logs before:
> ##start
> HTTP Client REQUEST:
> ---------
> GET
> http://image.slidesharecdn.com/glusterorgwebinarant-120126131226-phpapp01/95/slide-13-728.jpg?1329866994
> HTTP/1.1
> Host: image.slidesharecdn.com
> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:22.0) Gecko/20100101
> Firefox/22.0
> Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> Accept-Language: en-US,en;q=0.5
> Accept-Encoding: gzip, deflate
> Connection: keep-alive
> Pragma: no-cache
> Cache-Control: no-cache
>

Notice the absence of If-* headers this time. Shift+F5 is a "reload"
request. Meaning the browser cache entry is ignored and Cache-Control
sent upstream (to Squid) requires only the most up to date content to be
delivered.
  Firefox uses "no-cache", some other agents use "max-age=0" which mean
the same thing. Any receiving cache MUST revalidate (IMS request) its
cached content before responding with a HIT.

However in your scenario there is no cached content stored by Squid. So
the client request is again simply relayed out to the server. This time
however the server responds with a 200 request. Which has a payload
object for Squid to cache.

> ----------
> HTTP Client REPLY:
> ---------
> HTTP/1.1 200 OK
> x-amz-id-2: rzYdkUpXw11w51eJTKynLasRv3HUjy4d45Kj0a2jj7swoWlpps+p3orjUpgiywdd
> x-amz-request-id: 959D56498B76D223
> Last-Modified: Wed, 08 Aug 2012 08:30:54 GMT
> x-amz-version-id: gducYyCJ.73GoIA.WcScH1tx7qj9rkby
> ETag: "ff40d19dc3187ec5643c7c4a2ffcd0a9"
> Accept-Ranges: bytes
> Content-Type: image/jpeg
> Server: AmazonS3
> Vary: Accept-Encoding
> Content-Encoding: gzip
> Content-Length: 59728
> Cache-Control: max-age=31536000
> Date: Mon, 08 Jul 2013 19:26:47 GMT
> X-Cache: MISS from www1.home
> X-Cache-Lookup: MISS from www1.home:3128
> Connection: keep-alive
>
>
> ----------
> 2013/07/08 22:26:23.258 kid1| store_io.cc(34) storeCreate: storeCreate:
> Selected dir 0 for -1_at_-1=0/2/1/1
> ##end
>
> I am not sure if to think of it as a bug or just as it suppose to be...
> So we have two cases which are not the same.

Yes the second case presents a payload body Squid might store. Given the
ignores and overrides on your refresh_pattern it is very likely to be
stored regardless of any "dont store" signals from the server, which
results in a HIT on following traffic.

> The first is suppose to be a 304 and is recognized as a 304 but still
> considered to be a TCP_MISS which is basically a "HIT" and the second

No, both cases you showed above are "basically a MISS". The difference
is whether Squid received a cacheable object from the server or not. In
the second case it does - which allows Squid to get out of the "MISS"
state and start responding with future HITs.

> case when the request is a no-cache with pragma which suppose to be
> served from cache or at least to be treated with 304 towards the server
> and still it stays as a full regular request and then as a TCP_MISS.

There is no cached object. If there was one ignore-must-revalidate would
have made Squid send plain old GET requests to the server instead of IMS
requests - *if* the IMS was generated by *Squid*.

> If there is a need to file a bug just let me know and I will file it
> with all the details.
> If not please help me make sure I understood right the logic.

Hope the above helps explain.

On the whole the aim of Squid is to optimize HTTP bandwidth consumption.
As such we have controls converting reload fetches into IMS (reduces
bandwidth to the server), but not converting client IMS into a reload -
which would present a security vulnerability ("bandwidth amplification"
attack type) if clients could send 300 bytes to Squid and trigger an
unlimited amount of internal bandwidth consumption.

There are two things here which make your cases behavie like they do:

1) there is no cached content already. Not really a problem, but it does
mean you are dealing with MISS on both times instead of HIT.

2) the logging codes are not really as permiable as the log
documentation suggests. Specifically TCP_CLIENT_IMS_* codes do not exist
to clarify the TCP_*/304 case you are seeing.

Amos
Received on Tue Jul 09 2013 - 02:56:06 MDT

This archive was generated by hypermail 2.2.0 : Tue Jul 09 2013 - 12:00:13 MDT