Re: [squid-users] Squid caching SSI pages?

From: Greg Swallow <gswallow@dont-contact.us>
Date: Thu, 01 Apr 2004 10:26:42 -0500

Henrik Nordstrom wrote:
> On Wed, 31 Mar 2004, Greg Swallow wrote:
>
>
>>Assuming documentation at
>>http://www.bowiesnyder.com/writings/caching_shtml.htm is true, my
>>squid-based reverse proxies shouldn't be caching HTML documents that are
>>server parsed. However, it appears that the four reverse proxies I have
>>sitting in front of my webservers are?
>
>
> How the page was generated on the server does not matter that much other
> than that server generated pages usually does not have caching
> information, what matters is the HTTP headers of the response and to some
> extent your refresh_pattern settings in squid.conf.
>
> Some good references to understand these concepts better:
>
> Caching Tutorial for Web Authors and Webmasters
> <url:http://www.mnot.net/cache_docs/>
>
> Cacheability Engine
> <url:http://www.mnot.net/cacheability/>

Wow, those are kick-ass docs. I'll have to bookmark them.

Let me give you a little more background. The closest related
configuration directives I can find are:

cache_peer 157.91.12.68 sibling 80 3130 proxy-only
cache_peer 157.91.12.70 sibling 80 3130 proxy-only
cache_peer 157.91.12.71 sibling 80 3130 proxy-only
refresh_pattern \.cfm$ 0 0% 0
refresh_pattern \.asp$ 0 0% 0
refresh_pattern \.aspx$ 0 0% 0
refresh_pattern . 59 20% 240

These (along with the 59-minute minimum above) are new as of the upgrade
to Squid 2.5.STABLE4:

digest_generation on
digest_rebuild_period 600 seconds
digest_rewrite_period 3600 seconds
refresh_pattern -i \.pdf$ 59 20% 240 reload-into-ims override-lastmod override-expire

squid.conf is functionally identical on all four systems -- I've checked
them with diff.

The page in question is our main index page: http://www.IN.gov/
According to the cache docs I just read, a page is usually not cacheable
when:

It has no validator (an ETag or Last-Modified header) and no explicit
freshness information (Cache-Control or Expires).
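
To make that rule concrete, here is a minimal Python sketch that reports which validator and freshness headers a response carries. The function name cache_hints is made up for illustration, and the header lists are copied from the two curl responses quoted in this message. It only checks whether the headers are present; it does not interpret their values (a Cache-Control: no-cache would still be listed), so treat it as a rough first check rather than a cacheability verdict.

```python
def cache_hints(headers):
    """Return (validators, freshness) header names present in a response."""
    names = {name.lower() for name in headers}
    validators = sorted(names & {"etag", "last-modified"})
    freshness = sorted(names & {"cache-control", "expires"})
    return validators, freshness


# Header names from the curl response for www.netgawds.com (no SSI).
static_page = ["Date", "Server", "Last-Modified", "ETag", "Accept-Ranges",
               "Content-Length", "Connection", "Content-Type"]

# Header names from the curl response for www.IN.gov (server parsed).
ssi_page = ["Date", "Server", "Connection", "Content-Type"]

print(cache_hints(static_page))  # (['etag', 'last-modified'], [])
print(cache_hints(ssi_page))     # ([], [])
```

By this check the server-parsed page has neither validators nor explicit freshness information.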

The content in question, in our page, is:

<div id="bannerimage"> <div id="amberalert"><!--#include virtual="/amber/include.html"--></div>

This include is either a zero-length file or a graphic, depending on
conditions.

So I used the Cacheability Engine to test our page, but I'm afraid the
results are skewed, since it reaches the page through our caches rather
than the origin server. It reports that my page isn't cacheable, but I
decided to do some testing with curl instead.

curling my own site (no SSI):

curl -z 040113342004.00 -D headers http://www.netgawds.com/

HTTP/1.1 200 OK
Date: Thu, 01 Apr 2004 14:53:08 GMT
Server: Apache/1.3.29 (Unix) PHP/4.3.4 mod_ssl/2.8.16 OpenSSL/0.9.7c
Last-Modified: Wed, 31 Mar 2004 03:58:20 GMT
ETag: "18a480-289e-406a41dc"
Accept-Ranges: bytes
Content-Length: 10398
Connection: close
Content-Type: text/html

Curling http://www.IN.gov/ (direct to the origin server, inside a
firewall):

curl -z 040113342004.00 -D headers http://www.IN.gov/

HTTP/1.1 200 OK
Date: Thu, 01 Apr 2004 15:08:51 GMT
Server: Apache
Connection: close
Content-Type: text/html
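
For what it's worth, here is a rough Python sketch of how a rule such as "refresh_pattern . 59 20% 240" could treat the two responses above. It follows the refresh algorithm as I understand it from the Squid documentation (Expires first, then the maximum age, then the Last-Modified factor, then the minimum age), and it deliberately ignores client Cache-Control headers and the override-* and reload-into-ims options. The function name is_fresh and its parameters are made up for illustration; this is not Squid's actual code.

```python
def is_fresh(age, rule_min, percent, rule_max, expires_in=None, lm_age=None):
    """Simplified refresh_pattern freshness check; all times in seconds.

    age        -- seconds since the object was fetched
    expires_in -- seconds until the Expires time, or None if no Expires header
    lm_age     -- seconds between Last-Modified and the fetch, or None
    """
    if expires_in is not None:
        return expires_in > 0           # an explicit Expires decides
    if age > rule_max:
        return False                    # older than the rule's maximum
    if lm_age is not None and lm_age > 0:
        return age < lm_age * percent   # Last-Modified factor heuristic
    return age <= rule_min              # no validators: minimum age only


# refresh_pattern . 59 20% 240 (min and max are given in minutes)
MIN, PCT, MAX = 59 * 60, 0.20, 240 * 60

# www.IN.gov: no Expires and no Last-Modified, so only the minimum applies.
print(is_fresh(30 * 60, MIN, PCT, MAX))   # True
print(is_fresh(60 * 60, MIN, PCT, MAX))   # False

# www.netgawds.com: Last-Modified was 125688 seconds before the Date header.
print(is_fresh(3 * 3600, MIN, PCT, MAX, lm_age=125688))   # True
print(is_fresh(5 * 3600, MIN, PCT, MAX, lm_age=125688))   # False
```

If that reading is right, a response with no Expires or Last-Modified header would still be served from cache for up to 59 minutes under the catch-all rule.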

You're right about the 304s. However, I'm still a bit confused about
why the access logs on the four caches look so different:

cache1# grep 'GET http:\/\/www.in.gov\/ HTTP' access.log | awk '{print $NF}' | sort | uniq -c | sort
    1 TCP_HIT:NONE
  153 TCP_IMS_HIT:NONE
  169 TCP_MISS:DIRECT
  480 TCP_MISS:NONE
4456 TCP_MEM_HIT:NONE

cache2# grep 'GET http:\/\/www.in.gov\/ HTTP' access.log | awk '{print $NF}' | sort | uniq -c | sort
  176 TCP_MISS:DIRECT
1186 TCP_MISS:CD_SIBLING_HIT

cache3# grep 'GET http:\/\/www.in.gov\/ HTTP' access.log | awk '{print $NF}' | sort | uniq -c | sort
  162 TCP_MISS:DIRECT
1118 TCP_MISS:CD_SIBLING_HIT

cache4# grep 'GET http:\/\/www.in.gov\/ HTTP' access.log | awk '{print $NF}' | sort | uniq -c | sort
  161 TCP_MISS:DIRECT
1205 TCP_MISS:CD_SIBLING_HIT
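
For anyone who wants to repeat the tally without the shell pipeline, the same count can be sketched in Python. This assumes, as in the output above, that the last whitespace-separated field of each log line is the Squid result code and hierarchy code. The function name tally_results and the sample log lines are made up for illustration, and the match is a plain substring test rather than a regular expression.

```python
from collections import Counter


def tally_results(lines, request="GET http://www.in.gov/ HTTP"):
    """Count the last field of every log line that contains `request`."""
    counts = Counter()
    for line in lines:
        if request in line:
            fields = line.split()
            if fields:
                counts[fields[-1]] += 1
    return counts


# Hypothetical httpd-style log lines, invented for this example.
sample = [
    '10.0.0.1 - - [01/Apr/2004:10:00:00 -0500] '
    '"GET http://www.in.gov/ HTTP/1.0" 200 9000 TCP_MEM_HIT:NONE',
    '10.0.0.2 - - [01/Apr/2004:10:00:01 -0500] '
    '"GET http://www.in.gov/ HTTP/1.0" 200 9000 TCP_MEM_HIT:NONE',
    '10.0.0.3 - - [01/Apr/2004:10:00:02 -0500] '
    '"GET http://www.in.gov/ HTTP/1.1" 200 9000 TCP_MISS:DIRECT',
    '10.0.0.4 - - [01/Apr/2004:10:00:03 -0500] '
    '"GET http://www.in.gov/logo.gif HTTP/1.0" 200 512 TCP_HIT:NONE',
]

# Print the least frequent result first, like the pipeline above.
for result, count in sorted(tally_results(sample).items(), key=lambda kv: kv[1]):
    print(f"{count:5d} {result}")
```

On a real log you would pass the open file object instead of the sample list, for example tally_results(open("access.log")).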

On the bright side, my hit ratios are better than they ever have been
before :)

Hey, BTW, how "stable" is Squid 3? I'd like to start using ESI within
the year.

-- 
+--------------+------+----------------------+---------------+
| Greg Swallow | CCNA | System Administrator | accessIndiana |
+--(http://www.IN.gov/)----------------------(888.4IN.EGOV)--+
Received on Thu Apr 01 2004 - 08:28:04 MST
