[squid-users] I would like to use Squid for caching but it is imperative that all files be cached.

From: Sheridan <dan_at_web-mail.eclipse.co.uk>
Date: Thu, 21 Apr 2011 17:57:01 +0100

First I will explain what I am trying to do.

I have a number of tests (executables and scripts) that run on
resources downloaded via HTTP, FTP, etc. Some of these tests are
third-party compiled executables which would be problematic to change.
The resources can be any type of file, with any file extension, and
some of their URLs have query strings. Tests can download resources in
any order; there is no way to tell which test will download a given
file first. I have no control at all over the resources tested. The
tests run on a server used for nothing but running these tests (no
human web browsing). It is imperative that, for each URL, every test
runs on an identical file: if the file changes, the test results will
be inconsistent.

Therefore every file must be cached, regardless of anything else. I
would like to use Squid for this caching. The only responses that
should not be cached are server errors such as 500 Internal Server
Error, 502 Bad Gateway and suchlike; in the case of a temporary server
error it is better to have some tests run than none. I guess it would
be too much to ask to be able to cache over HTTPS.

I have tried to configure Squid 2.7.STABLE9 to cache everything
regardless. These are the changes I made to the default configuration
file supplied with the Squid package on Ubuntu Server 10.10 (lines
marked < are from the original file, lines marked > are my
replacements):

< # http_access deny all
> http_access allow all

< # hierarchy_stoplist cgi-bin ?
> hierarchy_stoplist never_direct

< refresh_pattern ^ftp: 1440 100% 10080
< refresh_pattern ^gopher: 1440 100% 1440
< #refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
< #refresh_pattern (Release|Package(.gz)*)$ 0 20% 2880
< # example line deb packages
< #refresh_pattern (\.deb|\.udeb)$ 129600 100% 129600
< refresh_pattern . 1440 100% 4320
> refresh_pattern .* 1440 100% 4320 ignore-no-cache ignore-private ignore-auth override-expire reload-into-ims
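The three numbers on each refresh_pattern line are min (minutes), percent and max (minutes). As a rough aid to reading this configuration, here is a simplified Python model of the staleness rules as summarized in the squid.conf documentation: an explicit expiry time wins, anything older than max is stale, then the last-modified factor applies, and finally min. It is a sketch under assumptions, not Squid's own code; `RefreshRule` and `is_fresh` are hypothetical names, and every option except override-expire is ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RefreshRule:
    """Hypothetical model of one refresh_pattern line (min/max in minutes)."""
    min_minutes: int
    percent: int
    max_minutes: int
    override_expire: bool = False


def is_fresh(rule: RefreshRule, now: float, fetched: float,
             expires: Optional[float] = None,
             last_modified: Optional[float] = None) -> bool:
    """Simplified staleness check; all times are Unix timestamps in seconds.

    `fetched` is when the object entered the cache.
    """
    age = now - fetched
    min_s = rule.min_minutes * 60
    max_s = rule.max_minutes * 60

    # 1. An explicit expiry time decides, unless override-expire
    #    enforces the configured minimum age instead.
    if expires is not None:
        if expires > now:
            return True
        return rule.override_expire and age < min_s

    # 2. Anything older than max is stale.
    if age > max_s:
        return False

    # 3. Last-modified factor: fresh while age is below `percent`
    #    of the object's age (relative to Last-Modified) when fetched.
    if last_modified is not None and fetched > last_modified:
        return age < (fetched - last_modified) * rule.percent / 100

    # 4. No validators at all: fresh only while younger than min.
    return age < min_s
```

For example, `is_fresh(RefreshRule(1440, 100, 4320), now, now - 3600)` is true: an object fetched an hour ago with no Expires or Last-Modified header is younger than the 1440-minute minimum.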

With these settings and a fully primed cache I still get entries like
this in my access.log file:

1303398515.769 120 192.168.1.8 TCP_MISS/200 7174 GET http://domain.com/resource.php - DIRECT/83.223.106.8 text/html
1303398524.140 80 192.168.1.8 TCP_MISS/200 521 HEAD http://domain.com/resource.php - DIRECT/83.223.106.8 text/html
1303398524.536 118 192.168.1.8 TCP_MISS/200 7174 GET http://domain.com/resource.php - DIRECT/83.223.106.8 text/html
1303398532.671 118 192.168.1.8 TCP_MISS/200 7174 GET http://domain.com/resource.php - DIRECT/83.223.106.8 text/html
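Each of these entries is in Squid's native access.log format: timestamp, elapsed milliseconds, client address, result code/HTTP status, bytes, method, URL, ident, hierarchy code/peer, and content type. To see which URLs never produce a hit on a primed cache, a small Python sketch like the following can tally result codes per method and URL. `parse_line` and `summarize` are hypothetical helpers, and the sketch assumes the default native format with no custom logformat.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class LogEntry(NamedTuple):
    """One line of Squid's native access.log format."""
    timestamp: float      # Unix time with milliseconds
    elapsed_ms: int       # time taken to serve the request
    client: str
    result_code: str      # e.g. TCP_MISS, TCP_HIT
    http_status: int
    size: int             # bytes sent to the client
    method: str
    url: str
    ident: str            # usually "-"
    hierarchy: str        # e.g. DIRECT/83.223.106.8
    content_type: str


def parse_line(line: str) -> LogEntry:
    """Split a native-format line into its ten whitespace-separated fields."""
    f = line.split()
    result_code, status = f[3].split("/", 1)
    return LogEntry(float(f[0]), int(f[1]), f[2], result_code, int(status),
                    int(f[4]), f[5], f[6], f[7], f[8], f[9])


def summarize(lines: Iterable[str]) -> Counter:
    """Count (method, url, result_code) triples, skipping blank lines."""
    counts: Counter = Counter()
    for line in lines:
        if line.strip():
            e = parse_line(line)
            counts[(e.method, e.url, e.result_code)] += 1
    return counts
```

For example, running `summarize(open("access.log"))` over the four entries above would report three GET misses and one HEAD miss for `http://domain.com/resource.php`.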

Also, for a URL containing a “?” (even an HTML file that is otherwise
cached), I get:

1303398589.824 98 192.168.1.8 TCP_MISS/200 440 HEAD http://domain.com/resource.html? - DIRECT/83.223.106.8 text/html
1303398590.117 141 192.168.1.8 TCP_MISS/200 2665 GET http://domain.com/resource.html? - DIRECT/83.223.106.8 text/html
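Squid applies the first refresh_pattern line whose regular expression matches the URL, and the match is unanchored unless the pattern starts with ^. A URL ending in a bare "?" therefore still matches a `\?` pattern even though its query string is empty. The Python sketch below models that first-match selection. The pattern list is hypothetical: it takes the default lines from the configuration diff with the commented-out `(/cgi-bin/|\?)` line re-enabled, and Python's `re` module stands in for the POSIX regular expressions Squid actually uses.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical pattern list: (regex, case_insensitive, min, percent, max),
# kept in squid.conf order because the first match wins.
Pattern = Tuple[str, bool, int, int, int]

PATTERNS: List[Pattern] = [
    (r"^ftp:", False, 1440, 100, 10080),
    (r"^gopher:", False, 1440, 100, 1440),
    (r"(/cgi-bin/|\?)", True, 0, 0, 0),   # the "-i" line, re-enabled
    (r".", False, 1440, 100, 4320),
]


def select_pattern(url: str,
                   patterns: List[Pattern] = PATTERNS
                   ) -> Optional[Tuple[str, int, int, int]]:
    """Return (regex, min, percent, max) of the first matching line, or None."""
    for regex, icase, min_m, pct, max_m in patterns:
        flags = re.IGNORECASE if icase else 0
        if re.search(regex, url, flags):   # unanchored search
            return (regex, min_m, pct, max_m)
    return None
```

With this list, `select_pattern("http://domain.com/resource.html?")` picks the `(/cgi-bin/|\?)` line with limits 0 0% 0, while the same URL without the trailing "?" falls through to the catch-all `.` line.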

Can anybody advise whether what I intend can be achieved through
configuration changes alone, or point me to a starting point in the
Squid source code that I could change?
Received on Thu Apr 21 2011 - 16:57:35 MDT

This archive was generated by hypermail 2.2.0 : Thu Apr 28 2011 - 12:00:03 MDT