Re: [squid-users] Can Squid cache literally all HTTP responses for testing?

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sat, 17 Nov 2012 10:43:32 +1300

On 17/11/2012 9:54 a.m., Kevin Nardi wrote:
> Hi squid-users,
>
> I'm an experienced web developer who is using Squid for the first
> time. For internal testing, we need a stable cache of a certain list
> of sites (which we do not own) that we use in our test. I know Squid
> isn't built to do this, but I thought for sure it would be possible to
> configure it to cache literally all HTTP responses and then use those
> for all requests.

Squid caches objects both literally (full object plus meta data) and
temporally (time-oriented). HTTP itself is stateless, and allows both
entity variation and rapid change over time within those representation
variations. Which means each object in the cache is just one instance
from a *set* of response objects which are *all* represented by the one
URL.
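For example, one URL with a Vary: response header legitimately maps to
several stored variants at once (a hypothetical exchange):

   GET /logo HTTP/1.1
   Accept-Encoding: gzip
     -> 200 OK, Vary: Accept-Encoding, gzipped body

   GET /logo HTTP/1.1
   Accept-Encoding: identity
     -> 200 OK, Vary: Accept-Encoding, plain body

A request is only a HIT when it matches the URL *and* the variant
selection headers of some stored instance.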

If you use a proxy cache like Squid as the data source for this type of
testing you will get false test results.

You need a web service set up to present the expected answer for each of
your requests. For testing Squid we use Co-Advisor for HTTP compliance
testing, plus custom server scripts that respond with fixed output to
certain requests. Web Polygraph is also in the mix there sometimes, for
throwing the kind of traffic load you want through the system - but
AFAIK it is more oriented at testing server systems than client ones.
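Something along these lines is what I mean by a custom server script - a
minimal Ruby/WEBrick sketch (the port and paths here are made up, not
anything from your setup):

===================================================
#!/usr/bin/env ruby
# Fixed-output test server: every request gets the same canned
# answer, so every test run sees identical bytes.
# Untested sketch; port and paths are hypothetical.
require 'webrick'

CANNED = {
  '/index.html' => ['text/html', '<html><body>fixed test page</body></html>']
}
CANNED.default = ['text/plain', "fixed fallback body\n"]

server = WEBrick::HTTPServer.new(Port: 8090)
server.mount_proc('/') do |req, res|
  type, body = CANNED[req.path]      # look up the canned answer
  res['Content-Type'] = type
  res.body = body
end
trap('INT') { server.shutdown }
server.start
===================================================

Point your tests (or Squid, via a cache_peer ... originserver entry) at
that instead of the live sites and every run gets stable content.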

> Here is my very simple Squid 3.1 config that is
> intended to do that:
>
>
> ===================================================
> offline_mode on
>
> refresh_pattern . 525600 100% 525600 override-expire override-lastmod ignore-reload ignore-no-cache ignore-no-store ignore-must-revalidate ignore-private ignore-auth
> vary_ignore_expire on
> minimum_expiry_time 99 years
> minimum_object_size 0 bytes
> maximum_object_size 1024 MB
>
> cache_effective_user _myusername
>
> http_access allow all
>
> coredump_dir /usr/local/squid/var/logs
>
> strip_query_terms off
>
> url_rewrite_access allow all
> url_rewrite_program /usr/local/squid/similar_url_rewrite.rb
> url_rewrite_concurrency 10
> url_rewrite_children 10
>
> cache_dir ufs /usr/local/squid/caches/gm 5000 16 256
> http_port 8082
> pid_filename /usr/local/squid/var/run/gm.pid
>
> access_log /usr/local/squid/var/logs/access-gm.log
> cache_log /usr/local/squid/var/logs/cache-gm.log
> ===================================================
>
>
> As you can see, I am intelligently rewriting URLs to always match URLs
> that I know should be in the cache because I've hit them before. I
> find that my hit rate is still only about 56%, and that is mostly 304
> IMS hits.

URL != object.

Also, Squid is only rated at about a 50% HIT ratio on forward-proxy HTTP
traffic - and often achieves a lot less. Getting above that is a rather
good outcome (when ignoring response accuracy).

> I have been unable to find sufficient documentation or debug
> logging to explain why Squid would still not cache some requests.

HTTP uses *a lot* more than the URL to determine the suitable response
representation. All of those headers which you are telling
refresh_pattern to ignore are how Squid tells object X apart from object
Y when both live at the same URL.
  The ignore-no-store and ignore-private options are particularly
dangerous for you: those headers are the owner explicitly *removing*
permission for the response to be stored, even temporarily, on disk. The
content is private and clearly marked as such by the owner - storing it
anyway is actually illegal in much of the world.
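If you are determined to keep forcing cacheability at all, then at the
very least drop those two options from the rule. An untested sketch of
your line without them:

===================================================
refresh_pattern . 525600 100% 525600 override-expire override-lastmod ignore-reload ignore-no-cache ignore-must-revalidate ignore-auth
===================================================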

What are you testing? (the client software)

And what are the site profiles for your test sites?
   * static / dynamic content proportions?
   * amount and types of personalization?
   * Vary: headers?
   * highly variable dynamic content? at what update rate/frequency?
   * is the server performing refresh properly (304 versus useless 200
     responses)?

Amos