RE: Introducing Phil Bogle, proposed additions to Squid

From: Phil Bogle <PhilBo@dont-contact.us>
Date: Mon, 26 Sep 2005 13:53:36 -0700

Thank you very much for your comments and feedback, Henrik.

I should clarify some special aspects of the way we're using Squid.
The only clients of the cache are the crawlers, which are all considered
trusted and which share common aggressive caching assumptions. There
are no intervening caches. These special circumstances reduce some of
the safety concerns that would apply in a more general setting.

Regarding x-always-cache: I believe that the existing refresh_pattern
override is a viable alternative for our needs. However, there are
certain advantages in our environment to allowing clients to control the
policy.

Different developers are adding crawlers for new sites all the time.
It's somewhat awkward to have multiple developers competing to update
the live configuration on a Squid cache every time we add a site, and to
ensure that these changes are appropriately deployed to all Squid cache
instances.

On the other hand, we can update the client code just once so that all
requests originating in the crawl have the appropriate flag set. The
changes to the client headers are checked into revision control and
continue to apply even if we talk to a different Squid cache.
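
To illustrate the client-side approach described above, here is a
minimal sketch of a shared helper that every crawler could use to build
its requests. Only the header name x-always-cache comes from the
proposal; the header value "1", the proxy address, and the function
names are assumptions made up for this example.

```python
import urllib.request

# Header name taken from the proposal. The value "1" is an assumption:
# the thread does not say what value the directive carries.
ALWAYS_CACHE_HEADER = "x-always-cache"

# Hypothetical address of one Squid cache instance.
SQUID_PROXY = "squid.example.internal:3128"


def crawl_headers(extra=None):
    """Return the headers every crawler request should carry."""
    headers = {ALWAYS_CACHE_HEADER: "1"}
    if extra:
        headers.update(extra)
    return headers


def build_request(url, proxy=SQUID_PROXY):
    """Build (but do not send) a request routed through a Squid cache."""
    req = urllib.request.Request(url, headers=crawl_headers())
    # Point the request at the proxy; any Squid instance can be swapped in
    # here without touching the header policy above.
    req.set_proxy(proxy, "http")
    return req
```

Because the policy lives in one function under revision control, a new
crawler picks it up simply by calling build_request(), whichever cache
it ends up talking to.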

Regarding x-cache-key: as you mention, x-cache-key is a dangerous
directive if there are untrusted clients or the cache belongs to a
hierarchy. It would be very useful if there were a way to accomplish
the same thing in the Squid configuration, through something akin to the
redirector interface as you describe.

For some of the same reasons mentioned above, there is still some appeal
in letting clients control cache key policy through an x-cache-key
directive, but it should be turned off by default.
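
As a rough sketch of the kind of rewriting a cache key policy would
perform, whether it runs in the client or in a redirector-style helper
on the proxy side, the function below strips session parameters from a
URL to derive the key. The parameter names in SESSION_PARAMS and the
function name are hypothetical; real sites would each need their own
list.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session parameter names; not taken from any real site.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def cache_key(url):
    """Derive a cache key by dropping session query parameters.

    The URL actually fetched is left alone; only the key used to
    decide cache hits changes. The fragment is dropped as well, since
    it is never sent to the origin server.
    """
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )
```

Two requests that differ only in their session parameter then map to
the same key, so the second one can be served from the cache.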

-----Original Message-----
From: Henrik Nordstrom [mailto:hno@squid-cache.org]
Sent: Saturday, September 24, 2005 4:55 PM
To: Phil Bogle
Cc: squid-dev@squid-cache.org
Subject: Re: Introducing Phil Bogle, proposed additions to Squid

On Tue, 20 Sep 2005, Phil Bogle wrote:

> * The x-always-cache directive overrides all other response headers
> that would otherwise cause Squid not to cache a response (e.g.
> Expires). Be careful with how you use this header since it could cause
> unexpected caching.

Don't the existing refresh_pattern overrides do the job as well? Or is
there any specific reason you need this on a per-request basis rather
than based on what is being requested?

> * The x-cache-key directive overrides the actual URL used to fetch
> the content for purposes of determining cache hits. It is typically
> used to ignore session query string parameters that would otherwise
> prevent caching.

To be safe this logic needs to be moved into the proxy rather than the
client (crawler in your case).

One way I see this done "proper" is similar to the redirector interface
we have today, but instead of rewriting the URL to be requested, only the
cache key URL is rewritten.

Regards
Henrik
Received on Mon Sep 26 2005 - 14:53:40 MDT
