Introducing Phil Bogle, proposed additions to Squid

From: Phil Bogle <PhilBo@dont-contact.us>
Date: Tue, 20 Sep 2005 16:33:29 -0700

Hi, my name is Phil Bogle; I work at a company called Jobster
(http://www.workzoo.com, http://www.jobster.com). My blog at
http://thebogles.com/blog tells a bit more about me.

Thank you for your great work on Squid. I have written two small
extensions to Squid that might be of general interest. You might also
have suggestions regarding a more elegant way to satisfy our needs using
existing Squid features; I'm relatively new to Squid. If these features
in fact satisfy an unmet need, I'd like to contribute them to the common
code base.

One of the features we offer at Jobster is vertical search for jobs. We
provide a single form for searching jobs obtained by crawling a number
of different sites on the web.

We have multiple test and development environments, all crawling the
same sites, which would potentially lead to a lot of duplicate traffic
against these sites. A caching proxy is ideal for eliminating those
duplicate hits and reducing the load on the sites we crawl.

However, many of the sites that we crawl have cache control directives
in their response headers that cause Squid not to cache the response.
Furthermore, many of these sites have session variables in the query
string that should not be considered for purposes of determining whether
a cache hit has occurred. It would absolutely be the wrong thing for a
general caching proxy to ignore these factors, but for our specific
application we know that it's OK to accept a stale and inexact cached
response.

We therefore define two extended caching directives that can be included
in the Cache-Control request header.

    * The x-always-cache directive overrides all response headers that
      would otherwise cause Squid not to cache a response (e.g.
      Expires). Be careful with how you use this directive, since it
      could cause unexpected caching.
    * The x-cache-key directive replaces the actual URL used to fetch
      the content for purposes of determining cache hits. It is
      typically used to ignore session query string parameters that
      would otherwise prevent caching.
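
To illustrate the kind of key a crawler might send with x-cache-key,
here is a minimal Python sketch that drops session parameters from a
URL. The parameter names in SESSION_PARAMS and the function name
cache_key_for are hypothetical, chosen only for this example; the real
names depend on the site being crawled.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical session parameter names; real sites use their own.
SESSION_PARAMS = {"sid", "sessionid", "jsessionid"}


def cache_key_for(url, session_params=SESSION_PARAMS):
    """Return the URL with session query parameters removed.

    The remaining parameters keep their original order, and any
    fragment is dropped, so two requests that differ only in their
    session values map to the same key.
    """
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in session_params
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )
```

Two crawls of the same search page with different session values would
then produce the same key, so the second one could be served from the
cache.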

These directives are useful when combined with max-stale to specify how
old the response can be before the proxy must refresh it.
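
As a rough sketch of how a client might assemble such a request
header, the Python below joins the directives into one Cache-Control
value. The exact syntax for x-cache-key is not spelled out in this
message, so the quoted-string form used here is an assumption, as is
the helper name build_cache_control. max-stale takes a number of
seconds, as in standard HTTP.

```python
def build_cache_control(cache_key=None, always_cache=True, max_stale=None):
    """Build a Cache-Control request header value.

    Assumption: x-cache-key carries its value as a quoted string.
    max_stale is a number of seconds.
    """
    directives = []
    if always_cache:
        directives.append("x-always-cache")
    if cache_key is not None:
        directives.append('x-cache-key="%s"' % cache_key)
    if max_stale is not None:
        directives.append("max-stale=%d" % max_stale)
    return ", ".join(directives)


# Accept a cached copy up to one day past its expiry (86400 seconds).
headers = {
    "Cache-Control": build_cache_control(
        cache_key="http://example.com/jobs?q=python",
        max_stale=86400,
    )
}
```

The resulting headers dictionary can be passed to whatever HTTP client
the crawler uses when it sends the request through the proxy.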

Thanks for your time and suggestions.
Received on Tue Sep 20 2005 - 18:27:47 MDT
