Re: [squid-users] Problem with HTTP Headers

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sun, 13 Nov 2011 00:47:52 +1300

On 12/11/2011 10:30 a.m., Ghassan Gharabli wrote:
> Hello,
>
> I am facing a trouble with Caching HTTP Headers.
>
> Everyday I see that www.facebook.com header is being cached and then I
> try to remove it manually from Cache and so other websites ...

...other websites what?

>
> I tried to add these refresh_patterns before any rule but
> unfortunately with no luck!

Okay, some basics. This is a bit complex, so if anything is unclear please say so.

There are several algorithms affecting caching.

Firstly, there is absolute expiry.
  This tells Squid exactly when to erase the object, down to the second.
Controlled by the Expires: header, or a Cache-Control header carrying
private, no-store, or max-age= values.
  As Squid's HTTP/1.1 support increases, Expires: (an HTTP/1.0 feature)
is being ignored more and more often.
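For illustration, a response carrying both forms of absolute expiry might
look like this (hypothetical values; Cache-Control max-age wins over
Expires: when both are present in HTTP/1.1):

```
HTTP/1.1 200 OK
Date: Sat, 12 Nov 2011 10:30:00 GMT
Expires: Sat, 12 Nov 2011 10:35:00 GMT
Cache-Control: max-age=300
```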

Secondly, there is the freshness algorithm.
  This tells Squid when the object can be used immediately and when it
needs revalidation before use. It is only an estimation.
  Controlled by the Date and Last-Modified headers, with Cache-Control
max-stale, etc. mixed in as well.
  This is where refresh_pattern comes in: its min/pct/max values set the
boundaries of the staleness decision. The wiki and the refresh_pattern
config docs cover exactly how it works, so I won't repeat it all here.
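As a rough sketch of how min/pct/max interact (simplified; real Squid
also honours Cache-Control, max-stale, heuristics for missing headers,
etc., and all names here are illustrative, not Squid's own):

```python
from datetime import datetime, timedelta

def is_fresh(now, date, last_modified, age_min, lm_pct, age_max):
    """Simplified refresh_pattern freshness estimate.

    age_min / age_max correspond to the min and max columns,
    lm_pct to the pct column (as a fraction, e.g. 0.20 for 20%).
    """
    age = now - date                 # how long the object has been cached
    if age <= age_min:
        return True                  # younger than min: always fresh
    if age > age_max:
        return False                 # older than max: always stale
    # LM-factor heuristic: fresh while the object's age is under pct% of
    # how old the content was (Date - Last-Modified) when it was cached.
    lm_age = date - last_modified
    return age <= lm_age * lm_pct

# Hypothetical timestamps: cached 2 hours ago, last modified 10 days
# before caching, pattern equivalent to "0 20% 10080".
print(is_fresh(datetime(2011, 11, 12, 12, 0),
               datetime(2011, 11, 12, 10, 0),
               datetime(2011, 11, 2, 10, 0),
               timedelta(minutes=0), 0.20,
               timedelta(minutes=10080)))   # True: 2h < 20% of 10 days
```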

Thirdly, there are the variant algorithm(s).
  These tell Squid whether the object in cache is relevant to the
request at all or needs to be skipped. Controlled by the ETag, Vary,
Accept, etc. headers.
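For example (hypothetical response), a server that compresses for some
clients but not others sends:

```
HTTP/1.1 200 OK
Vary: Accept-Encoding
Content-Encoding: gzip
```

Squid then stores a separate variant per Accept-Encoding value it sees;
a request whose Accept-Encoding does not match any stored variant skips
the cached copies and goes back to the origin.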

To complicate things, refresh_pattern has ignore-* and override-*
options which make Squid ignore the particular header. These are mostly
HTTP violations and can prevent immediate expiry or extend the
estimation well beyond anything that would otherwise be chosen.
NOTE: all these options, and refresh_pattern itself, can only *extend*
the time something is cached for. They cannot and do not prevent caching
or remove things early. refresh_pattern can have the appearance of
shortening cache times *if, and only if*, the object would have been
cached that long by another refresh_pattern estimation later down the
list (ie. our default 1-week storage time in the "." pattern line).

  To prevent caching use "cache deny ...".
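For instance, to stop www.facebook.com responses being stored at all
(a sketch; the ACL name "facebook" is arbitrary, adjust to taste):

```
# squid.conf: the leading dot matches facebook.com and all subdomains
acl facebook dstdomain .facebook.com
cache deny facebook
```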

>
> refresh_pattern -i \.(htm|html|jhtml|mhtml|php)(\?.*|$) 0 0% 0
> refresh_pattern ^http:\/\/www\.facebook\.com$ 0 0% 0
>
> REFRESH_PATTERN CONFIG :
> ------------------------------------------------
> # 1 year = 525600 mins, 1 month = 43800 mins
> refresh_pattern
> (get_video|videoplayback|videodownload|\.flv).*(begin|start)\=[1-9][0-9]* 0
> 0% 0
> refresh_pattern -i \.(htm|html|jhtml|mhtml|php)(\?.*|$) 0 0% 0
> refresh_pattern ^http:\/\/www\.facebook\.com$ 0 0% 0

This rule above will never match. Squid passes the absolute URL to the
refresh pattern.
An absolute URL has a "/" in the path, so ".com" will *never* be the
last four characters of the URL.
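You can see the effect with any regex engine (Python's re here purely
for illustration; Squid uses its own regex library, but this pattern
behaves the same way, and the URL is just an example):

```python
import re

# The absolute URL as Squid sees it -- note the "/" after the host.
url = "http://www.facebook.com/index.php"

broken  = re.compile(r"^http:\/\/www\.facebook\.com$")   # anchored at .com
working = re.compile(r"^http:\/\/www\.facebook\.com\/")  # allows a path

print(broken.search(url))                 # None: "$" can't match past "/"
print(working.search(url) is not None)    # True
```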

Amos
Received on Sat Nov 12 2011 - 11:48:02 MST

This archive was generated by hypermail 2.2.0 : Sun Nov 13 2011 - 12:00:02 MST