Re: [squid-users] Squid not retaining cached objects between server restarts

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Mon, 27 Jul 2009 10:43:48 +1200

On Sun, 26 Jul 2009 07:37:19 -0700 (PDT), RynoDJ <rdejager_at_icon.co.za>
wrote:
> Hi,
>
> As a rule squid works pretty well, except that it seems to 'lose'
> objects from the cache when the machine is restarted and/or after a few
> hours/days. It then re-downloads files that have not changed. All I need
> is a simple setup that will cache as much as possible (use as much of
> the cache size as possible) and only download files when they've
> changed. I do lots of Linux re-installs inside VMs and I'd like to
> source updates from the cache instead of downloading the same RPMs over
> and over again.

This sounds like normal proxy behaviour. Objects do not have an unlimited
lifetime in the cache.

 * Web servers often send information indicating when objects should be
replaced; when they do not, Squid makes a guess and checks for a new copy.
 * Some servers send back a whole new object whether they need to or not,
even when Squid merely asks whether it has changed.
 * The amount of space you have available is not unlimited; when more space
is needed, garbage collection normally throws out old objects, which may or
may not still have been usable.
 * Forcing a fast shutdown/restart and/or a lack of disk space can prevent
Squid from saving in-memory objects to disk. Affected objects are lost
until re-fetched.
 * Some versions of Squid (older than the mid-2.6 era, and all Squid-3) do
not handle variants of objects (e.g. compressed vs non-compressed) well,
and will discard the stored copy when a different variant is needed.
 * I've heard a few people mention that RPMs are served from changing
mirror URLs; that causes a re-fetch for each unique URL. The
storeurl_rewrite feature from 2.7 is needed to work around that problem
(see the sketch below).
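
A rough illustration of a store-URL rewrite setup, assuming Squid 2.7. The
helper path, ACL name and pattern below are examples only; the helper is a
small program you write yourself, which reads one request per line on stdin
and prints a normalised "store" URL (or a blank line for no change) on
stdout:

  # squid.conf (Squid 2.7) -- names and paths are examples only
  storeurl_rewrite_program /etc/squid/store_url_rewrite.pl
  storeurl_rewrite_children 5
  # only bother rewriting RPM fetches:
  acl rpm_urls urlpath_regex -i \.rpm$
  storeurl_access allow rpm_urls
  storeurl_access deny all

With that in place, the same RPM fetched from different mirrors can be
stored and served as a single cache object.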

If you want to know why Squid is not saving a particular URL, visit
www.redbot.org and enter the URL there (it needs to be publicly reachable).
If the report there indicates the object should be cacheable when it is
actually being thrown away, look closer at your logs for a reason why it is
being discarded.

>
> Could someone perhaps tell me what I need to change in my conf file?
>
>
> Thanks
>
>
> http_port 3128 transparent

I advise not using port 3128 for interception. Regular proxy traffic
arriving on that same port will trigger NAT lookups and URL rewriting all
the time.
See CVE-2009-0801 for the security issues.
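
The usual split is to keep forward-proxy and intercepted traffic on
separate ports, something like the following (3129 is only an example;
point your firewall REDIRECT rule at whichever port carries the
"transparent" flag):

  http_port 3128
  http_port 3129 transparent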

> hierarchy_stoplist cgi-bin ?

> acl QUERY urlpath_regex cgi-bin \?
> no_cache deny QUERY

Assuming you have Squid-2.6 or higher: remove the two QUERY lines above.

> cache_replacement_policy heap LFUDA

LFUDA means that, on garbage collection, the objects requested least often
(stale or not) are the first to be removed, with an aging factor so that
formerly popular objects do not linger forever. This may be related to the
object loss you are noticing. It is worth looking up the meanings of this
policy and its alternatives to see which best matches what you want.
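
For reference, the common choices look like this; LFUDA tends to keep large
popular objects (good for byte hit ratio, e.g. RPMs), while GDSF favours
many small popular objects (good for object hit ratio):

  # keep frequently-requested objects, regardless of size:
  cache_replacement_policy heap LFUDA
  # or favour keeping many small popular objects:
  cache_replacement_policy heap GDSF
  # or plain least-recently-used:
  cache_replacement_policy lru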

> cache_dir diskd /var/spool/squid 10240 16 256
> cache_store_log none

> auth_param basic children 5
> auth_param basic realm Squid proxy-caching web server
> auth_param basic credentialsttl 2 hours

Authentication will not work for intercepted requests (i.e. anything
arriving on a port marked "transparent").
It also appears not to be used by any of your access controls. You can save
yourself some startup/shutdown delay by removing the three lines above.

> refresh_pattern ^ftp: 1440 20% 10080
> refresh_pattern ^gopher: 1440 0% 1440

Add a new pattern here to help with dynamic objects:
  refresh_pattern -i (/cgi-bin/|\?) 0 0% 0

> refresh_pattern . 0 20% 4320

For more aggressive caching you can also add "reload-into-ims" to the "."
pattern if your Squid supports it.
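
Since your goal is re-serving unchanged RPMs, something along these lines
may help. The times and options are examples only, and
override-expire/ignore-reload deliberately violate HTTP, so use them with
care. Patterns are matched top-down, so the RPM rule must come before the
"." rule:

  # cache RPMs for at least a week, up to 30 days, ignoring forced reloads:
  refresh_pattern -i \.rpm$ 10080 100% 43200 override-expire ignore-reload
  refresh_pattern . 0 20% 4320 reload-into-ims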

> half_closed_clients off
> acl all src 0.0.0.0/0.0.0.0
> acl manager proto cache_object
> acl localhost src 127.0.0.1/255.255.255.255
> acl to_localhost dst 127.0.0.0/8
> acl localnet src 10.0.0.0/8 # RFC1918 possible internal network
> acl localnet src 172.16.0.0/12 # RFC1918 possible internal network
> acl localnet src 192.168.0.0/16 # RFC1918 possible internal network
> acl SSL_ports port 443 563
> acl Safe_ports port 80 # http
> acl Safe_ports port 21 # ftp
> acl Safe_ports port 443 563 # https, snews
> acl Safe_ports port 70 # gopher
> acl Safe_ports port 210 # wais
> acl Safe_ports port 1025-65535 # unregistered ports
> acl Safe_ports port 280 # http-mgmt
> acl Safe_ports port 488 # gss-http
> acl Safe_ports port 591 # filemaker
> acl Safe_ports port 777 # multiling http
> acl CONNECT method CONNECT
> http_access allow manager localhost
> http_access deny manager
> http_access deny !Safe_ports
> http_access deny CONNECT !SSL_ports
> http_access deny to_localhost

> acl mynetwork src 192.168.100.0/255.255.255.0

The idea of adding "localnet" to the config was that you place your own
network ranges under that name. There is no need for both a "localnet" and
a "mynetwork" ACL; you can remove the localnet defaults.
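
In other words, the three localnet defaults and the mynetwork lines
collapse into something like this (assuming 192.168.100.0/24 really is your
whole local network):

  acl localnet src 192.168.100.0/24
  http_access allow localnet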

> http_access allow mynetwork
> http_access allow localnet
> http_access allow localhost
> http_reply_access allow all
> icp_access allow all
> visible_hostname myfirewall_at_mydomain.com

The above is not a fully qualified domain name.
It should look something like myfirewall.mydomain.com, and it should have
public rDNS available so that people can find your IP and related contacts
when things go wrong.
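
For example (substitute your real hostname):

  visible_hostname myfirewall.mydomain.com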

> append_domain .homeland.net

Squid newer than 2.6 should be pulling this from /etc/resolv.conf
properly.
I think the latest 2.6 releases do as well, but I am not completely sure of
that.
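
If you drop append_domain, the equivalent resolver setting is a "search"
line in the system resolver configuration (assuming homeland.net is the
domain you want appended):

  # /etc/resolv.conf
  search homeland.net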

> err_html_text admin_at_mydomain.com
> deny_info ERR_CUSTOM_ACCESS_DENIED all
> memory_pools off
> coredump_dir /var/spool/squid
> ie_refresh on
> maximum_object_size 800 MB

HTH
Amos