Re: [squid-users] TCP_MISS/504 after UDP_HIT - from sibling squid

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Fri, 08 Oct 2010 15:57:09 +1300

On 08/10/10 05:42, Adrian Dascalu wrote:
> Hi to all squid users!
>
> I'm new to this list so please hold the big guns.

The problem you outline is discussed last in this reply. I've taken the
opportunity to comment on the config improvements possible all the way down.

>
> Here's my setup:
>
> 1. Using Squid squid-2.6.STABLE6-5.el5_1.3 (pinned at this version since all newer ones will eventually stop responding with 100%cpu. But this could be the subject of another post on this list)
> 2. 2 servers in a heartbeat cluster. 192.168.2.1-2 are the IPs used for the internal communication in the cluster.
> 3. The requests come to Apache server who passes them to squid on the localhost.

Squid is designed to be used the other way around.
The only reason I'm aware of for placing Apache out front is to map URLs
to Zope's weird virtual hosting URI space. You appear to be using squirm
to do this instead.
  Is there another reason I'm not aware of?

> 4. The squids are configured to use the other squid as sibling and webserver instances from both servers as parents. ICP is used in all cases (the webservers will always reply MISS but the fastest to reply to ICP is probably the less busy and closest)
>
> My squid config looks like this:
>
> ********************************************************************
> cache_effective_user squid
> cache_effective_group squid
> http_port 192.168.2.2:3128 transparent
> http_port 127.0.0.1:3128 transparent

Are you receiving regular ISP-type traffic from internal PCs at this Squid?
The rest of your config indicates only some administrative channel. As
such you can drop the "transparent" security hole (and slow NAT
lookups!) and use "accel" etc instead.

NP: "accel" automatically turns on "never_direct deny all"
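
A rough sketch of what that could look like on your two listening lines,
assuming only the Apache front-end and the sibling Squid ever connect to
them. The "vhost" option is my assumption about what you need; use
"defaultsite=" instead if only a single site name is served:

   http_port 192.168.2.2:3128 accel vhost
   http_port 127.0.0.1:3128 accel vhost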

> icp_port 3130
> udp_incoming_address 192.168.2.2
> cache_dir ufs /var/spool/squid 20000 16 256
> cache_mgr webadmin_at_subdomain.domain.xx
> visible_hostname host1.subdomain.domain.xx
> log_icp_queries on
> cache_access_log /var/log/squid/access.log
> cache_log /var/log/squid/cache.log
> cache_store_log /var/log/squid/store.log
> cache_store_log none

Remove the first of those lines. It's overridden by the second.

> emulate_httpd_log off

This is the default and a deprecated option. I think you can remove it
from the config.

> cache_mem 512 MB

NP: the bigger you can make this the faster Squid's hits will go (within
reason). I see from the lines below that you are already aware of the
squid-2.x size limit on individual in-memory objects.

> maximum_object_size 100 MB # max cached object size
> maximum_object_size_in_memory 1 MB # max cached-in-memory object size
> acl all src 0.0.0.0/0.0.0.0

acl all src all

> acl localhost src 127.0.0.1/32
> acl localnet src 192.168.2.0/24
> acl ssl_ports port 443 563
> acl safe_ports port 81 80 443
> acl zope_servers src 127.0.0.1
> acl zope_servers src XXX.XXX.XXX.181
> acl zope_servers src XXX.XXX.XXX.134
> acl zope_servers src XXX.XXX.XXX.155
> acl zope_servers src 192.168.2.0/24
> acl manager proto cache_object
> acl connect method connect
> acl accelerated_protocols proto http
> acl accelerated_hosts dst 127.0.0.0/8
> acl accelerated_hosts dst XXX.XXX.XXX.181/32
> acl accelerated_hosts dst XXX.XXX.XXX.155/32

You call these two accelerated hosts but I see no cache_peer entries
allowing Squid to pass requests to them.
You don't even use this ACL so I say remove it to make things clearer.

> acl accelerated_ports myport 3128

another unused ACL.

> acl purge method PURGE
> http_access allow zope_servers purge
> http_access deny purge
> http_reply_access allow all
> acl webdav method PROPFIND TRACE PURGE PROPPATCH MKCOL COPY MOVE LOCK UNLOCK
> never_direct allow all
> http_access allow manager localnet
> http_access allow manager localhost
> http_access deny manager
> http_access deny connect !ssl_ports
> icp_access allow localhost
> icp_access allow localnet
> http_access allow all

Not great. I'm sure you have an index or registry somewhere of your
served domains. If it's large, use an external ACL to hook in and do
lookups in real time.
  This costs a small number of external lookups (most results get cached
at zero cost) but saves a large(r) amount of processing of invalid
domains and attack requests.

Or, when isolated away from the general Internet as you are, use a "src"
ACL to enumerate the machines/ranges allowed to pass requests in to this
Squid.
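
As a rough sketch of the "src" approach, reusing the localhost and
localnet ACLs you already define, and assuming Apache on localhost and
the other cluster member are the only legitimate clients, the final
"http_access allow all" could become:

   http_access allow localhost
   http_access allow localnet
   http_access deny all

The external ACL variant might look something like the following. The
helper path and the "served_domains" names are made up for illustration;
the helper would read one destination host per line and answer OK or ERR:

   external_acl_type served_domains_type ttl=300 children=5 %DST /path/to/domain_lookup
   acl served_domains external served_domains_type
   http_access allow served_domains
   http_access deny all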

> cache_peer 192.168.2.1 sibling 3128 3130 name=theothersquid
> cache_peer 192.168.2.1 parent 8988 3988 no-netdb-exchange round-robin no-digest name=11
> cache_peer 192.168.2.1 parent 8990 3990 no-netdb-exchange round-robin no-digest name=12
<snip>
> cache_peer 192.168.2.2 parent 9008 4008 no-netdb-exchange round-robin no-digest name=211
> cache_peer 192.168.2.2 parent 9010 4010 no-netdb-exchange round-robin no-digest name=212

"round-robin" or ICP. With 2.6 you can pick only one.

3.0+ is needed for "weighted-round-robin background-ping" where the ICP
lag times are used to select fastest respondents more often. This also
measures the HTTP lag times and ICMP pinger tests. So ICP is not
strictly required.
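
A sketch of the two choices for 2.6, using your first parent as the
example. I'm assuming ICP port 0 plus "no-query" as the way to switch ICP
off for a peer; the same pattern would apply to the remaining parent
lines.

Round-robin without ICP:

   cache_peer 192.168.2.1 parent 8988 0 no-query no-netdb-exchange round-robin no-digest name=11

or ICP selection without round-robin:

   cache_peer 192.168.2.1 parent 8988 3988 no-netdb-exchange no-digest name=11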

> redirect_program /var/XXDIR/bin/squirm
> redirect_children 20
> redirect_rewrites_host_header off
> acl static_content urlpath_regex -i \.(jpg|jpeg|gif|png|tiff|tif|svg|swf|ico|css|js|vsd|doc|ppt|pps|xls|pdf|mp3|mp4|m4a|ogg|mov|avi|wmv|sxw|zip|gz|bz2|tgz|tar|rar|odc|odb|odf|odg|odi|odp|ods|odt|sxc|sxd|sxi|sxw|dmg|torrent|deb|msi|iso|rpm)$
> acl static_content urlpath_regex (.*)misc_/ExternalEditor/edit_icon$
> acl static_content urlpath_regex (.*)p_/(.*)

Remove the (.*) prefix and trailer from the above patterns. Regex
assumes they are there unless the ^ and $ anchors are used.
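
ie, the last two patterns should match the same URLs when written as
(untested against your traffic):

   acl static_content urlpath_regex misc_/ExternalEditor/edit_icon$
   acl static_content urlpath_regex p_/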

"no_cache" is an obsolete and confusing name. Remove the "no_" part from
all these lines...

> no_cache allow static_content
> acl post_requests method POST
> no_cache deny post_requests

POST requests are not cacheable due to how they work in HTTP. Move the
denial to the top of your cache tests.

NP: I'm not too sure about 2.6, but you may find POST requests and
others like it are never even checked for the "cache" access controls.
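
Put together, a sketch of the start of the renamed list with the POST
denial moved to the top, assuming the remaining lines keep their existing
order below these:

   acl post_requests method POST
   cache deny post_requests
   cache allow static_content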

> acl QUERY urlpath_regex \?
> acl CGIBIN urlpath_regex cgi-bin
> no_cache allow QUERY
> no_cache deny CGIBIN

The QUERY and CGIBIN bits you may want to re-consider. We now recommend
allowing them to cache, with a refresh_pattern placed immediately before
the "." pattern to expire the broken ones:
   refresh_pattern -i (/cgi-bin/|\?) 0 0% 0

You will need this pattern anyway since you cache the \? pages.

The QUERY pattern, if you want to keep it as an allow, can be merged in as
one of the static_content patterns. It might be good to give
static_content a slightly different name after that.
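
A sketch of that merge. The name "cacheable_urls" is just a suggestion,
and the extension list is cut short here for readability; keep your full
list:

   acl cacheable_urls urlpath_regex -i \.(jpg|jpeg|gif|png|css|js)$
   acl cacheable_urls urlpath_regex misc_/ExternalEditor/edit_icon$
   acl cacheable_urls urlpath_regex p_/
   acl cacheable_urls urlpath_regex \?
   cache allow cacheable_urls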

> external_acl_type is_cacheable_type children=5 %{Cookie:__ac} %{Cookie:;__ac} %{Authorization} %{If-None-Match} /var/XXDIR/bin/squidAcl.py
> acl is_cacheable external is_cacheable_type
> no_cache allow is_cacheable

What exactly is that helper doing if I may ask?

> no_cache deny all

Hmm, you wanted performance. That's usually gained by increasing the
amount cached, which reduces both the network distance to the client and
the load on the server.

If this was done to prevent drive-by attacks poisoning the cache the
conversion to proper reverse-proxy "accel" config will fix that.

If this was done because of the web servers' output, you may gain by
inverting the approach to match the intended use of "cache": cache
everything, with explicit denial of badness where known.
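
A minimal sketch of the inverted form, reusing your post_requests ACL.
"known_bad" is a hypothetical ACL standing in for whatever URLs you find
your web servers mishandle; the pattern in it is a placeholder only:

   acl known_bad urlpath_regex -i /some/broken/path
   cache deny post_requests
   cache deny known_bad
   cache allow all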

> negative_ttl 0
> refresh_pattern . 0 50% 999999 ignore-reload
> refresh_pattern -i /getFile$ 60 90% 3600

The "." refresh_pattern will match *everything*. Your custom patterns
need to be placed above it to have any effect.

Also very large numbers in the min/max will 32-bit wrap when multiplied
up to a timestamp and end up doing the opposite of what you want. It's
not good in general to cache for more than a year, so they should be set
to 525600 (the number of minutes in a year) or less.

ie:
   refresh_pattern -i /getFile$ 60 90% 3600
   refresh_pattern -i (/cgi-bin/|\?) 0 0% 0 ignore-reload
   refresh_pattern . 0 50% 525600 ignore-reload

NP: If you can upgrade to 3.1+ you gain the "accel ignore-cc" option
combo on http_port which overrides all the possible client-sent
controls, not just the reload.
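
On 3.1+ that would be something along these lines, assuming the same
listening addresses as now and that "vhost" is what you need:

   http_port 192.168.2.2:3128 accel vhost ignore-cc
   http_port 127.0.0.1:3128 accel vhost ignore-cc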

> shutdown_lifetime 1 seconds
> pipeline_prefetch on
>
> *******************************************************************
>
> The other squid will have a very similar config, just replace 192.168.2.1 with 192.168.2.2 and vice-versa.
>
> The main problem I'm facing is that every time the squid on the "passive" member responds with UDP_HIT the following line will be a TCP_MISS/504. Like this:
>
> 1286468808.210 0 192.168.2.1 UDP_HIT/000 168 ICP_QUERY http://127.0.0.1:3128/path/to/object - NONE/- -
> 1286468808.721 4 192.168.2.1 TCP_MISS/504 1915 GET http://127.0.0.1:3128/path/to/object - NONE/- text/html

Are these log lines from 192.168.2.1 or 192.168.2.2?

If they are recorded on 192.168.2.1 they show a loop as it fetches from
itself and fails badly. The thing about loops is that they can hold up a
lot of resources for a long time before stopping and being logged.

If they are recorded on 192.168.2.2, I expect they are just showing ICP
false positives. ICP is known to be limited in the things it can match
on, ie just the URL. Vary headers are a big problem when matching. You
could disable the use of ICP entirely and use the round-robin.

You will need a newer Squid to get better accuracy than ICP. One which
supports HTCP and has more HTTP/1.1 compliant caching behaviour. HTCP
will also let you use the nifty recursive HTCP CLR instead of HTTP PURGE.
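
A rough sketch of the sibling setup on a newer Squid, assuming it was
built with HTCP support and that both sides use port 4827 (the usual HTCP
port):

   htcp_port 4827
   htcp_access allow localnet
   htcp_access deny all
   cache_peer 192.168.2.1 sibling 3128 4827 htcp name=theothersquid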

Also note how Squid is informing the web server that its domain name is
"127.0.0.1:3128". This is due to lack of the "accel vhost" options on
http_port.

>
> I've searched this list and internet in general for ideas of what I'm doing wrong and came up empty.
>
> I'm open to any suggestion for improvement in this setup. Performance is my main goal.
>
> Many thanks,
> Adrian
>

HTH
Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE9 or 3.1.8
   Beta testers wanted for 3.2.0.2
Received on Fri Oct 08 2010 - 02:57:17 MDT
