Re: [squid-users] Increased memory usage (memory leak?)

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Thu, 10 Jun 2010 00:33:18 +0000

On Wed, 09 Jun 2010 22:29:50 +0300, Panagiotis Christias
<p.christias_at_noc.ntua.gr> wrote:
> Hello list (sorry for the long message),
>
> we are using eight servers (running FreeBSD 7.3 amd64 with 2G ram each)
> as transparent proxy servers (Squid 2.7.9 along with squidGuard 1.4) for
> the content filtering service of the Greek School Network.
>
> Our current squid configuration includes:
> http_port 80 transparent
> http_port 8080
> cache_mem 192 MB
> cache_swap_low 80
> cache_swap_high 85
> maximum_object_size 4096 KB
> ipcache_size 94208
> ipcache_low 90
> ipcache_high 95
> cache_store_log none
> cache_dir null /tmp
> redirect_program /usr/local/bin/squidGuard
> redirect_children 48
> redirector_bypass on
> uri_whitespace encode
> request_header_max_size 25 KB
>
> Although it is not peak season (schools have exams and are about to
> close for summer), we are experiencing high memory usage in some cases
> (see cache01 at http://noc.ntua.gr/~christia/squid-vsize.png). All boxes
> are identical. The memory usage seems to be related to the type of the
> requests (not the volume, http://noc.ntua.gr/~christia/squid-reqs.png).
> In order to rule out any possibility of server specific problems we
> moved the client requests from one cache box to another and the memory
> allocation problem moved along (at first it was cache04 and now it is
> cache01).
>
> We compared the cache.log files and found no special messages in cache01
> or in any other server that showed increased memory usage. After looking
> into the access.log we tend to believe that the memory leaps occur
> during periods that squid receives too many concurrent requests for
> sites that are responding too slowly or not responding at all
> (firewalled, not sending any packets back). In these cases, concurrent
> requests vary from 40 reqs/sec to 700 reqs/sec, the average service time
> varies from 570 msec to 178502 msec and the HTTP return codes are
> usually TCP_MISS/417, TCP_MISS/504 and TCP_MISS/000.
>
> Cache manager reports for cache01 692882 allocated cbdata
> clientHttpRequest objects occupying 742381 KB and 100% of them in use
> (ten times more than the other cache boxes).
>
> Full report is available here:
> http://noc.ntua.gr/~christia/cache01.mem.20100609-21:35.txt
> http://noc.ntua.gr/~christia/cache01.info.20100609-21:35.txt
>
> At the same time, 'top' on cache01 reports for the squid process:
> 1575 MB virtual size
> 303 MB resident size
>
> Finally, pmap (FreeBSD sysutils/pmap port) report is available here (in
> case it is useful):
> http://noc.ntua.gr/~christia/cache01.pmap.20100609-21:35.txt
>
> Anyway, is this behaviour normal or is it a bug or memory leak? I am
> willing to try any suggestion, provide any other information and help
> debugging.
>
> For the moment, I am manually excluding from the transparent cache
> schema the sites that seem to cause problems. I am also considering
> adding a maxconn acl and lowering some connection timeouts. Waiting for
> 180000 msec for a firewalled connection is *too* much.
>
> Regards,
> Panagiotis

Looks like broken almost-HTTP/1.1 clients to me. This is the behaviour seen
when a client sends "Expect: 100-continue" without properly supporting it
itself. Squid does not fully support that HTTP/1.1 feature, only enough to
send back the correct 417 error and negotiate its absence with the client.

What I think is happening is that many requests come in with the Expect:
header for HTTP/1.1 (which Squid does not yet support), and the clients then
wait for a 100 status code that will never arrive. A standards-compliant
Squid replies with a 417 error and hopes the client retries without Expect:.
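
For illustration, the sort of exchange involved looks roughly like this
(a hypothetical example, not taken from the logs above; Squid 2.7 answers
with an HTTP/1.0 status line):

    POST /upload HTTP/1.1
    Host: www.example.com
    Content-Length: 123456
    Expect: 100-continue

    HTTP/1.0 417 Expectation Failed

A client that handles this correctly drops the Expect: header and resends
the request; a broken client just keeps waiting for "100 Continue" and
holds the connection open.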

Check that your cache01 is configured with "ignore_expect_100 off". With
that setting Squid sends the 417 error back immediately, so clients that
actually implement the negotiation get a faster answer and can retry, while
broken clients fail immediately with the error instead of holding your
resources until the timeout.
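
For reference, a minimal squid.conf sketch combining that with the maxconn
acl and shorter timeouts mentioned in the original message might look like
the following (the limits and timeout values are illustrative only, not
tuned for this workload):

    # answer Expect: 100-continue with 417 immediately
    ignore_expect_100 off

    # cap concurrent connections per client IP (example value)
    acl manyconn maxconn 50
    http_access deny manyconn

    # give up on unresponsive origin servers sooner than the defaults
    connect_timeout 30 seconds
    read_timeout 2 minutes
    request_timeout 1 minute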

Amos
