Re: [squid-users] squid consuming near all (95+ %) CPU, it is normal?

From: Frantisek Hanzlik <franta_at_hanzlici.cz>
Date: Wed, 21 Nov 2012 07:01:41 +0100

Amos Jeffries wrote:
> On 20.11.2012 09:33, Frantisek Hanzlik wrote:
>> Amos Jeffries wrote:
>>> On 12.11.2012 03:25, Frantisek Hanzlik wrote:
>>>> With this squid configuration:
>>>> acl localnet src 172.16.0.0/12
>>>> acl localnet src 192.168.0.0/16
>>>> acl SSL_ports port 443
>>>> acl SSL_ports port 85
>>>> acl SSL_ports port 81
>>>> acl SSL_ports port 5443
>>>> acl Safe_ports port 80
>>>> acl Safe_ports port 21
>>>> acl Safe_ports port 443
>>>> acl Safe_ports port 70
>>>> acl Safe_ports port 210
>>>> acl Safe_ports port 1025-65535
>>>> acl Safe_ports port 280
>>>> acl Safe_ports port 488
>>>> acl Safe_ports port 591
>>>> acl Safe_ports port 777
>>>> acl Safe_ports port 5443
>>>> acl Safe_ports port 85
>>>> acl Safe_ports port 81
>>>> acl CONNECT method CONNECT
>>>> http_access allow manager localhost
>>>> http_access deny manager
>>>> http_access deny !Safe_ports
>>>> http_access deny CONNECT !SSL_ports
>>>> http_access allow localnet
>>>> http_access allow localhost
>>>> http_access deny all
>>>> http_port 3128
>>>> hierarchy_stoplist cgi-bin ?
>>>> cache_dir ufs /var/spool/squid 1000 16 256 max-size=999000
>>>> cache_mem 512 MB
>>>> maximum_object_size 4096 KB
>>>> memory_pools off
>>>> cache_swap_low 90
>>>> cache_swap_high 95
>>>> dns_nameservers 172.16.1.1
>>>> client_db off
>>>> half_closed_clients off
>>>> max_filedesc 4096
>>>> coredump_dir /var/spool/squid
>>>> refresh_pattern ^ftp: 1440 20% 10080
>>>> refresh_pattern ^gopher: 1440 0% 1440
>>>> refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
>>>> refresh_pattern . 0 20% 4320
>>>> acl users src 172.31.0.0/16
>>>> delay_pools 1
>>>> delay_class 1 2
>>>> delay_parameters 1 5000000/10000000 5000000/10000000
>>>> delay_access 1 allow users
>>>>
>>>> squid very often load CPU at near 100%, with cca 200 users and 4000
>>>> connections (~2000 to users, 2000 to internet). Removing delay pool
>>>> configuration has no big effect.
>>>
>>> Users and connections is meaningless in HTTP. Requests per second flowing
>>> over those connections is what counts.
>>>
>>> The proxy might have all 4000 links idle (low CPU; zero bandwidth; zero
>>> disk I/O), be downloading video (or .iso) images simultaneously (low CPU;
>>> maxed out bandwidth; high disk I/O), or parsing and processing
>>> header-only requests (100% CPU; moderate or low bandwidth; no disk I/O).
>>>
>>> NP: 3.2 uses HTTP/1.1. A lot of the protocol performance features in
>>> HTTP/1.1 are done by removing object bodies and reducing the transactions
>>> to header-only requests and responses.
>>>
>>>
>>>> HW configuration: Dual core E8500_at_3.16GHz CPU, 4GB RAM, 2x SATA 7k2
>>>> Raid Edition disks in SW RAID1 for squid cache (disk performance
>>>> seems isn't problem, IOWAIT is small), gigabit ethernet cards to
>>>> internet (~800 Mbps line) and to LAN.
>>>> It is squid-3.2.3.20121106.r11695-1.fc14.i686 on Fedora 14 i686
>>>> (I test it with some older squid 3.1 version and same configuration
>>>> too, but results were same, or rather worse)
>>>>
>>>> It this CPU load normal, or can be there done some performance
>>>> tunnning for it?
>>>
>>> In order of likelyhood:
>>>
>>> Experiment #1 is remove that SW RAID and test again.
>>> Sure iowait is not bad (um, to *master* disk it is same as accessing
>>> without RAID at all), however iowait is only half the story with SW RAID.
>>> Being SW every I/O op sucks away some ... CPU. Disk I/O CPU loading in
>>> particular can be doubled depending on the implementations buffering
>>> efficiency.
>>> If you need RAID at all use HW RAID instead of SW. The only benefit you
>>> get from RAID under Squid is some advance notice of disks failing before
>>> the proxy crashes (or starts TCP_SWAPFAIL_MISS'ing - but UFS cache type
>>> still crashes so maybe you do need RAID).
>>> By using SW RAID in particular you are taking CPU cycles away from
>>> Squid, which would otherwise be using them to process a higher req/sec
>>> peak capacity. If your peak traffic is low req/sec this is not a problem,
>>> but for several thousand users I expect your peak capacity needs are
>>> high.
>>> Choice is yours, but I seriously advise moving away from SW RAID.
>>>
>>>
>>> Experiment #2 is to set "memory_pools on".
>>> Up to 90% of malloc/free calls are for very short strings and small
>>> objects. Squid can save on a lot of system allocator cycles and shrink
>>> the overall system RAM requirements a little bit by allocating these in
>>> batches/blocks/pools. This will help speed up req/sec capacity in the
>>> traffic which consists of mostly HTTP headers.
>>>
>>>
>>> Experiment #3 is to raise max_filedesc
>>> I suspect this setting is the only thing limiting your proxy to 2000 user
>>> connections and 2000 internet connections. Notice that makes 4000, and
>>> with 100 connections held in reserve for unexpected disk access it would
>>> seem that you are not using disks at all for many of these connections
>>> (TCP_MISS and TCP_MEM_HIT being most of your load?).
>>> When there are not enough FD to service all incoming requests Squid
>>> starts limiting them and spends extra CPU cycles doing management of the
>>> waiting client connection queue.
>>>
>>>
>>> Experiment #4 is to set "cache_swap_low 97" and "cache_swap_high 98".
>>> Periodic garbage collection happens when the cache fills up. With a 1GB
>>> disk cache the default settings make about 50-100MB of objects being
>>> processed in the cache and erased from disk. Most objects will be only a
>>> few KB - see your avg object size stats.
>>> NOTE: this may only marginally appear in iowait, but shows up better in
>>> the related erase/unlink operation stats.
>>>
>>> Amos
>>
>> Thank You for recommendations. I think #3 perhaps isn't problem there,
>> as squids cache.log says there is 16384 filedescriptors (i do not
>> understand why it is this number, in startup script i'm setting
>> 'ulimit -HSn 65536'
>> just before starting squid - then why no. of fd is only 16384? Some other
>> system limitations?):
>>
>> 2012/11/17 08:40:29 kid1| Starting Squid Cache version
>> 3.2.3-20121106-r11695 for i686-pc-linux-gnu...
>> 2012/11/17 08:40:29 kid1| Process ID 2497
>> 2012/11/17 08:40:29 kid1| Process Roles: worker
>> 2012/11/17 08:40:29 kid1| With 16384 file descriptors available
>> 2012/11/17 08:40:29 kid1| Initializing IP Cache...
>> 2012/11/17 08:40:29 kid1| DNS Socket created at 0.0.0.0, FD 9
>> 2012/11/17 08:40:29 kid1| Adding nameserver 172.16.1.1 from squid.conf
>> 2012/11/17 08:40:29 kid1| Logfile: opening log
>> daemon:/var/log/squid/access.log
>> 2012/11/17 08:40:29 kid1| Logfile Daemon: opening log
>> /var/log/squid/access.log
>> 2012/11/17 08:40:29 kid1| Unlinkd pipe opened on FD 15
>> 2012/11/17 08:40:29 kid1| Local cache digest enabled; rebuild/rewrite
>> every 3600/3600 sec
>> 2012/11/17 08:40:29 kid1| Store logging disabled
>> 2012/11/17 08:40:29 kid1| Swap maxSize 1024000 + 524288 KB, estimated
>> 119099 objects
>> 2012/11/17 08:40:29 kid1| Target number of buckets: 5954
>> 2012/11/17 08:40:29 kid1| Using 8192 Store buckets
>> 2012/11/17 08:40:29 kid1| Max Mem size: 524288 KB
>> 2012/11/17 08:40:29 kid1| Max Swap size: 1024000 KB
>> 2012/11/17 08:40:29 kid1| Rebuilding storage in /var/spool/squid (dirty
>> log)
>> 2012/11/17 08:40:29 kid1| Using Least Load store dir selection
>> 2012/11/17 08:40:29 kid1| Set Current Directory to /var/spool/squid
>> 2012/11/17 08:40:30 kid1| Loaded Icons.
>> 2012/11/17 08:40:30 kid1| HTCP Disabled.
>> 2012/11/17 08:40:30 kid1| Squid plugin modules loaded: 0
>> 2012/11/17 08:40:30 kid1| Adaptation support is off.
>> 2012/11/17 08:40:30 kid1| Accepting HTTP Socket connections at
>> local=0.0.0.0:3128 remote=[::] FD 18 flags=9
>> 2012/11/17 08:40:30 kid1| Store rebuilding is 0.11% complete
>> 2012/11/17 08:40:45 kid1| Store rebuilding is 40.44% complete
>> 2012/11/17 08:41:00 kid1| Store rebuilding is 83.15% complete
>> 2012/11/17 08:41:05 kid1| Done reading /var/spool/squid swaplog (3699238
>> entries)
>> 2012/11/17 08:41:05 kid1| Finished rebuilding storage from disk.
>> 2012/11/17 08:41:05 kid1| 1877119 Entries scanned
>> 2012/11/17 08:41:05 kid1| 6 Invalid entries.
>> 2012/11/17 08:41:05 kid1| 0 With invalid flags.
>> 2012/11/17 08:41:05 kid1| 54901 Objects loaded.
>> 2012/11/17 08:41:05 kid1| 0 Objects expired.
>> 2012/11/17 08:41:05 kid1| 1821036 Objects cancelled.
>> 2012/11/17 08:41:05 kid1| 53 Duplicate URLs purged.
>> 2012/11/17 08:41:05 kid1| 1129 Swapfile clashes avoided.
>> 2012/11/17 08:41:05 kid1| Took 35.95 seconds (1527.34 objects/sec).
>> 2012/11/17 08:41:05 kid1| Beginning Validation Procedure
>> 2012/11/17 08:41:06 kid1| Completed Validation Procedure
>> 2012/11/17 08:41:06 kid1| Validated 54895 Entries
>> 2012/11/17 08:41:06 kid1| store_swap_size = 922068.00 KB
>> 2012/11/17 08:41:06 kid1| storeLateRelease: released 6 objects
>>
>>
>>
>> Thus I started with recommendation #2: setting 'memory_pools on'.
>> This seemed to me firstly as it highly helped: CPU load was dropped
>> to approx. 40-50% at the time of highest load (this is only between
>> 18 to 21 hours our time /UTC+1/, in all other time is squid load
>> much smaller and CPU load was reasonable). But then I noticed
>> that '(squid-1)' process (PID 14667 below, child of 11769) changes
>> its PID, as if died and parent started new copy:
>>
>> 1 11769 11769 11769 ? -1 Ss 0 0:00 squid -f /etc/squid/squid.conf
>> 11769 14667 11769 11769 ? -1 R 23 0:11 (squid-1) -f /etc/squid/squid.conf
>> 14667 14670 11769 11769 ? -1 S 23 0:00 (logfile-daemon)
>> /var/log/squid/access.log
>>
>> And then I found in /var/log/messages these lines
>> (/var/log/squid/cache.log contained nothing suspicious(*) ):
>>
>> Nov 19 16:53:40 172 squid[2834]: Squid Parent: (squid-1) process 32702
>> exited due to signal 6 with status 0
>
> Ouch. That needs to be found and fixed.
> Are you able to assist with that by running Squid under a debugger and
> getting a backtrace?
> (http://wiki.squid-cache.org/SquidFaq/BugReporting#Using_gdb_debugger_on_a_live_proxy_.28with_minimal_downtime.29)

I tried enable core dump generating, and run gdb on it. This is result:

gdb /tmp/core.15363
GNU gdb (GDB) Fedora (7.2-52.fc14)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/squid...Reading symbols from
/usr/lib/debug/usr/sbin/squid.debug...done.
done.
[New Thread 15363]
Missing separate debuginfo for
Try: yum --disablerepo='*' --enablerepo='*-debuginfo' install
/usr/lib/debug/.build-id/c4/1c574f31a203492b9389c783adad6ff1989915
Reading symbols from /lib/libpthread-2.13.so...Reading symbols from
/usr/lib/debug/lib/libpthread-2.13.so.debug...done.
[Thread debugging using libthread_db enabled]
done.
Loaded symbols for /lib/libpthread-2.13.so
Reading symbols from /lib/libcrypt-2.13.so...Reading symbols from
/usr/lib/debug/lib/libcrypt-2.13.so.debug...done.
done.
Loaded symbols for /lib/libcrypt-2.13.so
Reading symbols from /usr/lib/libecap.so.2.0.0...Reading symbols from
/usr/lib/debug/usr/lib/libecap.so.2.0.0.debug...done.
done.
Loaded symbols for /usr/lib/libecap.so.2.0.0
Reading symbols from /usr/lib/libxml2.so.2.7.7...Reading symbols from
/usr/lib/debug/usr/lib/libxml2.so.2.7.7.debug...done.
done.
Loaded symbols for /usr/lib/libxml2.so.2.7.7
Reading symbols from /lib/libexpat.so.1.5.2...Reading symbols from
/usr/lib/debug/lib/libexpat.so.1.5.2.debug...done.
done.
Loaded symbols for /lib/libexpat.so.1.5.2
Reading symbols from /usr/lib/libssl.so.1.0.0e...Reading symbols from
/usr/lib/debug/usr/lib/libssl.so.1.0.0e.debug...done.
done.
Loaded symbols for /usr/lib/libssl.so.1.0.0e
Reading symbols from /lib/libcrypto.so.1.0.0e...Reading symbols from
/usr/lib/debug/lib/libcrypto.so.1.0.0e.debug...done.
done.
Loaded symbols for /lib/libcrypto.so.1.0.0e
Reading symbols from /lib/libgssapi_krb5.so.2.2...Reading symbols from
/usr/lib/debug/lib/libgssapi_krb5.so.2.2.debug...done.
done.
Loaded symbols for /lib/libgssapi_krb5.so.2.2
Reading symbols from /lib/libkrb5.so.3.3...Reading symbols from
/usr/lib/debug/lib/libkrb5.so.3.3.debug...done.
done.
Loaded symbols for /lib/libkrb5.so.3.3
Reading symbols from /lib/libk5crypto.so.3.1...Reading symbols from
/usr/lib/debug/lib/libk5crypto.so.3.1.debug...done.
done.
Loaded symbols for /lib/libk5crypto.so.3.1
Reading symbols from /lib/libcom_err.so.2.1...Reading symbols from
/usr/lib/debug/lib/libcom_err.so.2.1.debug...done.
done.
Loaded symbols for /lib/libcom_err.so.2.1
Reading symbols from /lib/libnsl-2.13.so...Reading symbols from
/usr/lib/debug/lib/libnsl-2.13.so.debug...done.
done.
Loaded symbols for /lib/libnsl-2.13.so
Reading symbols from /lib/libresolv-2.13.so...Reading symbols from
/usr/lib/debug/lib/libresolv-2.13.so.debug...done.
done.
Loaded symbols for /lib/libresolv-2.13.so
Reading symbols from /lib/libcap.so.2.22...Reading symbols from
/usr/lib/debug/lib/libcap.so.2.22.debug...done.
done.
Loaded symbols for /lib/libcap.so.2.22
Reading symbols from /lib/librt-2.13.so...Reading symbols from
/usr/lib/debug/lib/librt-2.13.so.debug...done.
done.
Loaded symbols for /lib/librt-2.13.so
Reading symbols from /lib/libdl-2.13.so...Reading symbols from
/usr/lib/debug/lib/libdl-2.13.so.debug...done.
done.
Loaded symbols for /lib/libdl-2.13.so
Reading symbols from /usr/lib/libltdl.so.7.2.2...Reading symbols from
/usr/lib/debug/usr/lib/libltdl.so.7.2.2.debug...done.
done.
Loaded symbols for /usr/lib/libltdl.so.7.2.2
Reading symbols from /usr/lib/libstdc++.so.6.0.14...Reading symbols from
/usr/lib/debug/usr/lib/libstdc++.so.6.0.14.debug...done.
done.
Loaded symbols for /usr/lib/libstdc++.so.6.0.14
Reading symbols from /lib/libm-2.13.so...Reading symbols from
/usr/lib/debug/lib/libm-2.13.so.debug...done.
done.
Loaded symbols for /lib/libm-2.13.so
Reading symbols from /lib/libgcc_s-4.5.1-20100924.so.1...Reading symbols
from /usr/lib/debug/lib/libgcc_s-4.5.1-20100924.so.1.debug...done.
done.
Loaded symbols for /lib/libgcc_s-4.5.1-20100924.so.1
Reading symbols from /lib/libc-2.13.so...Reading symbols from
/usr/lib/debug/lib/libc-2.13.so.debug...done.
done.
Loaded symbols for /lib/libc-2.13.so
Reading symbols from /lib/ld-2.13.so...Reading symbols from
/usr/lib/debug/lib/ld-2.13.so.debug...done.
done.
Loaded symbols for /lib/ld-2.13.so
Reading symbols from /lib/libfreebl3.so...Reading symbols from
/usr/lib/debug/lib/libfreebl3.so.debug...done.
done.
Loaded symbols for /lib/libfreebl3.so
Reading symbols from /lib/libz.so.1.2.5...Reading symbols from
/usr/lib/debug/lib/libz.so.1.2.5.debug...done.
done.
Loaded symbols for /lib/libz.so.1.2.5
Reading symbols from /lib/libkrb5support.so.0.1...Reading symbols from
/usr/lib/debug/lib/libkrb5support.so.0.1.debug...done.
done.
Loaded symbols for /lib/libkrb5support.so.0.1
Reading symbols from /lib/libkeyutils-1.2.so...Reading symbols from
/usr/lib/debug/lib/libkeyutils-1.2.so.debug...done.
done.
Loaded symbols for /lib/libkeyutils-1.2.so
Reading symbols from /lib/libattr.so.1.1.0...Reading symbols from
/usr/lib/debug/lib/libattr.so.1.1.0.debug...done.
done.
Loaded symbols for /lib/libattr.so.1.1.0
Reading symbols from /lib/libselinux.so.1...Reading symbols from
/usr/lib/debug/lib/libselinux.so.1.debug...done.
done.
Loaded symbols for /lib/libselinux.so.1
Reading symbols from /lib/libnss_files-2.13.so...Reading symbols from
/usr/lib/debug/lib/libnss_files-2.13.so.debug...done.
done.
Loaded symbols for /lib/libnss_files-2.13.so
Core was generated by `(squid-1) -f /etc/squid/squid.conf'.
Program terminated with signal 6, Aborted.
#0 0x0026c424 in __kernel_vsyscall ()
(gdb) backtrace
#0 0x0026c424 in __kernel_vsyscall ()
#1 0x0092c2f1 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2 0x0092dd5e in abort () at abort.c:92
#3 0x00924c08 in __assert_fail (assertion=0x83cf443 "false",
file=0x83ff936 "Address.cc", line=689,
    function=0x83ffca0 "void Ip::Address::GetAddrInfo(addrinfo*&, int)
const") at assert.c:81
#4 0x082e1b60 in Ip::Address::GetAddrInfo (this=0xa339214,
dst=@0xbfd5b02c, force=0) at Address.cc:689
#5 0x082da72b in comm_openex (sock_type=1, proto=6, addr=..., flags=1,
tos=0 '\000', nfmark=0,
    note=0xa0378e0 "[2a02:598:1::9999]") at comm.cc:525
#6 0x0832fd65 in Comm::ConnOpener::start (this=0x9c7e2a8) at
ConnOpener.cc:164
#7 0x082d13be in NullaryMemFunT<AsyncJob>::doDial (this=0xa6180fc) at
../../src/base/AsyncJobCalls.h:101
#8 0x082d15d6 in JobDialer<AsyncJob>::dial (this=0xa6180fc, call=...) at
../../src/base/AsyncJobCalls.h:175
#9 0x082d1775 in AsyncCallT<NullaryMemFunT<AsyncJob> >::fire
(this=0xa6180e0) at ../../src/base/AsyncCall.h:142
#10 0x082ce34b in AsyncCall::make (this=0xa6180e0) at AsyncCall.cc:36
#11 0x082d196b in AsyncCallQueue::fireNext (this=0x921e6f0) at
AsyncCallQueue.cc:54
#12 0x082d1ae0 in AsyncCallQueue::fire (this=0x921e6f0) at
AsyncCallQueue.cc:40
#13 0x0818edc0 in EventLoop::dispatchCalls (this=0xbfd5b424) at
EventLoop.cc:154
#14 0x0818eff7 in EventLoop::runOnce (this=0xbfd5b424) at EventLoop.cc:131
#15 0x0818f0e0 in EventLoop::run (this=0xbfd5b424) at EventLoop.cc:95
#16 0x081ff650 in SquidMain (argc=3, argv=0xbfd5b594) at main.cc:1500
#17 0x0820076f in SquidMainSafe (argc=3, argv=0xbfd5b594) at main.cc:1215
#18 main (argc=3, argv=0xbfd5b594) at main.cc:1207

Should I do some another steps?
Thanks
Received on Wed Nov 21 2012 - 06:01:54 MST

This archive was generated by hypermail 2.2.0 : Wed Nov 21 2012 - 12:00:04 MST