Re: [squid-users] Re: Squid Ldap Authenticators

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Wed, 14 Mar 2012 15:15:06 +1300

On 14.03.2012 03:54, guest01 wrote:
> Hi,
>
> Sorry, I pressed the send button by mistake ...
>
> We are having strange Squid troubles. First, let me describe our
> setup:
>
> - 4 HP G6/G7 DL380 servers with 16CPUs and 28GB RAM with RHEL 5.4-5.8
> 64bit and Squid 3.1.12 (custom compiled)
> Squid Cache: Version 3.1.12
> configure options: '--enable-ssl' '--enable-icap-client'
> '--sysconfdir=/etc/squid' '--enable-async-io' '--enable-snmp'
> '--enable-poll' '--with-maxfd=32768' '--enable-storeio=aufs'
> '--enable-removal-policies=heap,lru' '--enable-epoll'
> '--disable-ident-lookups' '--enable-truncate'
> '--with-logdir=/var/log/squid' '--with-pidfile=/var/run/squid.pid'
> '--with-default-user=squid' '--prefix=/opt/squid'
> '--enable-auth=basic digest ntlm negotiate'
> '-enable-negotiate-auth-helpers=squid_kerb_auth'
> --with-squid=/home/squid/squid-3.1.12 --enable-ltdl-convenience
>
> - Each server has two instances for Kerberos/NTLM authentication and
> two instances for LDAP authentication (different customers)
> - we have a hardware load balancer which balances requests for our
> Kerberos customers (4x2 instances) and LDAP customers (4x2
> instances), each has a different IP address.
> - average load values are approx 0.5 (5min values)
> - approx 60 RPS per instance (equally distributed -> 16 * 60 => 960
> RPS)
> - up to 150Mbit/s traffic per server
> - ICAP servers for content adaptation (multiple servers with a
> hardware load balancer in front of them)
>
> From time to time we have trouble with our Squid servers. It may not
> be related to Squid itself; I suspect an OS issue. Sometimes the
> servers don't respond to requests (even SSH requests), or logging in
> takes forever (reverse lookup failure?), or, even worse, the server
> interface is just down (there is no indication of any problem at the
> switch port level). If we check the squidclient output, we can see
> some hanging LDAP authenticators:
>
> squid@xlsqit01 /opt/squid/bin $ ./squidclient -h 10.122.125.23
> cache_object://10.122.125.23/basicauthenticator
> HTTP/1.0 200 OK
> Server: squid/3.1.12
> Mime-Version: 1.0
> Date: Tue, 13 Mar 2012 13:34:07 GMT
> Content-Type: text/plain
> Expires: Tue, 13 Mar 2012 13:34:07 GMT
> Last-Modified: Tue, 13 Mar 2012 13:34:07 GMT
> X-Cache: MISS from xlsqip02_3
> Via: 1.0 xlsqip02_3 (squid/3.1.12)
> Connection: close
>
> Basic Authenticator Statistics:
> program: /opt/squid/libexec/squid_ldap_auth
> number active: 20 of 20 (0 shutting down)
> requests sent: 13316
> replies received: 13312
> queue length: 0
> avg service time: 4741 msec
>
> #   FD   PID    # Requests  Flags  Time     Offset  Request
> 1   12   16038  2150        B      125.885  0       user1 pw1\n
> 2   24   16043  85          B      119.562  0       user2 pw2\n
> 3   32   16049  63          B      13.639   0       user3 pw3\n
> 4   43   16055  21          B      116.143  0       user4 pw4\n
> 5   46   16059  12                 189.002  0       (none)
> 6   50   16064  1                  189.003  0       (none)
> 7   56   16069  2                  0.079    0       (none)
> 8   60   16074  0                  0.000    0       (none)
> 9   65   16079  0                  0.000    0       (none)
> 10  86   16084  0                  0.000    0       (none)
> 11  88   16095  0                  0.000    0       (none)
> 12  90   16101  0                  0.000    0       (none)
> 13  92   16117  0                  0.000    0       (none)
> 14  95   16122  0                  0.000    0       (none)
> 15  97   16130  0                  0.000    0       (none)
> 16  99   16138  0                  0.000    0       (none)
> 17  101  16144  0                  0.000    0       (none)
> 18  104  16150  0                  0.000    0       (none)
> 19  107  16162  0                  0.000    0       (none)
> 20  109  16173  0                  0.000    0       (none)

Only 7 of the 20 helpers have ever handled a request, so it looks like
you can save some resources by dropping that down to 10 helpers. But
re-evaluate once the hanging helpers are fixed, in case the load goes
up after that.
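
If you want to keep an eye on this over time, here is a rough Python
sketch that counts used and apparently stuck helpers from the report
above. It is only an illustration: it assumes each helper row sits on
one line (unwrapped), that the Time column is in seconds, and the
60 second "stuck" threshold and all function names are made up for the
example.

```python
from typing import List, NamedTuple


class Helper(NamedTuple):
    index: int
    fd: int
    pid: int
    requests: int
    flags: str
    seconds: float
    request: str


def parse_helper_rows(text: str) -> List[Helper]:
    """Parse the per-helper rows of a basicauthenticator report."""
    helpers = []
    for line in text.splitlines():
        tokens = line.split()
        # Data rows start with four integers: index, FD, PID, request count.
        if len(tokens) < 6 or not all(t.isdigit() for t in tokens[:4]):
            continue
        index, fd, pid, requests = (int(t) for t in tokens[:4])
        rest = tokens[4:]
        # The Flags column is empty for idle helpers, so it is optional.
        flags = rest.pop(0) if rest[0].isalpha() else ""
        seconds = float(rest[0])
        request = " ".join(rest[2:])  # rest[1] is the Offset column
        helpers.append(Helper(index, fd, pid, requests, flags, seconds, request))
    return helpers


def summarize(helpers: List[Helper], stuck_after: float = 60.0) -> dict:
    """Count helpers that ever served a request and list long-busy ones."""
    used = [h for h in helpers if h.requests > 0]
    stuck = [h for h in helpers if "B" in h.flags and h.seconds > stuck_after]
    return {
        "total": len(helpers),
        "used": len(used),
        "stuck_pids": [h.pid for h in stuck],
    }


# First eight rows of the report quoted above, one row per line.
SAMPLE = r"""
# FD PID # Requests Flags Time Offset Request
1 12 16038 2150 B 125.885 0 user1 pw1\n
2 24 16043 85 B 119.562 0 user2 pw2\n
3 32 16049 63 B 13.639 0 user3 pw3\n
4 43 16055 21 B 116.143 0 user4 pw4\n
5 46 16059 12 189.002 0 (none)
6 50 16064 1 189.003 0 (none)
7 56 16069 2 0.079 0 (none)
8 60 16074 0 0.000 0 (none)
"""

summary = summarize(parse_helper_rows(SAMPLE))
print(summary)
```

The helper count itself is set with the "auth_param basic children"
directive in squid.conf.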

>
> Flags key:
>
> B = BUSY
> W = WRITING
> C = CLOSING
> S = SHUTDOWN PENDING
>
> 2012/03/13 03:00:04| Ready to serve requests.
> squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact
> LDAP server'

> squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact
> LDAP server'
> squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact
> LDAP server'
> squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact
> LDAP server'
> squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact
> LDAP server'
> squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact
> LDAP server'
> squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact
> LDAP server'
> squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact
> LDAP server'
> squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact
> LDAP server'
> squid_ldap_auth: WARNING, could not bind to binddn 'Can't contact
> LDAP server'
>
> Testing the ldap authentication at CLI level, it is working without
> any problems:
>
> root@xlsqip02 ~ # /opt/squid/libexec/squid_ldap_auth -b
> "dc=squid-proxy" -D "uid=...." -w xxx -h ldaphost -f "(uid=%s)"
> user1 pw1
> OK
>
> Unfortunately, there is nothing helpful in syslog, e.g.
> Mar 13 15:05:19 xlsqip02 last message repeated 2 times
> Mar 13 15:05:25 xlsqip02 winbindd[4283]: [2012/03/13 15:05:25, 0]
> libsmb/clientgen.c:cli_receive_smb(111)
> Mar 13 15:05:25 xlsqip02 winbindd[4283]: Receiving SMB: Server
> stopped responding
> Mar 13 15:05:25 xlsqip02 winbindd[4283]: [2012/03/13 15:05:25, 0]
> rpc_client/cli_pipe.c:rpc_api_pipe(790)
> Mar 13 15:05:25 xlsqip02 winbindd[4283]: rpc_api_pipe: Remote
> machine wienroot1.wien.rbgat.net pipe \lsarpc fnum 0x4008returned
> critical error. Error was Call timed out: server did not respond
> after
> 10000 milliseconds

What does the hostname "wienroot1.wien.rbgat.net" resolve to?
Is connectivity to all of its IPs working?

This looks a lot like network congestion affecting SMB, or possibly
routes flapping up/down and breaking IP connectivity (v4? v6?).

Winbind has some nasty limitations, but it should not be hitting this
type of problem.
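
To answer the resolution and connectivity questions in one go,
something along these lines could be run from each proxy box. It is a
sketch, not a diagnostic tool: the function names are invented here,
and port 445 (SMB over TCP) is my assumption for the domain controller;
use 389 or whatever your LDAP server listens on for the LDAP side.

```python
import socket
from typing import Dict, List, Tuple


def resolve_all(host: str, port: int) -> List[Tuple[int, str]]:
    """Return every distinct (address family, IP) pair the host resolves to."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    seen: List[Tuple[int, str]] = []
    for family, _type, _proto, _canon, sockaddr in infos:
        entry = (family, sockaddr[0])
        if entry not in seen:
            seen.append(entry)
    return seen


def can_connect(ip: str, port: int, timeout: float = 3.0) -> bool:
    """Try a plain TCP connect; True if the handshake completes in time."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def check_host(host: str, port: int, timeout: float = 3.0) -> Dict[str, bool]:
    """Map each resolved IP (v4 and v6) to whether a TCP connect succeeded."""
    return {
        ip: can_connect(ip, port, timeout)
        for _family, ip in resolve_all(host, port)
    }


# Example (not run here, since it needs access to your network):
#   for ip, ok in check_host("wienroot1.wien.rbgat.net", 445).items():
#       print(ip, "reachable" if ok else "FAILED")
```

Running it in a loop while the problem is happening should show
whether only some of the addresses, or only one IP family, stop
answering.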

> Mar 13 15:05:48 xlsqip02 sockd[4235]: warning: accept(2) failed:
> Resource temporarily unavailable (errno = 11)
> Mar 13 15:06:20 xlsqip02 last message repeated 7 times
> Mar 13 15:07:26 xlsqip02 last message repeated 4 times
> Mar 13 15:08:27 xlsqip02 last message repeated 4 times
> Mar 13 15:09:30 xlsqip02 last message repeated 10 times
> Mar 13 15:10:37 xlsqip02 last message repeated 7 times
> Mar 13 15:11:39 xlsqip02 last message repeated 11 times
> Mar 13 15:12:55 xlsqip02 last message repeated 9 times
> Mar 13 15:12:57 xlsqip02 winbindd[4331]: [2012/03/13 15:12:57, 0]
> libsmb/credentials.c:creds_client_check(324)
> Mar 13 15:12:57 xlsqip02 winbindd[4331]: creds_client_check:
> credentials check failed.
> Mar 13 15:12:57 xlsqip02 winbindd[4331]: [2012/03/13 15:12:57, 0]
> rpc_client/cli_netlogon.c:rpccli_netlogon_sam_network_logon(1030)
> Mar 13 15:12:57 xlsqip02 winbindd[4331]:
> rpccli_netlogon_sam_network_logon: credentials chain check failed
> Mar 13 15:13:05 xlsqip02 sockd[4235]: warning: accept(2) failed:
> Resource temporarily unavailable (errno = 11)
>
> btw, winbind just sucks ... But I doubt that winbind is the root
> cause ...

Right. Something underneath it is, and it is affecting both winbind
and squid_ldap_auth connectivity. Possibly routing related.

>
> Anyway, we had some NIC issues before (packet drops). At the moment
> we have disabled all the TSO stuff:
>
> root@xlsqip02 ~ # ethtool -k eth0
> Offload parameters for eth0:
> Cannot get device udp large send offload settings: Operation not
> supported
> rx-checksumming: off
> tx-checksumming: off
> scatter-gather: off
> tcp segmentation offload: off
> udp fragmentation offload: off
> generic segmentation offload: off
> generic-receive-offload: off
>
> root@xlsqip02 ~ # ethtool -i eth0
> driver: bnx2
> version: 1.9.3
> firmware-version: 4.6.4 NCSI 1.0.3
> bus-info: 0000:02:00.0
>
> root@xlsqip02 ~ # ethtool -g eth0
> Ring parameters for eth0:
> Pre-set maximums:
> RX: 1020
> RX Mini: 0
> RX Jumbo: 4080
> TX: 255
> Current hardware settings:
> RX: 1020
> RX Mini: 0
> RX Jumbo: 0
> TX: 255
>
> netstat output, if interesting:
> root@xlsqip02 ~ # netstat -s
> Ip:
> 1031106057 total packets received
> 32 with invalid addresses
> 0 forwarded
> 0 incoming packets discarded
> 1031105815 incoming packets delivered
> 943692708 requests sent out
> 214 dropped because of missing route

These missing-route drops are possibly related.

> 34 reassemblies required
> 17 packets reassembled ok
> Icmp:
> 77877 ICMP messages received
> 339 input ICMP message failed.
> ICMP input histogram:
> destination unreachable: 31124

The "destination unreachable" count is way too high. Either the NIC is
going down intermittently or a route has disappeared for some
destinations.

> timeout in transit: 3011
> echo requests: 43271
> echo replies: 467
> 43804 ICMP messages sent
> 0 ICMP messages failed
> ICMP output histogram:
> destination unreachable: 66
> echo request: 467
> echo replies: 43271
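
The totals above are counted since boot, so what matters is whether
they keep climbing while the problem is happening. A small sketch for
comparing two "netstat -s" snapshots follows. It assumes the text
layout shown in your output (section headers ending in a colon, then
"NUMBER description" or "description: NUMBER" lines), and the function
names are invented for the example.

```python
import re
from typing import Dict, Tuple

Counters = Dict[Tuple[str, str], int]


def parse_netstat_counters(text: str) -> Counters:
    """Turn `netstat -s` text into {(section, description): value}."""
    counters: Counters = {}
    section = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        leading = re.fullmatch(r"(\d+)\s+(.+)", line)    # "214 dropped ..."
        trailing = re.fullmatch(r"(.+?):\s*(\d+)", line)  # "echo replies: 467"
        if leading:
            counters[(section, leading.group(2))] = int(leading.group(1))
        elif trailing:
            counters[(section, trailing.group(1))] = int(trailing.group(2))
        elif line.endswith(":"):
            # Header such as "Ip:" or "ICMP input histogram:".
            section = line[:-1]
    return counters


def increased(before: Counters, after: Counters) -> Counters:
    """Return only the counters that grew between two snapshots."""
    return {
        key: value - before.get(key, 0)
        for key, value in after.items()
        if value > before.get(key, 0)
    }


# A few lines taken from the output quoted above.
SAMPLE = """
Ip:
    1031106057 total packets received
    214 dropped because of missing route
Icmp:
    77877 ICMP messages received
    ICMP input histogram:
        destination unreachable: 31124
    ICMP output histogram:
        destination unreachable: 66
"""

counters = parse_netstat_counters(SAMPLE)
print(counters[("ICMP input histogram", "destination unreachable")])
```

Take one snapshot when things are healthy and one during an outage,
then look at what increased() reports for the missing-route and
unreachable counters.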

Amos
Received on Wed Mar 14 2012 - 02:15:10 MDT
