[PATCH] Bug 3389: Auto-reconnect for tcp access_log

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Fri, 12 Apr 2013 10:43:28 -0600

Hello,

    The attached trunk patch fixes several problems with TCP module for
access_log. The module is unusable in production without these changes
if there is a chance the TCP logger becomes unavailable or TCP bandwidth
becomes temporary restricted . Nathan Hoad is the primary author of
these improvements. Factory helped Nathan during last stages of the
project. The code passed initial lab tests.

The code and detailed development history are also available at
https://code.launchpad.net/~squid/squid/bug3389

Here is a list of major changes:

1. Squid reconnects to TCP logger as needed. Squid keeps trying to
connect forever, using a hard-coded 0.5 second delay between attempts.

2. Squid buffers log records while there is no connectivity. The
buffering limit is configurable.

3. On buffer overflows, Squid worker either dies or starts dropping log
records. The choice is configurable.

4. The tcp logging module honors buffered_logs setting. Old code was
flushing each record.

5. Squid reports changes in logging state to cache.log. Routine
connection retries are not reported at level 1, to reduce noise level:

> 2013/04/11 17:42:31.129| tcp:127.0.0.1:23333 logger connection attempt #1 failed. Will keep trying every 0.50 seconds.
> 2013/04/11 17:42:49.411| tcp:127.0.0.1:23333 logger connectivity restored after 37 attempts.
> 2013/04/11 17:42:59.645| tcp:127.0.0.1:23333 logger connection attempt #1 failed. Will keep trying every 0.50 seconds.
> 2013/04/11 17:43:31.826| tcp:127.0.0.1:23333 logger 68608-byte buffer overflowed; cannot fit 172 bytes
> 2013/04/11 17:43:31.826| tcp:127.0.0.1:23333 logger starts dropping records.
> 2013/04/11 17:43:41.332| tcp:127.0.0.1:23333 logger connectivity restored after 83 attempts.
> 2013/04/11 17:43:41.422| tcp:127.0.0.1:23333 logger stops dropping records after 96 drops; current buffer use: 3420 out of 68608 bytes

6. A new access_log configuration format/style has been added. It allows
us to easily add named options such as buffer-size or on-error. The same
format can be used to add module-specific options in the future, but
doing so would require changes to the high-level logging code. All old
configuration formats/styles are still supported.

7. squid.conf buffered_log option documentation now reflects reality. It
used to talk about cache.log but I do not think Squid uses that option
for cache.log maintenance.

There are a few known minor side-effects of these changes on other features:

i) All access_log logs can now be configured to bypass errors because
the old "fatal" flag is now configurable via log-specific on-error
option in squid.conf. The default is still "die". I have not checked
whether modules other than TCP logger honor that flag.

ii) All access_log logs now use 8*MAX_URL (64KB) instead of a 4*MAX_URL
(32KB) or smaller buffer size by default. The ICAP logger was using
2*MAX_URL buffer size. The TCP logger was using 64KB buffer size before
so no change for TCP. I decided that it is better to raise the default
buffering level for some logs rather than decrease it for other logs,
but it is not clear what the best default is. The buffer size is now
configurable via buffer-size so admins can control it on individual log
basis.

We have attempted to solve more TCP logging problems, but it turns out
that correct solutions would require fixing higher-level logging code,
not specific to TCP logger or Bug 3389 scope. Those unsolved problems
include:

A. During reconfiguration, all logs are closed and reopened, even if
there have been no changes to their configuration that necessitate such
a drastic action (or no changes at all!). For TCP logger, this means
that the old connection is used to flush remaining buffered records (if
any), and the new connection is used to log new records, possibly at the
same time. Nathan wrote clever code that keeps logging going using the
same job/connection. However, we had to yank that code out because it
clashed with higher-level logging state in subtle ways.

B. During shutdown, all connections are put in the closing state before
logs are told to flush remaining records. For TCP logger, this means
that the remaining buffered records (if any) are lost. The correct fix
would probably require rearranging shutdown sequence AND letting
EventLoop run during shutdown (probably among other things).

C. When logger connectivity is lost, Squid does not notice the problem
until the second TCP socket write (or later). This results in lost
records. This is due to TCP-level buffering. I suspect the only cure for
this is adding logger-to-Squid "I got your records" feedback, which
requires changes in the logging protocol (currently there is no
logger-to-Squid communication at all).

HTH,

Alex.

Received on Fri Apr 12 2013 - 16:43:51 MDT

This archive was generated by hypermail 2.2.0 : Tue Apr 16 2013 - 12:00:06 MDT