Re: Explaining internal errors

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Wed, 30 Jun 2010 17:05:46 -0600

On 06/29/2010 05:31 PM, Amos Jeffries wrote:

>>> With both tags, don't forget the %icap:: / %http:: scoping so we can
> have
>>> multiple servers error on one request (particularly with icap failover
>>> possible when working with a dead http origin).

>> These new log fields are for the master transaction as a whole, I think,
>> and so they should not use a protocol prefix. The logged error codes may
>> be specific to a protocol, but the logging code would not know that.

> Logging knows. The AccessLogEntry is broken down into protocol sub-structs
> and the tag code indicates exactly where the data is sourced. This is as
> much as logging needs to know.
>
> Anyway, you seem to be intending to store a formatted field layout inside
> the one code. That wasn't clear earlier. As long as we can easily
> distinguish which if the many servers a particular error came from.

I do not understand the above because the proposed logformat codes,
%err_code and %err_detail are not specific to any one protocol (just
like the local time or the transaction response time code, for example).
The logged values may be specific to a protocol, but the logformat code
is not.

I hope it will all become clear when I post the patch. Take1 is being
tested in production now.

>>> Being more consistent with use of the ERROR: and FATAL: prefixes to
>>> cache.log (as you noted, many errors are not even logged in the
>>> administrative cache.log!). With the error page details there to link
>>> specific problems to URL and error pageid and reason.

>> I do not think that would solve the problem, even if all our cache.log
>> messages are perfect. Most transaction errors are not and should not be
>> reported to cache.log!

> I disagree. Log level 2 or 3 would be appropriate for all such transaction
> errors IMO, along with the main transaction states. Normally out of sight,
> but configurable to visible.

You are correct, and I should have clarified that I am only talking
about level-1 cache.log. Logging errors at higher debugging levels is
the right thing to do, of course.

Unfortunately, log level 2+ is irrelevant when the errors occur
infrequently (but possibly in bursts) on busy caches. The admins will
not enable tons of extra logging for a day, just to learn what happens
during those unpredictable 10 seconds when Squid behaves strangely and
kills most transactions, logging NONE/500 (for example).

Also, while transaction logs are optimized for automated processing and
are already routinely processed for statistics and such, adding formal
processing of cache.log immediately adds a lot of new complexity.

In general, the primary place to log transaction-specific facts is the
transaction log.

> This is regardless and on top of any other information anywhere more
> useful. The debug/administrative log is a very appropriate place for this
> kind of thing.

It is indeed appropriate for those cases where you can run with elevated
debugging levels and can manually search for error explanations. For
high-volume production environment, cache.log runs at level 1 and should
be reserved for very important messages only. We are not there yet
(cache.log is too noisy in many environments), but that should be the goal.

Thank you,

Alex.
Received on Wed Jun 30 2010 - 23:06:36 MDT

This archive was generated by hypermail 2.2.0 : Thu Jul 01 2010 - 12:00:08 MDT