Re: Explaining internal errors

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Wed, 30 Jun 2010 22:52:10 -0600

On 06/30/2010 05:56 PM, Amos Jeffries wrote:
>>> %err_code - name of ERR_* page the user was shown (if transaction was
>>> non-recoverable).

>> Yes, pretty much. It is the ERR_* page that Squid decided to serve to
>> the user. Whether the user was shown anything, we do not know. In the
>> extreme case, the user may have disconnected before we could reply.

>>> %err_detail - series of N error codes which occurred processing the
>>> transaction.

>> Just one [first] "error code", for now. Reporting more than one may be a
>> good idea for future work, but we are usually mostly interested in what
>> caused the error and not what happened during error handling.

> It is still a fairly common transaction sequence for me with some clients
> to get all of these important errors happening at once:
>
> - url-rewriter returns invalid URL-path (client control panel configured
> badly)
> NP: only the manually-entered path is broken, host is correct.
> - DNS AAAA lookup fails (no servers responding)
> - expected sibling POP shard fails connect attempts
> - DNS A succeeds but one IP is not responding
> - DIRECT connect works, but 404 on the bad URL.
>
> As you can see, with a simple transaction there are already 5 errors.

Sure, but I can still solve them one by one. Not perfect, but it should
work.

> The client can see and correct their part of the config. But after that we
> are still left with a potentially slow site as the other two internal bits
> are not working well. With only the first error reported, the iteration
> process would be slow, to say the least, when 24hr DNS is involved.

I [still] agree that supporting a list of errors may be useful :-).

>> Here is an access.log sample from a lab test, with
>> logformat xsquid %Ss/%03Hs %err_code/%err_detail %<Hs
>>
>> NONE/500 ERR_ICAP_FAILURE/100001 -
>> NONE_ABORTED/500 ERR_ICAP_FAILURE/100001 -
>> TCP_MISS/200 -/- 200
>> TCP_MISS/200 ERR_ICAP_FAILURE/100004 200
>> TCP_MISS/500 ERR_ICAP_FAILURE/100003 200
>> TCP_MISS/500 ERR_ICAP_FAILURE/- 200
>> TCP_MISS_ABORTED/000 -/- -
>> TCP_MISS_ABORTED/000 -/- 200
>> TCP_MISS_ABORTED/500 ERR_ICAP_FAILURE/100003 200
>
> Nice. It seems to be catching the 500's. What about the 000's (random
> disconnect) and 600s (parsing)?

I did not work on those, but it is easy to add more ERR_DETAIL codes. I
think they can be added "as needed" unless somebody volunteers to do a
comprehensive implementation.

Some cases may not need more details. For example, TCP_MISS_ABORTED/000
without an error probably means the client went away before we could get
and/or serve the response.
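
To make that concrete, here is a rough sketch of how the sample lines quoted above could be split and tallied. It assumes the three-field xsquid format from the sample and nothing else; Entry, parse_line, and the rest are hypothetical names made up for illustration, not anything Squid ships.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    status: str                   # %Ss, e.g. TCP_MISS_ABORTED
    sent_code: str                # %03Hs, status sent to the client
    err_code: Optional[str]       # %err_code, None when logged as "-"
    err_detail: Optional[str]     # %err_detail, None when logged as "-"
    received_code: Optional[str]  # %<Hs, None when logged as "-"


def dash_to_none(value: str) -> Optional[str]:
    """Squid logs "-" for a field that has no value."""
    return None if value == "-" else value


def parse_line(line: str) -> Entry:
    """Split one line written as: %Ss/%03Hs %err_code/%err_detail %<Hs"""
    first, second, third = line.split()
    status, sent_code = first.split("/")
    err_code, err_detail = second.split("/")
    return Entry(status, sent_code, dash_to_none(err_code),
                 dash_to_none(err_detail), dash_to_none(third))


# The lab-test sample from the quoted message.
SAMPLE = """\
NONE/500 ERR_ICAP_FAILURE/100001 -
NONE_ABORTED/500 ERR_ICAP_FAILURE/100001 -
TCP_MISS/200 -/- 200
TCP_MISS/200 ERR_ICAP_FAILURE/100004 200
TCP_MISS/500 ERR_ICAP_FAILURE/100003 200
TCP_MISS/500 ERR_ICAP_FAILURE/- 200
TCP_MISS_ABORTED/000 -/- -
TCP_MISS_ABORTED/000 -/- 200
TCP_MISS_ABORTED/500 ERR_ICAP_FAILURE/100003 200
"""

entries = [parse_line(line) for line in SAMPLE.splitlines() if line.strip()]

# How often each (error page, detail) pair was logged.
by_detail = Counter((e.err_code, e.err_detail) for e in entries if e.err_code)

# Non-200 replies that carry no error information at all.
unexplained = [e for e in entries if e.err_code is None and e.sent_code != "200"]

for (code, detail), count in sorted(by_detail.items(), key=str):
    print(count, code, detail)
print("no error logged:", [e.status + "/" + e.sent_code for e in unexplained])
```

In this sample the only entries without any error information are the two TCP_MISS_ABORTED/000 lines, which fits the "client went away" reading.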

Cheers,

Alex.
Received on Thu Jul 01 2010 - 04:53:02 MDT
