Re: Explaining internal errors

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Wed, 30 Jun 2010 23:56:40 +0000

On Wed, 30 Jun 2010 17:39:07 -0600, Alex Rousskov
<rousskov_at_measurement-factory.com> wrote:
> On 06/30/2010 05:17 PM, Amos Jeffries wrote:
>
>> Okay; just to clarify what we are talking about here:
>
>> %err_code - name of ERR_* page the user was shown (if transaction was
>> non-recoverable).
>
> Yes, pretty much. It is the ERR_* page that Squid decided to serve to
> the user. Whether the user was shown anything, we do not know. In the
> extreme case, the user may have disconnected before we could reply.
>
>
>> %err_detail - series of N error codes which occurred processing the
>> transaction.
>
> Just one [first] "error code", for now. Reporting more than one may be a
> good idea for future work, but we are usually mostly interested in what
> caused the error and not what happened during error handling.

With some clients it is still fairly common for me to see a single
transaction hit all of these important errors at once:

 - url-rewriter returns invalid URL-path (client control panel configured
badly)
     NP: only the manually-entered path is broken, host is correct.
 - DNS AAAA lookup fails (no servers responding)
 - expected sibling POP shard fails connect attempts
 - DNS A succeeds but one IP is not responding
 - DIRECT connect works, but 404 on the bad URL.

As you can see, with a simple transaction there are already 5 errors.

The client can see and correct their part of the config. But after that we
are still left with a potentially slow site as the other two internal bits
are not working well. If only the first error is reported, finding the rest
one fix at a time would be slow, to say the least, when 24hr DNS is
involved.
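
To make the difference concrete, here is a rough Python sketch of the two
reporting styles. It is only an illustration, not Squid code: the
Transaction class, the stage names and the detail strings are all made up
for this example, and the "stage:detail" list format is an assumption
rather than anything the patch defines.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    """Hypothetical per-transaction record; not a Squid data structure."""
    url: str
    # (stage, detail) pairs in the order the errors occurred.
    errors: list = field(default_factory=list)

    def note_error(self, stage, detail):
        """Remember one error without discarding earlier ones."""
        self.errors.append((stage, detail))

    def first_detail(self):
        """Single-value reporting: only the first error, or '-' if none."""
        return self.errors[0][1] if self.errors else "-"

    def all_details(self):
        """Multi-value reporting: every error, comma separated, or '-'."""
        return ",".join(f"{s}:{d}" for s, d in self.errors) or "-"


# The five errors from the list above, with invented detail strings.
tx = Transaction("http://example.com/bad/path")
tx.note_error("url-rewriter", "invalid-url-path")
tx.note_error("dns-aaaa", "no-servers-responding")
tx.note_error("sibling", "connect-failed")
tx.note_error("dns-a", "one-ip-not-responding")
tx.note_error("direct", "http-404")

print(tx.first_detail())
print(tx.all_details())
```

With first-error reporting only the url-rewriter problem would show up in
the log; the DNS and sibling failures stay hidden until that one is fixed.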

>
>> By the above, I mean that every protocol involved may produce its own
>> error codes. I was worried that you only had one being logged (such as
>> errno).
>
> For now, I am logging what caused ERR_* page generation. xerrno is a
> good example. FTP response code is another example. Not all cases are
> currently covered or covered well.
>
> The current patch is attached, FYI. It is not ready for merging yet.
>
> Here is an access.log sample from a lab test, with
> logformat xsquid %Ss/%03Hs %err_code/%err_detail %<Hs
>
> NONE/500 ERR_ICAP_FAILURE/100001 -
> NONE_ABORTED/500 ERR_ICAP_FAILURE/100001 -
> TCP_MISS/200 -/- 200
> TCP_MISS/200 ERR_ICAP_FAILURE/100004 200
> TCP_MISS/500 ERR_ICAP_FAILURE/100003 200
> TCP_MISS/500 ERR_ICAP_FAILURE/- 200
> TCP_MISS_ABORTED/000 -/- -
> TCP_MISS_ABORTED/000 -/- 200
> TCP_MISS_ABORTED/500 ERR_ICAP_FAILURE/100003 200

Nice. It seems to be catching the 500s. What about the 000s (random
disconnect) and 600s (parsing)?
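
For anyone wanting to check this sort of thing against their own logs, a
small Python sketch along these lines should do. It assumes exactly the
three-column xsquid logformat quoted above; the regular expression and the
field names (squid_status, http_status and so on) are my own invention for
the example and are not part of Squid or the patch.

```python
import re
from collections import Counter

# One line of: %Ss/%03Hs %err_code/%err_detail %<Hs
LINE_RE = re.compile(
    r"^(?P<squid_status>[A-Z_]+)/(?P<http_status>\d{3})\s+"
    r"(?P<err_code>[^/\s]+)/(?P<err_detail>\S+)\s+"
    r"(?P<server_status>\S+)$"
)

# The sample lines from the lab test quoted above.
SAMPLE = """\
NONE/500 ERR_ICAP_FAILURE/100001 -
NONE_ABORTED/500 ERR_ICAP_FAILURE/100001 -
TCP_MISS/200 -/- 200
TCP_MISS/200 ERR_ICAP_FAILURE/100004 200
TCP_MISS/500 ERR_ICAP_FAILURE/100003 200
TCP_MISS/500 ERR_ICAP_FAILURE/- 200
TCP_MISS_ABORTED/000 -/- -
TCP_MISS_ABORTED/000 -/- 200
TCP_MISS_ABORTED/500 ERR_ICAP_FAILURE/100003 200
"""


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    return m.groupdict() if m else None


records = [r for r in map(parse_line, SAMPLE.splitlines()) if r]

# How many transactions ended with each client-side HTTP status.
by_status = Counter(r["http_status"] for r in records)

# Error page chosen but no detail recorded.
undetailed = [r for r in records
              if r["err_code"] != "-" and r["err_detail"] == "-"]

# 000 status with no error code at all.
silent_000 = [r for r in records
              if r["http_status"] == "000" and r["err_code"] == "-"]

print(by_status)
print(len(undetailed), len(silent_000))
```

On the sample above this counts five 500s, one of which has an error code
but no detail, and two 000s with neither code nor detail.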

Amos
Received on Wed Jun 30 2010 - 23:56:44 MDT
