Re: [PATCH] Better reporting of REQMOD failures during body processing

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Tue, 22 Mar 2011 19:31:25 -0600

On 03/22/2011 06:17 PM, Amos Jeffries wrote:
> On Tue, 22 Mar 2011 13:02:03 -0600, Alex Rousskov wrote:
>> Better reporting of REQMOD failures during body processing.
>>
>> When REQMOD body processing fails, the server side needs to abort the
>> in-progress transaction with the origin server. Use a new custom error
>> detail and HTTP 500 (Internal Server Error) instead of 502 (Bad Gateway)
>> status code for this case.
>>
>> The HTTP 500 (Internal Server Error) is more appropriate because most
>> deployments consider adaptation to be a "part of the proxy service"
>> rather than an "external gateway". And the custom error detail carries
>> more information than a potentially stale and often irrelevant errno
>> global.
>>
>
> IMHO, we need to help dispell that marketing fud. 'service' is a
> technical term and applies to Squid in a specific software-daemon way.
> Trying to apply marketing level definitions on HTTP protocol values is
> plain wrong.

There are many technically accurate meanings for the term "service", and
the proposed commit message did not use it in a marketing sense, IMO.
Consider, for example, the following use of a non-daemon non-marketing
meaning of "service" in RFC 2616:

> ... a proxy that modifies
> the request or response in order to provide some added service to
> the user agent, such as group annotation services, media type
> transformation, protocol reduction, or anonymity filtering.

FWIW, the request to fix this bug came from engineers (rather than
marketing folks) and they do consider Squid+ICAP as a single "service"
or "server" in the context of providing an error message to the user
(who clearly does not care about daemons and other parts of the proxing
architecture).

> "external" is the wrong word anyway. The definition says 502 is "
> The server, while acting as a gateway or proxy, received an invalid
> response from the upstream server it accessed in attempting to
> fulfill the request.
> ".

I prefer "external" to "upstream" because the latter is often
misinterpreted in the reverse proxy environments. Neither is perfect, of
course.

> So, we come down to: is an "ICAP server" a *server*? (yes), "upstream"?
> (um, not for REQMOD), and is it being used to "fulfill the request"? (yes)

I agree. The failure of the "upstream" part of the above suitability
test is critical here, IMO: The error in this case is not about some
upstream server that we have no control over. It is an internal error
that the proxy service should take full responsibility for.

In other words, the user receiving the error should not complain to the
origin server administrator.

I think this is the most common interpretation/understanding in the
field: Internally, ICAP problems are often handled by a different team
or even a different company, but it is the responsibility of the proxy
team to triage the error and point the finger to the ICAP server...

> If you do commit this with 500 please remove/change the last paragraph
> of the comments.
> "most deployments" also use 302 in response to POST while saying they
> are HTTP/1.1.
>
> This would be better:
> "
> The HTTP 500 (Internal Server Error) is more appropriate because it
> is unclear whether 502 applies to HTTP auxiliary servers like ICAP.
>
> The custom error detail carries more information than a potentially
> stale and often irrelevant errno global.
> "

Sure, I will rephrase to emphasize that neither 500 nor 502 fit
perfectly, but 502 feels like a worse fit because ICAP is less of a
"gateway" or "upstream" server than it is an "internally" used auxiliary
service.

> NP: we have the same conundrum when getting a non-502 reply from
> cache_peers when they have done the mangling. They are not the upstream
> either, and apparently the upstream presented them with a valid reply
> (non-502).

We could argue whether cache_peer is an upstream proxy, but I do not
want to drag cache_peers here to avoid enlarging the debate.

>> Earlier Squid versions (e.g., v3.0) were returning HTTP 500 (Internal
>> Server Error) error under similar conditions. Then, after some code
>> improvements, we started handing these errors better, but the new code
>> used ERR_READ_ERROR and the 502 (Bad Gateway) codes for the symmetry
>> with similar error handling code.
>>
>> Unfortunately, preserving that symmetry was wrong in this case because
>> we are not dealing with a server-side reading error here. After an
>> upgrade to v3.2, some users noticed the increased number of 502 (Bad
>> Gateway) errors, which wrongly implied origin server problems rather
>> than ICAP server problems.
>
> +1. Fine, with the documentation change.
>
> Did you have plans for an ERR_DETAIL_* when REQMOD aborts during
> headers?

When REQMOD aborts during headers, the client-side is the one receiving
the notification. See ClientHttpRequest::handleAdaptationFailure, which
will be called with ERR_DETAIL_CLT_REQMOD_ABORT in this case. Note that
we already use HTTP 500 (Internal Server Error) here.

> There will be EARLY and LATE variants of this internally as
> well. Depending on whether the body bytes sent are retry-able.

Yes, we could add something like that to
ClientHttpRequest::handleAdaptationFailure which already tries to detect
bypassable errors. Personally, I do not have plans to do that until it
is needed to triage some new error, but I would welcome such an addition
(which would be a little tricky to implement with the current code
structure).

Thank you,

Alex.
Received on Wed Mar 23 2011 - 01:31:33 MDT

This archive was generated by hypermail 2.2.0 : Wed Mar 23 2011 - 12:00:04 MDT