Re: HTTP client disconnection vs ICAP disconnection

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Mon, 14 Feb 2011 22:19:33 -0700

On 02/09/2011 02:49 PM, John Craws wrote:
> I am including logs of what happens when wget is cancelled.
>
> FD 8 = wget
> FD 9 = origin
> FD 12 = ICAP

John,

    I recall fixing a similar bug not too long ago, but I do not
remember where it happened. If it is not too difficult, please try using
Squid trunk even if the eventual fix needs to be propagated to
your/other Squid version.

> noteInitiatorAborted() is scheduled from the ClientHttpRequest
> destructor for FD 8 (wget). It doesn't seem to be executed however.
> Shouldn't it be?

The call is not delivered because the recipient does not exist any more:

> 2011/02/07 12:32:30.381| AsyncCall.cc(42) make: will not call
> Initiate::noteInitiatorAborted [call49421] because of job gone

This means that the initiatee (one of the ICAP jobs that eventually
creates an ICAP REQMOD transaction and waits for its results) quit some
time ago. This is normal and, if I understand your report correctly, you
do not care about REQMOD. You care about RESPMOD.

Your problem is not with the HTTP client-side <-> ICAP REQMOD link. Your
problem is with the HTTP server-side <-> ICAP RESPMOD link. If HTTP
server job is never destroyed then the ICAP transaction will run until
the ICAP transaction is done or will get stuck waiting for the
server-side to supply or consume response body. More on that below.

I suggest that you run this with debug_options set to ALL,9 and post the
complete log from the time the request starts to 1 minute after the
client is gone. It would be better if there are no other transactions
around. Compress and/or file a bug report as needed. If you do this with
the latest trunk and CC me, I promise to take a look at the log and will
try to help :-). As usual, others may find and fix the bug faster.

And, just to clarify:

On 02/08/2011 05:38 PM, Amos Jeffries wrote:

>> 2. -> AsyncJob::deleteThis("HttpStateData::httpStateConnClosed") is not
>> > called.

> Ambiguous whether this is needed. IIRC at this point doneAll() supposedly
> should return true which would lead to (3) below.

Amos describes the current AsyncJob design correctly. However, Server
and HttpStateData classes were written well before that design too shape
and have never been fully converted. I am sure there are some bugs
floating around there and it is possibly you found one of them.
Specifically, I do not think Server/HttpStateData implement doneAll()
correctly. Instead, they often rely on deleteThis(), quitIfAllDone() and
other conversion hacks.

> The server side seems very twisted and does not appear to implement the
> Jobs API at the leaf HttpStateData level, doing it at the generic
> ServerStateData level instead.

IMHO, the server side is far less twisted than the client side, but it
does require an extra effort to see the whole chain of events when it is
split between generic Server and HTTP-specific HttpStateData (or FTP)
objects. Based on your description, it sounds like your problem is
generic (client disconnect affects all Server kids in similar fashion),
but I may be wrong.

> 1. Squid downloads the response from the origin server pretty fast.
> 2. ** The ICAP service needs the complete original response before the
> modified response may be generated for the client.
> 3. The download of the original response completes and
> HttpStateData::processReplyBody() sees a COMPLETE_NONPERSISTENT_MSG
> (because of persistentConnStatus()).
> 4. serverComplete() is called.
> 5. serverComplete() calls closeServer().
> 6. HttpStateData::closeServer() calls comm_close() on the FD, ** but
> it first removes the close handler using comm_remove_close_handler()
> **.
> 7. comm_close() (_comm_close()) calls commCallCloseHandlers(), but the
> close handler has been removed.

All of the above sounds good. If Server is done talking to the origin
server, it should close the socket normally. There is no need for
emergency/unexpected cleanup provided by the comm_close handler.

You need to figure out why the ICAP transaction waits for something if
the Server thinks it has nothing more to give. Did the Server forget to
notify the ICAP side that there will be no more virgin response body?
This is done by calling stopProducingFor().

BTW, note how we do not talk about the client-side at all in the above.
Based on your description, it appears that the server side is unaware of
the client abort. This is almost normal because the server code is
probably not registered to receive StoreEntry abort notifications. It
will eventually notice the abort when it tries to write to that entry.
There is room for optimization here, but your primary problem is
probably elsewhere (i.e., in the stuck ICAP transaction, waiting for
something from the server-side, if I understand your report correctly).

HTH,

Alex.
Received on Tue Feb 15 2011 - 05:19:51 MST

This archive was generated by hypermail 2.2.0 : Tue Feb 15 2011 - 12:00:04 MST