Re: HTTP client disconnection vs ICAP disconnection

From: John Craws <john.craws_at_gmail.com>
Date: Tue, 15 Feb 2011 13:46:39 -0500

On Tue, Feb 15, 2011 at 12:19 AM, Alex Rousskov
<rousskov_at_measurement-factory.com> wrote:
> On 02/09/2011 02:49 PM, John Craws wrote:
>> I am including logs of what happens when wget is cancelled.
>>
>> FD 8 = wget
>> FD 9 = origin
>> FD 12 = ICAP
>
> John,
>
>    I recall fixing a similar bug not too long ago, but I do not
> remember where it happened. If it is not too difficult, please try using
> Squid trunk even if the eventual fix needs to be propagated to
> your/other Squid version.

Hi Alex,

I did reproduce the issue with 3.HEAD-20110215. I will continue my
analysis with that version.

>
>
>> noteInitiatorAborted() is scheduled from the ClientHttpRequest
>> destructor for FD 8 (wget). It doesn't seem to be executed, however.
>> Shouldn't it be?
>
> The call is not delivered because the recipient does not exist any more:
>
>> 2011/02/07 12:32:30.381| AsyncCall.cc(42) make: will not call
>> Initiate::noteInitiatorAborted [call49421] because of job gone
>
> This means that the initiatee (one of the ICAP jobs that eventually
> creates an ICAP REQMOD transaction and waits for its results) quit some
> time ago. This is normal and, if I understand your report correctly, you
> do not care about REQMOD. You care about RESPMOD.

Got that. You are correct; I care about RESPMOD.
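
For my own understanding, here is a minimal sketch of the delivery
check implied by that log line (simplified by me; not the actual
AsyncCall.cc code): a scheduled call fires only if the recipient job
still exists.

    // Hedged sketch of AsyncCall delivery, per the debug line above.
    void AsyncCall::make()
    {
        if (canFire()) {
            fire(); // deliver, e.g., Initiate::noteInitiatorAborted()
        } else {
            // recipient job quit already; Squid logs this as
            // "will not call ... because of job gone"
        }
    }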

>
>
> Your problem is not with the HTTP client-side <-> ICAP REQMOD link. Your
> problem is with the HTTP server-side <-> ICAP RESPMOD link. If the HTTP
> server job is never destroyed, the ICAP transaction will either run
> until it is done or get stuck waiting for the server side to supply or
> consume the response body. More on that below.

Ok.

Some background: I initially traced some of the steps using 3.1.X and saw that

1. The socket to the origin server was closed (with no close handler
registered) after the data had been downloaded and before ICAP was
complete. I wondered why the close handler (which would have ended up
calling cleanAdaptation() via swanSong()) had been removed, but a post
from Amos made me realize that this is normal, and why.

2. When the client socket is cancelled, the next read on it detects
the disconnect and the socket is closed. Traces showed that
FwdState::abort() gets called. The method did not close server_fd via
comm_close() because it had already been closed normally (server_fd <
0). At first I thought that this was the problem, as I could not see
right away how the adaptation cleanup would occur otherwise (again,
because the close handler was never called); see the sketch below.
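
Here is roughly the logic I saw in point 2, as a hedged sketch (not
the exact 3.1 source; the callback signature is my assumption):

    // abort() only closes the server socket if it is still open; a
    // socket already closed normally takes no action here, so no close
    // handler and no adaptation cleanup runs on this path.
    void FwdState::abort(void *d)
    {
        FwdState *fwd = static_cast<FwdState *>(d);
        if (fwd->server_fd >= 0)
            comm_close(fwd->server_fd); // would fire remaining close handlers
        // else: server_fd < 0, already closed after the download completed
    }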

>
> I suggest that you run this with debug_options set to ALL,9 and post the
> complete log from the time the request starts to 1 minute after the
> client is gone. It would be better if there are no other transactions
> around. Compress and/or file a bug report as needed. If you do this with
> the latest trunk and CC me, I promise to take a look at the log and will
> try to help :-). As usual, others may find and fix the bug faster.
>

Thank you! I have just completed this as described in my previous post.
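
For completeness, the squid.conf line I used is exactly the directive
suggested above:

    # maximum verbosity for all debug sections while reproducing the issue
    debug_options ALL,9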

> And, just to clarify:
>
> On 02/08/2011 05:38 PM, Amos Jeffries wrote:
>
>>> 2. -> AsyncJob::deleteThis("HttpStateData::httpStateConnClosed") is not
>>> called.
>
>> It is ambiguous whether this is needed. IIRC, at this point doneAll() is
>> supposed to return true, which would lead to (3) below.
>
> Amos describes the current AsyncJob design correctly. However, the Server
> and HttpStateData classes were written well before that design took shape
> and have never been fully converted. I am sure there are some bugs
> floating around there, and it is possible that you found one of them.
> Specifically, I do not think Server/HttpStateData implement doneAll()
> correctly. Instead, they often rely on deleteThis(), quitIfAllDone() and
> other conversion hacks.
>
>
>> The server side seems very twisted and does not appear to implement the
>> Jobs API at the leaf HttpStateData level, doing it at the generic
>> ServerStateData level instead.
>
> IMHO, the server side is far less twisted than the client side, but it
> does require extra effort to see the whole chain of events when it is
> split between generic Server and HTTP-specific HttpStateData (or FTP)
> objects. Based on your description, it sounds like your problem is
> generic (a client disconnect affects all Server kids in a similar fashion),
> but I may be wrong.

Got that!
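
To check my understanding of the Jobs API contract being described,
here is my simplified sketch (not the actual AsyncJob code): a fully
converted job checks doneAll() after each async call and cleans up
after itself, instead of relying on deleteThis()-style hacks.

    // Hedged sketch of the AsyncJob lifecycle Alex describes.
    void AsyncJob::callEnd()
    {
        if (done()) {    // roughly: stop requested or doneAll() is true
            swanSong();  // last-gasp cleanup (e.g., cleanAdaptation())
            delete this; // the job ends itself
        }
    }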

>
>> 1. Squid downloads the response from the origin server pretty fast.
>> 2. ** The ICAP service needs the complete original response before the
>> modified response may be generated for the client.
>> 3. The download of the original response completes and
>> HttpStateData::processReplyBody() sees a COMPLETE_NONPERSISTENT_MSG
>> (because of persistentConnStatus()).
>> 4. serverComplete() is called.
>> 5. serverComplete() calls closeServer().
>> 6. HttpStateData::closeServer() calls comm_close() on the FD, ** but
>> it first removes the close handler using comm_remove_close_handler()
>> **.
>> 7. comm_close() (_comm_close()) calls commCallCloseHandlers(), but the
>> close handler has been removed.
>
> All of the above sounds good. If Server is done talking to the origin
> server, it should close the socket normally. There is no need for
> emergency/unexpected cleanup provided by the comm_close handler.

I now see that this is pretty obvious.
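
In code terms, my reading of steps 6 and 7 is roughly this (a hedged
sketch; the member names are my assumption):

    // A planned, expected close: remove the emergency close handler
    // first, so commCallCloseHandlers() finds nothing to call.
    void HttpStateData::closeServer()
    {
        if (fd >= 0) {
            comm_remove_close_handler(fd, closeHandler); // step 6
            closeHandler = NULL;
            comm_close(fd); // step 7: no handler left to fire
            fd = -1;
        }
    }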

>
> You need to figure out why the ICAP transaction waits for something if
> the Server thinks it has nothing more to give. Did the Server forget to
> notify the ICAP side that there will be no more virgin response body?
> This is done by calling stopProducingFor().
>
> BTW, note how we do not talk about the client-side at all in the above.
> Based on your description, it appears that the server side is unaware of
> the client abort. This is almost normal because the server code is
> probably not registered to receive StoreEntry abort notifications. It
> will eventually notice the abort when it tries to write to that entry.
> There is room for optimization here, but your primary problem is
> probably elsewhere (i.e., in the stuck ICAP transaction, waiting for
> something from the server-side, if I understand your report correctly).
>

Thank you, Alex. I will start with that :)
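
For reference, my understanding of that notification as a hedged
sketch (the virginBodyDestination name is my recollection of the
server-side code; the surrounding context is my assumption):

    // When the server side has no more virgin body to produce, it must
    // tell the ICAP side; otherwise the RESPMOD transaction keeps
    // waiting for body data that will never arrive.
    if (virginBodyDestination != NULL) {
        stopProducingFor(virginBodyDestination, true); // true: at EOF
        virginBodyDestination = NULL;
    }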

John

>
> HTH,
>
> Alex.