Re: HTTP client disconnection vs ICAP disconnection from John Craws on 2011-02-15 (squid-dev)

From: John Craws <john.craws_at_gmail.com>
Date: Tue, 15 Feb 2011 13:09:39 -0500

Hi,

I have just reproduced the issue with 3.HEAD-20110215 and created
http://bugs.squid-cache.org/show_bug.cgi?id=3153.

The compressed log file was too large to add as an attachment so I
uploaded it to http://www.box.net/shared/1op53tyl2c.

The end result is the same as with 3.1.9/10/11, but at first glance
there appears to be differences in the steps that lead to it (perhaps
just new log messages). I'm posting this for reference before starting
my own analysis so that could be false.

I thought the following might be of importance:

2011/02/15 11:40:55.672| ModXact.cc(500) readMore: not reading because
ICAP reply pipe is full

Thanks again!

John

On Tue, Feb 15, 2011 at 10:55 AM, John Craws <john.craws_at_gmail.com> wrote:
> Alex,
>
> Thank you so much!
>
> I'll start testing with trunk and gathering the information you requested.
>
> John
>
> On Tue, Feb 15, 2011 at 12:19 AM, Alex Rousskov
> <rousskov_at_measurement-factory.com> wrote:
>> On 02/09/2011 02:49 PM, John Craws wrote:
>>> I am including logs of what happens when wget is cancelled.
>>>
>>> FD 8 = wget
>>> FD 9 = origin
>>> FD 12 = ICAP
>>
>> John,
>>
>> I recall fixing a similar bug not too long ago, but I do not
>> remember where it happened. If it is not too difficult, please try using
>> Squid trunk even if the eventual fix needs to be propagated to
>> your/other Squid version.
>>
>>
>>> noteInitiatorAborted() is scheduled from the ClientHttpRequest
>>> destructor for FD 8 (wget). It doesn't seem to be executed however.
>>> Shouldn't it be?
>>
>> The call is not delivered because the recipient does not exist any more:
>>
>>> 2011/02/07 12:32:30.381| AsyncCall.cc(42) make: will not call
>>> Initiate::noteInitiatorAborted [call49421] because of job gone
>>
>> This means that the initiatee (one of the ICAP jobs that eventually
>> creates an ICAP REQMOD transaction and waits for its results) quit some
>> time ago. This is normal and, if I understand your report correctly, you
>> do not care about REQMOD. You care about RESPMOD.
>>
>>
>> Your problem is not with the HTTP client-side <-> ICAP REQMOD link. Your
>> problem is with the HTTP server-side <-> ICAP RESPMOD link. If HTTP
>> server job is never destroyed then the ICAP transaction will run until
>> the ICAP transaction is done or will get stuck waiting for the
>> server-side to supply or consume response body. More on that below.
>>
>> I suggest that you run this with debug_options set to ALL,9 and post the
>> complete log from the time the request starts to 1 minute after the
>> client is gone. It would be better if there are no other transactions
>> around. Compress and/or file a bug report as needed. If you do this with
>> the latest trunk and CC me, I promise to take a look at the log and will
>> try to help :-). As usual, others may find and fix the bug faster.
>>
>> And, just to clarify:
>>
>> On 02/08/2011 05:38 PM, Amos Jeffries wrote:
>>
>>>> 2. -> AsyncJob::deleteThis("HttpStateData::httpStateConnClosed") is not
>>>> > called.
>>
>>> Ambiguous whether this is needed. IIRC at this point doneAll() supposedly
>>> should return true which would lead to (3) below.
>>
>> Amos describes the current AsyncJob design correctly. However, Server
>> and HttpStateData classes were written well before that design too shape
>> and have never been fully converted. I am sure there are some bugs
>> floating around there and it is possibly you found one of them.
>> Specifically, I do not think Server/HttpStateData implement doneAll()
>> correctly. Instead, they often rely on deleteThis(), quitIfAllDone() and
>> other conversion hacks.
>>
>>
>>> The server side seems very twisted and does not appear to implement the
>>> Jobs API at the leaf HttpStateData level, doing it at the generic
>>> ServerStateData level instead.
>>
>> IMHO, the server side is far less twisted than the client side, but it
>> does require an extra effort to see the whole chain of events when it is
>> split between generic Server and HTTP-specific HttpStateData (or FTP)
>> objects. Based on your description, it sounds like your problem is
>> generic (client disconnect affects all Server kids in similar fashion),
>> but I may be wrong.
>>
>>> 1. Squid downloads the response from the origin server pretty fast.
>>> 2. ** The ICAP service needs the complete original response before the
>>> modified response may be generated for the client.
>>> 3. The download of the original response completes and
>>> HttpStateData::processReplyBody() sees a COMPLETE_NONPERSISTENT_MSG
>>> (because of persistentConnStatus()).
>>> 4. serverComplete() is called.
>>> 5. serverComplete() calls closeServer().
>>> 6. HttpStateData::closeServer() calls comm_close() on the FD, ** but
>>> it first removes the close handler using comm_remove_close_handler()
>>> **.
>>> 7. comm_close() (_comm_close()) calls commCallCloseHandlers(), but the
>>> close handler has been removed.
>>
>> All of the above sounds good. If Server is done talking to the origin
>> server, it should close the socket normally. There is no need for
>> emergency/unexpected cleanup provided by the comm_close handler.
>>
>> You need to figure out why the ICAP transaction waits for something if
>> the Server thinks it has nothing more to give. Did the Server forget to
>> notify the ICAP side that there will be no more virgin response body?
>> This is done by calling stopProducingFor().
>>
>> BTW, note how we do not talk about the client-side at all in the above.
>> Based on your description, it appears that the server side is unaware of
>> the client abort. This is almost normal because the server code is
>> probably not registered to receive StoreEntry abort notifications. It
>> will eventually notice the abort when it tries to write to that entry.
>> There is room for optimization here, but your primary problem is
>> probably elsewhere (i.e., in the stuck ICAP transaction, waiting for
>> something from the server-side, if I understand your report correctly).
>>
>>
>> HTH,
>>
>> Alex.
>>
>>
>>
>>
>>
>>
>>
>
Received on Tue Feb 15 2011 - 18:09:47 MST

This archive was generated by hypermail 2.2.0 : Wed Feb 16 2011 - 12:00:06 MST