Re: pconn.cc assert index >= 0, async call queue madness

From: Henrik Nordstrom <henrik@dont-contact.us>
Date: Tue, 08 Apr 2008 00:46:11 +0200

On Tue 2008-04-08 at 01:15 +0300, Tsantilas Christos wrote:
> It looks possible that it is an AsyncCalls bug, but it must not happen. I
> cannot understand it...

It is. I just made a 100% reproducible test case for it using gdb.

1. Start a new request and identify its outgoing file descriptor

   squidclient http://www.example.com/a

   squidclient mgr:filedescriptors

   remember the file descriptor as X

2. Start a second request to the same server.

   squidclient http://www.example.com/b

3. Have the server respond to both, keeping the connections open.

4. Break into Squid with gdb and set the timeout of the connection from step 1 (X) to "now"

   p fd_table[X].timeout = squid_curtime

5. Have the server close the connection from step 1 (X), but keep the other one open.

6. Now tell gdb to let Squid continue

   continue

7. Watch both events trigger, each get an AsyncCall queued, and the two
calls smash each other, even though the pconn code tries to do the right
thing and deregister one handler when the other one triggers...
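To make the ordering problem concrete, here is a minimal C++ model of the steps above. It is a sketch under stated assumptions, not Squid code: AsyncQueue and simulateRace are hypothetical names, the idle pconn list is reduced to a std::vector of file descriptors, and the two queued calls stand in for the timeout and read/close handlers. What it shows is that both calls are queued before either one runs, so deregistering one handler from inside the other comes too late.

```cpp
// Hypothetical model of the race: names are illustrative, not Squid's classes.
#include <algorithm>
#include <cstdio>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// FIFO of deferred calls, a stand-in for the AsyncCall queue: handlers are
// scheduled during event dispatch and only run later.
class AsyncQueue {
public:
    void schedule(std::function<void()> call) { calls_.push(std::move(call)); }
    void drain() {
        while (!calls_.empty()) {
            std::function<void()> call = std::move(calls_.front());
            calls_.pop();
            call();
        }
    }
private:
    std::queue<std::function<void()>> calls_;
};

// Models one event-loop pass in which both the timeout on X and the
// server-side close of X fire. Returns how many queued handlers found X
// already gone from the idle list, i.e. where assert(index >= 0) would fail.
int simulateRace(int fdX, int fdOther) {
    std::vector<int> idle = {fdX, fdOther};  // idle pconns to one host
    AsyncQueue queue;
    int wouldAssert = 0;

    auto removeX = [&idle, &wouldAssert, fdX](const char *why) {
        auto it = std::find(idle.begin(), idle.end(), fdX);
        if (it == idle.end()) {
            ++wouldAssert;  // the lookup index would be -1 here
            std::fprintf(stderr, "%s handler: fd %d not in idle list\n", why, fdX);
            return;
        }
        idle.erase(it);
        // Even if this handler deregistered the other one here, that
        // call is already sitting in the queue.
    };

    // Both events are dispatched before either queued call runs.
    queue.schedule([&removeX] { removeX("timeout"); });
    queue.schedule([&removeX] { removeX("close"); });
    queue.drain();
    return wouldAssert;
}
```

In this model simulateRace(X, Y) returns 1: the second handler to run finds X missing, which is the index < 0 case the pconn.cc assert trips on.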

The same race should also be possible in the way I described earlier,
but it is harder to trigger that way.

> Moreover, looking at the largeresp patch, I cannot find any relation to
> this bug...

There is none. It's completely unrelated.

The workaround is simple, verified, and committed: make the assert a soft
error. It's harmless. But I seriously think we need to revisit the async
call queue to see how this kind of situation is meant to be dealt with.
It's not the first bug of this kind caused by the queue.
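A minimal sketch of what "make the assert a soft error" can look like, assuming the idle list is a plain vector of file descriptors. removeIdleFd() is a hypothetical helper for illustration; the committed workaround may differ in detail.

```cpp
// Hypothetical sketch: a hard assert turned into a soft error.
#include <algorithm>
#include <cstdio>
#include <vector>

// Returns true if fd was removed, false if it was already gone.
bool removeIdleFd(std::vector<int> &idle, int fd) {
    auto it = std::find(idle.begin(), idle.end(), fd);
    if (it == idle.end()) {
        // Before: assert(index >= 0), which aborts the process.
        // After: report it and carry on; the connection is already gone.
        std::fprintf(stderr, "removeIdleFd: fd %d not found, ignoring\n", fd);
        return false;
    }
    idle.erase(it);
    return true;
}
```

Calling it twice for the same fd, as the two queued handlers do, now logs once and returns false instead of killing Squid.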

My immediate reaction is that cbdata should be used properly, with one
cbdata per state machine, i.e. in this case one cbdata per idle pconn
and not just one per pconn host.
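One way to read the cbdata suggestion, sketched in C++ with std::weak_ptr standing in for cbdata (an assumption; Squid's cbdata is its own reference-validity mechanism). Each idle connection gets its own state object, queued calls hold a weak reference to that object, and whichever call runs second finds the reference expired and does nothing. IdleConn, IdleList, and simulateWithPerConnToken are hypothetical names.

```cpp
// Hypothetical sketch: one validity token per idle connection, not per host.
#include <algorithm>
#include <cstddef>
#include <functional>
#include <memory>
#include <queue>
#include <utility>
#include <vector>

// Per-idle-connection state (stand-in for a per-idle-pconn cbdata).
struct IdleConn {
    int fd;
};

class IdleList {
public:
    // Returns a weak reference for handlers to hold while queued.
    std::weak_ptr<IdleConn> add(int fd) {
        conns_.push_back(std::make_shared<IdleConn>(IdleConn{fd}));
        return conns_.back();
    }
    // Drops the list's owning reference to this connection's state.
    void remove(const std::shared_ptr<IdleConn> &conn) {
        conns_.erase(std::remove(conns_.begin(), conns_.end(), conn), conns_.end());
    }
    std::size_t size() const { return conns_.size(); }
private:
    std::vector<std::shared_ptr<IdleConn>> conns_;
};

// Queues the timeout and close handlers for X, then runs them.
// Returns how many handlers actually did work.
int simulateWithPerConnToken(int fdX, int fdOther) {
    IdleList idle;
    std::weak_ptr<IdleConn> token = idle.add(fdX);
    idle.add(fdOther);

    int handled = 0;
    auto handler = [&idle, &handled](const std::weak_ptr<IdleConn> &weak) {
        std::shared_ptr<IdleConn> conn = weak.lock();
        if (!conn)
            return;  // state already gone: the stale call is a harmless no-op
        idle.remove(conn);
        ++handled;
        // conn is released on return, destroying the IdleConn and expiring token.
    };

    std::queue<std::function<void()>> calls;
    calls.push([handler, token] { handler(token); });  // timeout
    calls.push([handler, token] { handler(token); });  // read/close
    while (!calls.empty()) {
        std::function<void()> call = std::move(calls.front());
        calls.pop();
        call();
    }
    return handled;
}
```

Here the second queued call sees an expired reference for that one connection and returns quietly, so no assert or soft error is needed, while the other idle connection to the same host is untouched.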

Regards
Henrik
Received on Mon Apr 07 2008 - 16:47:54 MDT
