Re: pconn.cc assert index >= 0, async call queue madness

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Tue, 08 Apr 2008 11:39:57 +0200

mån 2008-04-07 klockan 23:41 -0600 skrev Alex Rousskov:
> Can you post cache.log lines with full debugging enabled? I would be
> happy to help with the queue review, but it would speed things up if I
> do not have to go through the complicated steps required to reproduce
> the problem.

The cache.log is in ~hno/tmp/async-queue.log on squid-cache.org if you
want it.

But I think it's easier to discuss the problem on a meta level. What we
need is more of a discussion on programming & design style than the
specific case

- There is two events that may happen, 'A' and 'B'.
- In the processing of one interest of the other is deregistered with
the silent assumption that after that it won't get invoked, but the
object it's related to still exists.
- Both events fire at the same time and gets queued in the async call
queue.
- 'A' executes. Deregisters interest in 'B' and assumes it won't get
called, but it's already in the async call queue.
- 'B' executes unexpectedly..

There is three ways to deal with this

a) make sure the event interest is registered with a cbdata that gets
invalidated when the interest is deregistered.

b) shield every such action so that if fails gracefully if it gets
unexpectedly called due to race conditions with other actions on the
same object.

c) Make the async call queue always drain between each event processed
and hope that's sufficient to avoid races (it's not 100%)

'a' is a lot easier to audit as you only need to look at one type of
event at a time to judge if it's safe or not and is why I push for that
model.

'b' is maybe a little more efficient as it needs less cbdata type
objects. And also a quicker fix when running into this kind of problems,
but in many cases it's very hard to identify all possible races..

'c' would also work I think, but not sure how to fit that in the current
event engine mess with multiple paths that may each fire multiple events
and only very loose coupling between them...

The question is primarily which of the three should we use as the
default model.

Regards
Henrik
Received on Tue Apr 08 2008 - 03:41:46 MDT

This archive was generated by hypermail 2.2.0 : Wed Apr 30 2008 - 12:00:07 MDT