Re: squid-smp: synchronization issue & solutions

From: Henrik Nordstrom <henrik@henriknordstrom.net>
Date: Tue, 24 Nov 2009 10:04:30 +0100

Sun 2009-11-22 at 00:12 +1300, Amos Jeffries wrote:

> I think we can open the doors earlier than after that. I'm happy with an
> approach that would see the smaller units of Squid growing in
> parallelism to encompass two full cores.

And I have a more cautious opinion.

Introducing threads into the current Squid core processing is very
non-trivial, due to the relatively high amount of shared data with no
access protection. We already have enough nightmares from data access
synchronization issues in the current non-threaded design, and
synchronizing that access in a threaded design is many orders of
magnitude more complex.
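
To illustrate the class of problem with a toy example (hypothetical
code, nothing to do with the actual Squid sources): two threads
updating a shared value with no locking will silently lose updates.

/* Hypothetical toy, not Squid code: two threads update a shared
   counter (think "entries in the cache index") with no locking. */
#include <pthread.h>
#include <stdio.h>

static long n_objects = 0;              /* shared, unprotected state */

static void *add_entries(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000000; i++)
        n_objects++;                    /* unsynchronized read-modify-write */
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, add_entries, NULL);
    pthread_create(&b, NULL, add_entries, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("%ld\n", n_objects);         /* almost always < 2000000 */
    return 0;
}

Every piece of unprotected shared state in Squid is a potential
instance of this, and unlike the toy, most of it is reached through
long indirect call chains.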

The day the code base is cleaned up to the level where one can
actually assess what data is being accessed where, threads may become
a viable discussion, but as things stand today it is almost impossible
to judge what data will be directly or indirectly accessed by any
larger operation.

Using threads for micro operations will not help us. The overhead
involved in scheduling an operation to a thread is comparable to the
cost of most operations we perform, and once you add the
synchronization needed to shield the data accessed by that operation,
the overhead will in nearly all cases far outweigh the actual
processing time of the micro operation, resulting only in a net loss
of performance. There are some isolated cases I can think of, like SSL
handshake negotiation, where the actual processing may be significant,
but at the general level I don't see many operations which would be
candidates for micro threading.
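
A toy measurement of the kind of overhead I mean (hypothetical code;
the absolute numbers will vary, but the ratio is what matters):
handing a one-instruction operation to another thread through a
mutex/condvar queue costs vastly more than just doing it inline.

/* Hypothetical micro-benchmark, not Squid code.  The "operation" is a
   single increment; the mutex/condvar hand-off dominates completely. */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define N 1000000

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static long pending = 0, completed = 0;
static volatile long counter = 0;

static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mtx);
    while (completed < N) {
        while (pending == 0 && completed < N)
            pthread_cond_wait(&cv, &mtx);
        while (pending > 0) {
            counter++;                  /* the entire "micro operation" */
            pending--;
            completed++;
        }
    }
    pthread_mutex_unlock(&mtx);
    return NULL;
}

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
    double t0 = now();
    for (long i = 0; i < N; i++)
        counter++;                      /* inline: no scheduling, no locks */
    double t1 = now();

    pthread_t w;
    pthread_create(&w, NULL, worker, NULL);
    for (long i = 0; i < N; i++) {
        pthread_mutex_lock(&mtx);       /* "schedule" one micro operation */
        pending++;
        pthread_cond_signal(&cv);
        pthread_mutex_unlock(&mtx);
    }
    pthread_join(w, NULL);
    double t2 = now();

    printf("inline:   %.3fs\nthreaded: %.3fs\n", t1 - t0, t2 - t1);
    return 0;
}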

Using threads for isolated things like disk I/O is one thing. The code
running in those threads is very isolated and limited in what it is
allowed to do (it may only access the data given to it, and may NOT
allocate new data or look up any other global data), but it is still
heavily penalized by synchronization overhead. Further, the only
reason we have the threaded I/O model at all is that POSIX AIO does
not provide a rich enough interface: it is missing open/close
operations, both of which may block for significant amounts of time,
so we had to implement our own alternative that has them. If you look
closely at the threads I/O code you will see that it goes to great
lengths to isolate the threads from the main code, with obvious
performance drawbacks. The initial code went even further in
isolation, but core changes have over time provided a somewhat more
suitable environment for some of those operations.
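
For reference, the shape of that model looks roughly like this
(illustrative code only, not the actual identifiers from the threads
I/O source): the main thread fills in a fully self-contained request,
and the worker may only touch that request and perform the one
blocking syscall. Note that open and close are request types here;
that is exactly what POSIX AIO is missing.

/* Illustrative sketch of the threaded I/O shape; hypothetical names. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

typedef struct io_request {
    enum { IO_OPEN, IO_READ, IO_WRITE, IO_CLOSE } op;
    char path[256];     /* copied in by the main thread */
    int fd;
    char *buf;          /* buffer owned by this request */
    size_t len;
    off_t offset;
    ssize_t result;
    int error;
    int done;           /* set by the worker, collected by the main loop */
} io_request;

/* Worker side: may touch only the request handed to it -- no
   allocation, no global data -- and perform the one blocking call. */
static void io_execute(io_request *r)
{
    switch (r->op) {
    case IO_OPEN:  r->result = open(r->path, O_RDONLY); break;
    case IO_READ:  r->result = pread(r->fd, r->buf, r->len, r->offset); break;
    case IO_WRITE: r->result = pwrite(r->fd, r->buf, r->len, r->offset); break;
    case IO_CLOSE: r->result = close(r->fd); break;
    }
    r->error = r->result < 0 ? errno : 0;
    r->done = 1;
}

int main(void)
{
    /* Queue/wakeup plumbing elided; run one request inline to show the
       shape of the hand-off. */
    io_request r;
    memset(&r, 0, sizeof(r));
    r.op = IO_OPEN;
    strncpy(r.path, "/etc/hosts", sizeof(r.path) - 1);
    io_execute(&r);     /* in the real model this runs on a worker thread */
    printf("open result: %zd (errno %d)\n", r.result, r.error);
    if (r.result >= 0) {
        r.fd = (int)r.result;
        r.op = IO_CLOSE;
        io_execute(&r);
    }
    return 0;
}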

For the same reasons I don't see OpenMP as fitting the problem scope
we have. The strength of OpenMP is parallelizing CPU-intensive regions
of the code that are well defined in what data they access, not
dealing with a large number of concurrent operations touching unknown
amounts of shared data.
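
For contrast, this is the kind of code OpenMP is actually good at (an
illustrative toy, not Squid code): a CPU-bound loop over independent,
well-defined data parallelizes with a single pragma. Squid's request
processing does not look like this.

/* Illustrative toy, not Squid code: a CPU-bound loop over independent,
   well-defined data.  This is where OpenMP shines. */
#include <omp.h>
#include <stdio.h>

#define NBUF 64
#define BUFSZ (1 << 16)

static unsigned char bufs[NBUF][BUFSZ];
static unsigned long sums[NBUF];

int main(void)
{
    #pragma omp parallel for            /* each iteration is independent */
    for (int i = 0; i < NBUF; i++) {
        unsigned long s = 0;
        for (int j = 0; j < BUFSZ; j++)
            s += bufs[i][j];
        sums[i] = s;                    /* each thread writes its own slot */
    }
    printf("summed %d buffers on up to %d threads\n",
           NBUF, omp_get_max_threads());
    return 0;
}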

Trying to thread the Squid core engine is in many ways similar to the
problems kernel developers have had to fight in making OS kernels
multithreaded, except that we don't even have threads of execution yet
(the OS developers at least had processes). Doing the same with the
Squid code would need an approach like the following:

1. Create a big Squid main lock, held at all times except in audited
regions known to use more fine-grained locking (see the sketch after
this list).

2. Set up N threads of execution, all initially fighting for that big
main lock in every operation.

3. Gradually work over the code, identifying areas where the big lock
does not need to be held and transitioning them to more fine-grained
locking, starting at the main loops and working down from there.
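
In code, steps 1 and 2 would look roughly like this (a hypothetical
sketch, not a proposal for actual Squid source):

/* Hypothetical sketch, not actual Squid source. */
#include <pthread.h>

#define NWORKERS 4

static pthread_mutex_t big_squid_lock = PTHREAD_MUTEX_INITIALIZER; /* step 1 */

static void *worker_main(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&big_squid_lock);
    for (;;) {
        /* ... dispatch one event; all shared data is implicitly
           protected because we hold the big lock ... */

        /* Step 3, applied region by region over time: an audited
           section drops the big lock and uses fine-grained locking. */
        pthread_mutex_unlock(&big_squid_lock);
        /* ... audited code with its own local locks runs here,
           concurrently with the other threads ... */
        pthread_mutex_lock(&big_squid_lock);
    }
}

int main(void)
{
    pthread_t w[NWORKERS];              /* step 2: N threads, all fighting
                                           for the same big lock */
    for (int i = 0; i < NWORKERS; i++)
        pthread_create(&w[i], NULL, worker_main, NULL);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(w[i], NULL);
    return 0;
}

Until step 3 has covered most of the hot paths, the N threads mostly
serialize on that one lock, which is why performance suffers first and
improves only much later.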

This is not a path I favor for the Squid code. It's a transition
larger than the Squid-3 transition, and it has even bigger negative
impacts on performance until most of the work has been completed.

Another alternative is to start on Squid-4, rewriting the code base
completely from scratch around a parallel design and then plugging in
whatever pieces can be rescued from earlier Squid generations, if any.
But for obvious staffing reasons this is an approach I do not
recommend for this project. It is effectively starting another
project, sharing very little with the Squid we have today.

For these reasons I am more in favor of multi-process approaches. The
amount of work needed to make Squid multi-process capable is fairly
limited and mainly revolves around the cache index and a couple of
other areas that need to be shared for proper operation. We can fully
parallelize Squid today at the process level by disabling the shared
persistent cache and digest auth, and many users already do this.
Squid-2 can even do it on the same http_port, letting the OS schedule
connections to the available Squid processes.
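
The OS-level scheduling I mention is plain kernel behavior, shown here
in a generic sketch of the mechanism (not Squid's actual listener
code): several processes accepting on one inherited listening socket,
with the kernel spreading connections across them.

/* Generic sketch of the mechanism, not Squid's listener code: one
   listening socket, N worker processes, connections spread by the
   kernel.  Error checking omitted for brevity. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int ls = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { 0 };
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(3128);        /* the usual Squid port */
    bind(ls, (struct sockaddr *)&addr, sizeof(addr));
    listen(ls, 128);

    for (int i = 0; i < 4; i++) {       /* four worker processes */
        if (fork() == 0) {
            for (;;) {
                int fd = accept(ls, NULL, NULL); /* kernel picks a worker */
                /* ... serve the connection; each process keeps its own
                   private cache index ... */
                close(fd);
            }
        }
    }
    while (wait(NULL) > 0)
        ;
    return 0;
}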

Regards
Henrik