why threads suck

From: Jonathan S. Kay <jkay@dont-contact.us>
Date: Thu, 29 Apr 1999 03:36:10 -0500

> - no big select() loop, non-blocking I/O and complicated commSetSelect
> calls with read and write handlers anymore; it comes with a lot of
> overhead; ...

The big select() loop is the best of some bad choices.

I used to work for Isis Distributed Systems, a vendor formed out of
the Cornell Isis Project, that sold one of the first group
communication systems.

The primary product was a programming toolkit that implemented a suite
of group communication protocols. User-level threads were integral
to its workings; the initial API was designed around them.
Shared-data conflicts were resolved by traditional locking
algorithms.

After about a year working there, I began to circulate proposals to
reduce Isis' exposure to threads. Basically, there were two problems.

The most obvious problem was performance. Threads do switch more
rapidly than user processes on all platforms observed, but that does
not necessarily mean much. Threads still have actual, serious costs.
Isis spent a disconcerting percentage of its time in thread
operations. Part of the problem lies with the pthread definitions,
whose absurdly strong signal-masking semantics require kernel calls
to implement.
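
To put a rough number on that claim, here is a minimal sketch (not
Isis code; the iteration count and timing method are my own choices)
that times pthread_sigmask() in a tight loop. On platforms where each
call has to trap into the kernel, the per-call cost shows up directly:

    /* sigmask_bench.c - time pthread_sigmask() in a loop.
     * Build with something like: cc -O2 sigmask_bench.c -lpthread
     * (older systems may also want -lrt for clock_gettime). */
    #include <pthread.h>
    #include <signal.h>
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        sigset_t set, old;
        struct timespec t0, t1;
        long i, iters = 1000000;
        double ns;

        sigemptyset(&set);
        sigaddset(&set, SIGUSR1);

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < iters; i++) {
            pthread_sigmask(SIG_BLOCK, &set, &old);    /* mask SIGUSR1 */
            pthread_sigmask(SIG_SETMASK, &old, NULL);  /* restore old mask */
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        ns = ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec))
             / (2.0 * iters);
        printf("pthread_sigmask: roughly %.0f ns per call\n", ns);
        return 0;
    }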

The more subtle, but far worse, problem was that locks spread. Any
time a thread can block, any data that that thread is accessing must
either be saved and reaccessed, or locked down entirely. Locking it
is always easier, and is a grave temptation. The alternative is code
that saves everything before the blocking point and rereads everything
after the blocking point - bulky and inelegant at best. But if you
use a lock, notice what you've done - a lock means that there is a new
blocking point, in other pieces of code using the newly shared data
structure. This process recurses. A piece of threaded code with
even minor locking will, faster than you believe, become a piece of
code with heavy locking.
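
A minimal sketch of the trade-off, using a made-up cache_entry
structure (this is not Isis or Squid code): hold the lock across the
blocking read() and the lock itself becomes a new blocking point for
every other user of the structure; the save-and-reacquire alternative
keeps the critical section short, but it is bulkier:

    #include <pthread.h>
    #include <string.h>
    #include <unistd.h>

    struct cache_entry {
        pthread_mutex_t lock;
        char data[4096];
        size_t len;
    };

    static struct cache_entry entry = { PTHREAD_MUTEX_INITIALIZER, "", 0 };

    /* The tempting way: hold the lock across the blocking read().  Now
     * that read() is a blocking point for every other thread touching
     * 'entry', so those call sites grow locks of their own, and so on. */
    static void *fill_entry_locked(void *arg)
    {
        int fd = *(int *)arg;
        ssize_t n;

        pthread_mutex_lock(&entry.lock);
        n = read(fd, entry.data, sizeof(entry.data)); /* blocks, lock held */
        entry.len = n > 0 ? (size_t)n : 0;
        pthread_mutex_unlock(&entry.lock);
        return NULL;
    }

    /* The bulky alternative: block with nothing locked, then reacquire
     * and copy in; the critical section never blocks. */
    static void *fill_entry_copy(void *arg)
    {
        int fd = *(int *)arg;
        char buf[4096];
        ssize_t n = read(fd, buf, sizeof(buf));       /* block holding no locks */

        if (n > 0) {
            pthread_mutex_lock(&entry.lock);
            memcpy(entry.data, buf, (size_t)n);
            entry.len = (size_t)n;
            pthread_mutex_unlock(&entry.lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        int fd = STDIN_FILENO;

        pthread_create(&t, NULL, fill_entry_locked, &fd);
        pthread_join(t, NULL);
        pthread_create(&t, NULL, fill_entry_copy, &fd);
        pthread_join(t, NULL);
        return 0;
    }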

And the more locks you add, the more prone you are to deadlocks and
race conditions. These kinds of bugs are only 2-4 times harder to
debug than memory problems. So listen to Nancy. JUST DON'T DO IT!
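
For what it's worth, the classic failure takes only a few lines to
reproduce. This is a made-up illustration, not code from any of the
systems above: each thread grabs the two locks in the opposite order,
and with a little unlucky timing both wait forever:

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

    static void *take_a_then_b(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&lock_a);
        sleep(1);                     /* widen the race window */
        pthread_mutex_lock(&lock_b);  /* waits forever: the peer holds B */
        pthread_mutex_unlock(&lock_b);
        pthread_mutex_unlock(&lock_a);
        return NULL;
    }

    static void *take_b_then_a(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&lock_b);
        sleep(1);
        pthread_mutex_lock(&lock_a);  /* waits forever: the peer holds A */
        pthread_mutex_unlock(&lock_a);
        pthread_mutex_unlock(&lock_b);
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;

        pthread_create(&t1, NULL, take_a_then_b, NULL);
        pthread_create(&t2, NULL, take_b_then_a, NULL);
        pthread_join(t1, NULL);       /* never returns */
        pthread_join(t2, NULL);
        puts("not reached");
        return 0;
    }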

My observations are not unique. Robbert van Renesse, of Cornell,
who wrote a successor group communication toolkit (Horus), agreed with
me after sad experience with threads in Horus. Early in the BSD
'daemon book', the authors state that the BSD kernel design, which
drastically limited the nature and number of kernel blocking points,
was based on bitter experience with freer designs - a comment I did
not understand until bitter experience arrived.

Working with Squid has been far easier than working with Isis, not
least because of the confined role threads play so far.

BSD did come up with a reasonable compromise: no kernel code path
contains more than one blocking point. All the gazillions of reasons
for blocking in the Berkeley Unix kernel could be tested for easily at
a single spot. Events were handled through interrupts, not
necessarily a good thing to emulate (userland signals are much too
slow), but you can use a single thread to handle events.
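
Roughly what that looks like in userland, as a sketch (the handler
table and names here are invented, not Squid's comm layer): every
event is a file descriptor, and the lone select() call is the only
place the program ever blocks:

    #include <sys/select.h>
    #include <unistd.h>

    #define MAX_FD 64

    typedef void (*read_handler_t)(int fd);

    static read_handler_t read_handlers[MAX_FD];

    static void on_stdin_ready(int fd)
    {
        char buf[256];
        ssize_t n = read(fd, buf, sizeof(buf));  /* fd is readable: won't block */

        if (n > 0)
            (void)write(STDOUT_FILENO, buf, (size_t)n);
    }

    int main(void)
    {
        read_handlers[STDIN_FILENO] = on_stdin_ready;

        for (;;) {
            fd_set rfds;
            int fd, maxfd = -1;

            FD_ZERO(&rfds);
            for (fd = 0; fd < MAX_FD; fd++) {
                if (read_handlers[fd]) {
                    FD_SET(fd, &rfds);
                    maxfd = fd;
                }
            }

            /* The single blocking point: every pending event is tested here. */
            if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
                continue;

            for (fd = 0; fd <= maxfd; fd++)
                if (read_handlers[fd] && FD_ISSET(fd, &rfds))
                    read_handlers[fd](fd);
        }
    }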

Still, I think the Squid 1.1.x design - no threads, Ma! - is the best.
The handlers/continuations seem awkward at first, but they're a tiny
price to pay for something with so few synchronization problems.
It takes about five minutes to compose a separated handler, versus 1-2
days on average to solve the bugs from each sync point (and that
average includes the few that work OK immediately).
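
For the record, here is roughly what a separated handler pair looks
like, as a sketch: set_read_handler() and set_write_handler() are
hypothetical stand-ins for registration calls like commSetSelect, and
the ClientState struct is invented. The point is that the
continuation - everything needed to resume the operation - is saved
explicitly in a little struct instead of sitting on a blocked
thread's stack:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    typedef void (*fd_handler_t)(int fd, void *data);

    /* Hypothetical registration calls: in a real program these would hand
     * the (fd, handler, data) triple to the select() loop sketched above. */
    void set_read_handler(int fd, fd_handler_t handler, void *data);
    void set_write_handler(int fd, fd_handler_t handler, void *data);

    /* The continuation: whatever the operation needs in order to resume. */
    struct ClientState {
        char reply[256];
        size_t sent;
    };

    void client_write_more(int fd, void *data);

    /* Read handler: registered (via set_read_handler) when the client
     * connection is accepted. */
    void client_read_request(int fd, void *data)
    {
        struct ClientState *cs = data;
        char req[256];
        ssize_t n = read(fd, req, sizeof(req));  /* fd was readable: no blocking */

        if (n <= 0) {
            close(fd);
            free(cs);
            return;
        }
        snprintf(cs->reply, sizeof(cs->reply), "echo: %.*s", (int)n, req);
        cs->sent = 0;
        set_write_handler(fd, client_write_more, cs);  /* resume when writable */
    }

    /* Write handler: may run several times until the reply is fully sent. */
    void client_write_more(int fd, void *data)
    {
        struct ClientState *cs = data;
        size_t len = strlen(cs->reply);
        ssize_t n = write(fd, cs->reply + cs->sent, len - cs->sent);

        if (n > 0)
            cs->sent += (size_t)n;
        if (n < 0 || cs->sent >= len) {
            close(fd);
            free(cs);
            return;
        }
        set_write_handler(fd, client_write_more, cs);  /* still more to send */
    }

The awkwardness is all up front; once the state lives in the struct,
there is nothing left to lock.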

I guess threads make sense when you can usually get a big performance
gain, but you gotta be real careful about limiting the rest of your
system's exposure to thread problems.

                                                   Jon