Re: [squid-users] Squid Performance Issues - reproduced

From: Andres Kroonmaa <andre@dont-contact.us>
Date: Sat, 04 Jan 2003 21:22:07 +0200

On 4 Jan 2003 at 16:08, Henrik Nordstrom wrote:

> > Nono, it's quite the opposite. On an SMP system the other cpu is used so
> > vanishingly little that it's almost irrelevant. The main point of multiple
> > threads in IO is just to have several IO requests pending concurrently.
> > Most of the time the IO threads are blocked in the OS anyway, and that's
> > good, on a UP or SMP system. So imo the more threads we can get running
> > concurrently, the better. Signalling too many threads just eats up main
> > thread cpu time, as does signalling for things that almost never block,
> > like file write requests.
>
> Sorry, I do not quite follow here.

 We gain most if we have all our IO requests pending in the kernel. For that
 all threads must be active and blocked in the kernel. In current squid this
 is done by sending a cond_signal per request: either all threads run, or one
 thread is awakened per request. If we lose a signal, we lose concurrency -
 no new thread gets unblocked, and one of the already-running threads will
 service the request later. This is not fatal, but it reduces efficiency. So
 we want to deliver one signal per request. But if that signal can cause a
 thread switch, it eats cpu time of the main thread. And if the actual IO
 request is very short in userspace-cpu terms, we don't want to waste main
 thread time on scheduling it. SMP gives very little, because the userspace
 cpu-time of an IO-thread is almost nil; most of the time the system waits
 for the IO hardware. But we need threads for that to happen, so we have IO
 threads. Whether they are bound to the same cpu as the main thread or to any
 other cpu is rather irrelevant. More fruitful would be to make sure that the
 overhead of awakening a thread is paid outside the main thread's time, i.e.
 while the main thread sleeps.

> There are quite a few I/O operations that rarely block. The completion
> of these we rather have signalled back to the main thread before we
> enter poll/select again, and for this to happen they need to get started
> while we process other events.

 There is a question whether it makes sense at all to pass such ops to a
 thread instead of completing them from the main thread. If the
 thread-switch/scheduling overhead is higher than the cpu overhead of
 nonblocking io from the main thread, then no. Imo, we shouldn't be using
 threads for single requests that complete immediately. Maybe such reqs
 should be batched together for 1-2 threads. Maybe we should be able to pass
 a list of requests to a single thread instead of 1 request. By our design,
 imo, we should not send to io-threads requests that would take less time
 than a single pass of the comm_loops. We should use threads only for
 requests that block, so that by the time they complete, we have reached
 poll in the main thread.

> I agree that the signalling can be optimized a bit, but if the number of
> threads are reasonable then there is not much to gain. A cond_signal
> while no threads are blocked on the condition should not be a heavy
> operation at all, and if there are threads blocked we rather have them
> started, as there probably is no thread running or the ones running are
> blocked on I/O.

 What I tried to craft was a design where the main thread signals only once
 the fact that there is a request list waiting for service - one thread
 switch. Then the main thread could continue or poll, and the io-thread would
 pop one request and, if it's not the last, signal one more thread before
 going into the kernel. That way, just before blocking on io, each thread
 would spin off a follower. Even if this signalling eats cpu, it would more
 likely happen during the time when the main thread has nothing left to do
 but poll.

 btw, a pipe is bidirectional (on SVR4-style systems, at least). How about
 threads blocking on a read from the pipe? How would that differ from
 mutex/cond signalling in terms of overhead?

> However, it is very easy to measure if this is a problem. Just insert a
> counter counting how often a thread gets awakened only to find there
> are no requests in the request queue. If this turns out to be high in
> relation to the number of requests processed by the thread, then there
> is a problem.

 There are spurious wakeups. To my understanding, there currently can't be
 false signalling; all we'll see is spurious wakeups, and that can be quite
 a large number. An empty queue would mean some "old" thread managed to grab
 the request before this thread unblocked.
 Gotta put quite a few counters in there, and not only these. For example,
 I'd like to measure the time between the main thread signalling and the
 io-thread grabbing the request. Maybe with cpuProf. That could show a lot
 about how much real overhead there is.
Received on Sat Jan 04 2003 - 11:08:29 MST