Re: [squid-users] Squid Performance Issues - reproduced

From: Andres Kroonmaa <andre@dont-contact.us>
Date: Sat, 04 Jan 2003 23:00:56 +0200

On 4 Jan 2003 at 15:56, Henrik Nordstrom wrote:

> > thats sure. The only thing to keep in mind is that running thread can't easily
> > trigger other cpu. There must be suitable conditions for that, other cpu must
> > be necessarily suspended in either halt or some mutex spin, and running cpu
> > must parse through all the internals to find that out. Its slow.
>
> As long as it gets triggered when we cond_signal I see no problem.

 Being slow means it is not the preferred path. I don't know exactly how it plays out. All I want
 to say is that expecting the other CPU to be engaged instantly is highly unwarranted. It is
 merely hoped for.

 People tend to think that the kernel runs on a separate CPU from the application, and that as
 soon as a syscall is made the request is handed to another CPU while this CPU is free to
 continue. That is very rarely the case. CPU migration is not taken lightly. Only completely
 unrelated tasks are likely to run on separate CPUs. Almost all I/O requested by one CPU is in
 fact serviced by that same CPU, down to the lowest hardware levels. The same goes for thread
 switches.

 In a sense, if cond signalling causes a thread switch, one can be pretty sure that it is handled
 by the same CPU up until the return. The same goes for a process switch, as with kill().

 And this is more efficient than checking at every corner whether it is possible to hand over the
 given code path. A single CPU thread runs as long as it is possible to continue, and only at
 points where all soft LWP threads block will CPU migration be considered.

 In this sense, even though Squid has multiple threads, it is by no means certain that they ever
 run on separate CPUs, and that covers not only userspace but also all the kernel-space
 codepaths. This is mostly because the threads are tightly related and the same CPU can handle
 them all. Although unintuitive, it seems the best way to force other CPUs onto the scene is to
 detach the threads from each other and use a loose kind of interaction that does not allow the
 same CPU to handle everything directly: for example, enqueue requests but do not signal, and
 instead let the iothreads wake up through timed waits. This causes latency and necessitates a
 scheduler pass, but it would force the OS to schedule these threads onto any free CPU. If a
 single CPU is the scarce resource, it may even make sense.
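
 To make that concrete, here is a minimal sketch of such a loose interaction, assuming a
 hypothetical queue API (queue_mutex, queue_cond, req_dequeue(), handle_request() are my own
 illustrative names, not Squid's actual aiops code): the producer only enqueues, and each
 iothread polls the queue through pthread_cond_timedwait(), so its wakeup comes from the
 scheduler rather than from a cond_signal issued on the producer's CPU.

    #include <pthread.h>
    #include <time.h>

    extern pthread_mutex_t queue_mutex;
    extern pthread_cond_t  queue_cond;       /* used only for the timed wait   */
    extern void *req_dequeue(void);          /* returns NULL if queue is empty */
    extern void  handle_request(void *req);

    static void *iothread_main(void *arg)
    {
        (void)arg;
        for (;;) {
            struct timespec ts;
            void *req;

            pthread_mutex_lock(&queue_mutex);
            while ((req = req_dequeue()) == NULL) {
                /* wake up on our own every 10 ms even if never signalled */
                clock_gettime(CLOCK_REALTIME, &ts);
                ts.tv_nsec += 10 * 1000 * 1000;
                if (ts.tv_nsec >= 1000000000L) {
                    ts.tv_sec += 1;
                    ts.tv_nsec -= 1000000000L;
                }
                pthread_cond_timedwait(&queue_cond, &queue_mutex, &ts);
            }
            pthread_mutex_unlock(&queue_mutex);

            handle_request(req);             /* run the request outside the lock */
        }
        return NULL;
    }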

 They call them kernel threads largely because they are all there is. There is no such thing as
 kernel-internal threads: every thread that exists executes all codepaths, kernel-internal as
 well as process code. There is actually no thread that belongs to the kernel itself; all threads
 are created for applications, and the very first one created at boot is init. These threads can
 cross the application/kernel boundary, but CPU migration is done very rarely, mostly only at
 points where the CPU must be suspended.

> > > I still maintain that two cond_signal from the same thread SHOULD
> > > unblock "at least two threads" however.
>
> And it is also true that I do not care. If it happens that two
> cond_signal gets optimized down to a single signal I am equally happy

 That would actually be bad. It would mean that instead of N threads, only one thread would
 service the N requests sequentially. In that case we might as well use ufs, or a single thread.
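
 For contrast, a sketch of the producer side I have in mind, reusing the same hypothetical names
 as above: one cond_signal per enqueued request, issued while holding the mutex, so that N queued
 requests can wake up to N waiting iothreads instead of being drained by one.

    extern pthread_mutex_t queue_mutex;
    extern pthread_cond_t  queue_cond;
    extern void req_enqueue(void *req);

    static void submit_request(void *req)
    {
        pthread_mutex_lock(&queue_mutex);
        req_enqueue(req);
        /* one signal per queued request: with two requests queued while two
           threads are waiting, two distinct threads should be unblocked,
           rather than one thread servicing both sequentially */
        pthread_cond_signal(&queue_cond);
        pthread_mutex_unlock(&queue_mutex);
    }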

> > Whatever the docs say, cond var should not be touched outside mutex.
> > There are some rare cases when race conditions could cause a thread deadlock.
> > Not the one signalling, but the one waiting on condvar.
> > And this would happen, on one OS or other. I've had these on Solaris 2.5 or 2.6,
> > even UP back then. Maybe these days such things are avoided, and iirc this
> > can be recovered by timedwait, so in a sense, not very important, but this could
> > cause other weird interactions that we'd better avoid. Win is very questionable
> > and very small.
>
> What kind of deadlock are we talking about here?
>
> a) A lost cond_signal, causing the waiting threads to never wake up
>
> b) A internal deadlock, making the threads subsystem hang and not
> recover again even if additional cond_signal is sent.
>
> 'a' is fully expected and must be dealt with if using cond_signal
> outside the mutex.
>
> 'b' would be implementation bugs.

 It's not quite either.
 I don't recall the details, but it goes something like this. Before a thread goes to sleep
 waiting for cond_signal, it must prepare itself: it places itself on the condvar's wait queue,
 from where it will later be popped to be restarted. Now, between putting itself on the queue
 and actually suspending, the signal is delivered. The thread is instantly removed from the cond
 queue and sent the wakeup, but because it is not yet suspended, that is a no-op. Then the thread
 suspends. Now it is not on the cond queue, it has lost its wakeup signal, and there is no damn
 way to restart it again. I don't know how it is on Linux, but on OSes where these things are
 done in kernel space there is no way to recover. The thread is stuck in the kernel: it can't be
 signalled, it can't be cancelled, it can't be resumed. And since a process can't be killed while
 it has active threads, it's quite a mess, especially if that thread was your accept socket
 handler. The only way to recover is to reboot the system. timedwait helps here, amazingly. But
 if you think about it, such a scenario must never be allowed in the first place.
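
 The kernel-level variant above is implementation-internal, but the user-visible shape of the
 same class of race, when the condvar is touched outside the mutex, looks roughly like this
 (again a sketch with the same hypothetical names, not Squid's code):

    /* waiter: plain cond_wait, correct on its own */
    static void *iothread_wait(void *arg)
    {
        void *req;
        (void)arg;
        pthread_mutex_lock(&queue_mutex);
        while ((req = req_dequeue()) == NULL) {
            /* <-- if the risky signaller below runs exactly here, nobody is
               on the condvar yet: the signal evaporates, and the cond_wait
               that follows sleeps on a queue that is no longer empty */
            pthread_cond_wait(&queue_cond, &queue_mutex);
        }
        pthread_mutex_unlock(&queue_mutex);
        return req;
    }

    /* signaller, the risky way: queue and condvar touched without the mutex */
    static void submit_request_risky(void *req)
    {
        req_enqueue(req);                  /* no lock taken                  */
        pthread_cond_signal(&queue_cond);  /* may land in the window above   */
    }

    /* Holding queue_mutex across the enqueue and the signal, as in the
       submit_request() sketch earlier, closes this window, because cond_wait
       releases the mutex atomically with blocking; a timedwait merely bounds
       the damage when the window is left open. */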

 I don't agree that it's an implementation bug. At first I thought so too, but it is more like a
 documentation bug, or an API design bug. It kind of slipped through in the initial API and has
 stuck ever since. Some systems now document warnings about the races; some, like Linux, simply
 and plainly say "must be used with the mutex". To implement this cleanly would require more
 overhead than the mutex. Why?
Received on Sat Jan 04 2003 - 12:47:19 MST
