Re: [squid-users] Squid Performance Issues - reproduced

From: Andres Kroonmaa <andre@dont-contact.us>
Date: Thu, 2 Jan 2003 15:10:14 +0200

On 2 Jan 2003, at 6:02, Henrik Nordstrom <hno@squid-cache.org> wrote:

> I have now managed to randomly reproduce the situation here on my poor
> old P133/64MB home development machine, giving an almost exact 200KB/s
> hit transfer rate for a single aufs cache hit, or sometimes a couple in
> parallel (each then receiving 200KB/s). The exact conditions triggering
> this are yet unknown, but I would guess there is some kind of oscillation
> in the queue management due to the extremely even and repetitive
> I/O pattern.
>
> Hmm.. 200KB/s / 4KB per I/O request = 50 requests/s, times 2 queue
> operations per request = 100/s = the number of clock ticks per second
> (HZ) on a Linux Intel x86..
>
> Wait a minute.. yes, naturally. This will quite likely happen on an aufs
> Squid which has nothing to do as the I/O queue is only polled once per
> comm loop, and an "idle" comm loop runs at 100/s. The question is more
> why it does not always happen, and why it is 200KB/s and not 400KB/s.

> What can also be said is that the likelihood of this happening
> decreases a lot on SMP machines, as the main thread then has no
> competition with the I/O threads for the CPU.

 I'd not be sure of that. Competition for the CPU is not so much of an
 issue here; it's thread scheduling. Until the main-thread LWP reaches
 the scheduler, no I/O thread rescheduling will happen. I see not much
 difference.

> I have an old async-io patch somewhere which introduces a signal to
> force the main process to resume when a disk I/O operation has
> completed, but it may have other ill effects, so it is not included (or
> maintained).

 A signal is heavy. Can't we have one dummy FD in the fd_set that is
 always polled for read, and have an I/O thread write a byte into it
 when it is done? Like a pipe? That would cause poll to unblock, and it
 allows multiple threads to write into the same FD.
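
 Something along these lines, as a rough sketch (the names are made up
 for illustration, not actual Squid code): the read end of a pipe sits
 in the comm loop's poll set, and each I/O thread writes one byte when
 it finishes a request, so poll unblocks right away instead of waiting
 for the next tick:

    #include <unistd.h>
    #include <fcntl.h>

    static int done_pipe[2];   /* [0] = read end (polled), [1] = write end */

    static void
    done_pipe_init(void)
    {
        pipe(done_pipe);
        fcntl(done_pipe[0], F_SETFL, O_NONBLOCK);
        fcntl(done_pipe[1], F_SETFL, O_NONBLOCK);
    }

    /* called by an I/O thread when its request has completed */
    static void
    notify_main_thread(void)
    {
        char c = 0;
        write(done_pipe[1], &c, 1);  /* write() is thread-safe; poll wakes up */
    }

    /* called by the main thread when poll reports done_pipe[0] readable */
    static void
    drain_notifications(void)
    {
        char buf[256];
        while (read(done_pipe[0], buf, sizeof(buf)) > 0)
            ;                        /* discard; the done queue holds the results */
    }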

> Testing on my old UP P133 glibc 2.1.3 gives some surprising results in
> thread scheduling. It seems that under some conditions the signalled
> thread is not awakened until much later than expected. I would expect
> the signalled thread to be awakened when the signalling thread calls
> pthread_mutex_unlock, or at the latest at sched_yield, but when Squid is
> in "200K/s aufs cache hit mode" the I/O thread stays suspended until the
> next system tick or something similar, it seems... I will try to do
> additional investigation later on.

 That's how threads work. Wishful thinking ;) been there too. A thread
 switch will happen only when it is unavoidable. SMP optimisations..
 For that reason, yield() is almost always a no-op, and a mutex unlock
 after cond_signal is much the same. It is down to the asynchronous
 nature of threads: the OS "assumes" that you are going to cond_signal
 many threads before you block, so that it can then schedule the
 signalled threads onto the CPU in a batch.
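
 For illustration, the usual completion sequence looks roughly like
 this (hypothetical names, not the actual aiops.c code); neither the
 unlock nor the yield is guaranteed to put the waiter on a CPU before
 the signalling LWP next blocks in the kernel:

    #include <pthread.h>
    #include <sched.h>

    static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  done_cond = PTHREAD_COND_INITIALIZER;
    static void *done_queue_head;    /* stand-in for the real done queue */

    static void
    complete_request(void *request)
    {
        pthread_mutex_lock(&done_lock);
        done_queue_head = request;         /* push result (details elided)     */
        pthread_cond_signal(&done_cond);   /* waiter becomes runnable ...      */
        pthread_mutex_unlock(&done_lock);  /* ... but usually no switch here   */
        sched_yield();                     /* ... and usually a no-op here too */
    }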

 Therefore mutex_unlock is not steered through the LWP scheduler. The
 signalled thread will not wake before the main LWP hits the scheduler
 (that is, makes any call that blocks in the kernel). At the same time,
 you can't rely on that either, as there are cases where a mutex unlock
 may cause a thread switch (quite probable if threads are
 scope_process). The problem is that different OSes behave differently,
 and in fact no assumptions are reliable. The key point is that threads
 are asynchronous and proceed in unpredictable order. If you need
 synchronisation, use mutexes.

 The only way to reliably kick a specific thread is through solid
 mutex handshaking. Even blocking in poll does not guarantee that the
 signalled thread has run by the time poll returns, especially on SMP
 systems that try to keep threads from migrating between CPUs. If we
 are blocked in poll long enough, they all obviously have to get to run
 eventually. But here lies another problem: we can't slow down network
 I/O just to make sure the aio threads get run. We may try to tweak
 priorities, but that is not enough unless we run the I/O threads in
 the realtime class, I suppose.

 If I recall right, the cond signal is boolean (not counted), so if
 many threads are blocked on the cond, cond_signal does not force a
 thread switch, and another cond_signal is sent, only one thread would
 eventually be unblocked. I assume that the mutex is put into a
 to-be-locked state upon the first cond_signal (owner unspecified, but
 one of the threads on the wait), and a second attempt to signal would
 force a solid thread switch to consume the first signal (because the
 mutex is "locked"). That most probably happens when there is a lot of
 I/O activity. But with only one client, the thread switch would not
 happen until we block in poll. In this case yield is also a no-op,
 because the I/O threads have not yet been scheduled onto a CPU, so
 there is nothing to run at yield() time. Using yield is bad coding in
 the SMP world; I'd suggest avoiding it.
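
 If that guess is right, the usual safeguard is to count completions
 under the mutex rather than rely on the signal itself; a rough sketch
 with made-up names:

    #include <pthread.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
    static int pending;              /* completions not yet collected */

    static void
    post_completion(void)            /* I/O thread side */
    {
        pthread_mutex_lock(&lock);
        pending++;                   /* remembered even if no switch happens now */
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }

    static void
    wait_completion(void)            /* collector side */
    {
        pthread_mutex_lock(&lock);
        while (pending == 0)         /* re-check: signals may have coalesced */
            pthread_cond_wait(&cond, &lock);
        pending--;
        pthread_mutex_unlock(&lock);
    }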

 Basically, we have two problems to solve. 1) We need a reliable,
 lowest-overhead kickstart of the aufs I/O threads at the end of a
 comm_poll run. Poll can return immediately, without running the
 scheduler, if there are FDs ready, and forcibly blocking in poll would
 cost a lost systick for network I/O. Therefore I think we need some
 other way to get the I/O threads running before going into poll. We
 only need to make sure the I/O threads have grabbed their job and are
 on the CPU run queue. Maybe even only the last cond_signal is an
 issue, if my guess above is right.
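
 One possible shape for that kickstart, purely a sketch with
 hypothetical names: just before dropping into poll, re-signal (or
 broadcast) for any still-queued requests, so idle workers are at least
 put on the run queue:

    #include <pthread.h>

    static pthread_mutex_t req_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  req_cond = PTHREAD_COND_INITIALIZER;
    static int queued_requests;      /* maintained by the enqueue/dequeue code */

    static void
    kick_io_threads_before_poll(void)
    {
        pthread_mutex_lock(&req_lock);
        if (queued_requests > 0)
            pthread_cond_broadcast(&req_cond);  /* wake all idle workers */
        pthread_mutex_unlock(&req_lock);
    }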

 2) We need a semi-reliable, lowest-latency notification of aio
 completion while poll is blocking. The latter is probably the more
 important of the two. Could the pipe FD do the trick? A signal would,
 but at high loads it would cause a lot of overhead.
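
 The pipe write can also be kept cheap: only the transition from an
 empty to a non-empty done queue needs a wakeup byte, so at high
 request rates most completions skip the write() entirely. Folding the
 earlier hypothetical sketches together:

    static void
    complete_request_with_wakeup(void *request)
    {
        int was_empty;

        pthread_mutex_lock(&done_lock);
        was_empty = (done_queue_head == NULL);
        done_queue_head = request;            /* push (details elided) */
        pthread_cond_signal(&done_cond);
        pthread_mutex_unlock(&done_lock);

        if (was_empty)
            notify_main_thread();             /* one byte down done_pipe[1] */
    }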

------------------------------------
 Andres Kroonmaa <andre@online.ee>
 CTO, Microlink Data AS
 Tel: 6501 731, Fax: 6501 725
 Pärnu mnt. 158, Tallinn
 11317 Estonia
Received on Thu Jan 02 2003 - 06:19:48 MST
