Re: async-io for 2.4

From: Andres Kroonmaa <andre@dont-contact.us>
Date: Fri, 3 Nov 2000 13:53:33 +0200

On 3 Nov 2000, at 5:10, Joe Cooper <joe@swelltech.com> wrote:

> > Will commit the code to 2.4 in a few days unless someone has any reason
> > not to.
>
> I'm not going to argue that it shouldn't go in because this is much more
> stable than the current 2.3 async code (which is probably unusable), but

 hmm, I'm using async on 2.3 and am pretty happy. No problems, and it's faster
 than without. Although I couldn't use it on Linux when I tested it on an alpha
 box: Linux threads are no good.

> (the box is freezing due to a ReiserFS deadlock condition...but I think
> it is triggered by a Squid issue, since I don't see this problem with
> 2.2STABLE5+hno--I don't think...I've never run this set of benchmarks on

 rule of thumb: no application should be able to take the whole system down.
 If that happens, the OS has a big problem. Since ReiserFS runs in kernel
 space, it may be part of the problem; Squid could only be unmasking it.

> Anyway...Here's what happens:
>
> 2000/11/03 01:19:46| storeAufsOpenDone: (2) No such file or directory
> 2000/11/03 01:19:46| /cache1/01/5F/00015F6A

 sounds like some race after unlink, or simply a lost file. It happens here
 too from time to time. Are you sure this is related?
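
 For illustration, here is a minimal sketch (hypothetical code, not what
 Squid actually does) of how an unlink racing an open produces exactly that
 "(2) No such file or directory":

    /* Two threads touching the same cache file. If the eviction
     * runs between the hit decision and the open(), the open()
     * fails with errno == ENOENT (2). */
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static void read_object(const char *path)
    {
        int fd = open(path, O_RDONLY);      /* may lose the race */
        if (fd < 0) {
            fprintf(stderr, "open %s: (%d) %s\n",
                    path, errno, strerror(errno));
            return;
        }
        /* ... read the object ... */
        close(fd);
    }

    static void evict_object(const char *path)
    {
        unlink(path);                       /* replacement policy */
    }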

> 2000/11/03 01:20:15| comm_poll: poll failure: (12) Cannot allocate
> 2000/11/03 01:20:15| Select loop Error. Retry 1
> 2000/11/03 01:20:15| comm_poll: poll failure: (12) Cannot allocate
> 2000/11/03 01:20:15| Select loop Error. Retry 2
> 2000/11/03 04:14:49| comm_poll: poll failure: (12) Cannot allocate

 interesting, 3 hours later?

> 2000/11/03 04:14:49| comm_poll: poll failure: (12) Cannot allocate
> 2000/11/03 04:14:49| Select loop Error. Retry 10
> FATAL: Select Loop failed!
>
> ReiserFS also reports memory allocation failures, and triggers a
> deadlock condition (which I'll report to the ReiserFS guys...Chris Mason
> over there is a hero on these kinds of problems). Now...memory is not
> filled when this condition occurs. There is some stuff in swap (136MB
> worth, actually) but it was put there much earlier and didn't cause
> problems. CPU saturation seems to be the culprit here.
>
> Here is a snippet of 'vmstat 1' as the deadlock is happening (when
> system usage goes to 100% is when you know the box is locked).
>
> procs              memory              swap      io      system       cpu
>  r  b  w  swpd  free    buff  cache  si  so  bi   bo   in    cs  us  sy  id
>  7  0  0   136  2864  115564  57664   0   0   0    0  738 22634  16  84   0
> 10  0  0   136  2824  115564  57664   0   0   4    0  822   504   0 100   0
> 17  0  0   136  2796  115564  57664   0   0   8    0  508   283   1  99   0
> 18  0  1   136  2792  115564  57664   0   0   2  690  347   164   0 100   0
 
 I suppose the running (r) processes are actually Linux threads? As they are
 neither blocking nor waiting, they are most probably spinning on a mutex.
 "mpstat 1" could add some detail to the picture.
 Sounds like a bug or design error in the kernel or the threads library.
 Interrupts and signals can cause mutex spins, so an insanely high interrupt
 rate might hint at a hardware issue. SCSI devices are notorious for failing
 to properly implement tagged queueing, for example; try disabling it.
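
 By "spinning" I mean a busy-wait like the sketch below (a made-up example,
 not what the threads library literally does); every such thread shows up as
 runnable in vmstat and burns CPU without making progress:

    #include <errno.h>
    #include <pthread.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        /* busy-wait instead of blocking in the kernel */
        while (pthread_mutex_trylock(&lock) == EBUSY)
            ;                               /* spin */
        /* ... critical section ... */
        pthread_mutex_unlock(&lock);
        return arg;
    }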
 
 the error message about memory shortage is either an MS Windows-style
 "general error", i.e. misleading, or related to some kernel-space shortage,
 such as memory being too fragmented to accommodate the needed contiguous
 chunk. Threads may be blocked on a mutex, and the kernel is desperately
 trying to schedule them so they can make progress, in the hope that some
 resource gets released.
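
 For what it's worth, poll(2) on Linux can return ENOMEM when the kernel
 fails to allocate its internal table for the given number of fds. A rough
 sketch of the caller's view, with a retry loop similar in spirit to the
 "Select loop Error. Retry N" above (names are made up):

    #include <errno.h>
    #include <poll.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define MAX_POLL_RETRIES 10

    static int event_loop(struct pollfd *fds, nfds_t nfds)
    {
        int retries = 0;

        for (;;) {
            int n = poll(fds, nfds, 1000);
            if (n < 0) {
                if (errno == EINTR)
                    continue;               /* interrupted, retry */
                /* ENOMEM: kernel could not allocate its fd table */
                fprintf(stderr, "poll failure: (%d) %s\n",
                        errno, strerror(errno));
                if (++retries > MAX_POLL_RETRIES)
                    return -1;              /* give up, fatal */
                sleep(1);
                continue;
            }
            retries = 0;
            /* ... dispatch the n ready fds ... */
        }
    }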

 in any case, to me it looks more like a kernel/OS problem than a Squid
 problem.

 the CPU burn seems to be a symptom of some underlying problem, as do the
 ReiserFS malloc failures.

> failing, I'm not sure. Also note that the box doesn't gradually get
> overloaded as in butterfly or in the old 2.2 async...this box falls over
> within seconds of the response time climbing over 2.5 seconds (while the

 brickwall effect. I've seen this before on Linux under weird loads.

------------------------------------
 Andres Kroonmaa <andre@online.ee>
 Delfi Online
 Tel: 6501 731, Fax: 6501 708
 Pärnu mnt. 158, Tallinn,
 11317 Estonia