Re: MemPools rewrite

From: Andres Kroonmaa <andre@dont-contact.us>
Date: Fri, 10 Nov 2000 19:06:58 +0200

On 10 Nov 2000, at 0:21, Henrik Nordstrom <hno@hem.passagen.se> wrote:

> Andres Kroonmaa wrote:
>
> > The problem is the robustness of mempools against misuse of freed items.
> > 1) Multiple Free() calls on the same item corrupt freelist consistency.
> > 2) If an item is tampered with after Free(), the chunk's freelist can
> > also get corrupted.
>
> > So I ask: is it OK to live with (2) and make it the responsibility of
> > the coder to track down misuse of freed items, or should I drop
> > the idea of keeping a zero-RAM-overhead freelist inside freed items?
> > I'd rather not drop the free listnode approach, as it saves
> > quite a lot of memory (or CPU overhead, if using bitmaps).
>
> My vote is that we live with 2, and then have an option to completely
> disable memory pools to be able to use standard malloc debuggers to find
> such problems.

 That's good. Does anyone object to this?
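
 For anyone not following the details, here is a minimal sketch of the
 free listnode idea and of how cases (1) and (2) break it. All names
 (toy_pool, toy_alloc, toy_free and the two demo functions) are made up
 for illustration; this is not the actual MemPools code, and there is no
 chunk management here, only a single freelist kept inside the freed items.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy pool: the freelist node lives inside each freed item. */
typedef struct toy_node {
    struct toy_node *next;
} toy_node;

typedef struct {
    size_t item_size;    /* always >= sizeof(toy_node) */
    toy_node *free_list; /* linked through the freed items themselves */
    int idle;            /* how many items the pool believes are free */
} toy_pool;

static void toy_init(toy_pool *p, size_t item_size)
{
    p->item_size = item_size < sizeof(toy_node) ? sizeof(toy_node) : item_size;
    p->free_list = NULL;
    p->idle = 0;
}

static void *toy_alloc(toy_pool *p)
{
    toy_node *n = p->free_list;
    if (n == NULL)
        return calloc(1, p->item_size); /* list empty: go to the heap */
    p->free_list = n->next;
    p->idle--;
    return n;
}

static void toy_free(toy_pool *p, void *item)
{
    toy_node *n = item;
    assert(item != NULL);
    n->next = p->free_list; /* overwrites the first bytes of the item */
    p->free_list = n;
    p->idle++;
}

/* Case (1): returns 1 if two "live" allocations alias after a double free. */
static int toy_demo_double_free(void)
{
    toy_pool p;
    void *a, *x, *y;
    int aliased;
    toy_init(&p, 64);
    a = toy_alloc(&p);
    toy_free(&p, a);
    toy_free(&p, a); /* second free: a->next now points at a itself */
    x = toy_alloc(&p);
    y = toy_alloc(&p);
    aliased = (x == y);
    free(a);
    return aliased;
}

/* Case (2): returns idle count minus items still reachable on the list. */
static int toy_demo_tamper(void)
{
    toy_pool p;
    void *a, *b;
    toy_node *n;
    int reachable = 0;
    toy_init(&p, 64);
    a = toy_alloc(&p);
    b = toy_alloc(&p);
    toy_free(&p, b);
    toy_free(&p, a);  /* list is now a -> b */
    memset(a, 0, 64); /* stale write into freed a (all-bits-zero assumed NULL) */
    for (n = p.free_list; n != NULL; n = n->next)
        reachable++;
    free(a);
    free(b);
    return p.idle - reachable; /* b is no longer reachable */
}
```

 In the first demo the same item is handed out twice; in the second the
 pool's bookkeeping says two idle items while only one can still be
 reached. Neither costs any extra RAM to set up, which is the whole
 trade-off in question.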

> Spent most of the day yesterday trying to find one of those odd memory
> corruption problems, but I never managed to find it. Something overflowed
> a buffer and stomped on malloc's pointers, and sometime later a segfault
> was seen when mempools was releasing memory. And today I did not see
> the problem, but I am not aware of having fixed it...

 Aha, so I'm not alone here. I think I'm seeing exactly the same issue. I
 can't find the cause, but it seems to happen after a spike of
 activity, shortly after some mempools are returned to the system. My
 first thought was that I had screwed something up, but I doubt it.
 So far my guesses lead me to client_side.c around line 1742:

    /* write */
    comm_write_mbuf(fd, mb, clientWriteComplete, http);
    /* if we don't do it, who will? */
    memFree(buf, MEM_CLIENT_SOCK_BUF);

 But this whole area of code is too complex for me to grasp, so I'm
 quite helpless for the time being and only guessing. To me it seems that
 some buffer is overflowed by quite a small amount, perhaps the size of a
 pointer, because after a crash I can see that only the first pointer of
 a memchunk struct is corrupt; the rest following it seems OK. But I have
 investigated only a few crashes. It just looks like this part of the code
 might create situations where memory gets freed twice or used after
 a free.
 A few crashes have occurred inside dlmalloc.c (bt below), which is why
 I suspect misuse after a free, but that is only my amateur guess.
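
 One way to hunt for this kind of misuse without giving up the in-item
 freelist would be a debug-only check at free and alloc time. The sketch
 below is only an assumption about how that could look, with made-up
 names (dbg_pool, dbg_free, dbg_alloc, DBG_POISON); nothing like it is in
 the current code. It scans the freelist for the item before pushing it
 (catches a double free) and poisons the bytes behind the list pointer
 (catches a write after free when the item is next handed out).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DBG_POISON 0xDE /* arbitrary fill byte chosen for this sketch */

typedef struct dbg_node {
    struct dbg_node *next;
} dbg_node;

typedef struct {
    size_t item_size; /* must be >= sizeof(dbg_node) */
    dbg_node *free_list;
} dbg_pool;

/* Returns 0 on success, -1 if the item is already on the freelist. */
static int dbg_free(dbg_pool *p, void *item)
{
    dbg_node *n;
    for (n = p->free_list; n != NULL; n = n->next)
        if ((void *)n == item)
            return -1; /* double free caught before it corrupts the list */
    /* Poison everything behind the list pointer. */
    memset((unsigned char *)item + sizeof(dbg_node), DBG_POISON,
           p->item_size - sizeof(dbg_node));
    n = item;
    n->next = p->free_list;
    p->free_list = n;
    return 0;
}

/* Pops an item; *tampered is set to 1 if its poison bytes were changed. */
static void *dbg_alloc(dbg_pool *p, int *tampered)
{
    dbg_node *n = p->free_list;
    unsigned char *bytes;
    size_t i;
    *tampered = 0;
    if (n == NULL)
        return calloc(1, p->item_size); /* list empty: go to the heap */
    p->free_list = n->next;
    bytes = (unsigned char *)n;
    for (i = sizeof(dbg_node); i < p->item_size; i++)
        if (bytes[i] != DBG_POISON)
            *tampered = 1;
    return n;
}

/* Exercises both checks; each passing check sets one bit of the result. */
static int dbg_demo(void)
{
    dbg_pool p = { 64, NULL };
    int tampered, score = 0;
    char *a = dbg_alloc(&p, &tampered);
    assert(a != NULL);
    if (dbg_free(&p, a) == 0)
        score += 1; /* first free is accepted */
    if (dbg_free(&p, a) == -1)
        score += 2; /* second free is rejected */
    a[32] = 'x';    /* write after free */
    a = dbg_alloc(&p, &tampered);
    if (tampered)
        score += 4; /* poison mismatch noticed on reuse */
    free(a);
    return score;
}
```

 Two limits worth stating: the scan is O(n) per free, so this is for
 debugging only, and a stray write that lands exactly on the list pointer
 (the first pointer-sized bytes) is not covered by the poison check.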

 With real customer traffic I got quite frequent crashes when I reduced
 the mempool idle limit to a few MB, but as I couldn't debug the crash with
 the proxy down, I had to restart it quickly. Yet I cannot reproduce
 the bug with either polygraph tests or tcp_banger2 and real
 URLs from the same day's access.log. That's weird, and makes me think it
 might be related to some peering interactions or DNS stuff.

#10 0x80a6d53 in memFree (p=0x8996770, type=22) at mem.c:136
#11 0x8078999 in clientSendMoreData (data=0x8527060,
    buf=0x8996770 "HTTP/1.0 200 OK\r\nContent-length: 3216\r\nContent-type: image/jpeg\r\nDate: Thu, 09 Nov 2000 10:56:44 GMT\r\nLast-modified: Sun, 06 Aug 2000 17:01:32 GMT\r\nServer: Netscape-Enterprise/3.6 SP3\r\nEtag: \"8530-c90"..., size=1477)
    at client_side.c:1744
#12 0x80c6a90 in storeClientCopy2 (e=0x8f1fc80, sc=0x89019a0) at store_client.c:255
#13 0x80c77a8 in InvokeHandlers (e=0x8f1fc80) at store_client.c:532
#14 0x80c2ee0 in storeAppend (e=0x8f1fc80,
    buf=0x814bd60 "Content-length: 3216\r\nContent-type: image/jpeg\r\nDate: Thu, 09 Nov 2000 10:56:44 GMT\r\nLast-modified: Sun, 06 Aug 2000 17:01:32 GMT\r\nServer: Netscape-Enterprise/3.6 SP3\r\nEtag: \"8530-c90-398da7fc\"\r\nAccep"..., len=1460)
    at store.c:463
#15 0x80968d8 in httpReadReply (fd=12, data=0x895c9d0) at http.c:565
#16 0x807f51a in comm_poll (msec=10) at comm_select.c:449
#17 0x80a63e2 in main (argc=4, argv=0x8047880) at main.c:708

...
another:
Program received signal SIGSEGV, Segmentation fault.
0x80d8e00 in free (mem=0x82bcfe4) at dlmalloc.c:2359
2359 unlink(p, bck, fwd);
(gdb) cont
Continuing.

Program received signal SIGABRT, Aborted.
0xdf926a3d in ?? ()
(gdb) bt
#0 0xdf926a3d in ?? ()
#1 0xdf91ce8a in ?? ()
#2 0xdf91a7d4 in ?? ()
#3 0xdf91f543 in ?? ()
#4 0xdf91f434 in ?? ()
#5 0xdf9ea72b in ?? ()
#6 0xdf9db824 in ?? ()
#7 0x80d072c in death (sig=11) at tools.c:275
#8 0xdf9178bf in ?? ()
#9 0xdf9258db in ?? ()
#10 <signal handler called>
#11 0x80d8e00 in free (mem=0x82bcfe4) at dlmalloc.c:2359
#12 0x80dc2e6 in xfree (s=0x82bcfe4) at util.c:481
#13 0x80ac8ba in peerDestroy (data=0x82bcfe4, unused=70) at neighbors.c:903
#14 0x807324e in cbdataReallyFree (c=0x82c1128) at cbdata.c:154
#15 0x8073490 in cbdataUnlock (p=0x82bcfe4) at cbdata.c:198
#16 0x80aeb64 in peerDigestDestroy (pd=0x82c4f90) at peer_digest.c:127
#17 0x80aede7 in peerDigestNotePeerGone (pd=0x82c4f90) at peer_digest.c:210
#18 0x80aee5b in peerDigestCheck (data=0x82c4f90) at peer_digest.c:232
#19 0x8087232 in eventRun () at event.c:147
#20 0x80a63bf in main (argc=4, argv=0x8047880) at main.c:704

------------------------------------
 Andres Kroonmaa <andre@online.ee>
 Delfi Online
 Tel: 6501 731, Fax: 6501 708
 Pärnu mnt. 158, Tallinn,
 11317 Estonia
Received on Fri Nov 10 2000 - 10:10:20 MST