wierdness....

From: Michael O'Reilly <michael@dont-contact.us>
Date: 25 Feb 1998 17:52:19 +0800

Wed Feb 25 16:00:01 WST 1998
squid: disk.c:372: diskHandleWriteComplete: Assertion `q->buf_offset <= q->len' failed.

Ok. seen them before. Got gdb on it this time.

(gdb) where
#0 0x4004b994 in __assert_fail (assertion=0x4 <Address 0x4 out of bounds>,
    file=0x2 <Address 0x2 out of bounds>, line=0,
    function=0xbffffe3c "Pþÿ¿\036\226\006\b\001")
#1 0x805aa78 in diskHandleWriteComplete (data=0xa1e5b08, len=1596, errcode=4)
    at disk.c:372
#2 0x804f7aa in aioCheckCallbacks () at async_io.c:372
#3 0x805914a in comm_select (sec=1) at comm.c:1028
#4 0x806961e in main (argc=2, argv=0xbffffe70) at main.c:600
#5 0x804a3ce in _start ()

Ok...
(gdb) up
#1 0x805aa78 in diskHandleWriteComplete (data=0xa1e5b08, len=1596, errcode=4)
    at disk.c:372
372 assert(q->buf_offset <= q->len);
(gdb) print *q
$42 = {file_offset = -1, buf = 0xb3ba5e0 "\003q", len = 113,
buf_offset = 1596, next = 0x0, free_func = 0x807cb18 <xfree>}

Ok. The code has already checked that 'q' is valid, and it's the same
thing that the aioWrite was called on.

(gdb) print q->buf[0] @ 1596
$34 = "\003q\000\000\000\003\020\000\000\0006ûÚÍ\211\nZ\2179\017>Ó\031\023Ûë\005\030\000\000\000´Ñó4ÕÑó4þÿÿÿ\013\aÝ4\000\000\000\000\002\000`\005\0044\000\000\000http://www.iinet.com.au/artwork/home/common/nav.gif\000\b\000\000\000y\000\000\000\001\000\000\000XB\b\n\210~Q\nÆ\036", '\000' <repeats 22 times>, "z\005\000\000x½H\013Ø{Q\n\030üC\t¹Ñó4\e\027\a\000@è\006\b gS\n=\002\000\000\000\000\000\000 ï&\th\235L\013", '\000' <repeats 12 times>...

Hmmm. It's a gif, and there's WAY too many zeros in there to be the
result of LZW compression.

Conclusion: only the first 113 bytes of the buffer is really valid.

(gdb) print q->buf[0] @ 113
$35 = "\003q\000\000\000\003\020\000\000\0006ûÚÍ\211\nZ\2179\017>Ó\031\023Ûë\005\030\000\000\000´Ñó4ÕÑó4þÿÿÿ\013\aÝ4\000\000\000\000\002\000`\005\0044\000\000\000http://www.iinet.com.au/artwork/home/common/nav.gif\000\b"

Gee, that's nice. :)

(gdb) print *F
$43 = {type = 2, open = 1, local_port = 0, remote_port = 0,
  ipaddr = '\000' <repeats 15 times>,
  desc = "/w4/s/00/0F/0000078D", '\000' <repeats 43 times>, flags = 2,
bytes_read = 0, bytes_written = 1596, uses = 0, disk = {wrt_handle =
0x80783c4 <storeSwapOutHandle>, wrt_handle_data = 0x93d8938, write_q
= 0x9459400, write_q_tail = 0x9459400, offset = 0}, read_handler = 0,
read_data = 0x0, write_handler = 0, write_data = 0x0, timeout_handler
= 0, timeout = 0, timeout_data = 0x0, close_handler = 0x0,
defer_check = 0, defer_data = 0x0, rwstate = 0x0, connect_timeout = 0}

Ok. That's cool.

^Z
# ls -l /w4/s/00/0F/0000078D
-rw-r--r-- 1 squid squid 0 Feb 25 16:09 /w4/s/00/0F/0000078D

Wild!! The write never actually ran!

Hmm. In theory, the last request to be satisfied should be the head
of the free_requests linked list..

(gdb) print *free_requests
$44 = {request_type = 0, cancelled = 0, path = 0xa813678 "p¨¹\bø#\t\b/48/00002419",
  oflag = 2048, mode = 420, fd = 470024194,
  bufferp = 0x3a1e3bcb <Address 0x3a1e3bcb out of bounds>,
  tmpbufp = 0x1 <Address 0x1 out of bounds>, buflen = 518, offset =
1342177282, whence = 186137547, ret = 1596, err = 4, tmpstatp =
0x3a1e3bcb, statp = 0xffffffff, resultp = 0x80aaa90, next = 0xaec2998}

Jackpot! at least, the 'ret = 1596' matches the len that got passed
into the function, ditto the errcode.

But where did that 'fd = 470024194' come from! In fact, this whole
structure looks pretty sus. tmpbufp == 0x1 !? offset == 1342177282!?
Umm. Actually, the request_type is '0' (AIO_OP_OPEN).

Walking thru all the requests in the busythreads queue show a lot of
opens, and one write for FD 239. with the wrong length.

So, it looks like the aioWrite simply never got the request.

Digging....

WOW!!!! Gdb claims to have found 1587 entries on the
request_queue_head list. I guess that's pretty extreme...

The 671st entry is

(gdb) print * (aio_request_t *) 0xa89ada8
$1769 = {request_type = 2, cancelled = 0, path = 0x9459400
"ÿÿÿÿà¥;\013q", oflag = 3649, mode = 420, fd = 32, bufferp =
0x61667e2f <Address 0x61667e2f out of bounds>, tmpbufp = 0xb173e28
"\003q", buflen = 113, offset = 0, whence = 2, ret = 32, err = 4,
tmpstatp = 0x67706a2e, statp = 0xffffff00, resultp = 0x80aaa90, next =
0x93d8648}

Which is the right request, and ...

(gdb) frame 2
#2 0x804f7aa in aioCheckCallbacks () at async_io.c:372
372 (ctrlp->done_handler) (ctrlp->done_handler_data,
(gdb) print &ctrlp->result
$1776 = (aio_result_t *) 0x80aaa90

ta-dah! the resultp matches. So the question is, how did the resultp
get moved out there?

Ideas??

Michael.
Received on Tue Jul 29 2003 - 13:15:47 MDT

This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 16:11:43 MST