Re: [PATCH] Convert the ConnStateData input buffer to SBuf from Alex Rousskov on 2014-03-08 (squid-dev)

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Sat, 08 Mar 2014 23:34:04 -0700

On 03/08/2014 04:13 AM, Amos Jeffries wrote:
> Prepare the way to efficiently parse client requests using SBuf based
> parser-ng.
>
> We rely on several guarantees for this to work:
>
> * rawSpace(N) guarantees that the backing store is both unique to this
> SBuf and has at least N bytes of unused buffer allocated.

Yes, but the problem here is that this guarantee does not last long
enough. See below.

> * SBuf member retains an active reference to the MemBlob backing store
> for the duration of the read operation. Except in cases whete
> ConnStateData gets destructed, in which case ConnStateData cancels the
> read explicitly.

SBuf does not provide such a guarantee AFAICT, even with the above
exception. An SBuf object may change its MemBlob, and, IIRC, its MemBlob
may change its backing store. Any one of those two actions would be
deadly for the patched code. Retaining a reference is not enough to
prevent those actions.

> * comm_read() is effectively a blocking operation with regards to read
> on the client socket. No other read is scheduled until the callback
> handler has been run.

Yes, but the patched code actually needs a much stronger guarantee which
is not provided by the patched Squid: Comm_read() has to be blocking
with regard to any other buffer-modifying operation as well. For
example, nobody can consume anything from the buffer while the comm_read
is pending because consumption may invalidate the free space pointer
that the comm has.

Why would somebody want to consume the input buffer while a Comm read is
pending? This might happen when, for example, the BodyPipe buffer gets
free space and notifies the producer (via (ConnStateData's
noteMoreBodySpaceAvailable) that the producer can now shovel more
request body data into the pipe. The producer shovels, consumes the
shoveled data, and SBuf gets cleared. Comm then reads into an invalid
(or worse) pointer.

Another example are various c_str() calls that the patched code uses
with the input buffer. Some of those calls happen when forming error
responses, and some happen when handling completely asynchronous (to the
transaction holding the buffer) Cache Manager requests! Those calls are
allowed to change the buffer storage pointer, but even when they do not,
they modify the contents of something we may have just read.

I am worried that this design is too dangerous to use, even if it
miraculously works now due to extra copies or other temporary/implicit
boundaries that currently isolate buffer-modifying code from the buffer
reading code.

> * parsing and other manipulations of the buffer are done synchronously
> by the read handler before scheduling more reads.

AFAICT, parsing and other manipulations may happen while the read is
pending. The above are just a couple of examples I could find quickly.

Please redesign to avoid these problems. Please let me know if you need
help with that -- I have a couple of ideas how this can be done, but I
suspect yours are going to be better since you have spent a lot more
time with this code.

> - comm_read(clientConnection, in.addressToReadInto(), getAvailableBufferLength(), reader);
> + comm_read(clientConnection, in.buf.rawSpace(in.buf.spaceSize()), in.buf.spaceSize()-1, reader);

If this code survives, please allocate the buffer outside the comm_read
call. There was already an ugly dependency on the right parameter
evaluation order, and your changes make it look even worse.

More to come, but probably after we settle the design problem discussed
in the beginning of this email.

Thank you,

Alex.
Received on Sun Mar 09 2014 - 06:34:09 MDT

This archive was generated by hypermail 2.2.0 : Sat Mar 15 2014 - 12:00:12 MDT