Re: [PATCH] Pipeline prefetch build-time control

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sat, 25 May 2013 03:09:45 +1200

To clarify what we are up to:

Pipelining is a queue of requests which are pre-parsed (and sometimes
even have a response queued) while some request received earlier is
being fetched. The pipeline queue has several states: disabled, empty,
blocked, terminated, and full. Empty is a pretty self-explanatory
state. Full is reached at some arbitrary point determined by
pipeline_prefetch. Blocked is a temporary state which prevents further
pipelining until all currently queued requests are completed.
Terminated signals an irreversible halt to pipelining on the
connection. Blocked and Terminated are precautionary measures to avoid
wasting resources and having to unroll a pipeline of request contexts.
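
For reference, here is a minimal sketch of how those states might be
modelled. The type and value names are illustrative only, not the
identifiers used in the patch:

    // Illustrative only: one way to model the pipeline queue states
    // described above. Not the identifiers used in the patch.
    enum class PipelineState {
        Disabled,   // pipelining turned off for this connection
        Empty,      // no requests queued behind the one being handled
        Blocked,    // paused until all currently queued requests complete
        Terminated, // irreversibly halted for the rest of the connection
        Full        // queue length has reached the pipeline_prefetch limit
    };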

Some of these things already exist, either explicitly or implicitly.
This patch:

* changes the pipeline_prefetch directive from an on/off toggle, with a
hard-coded pipeline queue length of 1 request (+1 being handled), to a
numeric queue-length limit which determines at what arbitrary point the
"full" state prevents new request parsing.

* adds pipeline state flags to ConnStateData and accessors for the
parser to signal blocked or terminated status as identified.

* adds calls to relevant parts of the parser and pre-processing
functions to set pipeline blocked or terminated states.

* changes the OkTOAddRequest() test function to check all of the states
and prevent further pipelining if there is a) any reason not to
pre-parse further, or b) no room in the queue (a rough sketch follows
this list).

* disables pipelining whenever client_persistent_connections is OFF.
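
Roughly speaking, that test then reduces to something like the sketch
below. The struct, function, and parameter names are placeholders for
illustration, not the actual identifiers in the patch:

    // Sketch (not the patch's actual code) of the combined "any reason
    // not to pre-parse" and "no room in the queue" checks.
    struct PipelineFlagsSketch {
        bool disabled = false;    // e.g. client_persistent_connections off
        bool blocked = false;     // temporary pause (CONNECT, payload, ...)
        bool terminated = false;  // permanent halt (e.g. framing errors)
    };

    bool okToAddRequestSketch(const PipelineFlagsSketch &flags,
                              int queuedRequests, int prefetchLimit)
    {
        if (flags.disabled || flags.blocked || flags.terminated)
            return false;                 // some reason not to pre-parse
        return queuedRequests < prefetchLimit;  // otherwise: room left?
    }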

On 22/05/2013 5:00 a.m., Alex Rousskov wrote:
> On 05/20/2013 10:59 PM, Amos Jeffries wrote:
<snip>
>> I've added a loop to scan for Connection:close (and
>> Proxy-Connection:close), and CONNECT method as blockers on further
>> pipeline reads. There may be other conditions we find later.
> The new loop is problematic, on several levels:
>
> 1) We should either
>
> * assume (or ensure) that when we are called a connection is persistent
>
> or
>
> * not assume that when getCurrentContext() is NULL the connection is
> persistent.
>
> I recommend the first approach because it is simpler, does not add new
> state, and avoids code duplication. This is the approach used before
> your changes AFAICT.

We are HTTP/1.1 now. We can and do assume the connection is persistent
unless some signal is available that explicitly means it is closed.
That was true both before and after the loop was added, AFAICT.

> However, if you disagree, then the
> "getCurrentContext() is NULL implies that the connection is persistent"
> assumption above should be removed and the next two items will also apply:
>
>
> 2) We should not implement "connection is persistent after this request"
> check inside the loop. It should be either a method of
> ClientSocketContext() or a ConnStateData member, depending on whether
> you remove the assumption discussed above.

There is a ! operator on that test which you may have overlooked. This
is the check for "Connection:close is present --> no more prefetching".
In all other cases we are assuming persistence. That may or may not be
right, but note that this function is only called when there is
*already* a series of bytes in the read buffer and we are simply
deciding whether to parse them now (assuming they are a new request
header) or wait to find out whether they actually are a header block.

NP: in the attached patch, after the loop removal, the same persistence
check is run right after the HttpHeaders are all parsed. It then
updates the ConnStateData pipeline state to disable any future requests
if non-persistence is seen.

On the whole I am thinking there are a lot more things we should be
blocking on which are not tested for yet, and real-world traffic
exposure will require further additions.
While moving the block/disable/no-change checks into the
clientProcessRequest() function I have added more tests. This time the
presence of a request payload temporarily blocks the pipeline as well,
just like CONNECT, and all sorts of "framing" errors in the early
parser now disable pipelining entirely, to avoid potential request
smuggling situations.
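
To make that flow concrete, here is a rough sketch of the post-parse
decisions described above, reusing the PipelineFlagsSketch struct from
the earlier sketch. Again, all names are placeholders rather than the
patch's actual code:

    // Sketch of the post-parse pipeline decisions. Placeholder names;
    // not the patch's actual code.
    void clientProcessRequestSketch(PipelineFlagsSketch &flags,
                                    bool headersIndicateClose,
                                    bool requestHasPayload,
                                    bool requestIsConnect,
                                    bool framingError)
    {
        if (framingError || headersIndicateClose) {
            // early-parser framing errors (smuggling risk) and explicit
            // non-persistence both mean: no more pipelined requests here
            flags.terminated = true;
        } else if (requestHasPayload || requestIsConnect) {
            // pause prefetching until the currently queued requests finish
            flags.blocked = true;
        }
    }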

Yes, I agree that some of these events possibly do not need to disable
or block pipelining. However, pipelining is itself a traffic
optimization which is largely unused today anyway, so by allowing >0
queued requests for any portion of a connection's lifetime we are
already gaining. Disabling or pausing it at the first hint of trouble
seems to be the right (conservative) way to go until we have more
experience and testing to prove that enabling it is fine.

>
>> + // XXX: check that pipeline queue length is no more than M seconds long already.
>> + // For long-delay queues we need to push back at the client via leaving things in TCP buffers.
>> + // By time M all queued objects are already at risk of getting client-timeout errors.
> I do not think we should do this, at least not by default, so this is
> not an XXX. I would remove that comment, but if you want to keep it,
> please rephrase it as an optional feature. It would be rather difficult
> for Squid to measure the time those requests spent in TCP buffers and
> also predict whether the client still wants those requests to go ahead.
> Some clients would not submit all requests at once and will not timeout
> submitted requests as long as they keep receiving response(s), for example.

FYI: using time+count for queue length, instead of count alone, is one
of the main design models to come out of the bufferbloat project as a
major source of efficient, high-performance queue management.

As to implementation ... each queued request should have a timestamp
in its state data somewhere for the time of first read and the time the
headers were completely read. I am thinking we can compare one of those
against the current timestamp for the last queued request and know how
long it has been waiting. That timestamp is just missing or unknown to
me right now, and it is an optimization anyway, so it's not implemented
yet.

I have altered the XXX to a TODO.
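
For the record, the comparison I have in mind would be roughly the
sketch below. The timestamp parameter and the M-second limit are
placeholders; neither exists in the code yet:

    #include <ctime>

    // Sketch of the queue-age check hinted at in the TODO above. The
    // timestamp and the limit "M" are placeholders, not existing fields.
    bool queueTooOldSketch(time_t lastQueuedHeaderTime, time_t now,
                           time_t maxQueueAgeSeconds /* "M" above */)
    {
        // if the most recently queued request has already waited close to
        // the client timeout, stop pre-parsing and let TCP push back
        return (now - lastQueuedHeaderTime) >= maxQueueAgeSeconds;
    }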

Amos
