Re: SBuf issue

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sat, 23 Nov 2013 00:47:50 +1300

On 22/11/2013 4:16 a.m., Alex Rousskov wrote:
> On 11/21/2013 02:27 AM, Amos Jeffries wrote:
>> While writing parser updates I have encountered the small problem that
>> SBuf always *copies* data from non-SBuf sources. There is absolutely no
>> way provided to use a pre-allocated I/O buffer as the backing store for
>> SBuf objects. This includes pointing SBuf at an already allocated global
>> char*.
>> It always copies.
>
> I recommend keeping it that way. While adding support for alternative
> backing blobs is doable, we should really focus on reducing the number
> of non-SBuf sources instead, at least for now. Our chances of properly
> optimizing something (other than by pure luck) decrease exponentially as
> the complexity increases, and we are just starting to use the new string
> code with "simple" single-type backing.

I am trying to start that by working straight from the I/O buffers into
SBuf. If we are happy to wear the data copy until the buffer itself is
made SBuf-friendly, I will just keep going on the parser; otherwise I
will prioritize my client-side cleanup patch, which upgrades the I/O
buffer (see the connection-manager launchpad branch).

>
>> The way it is done makes sense in parsing where the input buffer is
>> constantly being cycled/shifted by the I/O system and possibly has 500KB
>> of area with a small sub-string needing to be pointed at by an SBuf for
>> long periods.
>> However, it also prevents us from doing two things:
>> 1) having a global array of char* header names and field values, which
>> the parser points an SBuf at before emitting (avoiding a lock on the I/O
>> buffer memory).
>
> I do not see a global array of char* header names as valuable. We should
> have a global set of SBuf header names instead, to optimize search and
> comparison.

Good point.

>
>> Primarily because parsing happens in small pieces and the end of a block
>> of input may not even be present when we have to scan the start of it.
>
> The "sliding window parser" is a different problem, actually. We have
> tried to discuss it several times already, without strong consensus.
> IMO, we need a good tokenizer to solve this problem for "small" content
> (like request headers) AND a buffer list (with an even better tokenizer)
> to solve this problem for "large" content (like chunked encoding). The
> tokenizer and list APIs proposed earlier had too many problems IMO, but
> that is a relatively minor detail we can fix.
>
> I hope to be able to propose a tokenizer soon.
>

Great.

My future thinking is working along the lines that MemBuf becomes
backed by a MemBlob store, and the Tokeniser can take either a MemBuf
or an SBuf to spawn SBufs over the same MemBlob.

>
> BTW, is HTTP/2 parsing based primarily on offsets ("header #5 is at
> offset 100") rather than string patterns ("the new header starts after
> CRLF sequence")?

HTTP/2 has binary frame blocks with type codes, size, and various
fields, much like a TCP/IP packet header. Inside the HEADERS frame type
there is a payload consisting of the HTTP headers in compressed form.

* HTTP/1 request-line fields are split into headers along the lines of
Host: (e.g. the method becomes a :method header, the URI becomes
:scheme, :host, and :path headers) in a HEADERS frame.

* The HTTP/1 response status also becomes a :status header in a HEADERS
frame. 1xx statuses are obsoleted.

* HTTP/1 mime headers become generic headers listed after those "special
ones" in the HEADERS frames.

* HTTP/1 entities/payload become DATA frames

* request and response are paired with a shared "stream ID".

I have not yet looked that closely at the header compression draft.
From what I can see of the comments, they are still concentrating on
the SPDY mindset of taking in plain-text HTTP/1-style headers as-is and
just compressing the bytes on a per-line basis, possibly with a binary
line-length prefix. That means we still need to walk headers in
sequence, as with ASN.1, but with the step size given, so we do not
have to search for delimiters between lines (only for ',' or '\0'
delimiters within lines, if the talk this week goes ahead).

As relates to SBuf/strings:

* there is expected to be a per-TCP-connection state structure
(array/stack/map/fifo/lifo/whatever) with an entry ID numbered 0-N
assigned to each header in decompressed form, which persists for the
lifetime of the TCP connection with constant churn.

* there is expected to be a static global binary->text mapping between
RFC registered header name/values, method names, etc.
 We already have this in char* / enum arrays; it will just mean
renumbering those entries at some point.

Amos
Received on Fri Nov 22 2013 - 11:47:59 MST