Re: SBuf issue

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Thu, 21 Nov 2013 08:16:35 -0700

On 11/21/2013 02:27 AM, Amos Jeffries wrote:
> While writing parser updates I have encountered the small problem that
> SBuf always *copies* data from non-SBuf sources. There is absolutely no
> way provided to use a pre-allocated I/O buffer as the backing store for
> SBuf objects. This includes pointing SBuf at an already allocated global
> char*.
> It always copies.

I recommend keeping it that way. While adding support for alternative
backing blobs is doable, we should really focus on reducing the number
of non-SBuf sources instead, at least for now. Our chances of properly
optimizing something (other than by pure luck) decrease exponentially as
the complexity increases, and we are just starting to use the new string
code with "simple" single-type backing.

> The way it is done makes sense in parsing where the input buffer is
> constantly being cycled/shifted by the I/O system and possibly has 500KB
> of area with a small sub-string needing to be pointed at by an SBuf for
> long periods.
> However, it also prevents us from doing two things:
> 1) having a global array of char* header names and field values. Which
> the parser points an SBuf at before emitting (avoiding a lock on the I/O
> buffer memory).

I do not see a global array of char* header names as valuable. We should
have a global set of SBuf header names instead, to optimize search and
comparison.

Furthermore, I suspect the "lock on the large I/O buffer" problem is a
red herring: When we are done with the tokenizer and buffer list
(discussed below), there should be no need for 500KB buffers, and
locking a small I/O buffer is better than allocating an even smaller
one. In fact, 500KB is not really an I/O buffer in most contexts. It is
a content accumulation buffer. And if/when system I/Os become that
large, they will be naturally paired with smaller RAM costs.

> 2) having a sub-string of an existing buffer temporarily pointed to by
> an SBuf for string comparison operations. e.g.
> char ioBuffer[1024];
> ... do some read()
> if (ioBuffer[0] == 'X') {
>     SBuf key(ioBuffer+1, 20);
>     ... use key as a reference to the ioBuffer[1..20] sub-string ...
> }
>
> This latter is what I am finding many needs for throughout the HTTP parser.

You should use MemBlob for ioBuffer and then (until we have a tokenizer)
create SBuf strings (from that buffer) when you want to access the read
content.

> Primarily because parsing happens in small pieces and the end of a block
> of input may not even be present when we have to scan the start of it.

The "sliding window parser" is a different problem, actually. We have
tried to discuss it several times already, without strong consensus.
IMO, we need a good tokenizer to solve this problem for "small" content
(like request headers) AND a buffer list (with an even better tokenizer)
to solve this problem for "large" content (like chunked encoding). The
tokenizer and list APIs proposed earlier had too many problems IMO, but
that is a relatively minor detail we can fix.

I hope to be able to propose a tokenizer soon.

BTW, is HTTP/2 parsing based primarily on offsets ("header #5 is at
offset 100") rather than string patterns ("the new header starts after
CRLF sequence")?

Thank you,

Alex.
Received on Thu Nov 21 2013 - 15:16:44 MST
