Re: [RFC] Time to talk about StringNG merge again?

From: Kinkie <gkinkie_at_gmail.com>
Date: Sat, 27 Jul 2013 20:00:10 +0200

On Sat, Jul 27, 2013 at 6:39 PM, Alex Rousskov
<rousskov_at_measurement-factory.com> wrote:
> On 07/27/2013 06:00 AM, Kinkie wrote:
>>> Here is the use case I am thinking about:
>>>
>>> sbuf.reserveSpace(ioSize);
>>> bytesRead = read(sbuf.rawSpace(), ioSize);
>>> sbuf.forceSize(sbuf.size() + bytesRead);
>
>
>> The other use case is:
>> sbuf.reserveCapacity(newCapacity);
>> sbuf.append(something).append(somethingelse).append(verylongstring).append(whoknows).
>
> Sure, that is another use case for another method.
>
>
>> This could be used e.g. to instantiate error pages from their
>> templates (pseudo-code):
>>
>> Sbuf template(.....);
>> Sbuf errorpage;
>> errorpage.reserveCapacity(template * reasonable_scaling_factor);
>> SBufTokenizer t(template);
>> while (SBuf parsed=t.nextToken("%")) {
>> errorpage.append(parsed).append(handleCode(t.peek());
>> }
>> errorpage.append(t.whatever_remains());
>>
>> In cases such as this it may not be needed to cow(), so why do it?
>
> AFAICT, your example immediately above will not cause CoW, with or
> without the current optimization in reserveSpace() because errorpage
> SBuf is not shared with other users. There will be no reason to Copy
> errorpage buffer.
>
> The following sketch may be a better example:
>
> SBuf buf;
> buf.reserveCapacity(...);
> while (there is more data and there is buffer space) {
> buf.append(data);
> SBuf token = nextToken(buf);
> give token to somebody else
> }
>
> In the above example, many tokens and a single buf share the same
> underlying buffer. A straight cow()-based implementation of
> reserveSpace() called by append() would result in one cow() per iteration.
>
> My understanding is that your current optimization in reserveSpace()
> will remove all cow()s in this case because while tokens and buf share
> buffer content, they do not share buffer space and so no CoW is really
> needed when appending to a shared buffer.
>
> Is my understanding correct?

Yes; that's exactly the point. Append doesn't require cow. It only
does when there is not enough unused space in the memblob OR the SBuf
being appended to is not at the end of the used shared storage ( sorry
for the multiple negatives)

> Do we want this optimization? I think we do.

Good :) Then we agree.

>>> The above case is already handled by rawSpace(), but now I am confused
>>> why rawSpace() is implemented using unconditional cow() while
>>> reserveSpace() appears to be optimizing something. That optimization
>>> seems to be the key here. Perhaps rawSpace() should be deleted and
>>> reserveCapacity() adjusted to use the same optimization?
>>> reserveSpace(n) ought to be reserveCapacity(content size + n)
>>
>> Apart from this, I see the benefit of Amos' suggestion of having
>> rawSpace also absorb the function of reserveCapacity.
>> This also covers you observation that both reserveSpace and rawSpace
>> are maybe not needed, one can do the job of both.
>
>
> *If the optimization discussed above works*, then we should consider
> having two or tree methods:
>
> 1a. Reserve total buffer capacity. Ensure exclusive buffer ownership.
>
> 1b. Reserve buffer space. Ensure exclusive buffer ownership.
>
> 2. Reserve N space bytes for the caller to append to. No guarantees
> regarding buffer ownership are provided.
>
> 1b is optional but nice to have and costs only one line of code. I will
> focus on (1a) vs. (2) below.
>
> As you can see, (1a) and (2) semantics are very different. In some
> contexts, the difference may not be important at all. In some, it may be
> critical for good performance (if the optimization discussed above works).
>
> I see only one potential problem with this approach: We cannot be
> certain that some code is not going to first get space twice and only
> then use the obtained space twice, as illustrated below:
>
> char *buf1 = bigBuffer.rawSpace(...);
> char *buf2 = bigBuffer.rawSpace(...);
> ...
> read into buf1;
> read into buf2; // corrupts buf1 content
>
> Needless to say that the above buggy code would probably be spread over
> several functions so that the bug will not be clearly visible.
>
> I think we can try to combat the above by requiring that the pointer
> returned by the rawSpace() call is used immediately.

Yes. It is the caller's risk and responsibility not to mess with the
raw storage, and it should be avoided whenever possible. Several best
practices are going to be helpful avoid issues, but they can't be
technically enforced, only documented and reviewed.
Given this, I expect that the common case is going to be 2. This is
why I like Amos's suggestion more (two methods for the append-friendly
optimization, and one for accessing the internals).

-- 
    /kinkie
Received on Sat Jul 27 2013 - 18:00:23 MDT

This archive was generated by hypermail 2.2.0 : Sun Jul 28 2013 - 12:00:06 MDT