Re: [RFC] Time to talk about StringNG merge again?

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Sat, 27 Jul 2013 10:39:57 -0600

On 07/27/2013 06:00 AM, Kinkie wrote:
>> Here is the use case I am thinking about:
>>
>> sbuf.reserveSpace(ioSize);
>> bytesRead = read(sbuf.rawSpace(), ioSize);
>> sbuf.forceSize(sbuf.size() + bytesRead);

> The other use case is:
> sbuf.reserveCapacity(newCapacity);
> sbuf.append(something).append(somethingelse).append(verylongstring).append(whoknows).

Sure, that is another use case for another method.

> This could be used e.g. to instantiate error pages from their
> templates (pseudo-code):
>
> Sbuf template(.....);
> Sbuf errorpage;
> errorpage.reserveCapacity(template * reasonable_scaling_factor);
> SBufTokenizer t(template);
> while (SBuf parsed=t.nextToken("%")) {
> errorpage.append(parsed).append(handleCode(t.peek());
> }
> errorpage.append(t.whatever_remains());
>
> In cases such as this it may not be needed to cow(), so why do it?

AFAICT, your example immediately above will not cause CoW, with or
without the current optimization in reserveSpace() because errorpage
SBuf is not shared with other users. There will be no reason to Copy
errorpage buffer.

The following sketch may be a better example:

    SBuf buf;
    buf.reserveCapacity(...);
    while (there is more data and there is buffer space) {
        buf.append(data);
        SBuf token = nextToken(buf);
        give token to somebody else
    }

In the above example, many tokens and a single buf share the same
underlying buffer. A straight cow()-based implementation of
reserveSpace() called by append() would result in one cow() per iteration.

My understanding is that your current optimization in reserveSpace()
will remove all cow()s in this case because while tokens and buf share
buffer content, they do not share buffer space and so no CoW is really
needed when appending to a shared buffer.

Is my understanding correct?

Do we want this optimization? I think we do.

>> The above case is already handled by rawSpace(), but now I am confused
>> why rawSpace() is implemented using unconditional cow() while
>> reserveSpace() appears to be optimizing something. That optimization
>> seems to be the key here. Perhaps rawSpace() should be deleted and
>> reserveCapacity() adjusted to use the same optimization?
>> reserveSpace(n) ought to be reserveCapacity(content size + n)
>
> Apart from this, I see the benefit of Amos' suggestion of having
> rawSpace also absorb the function of reserveCapacity.
> This also covers you observation that both reserveSpace and rawSpace
> are maybe not needed, one can do the job of both.

*If the optimization discussed above works*, then we should consider
having two or tree methods:

1a. Reserve total buffer capacity. Ensure exclusive buffer ownership.

1b. Reserve buffer space. Ensure exclusive buffer ownership.

2. Reserve N space bytes for the caller to append to. No guarantees
regarding buffer ownership are provided.

1b is optional but nice to have and costs only one line of code. I will
focus on (1a) vs. (2) below.

As you can see, (1a) and (2) semantics are very different. In some
contexts, the difference may not be important at all. In some, it may be
critical for good performance (if the optimization discussed above works).

I see only one potential problem with this approach: We cannot be
certain that some code is not going to first get space twice and only
then use the obtained space twice, as illustrated below:

    char *buf1 = bigBuffer.rawSpace(...);
    char *buf2 = bigBuffer.rawSpace(...);
    ...
    read into buf1;
    read into buf2; // corrupts buf1 content

Needless to say that the above buggy code would probably be spread over
several functions so that the bug will not be clearly visible.

I think we can try to combat the above by requiring that the pointer
returned by the rawSpace() call is used immediately.

HTH,

Alex.
Received on Sat Jul 27 2013 - 16:40:12 MDT

This archive was generated by hypermail 2.2.0 : Sat Jul 27 2013 - 12:00:50 MDT