Re: StringNg review request: SBuf

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Tue, 19 Oct 2010 08:57:35 -0600

On 10/19/2010 08:28 AM, Kinkie wrote:
> On Mon, Oct 18, 2010 at 9:02 PM, Alex Rousskov wrote:
>> On 09/26/2010 04:47 PM, Kinkie wrote:
>>
>>>>>>> *
>>>>>>> * A SBuf can be empty, but can never be undefined. In other words
>>>>>>> there is no equivalent
>>>>>>> * of NULL. defined/undefined semantics need to be implemented by
>>>>>>> the
>>>>>>> caller.
>>>>>>> */
>>>>>>
>>>>>> Remove: Avoid documenting what SBuf cannot be. If you insist on
>>>>>> discussing
>>>>>> the non-existent APIs, move that discussion to the SBuf.dox file.
>>>>>
>>>>> How about moving a simplified version of it that to the class
>>>>> documentation? I have committed such an attempt.

>>>> You are still documenting negatives, which is counter-productive. I know
>>>> what you mean by "can never be undefined" but a new developer will not
>>>> know how to interpret that because there is no defined() function in C/C++.

>>> Is "A SBuf can be empty (zero-length), but always points to a valid
>>> memory address." clearer?

>> Not really because from the user point of view SBuf does not point to
>> anything at all. There are no "*" or "->" operators for SBuf.

> Tried clarifying again, "A SBuf can be empty (zero-length), but is
> always defined."

What do you mean by "defined"?

>>> As discussed on IRC, will move to
>>> const char * rawContent() const
>>> (-> raw access to memory for READING)
>>> char * rawSpace()
>>> (-> raw access to memory for WRITING, does cow(). Return a pointer to
>>> the first unused byte in the backend store)
>>>
>>> There is a problem with this approach though. If we write to the SBuf
>>> outside its import methods, we also need to signal it how much we
>>> wrote (e.g. a method "haveAppended"). But it's lots of layer-breaking
>>> :(
>>
>> Since we decided to provide write access to the raw buffer, the API has to
>> have a minimum set of methods, including appended(). There is nothing wrong
>> with that, IMO.
>
> I hope that in time we will be able to do without accessing the raw
> buffer, even if it'll mean making the SBuf API wider.

I doubt you will convince POSIX folks to redefine read(2) API to accept
SBufs. Until you do, we will need raw write access to SBuf or equivalent.

>>> The algorithm needs tweaking, but this should already
>>> give an idea about what it may be about. The useage scenario it is
>>> designed for is:
>>> SBuf s;
>>> while (s.append(something())) {
>>> SBuf read=s.consume(chunk)
>>> doSomethingWith(read);
>>> }
>>> s will get realloc'ed more and more; by giving more headroom we ensure
>>> that it will adapt to the useage pattern.
>>>
>>> SBuf::size_type SBuf::estimateCapacity (SBuf::size_type desired) const
>>> {
>>> if (nreallocs< 10)
>>> return 2*desired;
>>> if (nreallocs< 100)
>>> return 4*desired;
>>> if (nreallocs< 1000)
>>> return 8*desired;
>>> return 16*desired;
>>> }
>>
>> I understand what you are trying to do here, but I have no idea whether such
>> optimization would help or hurt. It would be nice to focus on essential
>> pieces first and then add questionable optimizations later, after they have
>> been tested and proved their effectiveness. It is often much easier to add a
>> good new optimization than to find and remove a bad old one.
>
> It'd be interesting to think of some indicators to tune the algorithm.
> How about some histograms for sizes and distribution of nrealloc's at
> SBuf destruction time, or the distribution of requested sizes (before
> algorithms are applied)?

My recommendation is to define exactly what you are trying to optimize
and only then design a set of statistics to measure the effectiveness of
those optimizations. The current approach seems to be the opposite: add
a bunch of statistics, look at how they change, and then wonder "What
does this mean anyway?". Both approaches can result in fast code, but
the former allows for meaningful discussion/review and, IMO, should be
more efficient, on average.

>>>>>>> _SQUID_INLINE_ char * buf() const;
>>>>>>> _SQUID_INLINE_ char * bufEnd() const;
>>>>>>
>>>>>> Why const if they allow SBuf modification?
>>>>>
>>>>> They don't allow modifications. They are private functions to get
>>>>> pointers and end of "our" store, for interfacing with the legacy C
>>>>> world.
>>>>
>>>> They should return "const char *" if they do not allow modification.
>>>
>>> I believe I wasn't clear: those are simply private mehtods meant to
>>> help with recurring activities.
>>
>> That does not answer my original question: Why const if they allow SBuf
>> modification? Either they should return "const char *" or they should not be
>> const themselves.
>
> They don't change the SBuf. But maybe I'm just misunderstanding the
> meaning of a const method.

  const SBuf s(...);
  *s.buf() = 'a'; // changes s; should this be caught by the compiler?

whether such change is outside buf() scope is debatable: Providing write
access to something an object points to is usually not considered
modification. In our case, I would argue that buffer is not something
SBuf points to (from API point of view); it is an integral part of SBuf.
Thus, if buf() returns a non-const buffer pointer, buf() must not be const.

HTH,

Alex.
Received on Tue Oct 19 2010 - 14:57:39 MDT

This archive was generated by hypermail 2.2.0 : Tue Oct 19 2010 - 12:00:04 MDT