Re: Sbuf review at r9331

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Fri, 20 Feb 2009 09:56:59 -0700

On 02/20/2009 05:16 AM, Kinkie wrote:
> On Sun, Feb 1, 2009 at 9:08 PM, Alex Rousskov
> <rousskov_at_measurement-factory.com> wrote:
>> Here are a few Buffer comments based on r9331 of lp:~kinkie/squid/stringng

Executive summary: Many problems fixed, thank you. We are making
progress. Three known big issues remain:

1) SBufStore should be a stand-alone class (see the first two related
bullets below).

2) Confusion regarding pooling of SBuf objects is suspected (discussed
in a few bullets below).

3) Lack of detailed low-level review -- waiting for the resolution of
the above issues.

If you really hate doing changes related to (1) and (2), I can take over
once the other fixes are implemented. I am not excited about fixing
this, but I would rather do extra work than see you depressed. If this
happens, you can still be responsible for testing, integrating, and
polishing the result, but you might be happier finding my bugs than
listening to my complaints about yours :-)

Details below.

>> * Move SbufStore outside of SBuf. You are not buying any extra security or
>> protection by embedding it, but making the code more complex and less
>> reusable.
>
> This can be addressed in three ways that I can think of:
> 1- leaving as-is
> 2- bringing that out, with private constructors and friendship to
> select who can instantiate what.
> 3- bringing that out with big red signs in the class declaration not
> to (ab)use it.
>
> would 2. be acceptable to you?

4. Bring it out as a regular class.

There is no good reason to try to hide the store class or try to make
its use by others more difficult. Make the class reusable instead of
making it ugly and prohibiting its reuse. You are not making SBuf any
better by keeping SBufStore inside either. You are making it worse! Just
treat SBufStore as a regular class, with a specific purpose -- providing
refcountable and growable continues memory storage.

Whether there will be other users for SBufStore is unknown and
unimportant for now.

We should probably call the storage class MemBlob to emphasize its
purpose and disassociate it from one specific user (SBuf).

BTW, I am not sure the current set of SBufStore methods is correct. I
have a feeling the store is much smarter than it should be, duplicating
some of the SBuf functionality. Such problems are typical when two
classes are too tightly coupled -- they tend to lose their clear
identity and purpose. I will revisit this later, after MemBlob is isolated.

>> * SBufStore(const SBuf &S);
>>
>> SBufStore must not know about SBuf.
>
> It knowing about SBuf is a serious design help wrt SBuf::consume()
> If it didn't, we'd have to take a trip through char*.
> If it can't be accepted, I'll do so, but if it's possible at all I'd
> like to keep it.

Just like with store issue above, I fundamentally disagree here. The
only information stored in SBuf and not in SBufStore is offset and
length. You can pass that information to the constructor and other
members, as needed.

Whether "trips through char*" are a bad thing in this context is
debatable, BTW. You can pass SBufStore instead of char* but you would
still need very similar methods where raw char* is passed. I am fine
with two copies of methods, one char*-based and one SBufStore-based, but
I would not be surprised if keeping it simple with just char* would be a
better solution overall.

>> * \throw TextException always
>
> I've tried to be more specific, all more-specific exceptions inherit
> from TextException.
> I'd like to understand what's the benefit in having only one exception kind.

I do not think the above bullet was my review comment. I think it was a
part of your source code, copy-pasted to provide context for the "How
about simply not implementing the not-to-be-used methods?" comment.

I do not object to specific exceptions.

>> * Why is Sbuf MemPooled?! We are not supposed to allocate SBuf dynamically,
>> right? Did you mean to pool SbufStore instead?
>
> SBuf is a fixed-size object. MemPooling avoids heap fragmentation, doesn't it?

SBuf objects are not allocated on heap. They are allocated on stack. You
cannot fragment the stack :-). Pooling on-stack objects does not make
sense to me. Am I missing something?

>> * If you have introduced size_type, use it. Do not use size_t instead except
>> for size_type typedef.
>
> I've only left it for those cases where we go to char* land.
> I have no issues with changing it if you feel it's the best thing to do.

Using std::string as a guide, let's use size_type everywhere.

>> * I doubt clear() should free the storage by default. Usually, we clear to
>> fill again. You even do that yourself in SBuf::assign(). Deallocating and
>> allocating would cost more than doing nothing. Let's not deallocate by
>> default, for now.
>
> It is a possible optimization, but it's tricky and it involves
> checking refcounts and slicing history of the SBufStore, and the same
> practical result can be obtained by mempools (as it is now). In other
> words, I'd leave it as-is.

I do not foresee any new tricks and I believe one of us is confused
about MemPools in SBuf context (see above). AFAICT, SBuf::clear() can be
implemented as

  cow(); // noop if less than two references
  offset = 0;
  len = 0;

You need cow() or equivalent for many other SBuf methods so I am not
adding any new tricks here.

>> * Once again, please remove all exception specifications (I think you have
>> only empty throw()s left). For rationale, see my earlier reviews or Item 13
>> "A pragmatic look at exception specifications" in Sutter's "Exceptional C++
>> Style". Most if it should also be available at
>> http://www.gotw.ca/publications/mill22.htm
>
> Yes, I had left the empty ones since a previous comment of yours told
> that they may be non-useless.
> I've now removed them all.

Thank you. If you reread my original comment (2008/09/18), you will see
that I explicitly asked for the empty specifications to be removed.

>> * See my earlier StringNg comments regarding truncateSelf, chopSelf,
>> import*, export*, substrSelf, and headUntil* naming and existence. Trim them
>> down to basics, use standard names, and remove "Self".
>
> They're now all gone except for substr and substrSelf. The dualism is
> needed because the former creates a new String and the latter modifies
> in-place, which is not part of the std::string API. If you can suggest
> a better name, I'll gladly change it.

I bet std::string does not have substrSelf() because it can be easily
and cheaply written as

    s = s.substr(...);

If you insist on leaving substrSelf in, then please call it leave(from,
to), focus(from, to), or zoom(from, to).

>> * remove dumpStats. Just give access to Stats object(s) that can dump.
>
> Are you sure? We're all friends and we're not supposed to be meddling
> with the contents of stats, but this makes it possible to enforce the
> policy.

I am sure. Give _const_ access to enforce the policy.

>> * Why is cow() public?
>
> It doesn't hurt, and allows clients to anticipate to a String that
> they'll be needing to modify data. IMO it is similar to allowing
> reserve() to be public.

reserve() is a very important performance optimization when you expect
to add many little data pieces to a buffer.

Can you come up with examples where the code will become more efficient
with an explicit cow() call? I cannot and I do not want to waste time to
tell folks to remove cow() calls from their code because those calls can
only hurt performance (in cases where we did not reach the modifying
operation for whatever reason). So, unlike reserve(), explicit cow()
never helps and sometimes hurts, IMO. If you can give examples that
prove me wrong, please do.

>> * s/alloc_strategy/calcCapacity/ or similar. The method does not
allocate,
>> change, or return any "strategy".
>
> is "estimateCapacity" fine?

Sure.

>> * Check that assignment operator sets everything. I think nreallocs was
>> forgotten.
>
> That's intentional. nreallocs tracks the history of a SBuf, not of its contents.

I think this is a bad approach because folks expect assignment to assign
everything, including history. They expect a copy constructor and an
assignment operator to have the same basic outcome.

If you insist that your assignment should not be what an average
developer expects, add a comment explaining why some of the data members
are not assigned.

BTW, the whole idea of counting the number of reallocs in a
stack-allocated object that will be often copied to other
stack-allocated objects seems rather strange to me. You are not going to
get any usable stats from this, IMO.

If you move this statistic to the store object, you would not have any
of these problems.

>> * There are too many similar-but-different append() implementations.
Can't
>> you have one or two and have others call it with the right parameters?
>
> There are four, and the major difference is the type of data being
> appended (SBuf,char*+size, c-string, std::string). Plus one formatted

I am talking about the implementations (method bodies), not the
declarations. Can three out of four append()s be implemented as a call
to append(char*+size)?

>> * Please change your substr() semantics (as implemented) to match that of
>> std::string when "from" is too big.
>
> Will do.

Please do not forget.

>> * I do not understand why you cannot keep the experted message without
>> duplicating it in the code below. Can you explain, please?
>>
>> msg=b.exportCopy();
>> message=xstrdup(msg); //double-copy to avoid mismatching malloc and new
>> delete msg;
>
> On some platforms (windows at least) there are different allocators
> for malloc() and operator new().
> free(new object()) and delete(malloc(42)) are illegal on those platforms.

It is illegal (from the coding style point of view) everywhere,
regardless of whether it "works".

> I think I've seen that Squid overrides operator new with xmalloc,

You should not confuse new() implementation and semantics. No matter how
new() and delete() are implemented, one must not free(new()) and
delete(malloc()).

> but I'm not 100% sure of that, and thus the double-copy.
> Please let me know if it can safely be dropped.

I think I now understand where the bug is. Your exportCopy()
implementation is violating Squid rules for allocating char arrays. In
other words, either exportCopy() should use xmalloc like the rest of
Squid or you should change the rest of Squid code to use new[] for char
arrays. I recommend the former :-).

The double-copy above should be removed. SBuf users should not be
expected to remember to do that trick. SBuf code should be fixed to make
this trick unnecessary.

Thank you,

Alex.
Received on Fri Feb 20 2009 - 16:57:08 MST

This archive was generated by hypermail 2.2.0 : Sat Feb 21 2009 - 12:00:03 MST