Re: Sbuf review at r9331 from Kinkie on 2009-02-20 (squid-dev)

From: Kinkie <gkinkie_at_gmail.com>
Date: Fri, 20 Feb 2009 20:50:17 +0100

On Fri, Feb 20, 2009 at 5:56 PM, Alex Rousskov
<rousskov_at_measurement-factory.com> wrote:
> On 02/20/2009 05:16 AM, Kinkie wrote:
>> On Sun, Feb 1, 2009 at 9:08 PM, Alex Rousskov
>> <rousskov_at_measurement-factory.com> wrote:
>>> Here are a few Buffer comments based on r9331 of lp:~kinkie/squid/stringng
>
> Executive summary: Many problems fixed, thank you. We are making
> progress. Three known big issues remain:
>
> 1) SBufStore should be a stand-alone class (see the first two related
> bullets below).
>
> 2) Confusion regarding pooling of SBuf objects is suspected (discussed
> in a few bullets below).

Given your explanation it is now a certainty.
On the other hand, SBufStore objects would probably benefit from it,
wouldn't they?

> 3) Lack of detailed low-level review -- waiting for the resolution of
> the above issues.
>
> If you really hate doing changes related to (1) and (2), I can take over
> once the other fixes are implemented. I am not excited about fixing
> this, but I would rather do extra work than see you depressed. If this
> happens, you can still be responsible for testing, integrating, and
> polishing the result, but you might be happier finding my bugs than
> listening to my complaints about yours :-)

Now this is what I call a positive attitude! ;-)
Seriously: the code looks in a much better shape now than it did.
Thanks for prodding me.

> Details below.
>
>>> * Move SbufStore outside of SBuf. You are not buying any extra security or
>>> protection by embedding it, but making the code more complex and less
>>> reusable.
>>
>> This can be addressed in three ways that I can think of:
>> 1- leaving as-is
>> 2- bringing that out, with private constructors and friendship to
>> select who can instantiate what.
>> 3- bringing that out with big red signs in the class declaration not
>> to (ab)use it.
>>
>> would 2. be acceptable to you?
>
> 4. Bring it out as a regular class.
>
> There is no good reason to try to hide the store class or try to make
> its use by others more difficult. Make the class reusable instead of
> making it ugly and prohibiting its reuse. You are not making SBuf any
> better by keeping SBufStore inside either. You are making it worse! Just
> treat SBufStore as a regular class, with a specific purpose -- providing
> refcountable and growable continues memory storage.

You have convinced me.
I just ask you for the future, whenever you have objections about
something I design, please show me a code mock-up. My resistance to
change is almost often motivated by me not seeing a different way to
do things. OTOH, I think I have demonstrated a reasonable willingness
to change my mind.

> Whether there will be other users for SBufStore is unknown and
> unimportant for now.
>
> We should probably call the storage class MemBlob to emphasize its
> purpose and disassociate it from one specific user (SBuf).

Sure, no problem.

> BTW, I am not sure the current set of SBufStore methods is correct. I
> have a feeling the store is much smarter than it should be, duplicating
> some of the SBuf functionality. Such problems are typical when two
> classes are too tightly coupled -- they tend to lose their clear
> identity and purpose. I will revisit this later, after MemBlob is isolated.

It is smarter than needed, mostly for convenience. And yes, they are
tightly coupled.. MemBlob is a glorified struct, meant to serve the
Buffer...

>>> * SBufStore(const SBuf &S);
>>>
>>> SBufStore must not know about SBuf.
>>
>> It knowing about SBuf is a serious design help wrt SBuf::consume()
>> If it didn't, we'd have to take a trip through char*.
>> If it can't be accepted, I'll do so, but if it's possible at all I'd
>> like to keep it.
>
> Just like with store issue above, I fundamentally disagree here. The
> only information stored in SBuf and not in SBufStore is offset and
> length. You can pass that information to the constructor and other
> members, as needed.

Ok.

> Whether "trips through char*" are a bad thing in this context is
> debatable, BTW. You can pass SBufStore instead of char* but you would
> still need very similar methods where raw char* is passed. I am fine
> with two copies of methods, one char*-based and one SBufStore-based, but
> I would not be surprised if keeping it simple with just char* would be a
> better solution overall.

Simplest way to know is to just try it. It's not something that can't
be done in one or two hours probably.

>>> * \throw TextException always
>>
>> I've tried to be more specific, all more-specific exceptions inherit
>> from TextException.
>> I'd like to understand what's the benefit in having only one exception kind.
>
> I do not think the above bullet was my review comment. I think it was a
> part of your source code, copy-pasted to provide context for the "How
> about simply not implementing the not-to-be-used methods?" comment.

Ok.

> I do not object to specific exceptions.
>
>>> * Why is Sbuf MemPooled?! We are not supposed to allocate SBuf dynamically,
>>> right? Did you mean to pool SbufStore instead?
>>
>> SBuf is a fixed-size object. MemPooling avoids heap fragmentation, doesn't it?
>
> SBuf objects are not allocated on heap. They are allocated on stack. You
> cannot fragment the stack :-). Pooling on-stack objects does not make
> sense to me. Am I missing something?

No. But _I_ was. I'll de-poolify SBuf immediately.

>>> * If you have introduced size_type, use it. Do not use size_t instead except
>>> for size_type typedef.
>>
>> I've only left it for those cases where we go to char* land.
>> I have no issues with changing it if you feel it's the best thing to do.
>
> Using std::string as a guide, let's use size_type everywhere.

Ok.

>>> * I doubt clear() should free the storage by default. Usually, we clear to
>>> fill again. You even do that yourself in SBuf::assign(). Deallocating and
>>> allocating would cost more than doing nothing. Let's not deallocate by
>>> default, for now.
>>
>> It is a possible optimization, but it's tricky and it involves
>> checking refcounts and slicing history of the SBufStore, and the same
>> practical result can be obtained by mempools (as it is now). In other
>> words, I'd leave it as-is.
>
> I do not foresee any new tricks and I believe one of us is confused
> about MemPools in SBuf context (see above). AFAICT, SBuf::clear() can be
> implemented as
>
> cow(); // noop if less than two references
> offset = 0;
> len = 0;

This doesn't free any memory back to the system, which for long-lived
objects (think MemBuf-alikes) is not really a good thing IMVHO, for
instance because it skewers the reallocs counter..
What I currently do is implement a Prototype pattern, with a static
zero-length MemBlob which gets assigned. Is this reasonable?

> You need cow() or equivalent for many other SBuf methods so I am not
> adding any new tricks here.
>
>>> * Once again, please remove all exception specifications (I think you have
>>> only empty throw()s left). For rationale, see my earlier reviews or Item 13
>>> "A pragmatic look at exception specifications" in Sutter's "Exceptional C++
>>> Style". Most if it should also be available at
>>> http://www.gotw.ca/publications/mill22.htm
>>
>> Yes, I had left the empty ones since a previous comment of yours told
>> that they may be non-useless.
>> I've now removed them all.
>
> Thank you. If you reread my original comment (2008/09/18), you will see
> that I explicitly asked for the empty specifications to be removed.

Then I had once more misunderstood. Sorry...

>>> * See my earlier StringNg comments regarding truncateSelf, chopSelf,
>>> import*, export*, substrSelf, and headUntil* naming and existence. Trim them
>>> down to basics, use standard names, and remove "Self".
>>
>> They're now all gone except for substr and substrSelf. The dualism is
>> needed because the former creates a new String and the latter modifies
>> in-place, which is not part of the std::string API. If you can suggest
>> a better name, I'll gladly change it.
>
> I bet std::string does not have substrSelf() because it can be easily
> and cheaply written as
>
> s = s.substr(...);
>
> If you insist on leaving substrSelf in, then please call it leave(from,
> to), focus(from, to), or zoom(from, to).

it can be substituted, but then it also doesn't have consume() which
is a very handy tool for parsers for intance. The reworked Tokenizer
is IMO very lean, partly as a result of in-place chopping of Buffers.

zoom sounds fine. Otherwise, I'd dare suggest "slice"

>>> * remove dumpStats. Just give access to Stats object(s) that can dump.
>>
>> Are you sure? We're all friends and we're not supposed to be meddling
>> with the contents of stats, but this makes it possible to enforce the
>> policy.
>
> I am sure. Give _const_ access to enforce the policy.

Ok. This means that the getter will remain (IIRC it was also a
suggestion of yours to remove it).

>>> * Why is cow() public?
>>
>> It doesn't hurt, and allows clients to anticipate to a String that
>> they'll be needing to modify data. IMO it is similar to allowing
>> reserve() to be public.
>
> reserve() is a very important performance optimization when you expect
> to add many little data pieces to a buffer.
>
> Can you come up with examples where the code will become more efficient
> with an explicit cow() call? I cannot and I do not want to waste time to
> tell folks to remove cow() calls from their code because those calls can
> only hurt performance (in cases where we did not reach the modifying
> operation for whatever reason). So, unlike reserve(), explicit cow()
> never helps and sometimes hurts, IMO. If you can give examples that
> prove me wrong, please do.

I'll take the easy route and make it private for now. Making it public
again if one such example pops up is trivial.
I will however object to anyone doing anything stupid such as
append("") to trigger a cow().

> >> * s/alloc_strategy/calcCapacity/ or similar. The method does not
> allocate,
>>> change, or return any "strategy".
>>
>> is "estimateCapacity" fine?
>
> Sure.
>
> >> * Check that assignment operator sets everything. I think nreallocs was
>>> forgotten.
>>
>> That's intentional. nreallocs tracks the history of a SBuf, not of its contents.
>
> I think this is a bad approach because folks expect assignment to assign
> everything, including history. They expect a copy constructor and an
> assignment operator to have the same basic outcome.
>
> If you insist that your assignment should not be what an average
> developer expects, add a comment explaining why some of the data members
> are not assigned.

I'll ponder on this some more.
I don't believe it to be a huge issue tho, as it's part of the
heuristics that are mostly useful to estimateCapacity() - aka very
much behind-the-scenes work, which will certainly require tweaking
once StringNg is actuall used.
Some of the questions intentionally left unresolved are:
- what sizes and how many of them to use for MemBlobs' backing storage
- how much headroom to leave
- what algorithms to use to adapt the headroom left - if adapting is
useful at all

All of those estimates are concentrated in estimateCapacity. I do not
expect to get it right on the first try.

> BTW, the whole idea of counting the number of reallocs in a
> stack-allocated object that will be often copied to other
> stack-allocated objects seems rather strange to me. You are not going to
> get any usable stats from this, IMO.

I'm thinking in terms of data-members of other structs.
producers/consumers are the main targets of this.

> If you move this statistic to the store object, you would not have any
> of these problems.

It has its own set. In time , I expect to increase the reporting
capabilities of the two classes to also answer some basic questions
like "but is in the end this whole thing useful?" (e.g. what's the
average reference count for objects, what's the average size of
things, what's the average number of reallocs per object...)

> >> * There are too many similar-but-different append() implementations.
> Can't
>>> you have one or two and have others call it with the right parameters?
>>
>> There are four, and the major difference is the type of data being
>> appended (SBuf,char*+size, c-string, std::string). Plus one formatted
>
> I am talking about the implementations (method bodies), not the
> declarations. Can three out of four append()s be implemented as a call
> to append(char*+size)?

Three, I don't think. One or two, probably yes.

>>> * Please change your substr() semantics (as implemented) to match that of
>>> std::string when "from" is too big.
>>
>> Will do.
>
> Please do not forget.

Already done.

>>> * I do not understand why you cannot keep the experted message without
>>> duplicating it in the code below. Can you explain, please?
>>>
>>> msg=b.exportCopy();
>>> message=xstrdup(msg); //double-copy to avoid mismatching malloc and new
>>> delete msg;
>>
>> On some platforms (windows at least) there are different allocators
>> for malloc() and operator new().
>> free(new object()) and delete(malloc(42)) are illegal on those platforms.
>
> It is illegal (from the coding style point of view) everywhere,
> regardless of whether it "works".
>> I think I've seen that Squid overrides operator new with xmalloc,
>
> You should not confuse new() implementation and semantics. No matter how
> new() and delete() are implemented, one must not free(new()) and
> delete(malloc()).
>
>> but I'm not 100% sure of that, and thus the double-copy.
>> Please let me know if it can safely be dropped.
>
> I think I now understand where the bug is. Your exportCopy()
> implementation is violating Squid rules for allocating char arrays. In
> other words, either exportCopy() should use xmalloc like the rest of
> Squid or you should change the rest of Squid code to use new[] for char
> arrays. I recommend the former :-).
> The double-copy above should be removed. SBuf users should not be
> expected to remember to do that trick. SBuf code should be fixed to make
> this trick unnecessary.

That's exactly the logic behind that code, and I'll change it as you say.

-- 
    /kinkie

Received on Fri Feb 20 2009 - 19:50:27 MST

This archive was generated by hypermail 2.2.0 : Sat Feb 21 2009 - 12:00:03 MST