Re: SBuf review at r9370

From: Kinkie <gkinkie_at_gmail.com>
Date: Thu, 26 Feb 2009 18:24:03 +0100

On Thu, Feb 26, 2009 at 1:46 AM, Alex Rousskov
<rousskov_at_measurement-factory.com> wrote:
> Hi Kinkie,
>
>    Here are a few corrections for
> http://bazaar.launchpad.net/~kinkie/squid/stringng at r9370.
>
> * Remove isLiteral and the corresponding code/logic. We might add this
> very minor performance optimization as a non-public Blob member once the
> code is known to work well. For now, the code is confusing enough
> without it.

The Literal thing was actually added to answer to a very specific
need. IIRC it was because without it valgrind complained about leaked
memory in case of a global static SBuf.
If you're sure, I'll remove it.

> Once isLiteral is removed, compare with the prototype pointer instead of
> testing isLiteral in cow().
>
> * Remove isImported. Copy and then free the buffer when importing
> instead. Same motivation as in the isLiteral comment above.

This too has a IMO a very practical use: it allows us an easy path to
get into the SBuf world i/o buffer which have been created by the i/o
layer.
If you're sure, I'll remove it.

> * SBuf copy constructor and assignment operator still initialize members
> differently.

Fixed. nreallocs is now imported by copy-constructor.

> * Some throw() declarations are still left in the code. Grep for them.

Are you sure? There's only one that I could find, in the virtual
destructor for OutOfBoundsException, and in that case it's needed to
match the signature of TextException.

> * cow() logic feels wrong because grow() does not just grow(). Grow is
> discussed below. The cow() code should be something very simple like:
>
>    if (we are alone)
>        return false; // no need to copy

The only difference I could find is an isLiteral check..

>    store_ = new MemBuf(*store_); // get ourselves an exclusive copy of
> the blob
>    return false;
> No grow() magic -- we are getting an exclusive copy, not growing something.

MemBlob has disabled copy-constructor and assignment operator by
choice, as it gets handled through (refcount) pointers. (BTW: is this
a violation of the coding guidelines?)
It's certainly possible to add it, it's just a different style.

The only extra operation performed by the current implementation is to
run a re-evaluation of the blob store sizing heuristics: at the end
it's a cheap operation compared to the copying, and since we're
copying we may as well check it.
I'll change if you confirm me that you have also considered this side
of the argument.

I however agree with you that the current cow()/grow() logic is flawed
in that a certain amout of growing is always performed, regardless the
fact that the MemBlob is used or not. Which means that a SBuf which
gets aliased and cow()ed a lot will grow and waste memory.

My suggestion: reintroduce reAlloc() which gets called by cow() and
does the low-level work, and have both cow() and grow() perform
checks, heuristics and call reAlloc(). How would you like this?

> * roundto(5,5) should return 5 but returns 10. The comment and
> implementation should be fixed. The method probably does not belong here
> though.

Reimplemented and re-checked. Now works as advertised.
Moved to libmiscutil.

> * estimateCapacity() is too complex to understand its performance
> effects, especially since nreallocs is a very strange animal. I suggest
> that we keep it very simple and add optimizations later, if needed.
> Something like
>
>    granularity = desired < PageSize ? 16 : PageSize;
>    return roundto(desired, granularity)
>
> is much easier to comprehend, analyze, and optimize. Let's postpone
> complex optimizations.

I agree, with a SLIGHTLY more complex design to match the current
standard MemPool String sizes.
Stuff like:
desired *= constant_growth_factor;
if (desired< ShortStringsSize)
  return ShortStringsSize;
if (desired < MediumStringsSize)
  return ShortStringsSize;
if (desired < BigStringsSize)
  return BigStringsSize;
if (desired < 2kBuffersSize)
  return BigStringsSize;
return roundto(desired,PageSize);

We may want to discuss whether squidSystemPageSize should be static to
this module (as it is now) or belongs to the global namespace.

> * To me, compare() returns the opposite of what strcmp() does. "This"
> should be treated as the first parameter of strcmp() and not the second
> one, IMO.

Gah!
Fixed and added explicit cross-checks to unit tests.

> * startsWith() is buggy -- it ignores the contents of its parameter's
> buffer.

Does it?
Its implementation reads: "if *this is too short to match, then it
can't match. Otherwise does a SBuf equality test on a a substr() of
*this long just as the parameter..

> * SBuf::consume() returns a copy of SBuf. It should return a reference
> to this or void.

Er.. no.
Consume shortens the SBuf by chopping off the head and returns the chopped part.
See how it's used in the tokenizer.

Given a:
SBuf data="some very long string";

then:
SBuf chopped=data.consume(5);

is equivalent to:
SBuf chopped=data.substr(0,5);
data.substrSelf(5,SBuf::npos);

or to:
SBuf chopped=data.substr(0,5);
data=data.substr(5,SBuf::npos);

It is just (IMVHO) more readable clear in its purpose.
a MemBuf-alike consume() makes no sense in that it's just an alias for
substrSelf(0,target).

> * Move "if !canAppend then grow" logic to a single method. Call it
> reserve(). More on grow() below.

(I need to check this out)

> * grow() comments are wrong. The method does not always allocate a new
> buffer.

It says so in the comments, just not at the very beginning.
I've changed the wording somewhat, but here all my limitations as a
non-native speaker show up.
Please suggest a different wording if you feel it's not clear.

> * grow() implementation is odd. It ignores the remaining blob space
> size. Am I missing something? I have a feeling you implemented something
> other than grow() so that you can rely on its strange combination of
> do-not-grow and grow-even-if-you-do-not-have-to logic in the rest of the
> code. This needs to be fixed in grow() and in the callers().

It's up to the caller to determine whether it needs to grow the store.
It can't be done in MemBlob itself, since it doesn't know what part of
the blob it needs to copy (unless we code it up so that it trusts the
caller, but then it won't really make coupling looser)
If the caller calls grow() when it's not appropriate, it will degrade
performance by unnecessarily copying data, but it won't break the
model or cause harm.

> * I wonder if SBuf::grow() should be moved to MemBlob that may
> (eventually) have a low-level knowledge on how to do that efficiently.
> What do you and others think?

see previous comment.

> * Please use .length() everywhere you can. Do not mix .len_ and .length().

Ok, checked.
I've left len_ where it's being assigned to another object's len_ or
in very low-level stuff (e.g. object dumping, stuff doing pointer
maths).

> * s/shortcut: same length and same store/shortcut: same store, offset,
> and length/

Ok

> * SBuf::consume() comments are stale, talking about NULL.

Ok.

> * "actual" in SBuf::consume() should be const.

Ok.

> * getStats should return a const reference not a copy of, potentially
> large, stats object.

Done.

> * Please use XMIN or XMAX instead of manually comparing values with a
> ?-operator. It is much easier to see and check the logic that way.

now using xmin() and xmax() as per Amos' suggestion.

> * Exceptions thrown by SBuf will lack source code location info, right?
> I think we need to add a ThrowHere(ex) macro that supplies that to the
> ex parameter.

No, their constructor passes it through.
ThrowHere may make sense in a more general way - see our discussions
on warns() etc.

> * bufEnd() should return the pointer to the byte after the last one. You
> can call it end(). If you find yourself accessing the last byte often,
> add last() that returns a char.

I need to rework a few things then.
last() is not really needed.

> * s/prototype/starter/ or s/prototype/primer/
> And no need to prefix the corresponding member with store, IMO.
> * s/A SBuf is GUARRANTEED to be defined.//
> Any C++ object is "defined".
> * cow() declaration comments are stale

Fixed.

> * Please rename substrSelf() as discussed.

I don't recall reaching a decision about the target name.
We can decide to drop it entirely (make it private), but the last
agreement I recall is that since it's not part of the std::string API,
we're on our own in defining it, and we haven't really defined it.

> * Please capitalize static members.

> * s/roundto/RoundTo/

Now in libmiscutil. Should it be capitalized anyways?

> * s/importCString/absorb/ or s/importCString/absorbCString/
> because you are taking full control over the string memory.
>
> * roundto is declared as a global static function in a .cci file. It
> should probably be declared as an inline.

Fixed.

> * Use static_cast in SBuf::plength() and elsewhere. It would be much
> easier to find and tune casting later if needed.
>
> * s/store/blob/ or s/store/mem/
> We have enough store* stuff in squid.
>
> * exportCopy comments are wrong or stale. It does not return NULL.  It
> should throw on memory allocation errors (but probably does not yet).
> The comment should say that the returned string is zero terminated.

Fixed.

> * Remove comments after #includes. Some tools break on C++ comments
> there and you are partially lying and partially not supplying any new
> info there.

Fixed.

>    #include "config.h" //various typedefs
>    #include "Debug.h"  //debugs etc
>
>
> At this point, I got "Launchpad is offline for scheduled maintenance. We
> should be back soon." error so I will have to look at MemBlob later.

Thanks.

-- 
    /kinkie
Received on Thu Feb 26 2009 - 18:34:04 MST

This archive was generated by hypermail 2.2.0 : Fri Feb 27 2009 - 12:00:02 MST