Re: SBuf review

From: Kinkie <gkinkie_at_gmail.com>
Date: Thu, 18 Sep 2008 22:14:06 +0200

On Thu, Sep 18, 2008 at 7:26 PM, Alex Rousskov
<rousskov_at_measurement-factory.com> wrote:
> On Thu, 2008-09-18 at 14:10 +0200, Kinkie wrote:
>
>> The feature branch is at https://code.launchpad.net/~kinkie/squid/stringng
>
> Here is another round of comments, based on the current SBuf.h in the
> above repository. I have marked my comments with these prefixes:
>
> *B1*: a blocking issue. I do not recommend proceeding much further until
> this issue is resolved because the code may not be accepted or would
> have to be changed significantly to address the issue.
>
> *B2*: I am not sure yet, but this may become a blocker for me. If you
> are not certain, please consider save us some time by implementing the
> requested changes. If you are certain that I am wrong, a discussion
> would be nice and may save us from trouble later.
>
> *B3*: Everything else.
>
>
> *B1* I still think that string-manipulation interfaces should be moved
> into a String class.

I could agree to that if I could see an equally efficient way to implement this.
I've called the class SBuf because I see it as being a bit of a String
and a bit of a Buf.
You may notice that there are NO functions which deal with encoding,
for instance. I don't plan to add them to this class.

> *B2* I still think that having NULL Strings is a bad idea.

Lazy instantiation?
I can agree to a name change. But then it's also a pretty
straightforward porting of a MemBuf construct with the same name.
Reasoning goes, if it's in MemBuf there may be a valid reason for it.

> *B2* I am not sure placing your classes in include/ is better than
> placing them in src/. We have discussed similar questions before. I do
> not know whether there was consensus, but I hope we agreed that only
> Squid-independent portability/wrapper code should live outside of src/.
> I think your code will (or should) be Squid-dependent. It will make a
> lot of things easier. It is difficult to develop outside of src/ and it
> does not buy us anything in this case.

I agree it should be Squid-dependent, as it will have to hook into
MemPools, Debug, CacheMgr and various other places.
I'm trying to follow the squid 3 coding guidelines as expressed in the
wiki, but maybe I misunderstood them.
I can move it to src/ no problem.

> *B2* class outOfBoundsException: public std::exception
>
> It is better to inherit all your customized exceptions from the existing
> TextException. Otherwise, the catching code will become unnecessary long
> and the exception class code will duplicate a lot of things.
> TextException itself should be polished and split or renamed to
> Exception, but that is a different topic. Most generic catching code
> should catch TextExceptions (which includes their children).

I've followed recommended c++ coding guidelines, but I have no issues
in implementing this.

> *B1* Exception specifications: type method(...) throw(...);
>
> Surprisingly (well, at least it was to me!), exception specifications
> are a bad idea in virtually all cases except for empty specifications.
> Empty specifications should also be avoided unless you Know What You Are
> Doing and Absulutely Sure About That.
>
> Please remove all your empty throw() exception specifications.
>
> Please comment out (but do not remove) your non-empty exception
> specifications. They are useful as documentation:
> type method(...); // throw(...)
> or perhaps
> type method(...); // throws ...
>
> Thinking I am nuts? Here is one of the authoritative references on this
> fascinating subject: http://www.gotw.ca/publications/mill22.htm

Ok. Leaving them documented is the most important thing to me.
I can also move them to doxygen comments so that they'll be more visible.

> *B1* outOfBoundsException::what() returns a string that does not exist
> outside of that method. One more reason to move non-trivial code outside
> of the header where we do not have to review it (yet) :-).
>
>
> *B2* Move helper class declarations outside of SBuf. Class is not a
> namespace. Nested classes complicate the overall impression from the API
> and tempt you to do things you really should not.

If possible at all, I'd prefer not to.
The reason why classes are placed like that is to enforce
accessibility constraints without using friend classes as much as
possible (see below)

> *B1* dump() methods should be constant.

Trivial, will do.

> *B3* If a class does not have both dump() and print() methods, let's
> call the only printing method print(). This will make it easier to
> manipulate printable objects for debugging.

The convention I followed is: dump() is for debugging, stats etc.
print() is for operator <<.
I can adhere to any other convention, as long as there's consensus -
ideally to the point where it's sanctioned in the coding guidelines.

> *B2* If SBufStore is refcounted, why is it not using the RefCountable
> API we already have? If this is some kind of optimization, I would
> prefer to see it done after the code is reviewed, merged, and debugged.

This code is very low-level. and it's also very self-contained. I aim
for maximum efficiency.

> *B2* Please make SBufStore data members private. They are too sensitive
> to expose them like that. With the sensitive info open, we will not
> catch all the abuses (or spend too much time explaining why a public
> interface should not be used).

The whole SBufStore class is a protected member of SBuf. Noone except
for SBuf and its derived or friend classes can even see objects of
that class, let alone manipulate them.
In this case I've deliberately broken object layering to increase
performance and code compactness significantly; otherwise SBuf would
have to proxy most of its methods to the underlying SBufStore, which
then would lack a significant understanding of what's going on. It
would also make code much harder to follow and read as the logic would
be split in two parts.

> *B3* Please rename SBufStore::init() to a more traditional allocate().
> Even your own comment says that the block is not initialized!

Ok.

> *B2* init() is missing asserts that the buffer has not been allocated
> twice. If that code is meant to be called immediately from constructors
> only, then at least make it private and add a comment about that. It is
> important to note that the object is in invalid state until init() has
> been called.
>
>
> *B1* SBufStore::init() and SBufStore in general should not change the
> reference counter. Reference counting is the prerogative of smart
> pointers, not the objects they point to (even if the object stores the
> counter). You need to add a smart pointer so that the code becomes clear
> and manageable. Again, I suggest using RefCountable API to start with.
> It will be easy to replace/optimize later, with no significant effect on
> user code.

It's there as a shortcut, but I can move that out of that function no problem.

> Again, this is why I keep suggesting that a polished API is agreed on
> first, before we start finding bugs in the unpolished API
> implementation.
>
>
> *B2* SBufStore constructors should be declared before init() and other
> methods. Same for SBuf.

It's cosmetic, will do.

> *B1* SBufStore(const SBuf &S) is not a copy constructor but the comment
> says it is.

Ok.

> *B2* SBufStore(const SBuf &S) makes implicit conversions possible. Do we
> really need these? Doesn't creating "store" from a buffer that uses that
> "store" make you feel uncomfortable?

No, it doesn't. See my previous comments about who's the only ones who
can do that.
If that was a public API, then I'd agree with you 100%; but it's not.

> *B1* SBufStore is missing a copy constructor and an assignment operator.
> You do not have to actually implement them, but you have to declare them
> to make sure the defaults are not used. Make them private if you think
> they should never be used.

Ok.

> *B3* It is probably too early to review statistics, but "alloc" does not
> seem to contain the "number of allocation operations". This is probably
> because it is maintained in the code that does not allocate anything
> (SBuf).

IIRC it counts the number of SBuf objects allocated, which may or may
not allocate a SBufStore.

> *B1* SBuf assignment does not handle "assignment to self".

Right. Is it safe to implement that as a noop?

> *B1* Appending interface and/or comments do not make sense:
>
>> 575 * Append to the supplied SBuf, <b>modifying</b> this.
>> SBuf append(SBuf& S)
>
> If we are appending TO the supplied SBuf, then the method should be
> const (we would not be modifying this) and should be named appendTo. I
> do not think we need such a method. If we are appending FROM the
> supplied SBuf, then that SBuf should be const (we will not be modifying
> it) and the comments should not talk about that supplied buffer
> maintenance (again, because we do not modify it).

Comment is broken. I'll fix it.
*this is not const, except for const methods and for those marked as
*Copy methods.

> The return value should be documented, especially if you return a copy
> of "this" buffer. Why would you do that?

Hm.. should return a reference, not a copy. Reason is:

foo.append(bar).append(gazonk).append(baz).chop();

> Most of the above comments apply to other SBuf::append() methods as
> well.
>
>
> *B1* Please review all methods for const-correctness. Comparison
> operators, for example, should have constant arguments.

Ok.

> *B2* I would remove numerous comments about what gets changed and what
> does not. Const declarations already say as much!

I prefer being redundant than unclear, but it's fine.

> *B3* Is dumpStats() needed? Is it replaceable with stats().dump() or
> similar? "Smaller" APIs are often "better" and have fewer bugs.

stats classes are not visible, but it can be arranged, and probably
makes more sense.

> *B3* slice() should probably be renamed to something like cut() or
> chop() since you are not producing slices, you are cutting away
> content.
>
>
> *B2* slice() should be implemented by calling other, more primitive
> chopping methods. Same for other methods that simply combine existing
> simpler actions. Please do not force us to review the same code many
> times.

I'll check. In my opinion, where doing otherwise doesn't make things
significantly more efficient, slice() should be the low-level
primitive, and the other methods the shortcuts.

> *B2* I am not sure search methods should throw exceptions when the thing
> is not found. Exceptions are expensive and complex things. Not finding
> something is a common case. The searcher calling find() should assume
> that the thing may not be there. If we want to provide a confident
> searcher a convenient but safe method, lets have two methods:
>
> // returns invalid position if not found (like std::npos)
> Position find(...) const;
> // throws if not found
> Position findOrThrow(...) const;

std::npos sounds like a good idea. Ignorance on my part, I didn't know
it existed.

> This gives user a choice. A confident user will be able to use/pass
> findOrThrow() results immediately, without checking, but safely.
>
>
> *B1* Please move the Tokenizer class into its own header. FWIW, I may
> not be able to review it until SBuf/String issues are resolved.

Sure, I have full intention of complying with the coding guidelines.

Regarding String, I recall that the aim of this class is to replace
MemBuf and char[] instances with something which increases efficiency
while retaining all the advantages of automatic memory management.

Thanks for taking the time and for being so thorough..

-- 
 /kinkie
Received on Thu Sep 18 2008 - 20:14:14 MDT

This archive was generated by hypermail 2.2.0 : Fri Sep 19 2008 - 12:00:04 MDT