Re: SBuf review

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Fri, 19 Sep 2008 14:36:14 +1200

Kinkie wrote:
> On Thu, Sep 18, 2008 at 7:26 PM, Alex Rousskov
> <rousskov_at_measurement-factory.com> wrote:
>> On Thu, 2008-09-18 at 14:10 +0200, Kinkie wrote:
>>
>>> The feature branch is at https://code.launchpad.net/~kinkie/squid/stringng
>> Here is another round of comments, based on the current SBuf.h in the
>> above repository. I have marked my comments with these prefixes:
>>
>> *B1*: a blocking issue. I do not recommend proceeding much further until
>> this issue is resolved because the code may not be accepted or would
>> have to be changed significantly to address the issue.
>>
>> *B2*: I am not sure yet, but this may become a blocker for me. If you
>> are not certain, please consider save us some time by implementing the
>> requested changes. If you are certain that I am wrong, a discussion
>> would be nice and may save us from trouble later.
>>
>> *B3*: Everything else.
>>
>>
>> *B1* I still think that string-manipulation interfaces should be moved
>> into a String class.
>
> I could agree to that if I could see an equally efficient way to implement this.
> I've called the class SBuf because I see it as being a bit of a String
> and a bit of a Buf.
> You may notice that there are NO functions which deal with encoding,
> for instance. I don't plan to add them to this class.

I kind of fuzzily disagree, the point of this is to replace MemBuf +
String with SBuf. Not implement both again independently duplicating stuff.

>
>> *B2* I still think that having NULL Strings is a bad idea.
>
> Lazy instantiation?
> I can agree to a name change. But then it's also a pretty
> straightforward porting of a MemBuf construct with the same name.
> Reasoning goes, if it's in MemBuf there may be a valid reason for it.
>

Agnostic. One of the open bugs is to clear up handling of NULL strings.
They are needed in abstract for some use-cases such a URI handling,
where parts of the URI may be missing. Whether they need to be NULL or
0-length is iffy.

>> *B2* I am not sure placing your classes in include/ is better than
>> placing them in src/. We have discussed similar questions before. I do
>> not know whether there was consensus, but I hope we agreed that only
>> Squid-independent portability/wrapper code should live outside of src/.
>> I think your code will (or should) be Squid-dependent. It will make a
>> lot of things easier. It is difficult to develop outside of src/ and it
>> does not buy us anything in this case.
>
> I agree it should be Squid-dependent, as it will have to hook into
> MemPools, Debug, CacheMgr and various other places.
> I'm trying to follow the squid 3 coding guidelines as expressed in the
> wiki, but maybe I misunderstood them.
> I can move it to src/ no problem.

I looked at this for IPAddress. It's rather tricky to extend things in
include/ safely, so src/ is preferred for everything not an OS
compatibility code.

Hopefully in a new sub-folder (src/buffers/ ?) consistent with the
SourceLayout plans.

>
>> *B2* class outOfBoundsException: public std::exception
>>
>> It is better to inherit all your customized exceptions from the existing
>> TextException. Otherwise, the catching code will become unnecessary long
>> and the exception class code will duplicate a lot of things.
>> TextException itself should be polished and split or renamed to
>> Exception, but that is a different topic. Most generic catching code
>> should catch TextExceptions (which includes their children).
>
> I've followed recommended c++ coding guidelines, but I have no issues
> in implementing this.
>
>> *B1* Exception specifications: type method(...) throw(...);
>>
>> Surprisingly (well, at least it was to me!), exception specifications
>> are a bad idea in virtually all cases except for empty specifications.
>> Empty specifications should also be avoided unless you Know What You Are
>> Doing and Absulutely Sure About That.
>>
>> Please remove all your empty throw() exception specifications.
>>
>> Please comment out (but do not remove) your non-empty exception
>> specifications. They are useful as documentation:
>> type method(...); // throw(...)
>> or perhaps
>> type method(...); // throws ...
>>
>> Thinking I am nuts? Here is one of the authoritative references on this
>> fascinating subject: http://www.gotw.ca/publications/mill22.htm
>
> Ok. Leaving them documented is the most important thing to me.
> I can also move them to doxygen comments so that they'll be more visible.
>
>> *B1* outOfBoundsException::what() returns a string that does not exist
>> outside of that method. One more reason to move non-trivial code outside
>> of the header where we do not have to review it (yet) :-).
>>
>>
>> *B2* Move helper class declarations outside of SBuf. Class is not a
>> namespace. Nested classes complicate the overall impression from the API
>> and tempt you to do things you really should not.
>
> If possible at all, I'd prefer not to.
> The reason why classes are placed like that is to enforce
> accessibility constraints without using friend classes as much as
> possible (see below)
>
>> *B1* dump() methods should be constant.
>
> Trivial, will do.
>
>> *B3* If a class does not have both dump() and print() methods, let's
>> call the only printing method print(). This will make it easier to
>> manipulate printable objects for debugging.
>
> The convention I followed is: dump() is for debugging, stats etc.
> print() is for operator <<.
> I can adhere to any other convention, as long as there's consensus -
> ideally to the point where it's sanctioned in the coding guidelines.

The whole stats output dump() system IIRC was partially completed moving
to sstream << .
So a specific sstream<< and ostream << 'friends' might be better than
print() and dump() anyways.

>
>> *B2* If SBufStore is refcounted, why is it not using the RefCountable
>> API we already have? If this is some kind of optimization, I would
>> prefer to see it done after the code is reviewed, merged, and debugged.
>
> This code is very low-level. and it's also very self-contained. I aim
> for maximum efficiency.

Ditto on the RefCountable. This was the API I was asuming you would use
anyway. It's already well debugged, and tested, and low-level inline.
Just inherit from RefCount and use I think.

If we find in testing that we need the efficiency, we can test the two.

>
>> *B2* Please make SBufStore data members private. They are too sensitive
>> to expose them like that. With the sensitive info open, we will not
>> catch all the abuses (or spend too much time explaining why a public
>> interface should not be used).
>
> The whole SBufStore class is a protected member of SBuf. Noone except
> for SBuf and its derived or friend classes can even see objects of
> that class, let alone manipulate them.
> In this case I've deliberately broken object layering to increase
> performance and code compactness significantly; otherwise SBuf would
> have to proxy most of its methods to the underlying SBufStore, which
> then would lack a significant understanding of what's going on. It
> would also make code much harder to follow and read as the logic would
> be split in two parts.
>
>> *B3* Please rename SBufStore::init() to a more traditional allocate().
>> Even your own comment says that the block is not initialized!
>
> Ok.
>
>> *B2* init() is missing asserts that the buffer has not been allocated
>> twice. If that code is meant to be called immediately from constructors
>> only, then at least make it private and add a comment about that. It is
>> important to note that the object is in invalid state until init() has
>> been called.
>>
>> *B1* SBufStore::init() and SBufStore in general should not change the
>> reference counter. Reference counting is the prerogative of smart
>> pointers, not the objects they point to (even if the object stores the
>> counter). You need to add a smart pointer so that the code becomes clear
>> and manageable. Again, I suggest using RefCountable API to start with.
>> It will be easy to replace/optimize later, with no significant effect on
>> user code.
>
> It's there as a shortcut, but I can move that out of that function no problem.

Shortcuts like that in refcounting are a debuggers nemesis. The counts
need to be kept symetrical at the layer doing the counting/copying.

The auth memory leak bugs this year have all been due to refcount
asymetry. I'm suspicious that the FD leaks still undiscovered are also.

>
>> Again, this is why I keep suggesting that a polished API is agreed on
>> first, before we start finding bugs in the unpolished API
>> implementation.
>>
>>
>> *B2* SBufStore constructors should be declared before init() and other
>> methods. Same for SBuf.
>
> It's cosmetic, will do.
>
>> *B1* SBufStore(const SBuf &S) is not a copy constructor but the comment
>> says it is.
>
> Ok.
>
>> *B2* SBufStore(const SBuf &S) makes implicit conversions possible. Do we
>> really need these? Doesn't creating "store" from a buffer that uses that
>> "store" make you feel uncomfortable?
>
> No, it doesn't. See my previous comments about who's the only ones who
> can do that.
> If that was a public API, then I'd agree with you 100%; but it's not.
>
>> *B1* SBufStore is missing a copy constructor and an assignment operator.
>> You do not have to actually implement them, but you have to declare them
>> to make sure the defaults are not used. Make them private if you think
>> they should never be used.

With a fatal assert or loud DBG_CRITICAL debugs() complaint to catch bad
behavior.

>
>> *B3* It is probably too early to review statistics, but "alloc" does not
>> seem to contain the "number of allocation operations". This is probably
>> because it is maintained in the code that does not allocate anything
>> (SBuf).
>
> IIRC it counts the number of SBuf objects allocated, which may or may
> not allocate a SBufStore.
>

Agree. Nice if you want to report:
   N SBuf using X memory.
But the store still needs to do counting and link somehow to produce
that report.

>> *B1* SBuf assignment does not handle "assignment to self".
>
> Right. Is it safe to implement that as a noop?

Worst case, do it with a fatal assertion and we will find out in testing.

>
>> *B1* Appending interface and/or comments do not make sense:
>>
>>> 575 * Append to the supplied SBuf, <b>modifying</b> this.
>>> SBuf append(SBuf& S)
>> If we are appending TO the supplied SBuf, then the method should be
>> const (we would not be modifying this) and should be named appendTo. I
>> do not think we need such a method. If we are appending FROM the
>> supplied SBuf, then that SBuf should be const (we will not be modifying
>> it) and the comments should not talk about that supplied buffer
>> maintenance (again, because we do not modify it).
>
> Comment is broken. I'll fix it.
> *this is not const, except for const methods and for those marked as
> *Copy methods.
>
>> The return value should be documented, especially if you return a copy
>> of "this" buffer. Why would you do that?
>
> Hm.. should return a reference, not a copy. Reason is:
>
> foo.append(bar).append(gazonk).append(baz).chop();
>
>> Most of the above comments apply to other SBuf::append() methods as
>> well.
>>
>>
>> *B1* Please review all methods for const-correctness. Comparison
>> operators, for example, should have constant arguments.
>
> Ok.
>
>> *B2* I would remove numerous comments about what gets changed and what
>> does not. Const declarations already say as much!
>
> I prefer being redundant than unclear, but it's fine.
>
>> *B3* Is dumpStats() needed? Is it replaceable with stats().dump() or
>> similar? "Smaller" APIs are often "better" and have fewer bugs.
>
> stats classes are not visible, but it can be arranged, and probably
> makes more sense.
>
>> *B3* slice() should probably be renamed to something like cut() or
>> chop() since you are not producing slices, you are cutting away
>> content.
>>

+1. The parsing stuff I ran into last string change mostly uses
copy-n-cut() semantics. less change later to name it so here now.

>>
>> *B2* slice() should be implemented by calling other, more primitive
>> chopping methods. Same for other methods that simply combine existing
>> simpler actions. Please do not force us to review the same code many
>> times.
>
> I'll check. In my opinion, where doing otherwise doesn't make things
> significantly more efficient, slice() should be the low-level
> primitive, and the other methods the shortcuts.
>
>> *B2* I am not sure search methods should throw exceptions when the thing
>> is not found. Exceptions are expensive and complex things. Not finding
>> something is a common case. The searcher calling find() should assume
>> that the thing may not be there. If we want to provide a confident
>> searcher a convenient but safe method, lets have two methods:
>>
>> // returns invalid position if not found (like std::npos)
>> Position find(...) const;
>> // throws if not found
>> Position findOrThrow(...) const;
>
> std::npos sounds like a good idea. Ignorance on my part, I didn't know
> it existed.

I usually expect things like search should return a signed 'offset'. -N
for errors and invalids.

>
>> This gives user a choice. A confident user will be able to use/pass
>> findOrThrow() results immediately, without checking, but safely.
>>
>>
>> *B1* Please move the Tokenizer class into its own header. FWIW, I may
>> not be able to review it until SBuf/String issues are resolved.
>
> Sure, I have full intention of complying with the coding guidelines.
>
> Regarding String, I recall that the aim of this class is to replace
> MemBuf and char[] instances with something which increases efficiency
> while retaining all the advantages of automatic memory management.
>
>
> Thanks for taking the time and for being so thorough..
>

/2c followup. Dont have tiume to review the code itself today though sorry.

Amos

-- 
Please use Squid 2.7.STABLE4 or 3.0.STABLE9
Received on Fri Sep 19 2008 - 02:36:33 MDT

This archive was generated by hypermail 2.2.0 : Fri Sep 19 2008 - 12:00:04 MDT