Re: StringNg review request

From: Kinkie <gkinkie_at_gmail.com>
Date: Tue, 21 Sep 2010 19:29:01 +0200

First of all, thanks.

On to the review :)

On Fri, Sep 17, 2010 at 1:25 PM, Alex Rousskov
<rousskov_at_measurement-factory.com> wrote:
> On 09/03/2010 09:21 AM, Kinkie wrote:
>
>> You'll find the branch at lp:~kinkie/squid/stringng
>> A few details on what to home on:
>> src/SBuf,* : the SBuf class proper
>
> The review for SBuf is below. The code is getting better overall, but there
> are still so many corrections that I will need more time to get the other
> classes reviewed.
>
> Review is based on lp:~kinkie/squid/stringng r9471.
>
> Original source code is prepended with "> ". The comments are below the
> source code they refer to. A single "." comment is there just to separate
> one piece of code from another.
>
>
> For SBuf.h:
>
>> #include "Debug.h"
>> #include "RefCount.h"
>
> Remove if not needed for SBuf.h.
> If SBuf.cci or any other file needs these, it should include them.

Fixed.

>> #include <string.h>
>
> Remove if not needed for SBuf.h

fixed.

>> #include <iostream>
>
> Use <iosfwd> instead if possible.

Done.

>> #include <sstream>
>
> Remove if not needed for SBuf.h. Does this need #define protection?

Done. Yes.

>> #define SQUIDSBUFPRINT(s) s.length(),s.exportRawRef()
>
> Missing parens around s.

It's not causing iesues, but won't hurt either. SquidString has the
same issue in trunk.

>> /**
>>  * Container for varioous SBuf class-wide statistics.
>
> spelling

fixed

>>  * \note read operations and compares are NOT counted.
>
> Remove. Many things are not counted.

ok

>
>> class SBufStats {
>>     public:
>
> It would be nice to run source formatting script against the branch.

Sure; Amos, could you point me at where to find it? Thanks!

>>     u_int64_t alloc; ///number of allocation operations
>
> Vague description: What is being allocated? SBuf? internal buffers?
> Consider: "number of constructor calls"

fixed.

> Here and elsewhere, doxygen comments are missing "<" to refer to the member
> on the left?

My ignorance, I didn't know the construct.

>>     u_int64_t live; ///number of free operations
>
> See above.
> Consider: "number of destructor calls"

fixed.

>>     u_int64_t qset; ///number of assigns/appends without content copy
>>     u_int64_t sset; ///number of assigns/appends with content copy
>
> I am not sure what the actual intent here is, but the description does not
> match use. qset is used in clear() and other methods unrelated to
> assigns/appends.

How about reworking the stats so that they actually count the
performed operations? (e.g. assign fast/slow, append fast/slow,
compare fast/slow etc.)?

>>     u_int64_t qget; ///number of get-ops not requiring full data scan
>>     u_int64_t sget; ///number of get-ops requiring a full data scan/copy
>
> It is not clear what "full data scan" is, why it is limited to get
> operations, and why the descriptions are not exactly symmetric.
> I did not review stats correctness because I do not know what most of these
> members really mean.

yes. see above.

>> /**
>>  * String-Buffer hybrid class (UniversalBuffer)
>
> Confusing to uninitiated.
> Consider: A string or buffer.

changed.

>>  * Features: refcounted backing store, cheap copy and substringing
>>  * operations, cheap-ish append operations, adaptive memory management
>
> Remove "cheap-ish append operations" because it is too vague to be useful
> and because there are no special append optimizations for now. At least I do
> not see them in this class and other classes should not matter here.

ok

> Remove "adaptive memory management" buzzwords.

ok

>>  * copy-on-write semantics to isolate change operations to each insatnce..
>
> spelling

fixed.

>>  * See http://www.cplusplus.com/reference/string/string/ for documentation
>> about
>>  * that API.
>
> Remove commercial site ad? Folks will find it if they need it.

done.

>>  * Notice that users MUST NOT hold pointers to SBuf, as doing so
>>  * moots the whole point of using refcounted buffers in the first place.
>>  */
>
> Remove as technically inaccurate. Besides, users MUST NOT do many things.

ok. How about adding some text discouraging pointers and references?
(const ref is ok unless non-const operations are needed)

>> class SBuf {
>>     public:
>>     typedef signed int size_type;
>
> I think we should support 3GB offsets and such. Could be handy for
> referencing memory cache areas without copying. Is there any reason to limit
> this to int? How about using size_t or ssize_t?

3Gb monolithic buffers are not a very good idea imvho (see the maxSize
tuneable). Given this, the best datatype is probably int32_t - helps
for printf-style portability.

>>     static const size_type npos = -1;
>
> Use static_cast<> to avoid dependence on size_type being signed. See
> std::string::npos.

done.

>>     ///  Flag used by search methods to define case sensitivity
>>     static const bool caseInsensitive = false;
>>     static const bool caseSensitive = true;
>
> Convert to enum?

What is the benefit? (my ignorance)

>>     /// Maximum size of a SBuf: 256Mb. Arbitrary, but should be sane.
>>     /// \note only checked when the SBuf needs to be extended, the first
>> alloc can exceed it
>>     static const size_type maxSize=0xfffffff;
>
> Remove "the first alloc can exceed it". Let's be consistent. Besides, the
> user does not control when things get allocated and does not know what
> "first alloc" is.

Removed.

>>     /** Default constructor. Creates a new empty SBuf.
>
> Do not document that a default constructor is a default constructor. Same
> for all other standard methods: constructors, destructors, operators.
> Can you _create_ something _old_?
>
> Replace with: /** Creates an empty SBuf.

done.

>
>>      *
>>      * A SBuf can be empty, but can never be undefined. In other words
>> there is no equivalent
>>      * of NULL. defined/undefined semantics need to be implemented by the
>> caller.
>>      */
>
> Remove: Avoid documenting what SBuf cannot be. If you insist on discussing
> the non-existent APIs, move that discussion to the SBuf.dox file.

How about moving a simplified version of it that to the class documentation?
I have committed such an attempt.

>>     /** Constructor: import c-style string
>>      *
>>      * Create a new SBuf containing a COPY of the contents of the
>>      * c-string
>>      * \param S the c string to be copied
>>      * \param Ssize the size of S. If it is SBuf::npos or unspecified,
>>      *              S must be null-terminated
>
> Remove Ssize parameter. You have too many sizes. If you want a two-parameter
> conversion from c-string, define another constructor with just two
> parameters. See std::string.

std::string API does not cover specifying pos and length when
importing c-strings.
While I agree that mimicking that API is good, isn't this variant more
effective than copy-all-then-substring (or even worse
manipulate-in-c-then-import) ?

>>      * \param pos how many bytes to skip at the beginning of the c-string
>>      * \param n how many bytes to import into the SBuf. If it is
>> SBuf::npos
>>      *              or unspecified, imports to end-of-cstring
>>      */
>>     explicit SBuf(const char *S, size_type Ssize = npos, size_type pos =
>> 0, size_type n=npos);
>
> .
>
>>     /** Import a char[] into a SBuf, copying the data.
>>      *
>>      * Assign a c-style string to a SBuf.
>
> The above two lines are saying essentially the same thing.
>
> You may use "c-string" instead of "c-style string". It is standard and
> shorter.
> Do not use char[] if the argument is not exactly char[].

fixed

>>      * It is the caller's duty to free the imported string, if needed.
>>      * \param S the c string to be copied
>>      * \param Ssize the size of S. If it is SBuf::npos or unspecified,
>>      *              S must be null-terminated
>>      * \param pos how many bytes to skip at the beginning of the c-string
>>      * \param n how many bytes to import into the SBuf. If it is
>> SBuf::npos
>>      *              or unspecified, imports to end-of-cstring
>>      */
>>     SBuf& assign(char const *S, size_type Ssize = npos, size_type pos = 0,
>> size_type n=npos);
>
> Again, remove Ssize parameter. It is not needed.

Are you sure? How about the case of non-null-terminated strings (e.g.
stuff coming out of read(2)? I'd have to offer two variants of the
API, one without Ssize and one with, and then there'd by type clashes
for specifying pos and n, which in turn means that a single call turns
into import-all-then-substring..

> Please use "const char *" instead of "char const *" for consistency sake.
> Same for a few more method declarations.

fixed

>>     /** clear the SBuf as if it was just created.
>>      *
>>      */
>>     void clear();
>
> s/clear/reset/ in the description. Document whether the already allocated
> memory stays allocated. That fact is important for performance-sensitive
> caller code.

Fixed. Changed code behaviour to free memory (it's a #define in .cc),
documented.

>>     /** Append operation
>>      *
>>      * Append the supplied SBuf to the current one; extend storage as
>> needed.
>
> .
>
>>      * If the caller wishes to maintain a copy of the unappended-to
>>      * SBuf she must do so <b>before</b> appending.
>
> Remove. The method is not const and that should be enough. Check other
> methods for the same reminders.

ok

>
>>      */
>>     SBuf& append(const SBuf & S);
>
> .
>
>>     SBuf& append(const char * S, size_type Ssize = npos, size_type pos =
>> 0, size_type n = npos);
>
> Remove Ssize.

See above: same question as for constructor and assign

>>     /** Assignment operation with printf(3)-style definition
>>      * \note arguments may be evaluated more than once, be careful
>>      *       of side-effects
>>      */
>>     SBuf& Printf(const char *fmt, ...);
>
> Add "Avoid this and other ... methods. For transition only."?
> We should be using ostream formatting instead.

Agreed, if only I could think of a sane way to do so (my ignorance).

>>     /** print a SBuf. generally only used by operator <<
>>      */
>>     std::ostream& print(std::ostream &os) const;
>
> Remove "generally only used by operator <<".

ok

>>     /** random-access read to any char within the SBuf
>>      *
>>      * \throw OutOfBoundsException when access is out of bounds
>>      */
>>     _SQUID_INLINE_ const char operator [](size_type pos) const;
>
> This one should not [promise to] check anything or throw. It should be very
> fast instead. See std::string.

ok

>>     /** random-access read to any char within the SBuf.
>>      *
>>      * \throw OutOfBoundsException when access is out of bounds
>>      */
>>     _SQUID_INLINE_ const char at (size_type pos) const;
>
> This one should indeed check and throw.

it does.

>>     /** direct-access set a byte at a specified operation.
>>      * \param pos the position to be overwritten
>>      * \param toset the value to be written
>>      * \throw OutOfBoundsException when pos is of bounds
>>      * \note performs a copy-on-write if needed.
>>      */
>>     void setAt(size_type pos, char toset);
>
> Perhaps document whether setting the next-to-last character is allowed?
> Sometimes string APIs allow for that...

Any character can be set. I do not understand why there may be a distinction..

>>      * \retval 0 argument an this are equal
>
> s/an/and/

fixed

> It would be better to avoid bare "this" in the above comments. It is awkward
> to read.

I can't think of any better way to call it. I've removed the
reference, but I'm not sure it's clearer.
Copy-paste:
    /** compare to other SBuf, strcmp-style
     *
     * Compare two SBufs. behaves like strcmp(3) or strncmp(3) depending
     * on the value of the case_sensitive parameter
     * \param case_sensitive one of SBuf::caseSensitive or SBuf::caseInsensitive
     * \retval >0 argument is greater
     * \retval <0 argument is smaller
     * \retval 0 argument has the same contents
     */

>>      *  \retval true argument is a prefix of the SBuf
>>      *  \retval false the SBuf is shorter than the argument, or it doesn't
>> match
>
> Do not document the false values because it is totally redundant and you
> often get it slightly wrong.

ok.

>>     /** Consume bytes at the head of the SBuf
>>      *
>>      * MemBuf emulation. Consume N chars at SBuf head, or to SBuf's end,
>>      * whichever is shorter. If more bytes are consumed than available,
>>      * the SBuf is emptied
>
> .
>
>>      * \param howmuch how many bytes to remove.
>
> Append "Could be zero."
> s/howmuch/howMuch/ or, better, s/howmuch/n/ for consistency

Changed. n it is.

>>     /** gets global statistic inforations
>
> spelling

fixed.

>>     static const SBufStats& getStats();
>
> Capitalize the first letter of static methods.

Done.

>>     /** Null-terminate the SBuf
>>      *
>>      * Make sure that the first byte AFTER the SBuf is a NULL.
>>      * Doesn't alter the SBuf contents, may trigger a backing store
>> reloaction
>>      */
>>     void terminate();
>
> Remove. Too dangerous if the effect is not persistent. You do not implement
> persistent termination, and we probably do not need a persistent termination
> anyway.

Moved to private member function. I like the idea of it being a
visible action (as opposed to hand-inlining it)

>>     /** Import a c-style string into the memory-managed world
>>      *
>>      * The char* MUST have been dynamically allocated via new,
>
> c-strings are usually allocated using new[] rather than new.

changed the documentation; since it's freed via xfree, it must be
allocated via xmalloc.
The implementation is transitional, what it SHOULD do is construct a
zero-copy MemBlob out of the user-supplied c-string, and it would be
freed as a MemBlob (currently xfree, when implementation is finalized
memFreeString).
But I agree that it is a dangerous API and the benefits probably do
not balance the risks, so I will not object to removing it.

>>      * and once imported MUST NOT be freed or messed with by the caller.
>>      * It will be freed via delete() by the memory management subsystem
>
> and so they should be freed using delete[] rather than delete.
>
> Moreover, your implementation is using xfree() and not delete[]!
>
> And this is inconsistent with xfree()-based API below. Pick one API and
> stick with it. Do we really need this error-prone method, BTW?

>
>>      * at the appropriate moment
>>      * (what "the appropriate moment" means is UNDEFINED to the caller).
>>      *
>>      * \param len the length of the String. If it's npos or unspecified,
>> then
>>      *    cstr must be NULL-terminated.
>>      */
>>     SBuf& absorb(char * cstr, size_type len=npos);
>
> .
>
>>     /** Export the contents of the SBuf to a C-string
>>      *
>>      * exports the SBuf by copying the contents to a newly-allocated c
>> string.
>>      * Freeing the returned char* is up to the caller, who  MUST xfree()
>> it.
>>      * The returned string's contents are NOT guarranteed not to contain
>> \0.
>>      * The returned string guarranteed to be NULL-terminated.
>>      */
>>     char* exportCopy() const;
>
> Add "Avoid. For transition only." ?

Done.
>>     /** Copy SBuf contents into user-supplied C buffer.
>>      *
>>      * Export a copy of the SBuf's contents into the user-supplied
>>      * buffer, up to the user-supplied-length.
>>      * The exported buffer is guarranteed to have a terminating \0, but
>>      * may truncate contents if the SBuf is too big.
>>      * \return 0 if all is OK
>>      * \return num the number of actually-copied chars, INCLUDING the \0
>>      * \note uses xstrncpy as low-level function
>>      */
>>     size_type copy(char *buf, size_type buflen) const;
>
> Do we really need to zero-terminate the above? This low-level method may be
> handy for I/O which does not require zero-termination. We should minimize
> the number of APIs that terminate, IMO.

Zero-termination is performed by xstrncpy(), AFTER the copy is done.
It doesn't cow().
Should I mention this in the documentation instead?

>>     /** exports a reference to the SBuf internal storage.
>>      * \warning THIS CALL IS DANGEROUS!
>
> The call itself is not dangerous. Using raw internal storage is.

Text changed

>>      * The caller MUST hold a reference to the SBuf while
>>      * accessing the storage, and MUST NOT attempt any operation on the
>> SBuf
>>      * while holding an external reference to the data, or she might
>> invalidate
>>      * the memory reference.
>
> The above is vague and too restrictive. Use this, based on
> std::string::data() description, instead:
>
> Returns a pointer to buffered content. No terminating null character is
> appended (use c_str() for that). The returned value points to an internal
> location which contents are guaranteed to remain unchanged only until the
> next call to a non-constant member function of the string object. Such a
> call may be implicit (e.g., when SBuf is destroyed upon leaving the current
> context).

Changed.

> Document whether this method can ever return NULL.

Done.

>>      * This is a very UNSAFE way of accessing the data.
>>      * The caller better be sure about what she's doing.
>>      * The returned string is NOT NULL-terminated. For that, use
>> exportTRef.
>>      * \see exportTRef
>>      * \note the memory management system guarrantees that the exported
>> region
>>      *    of memory will remain valid if the caller keeps holding
>>      *    a valid reference to the SBuf object and does not write or
>> append to
>>      *    it. For example:
>>      * \code
>>      * SBuf foo("some string");
>>      * const char *bar=foo.exportRawRef();
>>      * doSomething(bar); //safe
>>      * foo.append(" other string");
>>      * doSomething(bar); //unsafe
>>      * \endcode
>>      */
>
> I think the above should go into SBuf.dox

I'm not really sure; I'd like this to be immediately visible even when
the dox are not translated.

>>     char* exportRawRef() const;
>
> I do not like the Ref() suffixes. You are not really exporting a reference
> to internal storage. You are just providing a raw pointer to it. Call this
> "raw()" or, following std::string convention, "data()".

how about "internalStorage", or "getRawDataPtr"? I'd like to keep very
visible also in calling code the fact that this is a dangerous call.
Also, this call should be mostly usedful during the transition.

>>     /** exports a null-terminated reference to the SBuf internal storage.
>>      * \warning THIS CALL IS DANGEROUS!
>>      *
>>      *  The caller MUST hold a reference to the SBuf while
>>      * accessing the storage, and MUST NOT attempt any operation on the
>> SBuf
>>      * while holding an external reference to the data, or she might
>> invalidate
>>      * the memory reference.
>>      * This is a very UNSAFE way of accessing the data.
>>      * The caller better be sure about what she's doing.
>>      * \see exportRawRef
>>      * \note the memory management system guarrantees that the exported
>> region
>>      *    of memory will remain valid if the caller keeps holding
>>      *    a valid reference to the SBuf object and does not write or
>> append to
>>      *    it
>>      */
>>     char* exportTRef();
>
> See above. Similar comments apply here.
>
> Let's call this c_str(), following std::string convention.

done.

>
> You may want to say explicitly that our c_str() never returns NULL.

Done. I've changed the comment to mimic the one for exportRawRef()

>>     /** Get the length of the SBuf
>
> Replace with: Returns the number of bytes stored in SBuf.

done.

>>      * A SBuf is GUARRANTEED to be defined. It can be empty (length()==0)
>
> Remove.

done.

>>      */
>>     _SQUID_INLINE_ size_type length() const;
>
> .
>
>>     /** Get the length of the SBuf, as a signed integer
>>      *
>>      * Compatibility function for printf(3) which requires a signed int
>>      * \throw SBufTooBigException if the SBuf is too big for a signed
>> integer
>
> Kind of wrong exception, but not a big deal since it will have filename and
> line number.

Ok.

>>     /** Request to extend the SBuf's available store space.
>>      *
>>      * After the grow request, the SBuf is guarranteed to have at least
>> minsize
>>      * bytes of total backing store. If it has already more, do nothing.
>
>
> Remove "If it has already more, do nothing." to leave space for future
> optimizations. The caller should not really care about this anyway.

done

>>      * \throw SBufTooBigException if the user tres to allocate too big a
>> SBuf
>>      */
>>     void grow(size_type minsize);
>
> .
>
>>     /** slicing method
>>      *
>>      * extracts a SBuf containing the portion from <i>from</i> to
>> <i>to</i>,
>>      * intersected with the actual SBuf length.
>
> Awkward description, especially because we do not extract but erase.
> Consider: Removes SBuf prefix and suffix, leaving a sequence of bytes from
> <i>from</i> to <i>to</i>.

Fixed.

> However, I think we should follow std::string lead here and change the
> parameters to be "pos" and "n". I do not know why they picked those, but
> let's be consistent with them and with our other methods. They also do not
> clash with "to" and "from" words in the description :-).  I would also
> rename this to leave(), which will help you to check that all callers use
> the right params after the parameters change :-).
>
>
>>      * If to is less or equal to from, an empty SBuf is returned.
>
> Missing "than": If to is less than or equal to from, an empty SBuf is
> returned.

Fixed.

>>      * \param from start substringing from this byte
>>      * \param to end of substring. If longer than the SBuf
>>      *     it will be limited to its size.
>>      *     SBuf::npos means "to end of SBuf"
>>      * \throw OutOfBoundsException when from exceeds the size of the SBuf
>>      * \todo: possible optimization: if the string is at the end of the
>>      *   MemBlob we can decrease the MemBlob tail-pointer so that a
>> subsequent
>>      *   append will reuse the freed space.
>>      */
>>     SBuf& chop(size_type from, size_type to=npos);
>
> The \todo comment does not belong here. It belongs to .cc.

done.

>>     /** Remove characters in the toremove set at the beginning, end or
>> both
>>      *
>>      * \param toremove characters to be removed. Stops chomping at the
>> first
>>      *        found char not in the set
>>      * \param atBeginning if true (default), strips at the beginning of
>> the SBuf
>>      * \param atEnd if true (default), strips at the end of the SBuf
>>      */
>>     SBuf& chomp(const SBuf &toremove, bool atBeginning=true, bool
>> atEnd=true);
>
> Consider renaming to trim() to avoid visual clashes with chop() and a clash
> with Perl chomp that does not trim the prefix.

done. Perl's chomp was indeed the inspiration.

>>     MemBlob::Pointer store_; /// handle to the MemBlob holding the memory
>> block alive
>
> Consider: ///< memory block, possibly shared with other SBufs

done

>>     size_type off_; /// offset of this string from store_->mem
>
> Consider: ///< our content start offset from the beginning of the possibly
> shared store_
>
>
>>     size_type len_; /// length of the SBuf
>
> Consider: ///< number of our content bytes in the possibly shared store_
>
>
>>     u_int16_t nreallocs; /// number of reallocs this SBuf has been subject
>> to
>
> Since nreallocs is copied on assignment, it's true meaning is rather
> difficult to define. However, it is _not_ the number of reAlloc() calls that
> _this_ SBuf has seen.

nreallocs is meant to be a hint to the memory management routines to
define the likeness of needing a grow operation, and thus to define
how much headroom to leave. It's currently unused in favour of a
static (20% headroom + rounding) approach.
Any feedback on optimizing the heuristic and when to propagate the
hint is welcome.

>>     static SBufStats stats; /// class-wide statistics
>>     static const size_type maxSBufSize = 0xffffff; // tuneable
>
> Missing description. What is it? How is it different from maxSize defined
> earlier?

duplicate. Removing.

>>     static const size_type malloc_overhead = 8; //tuneable.
>
> Missing description. What is it?

Another yet-unused optimization for memory management.
The general code to define how much RAM to allocate for a SBuf
requesting "bytes" storage would be something similar to:
round_to_boundary(bytes + malloc_overhead + bytes *
multiplier_function(nreallocs)) - malloc_overhead

where:
- round_to_boundary (MemBlob::memAllocationPolicy) aims making the
memory block page-aligned as much as possible
- malloc_overhead is a constant (generally os- and libc- dependent)
which takes into account libc's malloc overhead
- mutliplier_function (SBuf::estimateCapacity) would try to give more
headroom to long-lived SBufs with lots of append operations

>>     static MemBlob::Pointer storePrototype;
>>     _SQUID_INLINE_ static MemBlob::Pointer getStorePrototype(); //only
>> used by anonymous constructor
>
> Static thing names should start with a capital letter.

done.

>>     _SQUID_INLINE_ char * buf() const;
>>     _SQUID_INLINE_ char * bufEnd() const;
>
> Why const if they allow SBuf modification?

They don't allow modifications. They are private functions to get
pointers and end of "our" store, for interfacing with the legacy C
world.

>>     _SQUID_INLINE_ size_type estimateCapacity(size_type desired);
>
> Should be const if not static.

Done. Should be inline, really.

> I noticed that you print "this" and similar raw pointers in debugging
> messages. In my experience, those quickly become unusable in non-trivial
> logs because when old objects are deleted, the new ones reuse their
> pointers. Consider adding unique SBuf identifiers instead. We can always
> remove them later if the overhead of extra 32 or 64 bits becomes too great
> after the transition is over.

Sounds a good idea. How about making this depend on a SBUF_DEBUG or
some other preprocessor macro?

> For SBuf.cci:
>
>> #include <limits.h>
>
> Should we use <climits> if available?

why not? done. Also added the check in configure.in

>> SBuf& SBuf::operator = (const SBuf & S)
>> {
>>     //stats set in called method
>>     return assign(S);
>> }
>>
>
> Consider removing the comment. It is obvious that this method does nothing
> but forwarding the call.

ok

>> int SBuf::plength() const
>
> Do we need to place the return type on its own line in .cci?

Should get fixed by running the formatter tool.

>> /**
>>  * returns the pointer to the first invalid char after this SBuf
>>  */
>
> Consider: returns the pointer to the first char after this SBuf end

changed

>
>> /**
>>  * Try to guesstimate how big a MemBlob to allocate.
>>  * The result is guarranteed to be bigger than the desired
>>  * size.
>
> s/to be bigger than the/to be at least the/

ok

>>  * \todo improve heuristics
>
> The \todo comment belongs to the implementation below, not description.

ok

>> SBuf::size_type SBuf::estimateCapacity (SBuf::size_type desired)
>> {
>>     size_type given;
>>
>>     given=(desired*12)/10;
>
> Consider: const size_type given = (desired*12)/10;

ok

> IIRC, 100% is a typical growth factor. If your 20% is not based on something
> specific, let's use 100% instead:
>
>    const size_type given = desired*2;

Ok. See comment above on the algorithm.

> We are counting on the MemBlock to round that up, right? Perhaps you should
> add a comment about that.
>
Ok

>
>> /**
>>  * Checks common to all string comparison functions, dealing with
>> undefined strings
>>  * \retval 0 check returns equal
>>  * \retval -1 check returns -1
>>  * \retval 1 check returns 1
>>  * \retval 2 check is not conclusive
>>  */
>> int SBuf::commonCompareChecksPre (const SBuf &S) const
>> {
>>     return 2;
>> }
>
> Remove until needed? The return value descriptions are confusing, IMO.

In the end it wouldoptimize a very unlikely case, so I agree.
Removing it.

>> MemBlob::Pointer SBuf::getStorePrototype()
>> {
>>     if (storePrototype==0)
>>         storePrototype = new MemBlob((char*)"",0);
>>     return storePrototype;
>> }
>
> Please document and re-implement without const cast of "".

Done, even if I'm not sure it's the best possible way.

>> bool
>> SBuf::isEmpty() const
>> {
>>     return (len_==0);
>> }
>
> Consider: return !len_;

What is the benefit? To me this way of doing it seems more readable

> For SBuf.cc:
>
>> SBuf::SBuf(const String &S)
>>     : off_(0),nreallocs(0)
>> {
>>     debugs(SBUF_DEBUGSECTION,DBG_DATA,
>>         MYNAME << "this=" <<(void *)this <<
>>         " copyfrom="<<(void *)&S);
>>     store_=new MemBlob(S.size());
>>     store_->append(S.rawBuf(),S.size());
>>     len_=S.size();
>>     ++stats.alloc;
>>     ++stats.sset;
>>     ++stats.live;
>> }
>
> Use assign() or, better, some common denominator to implement the guts of
> this constructor. The above implementation is already slightly inconsistent
> with the other similar ones. Its only going to get worse.

Fixed.
I'm not sure other cases can be reduced (they share storage)

>> SBuf::SBuf(const char *S, size_type Ssize, size_type pos, size_type n)
>>     : off_(0), nreallocs(0) //len_ and store_ defined inside the call
>> {
>>     //very similar to SBuf::append(const char*...). Can be refactored?
>
> Same as above.

I can't really think how. Most of the code adjusts locals, and would
need to be almost-duplicate-but-slightly-differetn anywys.

>
>>     Must(pos==npos || pos >= 0);
>>     Must(n==npos || n >= 0);
>>
>>     if (S==NULL) //to prevent using strlen later. Appending will also be a
>> nop
>>         Ssize=0;
>>     if (Ssize==npos)
>>         Ssize=strlen(S);
>>     if (n==npos)
>>         n=Ssize-pos;
>>     if (n > Ssize-pos)
>>         n=Ssize-pos;
>
> What if pos > Ssize?

Added a specific check, capping pos to Ssize.

>
>>
>>     const char *actual_start=S+pos;
>
> If S is NULL, actual_start will point to nowhere!
>
> I suspect you are also missing some other checks, but I did not verify
> because Ssize is in the way.

Added some more checks; S==NULL is now an explicit no-op.
On Ssize, see comment above.

>>     debugs(SBUF_DEBUGSECTION,DBG_DATA,
>>         MYNAME << "this=" << (void *)this << ", S=" << (void*) S <<
>>         ", Ssize=" << Ssize << ", pos=" << pos << ", n=" << n);
>>     store_=new MemBlob(estimateCapacity(n));
>>     store_->append(actual_start,n);
>>     len_=n;
>>     ++stats.alloc;
>>     ++stats.sset;
>>     ++stats.live;
>> }
>
> .
>
>> SBuf& SBuf::assign(const SBuf &S)
>> {
>>     debugs(SBUF_DEBUGSECTION,7,
>>         HERE << ", this@"<<(void *)this
>>         << ", other=" <<  ", other@"<< (void *)&S );
>>     ++stats.qset;
>>     if (&S==this) //assignment to self. Noop.
>>         return *this;
>
> "Noop" after a member change looks wrong. Remove the "Noop" part of the
> comment?
>

I've turned it into a real noop instead.
Cosmetic, but significant.

>> void SBuf::grow(size_type minsize)
>> {
>>     debugs(SBUF_DEBUGSECTION, 7,
>>         MYNAME << "this="<<(void *)this <<
>>         ",target=" << minsize);
>>     if (minsize > maxSize)
>>         throw SBufTooBigException(__FILE__,__LINE__);
>>     if (minsize < length()) { //nop. We're already big enough.
>
> The length() is irrelevant here. You need to check available space size, not
> allocated content size. More on the grow() checks below.

Hm.. length() is the safe choice. Otherwise we also have to take care
of offsets in the MemBlob.
The code was however buggy in that it didn't return.
A variation of the code may be :
    if (minsize+off_ < store_->bufSize) {
        debugs(SBUF_DEBUGSECTION, 7, HERE << "not growing");
        return;
    }

I've added it but #ifdef it away for now

>
>
>>         debugs(SBUF_DEBUGSECTION, 7, HERE << "not growing");
>>     }
>>     reAlloc(estimateCapacity(minsize));
>> }
>
> Optimization: We should probably memmove() before we reAlloc() if it helps
> to satisfy minsize.

If we realloc often, it probably means "this" SBuf needs more
headroom. see the allocation algorihtm above.
So reAlloc-ing is IMVHO not a bad thing per se.

>> void SBuf::clear()
>> {
>> #if 0
>>     //enabling this code path, the store will be freed and reinitialized
>>     store_=getStorePrototype(); //uncomment to actually free storage upon
>> clear()
>> #else
>>     //enabling this code path, we try to release the store without
>> deallocating it.
>>     // will be lazily reallocated if needed.
>>     if (store_->RefCountCount() == 1)
>>         store_->bufUsed=0;
>
> Please remove this bufUsed optimization. It appears to be broken (you
> increment bufUsed inconsistently), it is complex, and we can always add it
> later.

Done. I've switched the #if 0 around

>> SBuf& SBuf::append(const SBuf &S)
>> {
>>     debugs(SBUF_DEBUGSECTION,7,
>>         MYNAME << "this=" << (void *)this <<
>>         "appending=" <<(void *)&S<<")");
>>     if (!store_->canAppend(bufEnd(),S.length())) {
>>         grow(length()+S.length());
>>     }
>
> Please remove all such canAppend() guards. Just call grow(). It is much
> safer than risking to forget a guard. Your grow() should not reAlloc if
> growth is not needed. You may rename grow() to assureSpace() or similar if
> you prefer but do not split the growth need check and the realloc call in
> the "user" code.

How about refactoring it? A prepareForAppend that does the guards and
grows if needed (grow() still needs to be a public API, the caller may
know more than the low-level does)

>
>>     if (S.length()==0) { //appending a null sbuf.
>>         ++stats.qset;
>>         return *this;
>>     }
>
> Remove (optimizing unusual cases is harmful to the common ones) or at least
> move it above the grow() call.

ok

>>     debugs(SBUF_DEBUGSECTION,8,
>>         HERE << "appending");
>>     store_->append(S.buf(),S.length());
>>     len_+=S.length();
>
> This is missing cow(). And there are many similar bugs elsewhere. The right
> fix, IMO, is to hide store_ and provide two inlined protected methods
> similar to these:

No, it's not missing cow(). store_->canAppend guarrantees that by
directly appending we're not trampling on anyone else and that the
store has enough space to hold what we want to append. grow() will cow
for us otherwise.

> MemBlob &store() { cow(); return *store_; } /// memory access for writing
> const MemBlob &store() const { return *store_; } /// read-only memory access
>
> This way, you do not need to remember or think about when to call cow(),
> which could be tricky at times. The beginning of cow() can be inlined to
> make initial "do we need to copy?" check very cheap.

cow() doesn't know what we're trying to do, whether we're appending or
overwriting and if so how much or what.

>> SBuf& SBuf::append(const std::string &s, SBuf::size_type pos,
>> SBuf::size_type n)
>> {
>
> Use append(char*) or similar to implement this and other std::string
> methods.

Ok.

>> std::ostream& SBuf::dump(std::ostream &os) const
>> {
>>     os << "SBuf@" << (void*) this
>>         << ": memhandle:" << (void *)store_.getRaw()
>>         << "{sz:" << store_->bufSize
>>         << ",used:" << store_->bufUsed
>>         << ",refs: " << store_->RefCountCount() << "}"
>
> MemBlock should dump its own details instead.

Done.

>> int SBuf::compare(const SBuf &S, bool case_sensitive) const
>> {
>
> I think we should add an "n" parameter to limit the comparison scope to the
> first n characters. This will allow for much cheaper implementation of
> startsWith() and may be useful elsewhere as well. You are computing a
> similar parametr below anyway.

addingthe param can be done, I wouldn't touch startsWith however..

>
>> SBuf SBuf::consume(SBuf::size_type howmuch)
>
> s/howmuch/n/ for consistency with other methods.

Done.

>> {
>
> Start this method with:
>      const SBuf rv(substr(0,howmuch));

changed the method to
substr();chop();

> Do we really need to return consumed content?

It is pretty handy; see how SBufUtil.cc:SBufSplit exploits it; and it
is a very cheap object to build.

>>     if (howmuch==npos)
>>         howmuch=length();
>>     const size_type actual=min(howmuch,length());
>>     SBuf rv(substr(0,actual));
>>     off_+=actual;
>>     len_-=actual;
>>     if (length()==0) {//we can free the store_
>>         clear();
>>         return rv;
>>     }
>>     ++stats.qset;
>>     return rv;
>> }
>
> Please implement the code above using chop().

Done (I got there on my own, eventually)

>> SBuf& SBuf::chop(SBuf::size_type from, SBuf::size_type to)
>> {
>>     checkAccessBounds(from); //throws if out-of-bounds
>>     if (to==npos)
>>         to=length();
>
> Missing "else"

changed test to cover both cases instead

>>     if (to > length())
>>         to=length();
>>     ++stats.qset;
>>     if (to <= from) {
>>         clear();
>>         return *this; //stats handled by callee
>
> Remove comment because there are some stats increments above it.

Done

>> SBuf& SBuf::chomp(const SBuf &toremove, bool atBeginning, bool atEnd)
>> {
>>     ++stats.qset;
>>     char *p;
>
> Please declare "const" if possible and move inside each if-statement below,
> combining definition and initialization.

Done.

>>       if (atEnd) {
>>               p=bufEnd()-1;
>>         while (!isEmpty() &&
>> NULL!=memchr(toremove.buf(),*p,toremove.length())) {
>
> Yoda comparison style please avoid. At the very least calling a function
> when. No safer it is, but read difficult to. Please.

LOL!
Yes, my mentor.

>>             //current end-of-buf is in the searched set
>>             chop(0,length()-1); //remove last char and iterate
>
> Can the above expensive call be replaced with something like "--len_"? Even
> if the final replacement would have to be a litle more complex, we should
> probably still do it because calling chop() for every character is just too
> much overhead.

The only special treatment I can see is explicitly clear()ing at the
end of the call if
isEmpty() to free unused storage in case we trimmed all. Done so.

>
>>             --p;
>>         }
>>     }
>>     if (atBeginning) {
>>         p=buf();
>>         while (!isEmpty() &&
>> NULL!=memchr(toremove.buf(),*p,toremove.length())) {
>>             chop(1);
>
> Same here. ++off_?

++off_; --len_, but yes.

>> SBuf::size_type SBuf::find(char c, SBuf::size_type startpos) const
>> {
>>     if (startpos==npos)
>>         startpos=length();
>>     startpos=min(startpos,length());
>
> Missing "else" before the last line.

ok.

>> /**
>>  * checks whether the requested 'pos' is within the bounds of the SBuf
>>  * \throw OutOfBoundsException if access is out of bounds
>>  * \throw TextException if the SBuf is null
>
> I do not see TextException being thrown below.

Besides, the SBuf cannot be null.

>
>>  */
>> void SBuf::checkAccessBounds(SBuf::size_type pos) const
>> {
>>     if (pos < 0)
>>         throw OutOfBoundsException(*this,pos,__FILE__,__LINE__);
>>     if ((size_t)pos > (size_t)length()) //hack. gcc complains otherwise...
>
> I do not understand why this is needed because pos and length() are the same
> type. What did GCC complain about?

Probably leftovers from when size_type was not used consistently.

> Do we allow pos == length()?

No. Fixed.

>
>>         throw OutOfBoundsException(*this,pos,__FILE__,__LINE__);
>> }
>
> .
>
>> String SBuf::toString() const
>> {
>>     String rv;
>>     rv.limitInit(exportRawRef(),length());
>
> Can we just call buf() here instead of the scarry exportRawRef()?

yes.

> Thank you,

Thank you. You took lots of effort in this review, and I appreciate it.
While waiting for the next round, I'll move on to MemBlob.

-- 
    /kinkie
Received on Tue Sep 21 2010 - 17:29:08 MDT

This archive was generated by hypermail 2.2.0 : Wed Sep 22 2010 - 12:00:07 MDT