Re: [RFC] Time to talk about StringNG merge again?

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Mon, 15 Apr 2013 16:49:17 -0600

Hi Kinkie,

    Here is my review of
https://code.launchpad.net/~squid/squid/stringng-cherrypick r12753. I
found approximately three bugs. I am also concerned about size_type
type, after starting to use MemBlob which has the same problem. The rest
is polishing.

> fgrep npos SBuf.cc
...
> if (pos == npos || pos > length() || n == 0) {
> if (n == npos || (pos+n) > length())
> if (startPos == SBuf::npos)
> return SBuf::npos;

For consistency sake inside SBUf methods, please use either npos
(preferred because it is shorter?) or SBuf::npos.

> if (InitialStore == NULL) {
> static char lowPrototype[] = "";
> InitialStore = new MemBlob(lowPrototype, 0);
> }

If possible, please use "new MemBlob(0)" and remove lowPrototype.

> static MemBlob::Pointer InitialStore = NULL;
> if (InitialStore == NULL) {
> static char lowPrototype[] = "";
> InitialStore = new MemBlob(lowPrototype, 0);
> }

I believe the above can be written simply as

    static MemBlob::Pointer InitialStore = new MemBlob(0);

> /** Request to resize the SBuf's store
> *
> * After this method is called, the SBuf is guaranteed to have at least
> * minCapacity bytes of total space, including the currently-used portion

s/total space/total buffer size/

we use the term "space" to denote free/usable space. See reserveSpace(),
for example.

> void
> SBuf::reserveCapacity(size_type minCapacity)
> {
> Must(0 <= minCapacity && minCapacity <= maxSize);
> reserveSpace(minCapacity-length());
> }

This does not look right. For example, if minCapacity is smaller than
length(), we should do nothing, but will throw instead. I think this
method should look something like this:

    Must(0 <= minCapacity && minCapacity <= maxSize);
    cow(minCapacity);

> // we're not concerned about RefCounts here,
> // the store knows the last-used portion. If
> // it's available, we're effectively claiming ownership
> // of it. If it's not, we need to go away (realloc)
> if (store_->canAppend(off_+len_, minSpace)) {
> debugs(24, 7, "not growing");
> return;
> }

That comment is probably stale because canAppend() is currently not
related to ownership of the buffer: It will happily return true for
shared buffers. AFAICT, reserveSpace() should be implemented using
something like this:

    Must(0 <= minSpace && minSpace <= maxSize - length());
    reserveCapacity(length() + minSpace);

The above two changes imply that reserve*() methods ensure
single-ownership of the buffer by the caller. This is not something
documentation of those methods currently guarantees, but I think it is
logical to assume that anybody calling one of these two methods wants to
modify the string and, hence, will need exclusive control over that
string. I cannot think of any reason to call those methods otherwise.

> typedef int32_t size_type;

Would it be better to make this unsigned? I do not think string sizes
can go negative. std::string is using an unsigned type, right? FWIW,
whenever I try to use MemBlob that has a signed size_type as well, I am
forced to do casts that would not exist otherwise. I think both types
should be changed to reflect what they are representing (a size of a
buffer).

On another note, would it be better to use MemBlob::size_type instead of
int32_t?

> u_int64_t alloc; ///<number of calls to SBuf constructors
> u_int64_t allocCopy; ///<number of calls to SBuf copy-constructor
> u_int64_t allocFromString; ///<number of copy-allocations from Strings
> u_int64_t allocFromCString; ///<number of copy-allocations from c-strings
> u_int64_t assignFast; ///<number of no-copy assignment operations

For consistency sake, please use uint64_t like the rest of Squid code does.

> /** Append operation for std::string
> *
> * Append the supplied std::string to the SBuf; extend storage as needed.
> *
> * \param string the std::string to be copied.
> * \param pos how many bytes to skip at the beginning of the c-string
> * \param n how many bytes to import into the SBuf. If it is SBuf::npos
> * or unspecified, imports to end-of-cstring
> */
> SBuf& append(const std::string &str, size_type pos = 0, size_type n = npos);

Please adjust the docs to talk about std::string, not c-string. str is
not 0-terminated and its end is determined by str.size().

> SBuf::append(const std::string &str, SBuf::size_type pos, SBuf::size_type n)
> {
> return append(str.data(), pos, n); //bounds checked in append()
> }

This will break when n is npos because str.data() is not 0-terminated.
The low-lever c-string append you call here assumes that the string
parameter is 0-terminated if n is npos. Please add the corresponding
missing test case.

> //we can assume that we'll need to append at least strlen(fmt) bytes,
> reserveSpace(strlen(fmt)*2);

The comment does not match the code which assumes we will need to append
at least twice the format length (a reasonable assumption IMO).

> //if max-length was specified, it was hit, and the tail is ignored.
> if (n != npos)
> return 0;

The comment and the code are incorrect. When the comparison result is
zero and max-length (n) was specified, we may not have "hit n".
Consider, for example,

    SBuf("foolong").compare(SBuf("foo"), 0, 5)

where the limit we "hit" is 3 (the length of the shortest string) and
not n (5). The above call should return "foolong is greater than foo"
(up to fifth character) but will return "the strings are the same up to
fifth character". Please add the corresponding missing test cases.

> * \param n compare up to this many bytes. if npos (default), to end-of-string

End of which string? I would rephrase that like this:
  If npos (default), compare whole strings.

> * \retval >0 argument of the call is greater than called SBuf
> * \retval <0 argument of the call is smaller than called SBuf
...
> int compare(const SBuf &S, SBufCaseSensitive isCaseSensitive = caseSensitive, size_type n = npos) const;

Please double check the description. I think the return value signs are
actually reversed. The implementation is correct (in this aspect).

Also, I suggest splitting this into two methods, one with a required
(first?) SBufCaseSensitive parameter and one without it. This will allow
callers to specify n without specifying isCaseSensitive and vice versa.
The shorter, inlined method will simply call the longer one with
caseSensitive, of course, so the implementation will not require more
code or performance overhead.

> //first sz bytes equal. longest string "wins"
> return commonCompareChecksPost(S);

It looks like commonCompareChecksPost() is not used anywhere else.
Please consider merging it with compare(). If nothing else, this would
avoid dealing with an ugly method name :-).

> const InstanceId<SBuf> id; ///< blob identifier

It is really an SBuf not memory blob ID, right? You may want to document
it as something like

  /// SBuf object identifier; does not change when string does,
  /// including during assignment

> void
> SBuf::forceSize(SBuf::size_type newSize)
> {
> Must(store_->LockCount() == 1);
> len_ = newSize;
> store_->size = newSize;
> }

Please throw if newSize exceeds the minimum of maxSize and blob's
capacity minus SBuf offset.

> * Adapt the SBuf internal state after external interference
> * such as writing into it via rawSpace.
> * \throw TextException if we
> */
> void forceSize(size_type newSize);

Unfinished comment after "if we".

> // Debug.h only needed for for SBuf::cow() debug statements.
> #include "Debug.h"

> bool cow(size_type minsize = npos) {
> debugs(24, DBG_DATA, "new size (minimum):" << minsize);

Any method or function with debugs() probably does not deserve to be
inlined and will probably not be inlined by the compiler due to debugs()
complexity even if we declare it inlined.

HTH,

Alex.
Received on Mon Apr 15 2013 - 22:49:22 MDT

This archive was generated by hypermail 2.2.0 : Tue Apr 16 2013 - 12:00:06 MDT