Re: [RFC] Time to talk about StringNG merge again?

From: Kinkie <gkinkie_at_gmail.com>
Date: Fri, 5 Jul 2013 07:51:12 +0200

On Tue, Apr 16, 2013 at 12:49 AM, Alex Rousskov
<rousskov_at_measurement-factory.com> wrote:
> Hi Kinkie,
>
> Here is my review of
> https://code.launchpad.net/~squid/squid/stringng-cherrypick r12753. I
> found approximately three bugs. I am also concerned about size_type
> type, after starting to use MemBlob which has the same problem. The rest
> is polishing.

Ok, thanks. I have some doubts about size_type as well, I am a bit
worried about touching it at this stage in development, but it may be
better now than never.

>> fgrep npos SBuf.cc
> ...
>> if (pos == npos || pos > length() || n == 0) {
>> if (n == npos || (pos+n) > length())
>> if (startPos == SBuf::npos)
>> return SBuf::npos;
>
> For consistency sake inside SBUf methods, please use either npos
> (preferred because it is shorter?) or SBuf::npos.

ok, npos it is.

>> if (InitialStore == NULL) {
>> static char lowPrototype[] = "";
>> InitialStore = new MemBlob(lowPrototype, 0);
>> }
>
> If possible, please use "new MemBlob(0)" and remove lowPrototype.
>
>
>> static MemBlob::Pointer InitialStore = NULL;
>> if (InitialStore == NULL) {
>> static char lowPrototype[] = "";
>> InitialStore = new MemBlob(lowPrototype, 0);
>> }
>
> I believe the above can be written simply as
>
> static MemBlob::Pointer InitialStore = new MemBlob(0);

Yes, it can. Done.

>> /** Request to resize the SBuf's store
>> *
>> * After this method is called, the SBuf is guaranteed to have at least
>> * minCapacity bytes of total space, including the currently-used portion
>
> s/total space/total buffer size/
>
> we use the term "space" to denote free/usable space. See reserveSpace(),
> for example.

Ok.

>> void
>> SBuf::reserveCapacity(size_type minCapacity)
>> {
>> Must(0 <= minCapacity && minCapacity <= maxSize);
>> reserveSpace(minCapacity-length());
>> }
>
> This does not look right. For example, if minCapacity is smaller than
> length(), we should do nothing, but will throw instead. I think this
> method should look something like this:
>
> Must(0 <= minCapacity && minCapacity <= maxSize);
> cow(minCapacity);
>
>
>> // we're not concerned about RefCounts here,
>> // the store knows the last-used portion. If
>> // it's available, we're effectively claiming ownership
>> // of it. If it's not, we need to go away (realloc)
>> if (store_->canAppend(off_+len_, minSpace)) {
>> debugs(24, 7, "not growing");
>> return;
>> }
>
> That comment is probably stale because canAppend() is currently not
> related to ownership of the buffer: It will happily return true for
> shared buffers. AFAICT, reserveSpace() should be implemented using
> something like this:
>
> Must(0 <= minSpace && minSpace <= maxSize - length());
> reserveCapacity(length() + minSpace);

The comment is badly worded but the code is correct: canAppend() tells
us if the end of our buffer is at the end of the used portion of the
MemBlob, and there is enough space in the unused MemBlob portion to
contain the data we plan to use. If the buffer is shared, this
guarantee may become stale at the next append operation to any SBuf
sharing the same MemBlob.

A decision needs to be made if these two methods have different
purposes or are to be simply different expressions of the same
underlying concept (as it is now).
IMO, thinking about it, it does make sense that reserveSpace also
guarantees that after the call the MemBlob will be exclusively owned
by this SBuf (aka, blindly cow()), while reserveCapacity can mostly be
meant as an optimization, to hint the SBuf memory management system
that an append operation of a known size is about to happen, so that
only one cow is performed.
What do you think?

> The above two changes imply that reserve*() methods ensure
> single-ownership of the buffer by the caller. This is not something
> documentation of those methods currently guarantees, but I think it is
> logical to assume that anybody calling one of these two methods wants to
> modify the string and, hence, will need exclusive control over that
> string. I cannot think of any reason to call those methods otherwise.

See above.

>> typedef int32_t size_type;
>
> Would it be better to make this unsigned? I do not think string sizes
> can go negative. std::string is using an unsigned type, right? FWIW,
> whenever I try to use MemBlob that has a signed size_type as well, I am
> forced to do casts that would not exist otherwise. I think both types
> should be changed to reflect what they are representing (a size of a
> buffer).

I have the same doubt. I'm just a bit afraid to change this late. Will
try to do as an isolated change. So far this is NOT done.

> On another note, would it be better to use MemBlob::size_type instead of
> int32_t?

currently they're both int32_t. I believe it's more readable with the
extra level of indirection, at the cost of having to manually keep
them in sync.
Added a comment to highlight this fact.

>> u_int64_t alloc; ///<number of calls to SBuf constructors
>> u_int64_t allocCopy; ///<number of calls to SBuf copy-constructor
>> u_int64_t allocFromString; ///<number of copy-allocations from Strings
>> u_int64_t allocFromCString; ///<number of copy-allocations from c-strings
>> u_int64_t assignFast; ///<number of no-copy assignment operations
>
> For consistency sake, please use uint64_t like the rest of Squid code does.

Done.

>> /** Append operation for std::string
>> *
>> * Append the supplied std::string to the SBuf; extend storage as needed.
>> *
>> * \param string the std::string to be copied.
>> * \param pos how many bytes to skip at the beginning of the c-string
>> * \param n how many bytes to import into the SBuf. If it is SBuf::npos
>> * or unspecified, imports to end-of-cstring
>> */
>> SBuf& append(const std::string &str, size_type pos = 0, size_type n = npos);
>
> Please adjust the docs to talk about std::string, not c-string. str is
> not 0-terminated and its end is determined by str.size().

Done.

>> SBuf::append(const std::string &str, SBuf::size_type pos, SBuf::size_type n)
>> {
>> return append(str.data(), pos, n); //bounds checked in append()
>> }
>
> This will break when n is npos because str.data() is not 0-terminated.
> The low-lever c-string append you call here assumes that the string
> parameter is 0-terminated if n is npos. Please add the corresponding
> missing test case.

This led to restructuring the append operation for C-string, which is
now missing the pos parameter, just like std::string does.

>> //we can assume that we'll need to append at least strlen(fmt) bytes,
>> reserveSpace(strlen(fmt)*2);
>
> The comment does not match the code which assumes we will need to append
> at least twice the format length (a reasonable assumption IMO).

Nod. changed comment to match code.

>> //if max-length was specified, it was hit, and the tail is ignored.
>> if (n != npos)
>> return 0;
>
> The comment and the code are incorrect. When the comparison result is
> zero and max-length (n) was specified, we may not have "hit n".
> Consider, for example,
>
> SBuf("foolong").compare(SBuf("foo"), 0, 5)
>
> where the limit we "hit" is 3 (the length of the shortest string) and
> not n (5). The above call should return "foolong is greater than foo"
> (up to fifth character) but will return "the strings are the same up to
> fifth character". Please add the corresponding missing test cases.

Right. n shouldn't be compared to npos, but to sz.
I've added your exact test-case.

>> * \param n compare up to this many bytes. if npos (default), to end-of-string
>
> End of which string? I would rephrase that like this:
> If npos (default), compare whole strings.

Changed.

>> * \retval >0 argument of the call is greater than called SBuf
>> * \retval <0 argument of the call is smaller than called SBuf
> ...
>> int compare(const SBuf &S, SBufCaseSensitive isCaseSensitive = caseSensitive, size_type n = npos) const;
>
> Please double check the description. I think the return value signs are
> actually reversed. The implementation is correct (in this aspect).

Yes, changed description.

> Also, I suggest splitting this into two methods, one with a required
> (first?) SBufCaseSensitive parameter and one without it. This will allow
> callers to specify n without specifying isCaseSensitive and vice versa.
> The shorter, inlined method will simply call the longer one with
> caseSensitive, of course, so the implementation will not require more
> code or performance overhead.

Ok. I'm calling the two shorthand versions cmp and casecmp respectively
(please let me know if you'd prefer the naming-convention compliant
caseCmp instead)

>
>
>> //first sz bytes equal. longest string "wins"
>> return commonCompareChecksPost(S);
>
> It looks like commonCompareChecksPost() is not used anywhere else.
> Please consider merging it with compare(). If nothing else, this would
> avoid dealing with an ugly method name :-).

Ok.
While looking at it, I also realized that the use of strcmp for the
low_level operation is buggy - it makes the whole method not \0-clean.
Switching to memcmp and hand-rolled for case-sensitive and insensitive
comparison respecitvely.

>> const InstanceId<SBuf> id; ///< blob identifier
>
> It is really an SBuf not memory blob ID, right? You may want to document
> it as something like
>
> /// SBuf object identifier; does not change when string does,
> /// including during assignment

Done with a small variation:
    /// SBuf object identifier; does not change when contents do,
    /// including during assignment

>> void
>> SBuf::forceSize(SBuf::size_type newSize)
>> {
>> Must(store_->LockCount() == 1);
>> len_ = newSize;
>> store_->size = newSize;
>> }
>
> Please throw if newSize exceeds the minimum of maxSize and blob's
> capacity minus SBuf offset.

Ok, throwing SBufTooBigException in that case.

>> * Adapt the SBuf internal state after external interference
>> * such as writing into it via rawSpace.
>> * \throw TextException if we
>> */
>> void forceSize(size_type newSize);
>
> Unfinished comment after "if we".

Fixed.

>> // Debug.h only needed for for SBuf::cow() debug statements.
>> #include "Debug.h"
>
>> bool cow(size_type minsize = npos) {
>> debugs(24, DBG_DATA, "new size (minimum):" << minsize);
>
> Any method or function with debugs() probably does not deserve to be
> inlined and will probably not be inlined by the compiler due to debugs()
> complexity even if we declare it inlined.

Ok, moved to SBuf.cc

Code is as before at lp:~squid/squid/stringng-cherrypick, ready and in
sync with trunk.
test-suite runs OK on eu and ubuntu raring.

I'll be on holiday in a couple of days for a couple of weeks.

--
    /kinkie
Received on Fri Jul 05 2013 - 05:51:22 MDT

This archive was generated by hypermail 2.2.0 : Thu Jul 18 2013 - 12:00:42 MDT