stringng-cherrypick r12764 from Alex Rousskov on 2013-08-25 (squid-dev)

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Sun, 25 Aug 2013 20:22:13 -0600

On 07/30/2013 03:56 AM, Kinkie wrote:
> Ok, it's now done, including documentation.
> lp:~squid/squid/stringng-cherrypick has the most uptodate code.

I found a few bugs in stringng-cherrypick r12764. The rest is polishing.

The first set of comments is about SBuf::vappendf():

> if (sz >= static_cast<int>(spaceSize())) {
> // not enough space on the first go, we now know how much we need
> requiredSpaceEstimate = sz*2; // TODO: tune heuristics
> space = rawSpace(requiredSpaceEstimate);
> sz = vsnprintf(space, spaceSize(), fmt, vargs);
> }
>
> if (sz < 0) // output error in either vsnprintf
> throw TextException("output error in vsnprintf",__FILE__, __LINE__);

Given the complexities involved, I think we should add a check that the
second vsnprintf() resulted in sz of less than spaceSize(). If it did
not, we should throw a different exception.

> // data was appended, update internal state
> len_ += sz;

IMO, we should eat our own dog food and call forceSize() here instead.
If you do that, move this code lower, after sz has been adjusted (while
making other appropriate changes).

> static bool snPrintfTerminatorChecked = false,
> snPrintfTerminatorCounted = false;

Please declare them separately. This is too clever and implies that they
are used similarly. They are not.

> static bool snPrintfTerminatorChecked = false,

You forgot to update this variable inside the if-statement.

> int
> SBuf::compare(const SBuf &S, SBufCaseSensitive isCaseSensitive, size_type n) const
> {
> Must(n == npos || n >= 0);
> if (n != npos) {
> if (n > length())
> return compare(S.substr(0,n),isCaseSensitive);
> return substr(0,n).compare(S.substr(0,n),isCaseSensitive);
> }

The logic surrounding the two return statements above is puzzling. The
last return statement is clear -- we only want to compare the first n
characters. That makes the first return statement an optimization -- it
avoids calling this->substr(0,n). However, if you think such an
optimization is a good idea, why not apply it to the S parameter as
well? In other words, I would expect to see either both optimizations
applied:

if (n != npos) {
    const SBuf &me = n > length() ? substr(0,n) : *this;
    const SBuf &them = n > S.length() ? S.substr(0,n) : S;
    return me.compare(them, isCaseSensitive, npos);
}

or none of the two optimizations applied:

if (n != npos)
return substr(0,n).compare(S.substr(0,n),isCaseSensitive, npos);

but not the current mix. Did I miss why only one optimization should be
applied?

> // this differs from how std::string handles 1-length string

Please clarify what "1-length" string is. Did you mean std::string
behavior when searching for a character? If yes, you can say
"std::string returns something else in this case", especially if it is
difficult to succinctly describe what std::string returns exactly.

> // std::string allows needle to overhang hay but not start outside
> if (startPos != npos && startPos > length()) {
> ++stats.find;
> return npos;
> }

If you can move the above after you test for empty needle, and replace "
> " with " >= ", then you may be able to move the single-char code up
one if-statement as well. That would would clarify your overall
find(string) logic:

    if empty needle ...
    if 1-char needle ...
    complex cases start here

> // for npos with char* std::string scans entire hay
> // this differs from how std::string handles single char from npos
> if (startPos == npos)
> return npos;

Missing ++stats.find that all other cases have.

> // on empty hay std::string returns size of hay
> if (length() == 0)
> return npos;

Did you mean to say "this differs from std::string ..."? Or does our
rfind(char) do what std::string does, returning npos on empty hay?

> if (startPos > length())
> return npos;

This can be broadened to startPos >= length().

> debugs(24, 8, "result: \"" << *this << "\"");
> ++stats.caseChange;
> return rv;

No, *this in debugging is not the result (rv is). There are at least two
copies of this bug.

> * \retval false no grow was needed
> * \retval true had to copy
> */
> bool
> SBuf::cow(SBuf::size_type newsize)

s/grow/growth/

However, cow() may return true if growth was not needed (but a dedicated
buffer was needed) AND cow() may return true when it did not have to
copy anything (because the buffer was empty). You probably want to say
something like:

* \retval true had to allocate a dedicated buffer
* \retval false otherwise

Furthermore, I do not see cow() return value being used. Perhaps we
should change cow() to return void and avoid all this complexity until
it is needed?

> debugs(24, DBG_DATA, "new size: " << newsize);
> debugs(24, DBG_DATA, "new size:" << newsize);
> debugs(24, DBG_DATA, "no cow needed");

These do not dump data. Let's use level 8 for them so that we can
eventually have all DBG_DATA debugs() dedicated to blind data dumping?

> +SBuf &
> +SBuf::append(const char * S, size_type Ssize)
> +{
> + Must (Ssize == npos || Ssize >= 0);
> +
> + if (S == NULL)
> + return *this;

Do you think we should also add a Must() that NULL S has npos or zero Ssize?

* Since SBuf::dump() may output a lot of raw data and that data may
contain non-printable characters, consider using Raw API there.
Eventually, the code implementing that API will deal with those cases
better than just writing raw data to the stream.

> + alloc+=s.alloc;
> + live+=s.live;
> + append+=s.append;
> + liveBytes+=s.liveBytes;

Please surround "+=" with single spaces for readability and consistency:
" += ".

> class NullSBufException : public TextException

Remove as unused. I hope we no longer have a concept of NULL SBuf.

> + virtual ~OutOfBoundsException() throw();

Just curious, do you know why other exceptions you added do not require
this?

> /* */

Please remove that last line from SBufExceptions.cc.

If you do not mind, please drop the copyright statements (here and in
other files you are adding). You are not the only person contributing
this code, and managing per-file copyright claims correctly is
practically impossible (incomplete copyright information in your files
is a case in point).

> + /// clear the stream's store
> + void clearBuf() {
> + theBuf.clear();
> + }

> + /// Clear the stream's backing store
> + SBufStream& clearBuf() {
> + theBuffer.clearBuf();
> + return *this;
> + }

Are these two really needed? AFACT, they are not used (except in a test
case testing them). I am worried that they do not really work because
internal SBufStream state may become inconsistent if these are called
because you are resetting the buffer without resetting all other
internal stream pointers/offsets.

> // inspired to SBufFindTest; to be expanded.

s/ to / by /?

I suspect you can fix all of the above without another round of review.
However, please do not forget that you wanted to make (or consider
making?) size_type unsigned once everybody is happy with the code (but
before merging it).

Thank you,

Alex.
Received on Mon Aug 26 2013 - 02:22:30 MDT

This archive was generated by hypermail 2.2.0 : Mon Aug 26 2013 - 12:00:32 MDT