Re: SqString

From: Amos Jeffries <squid3@dont-contact.us>
Date: Sat, 26 May 2007 18:41:44 +1200

Alex Rousskov wrote:
> On Sat, 2007-05-26 at 04:08 +1200, Amos Jeffries wrote:
>> Alex Rousskov wrote:
>>> On Fri, 2007-05-25 at 20:39 +1200, Amos Jeffries wrote:
>>>
>>>> I have just been experimenting with a few options short of a full backout.
>>>>
>>>> My initial idea of dropping the constructor drags the changes into areas
>>>> the initial patch didn't touch. No go there.
>>> Yes, of course.
>>>
>>>> I have had some success dropping the str* overloading entirely.
>>> Since the old code did not have those, I suspect the new code should
>>> have "complete success" without them.
>> Ah, but it did. it just used a mix of some #define (2.x,3.0) and class
>> methods (3.0) and added upper case letters in the name.
>
> Class methods (versus overloaded globals) make a big difference when it
> comes to silent type conversion (or lack of it). That is, in part, the
> reason why the replacements in the new API where not functionally
> equivalent to the original code.

True, But please understand, my reason for picking the overload path in
the first place was the amount of times that path was taken in the
original code.

>
>> Ok let me make this clearer. the #defines and methods above did
>> _explicit_ calls to c_str().
>
> And you want to remove those methods because ...? Can you just rename
> them to match the STL naming scheme?

See for yourself:

  int
  String::caseCmp (char const *aString) const
  {
      return strcasecmp(buf(), aString);
  }

  int
  String::caseCmp (char const *aString, size_t count) const
  {
      return strncasecmp(buf(), aString, count);
  }

Now spot the benefit in that duplicated code...
Secondly, they were a custom API to the old String:: that was used in
only a few places with the non-wrapped version still used over most of
the code.

And how is the above different from:

  strncasecmp(str.buf(), aStr, count);

OR

  strncasecmp((char*)str, aStr, count);
   [ with (char*)str ==> str.buf() ].

> Does the problem boil down to not
> having an STL equivalent of some custom String methods?

No as the above shows they are _excatly_ equivalent.

All the problems I have seen are due to the magic of C++ choosing the
expensive method of conversion. If there are other bugs or posts than
1967 and 1970 that might effect my view on this please point them out.

>
>>> What I would like to see is a diff against the pre-new-API code showing
>>> that no calls got rewritten. Renaming a method is fine, but most other
>>> changes should be postponed. The new patch you posted does not show me
>>> how the main code is affected by the two changes combined, which is the
>>> only thing that matters. Comparing with the current, broken code is not
>>> very informative...
>>
>> After you mentioned this on IRC I went back to my copy of the original
>> patch.
>> Attached is a pseudo-patch (has stuff elided so I don't expect it will
>> map cleanly) showing all the str* changes and re-arranging you are
>> worried about. All I have elided is the effected debug calls and
>> straight name mods; buf() to c_str() etc.
>> I have also left in the hash function-ptrs where I explicitly set the
>> namespace on strcmp so as not to clash their old values with anything new.
>
> I am uncomfortable with the changes in the latest patch because they
> change the working code. I did not do full analysis, but here are a few
> specific places that worry me:
>
> 1) String::buf() did not return NULL for an empty but initialized
> String. SqString::empty() does:

WTF? here is the original code:
char const *
String::buf() const
{
      return buf_;
}

WHERE: buf_ is either NULL, or a char*. NO initialisation at all going
on there.

WHERE empty() returns (size()==0 || buf_ == NULL);
   TRUE - IFF buf_ == char*
   FALSE - IFF buf_ ==

I the case(s) you point out below the initial operation of buf()
provided a buffer-overrun in the edge case where buf_ != NULL, it
attempted to use it in a comparison or function call. **Even if it was
full of non-null garbage and the len_ was officially 0.***

>
>> - if (target && sct->target.buf() && !strcmp (target,
>> + if (target && !sct->target.empty() && !strcmp(target,
>
> and
>
>> - else if (!target && !sct->target.buf())
>> + else if (!target && sct->target.empty())
>
> and possibly elsewhere.
>
>
> 2) String::cmp does not just call strncmp so I cannot be sure this and
> other similar changes are simple renaming:
>

No, cmp was mapped to compare, which performs the internal operation of
strncmp (doing its own internal check on size()).

Consider the two following code paths:

(cmp / compare)
   - test len_ == 0 - return 0 == param[0]
   - test this* for being null/empty - return 0 == param[0]
   - test para for being null or empty - return 0 == buf_[0]
   - retrn output of strncmp(buf_, param, len_)

(strncmp)
   - test strlen(lhs) == 0 - return 0 == rhs[0]
   - test strlen(rhs) == 0 - return 0 == lhs[0]
   - skip down for strlen places
     OR rhs[p] != lhs[p]
   - return lhs[p] - rhs[p];

So you can see the cmp(str, len) calls were simply performing the same
pre-check operations of strncmp twice for any non-null string.
size() also performs those checks and does so on behalf of the unwrapped
strcmp()'s

The 'simple renaming' as you call it were either the bits I got to
before doing the unit tests that showed me this, or bits where I could
not be sure size() was available beforehand to perform the limiting on
strncmp explicitly.

I wrote a full set of unit-tests to verify all these cases and check the
behaviour output was as expected before doing the API upgrade.

>> - if (buf->contentSize() >= protoPrefix.size() &&
>> protoPrefix.cmp(buf->content(), protoPrefix.size()) != 0) {
>> + if (buf->contentSize() >= protoPrefix.size() &&
>> strncmp(protoPrefix, buf->content(), protoPrefix.size()) != 0) {
>
> 3) I am not sure C++ guarantees the following to be what we want it to
> be so this might cause problems in the future:
>
>> - strcmp(urlPath.buf() + eOff, e->key) == 0) {
>> + strcmp(&urlPath[eOff], e->key) == 0) {
>
> or
>
>> - const char *s = uri.buf() + 7;
>> + const char *s = &uri[7];
>
>
> Are all of the above and other changes correct? I do not know.

I am, so long as your who wrote that code are using the int type given
to eOff properly (as I see you doing in that code) as a whole-number
offset in bytes from the start of buf_.

I am simply using [] to wrap the offset ptr-addition and in a single
place add the buffer-overrun safety-check that has been long missing. I
have used the proper semantics of [n] as meaning n-byte offset on a char
array.

Use of buf_ like that original above code was part of the two _known_
causes of buffer-violation in squid that I found commented in the
original String files.

> It is not
> obvious to me that they are. If you are confident, and others do not
> object, commit them (no need to prove the correctness to me!).

I think we have a talking error here. What did you think this was?
These were only the parts of the 'broken' code that were changed and
would likely remain changed after the latest update. NOTHING outside
SqString.* would change to fix all the broke.

I am confident, was before I submit anything to HEAD. Doesn't mean I'm
perfectly right. What I would like is to be sure that you understand
these changes. In light of my reasoning for a the above If you can still
provide a valid case of any of them being bad I'll agree that bad code
should not be added.

What my latest update fix would do it is terminate the 'bad magic' with
prejudice and a suitable equivalent.

> If you
> are not confident, let's back the whole thing out for now and then fix a
> couple of problems that you have found while working on this. Once 3.0
> is branched, let's discuss whether and how to switch to STL strings.
>
> Thank you,
>
> Alex.
>
Received on Sat May 26 2007 - 00:41:53 MDT

This archive was generated by hypermail pre-2.1.9 : Fri Jun 01 2007 - 12:00:09 MDT