Re: [PATCH] SBuf c-string comparisons

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Sat, 12 Apr 2014 19:35:36 -0600

On 04/11/2014 09:01 AM, Amos Jeffries wrote:
> Patch attached for just these SBuf and related unit test changes.

> +int
> +SBuf::compare(const char *s, SBufCaseSensitive isCaseSensitive, const size_type n) const
> +{
> + // 0-length comparison,
> + // or full-length compare for NULL-string
> + if (!n || (n == npos && !s)) {
> + ++stats.compareFast;
> + return 0;
> + }

I suggest removing the (n == npos && !s) part (as already discussed on
#squidev?). IMO, Squid should assert if somebody tries to compare SBuf
with an unknown number of characters (npos and !s), as opposed to
comparing SBuf with zero characters (!n).

> + /// comparison with a C-string
> + int compare(const char *s, SBufCaseSensitive isCaseSensitive, const size_type n) const;
...
> + const size_type byteCompareLen = min(n, length());
> + ++stats.compareSlow;
> + int rv = 0;
> + if (isCaseSensitive == caseSensitive) {
> + rv = memcmp(buf(), s, byteCompareLen);
> + } else {
> + rv = memcasecmp(buf(), s, byteCompareLen);
> + }

The code above may access s bytes beyond s's 0-terminator (if any). The
header file claims to provide a C-string [i.e., strncmp()] interface,
but the .cc file implements something close to a memcmp() interface instead.

The difference between memcmp and strcmp APIs is that the former stops
at n while the latter also stops at the first terminator character
(whichever comes first). The difference is not important for readable
text, of course, but it becomes critical when dealing with "binary" data.

For example, something like the expression below should not access
non-existent characters in the second parameter, despite a large N:

  strncmp(buf, "123", 1000)

In the proposed implementation, buf.compare("123",, 1000) may access
"1234"[100] and such for some buf values. More on that below.

Please note that testComparisonStdN() in test cases hints at that by
trying to trigger invalid access bugs. That test case should try harder,
apparently :-). For example, instead of doing "2 + min(...)", we should
probably add:

   testComparisonStdN(left, right, maxN + 1024*1024);

or something like that, in hope to crash the test if the implementation
does try to go beyond left's or right's end. However, I realize that
catching such bugs reliably is very difficult and may require
hand-crafted test cases.

Suggestion:

* If you want to just implement the C-string strncmp() interface, rename
cmp() to strCmp(), compare() to strCompare(), and adjust the code
accordingly.

* If you also want to implement the memcmp() interface, then add
memCmp() and memCompare() (with a required n parameter each) and adjust
the copied code accordingly.

> + // pretend we have a terminator and check against s.
> + return s[length()] == '\0' ? 0 : -1;

Similar to the above, what if s[length()] does not exist? For example,
in pseudo code:

  SBuf buf("1234");
  char s[3] = { '1', '2', '3', }; // s[3] does not exist
  buf.cmp(s, 3);

Please note that the above problem applies to both strCmp() and memCmp()
flavors of the code.

Also, it is not clear (to me) what the above code tries to accomplish.
The comment does not really clarify things [for me], especially since
"we" usually refers to the "this" object but the expression does not
check the "this" object buffer. If this code stays, please rephrase.

> + int compare(const SBuf &S, SBufCaseSensitive isCaseSensitive, const size_type n = npos) const;
...
> + int compare(const char *s, SBufCaseSensitive isCaseSensitive, const size_type n) const;

I think we should keep these similar and allow n to be strlen(s) by
default in the second case.

> [...] with C-string it is required for over-read bounds safety.

I disagree that n parameter must be required for over-read bounds
safety: With true c-strings, requiring that parameter is more likely to
introduce bugs (due to duplication of information) than to prevent them.

> + /// shorthand version for C-string compare
...
> + /// shorthand version for case-insensitive C-string comparison

s/compare$/compare()/
s/comparison/compare()/

There is a similar problem with old cmp() descriptions, but I cannot
insist on fixing it in this patch.

> +// std::cerr << std::endl << " cmp(char*) npos " << left << " ?= " << right;
> + CPPUNIT_ASSERT_EQUAL(sign(strcmp(left, right)), sign(SBuf(left).cmp(right)));
> +// std::cerr << std::endl << " cmp(SBuf) npos " << left << " ?= " << right;
> + CPPUNIT_ASSERT_EQUAL(sign(strcmp(left, right)), sign(SBuf(left).cmp(SBuf(right))));
> +// std::cerr << std::endl << " caseCmp(char*) npos " << left << " ?= " << right;
> + CPPUNIT_ASSERT_EQUAL(sign(strcasecmp(left, right)), sign(SBuf(left).caseCmp(right)));
> +// std::cerr << std::endl << " caseCmp(SBuf) npos " << left << " ?= " << right;
> + CPPUNIT_ASSERT_EQUAL(sign(strcasecmp(left, right)), sign(SBuf(left).caseCmp(SBuf(right))));
> +}

I think we should remove commented out debugging. The right way to
report problems here is to add another wrapper (via a macro) that would
print the strings if the two calls return different values, before
calling CPPUNIT_ASSERT_EQUAL().

If you disagree, please at least move the "//" comment characters to
where they belong.

Thank you,

Alex.

> On 11/04/2014 10:24 a.m., Alex Rousskov wrote:
>> On 04/10/2014 05:47 AM, Amos Jeffries wrote:
> <snip>
>>
>>> + int cmp(const char *s, size_t n) const;
>>> + int caseCmp(const char *s, size_t n) const;
>>
>> SBuf is currently using size_type, not size_t for comparison methods.
>> Size_t is not used anywhere in SBuf.h AFAICT. Do we have to add size_t
>> into the mix? If you start using size_type instead, please note that the
>> type casting inside the methods will go away (but new type casts in the
>> callers might be needed).
>
> Done and const n.
>
>>
>>> + /// comparison with a C-string
>>> + int cmp(const char *s, size_t n) const;
>>> +
>>> + /// case-insensitive comparison with a C-string
>>> + int caseCmp(const char *s, size_t n) const;
>>
>> There is no good reason to force callers to call length(s) IMO. Please
>> add a default npos value for n. This will also make the new methods more
>> consistent with the existing ones.
>>
>
> The situation for C-string is different from SBuf. Since SBuf contain
> lengths and guarantee non-NULL buffers we are not using the N parameter
> for any bounds safety.
> Whereas with C-string it is required for over-read bounds safety.
>
> Done anyway.
>
>>
>>> + if (rv != 0)
>>> + return rv;
>>> + if (length() == n)
>>> + return 0;
>>> + if (length() > n)
>>> + return 1;
>>> + return -1;
>>
>> This seemingly correct logic actually leads to wrong comparison results:
>>
>> strncmp("foo", "f", 1) is 0
>> but
>> SBuf("foo").cmp("f", 1) is +1
>>
>> If you adapted that code from the existing SBuf::compare(), please note
>> that the similar compare() code works because it is preceded by
>> truncation of _both_ strings to [up to] n characters.
>>
>> To minimize code duplication, you probably want to do here what old
>> compare() does for old cmp() and caseCmp().
>>
>> Finally, this bug also exposes the lack of comparison test cases. The
>> attached unpolished patch adds a few. If you like them, please integrate
>> with your changes and polish as needed. See TODO in patch preamble for a
>> better approach though.
>>
>
> Added with a slight modification to use the existing sign() converter.
> Thank you, that let me find and fix several off-by-1 bugs and the
> terminator handling bugs relatively easily.
>
>
>>
>>> +int
>>> +SBuf::cmp(const char *s, size_t n) const
>>> +{
>>> + assert(s != NULL);
>>
>> It may be better to allow NULL c-strings with zero n:
>>
>> if (!n)
>> return 0;
>> assert(s);
>>
>> I do not insist on this change.
>>
>> Same for the other caseCmp().
>>
>
> Combined the two now same as the SBuf variant. With the above and some
> added checks for handling NULL pointers and empty strings.
>
>>
>>> + size_type byteCompareLen = min(n, static_cast<size_t>(length()));
>> ...
>>> + int rv = memcmp(buf(), s, byteCompareLen);
>> ...
>>> + size_type byteCompareLen = min(n, static_cast<size_t>(length()));
>> ...
>>> + int rv = memcasecmp(buf(), s, byteCompareLen);
>>
>> Make these variables const if possible.
>>
>> I would also make the "n" parameter itself const but that should be done
>> for other cmp() methods as well then.
>>
>
> Done all including n.
>
> Amos
>
Received on Sun Apr 13 2014 - 01:35:46 MDT

This archive was generated by hypermail 2.2.0 : Sun Apr 13 2014 - 12:00:10 MDT