Re: [PATCH] SBuf c-string comparisons

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sun, 13 Apr 2014 23:13:46 +1200

On 13/04/2014 1:35 p.m., Alex Rousskov wrote:
> On 04/11/2014 09:01 AM, Amos Jeffries wrote:
>> Patch attached for just these SBuf and related unit test changes.
>
>
>> +int
>> +SBuf::compare(const char *s, SBufCaseSensitive isCaseSensitive, const size_type n) const
>> +{
>> + // 0-length comparison,
>> + // or full-length compare for NULL-string
>> + if (!n || (n == npos && !s)) {
>> + ++stats.compareFast;
>> + return 0;
>> + }
>
> I suggest removing the (n == npos && !s) part (as already discussed on
> #squidev?). IMO, Squid should assert if somebody tries to compare SBuf
> with an unknown number of characters (npos and !s), as opposed to
> comparing SBuf with zero characters (!n).
>

Done.

>
>> + /// comparison with a C-string
>> + int compare(const char *s, SBufCaseSensitive isCaseSensitive, const size_type n) const;
> ...
>> + const size_type byteCompareLen = min(n, length());
>> + ++stats.compareSlow;
>> + int rv = 0;
>> + if (isCaseSensitive == caseSensitive) {
>> + rv = memcmp(buf(), s, byteCompareLen);
>> + } else {
>> + rv = memcasecmp(buf(), s, byteCompareLen);
>> + }
>
> The code above may access s bytes beyond s's 0-terminator (if any). The
> header file claims to provide a C-string [i.e., strncmp()] interface,
> but the .cc file implements something close to a memcmp() interface instead.
>
> The difference between memcmp and strcmp APIs is that the former stops
> at n while the latter also stops at the first terminator character
> (whichever comes first). The difference is not important for readable
> text, of course, but it becomes critical when dealing with "binary" data.
>

memcmp() is documented at stopping at the byte when the strings differ.

The if-statement guaranteeing length() <= n, will truncate this SBuf so
min(n, length()) is never greater than n and thus should never be >
strlen(s)+terminator.

So, only if both contain the terminator at the same byte AND n >
strlen(s)+1, AND this SBuf contains a terminator internally at exactly
the right byte can it possibly skip past.

Code which is presenting n > strlen(s)+1 is broken despite all that.
Shall we add yet another assert?

NOTE that strn*cmp variants will produce ==0 results in the above broken
case whereas SBuf comparison will produce ==1 or ==-1 (input sizes differ).

> For example, something like the expression below should not access
> non-existent characters in the second parameter, despite a large N:
>
> strncmp(buf, "123", 1000)
>
> In the proposed implementation, buf.compare("123",, 1000) may access
> "1234"[100] and such for some buf values. More on that below.
>
> Please note that testComparisonStdN() in test cases hints at that by
> trying to trigger invalid access bugs. That test case should try harder,
> apparently :-). For example, instead of doing "2 + min(...)", we should
> probably add:
>
> testComparisonStdN(left, right, maxN + 1024*1024);
>
> or something like that, in hope to crash the test if the implementation
> does try to go beyond left's or right's end. However, I realize that
> catching such bugs reliably is very difficult and may require
> hand-crafted test cases.

Indeed the test as written does prove that this overrun case is NOT
happening for common-case strings where there is no matching terminator
at *exactly* the right palce in the SBuf.

Added a test for the rare case.

>
> Suggestion:
>
> * If you want to just implement the C-string strncmp() interface, rename
> cmp() to strCmp(), compare() to strCompare(), and adjust the code
> accordingly.
>
> * If you also want to implement the memcmp() interface, then add
> memCmp() and memCompare() (with a required n parameter each) and adjust
> the copied code accordingly.
>
>
>
>> + // pretend we have a terminator and check against s.
>> + return s[length()] == '\0' ? 0 : -1;
>
> Similar to the above, what if s[length()] does not exist? For example,
> in pseudo code:
>
> SBuf buf("1234");
> char s[3] = { '1', '2', '3', }; // s[3] does not exist
> buf.cmp(s, 3);
>
> Please note that the above problem applies to both strCmp() and memCmp()
> flavors of the code.

The SBuf truncation path is taken on buf. At which point it is
memcmp("123", s, 3) ==> 0.

>
> Also, it is not clear (to me) what the above code tries to accomplish.
> The comment does not really clarify things [for me], especially since
> "we" usually refers to the "this" object but the expression does not
> check the "this" object buffer. If this code stays, please rephrase.

C-string comparisons are including the terminator bytes for their final
EOS results **.

This handles all the cases where:
 SBuf left = "foo"
 char* right = "foo"
 left.cmp(right, 4);

In c-string comparisions these produce ==0. Using the X-longer-then-Y
logic pattern it would produce ==1 like "foo" vs "fooz".

Since SBuf has no terminator we cannot compare the terminator byte like
c-string comparisons do. Accessing buf[length()]==s[length()] would be a
buffer overrun on buf. So we check '\0' == s[length()] instead.

ie. we are guaranteed to be at the end of 'buf', check if 's' is the
c-string terminator.

** same reason for including the terminator when calculating strlen(s)
earlier.

>
>> + int compare(const SBuf &S, SBufCaseSensitive isCaseSensitive, const size_type n = npos) const;
> ...
>> + int compare(const char *s, SBufCaseSensitive isCaseSensitive, const size_type n) const;
>
> I think we should keep these similar and allow n to be strlen(s) by
> default in the second case.

Missed that one. Fixed.

>
>> + /// shorthand version for C-string compare
> ...
>> + /// shorthand version for case-insensitive C-string comparison
>
> s/compare$/compare()/
> s/comparison/compare()/
>
> There is a similar problem with old cmp() descriptions, but I cannot
> insist on fixing it in this patch.

Fixed.

>
>> +// std::cerr << std::endl << " cmp(char*) npos " << left << " ?= " << right;
>> + CPPUNIT_ASSERT_EQUAL(sign(strcmp(left, right)), sign(SBuf(left).cmp(right)));
>> +// std::cerr << std::endl << " cmp(SBuf) npos " << left << " ?= " << right;
>> + CPPUNIT_ASSERT_EQUAL(sign(strcmp(left, right)), sign(SBuf(left).cmp(SBuf(right))));
>> +// std::cerr << std::endl << " caseCmp(char*) npos " << left << " ?= " << right;
>> + CPPUNIT_ASSERT_EQUAL(sign(strcasecmp(left, right)), sign(SBuf(left).caseCmp(right)));
>> +// std::cerr << std::endl << " caseCmp(SBuf) npos " << left << " ?= " << right;
>> + CPPUNIT_ASSERT_EQUAL(sign(strcasecmp(left, right)), sign(SBuf(left).caseCmp(SBuf(right))));
>> +}
>
> I think we should remove commented out debugging. The right way to
> report problems here is to add another wrapper (via a macro) that would
> print the strings if the two calls return different values, before
> calling CPPUNIT_ASSERT_EQUAL().

I had planned to remove before final merge, but that sounds much better.
Macro turned out to be far too messy though.

PS. Getting errors in the unit tetss now which are not being displayed.
New patch tomorrow hopefully.

Amos
Received on Sun Apr 13 2014 - 11:14:08 MDT

This archive was generated by hypermail 2.2.0 : Sun Apr 13 2014 - 12:00:10 MDT