Re: [PATCH] SBuf c-string comparisons

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Thu, 17 Apr 2014 22:43:01 +1200

On 17/04/2014 7:23 a.m., Alex Rousskov wrote:
> On 04/16/2014 12:30 PM, Amos Jeffries wrote:
>> On 17/04/2014 2:32 a.m., Alex Rousskov wrote:
>>> On 04/16/2014 12:05 AM, Amos Jeffries wrote:
>>>> I don't see any way around this without hand-crafting a full byte-by-byte
>>>> strncmp replacement.
>
>>> I am not against hand-crafting if it is really necessary -- we already
>>> hand-craft memCaseCmp IIRC. Personally, I would hand-craft _if_ system
>>> implementation of strncmp() is just a basic loop rather than some
>>> complicated, optimized low-level code. Otherwise, I would find a way to
>>> avoid strlen().
>
>> Which system? which architecture? which compiler? which library?
>
> Any reasonable/popular implementation selected by the developer. This is
> a one-time check done by the developer, not an automated check done
> during Squid build. Sorry I was not clear about that.
>
>
>> That is a tricky "_if_" to code for.
>
> I hope the above clarifies that no coding is necessary for this _if_.
>

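So glibc: a do-while loop that consumes four bytes per pass as four
unrolled byte-by-byte comparisons, followed by a plain byte-by-byte loop
for the remaining 0-3 bytes. This excerpt does no word-sized loads or
other low-level tricks: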

  if (n >= 4)
    {
      size_t n4 = n >> 2;
      do
        {
          c1 = (unsigned char) *s1++;
          c2 = (unsigned char) *s2++;
          if (c1 == '\0' || c1 != c2)
            return c1 - c2;
          c1 = (unsigned char) *s1++;
          c2 = (unsigned char) *s2++;
          if (c1 == '\0' || c1 != c2)
            return c1 - c2;
          c1 = (unsigned char) *s1++;
          c2 = (unsigned char) *s2++;
          if (c1 == '\0' || c1 != c2)
            return c1 - c2;
          c1 = (unsigned char) *s1++;
          c2 = (unsigned char) *s2++;
          if (c1 == '\0' || c1 != c2)
            return c1 - c2;
        } while (--n4 > 0);
      n &= 3;
    }

  while (n > 0)
    {
      c1 = (unsigned char) *s1++;
      c2 = (unsigned char) *s2++;
      if (c1 == '\0' || c1 != c2)
        return c1 - c2;
      n--;
    }

>
>> So...
>> trying to find a way to determine the length of a c-string potentially
>> unterminated, without using strlen() or otherwise looping over it.
>> OR,
>> trying to find out where the system strn*() function stopped.
>>
>> I'm all ears for suggestions on that little gem.
>
> I do not think the above is possible.
>

Indeed.
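
To illustrate why, here is a minimal sketch (not code from the patch;
naiveCmp is a hypothetical name used only for this example). It assumes
a counted buffer that may lack a terminator or contain embedded NULs,
compared against a c-string using nothing but strncmp(). A zero result
does not say whether the two actually have the same length:

```c
#include <string.h>

/*
 * Hypothetical helper, for illustration only: compare a counted buffer
 * (buf, len) against a 0-terminated c-string with plain strncmp().
 */
static int
naiveCmp(const char *buf, size_t len, const char *cstr)
{
    /*
     * strncmp() stops at the first difference, at a NUL present in both
     * inputs, or after len characters. It does not report which of
     * these happened, so a 0 result can hide a length mismatch.
     */
    return strncmp(buf, cstr, len);
}
```

With this helper, naiveCmp("abc", 3, "abcdef") returns 0 although the
c-string is longer, and naiveCmp("ab\0d", 4, "ab") returns 0 although
the buffer holds four bytes. Telling those cases apart needs either the
c-string length or the position where the scan stopped.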

>
>>> Since the hand-crafted implementation is simple, I do not consider it an
>>> overkill. And I am sure there is a way to avoid it if needed.
>
>> I would absolutely love to hear what that is.
>
> See the cloning sketch in the previous email. To summarize, known
> solutions are:
>
> 1) a custom loop to properly limit SBuf iteration
> 2) cloning to guarantee SBuf 0-termination

The cloning mechanism uses strlen() internally, so it brings no benefit,
only extra malloc+free costs.

>
> Since I expect (2) to be sometimes a lot slower than (1), I would go for
> (1), especially if a quick check of a popular strncmp() implementation
> does not expose some low-level optimizations that we would not be able
> (or would not want) to duplicate in Squid.
>
>
> Hope this clarifies,
>
> Alex.
>

Patch with hand-rolled scanner attached.
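
For readers without the attachment, a rough sketch of what such a
scanner can look like follows. This is not the attached patch:
bufCmpCstr and its parameters are hypothetical, and it works on a raw
(pointer, length) pair rather than on SBuf itself. It assumes the
c-string is 0-terminated and the buffer is not, and it never calls
strlen():

```c
#include <stddef.h>

/*
 * Hypothetical sketch: compare the counted buffer (buf, len) against
 * the 0-terminated c-string cstr, looking at no more than n characters.
 * Returns <0, 0 or >0 in the manner of strncmp().
 */
static int
bufCmpCstr(const char *buf, size_t len, const char *cstr, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i) {
        if (i == len) {
            /* Buffer exhausted: equal only if the c-string ends here too.
             * Reading cstr[i] is safe because cstr[0..i-1] were non-NUL. */
            return cstr[i] == '\0' ? 0 : -1;
        }
        if (cstr[i] == '\0')
            return 1; /* c-string ended first, so the buffer is longer */

        /* Compare as unsigned char, as strncmp() does. */
        const unsigned char c1 = (unsigned char) buf[i];
        const unsigned char c2 = (unsigned char) cstr[i];
        if (c1 != c2)
            return c1 < c2 ? -1 : 1;
    }
    return 0; /* first n characters match */
}
```

Because the loop tracks its own position, it handles both a longer
c-string (bufCmpCstr("abc", 3, "abcdef", 10) is negative) and a buffer
with an embedded NUL (bufCmpCstr("ab\0d", 4, "ab", 10) is positive),
which plain strncmp() cannot distinguish from equality.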

Amos

Received on Thu Apr 17 2014 - 10:43:21 MDT
