Re: StringNG merge?

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Mon, 14 Jan 2013 08:14:58 -0700

On 01/14/2013 02:17 AM, Kinkie wrote:
> On Sat, Jan 12, 2013 at 8:32 PM, Alex Rousskov
> <rousskov_at_measurement-factory.com> wrote:
>> On 12/16/2012 03:02 PM, Kinkie wrote:
>>>>>>> + if (buf()+length() < tmp+str.length()) { //not enough chars to match the whole needle
>>>>>>> + debugs(SBUF_DEBUGSECTION,8,HERE << "needle too big for haystack");
>>>>>>> + end=tmp-1;
>>>>>>> + continue;
>>>>>>> + }
>>>>>
>>>>> If I interpret STL documentation correctly, the limit is not buf() +
>>>>> length() but endpos. We should not match against characters beyond
>>>>> endpos, even if the match _starts_ before endpos. Please double check me
>>>>> on that.
>>
>>> I am not 100% sure I understand what you mean, but this simple test:
>>
>>> std::string haystack("foo bar gazonk");
>>> std:;string needle("gazonka");
>>> cout << "find overflow: " << haystack.find(needle) << endl;
>>>
>>> returns npos (as I'd expect)
>>
>> What I meant is that
>>
>> "123".rfind("23", 2)
>>
>> should return npos instead of 1 because "23" is not found in the first
>> two characters of the haystack ("123") string. So the above if-statement
>> condition should use buf+endpos rather than buf+length().
>
> Do we want to emulate the whole std::string API?

Probably not, but where we are using the same methods as std::string,
they must provide pretty much the same API. We may not need to offer
rfind() at all, but if you decide we need to offer rfind(), it would be
rather confusing and dangerous if our rfind() is different from
std::string rfind() while the overall API looks similar to std::string.

> Given StringNG's
> substringing optimizations, the same activity could be expressed as
> "123".substr(0,2).rfind("23")...

Yes, and I would support such an implementation, especially as a
starting point. The implementation has nothing to do with the API though.

>> The code has been rewritten since then, but it is still buggy:
>>
>>> char *bufBegin = buf();
>>> char *cur = bufBegin+length()-needle.length();
>>
>> The "cur" variable should start with bufBegin+endPos-needle.length() or
>> equivalent.
>>
>>
>> BTW, this implies that you do not have a basic "match beyond head" unit
>> test case:
>>
>> "headtail".rfind("t", 4) == npos
>>
>> and I would also add
>>
>> "headmiddletail".rfind("middle", 9) == npos
>> "headmiddletail".rfind("middle", 10) == 4
>>
>> (at least).
>
> See my comment about using substringing. If that doesn't fly by you,
> I'll certainly implement this.

I think an implementation using a substring is the right thing to do in
general. I suspect reusing one high-level operation to implement another
will clash a little with your statistics collection design, but it is
fine with me.

>> And, for every manually added haystack.*find(needle, n) test case, I
>> would automatically add a haystack.substr(0, n).*find(needle) test case,
>> since both must return the same value for every haystack, needle, and n
>> (AFAICT).
>
> Yes. I'm also considering having a parallel test with SBuf and
> std::string, checking that the behavior is the same.

Yes, I am already doing that to double check test cases discussed in my
reviews.

Cheers,

Alex.
Received on Mon Jan 14 2013 - 15:15:08 MST

This archive was generated by hypermail 2.2.0 : Mon Jan 14 2013 - 12:00:06 MST