Re: [RFC] Time to talk about StringNG merge again?

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Thu, 01 Aug 2013 00:46:35 +1200

On 1/08/2013 12:18 a.m., Kinkie wrote:
> On Wed, Jul 31, 2013 at 2:53 AM, Alex Rousskov
> <rousskov_at_measurement-factory.com> wrote:
>> On 07/30/2013 03:56 AM, Kinkie wrote:
>>
>>> lp:~squid/squid/stringng-cherrypick has the most uptodate code.
>>
>> I did not finish the review before running out of time, but I found one
>> or two bugs in r12761 (in the code I have not looked at before). There
>> are also a few polishing issues:
> Thanks.
>
>>> typedef int32_t size_type;
>> I think we should make it unsigned. I agree with your earlier
>> observation that it is better to do that now rather than later or never.
> Yes, it is agreed. It's however something I'd prefer to do as a
> separate change, after everything else has been defined in order to
> have a clear impact.

Yes and no. It will probably cause a lot of little changes all over the
classes involved. So doing it at a point of stability as a separate
commit to stringng is good but not a reason to

>
>>> void reserveSpace(size_type minSpace) {reserveCapacity(length()+minSpace);}
>> As I mentioned before, this lacks overflow control if the sum of
>> length() and minSpace exceeds size_type maximum. Please add a check.
> reserve* calls cow() which - if a resizing is needed - calls reAlloc
> which throws SBufTooBigException if the requested size is too big.
> Isn't that good enough?

Not if the math overflowed down to a smaller value before it even got
passed to reserveCapacity().

>
>>> cow(minSpace+length());
>> Same here, in rawSpace().

>
>> BTW, please add SBuf::spaceSize() as a public method so that
>> optimization like the above will work. Otherwise, non-privileged users
>> will have to write
>>
>> foobar(buf.rawSpace(minSize), minSize);
>>
>> instead of getting an advantage of the actual space size being larger
>> than the requested minimum.
> Done, but I'm quite wary of this: it implies a knowledge about
> internal store positioning which I would prefer to hide from clients.

I think we may find a reason in future to extend this futther to allow
caller handling of the cases where realloc is not possible to the new
minSize.
But for now that should be enough.

>
>>> if (sz >= static_cast<int>(store_->spaceSize()))
>>> rawSpace(sz*2); // TODO: tune heuristics
>> I would use reserveSpace() here instead of ignoring rawSpace() result
>> because we know the buffer needs to be reallocated at this point, right?
>
> Hand-unrolled code doesn't ignore that return value anymore (see below)
>
>>> /* snprintf on Linux returns -1 on overflows */
>> ...
>>> if (sz >= static_cast<int>(store_->spaceSize()))
>>> rawSpace(sz*2); // TODO: tune heuristics
>>> else if (sz < 0) // output error in vsnprintf
>>> throw TextException("output error in vsnprintf",__FILE__, __LINE__);
>> If snprintf on Linux returns -1 on overflows, we should not throw when
>> sz is -1. We should grow the buffer instead, right?
> Man snprintf(3) says
> If an output error is encountered, a negative value is returned.
>
> I guess the comment is broken.
> vsnprintf is standard as of C99; it should be therefore quite
> consistent in our build environment.

What other types of outt error than buffer overflow can occur?
Might be worth a little test to see what overflow produces exactly. And
I agree with Alex that growing is the better way to handle it, on the
proviso that we can reliably detect that overflow.

>
>>> while (length() <= maxSize) {
>>> sz = vsnprintf(bufEnd(), store_->spaceSize(), fmt, ap);
>> ...
>>> /* snprintf on FreeBSD returns at least free_space on overflows */
>>>
>>> if (sz >= static_cast<int>(store_->spaceSize()))
>>> rawSpace(sz*2); // TODO: tune heuristics
>>> else if (sz < 0) // output error in vsnprintf
>>> throw TextException("output error in vsnprintf",__FILE__, __LINE__);
>>> else
>>> break;
>>> }
>> Nothing in that loop grows length(), which is your guard against an
>> "infinite" loop on FreeBSD (at least). Can you think of a better loop guard?
> This loop will be executed at most twice via break clause, as on the
> second go we _know_ how much we have to write, the first attempt is .
> I'm hand-unrolling it.
>
>>> len_ += sz;
>>> // TODO: this does NOT belong here, but to class-init or autoconf
>>> /* on Linux and FreeBSD, '\0' is not counted in return value */
>>> /* on XXX it might be counted */
>>> /* check that '\0' is appended and not counted */
>>>
>>> if (operator[](len_-1) == '\0') {
>>> --sz;
>>> --len_;
>>> }
>> There is no guarantee that len_ is positive. We could start with a zero
>> len_ and an empty "" pattern would result in zero sz. The above code may
>> then access character at -1 offset of the raw storage array...
>>
>> vprintf() is probably so slow that any code using it is not optimized.
>> We should use at(len_-1) instead here.
> Ok.
> This will go away by itself when we move to unsigned size_type.
>
>>> /* check that '\0' is appended and not counted */
>>>
>>> if (operator[](len_-1) == '\0') {
>> Perhaps this day was too long, but I do not think the above checks
>> whether \0 was appended but not counted. It checks whether \0 was
>> appended and counted. Which is probably what you want to check, so the
>> comment should be adjusted to remove the word "not".
> You're right, the comment as-is muddies the waters. Reworded as "check
> whether '\0' was appended and counted.
> But in the end, as vsnprintf is a C99 standard (aand we require C99 +
> now to compile, so that check is now probably redundant.

Amos
Received on Wed Jul 31 2013 - 12:46:43 MDT

This archive was generated by hypermail 2.2.0 : Wed Jul 31 2013 - 12:00:07 MDT