Re: [PATCH] Improve Configuration Checking and Parsing

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sat, 05 Jan 2013 21:52:24 +1300

On 5/01/2013 9:00 p.m., Tianyin Xu wrote:
> Thanks a lot, Amos! I'll modify the patches according to your comments
> and post it on squid-dev again tomorrow.
>
> Before that I have some questions and concerns related to your
> comments (see below). Could you please clarity a bit?
>
> Best,
> Tianyin
>
>
> On Fri, Jan 4, 2013 at 10:12 PM, Amos Jeffries <squid3_at_treenet.co.nz> wrote:
>> On 5/01/2013 3:05 p.m., Tianyin Xu wrote:
>>> Hi, all,
>>>
>>> I modified my patches according to Amos's comments and suggestions
>>> (Thanks a lot, Amos!).
>>>
>>> The attached patch includes all the modifications upon 3.HEAD/trunk.
>>>
>>> The only thing I didn't follow is to move the xato* functions in the
>>> compatibility library and to replace all the ato* functions in the
>>> code. I thought about it for quite a while, but decided not to have a
>>> xato* series as lib calls. The main reason is that these xato*
>>> functions are tailored for configuration parsing instead of general
>>> string-to-number functionality. All the current ato* functions (in
>>> src/Parsing.cc) will call self_destruct() when error occurs, which in
>>> turn prints out the bungled line in the configuration file (this's
>>> really good!).
>>
>> Getting rid of that bungled message is one of our config parsing upgrade
>> goals. If we can adjust the xato*() calls to report errors without aborting
>> Squid it will make them more useful and re-usable. They should still log an
>> error, but IMO aborting is going too far. The code which calls them also has
>> more context to decide whether the config option deserves a fatal() exit or
>> not. You can see a good example of that in src/HelperChildConfig.cc at the
>> second chunk where you change atoi() to xatoi(); the calling code is
>> reporting what values are valid, why and then calling self_destruct()
>> anyway.
>>
> Yes, I'm glad you say so. It's really awesome that you are thinking
> these configuration issues so seriously.
>
>>> On the other hand, strto* series as lib calls already
>>> provide safe string-to-number conversion, and anyone can take
>>> advantage of them. So I don't think there's any need to have a new
>>> series of lib calls.
>>>
>>> I modified some of the ato* functions during configuration file
>>> parsing using the safe functions we implemented, but didn't change
>>> others because they might not be used in configuration parsing (at
>>> least I don't have good understandings of them). My modifications are
>>> type safe, i.e., they didn't introduce any semantic or format change.
>>> Backward compatibility is most important for users and customers. For
>>> example, I use xatoui() only when the type of the dest variable is
>>> "unsigned int"; GetPercentage() is introduced to support options like
>>> "20%" (previously atoi ignores %), etc.
>>>
>>> Amos proposed the great idea that "In the cases where -1 is used we
>>> want to update the documentation and parser to accept the string
>>> "none". Which maps internally to the disabled value (usually still -1
>>> or -2). After which anything else must be >0." But for now, I don't
>>> want a tremendous change in my patch because it's quite dangerous to
>>> abandon all the previous "-1"-like settings.
>>
>> Ack. Do this one separately as its own patch please.
>>
>>
>> Now, audit results from reading your patches...
>>
>> compat/xstrto.cc:
>> * the first chunk changing strtoul() to strtol() then checking for <0 is
>> incorrect. Values such as 0xffffffff (with 8 f's) which strtol() maps to
>> negative numbers and are valid unsigned numbers. In order to use signed
>> conversion function to produce a unsigned value the output of the signed
>> function MUST have a type larger than the unsigned functions output (ie
>> 64-bit signed conversion function can be used to parse 32-bit unsigned
>> values, and cannot be used to parse 64-bit signed values).
>> The more correct change would be strtoul() -> strtoll(). However, these
>> netfilter authored functions have been placed through some strict unit tests
>> by the netfilter guys. They may have overlooked one case you can identify,
>> but before you change them please add cppunit tests to verify that all the
>> input cases are still handled correctly (we currently have none for these
>> functions).
>>
> Yes, you are very right. With the same bit width, unsigned integers
> has larger range than signed ones.
>
> Here, my concern is using "strtoul", a user input "-1" will be
> translated to a large positive integer. But as you said, if the code
> is well tested, we can still keep them.
>
>> * the second chunk. does it still compile when you use the C++
>> static_cast<unsigned long>() operator ?
>>
> Here I don't understand why we should better to use C++
> static_cast<unsigned long>()? What's the advantage of it compared with
> direct type conversion?

Better compiler support for checking things like the signed bit, const
correctness and type size mismatches. C++ compilers can warn about data
loss or mis-alignment. But the old C-style cast makes the compiler hide
such warnings.

>
>> in src/Parsing.cc:
>> * the debugs message about "expected characters" is not very helpful. It
>> should be changed to say what values are expected and what was found.
>> For example: debugs(..., "ERROR: '-' is not a digit. '"<< token << "' is
>> supposed to be a number.");
>>
>> * I spot another bug which remains after your fixes. The "end" pointers
>> passed to strto*() functions is documented as being NULL or checked for NULL
>> value, but the Squid code is not initializing any of the "char *end;"
>> variables to NULL. Please fix these as part of your patch. "char *end =
>> NULL;" should be sufficient.
>>
>> * in xatoi(). This is C++ code please define the variable at point of first
>> use when possible. Also, please use int64_t/uint64_t types rather than "long
>> long".
>> For example: int64_t input = xatoll(token, 10);
>> The same goes for other functions where you are adding new local variables.
>>
> Why int64_t is better than long long? :-P

Shorter and easier to understand for newbie programmers (and even
experiemnced peopel doing a quick patch read). Squid is still very
community driven and depends on patches from such people.

Also, uniformity of code.

>
>> * in xatoui(). The 64-bit signed conversion function xatol() or xatoll()
>> needs to be used, not the xatoi() one.
>>
>> * GetInteger() has the requirement of parsing masks. Please retain the
>> comment about that. The portion about %i is invalid after changing to
>> xatoll(), but GetInteger() still has that usage requirement and your patch
>> only works because our xatoll() actually uses strtoll() as the backend.
>>
>> * GetPercentage() can make use of the strto*() function min/max values to
>> double-check the range accepted.
>> Also, please document in Parsing.h that 0%-100% range is validated and
>> what happens when the value is out of range.
>>
>> in src/cache_cf.cc:
>> * please do not add useless empty lines. Such as in the first chunk at line
>> 1017 between the if statement '}' and function ending '}'.
>>
>> * the next two chunks doing "m*d/u*2" please add () brackets around the
>> (d/u) portion to clarify what the formula is calculating.
> I didn't change m*d/u*2 but just added an error message. Why it's
> m*(d/u)*2 not (m*d)/u*2? I don't understand this part.

division occurs before multiplication in mathematics. So the original
code "m*d/u == "m*(d/u)". Both ways of writing it are correct, but with
brackets it is easier to understand the true calculation. Your not
understanding that immediately is a perfect example of why the brackets
are needed.

>> * the debugs statements "will be deprecated in future releases." is
>> incorrect. The situation is "is deprecated." from the moment these debugs
>> lines get merged.
>>
>> * at line 3475. port is being set by the strtol() call in the if()
>> condition. There is no need to parse the token twice.
> I did it on purpose because strtol() return a long value but here only
> a short one is valid.

Understood. I think this is the case for adding a strtos() function like
you have added some others in Parsing.cc or altering the if() condition
to work with xatos().

>> * the debugs you are adding at line 3705 appears to already be present in my
>> copy of trunk with better text.
>> Hmm. I'm wondering if you did get trunk checkout properly now.
>>
> Really? I thought it's the newest. Could you please check it for me? I
> see the Bug# 3729 is fixed in the version so I suppose it to be very
> new.
> $ bzr checkout http://bzr.squid-cache.org/bzr/squid3/trunk trunk

That looks correct. But if you have different code something must be wrong.

"bzr update" for me currently shows:
   Tree is up to date at revision 12566 of branch /bzr/squid3/trunk

when I wrote that audit email it was revision 12565. Although I think
that debugs was in there for several hundred revisions already.

>> in src/wccp2.cc:
>> * the second chunk is parsing a port value. It should probably be using
>> xatos() and p defined as a uint16_t.
>>
> because previously, p is defined as a int. I feel this way, the log
> message is better than xatos() which gives a general one.

Fair enough for this update.

Although, in light of your below analysis of zto*() vs strto*() we
should alter this to a strto*() function.

>
>>
>> Question: can you identify what the benefits/costs of using the strto*()
>> versus the ato*() functions family are? I think now is a good time to
>> identify which has the best performance and safety to migrate the code to
>> them. xstrtoui() / xatoui() are overlapping, as are some of the others.
>>
> According to my understandings, ato*() functions are unsafe. They have
> 3 kinds of problems: (1) Integer overflow, (2) Bad character issue,
> and (3) Cannot handle octal and hex numbers. The fundamental thing is
> that we have no way to check the 3 cases: no errno, no auxiliary
> pointers, etc.
>
> For example, atoi(10000000000) = 1410065408, atoi(2M0) == 2, atoi(0xFFFF) = 0
>
> So, it's always better to use strto*() functions. For (1) integer
> overflow, it uses long long type which is less likely to have
> overflow. Most importantly, we can check the errno and return value to
> know whether the input string is valid or not.
>
> "If an underflow occurs, strtol() returns LONG_MIN. If an overflow
> occurs, strtol() returns LONG_MAX. In both cases, errno is set to
> ERANGE. Precisely the same holds for strtoll() (with LLONG_MIN and
> LLONG_MAX instead of LONG_MIN and LONG_MAX)."
>
> "ERANGE The resulting value was out of range."
>
> For (2) Bad characters issue, the 2nd argument can tell because it
> stores the address of the first invalid character of the input string.
>
> For (3), the 3rd argument supports both octal, and hex conversion, in
> addition to decimal numbers.

Okay. Thank you for that. When I have some time later in the week I
would like to work on the compat layer safety wrappers with you.

> Now the last question is the performance. I don't know the performance
> of the two functions (might be we can do a small experiments by
> calling them thousands of times and compare the running time). But I
> personally do not think it's critical because these functions are
> merely used in parsing configuration options, which is a constant
> overhead compared with other functions in the critical execution
> paths.

Sort of. reconfigure can happen on large files and needs to be as fast
as possible to minimize outage time. They are also used when processing
manager requests and sometimes HTTP traffic. If the performance is good
these parse functions may be re-used on speed-critical parsing of more
header values.

Amos
Received on Sat Jan 05 2013 - 08:52:37 MST

This archive was generated by hypermail 2.2.0 : Sat Jan 05 2013 - 12:00:12 MST