Re: [PATCH] Improve Configuration Checking and Parsing

From: Tianyin Xu <tixu_at_cs.ucsd.edu>
Date: Sat, 5 Jan 2013 00:00:21 -0800

Thanks a lot, Amos! I'll modify the patches according to your comments
and post it on squid-dev again tomorrow.

Before that I have some questions and concerns related to your
comments (see below). Could you please clarity a bit?

Best,
Tianyin

On Fri, Jan 4, 2013 at 10:12 PM, Amos Jeffries <squid3_at_treenet.co.nz> wrote:
> On 5/01/2013 3:05 p.m., Tianyin Xu wrote:
>>
>> Hi, all,
>>
>> I modified my patches according to Amos's comments and suggestions
>> (Thanks a lot, Amos!).
>>
>> The attached patch includes all the modifications upon 3.HEAD/trunk.
>>
>> The only thing I didn't follow is to move the xato* functions in the
>> compatibility library and to replace all the ato* functions in the
>> code. I thought about it for quite a while, but decided not to have a
>> xato* series as lib calls. The main reason is that these xato*
>> functions are tailored for configuration parsing instead of general
>> string-to-number functionality. All the current ato* functions (in
>> src/Parsing.cc) will call self_destruct() when error occurs, which in
>> turn prints out the bungled line in the configuration file (this's
>> really good!).
>
>
> Getting rid of that bungled message is one of our config parsing upgrade
> goals. If we can adjust the xato*() calls to report errors without aborting
> Squid it will make them more useful and re-usable. They should still log an
> error, but IMO aborting is going too far. The code which calls them also has
> more context to decide whether the config option deserves a fatal() exit or
> not. You can see a good example of that in src/HelperChildConfig.cc at the
> second chunk where you change atoi() to xatoi(); the calling code is
> reporting what values are valid, why and then calling self_destruct()
> anyway.
>

Yes, I'm glad you say so. It's really awesome that you are thinking
these configuration issues so seriously.

>
>> On the other hand, strto* series as lib calls already
>> provide safe string-to-number conversion, and anyone can take
>> advantage of them. So I don't think there's any need to have a new
>> series of lib calls.
>>
>> I modified some of the ato* functions during configuration file
>> parsing using the safe functions we implemented, but didn't change
>> others because they might not be used in configuration parsing (at
>> least I don't have good understandings of them). My modifications are
>> type safe, i.e., they didn't introduce any semantic or format change.
>> Backward compatibility is most important for users and customers. For
>> example, I use xatoui() only when the type of the dest variable is
>> "unsigned int"; GetPercentage() is introduced to support options like
>> "20%" (previously atoi ignores %), etc.
>>
>> Amos proposed the great idea that "In the cases where -1 is used we
>> want to update the documentation and parser to accept the string
>> "none". Which maps internally to the disabled value (usually still -1
>> or -2). After which anything else must be >0." But for now, I don't
>> want a tremendous change in my patch because it's quite dangerous to
>> abandon all the previous "-1"-like settings.
>
>
> Ack. Do this one separately as its own patch please.
>
>
> Now, audit results from reading your patches...
>
> compat/xstrto.cc:
> * the first chunk changing strtoul() to strtol() then checking for <0 is
> incorrect. Values such as 0xffffffff (with 8 f's) which strtol() maps to
> negative numbers and are valid unsigned numbers. In order to use signed
> conversion function to produce a unsigned value the output of the signed
> function MUST have a type larger than the unsigned functions output (ie
> 64-bit signed conversion function can be used to parse 32-bit unsigned
> values, and cannot be used to parse 64-bit signed values).
> The more correct change would be strtoul() -> strtoll(). However, these
> netfilter authored functions have been placed through some strict unit tests
> by the netfilter guys. They may have overlooked one case you can identify,
> but before you change them please add cppunit tests to verify that all the
> input cases are still handled correctly (we currently have none for these
> functions).
>

Yes, you are very right. With the same bit width, unsigned integers
has larger range than signed ones.

Here, my concern is using "strtoul", a user input "-1" will be
translated to a large positive integer. But as you said, if the code
is well tested, we can still keep them.

> * the second chunk. does it still compile when you use the C++
> static_cast<unsigned long>() operator ?
>

Here I don't understand why we should better to use C++
static_cast<unsigned long>()? What's the advantage of it compared with
direct type conversion?

>
> in src/Parsing.cc:
> * the debugs message about "expected characters" is not very helpful. It
> should be changed to say what values are expected and what was found.
> For example: debugs(..., "ERROR: '-' is not a digit. '"<< token << "' is
> supposed to be a number.");
>
> * I spot another bug which remains after your fixes. The "end" pointers
> passed to strto*() functions is documented as being NULL or checked for NULL
> value, but the Squid code is not initializing any of the "char *end;"
> variables to NULL. Please fix these as part of your patch. "char *end =
> NULL;" should be sufficient.
>
> * in xatoi(). This is C++ code please define the variable at point of first
> use when possible. Also, please use int64_t/uint64_t types rather than "long
> long".
> For example: int64_t input = xatoll(token, 10);
> The same goes for other functions where you are adding new local variables.
>

Why int64_t is better than long long? :-P

> * in xatoui(). The 64-bit signed conversion function xatol() or xatoll()
> needs to be used, not the xatoi() one.
>
> * GetInteger() has the requirement of parsing masks. Please retain the
> comment about that. The portion about %i is invalid after changing to
> xatoll(), but GetInteger() still has that usage requirement and your patch
> only works because our xatoll() actually uses strtoll() as the backend.
>
> * GetPercentage() can make use of the strto*() function min/max values to
> double-check the range accepted.
> Also, please document in Parsing.h that 0%-100% range is validated and
> what happens when the value is out of range.
>
> in src/cache_cf.cc:
> * please do not add useless empty lines. Such as in the first chunk at line
> 1017 between the if statement '}' and function ending '}'.
>
> * the next two chunks doing "m*d/u*2" please add () brackets around the
> (d/u) portion to clarify what the formula is calculating.

I didn't change m*d/u*2 but just added an error message. Why it's
m*(d/u)*2 not (m*d)/u*2? I don't understand this part.

>
> * the debugs statements "will be deprecated in future releases." is
> incorrect. The situation is "is deprecated." from the moment these debugs
> lines get merged.
>
> * at line 3475. port is being set by the strtol() call in the if()
> condition. There is no need to parse the token twice.

I did it on purpose because strtol() return a long value but here only
a short one is valid.

>
> * the debugs you are adding at line 3705 appears to already be present in my
> copy of trunk with better text.
> Hmm. I'm wondering if you did get trunk checkout properly now.
>

Really? I thought it's the newest. Could you please check it for me? I
see the Bug# 3729 is fixed in the version so I suppose it to be very
new.
$ bzr checkout http://bzr.squid-cache.org/bzr/squid3/trunk trunk

> in src/wccp2.cc:
> * the second chunk is parsing a port value. It should probably be using
> xatos() and p defined as a uint16_t.
>

because previously, p is defined as a int. I feel this way, the log
message is better than xatos() which gives a general one.

>
>
> Question: can you identify what the benefits/costs of using the strto*()
> versus the ato*() functions family are? I think now is a good time to
> identify which has the best performance and safety to migrate the code to
> them. xstrtoui() / xatoui() are overlapping, as are some of the others.
>

According to my understandings, ato*() functions are unsafe. They have
3 kinds of problems: (1) Integer overflow, (2) Bad character issue,
and (3) Cannot handle octal and hex numbers. The fundamental thing is
that we have no way to check the 3 cases: no errno, no auxiliary
pointers, etc.

For example, atoi(10000000000) = 1410065408, atoi(2M0) == 2, atoi(0xFFFF) = 0

So, it's always better to use strto*() functions. For (1) integer
overflow, it uses long long type which is less likely to have
overflow. Most importantly, we can check the errno and return value to
know whether the input string is valid or not.

"If an underflow occurs, strtol() returns LONG_MIN. If an overflow
occurs, strtol() returns LONG_MAX. In both cases, errno is set to
ERANGE. Precisely the same holds for strtoll() (with LLONG_MIN and
LLONG_MAX instead of LONG_MIN and LONG_MAX)."

"ERANGE The resulting value was out of range."

For (2) Bad characters issue, the 2nd argument can tell because it
stores the address of the first invalid character of the input string.

For (3), the 3rd argument supports both octal, and hex conversion, in
addition to decimal numbers.

Now the last question is the performance. I don't know the performance
of the two functions (might be we can do a small experiments by
calling them thousands of times and compare the running time). But I
personally do not think it's critical because these functions are
merely used in parsing configuration options, which is a constant
overhead compared with other functions in the critical execution
paths.

>
> Amos

--
Tianyin XU,
http://cseweb.ucsd.edu/~tixu/
Received on Sat Jan 05 2013 - 08:00:30 MST

This archive was generated by hypermail 2.2.0 : Sat Jan 05 2013 - 12:00:12 MST