Re: [PATCH] Unknown cfg function from Tsantilas Christos on 2013-08-28 (squid-dev)

From: Tsantilas Christos <chtsanti_at_users.sourceforge.net>
Date: Wed, 28 Aug 2013 16:15:30 +0300

Is this patch OK?
Should I commit to trunk?

On 08/21/2013 06:34 PM, Tsantilas Christos wrote:
> On 08/13/2013 11:57 PM, Alex Rousskov wrote:
>> On 08/07/2013 03:11 AM, Tsantilas Christos wrote:
>>> On 07/31/2013 07:59 PM, Alex Rousskov wrote:
>>>> 2. When configuration_includes_quoted_values is on, new "strict syntax"
>>>> rules are enforced:
>>
>>>> 2c. By default, token delimiter is whitespace. Bare (i.e., unquoted)
>>>> tokens containing any character other than alphanumeric, underscore,
>>>> period, '(', ')', '=', and '-' terminate Squid. For example, foo{bar},
>>>> foo_at_bar, and foo"bar" terminate Squid in strict syntax mode.
>>
>>> OK.
>>> In this patch allow the following ".,()-=_%/:"
>>> We can easily add remove characters from this list.
>>> If we remove from this list the ':' then we should require quotes to
>>> define http, icap or ecap urls.
>>> The % used to define percent values and the '/' to define filenames.
>>
>> I am OK with adding ':' and '/' to the allowed set.
>>
>> I think we should allow % at the end of a bare token (e.g., 100%) but
>> not elsewhere (e.g., http://www.%host.com/). I am worried that by
>> allowing %macro-like strings in bare tokens we would not be able to tell
>> the admin when they forgot to quote a string that contains a macro.
>
>
> OK. This patch allow only [0-9]+% tokens (eg 100%, 80% etc).
>
>>
>> Another potential problem here is with the '=' character. We have to
>> allow it because existing code expects it in directive options with
>> values (e.g., bypass=on). However, making '=' a part of the token makes
>> it difficult to support quoted option values:
>>
>> bypass="on" // illegal because a token cannot have quotes
>>
>> "bypass=on" // legal but looks kind of weird
>> // because there are actually several tokens here
>>
>> For simple values such as "on", this is not a problem, but not all
>> option values are that simple, and some of them will require quoting:
>>
>> uri="icap://example.com/?mode=reqmod" // illegal?!
>>
>> "uri=icap://example.com/?mode=reqmod" // weird
>>
>> Changing all callers to treat name=value sequence as three tokens is
>> probably too much work, right?
>
> This is requires changes in many places...
>
>
>>
>>
>> A similar problem exists for function() calls but we hack around it by
>> parsing parameters() specially and requiring that file names are quoted,
>> I guess. More on that below.
>>
>>
>>
>>
>>> + bool saveRecognizeQuotedValues = ConfigParser::RecognizeQuotedValues;
>> ...
>>> + bool savePreview = ConfigParser::PreviewMode_;
>>
>>
>> Please make those variables const.
>
> ok
>
>>
>>
>>> +bool ConfigParser::ParseQuotedOrToEOL_ = false;
>>
>> Please s/EOL/Eol/.
>
> ok
>
>>
>>
>>> +static const char *SQUID_ERROR_TOKEN = "SQUID_ERROR_TOKEN";
>>
>> The value should probably contain spaces and other special characters to
>> minimize a chance of collision with the actual token. For example,
>> "[invalid token]". Note that you never test for SQUID_ERROR_TOKEN being
>> returned -- the code relies on known values to be different from the
>> SQUID_ERROR_TOKEN value.
>
> ok
>
>
>>
>>
>>> + (void)ConfigParser::NextToken(); //Get token fron cfg line
>>
>> typo: s/fron/from/
>
> fixed
>
>>
>>
>>> /* scan until the end of the quoted string, unescaping " and \ */
>>> - while (*s && *s != quoteChar) {
>>> - if (*s == '\\' && isalnum(*( s + 1))) {
>>> - debugs(3, DBG_CRITICAL, "Unsupported escape sequence: " << s);
>>> - self_destruct();
>>> + while (*s && *s != quoteChar && !errorStr && (d - UnQuoted) < sizeof(UnQuoted)) {
>>
>> The old comment is now out of sync. You can say something like "scan
>> until the end of the quoted string, handling escape sequences".
>
> ok
>
>>
>>
>>> } else if (*s == '$' && quoteChar == '"') {
>>> - debugs(3, DBG_CRITICAL, "Unsupported cfg macro: " << s);
>>> - self_destruct();
>>> + errorStr = "Unsupported cfg macro";
>>> + errorPos = s;
>>
>> If there is consensus that we do not need to support $macros now or in
>> the future (except for pre-processor SMP macros that are already handled
>> by the time the above code runs), then the above special $macor case can
>> be removed.
>
> I put the related code inside an "#if 0 ... #endif"
> I will remove it if required...
>
>>
>>
>>> + char *token = xstrdup(UnQuote(nextToken, &nextToken));
>>> + CfgLineTokens_.push(token);
>> ...
>>> + while (!CfgLineTokens_.empty()) {
>>> + char *token = CfgLineTokens_.front();
>>> + CfgLineTokens_.pop();
>>> + free(token);
>>> + }
>>
>> Can we use String or std::string for these stored tokens so that we do
>> not have to be so careful about allocating and deleting them?
>
> Not easy.
> This is requires to modify NextToken and related methods to return
> "const char *" instead of "char *".
> This is requires changes in many places. I remember I had problems when
> I try it...
>
>
>
>>
>>
>>> 4) Add the ConfigParser::RegexPattern() to get the next regex token
>>>
>>> 5) Add the ConfigParser::RegexStrtokFile() to get the next regex token
>>> which is compatible with the old strtokFile
>>
>>> +char *
>>> +ConfigParser::RegexStrtokFile()
>>> +{
>>> + ParseRegex_ = true;
>>> + char * token = strtokFile();
>>> + ParseRegex_ = false;
>>> + return token;
>>> +}
>>>
>>> +char *
>>> +ConfigParser::RegexPattern()
>>> +{
>>> + ParseRegex_ = true;
>>> + char * token = NextToken();
>>> + ParseRegex_ = false;
>>> + return token;
>>> +}
>>
>>
>> This feels wrong. It is good to have these methods so that we know which
>> callers need REs, but their behavior should be different IMO:
>>
>> * In legacy mode, they should call NextToken() and strtokFile() which
>> will parse legacy REs as expected. No need for ParseRegex_.
>>
>> * In strict mode, they should fail immediately so that we can introduce
>> proper RE support later, without screwing up already working
>> configurations that use strict mode. Supporting REs in strict mode is a
>> TODO, just like supporting single quoted strings is. For now, we just
>> need to make sure that we can add such support without breaking
>> configurations that start using strict mode.
>
> OK. I implement this behaviour for now.
> The parser return the message:
> "Can not read regex expresion while
> configuration_includes_quoted_values is enabled"
>
> When we are trying to read regex expresions (eg parse regex acl) when
> the configuration_includes_quoted_values is set to on.
>
> The user should use something like the following:
> # Enable quoted values on squid cfg file parsing
> configuration_includes_quoted_values on
> ....
> #Temporary disable quoted values to parse regex expresions
> configuration_includes_quoted_values off
> refresh_pattern ^ftp: 1440 20% 10080
> refresh_pattern ^gopher: 1440 0% 1440
> refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
> refresh_pattern . 0 20% 4320
> configuration_includes_quoted_values on
> ...
>
>>
>> For example, if we decide that REs in strict mode should start with '/'
>> or m{} as they do in Perl, then we have to make sure _now_ that no
>> legacy RE that starts with those characters sneaks into the strict mode
>> while we are working on RE support. If we fail to do that, we will
>> create a new set of fresh backward compatibility problems to deal with.
>
> I see....
>
>>
>>
>>> /// configuration_includes_quoted_values in squid.conf
>>> - static int RecognizeQuotedValues;
>>> + static bool RecognizeQuotedValues;
>>> +
>>> + /// Strict syntax mode. Does not allow not alphanumeric characters in unquoted tokens
>>> + static bool StrictMode;
>>
>> You may want to add a comment saying that StrictMode may remain false
>> when the legacy ConfigParser::NextQuotedToken() call forces
>> RecognizeQuotedValues to be temporary true. Otherwise, it is not clear
>> why we need both members.
>
> Ok
>
>>
>>
>>> + if (!PreviewMode_ || type == FunctionParameters)
>>> + parsePos = pos;
>>> + // else next call will read the same token;
>>
>> Please add a comment explaining this logic. It is not clear why parsePos
>> is updated if we are not in preview mode or if we are parsing
>> FunctionParameters [in preview mode].
>
> I add a comment.
> The true is that the related code is a little complex. But for function
> "parameters()" we need to update the current parsing position, even if
> we are in preview mode, because we need to read the next quoted string
> which is the "parameters()" argument, the name of the file we want to
> open to read the next valid token.
> For preview the caller needs the next valid token not the "parameters()"
> function and its argument.
>
>
>>
>>
>>> - * Return the last TokenUndo() or TokenPutBack() queued element, or NULL
>>> + * Return the last TokenPutBack() queued element, or NULL
>>
>> No comma is needed after this change.
>
> ok.
>
>>
>>
>> Thank you,
>>
>> Alex.
>>
>>
>
Received on Wed Aug 28 2013 - 13:15:46 MDT

This archive was generated by hypermail 2.2.0 : Wed Aug 28 2013 - 12:00:33 MDT