Re: [PATCH] Unknown cfg function

From: Tsantilas Christos <chtsanti_at_users.sourceforge.net>
Date: Wed, 21 Aug 2013 18:34:30 +0300

On 08/13/2013 11:57 PM, Alex Rousskov wrote:
> On 08/07/2013 03:11 AM, Tsantilas Christos wrote:
>> On 07/31/2013 07:59 PM, Alex Rousskov wrote:
>>> 2. When configuration_includes_quoted_values is on, new "strict syntax"
>>> rules are enforced:
>
>>> 2c. By default, token delimiter is whitespace. Bare (i.e., unquoted)
>>> tokens containing any character other than alphanumeric, underscore,
>>> period, '(', ')', '=', and '-' terminate Squid. For example, foo{bar},
>>> foo_at_bar, and foo"bar" terminate Squid in strict syntax mode.
>
>> OK.
>> In this patch allow the following ".,()-=_%/:"
>> We can easily add remove characters from this list.
>> If we remove from this list the ':' then we should require quotes to
>> define http, icap or ecap urls.
>> The % used to define percent values and the '/' to define filenames.
>
> I am OK with adding ':' and '/' to the allowed set.
>
> I think we should allow % at the end of a bare token (e.g., 100%) but
> not elsewhere (e.g., http://www.%host.com/). I am worried that by
> allowing %macro-like strings in bare tokens we would not be able to tell
> the admin when they forgot to quote a string that contains a macro.

OK. This patch allow only [0-9]+% tokens (eg 100%, 80% etc).

>
> Another potential problem here is with the '=' character. We have to
> allow it because existing code expects it in directive options with
> values (e.g., bypass=on). However, making '=' a part of the token makes
> it difficult to support quoted option values:
>
> bypass="on" // illegal because a token cannot have quotes
>
> "bypass=on" // legal but looks kind of weird
> // because there are actually several tokens here
>
> For simple values such as "on", this is not a problem, but not all
> option values are that simple, and some of them will require quoting:
>
> uri="icap://example.com/?mode=reqmod" // illegal?!
>
> "uri=icap://example.com/?mode=reqmod" // weird
>
> Changing all callers to treat name=value sequence as three tokens is
> probably too much work, right?

This is requires changes in many places...

>
>
> A similar problem exists for function() calls but we hack around it by
> parsing parameters() specially and requiring that file names are quoted,
> I guess. More on that below.
>
>
>
>
>> + bool saveRecognizeQuotedValues = ConfigParser::RecognizeQuotedValues;
> ...
>> + bool savePreview = ConfigParser::PreviewMode_;
>
>
> Please make those variables const.

ok

>
>
>> +bool ConfigParser::ParseQuotedOrToEOL_ = false;
>
> Please s/EOL/Eol/.

ok

>
>
>> +static const char *SQUID_ERROR_TOKEN = "SQUID_ERROR_TOKEN";
>
> The value should probably contain spaces and other special characters to
> minimize a chance of collision with the actual token. For example,
> "[invalid token]". Note that you never test for SQUID_ERROR_TOKEN being
> returned -- the code relies on known values to be different from the
> SQUID_ERROR_TOKEN value.

ok

>
>
>> + (void)ConfigParser::NextToken(); //Get token fron cfg line
>
> typo: s/fron/from/

fixed

>
>
>> /* scan until the end of the quoted string, unescaping " and \ */
>> - while (*s && *s != quoteChar) {
>> - if (*s == '\\' && isalnum(*( s + 1))) {
>> - debugs(3, DBG_CRITICAL, "Unsupported escape sequence: " << s);
>> - self_destruct();
>> + while (*s && *s != quoteChar && !errorStr && (d - UnQuoted) < sizeof(UnQuoted)) {
>
> The old comment is now out of sync. You can say something like "scan
> until the end of the quoted string, handling escape sequences".

ok

>
>
>> } else if (*s == '$' && quoteChar == '"') {
>> - debugs(3, DBG_CRITICAL, "Unsupported cfg macro: " << s);
>> - self_destruct();
>> + errorStr = "Unsupported cfg macro";
>> + errorPos = s;
>
> If there is consensus that we do not need to support $macros now or in
> the future (except for pre-processor SMP macros that are already handled
> by the time the above code runs), then the above special $macor case can
> be removed.

I put the related code inside an "#if 0 ... #endif"
I will remove it if required...

>
>
>> + char *token = xstrdup(UnQuote(nextToken, &nextToken));
>> + CfgLineTokens_.push(token);
> ...
>> + while (!CfgLineTokens_.empty()) {
>> + char *token = CfgLineTokens_.front();
>> + CfgLineTokens_.pop();
>> + free(token);
>> + }
>
> Can we use String or std::string for these stored tokens so that we do
> not have to be so careful about allocating and deleting them?

Not easy.
This is requires to modify NextToken and related methods to return
"const char *" instead of "char *".
This is requires changes in many places. I remember I had problems when
I try it...

>
>
>> 4) Add the ConfigParser::RegexPattern() to get the next regex token
>>
>> 5) Add the ConfigParser::RegexStrtokFile() to get the next regex token
>> which is compatible with the old strtokFile
>
>> +char *
>> +ConfigParser::RegexStrtokFile()
>> +{
>> + ParseRegex_ = true;
>> + char * token = strtokFile();
>> + ParseRegex_ = false;
>> + return token;
>> +}
>>
>> +char *
>> +ConfigParser::RegexPattern()
>> +{
>> + ParseRegex_ = true;
>> + char * token = NextToken();
>> + ParseRegex_ = false;
>> + return token;
>> +}
>
>
> This feels wrong. It is good to have these methods so that we know which
> callers need REs, but their behavior should be different IMO:
>
> * In legacy mode, they should call NextToken() and strtokFile() which
> will parse legacy REs as expected. No need for ParseRegex_.
>
> * In strict mode, they should fail immediately so that we can introduce
> proper RE support later, without screwing up already working
> configurations that use strict mode. Supporting REs in strict mode is a
> TODO, just like supporting single quoted strings is. For now, we just
> need to make sure that we can add such support without breaking
> configurations that start using strict mode.

OK. I implement this behaviour for now.
The parser return the message:
 "Can not read regex expresion while
configuration_includes_quoted_values is enabled"

When we are trying to read regex expresions (eg parse regex acl) when
the configuration_includes_quoted_values is set to on.

The user should use something like the following:
# Enable quoted values on squid cfg file parsing
configuration_includes_quoted_values on
....
#Temporary disable quoted values to parse regex expresions
configuration_includes_quoted_values off
refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
refresh_pattern . 0 20% 4320
configuration_includes_quoted_values on
...

>
> For example, if we decide that REs in strict mode should start with '/'
> or m{} as they do in Perl, then we have to make sure _now_ that no
> legacy RE that starts with those characters sneaks into the strict mode
> while we are working on RE support. If we fail to do that, we will
> create a new set of fresh backward compatibility problems to deal with.

I see....

>
>
>> /// configuration_includes_quoted_values in squid.conf
>> - static int RecognizeQuotedValues;
>> + static bool RecognizeQuotedValues;
>> +
>> + /// Strict syntax mode. Does not allow not alphanumeric characters in unquoted tokens
>> + static bool StrictMode;
>
> You may want to add a comment saying that StrictMode may remain false
> when the legacy ConfigParser::NextQuotedToken() call forces
> RecognizeQuotedValues to be temporary true. Otherwise, it is not clear
> why we need both members.

Ok

>
>
>> + if (!PreviewMode_ || type == FunctionParameters)
>> + parsePos = pos;
>> + // else next call will read the same token;
>
> Please add a comment explaining this logic. It is not clear why parsePos
> is updated if we are not in preview mode or if we are parsing
> FunctionParameters [in preview mode].

I add a comment.
The true is that the related code is a little complex. But for function
"parameters()" we need to update the current parsing position, even if
we are in preview mode, because we need to read the next quoted string
which is the "parameters()" argument, the name of the file we want to
open to read the next valid token.
For preview the caller needs the next valid token not the "parameters()"
function and its argument.

>
>
>> - * Return the last TokenUndo() or TokenPutBack() queued element, or NULL
>> + * Return the last TokenPutBack() queued element, or NULL
>
> No comma is needed after this change.

ok.

>
>
> Thank you,
>
> Alex.
>
>

Received on Wed Aug 21 2013 - 15:35:04 MDT

This archive was generated by hypermail 2.2.0 : Wed Aug 28 2013 - 12:00:33 MDT