Re: [PATCH] Unknown cfg function

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Tue, 13 Aug 2013 14:57:03 -0600

On 08/07/2013 03:11 AM, Tsantilas Christos wrote:
> On 07/31/2013 07:59 PM, Alex Rousskov wrote:
>> 2. When configuration_includes_quoted_values is on, new "strict syntax"
>> rules are enforced:

>> 2c. By default, token delimiter is whitespace. Bare (i.e., unquoted)
>> tokens containing any character other than alphanumeric, underscore,
>> period, '(', ')', '=', and '-' terminate Squid. For example, foo{bar},
>> foo_at_bar, and foo"bar" terminate Squid in strict syntax mode.

> OK.
> In this patch allow the following ".,()-=_%/:"
> We can easily add remove characters from this list.
> If we remove from this list the ':' then we should require quotes to
> define http, icap or ecap urls.
> The % used to define percent values and the '/' to define filenames.

I am OK with adding ':' and '/' to the allowed set.

I think we should allow % at the end of a bare token (e.g., 100%) but
not elsewhere (e.g., http://www.%host.com/). I am worried that by
allowing %macro-like strings in bare tokens we would not be able to tell
the admin when they forgot to quote a string that contains a macro.

Another potential problem here is with the '=' character. We have to
allow it because existing code expects it in directive options with
values (e.g., bypass=on). However, making '=' a part of the token makes
it difficult to support quoted option values:

  bypass="on" // illegal because a token cannot have quotes

  "bypass=on" // legal but looks kind of weird
              // because there are actually several tokens here

For simple values such as "on", this is not a problem, but not all
option values are that simple, and some of them will require quoting:

    uri="icap://example.com/?mode=reqmod" // illegal?!

    "uri=icap://example.com/?mode=reqmod" // weird

Changing all callers to treat name=value sequence as three tokens is
probably too much work, right?

A similar problem exists for function() calls but we hack around it by
parsing parameters() specially and requiring that file names are quoted,
I guess. More on that below.

> + bool saveRecognizeQuotedValues = ConfigParser::RecognizeQuotedValues;
...
> + bool savePreview = ConfigParser::PreviewMode_;

Please make those variables const.

> +bool ConfigParser::ParseQuotedOrToEOL_ = false;

Please s/EOL/Eol/.

> +static const char *SQUID_ERROR_TOKEN = "SQUID_ERROR_TOKEN";

The value should probably contain spaces and other special characters to
minimize a chance of collision with the actual token. For example,
"[invalid token]". Note that you never test for SQUID_ERROR_TOKEN being
returned -- the code relies on known values to be different from the
SQUID_ERROR_TOKEN value.

> + (void)ConfigParser::NextToken(); //Get token fron cfg line

typo: s/fron/from/

> /* scan until the end of the quoted string, unescaping " and \ */
> - while (*s && *s != quoteChar) {
> - if (*s == '\\' && isalnum(*( s + 1))) {
> - debugs(3, DBG_CRITICAL, "Unsupported escape sequence: " << s);
> - self_destruct();
> + while (*s && *s != quoteChar && !errorStr && (d - UnQuoted) < sizeof(UnQuoted)) {

The old comment is now out of sync. You can say something like "scan
until the end of the quoted string, handling escape sequences".

> } else if (*s == '$' && quoteChar == '"') {
> - debugs(3, DBG_CRITICAL, "Unsupported cfg macro: " << s);
> - self_destruct();
> + errorStr = "Unsupported cfg macro";
> + errorPos = s;

If there is consensus that we do not need to support $macros now or in
the future (except for pre-processor SMP macros that are already handled
by the time the above code runs), then the above special $macor case can
be removed.

> + char *token = xstrdup(UnQuote(nextToken, &nextToken));
> + CfgLineTokens_.push(token);
...
> + while (!CfgLineTokens_.empty()) {
> + char *token = CfgLineTokens_.front();
> + CfgLineTokens_.pop();
> + free(token);
> + }

Can we use String or std::string for these stored tokens so that we do
not have to be so careful about allocating and deleting them?

> 4) Add the ConfigParser::RegexPattern() to get the next regex token
>
> 5) Add the ConfigParser::RegexStrtokFile() to get the next regex token
> which is compatible with the old strtokFile

> +char *
> +ConfigParser::RegexStrtokFile()
> +{
> + ParseRegex_ = true;
> + char * token = strtokFile();
> + ParseRegex_ = false;
> + return token;
> +}
>
> +char *
> +ConfigParser::RegexPattern()
> +{
> + ParseRegex_ = true;
> + char * token = NextToken();
> + ParseRegex_ = false;
> + return token;
> +}

This feels wrong. It is good to have these methods so that we know which
callers need REs, but their behavior should be different IMO:

* In legacy mode, they should call NextToken() and strtokFile() which
will parse legacy REs as expected. No need for ParseRegex_.

* In strict mode, they should fail immediately so that we can introduce
proper RE support later, without screwing up already working
configurations that use strict mode. Supporting REs in strict mode is a
TODO, just like supporting single quoted strings is. For now, we just
need to make sure that we can add such support without breaking
configurations that start using strict mode.

For example, if we decide that REs in strict mode should start with '/'
or m{} as they do in Perl, then we have to make sure _now_ that no
legacy RE that starts with those characters sneaks into the strict mode
while we are working on RE support. If we fail to do that, we will
create a new set of fresh backward compatibility problems to deal with.

> /// configuration_includes_quoted_values in squid.conf
> - static int RecognizeQuotedValues;
> + static bool RecognizeQuotedValues;
> +
> + /// Strict syntax mode. Does not allow not alphanumeric characters in unquoted tokens
> + static bool StrictMode;

You may want to add a comment saying that StrictMode may remain false
when the legacy ConfigParser::NextQuotedToken() call forces
RecognizeQuotedValues to be temporary true. Otherwise, it is not clear
why we need both members.

> + if (!PreviewMode_ || type == FunctionParameters)
> + parsePos = pos;
> + // else next call will read the same token;

Please add a comment explaining this logic. It is not clear why parsePos
is updated if we are not in preview mode or if we are parsing
FunctionParameters [in preview mode].

> - * Return the last TokenUndo() or TokenPutBack() queued element, or NULL
> + * Return the last TokenPutBack() queued element, or NULL

No comma is needed after this change.

Thank you,

Alex.
Received on Tue Aug 13 2013 - 20:57:16 MDT

This archive was generated by hypermail 2.2.0 : Wed Aug 21 2013 - 12:01:31 MDT