Re: [PATCH] Unknown cfg function

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Thu, 01 Aug 2013 20:13:16 +1200

On 1/08/2013 4:59 a.m., Alex Rousskov wrote:
> Christos, Amos,
>
> Thank you for working on this! I hope we can avoid repeating the
> same set of mistakes thrice by carefully considering the big picture and
> upgrade path. Here is my understanding of how we want things to work,
> based on the review of the reported bugs and this thread so far:
>
>
> 1. configuration_includes_quoted_values defaults to off. The setting
> scope extends from the place where the directive is used to either the
> end of configuration or to the next use, whichever comes first.
Yes.

> 2. When configuration_includes_quoted_values is on, new "strict syntax"
> rules are enforced:
>
> 2a. "quoted values" and function()s are supported. %Macros are supported
> inside quoted values and only inside them. Unknown functions and macros
> terminate Squid. Code uses NextToken() to get tokens by default. A small
> subset of hand-picked directives may use NextQuotedOrToEol() or other
> special methods instead of NextToken() to accommodate more legacy cases.
> Logformat is one such example.
>
> 2b. A % sign can be \-escaped to block macro expansion in quoted strings
> where needed. A few "standard" escape sequences are supported such as
> \n. Unknown escape sequences terminate Squid.

I think expanding that. Any character can be \-escaped in the quoted
strings.
Which eliminates that escaped character from any special handling
whatever it was.

I don't really see any need for the cases like \n to be expanded. That
could lead to trouble if the \n was output to things like HTTP headers
or log files. So unless there is a good reason to add it I would vote to
omit that little part until some particular need makes us add it.

> 2c. By default, token delimiter is whitespace. Bare (i.e., unquoted)
> tokens containing any character other than alphanumeric, underscore,
> period, '(', ')', '=', and '-' terminate Squid. For example, foo{bar},
> foo_at_bar, and foo"bar" terminate Squid in strict syntax mode.

That means that we will have to add documentation for a fairly long list
of directives noting how they are affected by the quoting.
But I am okay with that.

> 3. When configuration_includes_quoted_values is off, old "legacy syntax"
> is supported, to the extent possible:
>
> 3a. "quoted values", functions(), and %macros are not recognized or
> treated specially compared to Squid v3.3. Code uses NextToken() to get
> tokens by default. A small subset of hand-picked directives may use
> NextQuotedOrToEol() instead of NextToken() but that method does _not_
> treat quoted strings specially in legacy mode. Logformat is one such
> example.

If by not treating quoted strings specially you mean that it ONLY
performs the ToEol part of its intended behaviour and leaves the quotes
in its output for quoted strings. Then yes.

> 3b. I am not sure about "file.cfg" include syntax in legacy mode. I
> think it would be OK _not_ to support it (because fixing configurations
> to use new parameters() syntax should not be very difficult in most
> cases), but I may be wrong. If it is easy to continue to support
> "file.cfg" include style in NextToken() working in legacy mode, then we
> should do it.

The current parser still supports it IIRC. I don't see any need to
change that for legacy mode.

However rejecting them in strict mode is what we want.

> 4. Exceptions: A small set of directives that already supported quoted
> strings correctly before the introduction of
> configuration_includes_quoted_values must continue to support them.
> These directives should call NextQuotedOrLegacy() method. The method
> temporary forces configuration_includes_quoted_values to ON if and only
> if the next token starts with a quote.

Yes.

> This may create a few upgrade problems when, for example, somebody is
> using unsupported %macros inside quoted strings with these options, but
> the support for quoted strings in those headers was added relatively
> recently so there should not be many such cases, and it is not practical
> to treat these options as a complex third class. They have to obey all
> strict syntax rules when they use quoted strings (and also when
> configuration_includes_quoted_values is on, of course).

We can avoid a lot of those by being pedantic about the documentation.

debugs with DBG_PARSE_NOTE(...) can be used in *both* modes to highlight
problematic directives. Admin who follow the upgrade guidelines will
then have a bit of extra warning targeted at their specific needs.

> 5. Future changes:
>
> 5a. We revisit handling of 'single quoted strings' later, after the
> above is done. They are useful to disable macros and standard escape
> sequences in strings, but they are not a "must have" for now, and we
> need to decide how to treat \-escapes inside them. We will probably
> allow escaping of two characters only: \ and '.
>
> 5b. We revisit handling of REs, after the above is done. We will
> probably add a proper dedicated /quoting mechanism/ for them. For now,
> most REs require configuration_includes_quoted_values set to off.

I think a dedicated parse function is still needed for now. As Christos
suggested, ConfigParse::RegexPattern().

Whether that function validates some quoting mechanism or just takes in
a word as-is can be left for later though.

Amos
Received on Thu Aug 01 2013 - 08:15:49 MDT

This archive was generated by hypermail 2.2.0 : Thu Aug 01 2013 - 12:00:51 MDT