Re: [PATCH] Unknown cfg function

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Thu, 01 Aug 2013 10:40:00 -0600

On 08/01/2013 02:16 AM, Amos Jeffries wrote:
> On 1/08/2013 4:59 a.m., Alex Rousskov wrote:
>> Christos, Amos,
>>
>> Thank you for working on this! I hope we can avoid repeating the
>> same set of mistakes thrice by carefully considering the big picture and
>> upgrade path. Here is my understanding of how we want things to work,
>> based on the review of the reported bugs and this thread so far:
>>
>>
>> 1. configuration_includes_quoted_values defaults to off. The setting
>> scope extends from the place where the directive is used to either the
>> end of configuration or to the next use, whichever comes first.
> Yes.
>
>> 2. When configuration_includes_quoted_values is on, new "strict syntax"
>> rules are enforced:
>>
>> 2a. "quoted values" and function()s are supported. %Macros are supported
>> inside quoted values and only inside them. Unknown functions and macros
>> terminate Squid. Code uses NextToken() to get tokens by default. A small
>> subset of hand-picked directives may use NextQuotedOrToEol() or other
>> special methods instead of NextToken() to accommodate more legacy cases.
>> Logformat is one such example.
>>
>> 2b. A % sign can be \-escaped to block macro expansion in quoted strings
>> where needed. A few "standard" escape sequences are supported such as
>> \n. Unknown escape sequences terminate Squid.
>
> I think expanding that. Any character can be \-escaped in the quoted
> strings.
> Which eliminates that escaped character from any special handling
> whatever it was.

That only works if we do not support any escape sequences, which I think
is a bad idea as discussed below:

> I don't really see any need for the cases like \n to be expanded.

The primary reason is support for non-ASCII characters but it is also
useful for supporting non-printable characters and other special
sequences. Nearly all languages I know about support escape sequences,
and while we can add similar support using functions and string
concatenations later, I think escape sequences is a better approach.

Please note that I am not proposing that we add such expanded support
now. We do not even have to support "standard" things like \n today. \%
and \" is probably sufficient. The key here is to prohibit unknown
escape sequences so that we can add such support later.

> So unless there is a good reason to add it I would vote to
> omit that little part until some particular need makes us add it.

No, we cannot add it later without a lot of pain. This would be the same
mistake we made ~15 years ago. If you allow \n to mean "n" now, we
cannot start supporting a "standard" \n sequence (as a new line) in a
few years because there will be squid.confs that contain \n sequences
with the expectation that it will be expanded into "n". Just like right
now we have to deal with squid.confs that already contain function()s,
"quotes" and %macros but do not expect them to be treated specially.

\n may not be the best example, but I hope you see my point why syntax
cannot be upgraded unless it is initially restricted.

>> 2c. By default, token delimiter is whitespace. Bare (i.e., unquoted)
>> tokens containing any character other than alphanumeric, underscore,
>> period, '(', ')', '=', and '-' terminate Squid. For example, foo{bar},
>> foo_at_bar, and foo"bar" terminate Squid in strict syntax mode.
>
> That means that we will have to add documentation for a fairly long list
> of directives noting how they are affected by the quoting.
> But I am okay with that.

I do not see why any per-directive documentation would be needed in this
context. 2c applies to all directives. This is the whole idea behind
strict syntax -- consistent, universal rules for all configuration elements.

>> 3. When configuration_includes_quoted_values is off, old "legacy syntax"
>> is supported, to the extent possible:
>>
>> 3a. "quoted values", functions(), and %macros are not recognized or
>> treated specially compared to Squid v3.3. Code uses NextToken() to get
>> tokens by default. A small subset of hand-picked directives may use
>> NextQuotedOrToEol() instead of NextToken() but that method does _not_
>> treat quoted strings specially in legacy mode. Logformat is one such
>> example.
>
> If by not treating quoted strings specially you mean that it ONLY
> performs the ToEol part of its intended behaviour and leaves the quotes
> in its output for quoted strings. Then yes.

Yes, that is what I meant. The caller code does not change just because
configuration_includes_quoted_values changes, but NextToken() and
NextQuotedOrToEol() behavior does.

>> 3b. I am not sure about "file.cfg" include syntax in legacy mode. I
>> think it would be OK _not_ to support it (because fixing configurations
>> to use new parameters() syntax should not be very difficult in most
>> cases), but I may be wrong. If it is easy to continue to support
>> "file.cfg" include style in NextToken() working in legacy mode, then we
>> should do it.
>
> The current parser still supports it IIRC. I don't see any need to
> change that for legacy mode.

Yes, if it is still supported, we should continue to support it in
legacy mode.

> However rejecting them in strict mode is what we want.

Exactly. In strict mode, it becomes a regular quoted string. No special
handling.

>> 4. Exceptions: A small set of directives that already supported quoted
>> strings correctly before the introduction of
>> configuration_includes_quoted_values must continue to support them.
>> These directives should call NextQuotedOrLegacy() method. The method
>> temporary forces configuration_includes_quoted_values to ON if and only
>> if the next token starts with a quote.
>
> Yes.
>
>> This may create a few upgrade problems when, for example, somebody is
>> using unsupported %macros inside quoted strings with these options, but
>> the support for quoted strings in those headers was added relatively
>> recently so there should not be many such cases, and it is not practical
>> to treat these options as a complex third class. They have to obey all
>> strict syntax rules when they use quoted strings (and also when
>> configuration_includes_quoted_values is on, of course).
>
> We can avoid a lot of those by being pedantic about the documentation.
>
> debugs with DBG_PARSE_NOTE(...) can be used in *both* modes to highlight
> problematic directives. Admin who follow the upgrade guidelines will
> then have a bit of extra warning targeted at their specific needs.

Issuing warnings in legacy mode is a good idea.

I do not think such warnings should be issued in strict mode because
there would be no way to avoid them in valid, verified configurations.
The strict mode is not the default so folks that enable it are
responsible for the consequences.

>> 5. Future changes:
>>
>> 5a. We revisit handling of 'single quoted strings' later, after the
>> above is done. They are useful to disable macros and standard escape
>> sequences in strings, but they are not a "must have" for now, and we
>> need to decide how to treat \-escapes inside them. We will probably
>> allow escaping of two characters only: \ and '.
>>
>> 5b. We revisit handling of REs, after the above is done. We will
>> probably add a proper dedicated /quoting mechanism/ for them. For now,
>> most REs require configuration_includes_quoted_values set to off.

> I think a dedicated parse function is still needed for now. As Christos
> suggested, ConfigParse::RegexPattern().

I special function would be a step forward because it will be needed to
parse strict REs, but I think that function will just call NextToken()
or whatever it is we are using for REs now, so it will not add any new
functionality or change the old behavior today.

In summary, it looks like escape sequences in strict "quoted strings" is
the only serious disagreement between the two of us right now. The other
issues appear to be minor.

Thank you,

Alex.
Received on Thu Aug 01 2013 - 16:40:32 MDT

This archive was generated by hypermail 2.2.0 : Thu Aug 01 2013 - 12:00:51 MDT