Re: Revised approach to fixing configuration syntax

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sat, 28 Sep 2013 21:11:09 +1200

On 28/09/2013 4:20 a.m., Alex Rousskov wrote:
> On 09/27/2013 12:20 AM, Amos Jeffries wrote:
>> On 27/09/2013 6:14 a.m., Alex Rousskov wrote:
>>> NextChars(const char *); // the exact sequence of characters given
>>> NextToken(String &t); // next name or token as defined in the grammar
>>> NextValue(String &v, kind); // next value as defined in the grammar
>>> NextSpace(String *v); // Optional White Space (OWS) as in grammar
>>> NextLineEnd(); // end of line; may be \0 if preprocessor strips \n
>>>
>>> Mark CurrentPosition(); // current parsing position
>>> void RevertTo(Mark mark); // reverts low-level parsing state to mark
>> This implies that we require StringNG terminator-agnostic ability for
>> the String& in all the above functions - OR to copy each potential line
>> into a temporary String which can be dropped on unwind.
> Yes, we should use String unless SBuf is available. SBuf would be a lot
> more efficient for this, of course. My notes were written before the
> latest Kinkie's SBuf activity; I could not count on it being available.
>
>
>> Comment handling (see blow for the fix), and the detail that StringNG
>> will allow us to implement the undo fuctionality far better than String
>> - which implies that the large amount of conversion work will have an
>> almost as larhe amount of conversion to SBuf later in 3.5.
> Fortunately, Christos already converted most of the calling code to use
> previous incarnation of various Next*() methods, so the amount of
> additional work is not huge. It would be nice to start with SBuf so that
> we do not have to convert String to SBuf later, of course. I think there
> is a good chance that Kinkie would be able to merge SBuf soon.
>
>
>> Since this design is again prioritizing backward compatibility I think
>> we can schedule this for 3.5+ and work on some bridging polish to avoid
>> 3.4 parser having any user noticible difference from the final strict
>> parser. For starters ignore the code internals being vastly different
>> maintenance nightmare - that is not a user-visible aspect and we will
>> cope with that pain ourselves.
>>
>> What I am thinking is that if we agree on this grammar being suitable
>> (you and I mostly agree, lets see what the others think) then we can
>> massage the 3.4 parser to cope with it. No matter how nasty the hacks
>> may have to be in that version we drop them in 3.5 going forward. I do
>> not see re-implementing the entire parser starting today as a quick
>> enough project to fix things before 3.4 needs to be stable.
> Your call regarding v3.4, but I suggest the following for that branch:
>
> - Undo recent parsing changes.
>
> - Add simple, disabled-by-default support for whitespaces in ACLs:
> https://code.launchpad.net/~measurement-factory/squid/conf-quoted-str
>
> This way, v3.4 syntax and parsing code remain pretty much the same as in
> v3.3 but addresses the most pressing need (spaces in ACLs). The code in
> the above quoted branch may need some review and massaging to get it
> into v3.4, but nothing major (the code is working in production), and we
> can help with that. For example, one must change the default for
> configuration_includes_quoted_values and may rename that option.
>
>
>> Since comments go to EOL it will collide with the preprocessor
>> un-wrapping wrapped lines if a comment is added anywhere space is
>> possible. I am inclined to make comment a case in itself with the
>> following alterations to the grammar:
>>
>> config = *( OWS directive / comment <new line> )
>>
>> directive = name *( RWS parameter ) comment <new line>
>>
>> whitespace = *ascii_whitespace_char
> The above does not quite work because it prohibits whitespace after a
> directive. What you probably want is this:
>
> config = *( OWS [directive] OWS [comment] <new line> )
>
> directive = name *( RWS parameter )
>
> ; note that this prohibits #foo comments at the beginning
> ; of a line, which is something we want to support
> ; if we already support them now (but that is fixable)
> comment = <#> RWS followed by any chars until <new line>
>
> OWS = *ascii_whitespace_char
> RWS = ascii_whitespace_char OWS
>
>
> Note that both the original and the above grammars need to be augmented
> to say that a comment takes precedence over relaxed_value.
>
> I wish we could also support the following:
>
> # here are my well-documented access rules
> http_access allow \
> myself # let myself in \
> goodGuys # allow the good guys \
> nsa # and some special cases
>
> but to do that we need to either
>
> a) move folding support into the primary parser (bad idea IMO); or
>
> b) adjust the preprocessor to leave new line characters in the parsing
> buffer while folding and adjust the primary grammar to add new line
> characters to whitespace/comments/strings while treating "end of buffer"
> specially (I favor this option).
>
>
>> Line wrapping grammar is this:
>>
>> fold = <\> <new line>
>>
>> whitespace = *( ascii_whitespace_char / fold )
>>
>> with no need or possibility to fold inside a comment.
> Today, the above does not belong to the primary grammar because folding
> is done by the preprocessor, which has its own grammar. The primary
> parser never sees the fold sequence. We can change that, of course, but
> I am not sure it is a good idea as it will significantly complicate things.
>
> For example, if preprocessor does folding, one can have long text occupy
> miltiple lines using quoted strings. If the primary parser does folding,
> quoted strings (as defined now) cannot be folded. We can change quoted
> string definitions to allow folding, but similar complications will
> arise in any multi-character sequence, so I think that adding folding
> into the primary grammar is wrong.
>
>
>> comments should be a # _after_ some whitespace. Not instead of,
>> otherwise we cannot tell foo#bar from "foo #bar"
> Yeah, this is probably needed to minimize conflicts. The question is
> whether we want to treat every lines below as a comment?
>
> # nothing special, clearly a comment
> # no space before hash, beginning of line
> #no space after hash, "beginning" of line
> #no space before or after hash, beginning of line
The above are all supported today AFAIK. So they need to stay supported.
This type of comment can even be dropped by the preprocessor along with
empty lines, so we don't have to worry about it in the primary parser.

> foo bar#comment
> foo bar #comment
> foo bar# comment

Ideally none of those would be. However I think the second one is likely
to exist in some configs already.

I'm definitely in favour of requiring the space before the # though. It
solves quite a few of those corner cases with macros etc.

>
>> It is good to have this written down finally.
> If the discussion here does not ban the whole idea, we should put the
> grammar on some wiki page. It does not belong to /Feature/, does it?

Maybe the /ProgrammingGuide/ section. Where you have it now is okay
meanwhile.

I have just committed a basic tester for squid.conf. There are some
extensions I still want to do later for testing FATAL handling but this
makes Squid at leats test the default configuration file and allows for
backward compatibility and edge case unit tests to be added easily.

Amos
Received on Sat Sep 28 2013 - 09:11:21 MDT

This archive was generated by hypermail 2.2.0 : Sat Sep 28 2013 - 12:00:11 MDT