Re: Revised approach to fixing configuration syntax

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Fri, 27 Sep 2013 10:20:12 -0600

On 09/27/2013 12:20 AM, Amos Jeffries wrote:
> On 27/09/2013 6:14 a.m., Alex Rousskov wrote:
>> NextChars(const char *); // the exact sequence of characters given
>> NextToken(String &t); // next name or token as defined in the grammar
>> NextValue(String &v, kind); // next value as defined in the grammar
>> NextSpace(String *v); // Optional White Space (OWS) as in grammar
>> NextLineEnd(); // end of line; may be \0 if preprocessor strips \n
>>
>> Mark CurrentPosition(); // current parsing position
>> void RevertTo(Mark mark); // reverts low-level parsing state to mark

> This implies that we require StringNG terminator-agnostic ability for
> the String& in all the above functions - OR to copy each potential line
> into a temporary String which can be dropped on unwind.

Yes, we should use String unless SBuf is available. SBuf would be a lot
more efficient for this, of course. My notes were written before the
latest Kinkie's SBuf activity; I could not count on it being available.

> Comment handling (see blow for the fix), and the detail that StringNG
> will allow us to implement the undo fuctionality far better than String
> - which implies that the large amount of conversion work will have an
> almost as larhe amount of conversion to SBuf later in 3.5.

Fortunately, Christos already converted most of the calling code to use
previous incarnation of various Next*() methods, so the amount of
additional work is not huge. It would be nice to start with SBuf so that
we do not have to convert String to SBuf later, of course. I think there
is a good chance that Kinkie would be able to merge SBuf soon.

> Since this design is again prioritizing backward compatibility I think
> we can schedule this for 3.5+ and work on some bridging polish to avoid
> 3.4 parser having any user noticible difference from the final strict
> parser. For starters ignore the code internals being vastly different
> maintenance nightmare - that is not a user-visible aspect and we will
> cope with that pain ourselves.
>
> What I am thinking is that if we agree on this grammar being suitable
> (you and I mostly agree, lets see what the others think) then we can
> massage the 3.4 parser to cope with it. No matter how nasty the hacks
> may have to be in that version we drop them in 3.5 going forward. I do
> not see re-implementing the entire parser starting today as a quick
> enough project to fix things before 3.4 needs to be stable.

Your call regarding v3.4, but I suggest the following for that branch:

 - Undo recent parsing changes.

 - Add simple, disabled-by-default support for whitespaces in ACLs:
   https://code.launchpad.net/~measurement-factory/squid/conf-quoted-str

This way, v3.4 syntax and parsing code remain pretty much the same as in
v3.3 but addresses the most pressing need (spaces in ACLs). The code in
the above quoted branch may need some review and massaging to get it
into v3.4, but nothing major (the code is working in production), and we
can help with that. For example, one must change the default for
configuration_includes_quoted_values and may rename that option.

> Since comments go to EOL it will collide with the preprocessor
> un-wrapping wrapped lines if a comment is added anywhere space is
> possible. I am inclined to make comment a case in itself with the
> following alterations to the grammar:
>
> config = *( OWS directive / comment <new line> )
>
> directive = name *( RWS parameter ) comment <new line>
>
> whitespace = *ascii_whitespace_char

The above does not quite work because it prohibits whitespace after a
directive. What you probably want is this:

  config = *( OWS [directive] OWS [comment] <new line> )

  directive = name *( RWS parameter )

  ; note that this prohibits #foo comments at the beginning
  ; of a line, which is something we want to support
  ; if we already support them now (but that is fixable)
  comment = <#> RWS followed by any chars until <new line>

  OWS = *ascii_whitespace_char
  RWS = ascii_whitespace_char OWS

Note that both the original and the above grammars need to be augmented
to say that a comment takes precedence over relaxed_value.

I wish we could also support the following:

  # here are my well-documented access rules
  http_access allow \
      myself # let myself in \
      goodGuys # allow the good guys \
      nsa # and some special cases

but to do that we need to either

a) move folding support into the primary parser (bad idea IMO); or

b) adjust the preprocessor to leave new line characters in the parsing
buffer while folding and adjust the primary grammar to add new line
characters to whitespace/comments/strings while treating "end of buffer"
specially (I favor this option).

> Line wrapping grammar is this:
>
> fold = <\> <new line>
>
> whitespace = *( ascii_whitespace_char / fold )
>
> with no need or possibility to fold inside a comment.

Today, the above does not belong to the primary grammar because folding
is done by the preprocessor, which has its own grammar. The primary
parser never sees the fold sequence. We can change that, of course, but
I am not sure it is a good idea as it will significantly complicate things.

For example, if preprocessor does folding, one can have long text occupy
miltiple lines using quoted strings. If the primary parser does folding,
quoted strings (as defined now) cannot be folded. We can change quoted
string definitions to allow folding, but similar complications will
arise in any multi-character sequence, so I think that adding folding
into the primary grammar is wrong.

> comments should be a # _after_ some whitespace. Not instead of,
> otherwise we cannot tell foo#bar from "foo #bar"

Yeah, this is probably needed to minimize conflicts. The question is
whether we want to treat every lines below as a comment?

 # nothing special, clearly a comment
# no space before hash, beginning of line
 #no space after hash, "beginning" of line
#no space before or after hash, beginning of line
foo bar#comment
foo bar #comment
foo bar# comment

> It is good to have this written down finally.

If the discussion here does not ban the whole idea, we should put the
grammar on some wiki page. It does not belong to /Feature/, does it?

Thank you,

Alex.
Received on Fri Sep 27 2013 - 16:20:42 MDT

This archive was generated by hypermail 2.2.0 : Sat Sep 28 2013 - 12:00:11 MDT