Sane syntax

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Fri, 14 Sep 2012 10:10:07 -0600

On 09/14/2012 01:23 AM, Kinkie wrote:

> 3. define a sane overall syntax, ignoring backwards compatibility,

I am not sure others would support such a significant change/effort, but
here is how I would approach this:

I. Change parsing code so that all options have to call something like

   ConfigParser::nextElement()

rather than calling one of

   strtok(NULL, w_space),
   strtok(NULL, my_special_pattern), or
   ConfigParser::strtokFile()

functions and their combinations or variations. Converging on a
well-defined element extraction API will require non-trivial adjustment
of some of the parsing code, especially code that uses custom
strtok() patterns, such as the "eol" parsing code. The syntax of some of
those eol options will be changed.
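
To illustrate the intended calling convention, here is a minimal sketch of
option-parsing code that pulls elements through a single API instead of
calling strtok() directly. SketchParser and collectParameters() are
hypothetical names used only for this example, and the sketch splits on
spaces and tabs only; it is not the proposed ConfigParser implementation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for ConfigParser; holds one [merged] line.
class SketchParser {
public:
    explicit SketchParser(const std::string &line) : line_(line), pos_(0) {}

    // Returns the next whitespace-delimited element,
    // or nullptr at the end of the line.
    const std::string *nextElement() {
        const std::string ws = " \t";
        const std::size_t start = line_.find_first_not_of(ws, pos_);
        if (start == std::string::npos)
            return nullptr;
        std::size_t end = line_.find_first_of(ws, start);
        if (end == std::string::npos)
            end = line_.size();
        current_ = line_.substr(start, end - start);
        pos_ = end;
        return &current_;
    }

private:
    std::string line_;
    std::size_t pos_;
    std::string current_; // storage for the element last returned
};

// What a directive handler might look like: no strtok() patterns,
// just a loop over nextElement() until it reports end of line.
std::vector<std::string> collectParameters(SketchParser &parser) {
    std::vector<std::string> values;
    while (const std::string *element = parser.nextElement())
        values.push_back(*element);
    return values;
}
```

The point of the sketch is that the delimiter rules live in one place, so
individual directives cannot invent their own tokenization.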

We may need to add a robust block-quoting mechanism to handle inclusion
of HTML and other quote-rich text (see err_html_text for example).
Alternatively, such directives should be converted to load their text
from a file.

II. Change ConfigParser and friends to follow these rules:

0. Preprocessing.

   line = TBD but no changes expected here; we will continue
          to support line continuations using backslashes.

   comment = prefix-comment / suffix-comment

   prefix-comment = a line that starts with optional whitespace
                    followed by <#>

   suffix-comment = optional whitespace followed by <#>,
                    followed by optional whitespace and end of line

Comments are stripped first. Continuation lines are then merged if
needed and fed into stage-1 parser described below.

ConfigParser::nextElement() returns nil at the end of a [merged] line.
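
A rough sketch of stage 0, assuming the comment rules are applied exactly
as written above (a suffix-comment being a trailing <#> with only
whitespace around it) and that whitespace means space or tab. The function
names are hypothetical, and how comment lines inside a continuation should
behave is my assumption, not something decided here.

```cpp
#include <string>
#include <vector>

static const char *const kBlank = " \t"; // assumption: space and tab only

// prefix-comment: optional whitespace followed by '#'
bool isPrefixComment(const std::string &line) {
    const std::string::size_type pos = line.find_first_not_of(kBlank);
    return pos != std::string::npos && line[pos] == '#';
}

// suffix-comment: drop a trailing [whitespace] '#' [whitespace]
std::string stripSuffixComment(const std::string &line) {
    const std::string::size_type last = line.find_last_not_of(kBlank);
    if (last == std::string::npos || line[last] != '#')
        return line;
    if (last == 0)
        return std::string();
    const std::string::size_type keep = line.find_last_not_of(kBlank, last - 1);
    return keep == std::string::npos ? std::string() : line.substr(0, keep + 1);
}

// Strips comments first, then merges backslash-continued lines.
std::vector<std::string> preprocess(const std::vector<std::string> &raw) {
    std::vector<std::string> merged;
    std::string pending;
    bool open = false; // true while a continuation is in progress
    for (const std::string &rawLine : raw) {
        if (isPrefixComment(rawLine))
            continue; // assumption: comment lines vanish entirely
        std::string line = stripSuffixComment(rawLine);
        const bool more = !line.empty() && line.back() == '\\';
        if (more)
            line.pop_back(); // remove the continuation backslash
        pending += line;
        open = more;
        if (!more) {
            merged.push_back(pending);
            pending.clear();
        }
    }
    if (open)
        merged.push_back(pending); // input ended mid-continuation
    return merged;
}
```

Each string in the result is one [merged] line, ready for the stage-1
parser.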

1. Structure.

   config = *( directive / whitespace )

   directive = token *( word / whitespace )

At this stage, the parsing is applied to [possibly merged] lines. That
is, there are no new line characters at this stage. Whitespace is ignored.

2. Word syntax.

   word = token / single_quoted_string / double_quoted_string

   token = 1*tchar

   single_quoted_string = <'> *(sqchar / escaped-pair) <'>
   double_quoted_string = <"> *(dqchar / escaped-pair) <">

   tchar = any char except whitespace, quotes, and backslash
   sqchar = any char except single quote and backslash
   dqchar = any char except double quote and backslash
   escaped-pair = backslash followed by any char except new line

The quotes surrounding quoted strings are removed before the word is
returned to the higher-level code. However, their presence is remembered
in the word flags as it is significant for word interpretation described
below.

Legal backslashes are removed before the word is returned to the
higher-level code. TBD: This is not exactly true because "\$macro" is
not a macro but "$macro" is.
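
For illustration, a word extractor following the grammar above could look
roughly like this. Word, WordKind, and nextWord() are hypothetical names.
The sketch simply drops every backslash of an escaped-pair, so it does not
address the "\$macro" TBD, and treating a backslash outside quotes as an
error is my reading of the tchar rule rather than a settled decision.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

enum class WordKind { Token, SingleQuoted, DoubleQuoted };

struct Word {
    std::string text; // quotes and escaping backslashes already removed
    WordKind kind;    // the "word flags": remembers how the word was quoted
};

static bool isBlank(char c) { return c == ' ' || c == '\t'; }

// Extracts the next word from a [merged] line, starting at pos.
// Returns false at the end of the line; throws on malformed input.
bool nextWord(const std::string &line, std::size_t &pos, Word &out) {
    while (pos < line.size() && isBlank(line[pos]))
        ++pos;
    if (pos >= line.size())
        return false;

    out.text.clear();
    const char first = line[pos];
    if (first == '\'' || first == '"') {
        out.kind = first == '"' ? WordKind::DoubleQuoted
                                : WordKind::SingleQuoted;
        ++pos; // skip the opening quote
        while (pos < line.size() && line[pos] != first) {
            if (line[pos] == '\\' && ++pos >= line.size())
                throw std::runtime_error("dangling backslash");
            out.text += line[pos++]; // plain char or escaped char
        }
        if (pos >= line.size())
            throw std::runtime_error("unterminated quoted string");
        ++pos; // skip the closing quote
        return true;
    }

    if (first == '\\')
        throw std::runtime_error("backslash outside quoted string");

    out.kind = WordKind::Token;
    while (pos < line.size() && !isBlank(line[pos]) &&
           line[pos] != '\'' && line[pos] != '"' && line[pos] != '\\')
        out.text += line[pos++];
    return true;
}
```

With this, a line such as

   acl 'my list' "say \"hi\"" plain

yields four words: a token, a single-quoted word with an embedded space, a
double-quoted word containing literal double quotes, and another token.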

At the expense of some backward compatibility, we can exclude more
special characters from tokens (e.g., we can exclude parentheses and
various operator signs in case we decide to support arithmetic or logic
operations later). Alternatively, we can declare certain tokens reserved
for future use.

3. Word interpretation.

The following rules are used to go from a syntax-level "word" (a
sequence of characters) to a semantics-level "element" object returned
by ConfigParser::nextElement().

  * Words that start with a 5-letter "file:" prefix are interpreted as
file names. The corresponding file is loaded and run through the
preprocessor. Each line in that file is then interpreted as a single
word. These words are returned via ConfigParser::nextElement() API,
transparently to the caller. TBD: Detail and explain that from-file word
syntax is different from #2 word syntax above because multiple tokens on
one line are interpreted as a single word even if they are not quoted.
TBD: We should honor quoted lines, but do we honor line continuations in
these files?

  * Double-quoted words are checked for macros. Macros are prohibited in
directive names. It is also an error to specify a macro that the
corresponding squid.conf directive parameter does not support. The
directive and the macros determine exactly when and how the macros are
expanded. TBD: Detail $macro and ${macro(parameters)} syntax.

  * Tokens and single-quoted words are not checked for macros.
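
The three rules above might be sketched as follows. Everything here is
hypothetical: the Element layout, interpretWord(), and the LineLoader
callback (which stands in for "load the file and run it through the
preprocessor"). Since macro syntax is still TBD, the sketch only records
whether macro checking applies instead of expanding anything. It also
assumes that the "file:" prefix is honored regardless of quoting and that
words loaded from a file are not checked for macros; neither is decided
above.

```cpp
#include <functional>
#include <string>
#include <vector>

enum class WordKind { Token, SingleQuoted, DoubleQuoted };

struct Word {
    std::string text;
    WordKind kind;
};

// Semantics-level object handed to directive code.
struct Element {
    std::string value;
    bool checkMacros; // true only for double-quoted words
};

// Returns the preprocessed lines of the named file (injected so that
// the sketch does not depend on real file I/O).
using LineLoader =
    std::function<std::vector<std::string>(const std::string &path)>;

std::vector<Element> interpretWord(const Word &word,
                                   const LineLoader &loadLines) {
    static const std::string prefix = "file:";
    std::vector<Element> elements;

    if (word.text.compare(0, prefix.size(), prefix) == 0) {
        // Each line of the file becomes a single word.
        const std::string path = word.text.substr(prefix.size());
        for (const std::string &line : loadLines(path))
            elements.push_back({line, false});
        return elements;
    }

    // Tokens and single-quoted words are never checked for macros.
    elements.push_back({word.text, word.kind == WordKind::DoubleQuoted});
    return elements;
}
```

A nextElement() implementation could queue the returned elements and hand
them out one at a time, which keeps file loading invisible to the caller.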

TBD: We may want to interpret \n and \r inside double-quoted strings
specially so that it is possible to include new lines in directive
parameters. It may make sense to reserve other \<alphanumeric> sequences
as well.

The above needs more polishing and detail, but should be consistent and
mostly backward compatible. The syntax will handle ACL values with
spaces. Is that something we should move towards?

Thank you,

Alex.
Received on Fri Sep 14 2012 - 16:10:25 MDT