Re: Squid 3.4.0.1 configurator problems

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Thu, 26 Sep 2013 23:23:46 -0600

On 09/26/2013 12:13 PM, Amos Jeffries wrote:
> On 09/26/2013, Alex Rousskov wrote:
>> The only real problem with /re/ syntax as the default is that it does
>> not work well with URLs, which are very common in Squid patterns. That
>> is why I think a string-based "re" may be a better default for Squid.

> Which means that it makes escaping mandatory in one form or another.
> Which is giant leap #1 down the slippery slope towards
> "/http:\\\/\\\/foo\\/i broke
> it\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\/?how/"

The "re" syntax does not use / characters at all so your example, if I
understand it correctly, would be written as

  "http://foo/i broke it how"

> With string based or any other delimiter (including '/') we cannot
> differentiate the pattern token from the delimiter token without
> escaping the pattern token, then any escape-characters in the pattern as
> well.

Sure, but that includes delimiters like () and []. The best solution for
that problem that I know of is to allow folks to use the delimiter that
they want (probably because it does not occur in their specific RE).
Perl uses that approach, and I personally use that Perl feature
frequently. I can think of only one other alternative (approach 1
described below) but it seems too complex to me.

> Given your code expertise you have possibly read the same or
> similar language design document I did about this problem.

Sorry, I do not know which document you are talking about, but I would
be very happy to read it, especially if it proves me wrong.

> Using () brackets or [] brackets we get that nice pairing guarantee
> from regex (in all the flavours I'm aware of) and can apply the above
> mentioned algorithm without any escaping necessary at the squid.conf
> level.

Are you proposing to use the regular expression library itself (or an
equivalent hand-written code) to extract regular expressions from
squid.conf? That is the only case where the RE syntax helps guarantee
something. In all other cases, before the regex library gets the regular
expression and can guarantee anything, Squid has to extract that
expression from squid.conf.

There are two ways Squid can extract a regular expression from squid.conf:

1) By understanding full regular expression syntax. This is doable, but
is not currently supported and is not easy to support correctly (unless
the RE library exposes such parsing support for us). This does not
require escaping RE parts that confuse the Squid parser because there
are no such parts -- the Squid parser becomes fully RE-aware itself!

2) By only understanding squid.conf expression syntax. This is what we
currently support (albeit with poor syntax) and it is relatively easy to
support as long as we keep the syntax simple. This does require escaping
RE parts that confuse the Squid parser (often resulting in double
escaping unless mitigated by a configurable RE delimiter).

For example, consider the following regular expression that starts with
a letter "e" and ends with a right square bracket "]". This example RE
matches one of three sequences of 5 characters such as "ends)".

    ends[) (]

Using approach (1), we could write

    acl foo url_regex (ends[) (])

and the Squid parser would use the outer parentheses to find the end of
the regular expression and would identify that the parentheses and space
inside the square brackets are part of that regular expression. No
problem (except that implementing such a parser is probably very difficult).
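
To give a feel for what approach (1) asks of the parser, here is a minimal
Python sketch. It is an illustration only, not Squid code: the function name
extract_re_aware is hypothetical, and the scanner assumes a deliberately
reduced RE grammar (backslash escapes, bracket expressions, nested groups).
A real implementation would have to track the full syntax of whichever RE
flavour Squid is built with.

```python
def extract_re_aware(text):
    """Split '(<RE>)<rest>' into (RE, rest) by tracking RE structure.

    Hypothetical helper. Understands only backslash escapes, bracket
    expressions and nested groups, which is far less than a full RE grammar.
    """
    if not text.startswith("("):
        raise ValueError("expected '(' at start of pattern")
    depth = 0
    i = 0
    n = len(text)
    while i < n:
        c = text[i]
        if c == "\\":
            i += 2  # an escaped character never opens or closes anything
            continue
        if c == "[":
            # Inside a bracket expression, parentheses are ordinary characters.
            j = i + 1
            if j < n and text[j] == "^":
                j += 1
            if j < n and text[j] == "]":
                j += 1  # a ']' placed first is a literal member of the set
            while j < n and text[j] != "]":
                j += 1
            if j >= n:
                raise ValueError("unterminated bracket expression")
            i = j + 1
            continue
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth == 0:
                return text[1:i], text[i + 1:]
        i += 1
    raise ValueError("unterminated pattern")


pattern, rest = extract_re_aware("(ends[) (]) next")
print(repr(pattern))  # 'ends[) (]'
```

Even this reduced version needs special cases for bracket expressions, and it
would still misread constructs it does not model, such as a POSIX character
class that contains a parenthesis.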

Using approach (2) with parens as fixed RE delimiters, we could write

    acl foo url_regex (ends[\) \(])

and the Squid parser would use the outer parentheses to find the end of
the regular expression and would unescape the parentheses inside the
square brackets instead of treating them as delimiters. No problem
(except that the admin must escape those inner parentheses, which
quickly becomes tricky when the RE itself uses parentheses for grouping).
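
A minimal Python sketch of approach (2) with fixed parentheses follows. The
function name extract_fixed_parens and the exact unescaping rule (a backslash
before "(", ")" or another backslash is consumed by the squid.conf layer, any
other backslash sequence is passed through untouched) are assumptions made
for illustration, not existing Squid behaviour.

```python
def extract_fixed_parens(text):
    """Split '(<RE>)<rest>' into (RE, rest) without understanding RE syntax.

    Hypothetical helper. The first unescaped ')' ends the pattern, so any
    parenthesis that belongs to the RE must be escaped in the config.
    """
    if not text.startswith("("):
        raise ValueError("expected '(' at start of pattern")
    out = []
    i = 1
    while i < len(text):
        c = text[i]
        if c == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            if nxt in "()\\":
                out.append(nxt)      # config-level escape: drop the backslash
            else:
                out.append(c + nxt)  # leave other escapes for the RE library
            i += 2
            continue
        if c == ")":
            return "".join(out), text[i + 1:]
        out.append(c)
        i += 1
    raise ValueError("unterminated pattern")


print(extract_fixed_parens(r"(ends[\) \(]) next")[0])  # ends[) (]
print(extract_fixed_parens(r"(a\\\(b)")[0])            # a\(b
```

The second call shows the double escaping: to hand the RE library a literal
parenthesis (written \( in the RE), the admin has to type three backslashes
in squid.conf under these assumed rules.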

Using approach (2) with a flexible RE delimiter, we could write

    acl foo url_regex /ends[) (]/
or
    acl foo url_regex {ends[) (]}
or
    acl foo url_regex @ends[) (]@

and it will all work without double escaping.
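
The flexible-delimiter variant can be sketched in a few lines of Python. The
names PAIRS and extract_flexible are hypothetical, and the rule used here
(the first character of the token selects the delimiter, bracket-like openers
close with their mirror character, and no escaping is supported at all) is a
simplifying assumption rather than a description of Perl or of any Squid
parser.

```python
# Bracket-like openers and their closers; any other opener closes with itself.
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}


def extract_flexible(text):
    """Split '<delim><RE><delim><rest>' into (RE, rest).

    Hypothetical helper. The admin picks a delimiter that does not occur in
    the RE, so the squid.conf layer needs no escaping rules of its own.
    """
    if not text or text[0].isalnum() or text[0].isspace():
        raise ValueError("pattern must start with a delimiter character")
    closer = PAIRS.get(text[0], text[0])
    end = text.find(closer, 1)
    if end < 0:
        raise ValueError("unterminated pattern")
    return text[1:end], text[end + 1:]


for token in ("/ends[) (]/", "{ends[) (]}", "@ends[) (]@"):
    print(extract_flexible(token)[0])  # prints ends[) (] three times
```

The parser stays trivial because the burden moves to the admin's choice of
delimiter: a URL pattern would avoid "/" and use something like "@" instead.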

And, just for completeness' sake, I would mention that we do need some RE
delimiter in this case because without it any parser would see two
[invalid] regular expressions separated by space instead of one:

    acl foo url_regex ends[) (]
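
This can be checked with a short Python sketch. Python's re module stands in
for the RE library here, which is an assumption: Squid uses its own regex
library, but both agree on this simple pattern. The helper name compiles is
hypothetical.

```python
import re


def compiles(pattern):
    """Return True if the RE library accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# The intended RE matches exactly three five-character strings.
intended = "ends[) (]"
print([bool(re.fullmatch(intended, s)) for s in ("ends)", "ends ", "ends(")])
# [True, True, True]

# Whitespace tokenization of the undelimited line splits the RE in two.
line = "acl foo url_regex ends[) (]"
tokens = line.split()[3:]
print(tokens)                         # ['ends[)', '(]']
print([compiles(t) for t in tokens])  # [False, False]
```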

I hope the above illustrates why no single fixed RE delimiter can solve
the double escaping problem (and why the RE syntax itself does not help)
unless we start supporting full RE syntax inside the squid.conf parser
itself.

Cheers,

Alex.
Received on Fri Sep 27 2013 - 05:24:07 MDT
