Re: [PATCH] Add request_header_add option and [request|reply]_header_* manglers fixes

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Sat, 30 Jun 2012 08:28:01 -0600

On 06/29/2012 08:45 PM, Amos Jeffries wrote:
> On 30/06/2012 3:53 a.m., Alex Rousskov wrote:
>> On 06/29/2012 01:44 AM, Amos Jeffries wrote:
>>> Can you please make this use the more flexible (logformat) tokens
>>> from format/libformat.la instead? I'm trying to kill off the older
>>> token sets and this is adding yet another format tokenizer to keep
>>> track of.
>> I do share the desire to unify macro parsing and substitution in various
>> Squid places (logging, error pages, helpers, and various squid.conf
>> options with string values). It will give us a lot more configuration
>> power and better code. However, I think that should not be placed as a
>> requirement on this feature because:
>>
>> a) We already have several different pieces of code that do macro
>> substitution or implement a similar functionality. It would be best to
>> have a dedicated project that will unify all that code, taking into
>> consideration all the different code requirements we already have. This
>> will be a difficult, sizable project, focusing on internal code
>> structure rather than user-facing functionality.
>
> So why add a 5th one? Particulary, a 5th with distinct set of tokens.

because the existing ones could not satisfy our needs (without
significant changes) _and_ adding yet another one did not make the
problem much worse. Now, it is possible that we got it wrong, and the
libformat API can actually be used without much changes. I will double
check.

> There is a dedicated project to unify the tokenizers. It produced
> libformat.la. I'm just asking for that project not to be undermined by
> new code. We are already going to have to create special
> backward-compatible parsing for ssl_crtd and external_acl_type formats.

What special parsing does ssl_crtd need in libformat area?

> The main objection from me is the UI tokens.
>
> I would be happy with a temporary tokenizer provided:
> * is used the same logformat token *names* where an existing one
> matches the content (eg %SRC ---> %[tcp::]>a)

> * tokens for SSL used a namespaced format consistent with the
> documented logformat tokens. Example:" %ssl::>{field}cert " - when
> libformat does not (currently) provide a matching token already. In a
> namespaced form consistent with the other documented logformat tokens.
> But why are these needed immedately in the first round of header_add
> patch anyway?

I am happy to rename new macros in the patch to standardize in the
logformat direction. I probably picked the names from the parts of
squid.conf that have not been renamed in the right direction yet. Our
uncertainty which names to use should have been pointed out before or
during the patch submission, sorry.

>>> EXTRA NOTE: along these grounds I was hoping to convert:
>>> X_header_replace Foo some random new header contents
>>>
>>> into:
>>> format <name> <some[ header contents format tokens]>
>>> X_header_replace Foo format=<name> [options] [acl acl ...]
>>>
>>> (and for backward compatibility absence of the "format=" option in the
>>> first token meaning old style random header text content and no ACLs.
>> Why? The proposed
>>
>> x_header_action <name> <value> [acl ...]
>>
>> seems a lot more "natural" to me. The "format" approach makes sense
>> where the same named format is used multiple times in the same config. I
>> doubt that would be common for custom header values.
>
> When the same output is needed for several different ACL lines:
>
> x_header_action Foo "Hello World" authedusers
> x_header_action Foo "Hello World" localnet

This problem is an ACL problem, not header value formatting problem: We
need an ability to name an "or" acl to avoid repeating the same option
name, the same header name, and the same header value. For example:

    acl needsFoo = authedusers or localnet
    x_header_action Foo "Hello World" needsFoo

Please note how this looks more natural, addresses the true problem,
eliminates all repetition, and even eases internal ACL optimization.

[The exact syntax to define "and", "or", and similar "acl expression"
ACLs is to be discussed separately. This is just one example, and the
whole issue is outside this patch scope, of course. I just want to
convince you that "format=" is not the best solution/direction for the
repetition problem you outlined.]

>>> src/cf.data.pre:
>>>
>>> * Please change:
>>> "
>>> If ACLs of a "deny" rule match, the header + field is removed (unless
>>> its value is replaced by the + corresponding request_header_replace
>>> option)
>>> "
>>> with:
>>> "
>>> If ACLs of a "deny" rule match, the header is removed and
>>> request_header_replace is then checked to identify if the removed header
>>> has a replacement.
>>> "
>> This is not what happens though, on two levels:
>>
>> 1) If there is a replacement, the entire header is not removed but the
>> current field value is replaced. This may not matter much in most cases,
>> but the in-place replacement may have an effect on the final order of
>> the header fields.
>>
>> 2) If there is no replacement, the entire header is not removed. Only
>> the current header field is removed. There may be multiple header fields
>> with the same header name.
>
> Um, Will have to check but I think this violates the RFCs clauses about
> mutiple headers combining before processing.

I do not think it necessarily violates those clauses because it can be
viewed as a tool to surgically remove or replace some of those values
even though they have been combined before processing. Just because we
"combine values before processing" does not mean we cannot manipulate
individual values. IIRC, such manipulation is even explicitly permitted
for Via, for example.

> The admin did not configure a particular line of the header was to be
> blocked/replaced. But that the whole Foo header was.

Agreed. There is a conflict between how these options are documented
(operations appear to be defined for combined headers) and how they are
implemented (Squid works with individual header fields). The proposed
patch attempts to reduce that conflict by documenting what actually
happens. The patch does not fix the implementation, but it does not make
it worse either.

> Also the ACL testing can;t be used to identify the second Foo header
> value if we happen to be manglign the first. It just matches true/false
> whether the value X exists anywhere in the header:

My bad! I was already thinking ahead where header_access implementation
is adjusted so that these directives can be applied precisely, to
individual header field values if needed. This requires other header
mangling changes that I would prefer to discuss sometime after this
project is completed.

> so with:
> Foo: a
> Foo: b
>
> If we configure a replacement of Foo with "xxx" when ACL
> "req_header_regex ^b$" is true both line 1 and line 2 will be replaced.
> But if configuring for "req_header_regex Foo ^a$" only line 1 will be
> replaced.

Yes, which is confusing and does not give the admin the right tool to do
surgical header work. We can do better.

Thank you,

Alex.
Received on Sat Jun 30 2012 - 14:28:08 MDT

This archive was generated by hypermail 2.2.0 : Sat Jun 30 2012 - 12:00:06 MDT