Re: [PATCH] Add request_header_add option and [request|reply]_header_* manglers fixes

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sat, 30 Jun 2012 14:45:49 +1200

On 30/06/2012 3:53 a.m., Alex Rousskov wrote:
> On 06/29/2012 01:44 AM, Amos Jeffries wrote:
>
>> Can you please make this use the more flexible (logformat) tokens
>> from format/libformat.la instead? I'm trying to kill off the older
>> token sets and this is adding yet another format tokenizer to keep
>> track of.
> I do share the desire to unify macro parsing and substitution in various
> Squid places (logging, error pages, helpers, and various squid.conf
> options with string values). It will give us a lot more configuration
> power and better code. However, I think that should not be placed as a
> requirement on this feature because:
>
> a) We already have several different pieces of code that do macro
> substitution or implement a similar functionality. It would be best to
> have a dedicated project that will unify all that code, taking into
> consideration all the different code requirements we already have. This
> will be a difficult, sizable project, focusing on internal code
> structure rather than user-facing functionality.

So why add a 5th one? Particulary, a 5th with distinct set of tokens.

There is a dedicated project to unify the tokenizers. It produced
libformat.la. I'm just asking for that project not to be undermined by
new code. We are already going to have to create special
backward-compatible parsing for ssl_crtd and external_acl_type formats.

I *don't* mind, a _temporary_ new tokeniser using the libformat token
set (for the cert details) as long as there are serious plans to update
it to libformat without loading me with more work.

>
> b) Unifying various macro substitution pieces is likely to require
> adjusting options syntax and possibly renaming some of the existing
> macros. Discussing the best ways to do that will take time (especially
> since those syntax discussions are not tied to any specific function),
> and the final changes are likely to go way beyond this patch scope.

Nope.
Just an array of token-X maps to enum-Y for each unique token set inside
libformat.

Config dumps then auto-upgrade to the new sahared token format. And old
configs get warnings about changed tokens.
Naturally I would like to minimize the number of poeple who install new
feature then immediately get told its obsolete AND the number of token
map arrays that have to be created.

>
> c) It would be easier to port this new feature to older releases (I
> believe we even have a patch for v3.0) when it does not depend on newer
> unified substitution/formatting code. Once backported, the new code in
> trunk (and v3.2?) can go its own way, merging with other formatting
> code, etc.
>
> d) IIRC, the formatting code you ask us to reuse may not have access to
> some of the information we need to substitute these macros (because we
> currently do not log that information). In other words, the existing API
> cannot be reused as is and may require significant modifications on its
> own before it can be fully reused.
>
> Would it be OK to require those unification changes as a follow-up project?

The main objection from me is the UI tokens.

I would be happy with a temporary tokenizer provided:
  * is used the same logformat token *names* where an existing one
matches the content (eg %SRC ---> %[tcp::]>a)
  * tokens for SSL used a namespaced format consistent with the
documented logformat tokens. Example:" %ssl::>{field}cert " - when
libformat does not (currently) provide a matching token already. In a
namespaced form consistent with the other documented logformat tokens.
But why are these needed immedately in the first round of header_add
patch anyway?

NP: from your example of how to locate the SSL context and details given
an HttpRequest I see no reason why a simple patch to add %ssl::
namespace to libformat arrays is not trivially possible. Which would be
very useful for logging as well.

>
>
>> EXTRA NOTE: along these grounds I was hoping to convert:
>> X_header_replace Foo some random new header contents
>>
>> into:
>> format <name> <some[ header contents format tokens]>
>> X_header_replace Foo format=<name> [options] [acl acl ...]
>>
>> (and for backward compatibility absence of the "format=" option in the
>> first token meaning old style random header text content and no ACLs.
> Why? The proposed
>
> x_header_action <name> <value> [acl ...]
>
> seems a lot more "natural" to me. The "format" approach makes sense
> where the same named format is used multiple times in the same config. I
> doubt that would be common for custom header values.

When the same output is needed for several different ACL lines:

   x_header_action Foo "Hello World" authedusers
   x_header_action Foo "Hello World" localnet

>
> Long term, we should support regular variables so that naming and
> reusing a commonly used string becomes possible (but, again, I doubt
> these header mangling options would use such format variables a lot).
>

With "format=" we have two internal config RefCount pointers, and by
retaining the input line as a key we could retain format per-tokenized
across reconfiguration for HotConf without having to re-parse the format.
Regular variables would have to be made to work in the same way as
logformat directive currently does in order to provide the same sort of
functionality.

>
>
>> src/cf.data.pre:
>>
>> * Please change:
>> "
>> If ACLs of a "deny" rule match, the header + field is removed (unless
>> its value is replaced by the + corresponding request_header_replace option)
>> "
>> with:
>> "
>> If ACLs of a "deny" rule match, the header is removed and
>> request_header_replace is then checked to identify if the removed header
>> has a replacement.
>> "
> This is not what happens though, on two levels:
>
> 1) If there is a replacement, the entire header is not removed but the
> current field value is replaced. This may not matter much in most cases,
> but the in-place replacement may have an effect on the final order of
> the header fields.
>
> 2) If there is no replacement, the entire header is not removed. Only
> the current header field is removed. There may be multiple header fields
> with the same header name.

Um, Will have to check but I think this violates the RFCs clauses about
mutiple headers combining before processing.

The admin did not configure a particular line of the header was to be
blocked/replaced. But that the whole Foo header was.
Also the ACL testing can;t be used to identify the second Foo header
value if we happen to be manglign the first. It just matches true/false
whether the value X exists anywhere in the header:

so with:
  Foo: a
  Foo: b

If we configure a replacement of Foo with "xxx" when ACL
"req_header_regex ^b$" is true both line 1 and line 2 will be replaced.
But if configuring for "req_header_regex Foo ^a$" only line 1 will be
replaced.

>
> Also, the alternative wording does not say what happens when "the
> removed header has a replacement".
>
> Would the following text address your concerns?
>
> If ACLs of a "deny" rule match, the header field is removed (unless
> its value is replaced by the value from the corresponding
> request_header_replace option).
>
>
> Or the following version, perhaps?
>
> If ACLs of a "deny" rule match, the header field is either
> removed or replaced, depending on whether there is a
> corresponding request_header_replace option.
>
> This is further complicated by the fact that the above text is in the
> middle of another enumeration so we should keep this description one
> sentence long.
>
>
> The current functionality (unchanged by this patch) does not handle
> headers with multiple fields well. Eventually, we may want to redo the
> whole set of header manipulation options to make it more intuitive and
> flexible/correct (and applied both pre- and post-cache) but that is
> subject for another thread, after we are done with this feature!
>

Yes. Okay I'm happy with that wording. Just enough lack of clarity to
leave the broken case for fixing later.

>> src/structs.h:
>> * wrapping again with <list>, <map>, <string>
>>
>> * please make a new header file for the new class HeaderManglers and
>> HeadersWithAcl. We are trying to remove structs.h file.
> Ideally, the mangling code itself should be isolated. One of the reasons
> we did not do it is to keep porting simple: adding those files to the
> right set of Makefile.am blobs across Squid versions could be very
> difficult in my experience. Perhaps this change should be done as a
> separate step, after the code is committed (to provide an easier spin
> off point for porting)?

New files is not a issue with porting. Only new directories if they are
created in branch B searate from a port that uses them. (eg. src/ssl/
creation order causing me issues blocking some useful ssl-bump fixes
from 3.1).

Amos
Received on Sat Jun 30 2012 - 02:46:04 MDT

This archive was generated by hypermail 2.2.0 : Sat Jun 30 2012 - 12:00:06 MDT