Re: Helpers and Notes

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sun, 27 Jan 2013 18:12:06 +1300

On 27/01/2013 8:27 a.m., Tsantilas Christos wrote:
> On 01/26/2013 06:05 AM, Amos Jeffries wrote:
>> On 26/01/2013 10:39 a.m., Tsantilas Christos wrote:
>>> Hi all,
>>>
>>> While we are working in a project we seen that the helpers do wrong
>>> usage of the Notes interface.
>>>
>>> This is maybe it is our mistake (and mostly my mistake) because we do
>>> not have make good documentation of this interface.
>>> In any case I believe that we should fix some thinks.
>>>
>>> In the Notes.h file there are the following classes:
>>>
>>> class Note
>>> -----------
>>>
>>> This is a class used to store a note configuration not a note it self.
>>> It is a note name plus a list of possible values with an ACL list in
>>> each value.
>>> It is used to store configurations for the following cfg lines:
>>> note key value acl1 acl2
>>>
>>>
>>> class Notes
>>> ------------
>>> It is used to store a list of Notes. A list like the following:
>>> note key valueA acl1a acl2a
>>> note key valueB acl1b acl2b
>>>
>>> This is used to add custom notes (or metaHeaders) in an HTTP request
>> With the ACL list being optional and the word "note" being dropped by
>> the config parser what is actually stored is:
>>
>> NotesList{
>> Note{key,value,NULL},
>> Note{key,value,NULL}
>> ...
>> }
> Well, not actually. What it is stored is the following:
> NoteList {
> Note {key, Values{{value1, NULL}, NULL} }
> Note {key, Values{{value1, NULL}, NULL} }
> ...
> }
>
> A Note keep a note name plus a list of values/ACLs. The Acls used to
> select the correct value for a Note.
> It is designed to store configuration not a note pair.
>
> Moreover to get the Value of a Note a new method added the
> Notes::firstValue() to retrieve the first value of the list. It is not
> so cool. It is a little confusing...
>
>> ... which is clearly a list of kv-pairs without any specific information
>> about delimiter, display semantics, or input source location. Nowhere
>> in ths implementation or documentation is it clear that this list
>> structure is *only* ever to be used for parsing the config file. Rather
>> the simplicity of the information stored implies strongly that it is
>> created as re-usable storage structure.
>> ***As was requested when you were planning the Notes feature.***
> Of course it can be used to store kv-pairs. What I am saying is that the
> NotePairs class is better structure to keep this list.
>
>
>>> class NotePairs
>>> -----------------
>>> This is an HttpHeader kid class used to store the notes (or
>>> MetaHeaders). We select HttpHeader class because it has some interesting
>>> features. For example assume that two different subsystems add the same
>>> note:
>>> X-SQUID-ProcessedBy: ReWriter
>>> X-SQUID-ProcessedBy: Adaptation
>>>
>>> Then this is will be merged to
>>> X-SQUID-ProcessedBy: ReWriter, Adaptation
>> Yes NotePairs is clearly and specifically a Mime syntax version of
>> kv-pairs.
>>
>> The interesting and useful fetures are *only* useful when outputting
>> kv-pairs as ICAP or HTTP headers in Mime format. Those features are not
>> useful in any other context Squid uses kv-pairs. ... which is why the
>> helpers avoid that type until the final logging stage where it has no
>> choice.
> So we are agree that the NotePairs class is more useful to keep kv-pairs
> at least for the ICAP/HTTP headers. But this is does not mean that it
> can not be used for storing helpers kv-pairs.
> It is just better to use the same structures for storing similar
> information. It is better to just use NotePairs for storing notes,
> because it can handle more cases (kv-pairs, ICAP/eCAP metaheaders and
> maybe other type of notes in the future)
>
> Also storing all notes in the same object maybe it is good policy
> because they can be used in other than the logging cases.
>
> Currently all notes merged just before logging because they are used
> only here.
>
>>> Squid helper Notes problems
>>> -------------------------------
>>> The changes made in trunk revno:12490 and revno:12495
>>>
>>> Problems I am seeing:
>>> - it uses class Notes to store the notes/metaHeaders not the NotePairs.
>>> It adds the "Notes HelperReply:notes" and the "Notes
>>> HttpRequest::helperNotes" members.
>> This is intentional. see above.
>>
>> The first implementation attempt used a NotePairs list for helpers. The
>> exact same behaviour of merging keys together which make NotePair
>> wonderful for ICAP header display and logging display make this NotePair
>> type difficult to use as a kv-pair list for helpers and any other use
>> where HTTP field-value syntax is needed.
> Why?
>
>> The other difficulty was the whole or at least a large segment of the
>> HTTP parser becomes a dependency, which was not worth the cost and
>> effort for the simplicity of NotePairs. This was probably not noticable
>> when integrating with ICAP due to it already having that dependency in
>> place for ICAP header parsing.
> You do not have to use any HTTP parser. You can just add a kv-pair (a
> key/value pair), simply using the NotePairs::putExt(key, value).
> Why the Notes::add(key, value) is better?
>
>
>> The need for multiple de-references ( A->value[X].value()->value ) when
>> accessing NotesList is accepted as a cost (for now) in exchange for the
>> benefit of *not* merging key values together and not having to link in
>> the HTTP parser.
> I did not found such cases in the existing code.
> What I am seeing is the following usage:
> Note::Pointer statusNote = reply.notes.find("status");
> ...
> const char * result = statusNote->firstValue();
> status = (http_status) atoi(result)
> ...
> urlNote = reply.notes.find("rewrite-url");
> if (urlNote != NULL && strcmp(urlNote->firstValue(), http->uri)
> ....

Which requires two local Pointers to access the value.
firstValue() is the accessor Alex had me add to hide the worst part of
the de-referencing.

Without it we would have:

   reply.notes.find("status")->values[0]->value.termedBuf()

where it is preferrable to go:
    reply.notes.find("status")->firstValue();

>
> The above if we use NotePairs instead of Notes can be easily written as
> follows:
> String status = reply.notes.getByName("status");
> status = (http_status) atoi(status.termedBuf())

Your example works for numeric values because ',' is not a valid digit
and atoi() will abort before parsing the already parsed values.

It gets worse when handling URLs or any other value type which coudl
ontain whitespace or ',' values internally:

1)
  helper reply: OK url=http://example.com/ url=http://invalid.invalid/

translates to a single URL. String uri = reply.notes.getByName("uri");
will produce URL "http://example.com/, http://invalid.invalid/" which
the lenient URL parser (default behaviour) will see as
"http://example.com/,%20http://invalid.invalid/".

2)
   helper reply: OK user="J, Smith" group="group1,A" group=group2
user=alias group=A

  String group --> "group1,A, group2, group3" ... how does one
determine that a group match is supposed to be equal to "group1,A" and
not match "group1" or "A" individually, BUT must match "group2" or "B"
individually?

String user ---> "J, Smith, alias" ... what username is this? it won'
match any, and splitting on ',' will also likey not match any. But "J,
Smith" could match.

3)
   helper reply: OK tag=1 tag=something tag=magic

Will produce a single tag String "1, something, magic". What does the
tag ACL type do?

Notice that all these are parsing the strings a second time just to
interpret the way NotePairs has merged (using a memory copy probably)
into a single String.

Things like this is where the "desired" behaviour of NotesPairs key
merging is more of a problem to be worked around than a good thing.

I hope you can see that these are *not* desired behaviours. But they are
an integral part of NotePairs due to its implementation using objects
optimized for Mime header semantics. NotesList is not quite right either
but offers cleaner access suitable for all these use cases.

If you know of an interface which can store and use the kv-pair from
helpers in a way which the helpers kv-pairs need to be accessed I would
be happy to audit the patch.

> ...
> status urlNote = reply.notes.getByName("rewrite-url");
> if (!urlNote.empty() && strcmp(urlNote->termedBuf(), http->uri)
>
> Moreover we can add a member in NotePairs the
> const char *NotePairs::find(const char *key)
> to simplify it more.

We could. Notice that this is already present in NotesList.

> Also we have an interface in configured Notes to deny user to use some
> names for notes which are reserved. We can add here the helper reserved
> names like "status", "rewrite-url" etc.

They are not forbidden for use. They are in fact *expected* for the
helper to use. The only significance a reserved key has in the helper
protocol is that Squid performs special processing for them. With side
effects that may or may not be wanted, but are up to the users choice.

> Maybe we should require by the user to use always the X- prefix in his
> kv-pairs keys. For example the "status" is a key which can be selected
> by anyone. The user is better to use the "X-Status" as kv-pair key.

Which would make x-status the reserved key name and status key name mean
nothing. Which brings us back to the problem of reserving X-Status ...

... "user" is not what you seem to think. "user" in this case is an
automated helper. The only possible limits on key name are that it may
not contain whitespace, CR, LF, \0, or '=' characters. *anything* else
confirming to the key=["]value["] syntax is acceptible.

X-* Mime tokens are nothing special and due to Notes use in ICAP headers
X-* is a potential overlap with reserved key names. For example;
possibly an external_acl_type helper may be wanted to produce
X-Client-Username:foo for relay to ICAP dynamically.
The helpers interface defines '_' suffix as the reserved marker for
custom local key names on all helper interfaces. This is intentionally
specific to the helper interface and intentionally invalid Mime header
character to prevent the above cases unexpectedly adding protocol
injection vulnerabilities from helper supplied keys. Crossover between
notes and ICAP/HTTP in future could still do injection if desirable and
valid header names used as keys, but explicit use of _ for custom
prevents mistakes causing problems.

>
>> If you are solving problems in the Notes structure systems please
>> resolve that dereference issue.
>>
>>> - It adds some extra methods which does not make sense and confuse more
>>> the Notes interface:
>>> * NotePairs::append(const Notes::NotesList &src)
>>> * NotePairs::append(const NotePairs *src)
>>> These functions used to add a Notes list to a NotePairs List. The
>>> HttpRequest::helperNotes to AccessLogEntry::notes.
>> By 'add' you mean 'append list S to this list'. Which is why it is
>> called 'append()'. Where is the problem? naming, semantics and operation
>> conform to standard conventions.
> The problem is the semantics here. A Notes list holds many informations
> which there is not any way to add them in a NotePairs list. It holds a
> list of "key plus a value/ACL list", not just a key/value pair list.
> Only that.

Then it is not a NotesList it is a NotesPlusOtherList. Possibly this is
part of the confusion. I was under the impression that we defined a list
that could store annotations adde to the transaction.

However, I see no problem with using Note without ACL member set as a
storage type for transaction annotations.

>
>> The first above converts a NotesList into a NotesPairs list.
>> Specifically because the NotePairs list is hard-coded to utilize the
>> MIME format key-value pairs with colon (:) delimiter - which in Squid
>> are stored in HttpHeader class types.
> The HttpHeader holds a list of HttpHeaderEntry which consist my a name
> and a value. It is a key/value pair.

Yes. Well it is a key:value-list set. Which if I am understanding you
right is better stored in NotesList since NotesList stores multiple
values per key name .... or not.

HttpHeader is a sub-type of kv-pair with protocol-specific semantics
built into its accessor tools (such as thet values list merging you
liked for ICAP usage).
NotesList is a sub-type of kv-pair with generic access semantics built
into its accessors.
  You see why I chose the one with simpler more-generic API for storing
generic kv-pairs?

The NotesList implementation could be polished for better access, but
that does not make it the wrong structure to use for storing a generic
kv-pair set.

>
>> The second is added such that any existing AccessLogEntry list can be
>> appended to by other sources producing NotePairs lists (ie RESPMOD).
>>
>>> But It is easier to use NotePairs for HttpRequest::helperNotes member
>>> and just use the (pre-)existing NotePairs::append (the
>>> HttpHeaders::append)
>> Causing the need to covert a key=value string pair into an HttpHeader
>> object. Causing the need to define said HttpHeader as a custom header
>> and fill out its value set. All of which adds HTTP protocol operations
>> (and validity checks) for the sake of parsing the very non-HTTP input
>> "OK foo=hello".
> If the HttpHeader is not a good structure to base the NotePairs we
> should use a different structure. But I still believe that the cost
> adding a kv-pair is not so huge...
>
> Also HttpHeader/HttpHeaderEntry uses mempools, which is a plus.

All the Note* objects should as well. This is something else to look at
upgrading in them.

>
>> Not to mention that "foo_=hello" (_ being defined for use by custom key
>> names) is an invalid HTTP header name rejected by the HttpHeader parser
>> infrastructure, and I have no intention of altering that HTTP-specific
>> parser (potentially affecting on-wire HTTP validity checks) for the sake
>> of unrelated code that does not even use HTTP or Mime syntax.
> The HTTP headers parser will not be used. The foo_ will be used as key
> name without any problem.
> But even if there is problem using "_" in the kv-pair keys and it is so
> important to use it in key names we should rewrite the NotePairs class
> to support it.

AFAIK, all of the accessor helper functions for HttpHeader object type
are used by some part of the HTTP parser. Their operational semantics
are defined by RFC 2616 and the HTTPbis update Drafts.
Using NotePairs as children of HttpHeader means either duplicating the
accessors in a slightly different way, or altering these core pieces of
the HTTP protocol parser. Neither of which is a good choice.

This is not a permanent fixed relationship though. The HttpHeader could
be implemented as a child of NotePair with specialized accessors. Making
a kvPair generic object type for kv-pairs, which is used for all
key-values in use by Squid.

>
>>
>>> * Notes::add(const String &noteKey, const String &noteValue);
>>> * Notes::add(const Notes &src);
>>> * Notes::find(const String &noteKey)
>>> To add and search for a note in Notes List. But it is wrong. We do
>>> not store such Notes here. We are storing Notes configuration, while
>>> parsing the cfg file.
>> So we have a linked list of key-value pairs writen with an generic
>> interface which is not allowed to be used anywhere outside of Parsing?
>> That goes against the design requirements I requested back at the
>> beginning. That Notes and NotesList be a generic list of kv-pairs we
>> could add to arbitrarily and output with various format delimiters.
> Again read my comments above. The Note does not keep only a key/value
> pair. Of course it can be used but it is not so normal, it is designed
> to store other information ...

Not *only* no. However, the ACL field is optional so does not matter.
Conceptually it is equivalent and has a better generic accessor set with
less unwanted behavours.

>
>> We have four desired uses for NotesList in Squid:
>>
>> 1) [A-Z]key ':' ' ' value - I/O for ICAP in Mime header format
>> 2) key '=' value - I/O for ICAP or HTTP in HTTP header field-value syntax
>> 3) key '=' value - helper I/O syntax
>> 4) key ' ' value - squid.conf 'notes' directive syntax
>>
>> NP: (1) has limitations on key character set, and optional (desired 1
>> SP) whitespace after the ':' delimiter. The others do not.
> Nor the ":" nor the space stored in HttpHeader list, only the Header
> name and the header value.
>
>>
>>> We should use NotePairs for HttpRequest::helperNotes and
>>> HelperReply::notes members and use the NotePairs::putExt or
>>> NotePairs::addEntry(new HttpHeaderEntry(HDR_OTHER, name, value)); to
>>> adding note/value pairs.
>>>
>>> - Initially we had select the AccessLogEntry to store notes. The
>>> AccessLogEntry selected because it considered as good structure to hold
>>> HTTP request X-data. (But we had problem on this).
>>> I am not sure if the HttpRequest::helperNotes is the correct position to
>>> add Notes. We should consider if we can add notes to
>>> AccessLogEntry::notes, or select an other X-data structure.
>> HttpRequest is currently abused as a transport mechanism to relay the
>> notes through to AccessLogEntry, which is not directly accessible from
>> the helper handlers.
>> If ALE can be made accessible to both helpers and ACL match system in a
>> useful way the HttpRequest member can be removed.
> We need to select which structure will/can play the role of the X-data.
> Of course I agree we can hold all required information in HttpRequest
> object.
>
>> NP: adding straight to the ALE list of NotePairs adds an extra problem,
>> that the kv-pair is now *copied* into Mime syntax. Once the note has
>> been converted to Mime header syntax it becomes much more difficult to
>> process the list and match a single kv-pair out of a set of identical
>> ones. ACLs for example do not have access to the ALE object to test
>> particular key values and even if they did they can only retrieve the
>> entire list of all values associated with each key name.
>>
>>> I am working on changing the above problems. Also maybe we need to
>>> change the names of some classes to avoid confusion in the future, for
>>> example rename Note and Notes to NoteRule and NoteRules or similar.
>>>
>>> Any opinions on the above?
>> When writing the helper interface use of Notes the problem was only that
>> Notes* classes were hard-coded for a specific use-case output.
>>
>> We had long discussions during the design phase and auditing about using
>> Notes as both header format "Key:value" and field format "key=value" - I
>> was under the impression that you had taken those usage requirements on
>> board and designed Note class which clearly takes a Key and Value
>> separately (without any additional details like ACLs!).
> We may had select wrong name for Note and Notes class. But this is the
> only problem. Please see my comments above, and my first mail of this
> thread which explains the classes.
>
>
>>
>> Other than the link with ACLs, class Note is fine for use as a generic
>> kv-pair node.
>> Other than your documentation of it, class NotesList is reasonably okay
>> for use as a generic list of Note kv-pair. One problem is the use of
>> std::list<Node::Pointer> which causes a double-indirection layer when
>> trying to access any node position of the list.
>> For example: list->values[0]->value().keyValue() instead of just
>> list[0]->keyValue().
>>
>>
>> To resolve what you see as an API abuse "problem" I think all that is
>> needed:
>> 1) make Note the base type representing just a kv-pair
>> 2) make a sub-class associating ACLs as a config specific sub-type of it.
>> 2) make a "Packer" for NotesList that outputs in each of the earlier
>> mentioned syntax formats.
>>
>>
>>
>> To resolve the indirection problems in NotesList I propose adding a
>> template to Squid sources which presents the *next pointer and
>> linked-list accessors and iterators. Such that any class can inherit
>> from it to be a node in a linked-list.
>> We can then unify all the classes and structures which are performing
>> their own linked-list implementations into inheritence from this template.
>> - NotesList would then be a class inheriting from ListNode<> and Note.
>>
>> Sound reasonable?
> Nope.
> I still believe that the best is something like the following:
> - Rename Note and Notes to NoteRule and NoteRules, to avoid confusion
> in the future.

To be clearly a config-specific object type it needs to have Config or
Cfg clearly in its naming.

However for what they store this would be a bit excessive. Why define a
global NotesList type if it were 100% specific to storing a directives
configuration details. Define a global instance and a NotesConfig which
stores a list of directive NotePlusAclList objects one for each config
file line.

> - Maybe rename NotePairs to Notes, for the same reason.
> - Use NotePairs for HttpReply::notes and HttpRequest::helperNotes members
> - If the HttpHeaders is not a good choice as base class of NotePairs we
> should re-implement the NotePairs to fit our needs. But I still believe
> that it is not so bad ...

It would possibly not be bad if it were re-implemented.

Amos
Received on Sun Jan 27 2013 - 05:12:18 MST

This archive was generated by hypermail 2.2.0 : Sun Jan 27 2013 - 12:00:14 MST