Re: %adapt::<last_h mess

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Tue, 15 Mar 2011 10:51:52 +1300

 On Mon, 14 Mar 2011 09:26:49 -0600, Alex Rousskov wrote:
> On 03/11/2011 05:52 PM, Amos Jeffries wrote:
>> On 12/03/11 12:59, Alex Rousskov wrote:
>>> Hello,
>>>
>>> All of the following is related to the %adapt::<last_h
>>> access.log
>>> format code (possibly parameterized with the header or element
>>> name):
>>>
>>> * squid.conf says it is the "last ICAP header received",
>>> essentially.
>>>
>>> * LFT_ADAPTATION_LAST_HEADER takes ICAP headers from all ICAP
>>> messages
>>> received, merges them (last header field wins if there are two ICAP
>>> headers with a field that is named the same), and logs the matching
>>> ICAP
>>> header field value.
>>>
>>> * LFT_ADAPTATION_LAST_HEADER_ELEM is the same as the
>>> LFT_ADAPTATION_LAST_HEADER above, but logs the matching ICAP header
>>> field value element.
>>>
>>> * LFT_ADAPTATION_LAST_ALL_HEADERS logs the last ICAP header
>>> received as
>>> described in squid.conf.
>>>
>>>
>>> In other words,
>>>
>>> * squid.conf and %adapt::<last_h without parameters are about
>>> logging
>>> the last ICAP header received;
>>>
>>> * parameterized %adapt::<last_h merges all the received ICAP
>>> headers
>>> (later headers overwrite earlier ones if there are name conflicts)
>>> and
>>> logs the last matching header field value/element.
>>>
>>>
>>> I suspect %adapt::<last_h without parameters is generally useless
>>> in
>>> access.log due to the volume of mostly irrelevant information. When
>>> somebody is using %adapt::<last_h in production, they probably log
>>> specific header field values or elements.
>>>
>>> I know that folks use parameterized %adapt::<last_h to log various
>>> transaction IDs and flags delivered by ICAP and eCAP services. For
>>> example, an eCAP service may identify the detected Virus ID or an
>>> authenticated user ID.
>>>
>>> In the cases I am familiar with, the merging performed by
>>> parameterized
>>> %adapt::<last_h is the desired/correct behavior because those Squid
>>> admins care about the ID value and not about which adaptation
>>> service
>>> determined it. They cannot assume that the ID was determined by the
>>> last
>>> service Squid talked to.
>>>
>>> However, our documentation describes a different behavior (no
>>> merging,
>>> just use the last received ICAP header), and I would not be
>>> surprised if
>>> there are users who use parameterized %adapt::<last_h while relying
>>> on
>>> the documented behavior. These users probably have just one ICAP or
>>> eCAP
>>> service or their last service always sets the header they log;
>>> otherwise, they would be logging wrong values due to the current
>>> implementation.
>>>
>>>
>>> To resolve the above conflicts, we need to change documentation,
>>> implementation, or both. I propose the following:
>>>
>>> 1. Add "%adapt::<merged_h" and document the currently implemented
>>> merging algorithm where the last header wins conflicts but all
>>> headers
>>> can play. Make sure the code implements it in all cases
>>> (parameterized
>>> and not).
>>>
>>> 2. Deprecate "%icap::<last_h" and remove support for
>>> "%adapt::<last_h".
>>> If somebody complaints, we can add "%adapt::<last_h" back, this
>>> time
>>> with the right implementation.
>>>
>>> Meanwhile, "%icap::<last_h" will default to "%adapt::<merged_h" for
>>> backward compatibility. Since no released version has
>>> "%adapt::<merged_h" code yet (they have "%icap::<last_h"), now is a
>>> good
>>> time for this step.
>>
>> No need for new names IMO.
>> We have <last_h being documented as the specific "last received
>> headers" and <h being a broader "the headers".
>
> Reusing <h sounds good!
>
>> IMO:
>> * Change existing parameterized adapt::<last_h (merging) to be what
>> happens with parameterized adapt::<h
>> * Add new parameterized adapt::<last_h which only considers the
>> last
>> ICAP headers. As per the documented semantics of <last_h.
>> * Implement non-parameterized adapt::<h to output the merged ICAP
>> headers (somehow).
>
> I am not sure I understand the first bullet correctly. I also think
> that
> parameterized cases should just narrow down what is logged and not
> change the semantics of what is logged, so we should not
> consider/document them specially (except when reporting
> implementation
> bugs). Does the end-result summary bellow match your recommendation?
>
> 1. adapt::<h logs all received headers merged into one. If two or
> more
> responses containing header field(s) with the name NAME are received,
> the header field(s) from the last received response containing header
> field named NAME win, and all fields named NAME from the previously
> received responses are discarded. Parameterized cases just narrow
> down
> what is logged from those merged headers.
>
> 2. adapt::<last_h logs the header of the last received response.
> Parameterized cases just narrow down what is logged from that
> response
> header.

 Yes. Exactly.

 I went a little further and looked at how to accomplish it. My bullets
 were kind of a step-by-step todo list ending in that state.

 What I meant was the current implementation of adapt::<{Foo}last_h
 already does what we actually want adapt::<{Foo}h to be doing.
 Just shuffling the logformat tags fixes that up as a first step.
 Followed by implementing the currently missing bits for each tag.

 Amos
Received on Mon Mar 14 2011 - 21:51:54 MDT

This archive was generated by hypermail 2.2.0 : Tue Mar 15 2011 - 12:00:03 MDT