Re: %adapt::<last_h mess

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Mon, 14 Mar 2011 09:26:49 -0600

On 03/11/2011 05:52 PM, Amos Jeffries wrote:
> On 12/03/11 12:59, Alex Rousskov wrote:
>> Hello,
>>
>> All of the following is related to the %adapt::<last_h access.log
>> format code (possibly parameterized with the header or element name):
>>
>> * squid.conf says it is the "last ICAP header received", essentially.
>>
>> * LFT_ADAPTATION_LAST_HEADER takes ICAP headers from all ICAP messages
>> received, merges them (last header field wins if there are two ICAP
>> headers with a field that is named the same), and logs the matching ICAP
>> header field value.
>>
>> * LFT_ADAPTATION_LAST_HEADER_ELEM is the same as the
>> LFT_ADAPTATION_LAST_HEADER above, but logs the matching ICAP header
>> field value element.
>>
>> * LFT_ADAPTATION_LAST_ALL_HEADERS logs the last ICAP header received as
>> described in squid.conf.
>>
>>
>> In other words,
>>
>> * squid.conf and %adapt::<last_h without parameters are about logging
>> the last ICAP header received;
>>
>> * parameterized %adapt::<last_h merges all the received ICAP headers
>> (later headers overwrite earlier ones if there are name conflicts) and
>> logs the last matching header field value/element.
>>
>>
>> I suspect %adapt::<last_h without parameters is generally useless in
>> access.log due to the volume of mostly irrelevant information. When
>> somebody is using %adapt::<last_h in production, they probably log
>> specific header field values or elements.
>>
>> I know that folks use parameterized %adapt::<last_h to log various
>> transaction IDs and flags delivered by ICAP and eCAP services. For
>> example, an eCAP service may identify the detected Virus ID or an
>> authenticated user ID.
>>
>> In the cases I am familiar with, the merging performed by parameterized
>> %adapt::<last_h is the desired/correct behavior because those Squid
>> admins care about the ID value and not about which adaptation service
>> determined it. They cannot assume that the ID was determined by the last
>> service Squid talked to.
>>
>> However, our documentation describes a different behavior (no merging,
>> just use the last received ICAP header), and I would not be surprised if
>> there are users who use parameterized %adapt::<last_h while relying on
>> the documented behavior. These users probably have just one ICAP or eCAP
>> service or their last service always sets the header they log;
>> otherwise, they would be logging wrong values due to the current
>> implementation.
>>
>>
>> To resolve the above conflicts, we need to change documentation,
>> implementation, or both. I propose the following:
>>
>> 1. Add "%adapt::<merged_h" and document the currently implemented
>> merging algorithm where the last header wins conflicts but all headers
>> can play. Make sure the code implements it in all cases (parameterized
>> and not).
>>
>> 2. Deprecate "%icap::<last_h" and remove support for "%adapt::<last_h".
>> If somebody complaints, we can add "%adapt::<last_h" back, this time
>> with the right implementation.
>>
>> Meanwhile, "%icap::<last_h" will default to "%adapt::<merged_h" for
>> backward compatibility. Since no released version has
>> "%adapt::<merged_h" code yet (they have "%icap::<last_h"), now is a good
>> time for this step.
>
> No need for new names IMO.
> We have <last_h being documented as the specific "last received
> headers" and <h being a broader "the headers".

Reusing <h sounds good!

> IMO:
> * Change existing parameterized adapt::<last_h (merging) to be what
> happens with parameterized adapt::<h
> * Add new parameterized adapt::<last_h which only considers the last
> ICAP headers. As per the documented semantics of <last_h.
> * Implement non-parameterized adapt::<h to output the merged ICAP
> headers (somehow).

I am not sure I understand the first bullet correctly. I also think that
parameterized cases should just narrow down what is logged and not
change the semantics of what is logged, so we should not
consider/document them specially (except when reporting implementation
bugs). Does the end-result summary bellow match your recommendation?

1. adapt::<h logs all received headers merged into one. If two or more
responses containing header field(s) with the name NAME are received,
the header field(s) from the last received response containing header
field named NAME win, and all fields named NAME from the previously
received responses are discarded. Parameterized cases just narrow down
what is logged from those merged headers.

2. adapt::<last_h logs the header of the last received response.
Parameterized cases just narrow down what is logged from that response
header.

For example, if we receive the following response headers from service1:

X-Subscriber-ID: ID
X-Foo: foo1
X-Foo: foo2

and then the following response headers from service2:

X-Virus-ID: ECAR-test-signature
X-Foo: foo3
X-Foo: foo4

then adapt::<h becomes:

X-Subscriber-ID: ID
X-Virus-ID: ECAR-test-signature
X-Foo: foo3
X-Foo: foo4

and adapt::<last_h becomes:

X-Virus-ID: ECAR-test-signature
X-Foo: foo3
X-Foo: foo4

Thank you,

Alex.
Received on Mon Mar 14 2011 - 15:26:57 MDT

This archive was generated by hypermail 2.2.0 : Tue Mar 15 2011 - 12:00:03 MDT