Re: [PATCH] %>la for intercepted connections

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Wed, 31 Aug 2011 16:04:59 +1200

 On Tue, 30 Aug 2011 21:46:05 -0600, Alex Rousskov wrote:
> On 08/30/2011 09:00 PM, Amos Jeffries wrote:
>> On Tue, 30 Aug 2011 15:55:56 -0600, Alex Rousskov wrote:
>>> On 08/28/2011 01:10 PM, Amos Jeffries wrote:
>>>> On 29/08/11 06:39, Tsantilas Christos wrote:
>>>>> On 08/27/2011 08:03 PM, Amos Jeffries wrote:
>>>>>> On 28/08/11 02:50, Tsantilas Christos wrote:
>>>>>>> %>la for intercepted connections
>>>>>>>
>>>>>>> This patch adjusts the %>la logformat code handling for
>>>>>>> intercepted connections based on the following rules:
>>>>>>> - If the corresponding http_port or https_port option has an
>>>>>>>   explicit listening host name or IP address, then log the IP
>>>>>>>   address.
>>>>>>> - Otherwise, log a dash character.
>>>>>>>
>>>>>>> Also adjusts %>lp logformat code handling for intercepted
>>>>>>> connections to always log the port number from the corresponding
>>>>>>> http_port or https_port option.
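The two rules above can be sketched in C++ (the language Squid is written in). This is a minimal illustration under stated assumptions, not Squid's actual code: the PortConfig and ClientConnection types and the formatClientLocalAddress/formatClientLocalPort functions are hypothetical, an explicit listening host name is assumed to be already resolved to an IP address, and non-intercepted connections are assumed to keep logging the accepted socket's own local address and port.

```cpp
#include <optional>
#include <string>

// Hypothetical, simplified view of one http_port/https_port line.
struct PortConfig {
    std::optional<std::string> listenAddress; // set only if the line names a host/IP
                                              // (assumed already resolved to an IP)
    unsigned short listenPort;                // port from the http_port/https_port line
    bool intercept;                           // true for "intercept" ports
};

// Hypothetical view of an accepted client connection.
struct ClientConnection {
    std::string localAddress;  // local address of the accepted socket
    unsigned short localPort;  // local port of the accepted socket
    const PortConfig *port;    // port line that accepted the connection
};

// %>la as described by the patch: intercepted connections report the
// configured listening address, or "-" when the port line has none.
std::string formatClientLocalAddress(const ClientConnection &conn) {
    if (!conn.port->intercept)
        return conn.localAddress;
    if (conn.port->listenAddress)
        return *conn.port->listenAddress;
    return "-";
}

// %>lp as described by the patch: intercepted connections always report
// the port number from the port line.
std::string formatClientLocalPort(const ClientConnection &conn) {
    const unsigned short port =
        conn.port->intercept ? conn.port->listenPort : conn.localPort;
    return std::to_string(port);
}
```

For example, with "http_port 3129 intercept" (no address) and an accepted socket whose local end is 10.0.0.1:80, this sketch logs "-" for %>la and "3129" for %>lp.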
>>>>>>
>>>>>> +1. Looks fine.
>>>>>>
>>>>>> Amos
>>>>>
>>>>> I will commit this patch to trunk if there is not any objection.
>>>>>
>>>>>
>>>>> PS. I forgot to mention that this is a Measurement Factory project.
>>>>
>>>>
>>>> This whole thing itches a worry in the back of my mind. Updating the
>>>> release notes about %>la creation today makes me realize what it is.
>>>>
>>>> We are using ">" on tags to indicate incoming things,
>>>
>>> I do not think that part is accurate. I will try to provide a better
>>> definition below.
>>>
>>>> usually state shared with the client's view of the world. This change
>>>> makes the tag lose that overlap with the client's world view on
>>>> intercepted traffic.
>>>>
>>>> What do you think about resurrecting %la / %lp for this data instead?
>>>
>>> I think ">" is the right choice here because we are logging the Squid
>>> address that the client has connected to:
>>>
>>> ">" means information related to the client-Squid connection
>>> "<" means information related to the Squid-server connection
>>>
>>
>> Yes. And the lack of it appears to consistently represent Squid's view
>> of something, regardless of whether it was client or server.
>>
>> ... Such as the config port a transaction came through, i.e. "%la".
>>
>>> "l" means information related to the Squid side of a connection
>>
>> and _that_ is what this patch breaks. Or rather obfuscates for
>> intercepted traffic.
>>
>>>
>>> Thus,
>>>
>>> ">l" means information related to the Squid side of a client-Squid
>>> connection, and that is what we want to log.
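The prefix convention defined above can be sketched as a tiny classifier over logformat code names. This is a hypothetical helper for illustration only, not how Squid parses logformat codes; the Leg enum and the legOf/isSquidSide names are made up. It simply encodes the three definitions given: ">" for the client-Squid connection, "<" for the Squid-server connection, and a following "l" for the Squid side.

```cpp
#include <string>

// Which connection a logformat code refers to, per the prefix convention.
enum class Leg { ClientSquid, SquidServer, Unspecified };

// Hypothetical classifier: '>' = client-Squid, '<' = Squid-server,
// no prefix = not tied to either connection.
Leg legOf(const std::string &code) {
    if (!code.empty() && code[0] == '>')
        return Leg::ClientSquid;
    if (!code.empty() && code[0] == '<')
        return Leg::SquidServer;
    return Leg::Unspecified;
}

// True when the code names the Squid ("local") side, i.e. the first
// character after any '>' or '<' prefix is 'l'.
bool isSquidSide(const std::string &code) {
    const std::string::size_type start =
        (legOf(code) == Leg::Unspecified) ? 0 : 1;
    return code.size() > start && code[start] == 'l';
}
```

Under this reading both ">la" and "la" are Squid-side codes; they differ only in whether they claim a particular connection, which is the distinction the rest of the thread turns on.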
>>>
>>
>> Which worries me. I agreed to it earlier on the grounds that it was
>> Squid's outward view of the connection. But taking a closer look at the
>> concepts and documentation vs. the patch, the misgivings come back.
>>
>> The patch changes the meaning of that definition from "local address"
>> to "listening address".
>
> Yes, for intercepted connections. Listening address is a local address.
>
>
>> "local address" ("the Squid side of a client-Squid connection") at the
>> connection/TCP/IP level is what al->tcpClient contains right now, before
>> patching: the actual client->Squid connection's IP:port.
>
> If we are to go into these low-level details, one could argue that there
> is no actual/real client-Squid connection at all because the client does
> not think it is talking to Squid.
>
>
>> Meaning our definition for the "l" is a bit wrong here.
>>
>> Consider that there are two FDs involved with each connection, and how
>> we handle those.
>> FD 1 is listening; it has la of :: and lp of 3129. No remote.
>> FD 2 is a connection received on that. It has local=10.0.0.1:80
>> remote=192.168.0.52:123
>>
>> FD 3 is listening; it has la of 192.168.0.1 and lp of 3128. No remote.
>> FD 4 is a connection received on that. It has local=192.168.0.1:3128
>> remote=192.168.0.52:456
>>
>> now the details as you describe:
>>
>>> ">" means information related to the client-Squid connection
>>
>> ... AIUI that would be FD 2 and FD 4.
>>
>>> "l" means information related to the Squid side of a connection
>>
>> ... AIUI that would be from FD 4 : 192.168.0.1 (>la) and 3128 (>lp)
>>
>> BUT you want FD 3 local and FD 4 remote to log here. Why not also
>> log FD 1 local and FD 2 remote on their line? They are the same "the
>> Squid side of a client-Squid connection" by that definition.
>
> I do not fully understand your specific examples. I see no relevant
> differences between FD1-2 and FD3-4 groups, and I do not understand how
> a single connection can have four Squid descriptors associated with it.

 Those were two connections. I find the fact that you could not tell them
 apart indicative of the problem we are discussing.

 Pair 1+2 was "http_port 3129 intercept" and connection arriving there.
 Pair 3+4 was "http_port 3128" and connection arriving there.
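The two pairs can be modelled to show where the candidate meanings of "l" diverge. The sketch below is hypothetical (the Endpoint and PortExample names and the readingsAgree helper are made up; the addresses are the ones from the FD example): for each pair it compares the local end of the accepted socket with the local end of the listening socket.

```cpp
#include <string>

// One end of a TCP socket.
struct Endpoint {
    std::string ip;
    unsigned short port;
};

// Hypothetical model of one listening FD plus one connection accepted on it.
struct PortExample {
    Endpoint listeningLocal; // local end of the listening FD
    Endpoint acceptedLocal;  // local end of the accepted FD
    Endpoint acceptedRemote; // remote end of the accepted FD
};

// Format as ip:port, bracketing IPv6 addresses.
std::string toString(const Endpoint &e) {
    const bool v6 = e.ip.find(':') != std::string::npos;
    return (v6 ? "[" + e.ip + "]" : e.ip) + ":" + std::to_string(e.port);
}

// True when the "accepted socket local address" reading and the
// "listening address" reading of "l" would log the same thing.
bool readingsAgree(const PortExample &p) {
    return p.listeningLocal.ip == p.acceptedLocal.ip &&
           p.listeningLocal.port == p.acceptedLocal.port;
}

// FD 1 (listening) + FD 2 (accepted): the intercept port.
const PortExample pair12{{"::", 3129}, {"10.0.0.1", 80}, {"192.168.0.52", 123}};
// FD 3 (listening) + FD 4 (accepted): the regular port.
const PortExample pair34{{"192.168.0.1", 3128}, {"192.168.0.1", 3128},
                         {"192.168.0.52", 456}};
```

For the intercepted pair the two readings disagree (10.0.0.1:80 vs. [::]:3129); for the regular pair they coincide, which is why the difference only shows up on intercepted traffic.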

>
>> My goal here is consistency and clarity of individual tokens. These are
>> about to be used to dynamically generate redirected URLs in deny_info
>> and error page texts.
>>
>>
>> I suggested %la / %lp since they seem more fuzzy about where the details
>> come from, without > or < claims. They seem a perfect fit for the local
>> Squid view of something equally fuzzy, along the lines of how we use %un
>> for "any username we can find" as opposed to the specific sources.
>> AND they have the extra benefit of previously being used to log the
>> config IP:port by older Squid (under the conditions in which you want
>> %>la to do so). Reviving them with this more consistent, definitive
>> content would technically just be a policy change on their removal,
>> while keeping the policy decision to _move_ origdst over to >la and
>> leaving cases like Linux DNAT where both have valid, non-identical
>> details.
>>
>> The alternative that occurs to me is our recent use of %S_ where "S"
>> means Squid. Also a perfect fit by the definitions, but not as easily
>> backward compatible.
>
>
> I believe that since the connection is intercepted, it is in the gray
> area and many conflicting things will be "kind of true" about it.
>
> If you insist on %la, and Christos is fine with that, let's add %la that
> does what Christos implemented for %>la, and also log a dash for %>la
> when the connection is intercepted.
>
> While the above adds more work, what is critical for me, based on user
> requests, is that a single logformat option records the actual Squid
> address for non-intercepted connections and the specified Squid
> http_port address for intercepted connections.
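A minimal sketch of that compromise, assuming a simplified, hypothetical model of ports and connections (the Port and Conn types and the formatLa/formatClientLa names are made up for illustration, and only addresses are shown; a %lp counterpart would follow the same pattern): %la carries the behaviour Christos implemented for %>la, while %>la logs a dash for intercepted connections.

```cpp
#include <optional>
#include <string>

// Hypothetical, simplified http_port/https_port line.
struct Port {
    std::optional<std::string> address; // explicit listening IP, if configured
    bool intercept;                     // true for "intercept" ports
};

// Hypothetical accepted connection.
struct Conn {
    std::string localAddress; // local address of the accepted socket
    const Port *port;         // port line that accepted it
};

// Proposed %la: the actual Squid address for regular traffic, the
// configured http_port address (or "-") for intercepted traffic.
std::string formatLa(const Conn &c) {
    if (!c.port->intercept)
        return c.localAddress;
    return c.port->address.value_or("-");
}

// Proposed %>la: unchanged for regular traffic, "-" when intercepted.
std::string formatClientLa(const Conn &c) {
    return c.port->intercept ? std::string("-") : c.localAddress;
}
```

With this split, a billing log that needs the handling port would rely on %la alone, and %>la would log only a dash for intercepted traffic.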
>
> My understanding is that such functionality is needed in environments
> where Squid handles regular and intercepted requests on multiple
> http_ports and where billing and similar needs require knowledge of
> the port handling each transaction.

 Seems a weird design to bill on the Squid listening port rather than the
 client IP. Smells like a system that insists on "http_access allow all"
 at the top of the config as well.

 Amos
Received on Wed Aug 31 2011 - 04:05:03 MDT

This archive was generated by hypermail 2.2.0 : Wed Aug 31 2011 - 12:00:03 MDT