Re: [PATCH] Do not send unretriable requests on reused pinned connections

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sat, 01 Dec 2012 16:27:16 +1300

On 1/12/2012 1:31 p.m., Henrik Nordström wrote:
> fre 2012-11-30 klockan 15:30 -0700 skrev Alex Rousskov:
>
>> Squid is sending POST requests on reused pinned connections, and
>> some of those requests fail due to a pconn race, with no possibility for
>> a retry.
> Yes... and we have to for NTLM, TPROXY and friends or they get in a bit
> of trouble from connection state mismatch.
>
> If sending the request fails we should propagate this to the client by
> resetting the client connection and let the client retry.

It seems to me we are also forced to do this for ssl-bump connections.
  * Opening a new connection is the wrong thing to do for server-first
bumped connections, where the new connection MAY go to a completely
different server than the one whose certificate the connection was
bumped with. We control the IP:port we connect to, but we cannot
control the existence of IP-level load balancers.
  * Client-first bumped connections do not face that problem, *BUT*
there is no way at forwarding time to tell them apart from server-first
bumped ones.
  * We are pretending to be a dumb relay - which offers the ironclad
guarantee that the server at the other end is a single TCP endpoint
(DNS uncertainty exists only on the initial setup; once connected,
either all packets reach *an* endpoint or the connection dies).

We can control the outgoing IP:port details, but have no control over
the existence of IP-level load balancers which can change the
destination server underneath us. Gambling on the destination not
changing when retrying an intercepted outbound HTTPS connection would
re-open at least two CVE issues that 3.2 is supposed to be immune to
(CVE-2009-0801 and CVE-2009-3555).

Races are also still very possible on server-bumped connections if, for
any reason, it takes longer to receive+parse+adapt+reparse the client
request than the server is willing to wait. Remember that we have the
slow trickle arrival of headers, parsing, adaptation, helpers and access
controls to work through before the pinned server connection gets used.
For example, Squid is extremely likely to lose closure races on a mobile
network when some big event is on that everyone has to
google/twitter/facebook about, while every request gets bumped and sent
through an ICAP filter (the BBC at the London Olympics, say).

>
>> When using SslBump, the HTTP request is always forwarded using a server
>> connection "pinned" to the HTTP client connection. Squid does not reuse
>> a persistent connection from the idle pconn pool for bumped client
>> requests.
> Ok.
>
>> Squid uses the dedicated pinned server connection instead.
>> This bypasses pconn race controls even though Squid may be essentially
>> reusing an idle HTTP connection and, hence, may experience the same kind
>> of race conditions.
> Yes..
>
>> However, connections that were just pinned, without sending any
>> requests, are not "essentially reused idle pconns" so we must be careful
>> to allow unretriable requests on freshly pinned connections.
> ?

A straight usage counter is definitely the wrong thing to use to
control this, whether or not you agree with us that re-trying outbound
connections is safe after guaranteeing the client (with an encryption
certificate, no less) that a single destination has been set up. What is
needed is a suitably long idle timeout and a close handler.
  Both of which, for bumped connections, should trigger un-pinning and
abort the client connection. If those timeouts are not being set on
server-bump pinned connections then that is the bug and it needs to be
fixed ASAP.
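The behaviour described above can be sketched roughly as follows. This
is not Squid code; the class and method names (ClientConn,
PinnedServerConn, on_idle_timeout, on_close) are hypothetical, chosen
only to illustrate the policy that either event should un-pin the
server side and abort the client connection so the client can retry:

```python
# Hypothetical sketch, not Squid source: a pinned server connection
# whose idle timeout and close handler both un-pin and reset the client.

class ClientConn:
    def __init__(self):
        self.aborted = False

    def abort(self):
        # Resetting the client connection lets the client decide to retry.
        self.aborted = True


class PinnedServerConn:
    def __init__(self, client, idle_timeout_s):
        self.client = client
        self.idle_timeout_s = idle_timeout_s  # armed when the conn is pinned
        self.pinned = True

    def on_idle_timeout(self):
        # Server side has been quiet too long: drop the pin, reset client.
        self._unpin_and_abort_client()

    def on_close(self):
        # Server closed the socket underneath us: same treatment.
        self._unpin_and_abort_client()

    def _unpin_and_abort_client(self):
        if self.pinned:
            self.pinned = False
            self.client.abort()


client = ClientConn()
server = PinnedServerConn(client, idle_timeout_s=60)
server.on_close()  # simulate the server closing the pinned socket
assert not server.pinned and client.aborted
```

The point of the sketch is that neither handler attempts a retry on the
server side; both propagate the failure to the client.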

The issue is not that the conn was used then pooled versus pinned. The
issue is the asynchronous period between the last and the current packet
on the socket - we have no way to tell whether the intervening delay has
caused problems (crtd, adaptation or ACL lag might be enough to lose
some race with NAT timeouts), regardless of whether that past use was
the SSL exchange (server-bump only) or a previous HTTP data packet. I
agree this is just as true of bumped connections which were pinned at
some unknown time earlier as it is of connections pulled out of a shared
pool and last used at some unknown time earlier. Regardless of how the
persistence was done, they *are* essentially reused idle persistent
connections. The risks/problems are all the same, but whether a retry or
an alternative connection setup is possible differs greatly between the
traffic types - with intercepted traffic (of any source) the retry is
more dangerous than informing the client via an aborted connection.
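That per-traffic-type distinction can be summarised in a small decision
sketch. Again this is hypothetical illustration, not Squid's actual
forwarding logic; the function name and return values are made up:

```python
# Hypothetical sketch of the recovery policy argued above: after a
# reused connection fails mid-request, a retry on a fresh socket is
# only safe when the destination is interchangeable. For pinned or
# intercepted traffic the safer move is to abort the client connection
# and let the client itself retry.

def on_reused_conn_failure(retriable, pinned_or_intercepted):
    if pinned_or_intercepted:
        return "abort-client"    # destination must not silently change
    if retriable:
        return "retry-new-conn"  # ordinary pconn race recovery
    return "abort-client"        # unretriable request: nothing safe left

assert on_reused_conn_failure(True, False) == "retry-new-conn"
assert on_reused_conn_failure(True, True) == "abort-client"
```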

>
>> The same logic applies to pinned connection outside SslBump.
> Which is quite likely the wrong thing to do. See above.
>
> Regards
> Henrik
>

Amos
Received on Sat Dec 01 2012 - 03:27:32 MST
