Re: filtering HTTPS/CONNECT (summary and continuation of discussion) from Marcus Kool on 2012-03-16 (squid-dev)

From: Marcus Kool <marcus.kool_at_urlfilterdb.com>
Date: Fri, 16 Mar 2012 18:05:58 -0300

There were 4 threads about 'filtering HTTPS' and I will try to
summarise them here.

Current situation with Squid 3.1.19:
What happens inside a CONNECT is practically not filterable because
1) sslBump is not used, or
2) sslBump is used and SSL+HTTP can be filtered, but it breaks the
    other data streams for Skype et al. Using the unsafe options
    'sslproxy_cert_error allow all' and 'sslproxy_flags DONT_VERIFY_PEER'
    to circumvent the latter problem is far from desirable.

The wiki features pages say that Alex Rousskov is working on BumpSslServerFirst
and MimicSslServerCert but unfortunately Alex has not (yet) participated in the
discussion.

What I consider as the "desired situation":
*all* traffic will be filterable, since if there is an exception for
one category of data, one can write an application that makes a tunnel
using this particular category of data and hence is able to circumvent
all efforts to filter traffic.

To filter HTTP is trivial. To filter HTTPS there are two options:
1) to filter without sslBump and then the filter only receives
    "CONNECT <endpoint>:443" on which it has to make a decision to block
    or not. This cripples the filter since it does not have access to the
    content and in many cases cannot detect which application sends
    what (type of) data.
    An additional drawback is that the connection can be blocked but an
    understandable error message cannot be presented to the end user.
2) use sslBump. The filter will receive "CONNECT <endpoint>:443" as well as
    "https://endpoint/path" (and content for RESPMOD) for SSL+HTTP based
    connections so this is optimal for filtering SSL+HTTP connections.
    Much of the discussion was about what to do with data streams that are not
    SSL+HTTP. This can be any protocol encapsulated by SSL or simply any
    protocol.
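
To make the difference between the two options concrete, here is a small
Python sketch of what a filter gets to look at in each case. It is only an
illustration: filter_view is a hypothetical helper, and representing the
bumped request as a request line with an absolute https:// URL is my
assumption, not a description of what Squid actually passes to a filter.

```python
from urllib.parse import urlsplit


def filter_view(request_line: str) -> dict:
    """Return the fields a filter could base its decision on.

    Hypothetical helper: it only illustrates how little information a bare
    CONNECT carries compared with a bumped https:// request.
    """
    method, target, _version = request_line.split(" ", 2)
    if method == "CONNECT":
        # Option 1: only "<endpoint>:<port>" is visible, no path, no content.
        host, _, port = target.rpartition(":")
        return {"host": host, "port": int(port), "path": None}
    # Option 2: after bumping, the full URL is visible.
    parts = urlsplit(target)
    default_port = 443 if parts.scheme == "https" else 80
    return {
        "host": parts.hostname,
        "port": parts.port or default_port,
        "path": parts.path or "/",
    }
```

With option 1 the path is simply not there to filter on; with option 2 the
filter can apply its URL rules (and RESPMOD can see the content).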

To be able to filter all data, Squid needs a modification to present raw data
about the non-SSL+HTTP data streams to a filter (URL redirector or ICAP).
To keep the discussion focussed on one type of filter I will assume that
an ICAP server is used as the filter.

The ICAP protocol has a considerable overhead (CPU processing) and extending
the ICAP protocol for data stream filtering is not the first choice.
Amos and Henrik were "optimistic" about implementing a new pipe filter.

The data streams for a bidirectional pipe have a different behavior than
HTTP and SSL+HTTP. Both client and server can send data at any time. And
for some, the server initiates the protocol and for others, the client
initiates. OpenVPN is a chameleon and can pretend to be an SSL+HTTP server
but is also a VPN server.

In all cases that Squid sends a request to a filter, it would be
a *big* plus if it informs the filter what it already knows about the
CONNECT endpoint, e.g. whether it speaks SSL/TLS or not.

Since sslBump is being rewritten for 3.3 it is a good opportunity
to make Squid suitable for filtering *all* data streams.

The new sslBump flow could be something like this:

A) open socket to server. If error, close socket to client.
B) do the logic for ICAP REQMOD CONNECT endpoint:443
C) start SSL handshake to server and take care of all certificate issues.
    If the SSL handshake fails with a PROTOCOL error, the socket must be closed,
    a new socket must be opened, and Squid will assume that the endpoint
    uses a protocol other than SSL. Squid goes into tunnel mode and all
    filtering will be done by the new pipe filter.
    Squid may get a new option to define its behaviour in case the SSL handshake
    fails. The option could be called sslBumpForNoneSSL with values
    prohibitNoneSSL (terminate connection), passNoneSSL (always allow),
    filterNoneSSL (default value - let new pipe filter decide).
D) Squid now knows that the connection has an SSL/TLS wrapper but does not know
    yet if inside the wrapper HTTP is used.
    Squid monitors what the client *and* the server send on the pipe. If the
    client sends first and sends a valid HTTP command, Squid assumes that the
    connection has SSL+HTTP.
    If there is no SSL+HTTP Squid goes into tunnel mode and all filtering will be
    done with the new pipe filter.
E) do the "normal processing" and ICAP REQMOD/RESPMOD for https://endpoint/path

The total work of Squid+filter can be reduced if B) is done after C) since
Squid can inform the filter about the SSL handshake and the filter does
not have to do its own probe.

There was a suggestion for a connection cache which allows Squid to skip checks
and make assumptions about a new CONNECT to an endpoint that was CONNECTed before.
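
A minimal Python sketch of such a connection cache follows. Everything in
it is an assumption made for illustration: the class name EndpointCache, the
keys stored per endpoint, and in particular the expiry time, which the
discussion did not specify. Some expiry seems prudent because an endpoint
can change what it speaks between connections.

```python
import time


class EndpointCache:
    """Remember what was learned about an endpoint for a limited time."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the cache can be tested
        self._entries = {}           # (host, port) -> (expires_at, facts)

    def remember(self, host, port, facts):
        """Store a copy of the facts learned during a CONNECT."""
        self._entries[(host, port)] = (self._clock() + self._ttl, dict(facts))

    def lookup(self, host, port):
        """Return cached facts, or None if unknown or expired."""
        entry = self._entries.get((host, port))
        if entry is None:
            return None
        expires_at, facts = entry
        if self._clock() >= expires_at:
            # Stale knowledge: drop it so the endpoint is probed again.
            del self._entries[(host, port)]
            return None
        return facts
```

A lookup that returns None means the full set of checks has to be done again
for that endpoint.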

The new pipe filter requires a new protocol yet to be defined.
Squid initially tells the filter what it already knows about the endpoint.
That is: whether it uses SSL or not, time to CONNECT, endpoint address, cached information.
The Squid pipe sends copies of all data to the filter and the filter can reply
with one of the following: OK (proceed with this data), REPLACE-CONTENT (content
and a flag to optionally also terminate the connection), TERMINATE (just close
sockets), OK-FOR-ALL (proceed and do not consult me any more for this connection).
Squid also informs the filter when the connection is terminated by the
client or the server.
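
To illustrate how the four replies could drive the pipe, here is a Python
sketch of the Squid side of one connection. The protocol itself is not
defined yet, so the names Verdict, Reply and PipeSession, and the idea of
calling the filter as a plain function per chunk, are assumptions of mine
and not a proposal for the wire format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Verdict(Enum):
    OK = "OK"
    REPLACE_CONTENT = "REPLACE-CONTENT"
    TERMINATE = "TERMINATE"
    OK_FOR_ALL = "OK-FOR-ALL"


@dataclass
class Reply:
    verdict: Verdict
    content: bytes = b""            # only used with REPLACE-CONTENT
    terminate_after: bool = False   # optional flag of REPLACE-CONTENT


class PipeSession:
    """State kept for one tunnelled connection."""

    def __init__(self, ask_filter: Callable[[str, bytes], Reply]):
        self._ask = ask_filter
        self._bypass = False   # set once the filter answered OK-FOR-ALL
        self.closed = False

    def on_data(self, direction: str, chunk: bytes) -> Optional[bytes]:
        """Return the bytes to forward, or None if nothing is forwarded."""
        if self.closed:
            return None
        if self._bypass:
            return chunk       # filter asked not to be consulted any more
        reply = self._ask(direction, chunk)
        if reply.verdict is Verdict.OK:
            return chunk
        if reply.verdict is Verdict.OK_FOR_ALL:
            self._bypass = True
            return chunk
        if reply.verdict is Verdict.REPLACE_CONTENT:
            if reply.terminate_after:
                self.closed = True
            return reply.content
        # TERMINATE: just close the sockets, forward nothing.
        self.closed = True
        return None
```

The direction argument ("client" or "server") is there because both sides
can send at any time, so the filter needs to know who sent each chunk.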

How do we go on from here?
Received on Fri Mar 16 2012 - 21:06:04 MDT
