Re: ICAP vectoring points from Steve Hill on 2012-11-29 (squid-dev)

From: Steve Hill <steve_at_opendium.com>
Date: Thu, 29 Nov 2012 13:12:32 +0000

On 29.11.12 11:14, Eliezer Croitoru wrote:

> And why don't you want to spoof the client IP?

The question is more: why do I want to spoof the IP?

There are 3 locations where these servers tend to be installed:
1. As "just another machine" on a LAN. There is a single interface
which connects to the LAN. The firewall that sits between the LAN and
the internet is configured to drop web traffic from anything that isn't
the proxy.
2. Between the LAN and the Internet. The server has 2 interfaces - one
connected to the LAN, one connected to the internet - and acts as both the
proxy and the internet firewall.
3. Between the LAN, internet and servers. The server has 3 interfaces
(LAN, internet and servers) and acts as proxy, internet firewall and
router between the servers and the LAN.

(1) is reasonably uncommon, but it does happen: often because the
customer already has a separate firewall that they are happy with and
just wants to add web filtering to their network, and sometimes because
the physical layout of the network wouldn't allow anything else without
major work.

(2) is the most common setup.

(3) happens sometimes, but the physical network layout etc. usually
precludes it, and it is often undesirable to put all of the routing load
through the proxy server as well.

In all 3 configurations, the internet-bound traffic is NATted before it
goes onto the internet anyway, so the spoofing behaviour gives it no
benefit.

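In configurations (1) and (2), the servers are on the same network as
the workstations, so a server receiving a spoofed connection would send
its replies straight back to the workstation rather than via the proxy.
Making the spoofing work would therefore require the customer to make
routing changes to all their servers - a configuration step that would be
better avoided.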

In configuration (3) the spoofing behaviour may be desirable since it
allows the servers to see the user's real IP address and doesn't require
further configuration changes to the servers.

So in the vast majority of the configurations I described, the spoofing
adds no benefit whatsoever, and is detrimental since it requires special
routing to be configured all over the place.

Whilst I agree that in a perfect situation, the network would be
configured to allow the spoofing to work in all situations, I think most
people in the real world will agree that they are rarely in a situation
where they can say "rip out the whole network and start again" and
instead have to retrofit servers to networks in less than perfect
configurations.

> If you ask me, in a real-world scenario such a filtering solution is
> better left as a self-maintained service on a different server from the
> cache.
> Filtering is nice, but in the real world the worst choice is to do such
> filtering on the same machine as the cache service.

This depends on the load being placed on the server. For heavy loads, I
agree, and ICAP does indeed allow the filtering load to be moved to a
separate machine (or cluster). However, this doesn't change the problem
described above - if the proxy server itself cannot be installed between
the workstations and all the web servers then the mandatory spoofing
behaviour is always going to be a lot of trouble, irrespective of where
the ICAP server lives.

> Actually, the very basis of RESPMOD ICAP is to sit not in front of a
> cache but after it.
> The other option is to use the no-store header for a modified request,
> which in almost every case shouldn't be cached anyway, and will fix any
> problem.

If the request has been modified then it is indeed trivial and sensible
to make it uncacheable. There is, however, a problem:
1. A "highly privileged" user requests an object. Since they are highly
privileged, no filtering is done and the object is stored in the cache.
2. Now, a "low priv" user requests the same object. Ordinarily, the
ICAP server would send them a "your request has been blocked" response.
However, since the object was already cached in (1), it is served
without even asking the ICAP server about it.
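To make the ordering problem concrete, here is a minimal Python sketch of a cache that runs RESPMOD only before an object is stored. It is a hypothetical toy model, not Squid's actual code path: `PreCacheProxy`, `icap_respmod` and the example URL are made-up names, and the "filter" simply blocks every non-privileged user. Modified (blocked) responses are not stored, mirroring the "make it uncacheable" approach above.

```python
from dataclasses import dataclass, field

# Hypothetical toy model; it is not Squid's or any ICAP server's real code.
BLOCKED = "your request has been blocked"


def icap_respmod(body: str, privileged: bool) -> str:
    """Stand-in for the ICAP server: only low-privilege users are filtered."""
    return body if privileged else BLOCKED


@dataclass
class PreCacheProxy:
    """RESPMOD runs before the response is stored, so cache hits bypass it."""
    origin: dict[str, str]
    cache: dict[str, str] = field(default_factory=dict)
    icap_calls: int = 0

    def get(self, url: str, privileged: bool) -> str:
        if url in self.cache:
            # Cache hit: served without asking the ICAP server at all.
            return self.cache[url]
        self.icap_calls += 1
        original = self.origin[url]
        body = icap_respmod(original, privileged)
        if body == original:
            # Unmodified responses are cached; modified ones are not stored
            # (the "make it uncacheable" approach).
            self.cache[url] = body
        return body


origin = {"http://example.com/": "page content"}
proxy = PreCacheProxy(origin)
print(proxy.get("http://example.com/", privileged=True))   # page content
print(proxy.get("http://example.com/", privileged=False))  # page content (not blocked)
print(proxy.icap_calls)                                     # 1
```

Making the blocked response uncacheable does not help here: the object that leaks is the unmodified one stored for the privileged user, and the second request never reaches the filter.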

The object could be set to Vary based on the user to prevent the cached
object being served to anyone else, but that rather defeats the purpose of
caching. All of the filtering (for all privilege levels) could be done in
one go and appropriate headers inserted, but that would add a big
overhead in unnecessary filtering. As Amos suggested, all the privilege
levels could be worked out in an external ACL, a tag generated from them
and that tag used as the Vary header. All these things will work, but
they seem rather like the wrong way of doing it - to my mind, the correct
thing to do is to handle the object as it is being delivered to the user
instead of as it is being delivered to the cache.
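A second toy sketch, again in Python and again hypothetical (none of these classes or names are Squid APIs), contrasts the Vary-on-tag workaround with filtering at delivery time. It assumes three made-up privilege tags and a filter that blocks only the "low" tag; the external ACL is reduced to the caller passing the tag in.

```python
from dataclasses import dataclass, field

# Hypothetical toy model; not Squid's real cache or ICAP client.
BLOCKED = "your request has been blocked"


def icap_respmod(body: str, tag: str) -> str:
    """Stand-in filter: only the made-up "low" privilege tag is blocked."""
    return BLOCKED if tag == "low" else body


@dataclass
class VaryOnTagProxy:
    """Filter before storing, but key the cache on (URL, privilege tag)."""
    origin: dict[str, str]
    cache: dict[tuple[str, str], str] = field(default_factory=dict)
    origin_fetches: int = 0

    def get(self, url: str, tag: str) -> str:
        key = (url, tag)
        if key not in self.cache:
            self.origin_fetches += 1
            self.cache[key] = icap_respmod(self.origin[url], tag)
        return self.cache[key]


@dataclass
class DeliveryTimeProxy:
    """Store the unmodified object once; filter on every delivery."""
    origin: dict[str, str]
    cache: dict[str, str] = field(default_factory=dict)
    origin_fetches: int = 0

    def get(self, url: str, tag: str) -> str:
        if url not in self.cache:
            self.origin_fetches += 1
            self.cache[url] = self.origin[url]
        return icap_respmod(self.cache[url], tag)


origin = {"http://example.com/": "page content"}
vary, late = VaryOnTagProxy(origin), DeliveryTimeProxy(origin)
for tag in ("high", "staff", "low", "high"):
    # Both designs give every user the same (correctly filtered) answer.
    assert vary.get("http://example.com/", tag) == late.get("http://example.com/", tag)
print(vary.origin_fetches, len(vary.cache))  # 3 3
print(late.origin_fetches, len(late.cache))  # 1 1
```

Keying on the tag keeps the filtering correct but stores and fetches one copy per privilege level, whereas filtering on delivery keeps a single cached copy at the cost of consulting the ICAP service on every request.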

-- 
  - Steve Hill
    Technical Director
    Opendium Limited     http://www.opendium.com
Direct contacts:
    Instant messager: xmpp:steve_at_opendium.com
    Email:            steve_at_opendium.com
    Phone:            sip:steve_at_opendium.com
Sales / enquiries contacts:
    Email:            sales_at_opendium.com
    Phone:            +44-844-9791439 / sip:sales_at_opendium.com
Support contacts:
    Email:            support_at_opendium.com
    Phone:            +44-844-4844916 / sip:support_at_opendium.com

Received on Thu Nov 29 2012 - 13:12:39 MST
