Re: [squid-users] Multiple uplinks for different traffic types not working as intended: Help needed

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sun, 03 Jun 2012 17:58:59 +1200

On 3/06/2012 9:33 a.m., Marcel Meckel wrote:
> Hi,
>
> i'm trying to achieve some sort of multi-uplink caching-solution
> for a company office with 3 uplinks of different speed.
>
> Squid 3.1.6 on Debian Squeeze.
>
> Simplified network-topology looks like this:
>
>
> |proxy1|  |proxy2|  |proxy3|        ----|Uplink1| slow, fixed IP
>    |         |         |           /
> ----------------------------   ----
> |          Switch          |---|GW|---|Uplink2| cellular, dyn IP
> ----------------------------   ----
>     |         |        |           \
> |client1| |client2| |proxy0|        ----|Uplink3| fast, dyn IP
>
>
> \----------- Company LAN -----------/
>
> In reality there are more switches and way more users involved
> and Uplink3 is not there yet but coming in the next days because
> bandwith is an issue.
>
> GW box is NAT'ing the LAN and does *policy* routing:
>
> Source-IP == proxy1? -> Use Uplink1
> Source-IP == proxy2? -> Use Uplink2
> Source-IP == proxy3? -> Use Uplink3
>
> This means, if proxy1 does a DIRECT to contact Origin servers,
> connection is going out on Uplink1. Proxy2 is routed to Uplink2 etc.
>
>
> The requirement reads like this:
>
> a) When clients use proxy0
> 1. company domain .example.com on the internet should be
> reached via fixed IP (UL1)
> 2. .otherstuff.tld is to be fetched over UL2.
> 3. .youtube.com and .github.com is to be fetched over UL3.
> 4. Other traffic should use UL3.
> 5. If an Uplink is down any other Uplink should be used.
> b) When clients use proxy1 all stuff is to be fetched over UL1
> c) When clients use proxy2 all stuff is to be fetched over UL2
> d) When clients use proxy3 all stuff is to be fetched over UL3
> e) Cache objects should not be stored on multiple servers
>
> In general users will use proxy0 as their proxy server. some developers
> sometimes have to test things on the internet with different client IP
> addresses so they are allowed to use e.g. proxy3 to get out with a
> dynamic client IP.
>
> My config so far looks like this:
>
> on proxy1:
>
> cache_peer proxy2.local sibling 8080 3130 proxy-only
> cache_peer proxy3.local sibling 8080 3130 proxy-only
>
> on proxy2:
> on proxy3:
>
> accordingly

Add "htcp" option to better satisfy (e). The default ICP is a bit old
and does not cope well with HTTP/1.1 object variants.
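
A rough sketch of what that could look like on proxy1 (proxy2 and proxy3
accordingly). The port 4827 is the conventional HTCP port, and the
sibling addresses in the ACL are made-up placeholders. It also assumes
your Squid build has HTCP support compiled in, which I have not checked
for the Debian package:

```
# proxy1: query the siblings over HTCP instead of ICP
htcp_port 4827

# hypothetical LAN addresses of proxy2 and proxy3
acl sibling_proxies src 10.0.0.12 10.0.0.13
htcp_access allow sibling_proxies
htcp_access deny all

# with the "htcp" option the second port is used for HTCP queries
cache_peer proxy2.local sibling 8080 4827 htcp proxy-only
cache_peer proxy3.local sibling 8080 4827 htcp proxy-only
```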

>
> on proxy0:
>
> cache_peer proxy1.local parent 8080 3130 no-query proxy-only
> cache_peer proxy2.local parent 8080 3130 no-query proxy-only
> cache_peer proxy3.local parent 8080 3130 no-query proxy-only default
>
> never_direct allow all

With this never_direct rule, proxy0 may never go "DIRECT" over any
uplink; all traffic MUST go via one of the proxy1/2/3 peers. That is why
(4) and (5) fail: when no peer is permitted or available for a request,
there is nothing left to fall back on.

>
> cache_peer_domain proxy1.local .example.com
> cache_peer_domain proxy2.local .otherstuff.tld
> cache_peer_domain proxy3.local .youtube.com .github.com
>
>
> a) 1-3 works
> a) 4 doesn't work:
>
> on proxy0:
>
> Failed to select source for 'http://www.google.com/'
> always_direct = 0
> never_direct = 1
> timedout = 0
>
> a) 5 doesn't work. as soon as e.g. proxy3 does down, proxy0 complains
> that it can't connect to an intermediate proxy server.
> This is expected with the current config.

See the comment about never_direct. Also, "cache_peer_domain" lists the
domains a peer is allowed to service; only those go to the peer. Use
cache_peer_access with ACLs for anything more complex than a policy of
"these domains are allowed to go here, nothing else".

If both cache_peer_access and cache_peer_domain are omitted, the default
is to allow requests to use the peer.

What you need to make (4) and (5) work is cache_peer_access deny rules
on the peers where those domains are NOT allowed to go, leaving other
traffic unspecified for those peers.
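
As an untested sketch for proxy0, replacing the three cache_peer_domain
lines. The ACL names are made up for illustration, and the final
"allow all" lines are spelled out only for clarity:

```
acl company_domains dstdomain .example.com
acl other_domains dstdomain .otherstuff.tld
acl fast_domains dstdomain .youtube.com .github.com

# proxy1 (UL1): refuse domains pinned to UL2 or UL3, accept the rest
cache_peer_access proxy1.local deny other_domains
cache_peer_access proxy1.local deny fast_domains
cache_peer_access proxy1.local allow all

# proxy2 (UL2): refuse domains pinned to UL1 or UL3, accept the rest
cache_peer_access proxy2.local deny company_domains
cache_peer_access proxy2.local deny fast_domains
cache_peer_access proxy2.local allow all

# proxy3 (UL3): refuse domains pinned to UL1 or UL2, accept the rest
cache_peer_access proxy3.local deny company_domains
cache_peer_access proxy3.local deny other_domains
cache_peer_access proxy3.local allow all
```

Unpinned traffic is then acceptable to every peer, so the "default"
option on proxy3 should steer it to UL3 while that peer is alive, with
the others available when it is not. Note the trade-off: a pinned domain
still has exactly one permitted peer, so it fails when that peer is
down. If (5) matters more than the pinning, drop the deny rules for that
domain.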

>
> b)-e) works
>
> So here are my questions:
>
> What do i have to change to make the default parent work?

cache_peer_domain says only *.youtube.com and *.github.com may be passed
to that peer. Google is not mentioned and therefore not permitted.

> Is a) 5 possible with squid? adding round-robin to all peers
> in proxy0's config didn't change anything. when proxy3 is down
> www.youtube.com can't be accessed when client uses proxy0.

cache_peer_domain overrides round-robin or any other selection algorithm
by explicitly specifying which domains can/can't go there.

For example, if two peers have cache_peer_domain saying .youtube.com and
round-robin is set on all three peers, YT would be round-robin only
between the two it was allowed to go to.
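
To illustrate with your peer names (a contrived sketch, not a
recommendation for your setup):

```
cache_peer proxy1.local parent 8080 3130 no-query proxy-only round-robin
cache_peer proxy2.local parent 8080 3130 no-query proxy-only round-robin
cache_peer proxy3.local parent 8080 3130 no-query proxy-only round-robin

# proxy1 only accepts the company domain
cache_peer_domain proxy1.local .example.com

# proxy2 and proxy3 both accept YouTube, so YT requests alternate
# between these two and never reach proxy1
cache_peer_domain proxy2.local .youtube.com
cache_peer_domain proxy3.local .youtube.com
```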

>
> Besides these 2 things is there anything you would do completely
> different?
>
> Is the no-query option on proxy0's cache_peers ok?
>
> #
> # Simpler Solution?
> #
>
> I guess in this scenario i could also replace all 4 proxy servers
> with only one squid server with 3 different IP addresses and select
> tcp_outgoing_address according Origin domain names.
> The gateway would then choose the uplink according to squids
> outgoing ip address.

Indeed, MUCH simpler and faster. The thing to watch out for is whether
doing that will limit your client traffic. If you are needing more
uplinks simply to handle that you may not be able to merge the proxies
without creating a bottleneck problem.

It's a bit confusing why you need such specific and different domain
uplink logic in proxy1/2/3 and 0. If you can avoid that and give each
proxy the same tcp_outgoing_address selection, you avoid the bottleneck
problems of one proxy and also the need for multiple listening
addresses. All traffic can go to any one of the proxies and the uplinks
get used as needed.
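
A minimal sketch of such a selection, assuming the proxy box carries
three LAN addresses that the gateway policy-routes to UL1/UL2/UL3. The
10.0.0.x addresses and ACL names are placeholders I made up:

```
acl company_domains dstdomain .example.com
acl other_domains dstdomain .otherstuff.tld

# first matching line wins
# hypothetical address routed to UL1 by the gateway
tcp_outgoing_address 10.0.0.11 company_domains
# hypothetical address routed to UL2
tcp_outgoing_address 10.0.0.12 other_domains
# everything else, including youtube/github, leaves via UL3
tcp_outgoing_address 10.0.0.13
```

Squid has no idea whether an uplink is up when it picks the outgoing
address, so with this approach the failover in (5) would have to be
handled by the gateway's routing.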

NP: the configuration you have right now appears to be the model
designed for use with proxy1/2/3 one each on the Internet side of
uplinks 1/2/3 and proxy0 on the LAN side. Load balancing of proxy0 to
its parents has the side effect of *connection*-balancing over the
links, but the decision happens at layer 7 (HTTP) instead of layer 3 (IP).

>
> To solve b)-d) one could make squid listen on 3 additional ports
> and choose tcp_outgoint_address according to acl myport, right?

It could. But consider carefully why you need to determine uplink usage
by *proxy* used. If it is really needed at all, can it be based better
on client src address or something instead of the TCP port they happen
to use?

Basing it solely on the receiving port opens you to security problems
with people finding the ports that work and avoiding the limits.
Preventing that requires additional firewall security to ensure specific
TCP connections can't be made, and adds a lot of complexity to your
switching fabric, firewalls, and proxy configs. If it is simply so you
can load balance uplinks, it's far more trouble than it's worth.
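
If you do need it, selecting on the client address might look something
like this. The developer subnet and the outgoing address are
hypothetical:

```
# hypothetical range of developer workstations
acl dev_clients src 10.0.0.64/28

# hypothetical local address that the gateway routes to UL3
tcp_outgoing_address 10.0.0.13 dev_clients
```

This line would need to sit above any domain-based tcp_outgoing_address
lines, since the first match wins.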

>
> Mhh, maybe i should give this all-in-one approach a try.
>
> I checked the FAQ and Wiki but coudn't find this scenario.

Load sharing multiple connections is an IP-layer feature of the kernel.
Squid operates a few levels higher up the network stack. There are load
balancing, uplink bonding, and multi-homing solutions possibly available
in your system's kernel already that work MUCH better than Squid when it
comes to load-balancing. At best Squid performs connection-balancing,
which only properly balances TCP SYN packets. The bulk of TCP is in the
data packets, so uplink usage can still look *very* unbalanced from a
traffic management perspective.

Amos
Received on Sun Jun 03 2012 - 05:59:11 MDT
