Re: [squid-users] How to use tcp_outgoing_address with cache_peer

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Fri, 26 Apr 2013 21:31:25 +1200

On 26/04/2013 8:37 p.m., Alex Domoradov wrote:
> First of all - thanks for your help.
>
>> Problem #1: Please upgrade your Squid.
>>
>> It has been 3 years since the last security update for Squid-2.6, and
>> nearly 5 years since your particular version was superseded.
> ok, I will update to the latest version
>
>> On 24/04/2013 12:15 a.m., Alex Domoradov wrote:
>>> Hello all, I encountered the problem with configuration 2 squids. I
>>> have the following scheme -
>>>
>>> http://i.piccy.info/i7/0ecd5cb8276b78975a791c0e5f55ae60/4-57-1543/57409208/squids_schema.jpg
>
>> Problem #2: Please read the section on how RAID0 interacts with Squid ...
>> http://wiki.squid-cache.org/SquidFaq/RAID
>>
>> Also, since you are using SSDs, see #1. Older Squid releases like 2.6 push
>> *everything* through disk, which reduces your SSD lifetime a lot. Please
>> upgrade to a current release (3.2 or 3.3 today); these try to avoid disk a
>> lot more in general and offer cache types like rock for even better I/O
>> savings on small responses.
> ok. The main reason I chose RAID0 was to get the necessary disk space, ~400 GB.

It does not work the way you seem to think. 2x 200GB cache_dir entries
have just as much space as 1x 400GB. Using two cache_dir entries lets
Squid balance the I/O load across the disks while removing all RAID
processing overhead.
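For example, a split along those lines might look like this in squid.conf
(the mount points are assumptions, not taken from your config):

```
# One cache_dir per SSD, each mounted separately instead of RAID0-striped.
# Same ~400GB total; Squid spreads object I/O across both directories.
cache_dir aufs /cache1 200000 16 256
cache_dir aufs /cache2 200000 16 256
```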

>>> acl AMAZON dstdom_regex -i (.*)s3\.amazonaws\.com
>>> cache_peer_access 192.168.220.2 allow AMAZON
>>>
>>> acl RACKSPACE dstdom_regex -i (.*)rackcdn\.com
>>> cache_peer_access 192.168.220.2 allow RACKSPACE
>>
>> FYI: these dstdom_regex look like they can be far more efficiently replaced
>> by dstdomain ACLs and even combined into one ACL name.
> Did you mean something like
>
> acl RACKSPACE dstdomain .rackcdn.com
> cache_peer_access 192.168.220.2 allow RACKSPACE
>
> Would there REALLY be any speed improvement? I know that regex is more "heavy"

Yes I meant exactly that.

And yes, it does run faster. The regex has to do a forward, byte-wise
pattern match, with extra complexity around deciding whether the position
it is up to still belongs to the leading .* wildcard. dstdomain does a
reverse string comparison: in this case never more than 3 block-wise
comparisons, usually only 2. That works out a few percentage points
faster even on a change as simple as this one.
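Combined into one ACL name, as suggested, that could look like this
(a sketch built from your two existing ACLs):

```
# One dstdomain ACL covering both CDNs; matched by suffix, no regex.
acl CDN dstdomain .s3.amazonaws.com .rackcdn.com
cache_peer_access 192.168.220.2 allow CDN
```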

>
>>> url_rewrite_program /usr/bin/squidguard
>>> url_rewrite_children 32
>>>
>>> cache_dir null /tmp
>>> cache_store_log none
>>> cache deny all
>>>
>>> acl local_net src 192.168.0.0/16
>>> http_access allow local_net
>>>
>>> *** parent_squid squid.conf ***
>>>
>>> http_port 192.168.220.2:3128
>>> acl main_squid src 192.168.220.1
>>>
>>> http_access allow main_squid
>>> http_access allow manager localhost
>>> http_access allow manager main_squid
>>>
>>> icp_access allow main_squid
>>>
>>> cache_mem 30 GB
>>> maximum_object_size_in_memory 128 MB
>>> cache_dir aufs /squid 400000 16 256
>>> minimum_object_size 16384 KB
>>> maximum_object_size 1024 MB
>>> cache_swap_low 93
>>> cache_swap_high 98
>>
>> The numbers here look a little strange. Why the high minimum object size?
> I have made some investigation and got the following information
>
> The number of projects for 2013
> # svn export http://svn.example.net/2013/03/ ./
> # find . -maxdepth 1 -type d -name "*" | wc -l
> 1481
>
> The number of psd files
> # find . -type f -name "*.psd" | wc -l
> 8680
>
> Total amount of all psd files
> # find . -type f -name '*.psd' -exec ls -l {} \; | awk '{ print $5}' |
> awk '{s+=$0} END {print s/1073741824" Gb"}'
> 116.6 Gb
>
> The number of psd files less than 10 Mb
> # find . -type f -name '*.psd' -size -10000k | wc -l
> 915
>
> The number of psd files between 10-50 Mb
> # find . -type f -name '*.psd' -size +10000k -size -50000k | wc -l
> 5799
>
> The number of psd files between 50-100 Mb
> # find . -type f -name '*.psd' -size +50000k -size -100000k | wc -l
> 1799
>
> The number of psd files larger than 100 Mb
> # find . -type f -name '*.psd' -size +100000k | wc -l
> 167
>
> So as you can see, ~87% of all psd files are 10-100 Mb

Overlooking the point that your minimum is 16MB and maximum is 1024MB
(a rerun with those boundaries would be worth it).

What I focus on is the total size of *all* the .psd files, which is
smaller than your available disk space. So limiting by object size is
counterproductive at this point. You are in a position to cache 100% of
the .psd files and gain from re-use of even the small ones.

The .zip files you are also caching matter too; I expect they may
invalidate that point. You should measure their size distribution the
same way as the experiment above.
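That measurement can also be done in a single pass over the tree; a
sketch, assuming GNU find (for -printf) and awk, with bucket thresholds
in MiB rather than your -size units:

```shell
# Size histogram of all .psd files in one pass (a sketch; assumes GNU
# find's -printf and awk). Buckets mirror the counts quoted above.
find . -type f -name '*.psd' -printf '%s\n' | awk '
  { total += $1
    if      ($1 <  10*1024*1024) small++
    else if ($1 <  50*1024*1024) mid++
    else if ($1 < 100*1024*1024) large++
    else                         huge++ }
  END {
    printf "total: %.1f GB\n", total/1073741824
    printf "<10MB: %d  10-50MB: %d  50-100MB: %d  >100MB: %d\n",
           small+0, mid+0, large+0, huge+0 }'
```

Swapping the `-name` pattern for `'*.zip'` gives the same breakdown for
the zip files.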

>
>>> refresh_pattern \.psd$ 2592000 100 2592000 override-lastmod
>>> override-expire ignore-reload ignore-no-cache
>>> refresh_pattern \.zip$ 2592000 100 2592000 override-lastmod
>>> override-expire ignore-reload ignore-no-cache
>>>
>>> All work fine, until I uncomment on main_squid the following line
>>>
>>> tcp_outgoing_address yyy.yyy.yyy.239
>>>
>>> When I try to download any zip file from amazon I see the following
>>> message in cache.log
>>>
>>> 2013/04/22 01:00:41| TCP connection to 192.168.220.2/3128 failed
>>>
>>> If I run tcpdump on yyy.yyy.yyy.239 I see that main_squid trying to
>>> connect to parent via external interface without success.
>>>
>>> So my question. How may I configure main_squid that it could connect
>>> to the parent even with configured
>>> tcp_outgoing_address option?
>>
>> #3 The failure is in TCP. Probably your firewall settings forbidding
>> yyy.yyy.yyy.239 from talking to 192.168.220.2.
> No, there is nothing in the firewall forbidding it. As I described
> above, all packets with src ip yyy.yyy.yyy.239 go through table ISP2,
> which looks like
>
> # ip ro sh table ISP2
> yyy.yyy.yyy.0/24 dev bond1.2000 scope link src yyy.yyy.yyy.239
> default via yyy.yyy.yyy.254 dev bond1.2000

This is the route selection, not the firewall permission rules.

> and that's a problem. I see the following packets on my external interface
>
> # tcpdump -nnpi bond1.2000 port 3128
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on bond1.3013, link-type EN10MB (Ethernet), capture size 65535 bytes
> 13:19:43.808422 IP yyy.yyy.yyy.239.36541 > 192.168.220.2.3128: Flags
> [S], seq 794807972, win 14600, options [mss 1460,sackOK,TS val
> 3376000672 ecr 0,nop,wscale 7], length 0
> 13:19:44.807904 IP yyy.yyy.yyy.239.36541 > 192.168.220.2.3128: Flags
> [S], seq 794807972, win 14600, options [mss 1460,sackOK,TS val
> 3376001672 ecr 0,nop,wscale 7], length 0
> 13:19:46.807904 IP yyy.yyy.yyy.239.36541 > 192.168.220.2.3128: Flags
> [S], seq 794807972, win 14600, options [mss 1460,sackOK,TS val
> 3376003672 ecr 0,nop,wscale 7], length 0
>
> So as I understand it, the connection to my parent goes through table
> ISP2 (because tcp_outgoing_address sets the src ip of the packets to
> yyy.yyy.yyy.239) and out the external interface bond1.2000, when I
> expected it to be established via the internal interface bond0.

The golden question, then, is whether you see those packets arriving on
the parent machine, and what happens to them there.
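If they never arrive, one common escape from this kind of source-routing
trap (a sketch only, not tested against your setup; the priority value
and subnet are assumptions) is a destination-based rule that keeps
internal traffic in the main routing table, evaluated before the
source-based ISP2 rule:

```
# Route traffic for the parent's subnet via the main table, regardless
# of source address, before the ISP2 source rule is consulted.
ip rule add to 192.168.220.0/24 lookup main priority 50
```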

Amos
Received on Fri Apr 26 2013 - 09:31:41 MDT