Re: [squid-users] Distributed High Performance Squid

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Thu, 20 Aug 2009 12:54:52 +1200

On Wed, 19 Aug 2009 16:58:33 -0700, "Joel Ebrahimi" <jebrahimi_at_bivio.net>
wrote:
> Hi,
>
> I'm trying to build a high-performance Squid. The performance
> actually comes from the hardware, without changes to the code base.
> I am a beginning user of Squid, so I figured I would ask the list
> for the best/different ways of setting up this configuration.
>
> The architecture is set up like this: there are 12 CPU cores that
> each run an instance of Squid. Each of these 12 cores has access to
> the same disk space but not the same memory, each runs its own
> instance of an OS, and they can communicate over an internal
> network. A network processor slices up sessions and hands them off
> to whichever of the 12 cores is available. There is a single conf
> file and a single logging directory.
>
> The current problem I can see with this setup is that each of the
> 12 instances of Squid acts individually, so any one of them could
> try to access the same log file at the same time. I'm not sure what
> impact this could have in terms of overwriting data.
>
> I actually have it set up this way now and it works well, though
> it's a very small test environment, and I'm concerned issues may
> only pop up in larger environments where the logs are accessed very
> frequently.
>
> I was looking through some online materials and I saw there are
> other mechanisms for log formatting. The ones that I thought may be
> of use here are either the daemon or UDP modules. There is actually
> a 13th core in the system that is used for management. I was
> wondering if setting up UDP logging on this 13th core and having
> the 12 instances of Squid send the log info over the internal
> network would work.
>
> Thoughts, or better ideas? Problems with either of these scenarios?
>

You cannot share file resources or cache directories between multiple
Squid instances yet.

You need to allocate a separate cache and separate logs for each
instance. There are also some other settings that must be unique per
instance; they are all listed here:
http://wiki.squid-cache.org/MultipleInstances
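
For example, each instance's squid.conf overrides might look something
like this (an illustrative sketch only; the hostnames, ports, paths
and sizes are placeholders, not required values):

    # Settings unique to instance 1 of 12; every other instance
    # needs its own copies of these with different values.
    visible_hostname node1.internal
    http_port 3128
    pid_filename /var/run/squid-node1.pid
    cache_dir aufs /cache/node1 20000 16 256
    access_log /var/log/squid/node1/access.log squid
    cache_log /var/log/squid/node1/cache.log
    cache_store_log none

That also sidesteps the shared-log worry entirely, since no two
instances ever open the same file.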

For the highest performance you want one of the instances to be a
load balancer for the rest. The CARP algorithm is the best available
so far for reducing resource usage: it ensures each URL is 'pinned'
to a particular back-end cache, which prevents duplicate storage of
objects. You do not want this parent to store any objects itself,
beyond the in-transit memory for very 'hot' ones.
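
A rough sketch of such a frontend instance (the hostnames and sizes
are made up, and it assumes Squid was built with the null store
module):

    # Spread requests over the back-ends with CARP.
    cache_peer node1.internal parent 3128 0 carp no-query no-digest
    cache_peer node2.internal parent 3128 0 carp no-query no-digest
    # ... one cache_peer line for each remaining back-end ...

    # Always forward to a CARP parent, never fetch directly.
    never_direct allow all

    # No disk cache on the frontend; RAM only, for hot objects.
    cache_dir null /tmp
    cache_mem 256 MB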

* For the best req/sec performance, stick with 2.7 for now, until 3.x
catches up in processing speed.

* Ignore any tutorials you find that talk about tuning Squid 2.5 or
older. 2.6+ is very different, and the old tuning advice can leave
you worse off than the current defaults.

* Use COSS for storage of small objects and, if you are on Linux,
AUFS for storage of large ones. Avoid memory storage of large objects
(roughly 1 MB and up) in Squid-2. Generally, do not place more than
one cache_dir per physical disk spindle/drive; the exception is COSS,
which can share a drive with one other cache_dir.
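
For instance (mount points and sizes invented for illustration, and
assuming a build with the coss and aufs store modules):

    # COSS dir for small objects; nothing over 64 KB is stored here.
    cache_dir coss /disk1/coss 8000 max-size=65536 block-size=1024
    # Objects over 64 KB can only land in the AUFS dir on the other
    # disk.
    cache_dir aufs /disk2/cache 50000 32 256

    # Keep large objects out of memory in Squid-2.
    maximum_object_size_in_memory 64 KB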

* Avoid RAID for cache_dir disks/drives.

* Use the internal DNS resolver (default).

* Avoid regex like the plague.
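
As an example, a plain domain ACL does the job of a typical regex ACL
without the per-request regex cost (the ACL name and domain here are
made up):

    # Slow: runs a regex against every request URL.
    #acl blocked url_regex -i \.example\.com
    # Fast: a plain suffix match on the request domain.
    acl blocked dstdomain .example.com
    http_access deny blocked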

* Avoid using helper apps, since their queues limit the overall
request handling of each instance.

* Check your ACL sequences and types carefully. The wiki lists groups
of 'fast' and 'slow' ACL types ('fast' is just a grouping, not a true
speed indicator; the 'slow' ones really are slow). Avoid the slow
ones as much as possible.
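
One way to order things (an illustrative sketch; the addresses are
placeholders): put the cheap tests first, so most requests are
decided before any slow lookup has to run.

    # fast: plain address comparison
    acl localnet src 10.0.0.0/8
    # fast: port number check
    acl Safe_ports port 80 443
    # slow: may need a DNS lookup on the destination host
    acl dmz dst 192.0.2.0/24

    # Cheap denials first; the slow 'dst' test only runs for
    # whatever traffic gets past them.
    http_access deny !Safe_ports
    http_access deny dmz
    http_access allow localnet
    http_access deny all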

The rest is ensuring you have fast disks and lots of RAM, and
fine-tuning the various knobs until you get a good setup for your
traffic load.

Amos