Re: [squid-users] What is decent/good squid performance and architecture

From: Robert Borkowski <rborkows@dont-contact.us>
Date: Wed, 06 Jul 2005 09:37:19 -0400

Chris Robertson wrote:
>>-----Original Message-----
>>From: jos houtman [mailto:jos@hyves.nl]
>>Sent: Saturday, July 02, 2005 3:08 PM
>>To: squid-users@squid-cache.org
>>Subject: [squid-users] What is decent/good squid performance and
>>architecture
>>
>>
>>hello list,
>>
>>I am running a website and have set up 3 squid servers as reverse proxies
>>to handle the images on the website.
>>And before I try to tweak even more I am wondering what is considered
>>good performance in requests/min.
>>
>>some basic stats to get an idea:
>>- only image files are served
>>- average size 40KB
>>- possible number of files somewhere between 10 and 15 million (and
>>growing).
>>- the variety of files that's accessed? ...
>>I got these stats from a squid server that's been running for 2/3 days now.
>>Internal Data Structures:
>> 2024476 StoreEntries
>> 146737 StoreEntries with MemObjects
>> 146721 Hot Object Cache Items
>> 2000067 on-disk objects
>>
>>Is it safe to assume that the number of images actually accessed is
>>about 2 million?
>>
>
>
> That is a fairly safe assumption (give or take a few thousand). I love this list. Some of the service requirements just make me gawk. 10-15 million images...
>
>
>>On our dual Xeon servers with 4GB RAM and SATA disks I can get about 250
>>hits/second.
>>On our dual Xeon server with 8GB RAM and SCSI disks I can get about 550
>>hits/second.
>>Are these decent numbers?

550 hits/second * 40KB average object size * 3 squids = roughly 515 Mbps
(if all three boxes ran at that rate). Make sure you have enough upstream
bandwidth before worrying about further performance. Even at 250
hits/second you'd be close to saturating 100BaseT on each squid box (if
that's what you're using): 250 * 40KB is about 80 Mbps.
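
To make the arithmetic above explicit, here is a small Python sketch. It
assumes 1 KB = 1024 bytes and 1 Mbit = 1024 * 1024 bits (which is how the
515 figure comes out), counts object payload only, and ignores HTTP and
TCP overhead. The function name is made up for illustration.

```python
def throughput_mbps(hits_per_second, avg_object_kb, servers=1):
    """Aggregate payload throughput in Mbit/s.

    Assumptions: 1 KB = 1024 bytes, 1 Mbit = 1024 * 1024 bits,
    and no protocol overhead is counted.
    """
    bits_per_second = hits_per_second * avg_object_kb * 1024 * 8 * servers
    return bits_per_second / (1024 * 1024)


# Three squids, each at 550 hits/second with 40KB objects
print(throughput_mbps(550, 40, servers=3))  # 515.625

# One squid at 250 hits/second
print(throughput_mbps(250, 40))  # 78.125
```

Real traffic will be somewhat higher than these numbers once headers and
retransmits are included, so treat them as a lower bound.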

>>I'm running aufs on the 8GB server, and diskd on the other servers.
>>Does that contribute to the big difference, or is it mainly the memory
>>and disk speed?
>>
>
>
> Given just the information above (and assuming that the OS and number of cache disks are the same between servers), I would guess that it's just a function of memory and disk speed (more objects cached in RAM, faster access to those not cached).
>
> In any case, http://www.squid-cache.org/mail-archive/squid-users/200505/0974.html is an example of 700 hits per second. No hardware specifics in the email. There is a patch for squid to use epoll on linux that at least one person had a good experience with http://www.squid-cache.org/mail-archive/squid-users/200504/0422.html.
>
> Here's an email from Kinkie (one of the Squid Devs if I'm not mistaken) describing 500 hits/sec on a Pentium IV 3.2GHz w/2GB RAM as "not really too bad." He also has a HowTo set up describing running multiple instances of Squid on a single box: http://squidwiki.kinkie.it/squidwiki/MultipleInstances. If you are running out of CPU on one processor (Squid doesn't take full advantage of Multi-CPU installations), this might be something to look into.
>
>
>>I think that the variety of files accessed by the clients is getting too
>>big (especially during peak hours) for the squid servers to cache
>>efficiently. And I am hoping that it's possible to distribute the variety
>>over the squid servers, so that during normal operation each squid
>>server would only have to serve a third of the 2 million files.
>>Do you have some good ideas about how to achieve this?
>>Is there a way to have some kind of distribution based on the URL?
>>I am hoping this is possible without rewriting the web application
>>and so that a failure of 1 server would go unnoticed by the public.
>>
>
>
> One method would be to set the cache servers up as cache-peers using the proxy-only option. The message at http://www.squid-cache.org/mail-archive/squid-users/200506/0175.html is all about clustering squids for internet caching, but it does imply that ICP peering should work just fine up to 8 servers. If you want to limit what each squid caches based on hierarchy, a combination of urlpath_regex acls and the no_cache directive can do that. No promises on what that will do to performance.
>
> For more explicit suggestions it would help to know how your caches are set up currently (separate IPs w/RR DNS? Using a HW load balancer? Software cluster?).
>

Another method would be CARP (Cache Array Routing Protocol). I haven't
used it myself, but it's used to split the load between peers based on
URL. Basically it's a hash-based load balancing algorithm: the same URL
always maps to the same peer.
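
To illustrate the idea of hash-based distribution by URL, here is a
minimal Python sketch. It is not CARP's actual hash function and it is
not Squid code; it uses highest-score-wins (rendezvous) hashing with
SHA-256 as a stand-in, and the peer names and URL pattern are
hypothetical.

```python
import hashlib
from collections import Counter


def pick_peer(url, peers):
    """Return the peer with the highest hash score for this URL.

    Stand-in for a CARP-style selection: every (peer, url) pair gets a
    score and the top-scoring peer serves the URL.
    """
    def score(peer):
        digest = hashlib.sha256((peer + "|" + url).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    return max(peers, key=score)


peers = ["squid1", "squid2", "squid3"]  # hypothetical peer names
urls = ["/images/%d.jpg" % i for i in range(9000)]  # hypothetical URLs

# Each URL lands on exactly one peer, so each cache holds about a third.
before = {u: pick_peer(u, peers) for u in urls}
print(Counter(before.values()))

# Simulate squid3 failing: only the URLs it owned get reassigned.
after = {u: pick_peer(u, ["squid1", "squid2"]) for u in urls}
moved = [u for u in urls if before[u] != after[u]]
print(len(moved), "URLs moved to a surviving peer")
```

With this kind of scheme, removing a peer only remaps the URLs that peer
was responsible for. The surviving caches keep their existing
assignments, although they start cold for the URLs they inherit.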

If you have a load balancer with packet inspection capabilities you can
also direct traffic that way. On F5 BigIPs the facility is called
iRules. I'm pretty sure NetScaler can do that too.

-- 
Robert Borkowski
Received on Wed Jul 06 2005 - 07:37:28 MDT