Re: [squid-users] Large ACL's list, the ways to distribute squid caches, asking pro-users for advice.

From: Amos Jeffries <squid3@dont-contact.us>
Date: Wed, 23 Jan 2008 10:12:38 +1300 (NZDT)

> Awesome, thanks! There are no words after that superb reply.
>
>> Wow. wow. This takes the prize for the year I think.
>> Do you have any hair or sanity left after working with that?
> That prize should stay with the previous admins, I think :). I'm only
> attempting a soft reorganization of the IT infrastructure. My question came
> up because I'm new to Squid, but I do have experience in planning and
> optimizing network services.
>
>> *Any* regexp gives a huge performance downgrade. There are better
>> alternatives in most situations, and some juggling to reduce the hit in
>> others where it's needed.
>>
> I agree with you.
>
>> If you want to help out the community could you please record your
>> current speed/hit statistics (squidclient mgr:info) before starting any
>> of this.
> No problem. Here it is.
>
> System (CentOS 5.0):
> Linux xxx.ru 2.6.18-8.el5xen #1 SMP Thu Mar 15 19:56:43 EDT 2007 x86_64
> x86_64 x86_64 GNU/Linux
>
> ===============
> Squid Object Cache: Version 2.6.STABLE6
> Start Time: Sun, 09 Dec 2007 14:16:30 GMT
> Current Time: Tue, 22 Jan 2008 11:25:43 GMT
> Connection information for squid:
> Number of clients accessing cache: 0
> Number of HTTP requests received: 32657579
> Number of ICP messages received: 0
> Number of ICP messages sent: 0
> Number of queued ICP replies: 0
> Request failure ratio: 0.01
> Average HTTP requests per minute since start: 516.8
> Average ICP messages per minute since start: 0.0
> Select loop called: 647156210 times, 5.858 ms avg
> Cache information for squid:
> Request Hit Ratios: 5min: 0.0%, 60min: 0.0%
> Byte Hit Ratios: 5min: 7.5%, 60min: 5.0%
> Request Memory Hit Ratios: 5min: 0.0%, 60min: 0.0%
> Request Disk Hit Ratios: 5min: 0.0%, 60min: 0.0%
> Storage Swap size: 0 KB
> Storage Mem size: 160 KB
> Mean Object Size: 0.00 KB
> Requests given to unlinkd: 0
> Median Service Times (seconds) 5 min 60 min:
> HTTP Requests (All): 0.30459 0.30459
> Cache Misses: 0.35832 0.35832
> Cache Hits: 0.00000 0.00000
> Near Hits: 0.00000 0.00000
> Not-Modified Replies: 0.00000 0.00000
> DNS Lookups: 0.14261 0.09971
> ICP Queries: 0.00000 0.00000
> Resource usage for squid:
> UP Time: 3791353.110 seconds
> CPU Time: 641947.363 seconds
> CPU Usage: 16.93%
> CPU Usage, 5 minute avg: 65.86%
> CPU Usage, 60 minute avg: 61.17%
> Process Data Segment Size via sbrk(): 71548 KB
> Maximum Resident Size: 0 KB
> Page faults with physical i/o: 3
> Memory usage for squid via mallinfo():
> Total space in arena: 71548 KB
> Ordinary blocks: 35487 KB 10778 blks
> Small blocks: 0 KB 0 blks
> Holding blocks: 356 KB 1 blks
> Free Small blocks: 0 KB
> Free Ordinary blocks: 36060 KB
> Total in use: 35843 KB 50%
> Total free: 36060 KB 50%
> Total size: 71904 KB
> Memory accounted for:
> Total accounted: 9469 KB
> memPoolAlloc calls: 4089880809
> memPoolFree calls: 4089815358
> File descriptor usage for squid:
> Maximum number of file descriptors: 1024
> Largest file desc currently in use: 841
> Number of file desc currently in use: 723
> Files queued for open: 0
> Available number of file descriptors: 301
> Reserved number of file descriptors: 100
> Store Disk files open: 0
> IO loop method: epoll
> Internal Data Structures:
> 48 StoreEntries
> 48 StoreEntries with MemObjects
> 26 Hot Object Cache Items
> 0 on-disk objects
>
> =============
> These stats don't show the "real" load because of the New Year's holidays,
> when nobody was working :).

Ah well. We'll just have to keep an eye on the 'since start' requests per
minute and watch for the improvement. The CPU usage is also rather high, at
around 66% (5-minute average), with no users.

>
>> First,
>> Are you running Squid 2.6.STABLE18? That release has the fastest
>> Squid code out so far.
>
> It's STABLE6, but it will be upgraded soon.
>
> Regarding your other advice: this morning I was thinking about how these
> HUGE ACL lists for every user in the database are not needed at all, and
> how I can get rid of them. Your solutions show real experience in solving
> this.
>
> I have decided to review the Squid configuration and rewrite the sources of
> the accounting system so that it stops reconfiguring Squid (that system is
> another open-source project by third parties).
> We shall also use mysql_auth or a self-developed helper for authentication
> purposes.
>
> Regarding caching: we have trouble with hard disk subsystem load on this
> link, and not such a good hit rate because of the wide-ranging surfing
> habits of students from different countries.

That should not be an issue on a Xeon. Do you have RAID? If so, only RAID10
plays nicely with Squid; the other levels double the disk access times for
everything.

> I think we can turn on caching once we set up additional caching
> servers and enable CARP.

If it's the browsing habits causing this, you will likely find the same
problem regardless of the number of servers. You would just be spreading the
speed trouble across a wider base of HDDs, where it's less noticeable.

>
> What do you think about creating a 1-2 GB cache directory in RAM? I think
> it would give an impressive drop in latency (from 7-8 ms on HDD to
> 80-100 ns in RAM) and a rise in bandwidth (4.6 GB/s instead of 80 MB/s).
> The servers have good power backup systems and can preserve it, or nothing
> prevents putting the system into hibernate mode.

For speed, I don't think there is anything faster than a RAM cache.
Squid always uses a RAM cache; yours is just currently set to an
extremely small size (8 MB).

If you have 1 GB of free RAM now, just set this:
  cache_dir null /null
  cache_mem 1024 MB

And kill the 'no_cache deny all'
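
Pulled together against the squid.conf below, that change looks roughly like
this (a sketch only):

  cache_mem 1024 MB      # was: cache_mem 8 MB
  cache_dir null /null   # already present; keeps the store in RAM only
  # and delete both 'no_cache deny QUERY' and 'no_cache deny all'
  # (the QUERY pair goes after the STABLE18 upgrade, see further down)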

> --------------------------------------
> Have a good day,
>
> Serg Androsov.
>
> P.S. Here is the current squid.conf that comes with the accounting system,
> shown without the ACLs.
> ======================================
> http_port 172.16.3.1:8080
> hierarchy_stoplist cgi-bin ?

This only affects communication with other caches (i.e. the CARP bit).

>
> acl QUERY urlpath_regex cgi-bin \?
> no_cache deny QUERY

After your upgrade to STABLE18, these two lines can die. Just add the
refresh_pattern lines I mention below. The lines above are a legacy from when
dynamic pages were not cache-friendly.

>
> cache_mem 8 MB
> cache_swap_low 90
> cache_swap_high 95
>
> cache_dir null /null
> cache_store_log none

no: store_log none

> maximum_object_size 1024 KB
> maximum_object_size_in_memory 8 KB

When using a RAM cache, those two above need to be the same value.
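
For the 1 GB RAM-only cache that means, for example (a sketch, assuming you
keep the current 1024 KB cap):

  maximum_object_size 1024 KB
  maximum_object_size_in_memory 1024 KB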

> log_ip_on_direct on
> client_netmask 255.255.255.255
> ftp_user squid@bsu.edu.ru
> ftp_list_width 64
> ftp_passive on
> ftp_sanitycheck on
> auth_param basic program /usr/lib64/squid/ncsa_auth
> /usr/local/sacc/etc/ncsa_passwd
> auth_param basic children 30
> auth_param basic realm SAcc internet proxy server
> auth_param basic credentialsttl 2 hours
> refresh_pattern ^ftp: 1440 20% 10080
> refresh_pattern ^gopher: 1440 0% 1440

Right here add:
  refresh_pattern cgi-bin 0 0% 0
  refresh_pattern \? 0 0% 0

> refresh_pattern . 0 20% 4320
> quick_abort_min 16 KB
> quick_abort_max 16 KB
> quick_abort_pct 95
> range_offset_limit 0 KB
>
> forwarded_for on
>
> # time ALC's
> acl night time SMTWHFA 00:00-07:00
> acl time1700 time SMTWHFA 17:00-23:59
> acl time1900 time SMTWHFA 19:00-23:59
> acl time2000 time SMTWHFA 20:00-23:59
> acl time2100 time SMTWHFA 21:00-23:59
> acl time0900 time SMTWHFA 08:00-18:00
>
> acl music urlpath_regex -i \.mp3 \.avi \.mpg \.mpeg

Aha. Try:

acl music rep_mime_type -i ^audio/ ^video/

http_reply_access deny music
http_reply_access allow all

That will also catch hidden media items, and only test actual replies
instead of all the non-auth'd requests.

>
> acl all src 0.0.0.0/0.0.0.0
> acl manager proto cache_object
> acl localhost src 127.0.0.1/255.255.255.255
> acl admins src 172.16.1.0/24 172.16.3.1/32
>
> acl to_localhost dst 127.0.0.0/8
> acl SSL_ports port 443 563
> acl Safe_ports port 80 # http
> acl Safe_ports port 21 # ftp
> acl Safe_ports port 443 563 # https, snews
> acl Safe_ports port 70 # gopher
> acl Safe_ports port 210 # wais
> acl Safe_ports port 1025-65535 # unregistered ports
> acl Safe_ports port 280 # http-mgmt
> acl Safe_ports port 488 # gss-http
> acl Safe_ports port 591 # filemaker
> acl Safe_ports port 777 # multiling http
> acl CONNECT method CONNECT
>
> acl good_url url_regex -i "/etc/squid/acl/good_url"

The next step in optimisation, then, is to take a good look at that file.
It probably needs splitting into two:

acl good_url_d dstdomain "/etc/squid/acl/good_url_domains"
acl good_url_r url_regex -i "/etc/squid/acl/good_url_regex"

Lines that do not need any of the path/to/file URL pieces can go in the
dstdomain list, the fastest destination check Squid has.
e.g.:
  www.example.com

URLs which _have_ to be full URLs can stay regex, but try not to add things
to that list.
e.g.:
  http://.../good_file

Then the dstdomain ACL should be checked before the regex one, so it
short-circuits the slow test:
  http_access allow good_url_d
  http_access allow good_url_r
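
For illustration, with purely hypothetical entries (not taken from your
actual list), good_url_domains would hold bare domains, one per line:

  www.example.com
  .example.net

and good_url_regex only the entries that genuinely need path matching:

  ^http://www\.example\.com/library/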

> http_access allow all good_url

So anybody on the internet is allowed at those URLs??
We recommend, if possible, defining your own network ranges and using:
  http_access allow localnet good_url
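
For example (assuming, purely for illustration, that the campus LAN sits
inside 172.16.0.0/16; adjust to your real ranges):

  acl localnet src 172.16.0.0/16
  http_access allow localnet good_url_d
  http_access allow localnet good_url_r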

>
> http_access deny to_localhost
> http_access allow manager localhost
> http_access allow manager admins
> http_access deny manager

These should probably go ahead of the good_url tests so they, again,
short-circuit the bigger lookups. Basically, the more items an ACL has and
the slower its type, the lower down you want it (either in physical lines,
or later on a single access line), pending any access dependency.
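
Put together with the earlier suggestions, the access section could be
ordered roughly like this (a sketch only; the Safe_ports/CONNECT and
proxy_auth lines are assumptions, since those rules were cut from the
posted config):

  http_access allow manager localhost
  http_access allow manager admins
  http_access deny manager
  http_access deny !Safe_ports
  http_access deny CONNECT !SSL_ports
  http_access deny to_localhost
  http_access allow localnet good_url_d
  http_access allow localnet good_url_r
  http_access allow users
  http_access deny all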

>
> acl users proxy_auth REQUIRED
>
> no_cache deny all

The directive is now 'cache deny all'.
BUT, to use the RAM cache this line needs to die anyway.

squid.conf should end with:
  http_access deny all

(I assume it slipped out with the user-ACL cuts, but check that anyway.)

Amos