Re: RAID or multiple cache_dir s?

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Thu, 31 Dec 1998 14:18:38 +0100

Paul Gregg wrote:

> o For a reasonably well used cache, is using multiple cache_dir
> entries as efficient as a hardware RAID-0 box? (Yes I know there
> will be some small performance loss, but will squid make use of
> all the disks efficiently?)

Yes. The usage pattern won't be identical to a striped disk, but it will
be efficient.

> o Another thread asked the question if a disk (and thus a
> cache_dir) went bad would squid continue to perform well using
> the remaining disks ? Is this true?

Currently you have to remove the failed cache_dir line from squid.conf
and restart Squid, either by hand or automatically from a suitable
monitoring process. Until the failed cache_dir is removed from
squid.conf anything may happen.
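
As a rough illustration of the editing step such a monitoring process
would perform, here is a minimal sketch. The function name
removeFailedCacheDir and the sample paths are hypothetical and not part
of Squid. Detecting the failed disk and restarting Squid are left out.

```javascript
// Hypothetical helper: return the squid.conf text with every active
// cache_dir line that refers to the failed directory removed.
// confText  - full text of squid.conf
// failedDir - path of the cache directory on the failed disk
function removeFailedCacheDir(confText, failedDir) {
  return confText
    .split("\n")
    .filter(function (line) {
      var fields = line.trim().split(/\s+/);
      // Keep everything that is not an active cache_dir line
      // (comments, blank lines, other directives).
      if (fields[0] !== "cache_dir") return true;
      // Keep the cache_dir line unless one of its fields is the
      // failed path. Checking all fields covers both the plain form
      // ("cache_dir /path ...") and the form with a storage type
      // ("cache_dir ufs /path ...").
      return fields.indexOf(failedDir) === -1;
    })
    .join("\n");
}

// Example with made-up paths and sizes.
var conf = [
  "http_port 3128",
  "cache_dir /cache1 4000 16 256",
  "cache_dir /cache2 4000 16 256"
].join("\n");

var cleaned = removeFailedCacheDir(conf, "/cache2");
console.log(cleaned);
```

The cleaned text would then be written back to squid.conf before Squid
is restarted.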

> o What are the maximum limits on Squid? Should I continue to throw
> disk at a box, or buy a separate box. If I can save on RAID, I
> can afford extra memory and so have a 1 GB RAM box - Squid won't
> break?

No known limits, but there are some manageability issues with huge
caches. For example if Squid crashes (which it sometimes does) then it
will take ages for Squid to rebuild the index of a huge cache and there
will be a noticeable impact on performance and behaviour during this
time.

> o Is it the general consensus to avoid Transparent proxying if
> possible, and that it is better to force the user to knowingly
> use the proxy settings in their browser?

My recommendation is to force users to set their proxy settings,
preferably using a PAC file.

My recommended way to force users to configure their proxy settings is
to redirect non-proxied port 80 traffic to a page with easy to follow
instructions on how to configure the proxy settings, and information on
why this is done (written in a positive tone).

Why you should use transparent proxying:

* Users do not need to change anything in their proxy settings
* Users do not need to be aware that they use a proxy

Why you should NOT use transparent proxying:

* Users are forced to use a proxy without knowing.
* If the box fails then the users have no way of getting around this
until the network manager removes the redirection (or the network is
smart enough not to redirect to a failed box).
* There are some serious issues with TCP/IP when doing transparent TCP
redirection. This may render the service unusable for some people, and
less efficient for some others.
* Routing redirection policy can only be set on IP+PORT basis.
* Only port 80 traffic can be transparently proxied using a wildcard
rule. All sites using other ports need additional redirection rules to
be transparently proxied.
* Only HTTP can be transparently proxy-cached. You can't transparently
proxy-cache FTP as there is no known FTP-protocol proxy which can cache
(doing caching in the FTP protocol in a reliable manner is very hard,
while caching of FTP urls from WWW clients using a proxy is very easy).

Why you should use a PAC script:

* It automatically detects failed proxies
* Proxy policy can be set on any granularity you like, from everything
down to individual URLs. This is good for a blacklist of
sites/subsites/urls/methods which can't be proxied.
* It allows you to expand to use more proxy servers if needed.

---
Henrik Nordstrom
Spare time Squid hacker
Received on Thu Dec 31 1998 - 06:12:39 MST