Re: [squid-users] Development of SQL-based ACL? from Joshua Goodall on 2005-06-09 (squid-users)

From: Joshua Goodall <joshua@dont-contact.us>
Date: Fri, 10 Jun 2005 14:47:21 +1000

On Thu, Jun 09, 2005 at 02:41:57PM +0200, Didde Brockman wrote:
> Hello users,
>
> I'm investigating the possibilities of using Squid in a large
> corporate environment. We have found the documentation to be rather
> complete, and after a period of evaluation two questions have come up
> and now I'm hoping you guys are able to help me out.
>
> 1. Are there any references available on _large_ projects
> incorporating Squid? "Large" being defined as roughly 100,000 -
> 200,000 clients 24/7. The numbers available on the site are outdated
> and seem to be focused on smaller solutions. I'm looking for some
> example hardware setups or basic numbers pertaining to the load of
> systems experiencing _heavy_ traffic.

I've worked on multiple sites with requirements similar to yours.
I'll just throw some numbers at you, then. You can easily do a couple
of hundred requests per second - even with regex ACLs and authentication
- on a single 3GHz Xeon. You don't say whether those are simultaneous
clients or total seats, which makes a huge difference, nor whether you
actually want to cache or just do this for filtering and access control.

It'd help if you knew either the number of requests per second, or the
bandwidth, or both. I've seen large clusters sized based on both those
measures. Filtering can seriously damage your throughput - I've seen
the same 3GHz Xeons reduced to a peak of 80 req/sec (or about 4-5 Mbit/sec)
through a heavyweight (but very full-featured) filtering subsystem.

Above 8 servers I've observed serious degradation of the ICP protocol
used for cache peering. So if you need more than that, or you might
grow to more than that, either discontinue ICP (which will impact
your hit ratio) or move to a hierarchy, with a lower layer doing
authentication & filtering plus an upper parent layer doing caching.

Use source-IP-affinity load balancing as much as possible. There are
some funny failure modes with other algorithms. Hardware load-balancers
are usually a better choice than LVS, but LVS can do the job.
Don't forget to make sure your high-availability tools (the LBs) are
supported to the hilt - if you lose them, you lose the entire service.
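
To make "source-IP affinity" concrete, here is a minimal sketch in Python of
the idea: the same client address always lands on the same proxy, so its
authentication state and cache locality stay in one place. The function name
and the backend names are made up for illustration; a real hardware
load-balancer or LVS does this internally and will not use this exact hash.

```python
import hashlib


def pick_backend(client_ip, backends):
    """Map a client IP to one backend, deterministically.

    Hypothetical helper: hashes the textual client address and uses the
    result to index into the list of backends.
    """
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


# Hypothetical pool of Squid boxes behind the load-balancer.
proxies = ["squid1", "squid2", "squid3", "squid4"]
chosen = pick_backend("10.0.0.122", proxies)
```

Note that plain modulo hashing like this remaps most clients whenever the
pool size changes, which is one reason to test how your load-balancer
behaves when a server drops out.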

If you can, overprovision by at least 50% and have a 3/6/12-month
capacity projection.
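
As a rough back-of-envelope sketch of that sizing, in Python: take your peak
request rate, add the headroom, and divide by what one box can sustain. The
function name and the 1000 req/sec peak are assumptions for illustration;
the per-server figures are just the ballpark numbers I quoted above, so
measure your own.

```python
import math


def servers_needed(peak_rps, per_server_rps, headroom=0.5):
    """Servers required to carry peak_rps with the given spare headroom.

    headroom=0.5 means provisioning for 150% of the measured peak.
    """
    if per_server_rps <= 0:
        raise ValueError("per_server_rps must be positive")
    return math.ceil(peak_rps * (1 + headroom) / per_server_rps)


# Assumed peak of 1000 req/sec across the whole site.
plain = servers_needed(1000, 200)     # roughly 200 req/sec per box
filtered = servers_needed(1000, 80)   # heavyweight filtering: 80 req/sec
```

With those assumed inputs the heavily filtered case needs more than twice
as many boxes, which also pushes it past the 8-server point where ICP
started to hurt.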

> 2. We would like to be able to administer ACLs through rules stored
> in an external database (SQL). Also, this would pair an IP with
> specific settings for a specific client, i.e. 10.0.0.122 is allowed
> to do this, but not that. Also, if client 10.0.0.122 requests a
> "forbidden" HTTP-resource its request should be redirected to a local
> page. Basically, these settings should be customisable for all users
> so they can acknowledge a forbidden resource, absorb the information
> on the local page (the target of the redirect) and then choose to
> continue on to their (forbidden) destination or not.

You're going to need to develop a redirector. What you've described
sounds remarkably similar to N2H2, but that might not be the kind
of product that suits your business requirements. The redirector
will probably then work in conjunction with a simple front-end to
your policy database, which means policy must be changeable in
real-time. So your database designs and internal protocols will
also impact performance.
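
To give a feel for the shape of such a redirector, here is a minimal sketch
in Python. It assumes the classic line-based helper interface, where Squid
writes one line per request ("URL client-ip/fqdn ident method") and reads
back either a replacement URL or an empty line for "no change", and where a
"302:" prefix asks Squid to send an HTTP redirect; check the documentation
for your Squid version before relying on that. The table layout, the warning
page URL and the in-memory SQLite database are all made up here as stand-ins
for your real policy store.

```python
import sqlite3
import sys
from urllib.parse import quote, urlsplit

# Hypothetical local page that explains the policy to the user.
WARNING_PAGE = "http://proxy.example.internal/warning"


def open_policy_db():
    """Stand-in policy store; a real deployment would query its own SQL server."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE blocked (client_ip TEXT, host TEXT)")
    conn.execute(
        "INSERT INTO blocked VALUES ('10.0.0.122', 'forbidden.example.com')"
    )
    return conn


def decide(line, conn):
    """Return the reply for one request line: a redirect, or '' for no change."""
    fields = line.split()
    if len(fields) < 2:
        return ""
    url, client = fields[0], fields[1]
    client_ip = client.split("/", 1)[0]
    # CONNECT requests arrive as host:port and are passed through here.
    host = urlsplit(url).hostname or ""
    row = conn.execute(
        "SELECT 1 FROM blocked WHERE client_ip = ? AND host = ?",
        (client_ip, host),
    ).fetchone()
    if row is None:
        return ""
    # Pass the original URL along so the warning page can offer "continue".
    return "302:" + WARNING_PAGE + "?url=" + quote(url, safe="")


def main():
    """Helper loop: one reply line per request line, flushed immediately."""
    conn = open_policy_db()
    for line in sys.stdin:
        sys.stdout.write(decide(line, conn) + "\n")
        sys.stdout.flush()

# Call main() at the bottom of the script when deploying it as a helper.
```

The "acknowledge and continue" step is not handled in this sketch. It would
need the warning page to record the user's choice somewhere the redirector
can see, which is exactly where your database design starts to matter.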

I strongly advise use of 407 Proxy Authentication rather than per-IP
controls, unless you've already solved the problem of authenticating
identity by IP address (e.g. perhaps you're a dial-up ISP using
RADIUS?)

Have you thought about how you're going to manage "banned" URLs?
There are many solutions. Trying to manage a list of blocked sites
yourself is probably the worst one.

> For the second point, I'm assuming this would require us to contract
> a developer to make the necessary alterations to Squid's source
> code, so I'm also curious as to your experiences (if any) with known
> companies offering to do customisations on Squid.

You shouldn't have to modify Squid itself, unless you want to do something
"magical". On the other hand, if you do find you need changes to Squid, get a
really, really good Unix systems programmer. There are some
lurking on the Squid mailing lists. I can recommend a couple that are
based in Australia, if you mail me privately. If you decide to hire
a company, be sure to check the technical background of their lead
developers.

You also need a good business analyst to develop the access rules
framework, should you decide to develop your own rather than use one
off-the-shelf. A bad system becomes horribly fine-grained and unusable
very quickly. You also need to consider your reporting requirements when
designing. Believe me, those logs become very large, very quickly.

Get the balance right and you will easily push tens or hundreds of
megabits through a cluster of Squids. Get it wrong and you've got
hundreds of thousands of irate users on your hands. Best of luck :)

-- 
Joshua Goodall                           "as modern as tomorrow afternoon"
joshua@roughtrade.net                                       - FW109

Received on Thu Jun 09 2005 - 22:47:37 MDT
