Re: [squid-users] file containing regex

From: Ahsan Ali <ahsan@dont-contact.us>
Date: Sun, 2 Dec 2001 22:06:17 +0500

The reason I am looking into it is that in this part of the world bandwidth
is prohibitively expensive. At the same time, for most ISPs, blocking porn is
not an option - yet at times 70% of the web content coming through is porn.
What I want to do is take a bunch of regexes matching porn (using one of the
blacklists) and cache whatever they match for an indefinitely long period of
time - ideally something like Squid's offline mode, but without forcing other
web content to be cached in the proxy.
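Roughly, what I have in mind is a block of refresh_pattern lines along these
lines (the domains below are only placeholders, the real patterns would come
from the blacklist, and the numbers are untested guesses on my part):

    # Placeholder patterns - the real list would come from a blacklist.
    # min=43200 and max=525600 minutes stretch the freshness window to
    # roughly a month / a year for anything that matches.
    refresh_pattern -i ^http://[^/]*pornexample\.com/      43200 100% 525600
    refresh_pattern -i ^http://[^/]*another-example\.net/  43200 100% 525600
    # everything else keeps a conservative default
    refresh_pattern .                                       0     20%  4320

As far as I understand, this only bends the freshness rules for matching
objects, so it is not quite as aggressive as offline_mode, but it should keep
them in the cache far longer than the defaults would.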

The proxy I have at my disposal right now is a Dual Xeon 550 with 256MB of
RAM - I'll be adding RAM to it soon. I hope to get a Dual Athlon XP 1800 with
2GB of RAM shortly, and that machine will then handle the cache load.

Another area I am looking into is whether I can use a regex to match URLs
and, on a match, send those requests to another proxy server. That second
proxy could run in offline mode, and in this way porn would cease to consume
a major portion of my bandwidth, especially since most porn sites are
images/movie clips and as such are static content anyway.
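Something along these lines is what I am picturing (the peer hostname and the
regex file path are made up, and I have not tried this yet):

    # Made-up names/paths - the regex file would be one of the blacklists.
    acl porn url_regex -i "/usr/local/squid/etc/porn.regex"
    # The second Squid (the one running in offline mode) as a parent.
    cache_peer pornproxy.example.net parent 3128 0 no-query default
    cache_peer_access pornproxy.example.net allow porn
    cache_peer_access pornproxy.example.net deny all
    # Make sure matching requests actually go through the peer.
    never_direct allow porn

If I have the directives right, only the matching requests would be shunted
off to the second box, while everything else goes direct as usual.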

I don't have access to a squid box right now - tomorrow I intend to look
into cache hierarchies and see if this is possible.

What are your views on the feasibility of such a caching infrastructure?

-Ahsan Ali

----- Original Message -----
From: "Henrik Nordstrom" <hno@marasystems.com>
To: "Ahsan Ali" <ahsan@khi.comsats.net.pk>; "squid"
<squid-users@squid-cache.org>
Sent: Sunday, December 02, 2001 9:08 PM
Subject: Re: [squid-users] file containing regex

> Each refresh pattern (directly, or included, does not matter) uses a small
> amount of CPU. The longer Squid has to search your list of refresh patterns
> before finding the correct one, the more CPU it uses.
>
> If you have very large lists then consider adding the most common ones early
> in the list to save some CPU time. This is true for any regex based list.
>
> A modest sized CPU should be fine with lists up to about a thousand entries
> in most installations. There is no direct cutoff point as the CPU usage only
> grows by a small amount for each entry. If you are in a situation that you
> need tens of thousands of refresh_pattern lines then something is seriously
> wrong and you probably need to reconsider why you need all those refresh
> patterns..
>
> If you are worried how far you can push your system then it is very easy to
> test. Simply add a huge number of refresh_pattern lines not likely to match
> in full and observe how your Squid behaves.
>
> This all of course depends on how much CPU you have to spare for
> refresh_pattern processing.
>
> Regards
> Henrik
>
>
> On Sunday 02 December 2001 15.11, Ahsan Ali wrote:
> > So if I keep adding regexes to squid.conf itself, how many can I
> > realistically scale to before it blows up?
> >
> > Regards,
>
>
> --
> MARA Systems AB
> Giving you basic free Squid support
> Priority support or Squid enhancements available on request
>
>
Received on Sun Dec 02 2001 - 10:06:57 MST
