Managing large http_access lists: alternative methods

From: Scott Lystig Fritchie <fritchie@dont-contact.us>
Date: Thu, 09 Apr 1998 01:31:25 -0500

This will be my last question/observation/suggestion message for the
evening, I promise. I also searched my archive of previous messages,
the FAQ, etc. as best as I could looking for possible answers.

MRNet is operating some Squid caches which, at the moment, have no
access restrictions on them. The goal was to make it as easy as
possible for our customers to see the beauty and wisdom of
participating in the cache hierarchy. That policy has worked
moderately well for our customers ... and too well for non-customers
who attempt to launder their connections when breaking into Web-based
chat systems, etc. So, it's (beyond reasonable) time to clamp down.

However, I'd like to keep the open-to-our-customers policy. The catch
is that the list of *aggregated* networks that Squid would have to
check has at least 122 entries. Lots of CIDR blocks, holes punched in
CIDR blocks, Class B networks, multiply-homed customers, you name it.
And since we were recently purchased by another ISP, I'd have to add
another two to four dozen networks to encompass our entire customer
network base.

My first question is: is there a significant performance penalty for
checking so many http_access statements, or for putting lots of
network entries into a smaller number of acl statements? When our busiest
cache is busiest, we see 75K TCP connections and 30K ICP queries per
hour. (Dunno exactly how many of those would disappear once the ACLs
go into place.)
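
For scale, 75K TCP connections plus 30K ICP queries per hour averages
out to roughly 29 lookups per second. As a rough illustration of how
such a list could be generated rather than hand-maintained, here is a
minimal Python sketch that collapses overlapping and adjacent CIDR
blocks and then tests a client address against the result. It is not
Squid code and says nothing about Squid's own ACL matching cost; the
function names and the example networks (documentation ranges) are
assumptions made up for this sketch.

```python
import ipaddress

def build_acl(cidrs):
    """Collapse overlapping or adjacent CIDR blocks into a minimal list."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return list(ipaddress.collapse_addresses(nets))

def is_customer(addr, acl):
    """Linear scan: True if addr falls inside any network in acl."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in acl)

# Hypothetical input, NOT a real customer list: two adjacent /25s and a
# /26 that is already covered by a /24.
EXAMPLE_CIDRS = [
    "192.0.2.0/25",
    "192.0.2.128/25",
    "198.51.100.0/24",
    "198.51.100.64/26",
]

acl = build_acl(EXAMPLE_CIDRS)
for net in acl:
    print(net)  # one aggregated network per line
```

The collapsed output could then be written into a config file by a
script, so the source of truth stays in one machine-readable list.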

I don't really want to have to manage that list of networks by hand.
Am I too idealistic (or too lazy) in hoping for a better way to manage
these things?

If there isn't a better way, how harebrained would these ideas be
considered?

        * Since our routers have a lot of info about who's close and
        who isn't, it would be cool to ask a router for that info.
        Unfortunately, that's difficult to do. Perhaps a hacked
        version of gated, listening in on what its neighbor routers
        are saying, which could answer queries from Squid about who's
        close (i.e. a customer) and who isn't. (I've been incubating
        this idea for a while ... would also be cool for making an
        intelligent, dynamic autoconfiguration script writer pretty
        easy. "Oh, you're located near the Duluth POP. You want to use
        Duluth's cache first, not the one in Mankato.")

        * A pool of external processes, like the DNS resolver and
        redirector processes, which use traceroute back to a client.
        If the client is more than N hops away, or if certain router
        interfaces are reached within N hops, they're too far away to
        be a customer.

        * Cache this info (or the first 3 octets of the client's
        address) in the data structures Squid already keeps. It would
        make subsequent positive/negative lookups quick, there
        would be an automatic expiration mechanism (for those
        customers silly enough to leave us for another ISP), and
        the state would persist when Squid is killed/restarted.
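
To make the last two ideas concrete, here is a minimal Python sketch
of a verdict cache keyed on the first three octets of the client
address, with a simple expiration time. It is a standalone
illustration, not Squid's internal data structures, and it does not
cover persistence across restarts. The class name, the TTL default,
the hop threshold, and the classify callback (which stands in for a
traceroute helper or a router query) are all assumptions invented for
this sketch.

```python
import ipaddress
import time

class PrefixCache:
    """Cache customer/non-customer verdicts per /24, with expiry."""

    def __init__(self, classify, ttl_seconds=86400, clock=time.monotonic):
        self.classify = classify      # hypothetical slow check, addr -> bool
        self.ttl = ttl_seconds        # assumed lifetime of a verdict
        self.clock = clock            # injectable so expiry can be tested
        self.entries = {}             # /24 network -> (verdict, expiry time)

    @staticmethod
    def key(addr):
        # Keep only the first three octets of the client address.
        return ipaddress.ip_network(f"{addr}/24", strict=False)

    def lookup(self, addr):
        k = self.key(addr)
        now = self.clock()
        hit = self.entries.get(k)
        if hit is not None and hit[1] > now:
            return hit[0]             # fresh positive or negative answer
        verdict = bool(self.classify(addr))
        self.entries[k] = (verdict, now + self.ttl)
        return verdict

def within_hops(hop_count, max_hops=5):
    """Treat a client as close if a hop count is known and <= max_hops."""
    return hop_count is not None and hop_count <= max_hops
```

A classify callback could, for example, obtain a hop count from an
external helper process and pass it through within_hops; both positive
and negative answers are cached, so the slow check runs at most once
per /24 per TTL period.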

OK, call me lazy and crazy. :-)

-Scott
Received on Wed Apr 08 1998 - 23:35:22 MDT
