Re: squid rewrite

From: Stewart Forster <slf@dont-contact.us>
Date: Tue, 10 Jun 1997 16:27:24 +1000

> Andres Kroonmaa wrote:
> >
> >
> > As your thoughts are quite similar to those I have thought of for
> > quite some time, I would like you to take a look at these ideas and
> > comment aggressively. As it appears to be too big to include in a
> > mail, here is a URL: http://cache.online.ee/cache/Devel/design.txt
>
> Ok - from what I can tell there is one basic question:
>
> Can squid get better?

        Absolutely.

> I don't know... I think that it can, but I can't really be sure, except
> for the basic problems like integrating ftp gets and support for ftp puts.

        That'd be nice. I'd also like to see some support for NNRP caching,
but that's another issue...

> There are some problems with the scalability of squid though... these
> include:
>
> 1) you are limited by filehandles

        Well, not really. We never see more than 1000 FDs in use now that
we've fixed the bug. We'd be pushing VERY darn hard to get anywhere near
Solaris's 4096 limit.
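
        For reference, here's roughly how the process could check (and try
to bump) its descriptor limit at startup; just a sketch, untested, assuming
POSIX getrlimit()/setrlimit():

    #include <stdio.h>
    #include <sys/resource.h>

    /* Report the current descriptor limits and try to raise the soft
     * limit up to the hard limit (e.g. 4096 on our Solaris boxes). */
    static void raise_fd_limit(void)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl) < 0) {
            perror("getrlimit");
            return;
        }
        printf("fd limit: soft=%ld hard=%ld\n",
               (long) rl.rlim_cur, (long) rl.rlim_max);

        rl.rlim_cur = rl.rlim_max;
        if (setrlimit(RLIMIT_NOFILE, &rl) < 0)
            perror("setrlimit");
    }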

> 2) you are limited by having one process that does everything, so you
> can't do SMP nicely

        This is a problem, BUT squid is not really CPU bound, so the problem
is not as big as it seems. We can drive 50 TCP and 150 UDP hits/sec and
still not run out of CPU. Because our code busy-waits when the cache gets
busy, I don't really know how much capacity is left (167MHz UltraSparc), but
it certainly doesn't seem to be wanting even at these loads. Disk and network
I/O are far more likely to be the limiting factors. Still, having multiple
front-ends sharing a common store is a good way to start.

> These are listed (now that I look) in your document....
> I can't see any other problems with squid, (though you say that there is
> a problem with the expiration policy...)

        I've just finished implementing full LRU for squid's objects. It
provides another speed increase, since you only remove what you need, and
what you remove is always the oldest objects. Hit rates rose by another 2%
even when traffic rose by 15%.
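
        The general idea is nothing fancier than a doubly-linked list kept
in access order. A minimal sketch (invented names, not the actual squid
structures):

    /* Most-recently-used objects sit at the head; eviction pulls
     * from the tail, so only the oldest objects ever get removed. */
    typedef struct lru_node {
        struct lru_node *prev, *next;
        void *obj;                  /* the cached object's metadata */
    } lru_node;

    typedef struct {
        lru_node *head, *tail;
    } lru_list;

    /* Unlink a node from wherever it currently sits. */
    static void lru_unlink(lru_list *l, lru_node *n)
    {
        if (n->prev) n->prev->next = n->next; else l->head = n->next;
        if (n->next) n->next->prev = n->prev; else l->tail = n->prev;
        n->prev = n->next = NULL;
    }

    /* On every hit, move the object to the head of the list. */
    static void lru_touch(lru_list *l, lru_node *n)
    {
        lru_unlink(l, n);
        n->next = l->head;
        if (l->head) l->head->prev = n;
        l->head = n;
        if (!l->tail) l->tail = n;
    }

    /* When space is needed, evict from the tail (the oldest object). */
    static lru_node *lru_evict(lru_list *l)
    {
        lru_node *n = l->tail;
        if (n) lru_unlink(l, n);
        return n;
    }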

> Create a process that does the listen on the incoming sockets. It also
> functions as a database manager, in that it knows all of the objects
> in the cache and where they are stored. This database will include
> locking, so that as an object is being downloaded it isn't expired.
>
> This database is stored as (?a separate file?) a memory-mapped (man mmap)
> file with the synchronous flags on so that one update immediately
> becomes visible to all processes that have the same memory-mapped
> file. Note that these processes can't change the file (they open
> it with "PROT_READ" - only the "central-object-manager" has access
> to "PROT_WRITE")
>
> When something connects to the main process it hands the new file descriptor
> to a separate (pre-forked) process over a unix-domain socket. The parent
> then closes the fd, and creates an internal "job-number" for that child
> process. The job number is so that the manager knows how loaded the squid
> is and can decide whether it should start sending requests to the second
> squid-baby.
> This child process then gets the "GET _url_" request. This process
> then checks the object-map to see if the object is in the cache.

        Uh. BIG problem here. You can't hand over connected sockets between
two already running processes.
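
        The shared, read-only object map part should work fine, though. A
sketch only (invented names, most error handling omitted): the manager maps
the file read/write, the children map the same file with PROT_READ, and
MAP_SHARED makes the manager's updates visible to them:

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define MAP_SIZE (4UL * 1024 * 1024)   /* made-up size for the object map */

    /* Manager: create/extend the map file and map it writable. */
    void *manager_attach(const char *path)
    {
        void *m;
        int fd = open(path, O_RDWR | O_CREAT, 0600);

        if (fd < 0)
            return NULL;
        ftruncate(fd, MAP_SIZE);
        m = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);              /* the mapping stays valid after close */
        return m == MAP_FAILED ? NULL : m;
    }

    /* Child: map the same file read-only; updates made through the
     * manager's MAP_SHARED mapping show up here without any copying. */
    void *child_attach(const char *path)
    {
        void *m;
        int fd = open(path, O_RDONLY);

        if (fd < 0)
            return NULL;
        m = mmap(NULL, MAP_SIZE, PROT_READ, MAP_SHARED, fd, 0);
        close(fd);
        return m == MAP_FAILED ? NULL : m;
    }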

> If an object is not in the cache it sends a request for a "storage space"
> which will (at least now) be a position in the "cache_dir" directory.
>
> This baby process essentially behaves exactly like the current (novm) squid,
> being a "non-forking" process in a big select loop. The parent process
> handles all the "cache-management" for it though.
>
> If the baby-process gets too loaded (eg nears the fd limit) it means that
> the parent process can then fork (possibly pre-fork?) a second
> process that it can then pass all requests to. If there are multiple
> CPUs in the machine it can send every 2nd request to one process, then
> the other... thus we eliminate both problems at once.
>
> The only way I can see to get around both these problems is to
> use shared memory (or some form of IPC - shared memory will be the most
> efficient, I think, since it doesn't involve system calls and waiting
> on unix-domain sockets etc)
>
> If we want to get really fancy we could mmap the entire structure
> that the data is stored in.... letting the OS handle all of the disk-io
> without the lag caused by the filesystem.
>
> Problems with this include: There is a maximum size of your allocated
> memory (unsigned int on linux, at least). This means that you can have a
> 2.1gig cache max. I think that you can get away with multiple processes
> mapping multiple segments of this size (not sure at all here) at once,
> and then passing requests between the cache-processes. This is messy.
> Creating a squid-FS like this means that you win on the number of
> filehandles used (big time - you don't even use a fd when doing mmaps,
> well you do for a second or two, but that's it). It means, though,
> that you have the above problem and you would have to basically
> "re-invent" the filesystem... because you will have to be able to handle
> lots of different-sized objects without fragmentation etc.

        The above ideas are sound enough, except that you can't pass
accept()ed sockets between two already-running processes. This means that
much of the model you describe above falls down, BUT the ideas are there
and they are good.
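
        On the 2.1-gig limit, one way around it might be to map the store in
fixed-size chunks and only keep a handful mapped at any one time. A rough
sketch (invented names, there's no real squid-FS layout behind this):

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define SEGMENT_SIZE (256UL * 1024 * 1024)  /* 256MB chunks, well under 2 gig */

    /* Map one segment of the store file: segment 0 covers byte 0,
     * segment 1 covers byte 256M, and so on.  Caller munmap()s it
     * again when the objects in that segment are no longer needed.
     * (With a 32-bit off_t you still hit the 2 gig wall here, of course.) */
    void *map_store_segment(const char *store_path, unsigned int segno)
    {
        void *seg;
        int fd = open(store_path, O_RDWR);

        if (fd < 0)
            return NULL;
        seg = mmap(NULL, SEGMENT_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, (off_t) segno * (off_t) SEGMENT_SIZE);
        close(fd);
        return seg == MAP_FAILED ? NULL : seg;
    }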

        Cheers,

                Stew.