Re: Performance question

From: Jon Kay <jkay@dont-contact.us>
Date: Thu, 17 Jun 1999 11:25:22 +0000

> I assume that the conversion from URL-filename is trivial in terms of
> time. Where is most of the time being spent?

The conversion from URL to filename is indeed trivial. What is not
trivial is the conversion in the kernel from filename to file inode,
that is, actually getting at the file Squid requests.

The reason is that Squid creates a vast number of files, one per object
cached, which amounts to millions of files at the caches with heaviest
load. Searching directories becomes the biggest overhead. Most operating
system directory search algorithms are tuned to work with relatively
small numbers of files, hundreds at most (there are exceptions: Network
Appliance is said to have better search algorithms).

If you think about it, Squid has to choose which directory to put each
object in. Putting them all in one directory would result in searching
through a million objects on each lookup. Instead, Squid creates a large
system of subdirectories, typically 16 top-level directories, each
containing 256 subdirectories. Now the kernel can efficiently search
each directory, but it still has to go through those extra directories'
inodes to reach the file. The kernel must potentially wait for disk seek
and rotational delays to access each inode.
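
As a rough sketch of the two-level layout described above (this is not
Squid's actual code; the function names, the numeric object id, and the
hex naming scheme are assumptions made for illustration), the mapping
from an object to its directory could look like this:

```python
L1_DIRS = 16    # top-level directories, per the typical layout above
L2_DIRS = 256   # subdirectories under each top-level directory


def object_path(cache_root: str, file_number: int) -> str:
    """Map a numeric object id to <root>/<L1>/<L2>/<file>.

    Hypothetical mapping: consecutive ids fill one second-level
    directory (L2_DIRS files) before moving on to the next one.
    """
    l2 = (file_number // L2_DIRS) % L2_DIRS
    l1 = (file_number // (L2_DIRS * L2_DIRS)) % L1_DIRS
    return f"{cache_root}/{l1:02X}/{l2:02X}/{file_number:08X}"


def files_per_directory(total_objects: int) -> float:
    """Average number of files in each leaf directory."""
    return total_objects / (L1_DIRS * L2_DIRS)


if __name__ == "__main__":
    print(object_path("/cache", 65536))
    print(files_per_directory(1_000_000))
```

With a million objects spread over 16 * 256 = 4096 leaf directories,
each directory holds a few hundred files, which is the range directory
searches are tuned for. The price is two extra directory inodes on the
way to every file.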

Normally, programs don't worry much about putting files in
subdirectories, but most programs work in an environment where they are
helped by a cache of inodes. Such caches typically hold tens of inodes,
which Squid obviously swamps rapidly. Note that you would need an inode
cache of around 10,000 entries to work with Squid. That is by no means
infeasible, but since smaller inode caches have sufficed up 'til now,
most operating systems do not have such a thing. Since bigger caches
would help with browsers, too, there is an incentive for OS vendors to
add such features, just as they have added features to support tens of
thousands of TCP connections for web service.
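
To illustrate why a cache of tens of entries gets swamped, here is a toy
simulation. It assumes an LRU replacement policy and a deliberately
simple access pattern that sweeps through every cache directory in
order; neither assumption comes from any particular kernel, and the
names are made up for the example.

```python
from collections import OrderedDict


def hit_rate(cache_size, accesses):
    """Fraction of accesses served from an LRU cache of cache_size entries."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for key in accesses:
        total += 1
        if key in cache:
            hits += 1
            cache.move_to_end(key)          # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)   # evict least recently used
    return hits / total if total else 0.0


def directory_sweep(passes, l1=16, l2=256):
    """Yield the directory inodes touched while visiting every leaf directory."""
    for _ in range(passes):
        for a in range(l1):
            for b in range(l2):
                yield ("L1", a)       # top-level directory inode
                yield ("L2", a, b)    # second-level directory inode


if __name__ == "__main__":
    print(hit_rate(32, directory_sweep(2)))
    print(hit_rate(10_000, directory_sweep(2)))
```

In this model the small cache only ever keeps the current top-level
directory warm, so every second-level directory lookup misses. A cache
large enough to hold all 16 + 4096 directory inodes misses only on the
first pass.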

                                                        Jon
Received on Thu Jun 17 1999 - 10:18:36 MDT
