As you probably know, Squid (1.1.x) tends to use a lot of virtual memory (VM) to operate well. The two large uses of VM are object metadata and in-transit object data. Squid uses about 120 bytes/object for the metadata. This includes the URL, timestamps, and the index to the disk file. As objects are retrieved from Web servers, Squid holds the object data (i.e. the "body") in VM until the transfer is complete. At that point, if the object is cachable, Squid begins writing the data to disk. Otherwise, the object is simply destroyed. This avoids any disk activity for uncachable objects.
The use of VM for in-transit objects is problematic for very busy caches, or for very large objects. There is a workaround for the large objects--we can free the memory up to the lowest offset of all clients reading from it. This also means that large objects are uncachable.
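The large-object workaround can be sketched as follows. This is a minimal illustration, not Squid's actual code: the class `InTransitObject` and its members (`base`, `client_offsets`, `trim`) are hypothetical names chosen for this example.

```python
class InTransitObject:
    """Hypothetical in-memory buffer for an object that is still being fetched."""

    def __init__(self):
        self.base = 0              # absolute offset of the first byte still held
        self.data = bytearray()    # bytes from `base` onward
        self.client_offsets = {}   # client id -> next absolute offset to send

    def append(self, chunk):
        # New data arriving from the origin server.
        self.data += chunk

    def add_client(self, client_id):
        # A reader can only start at the first byte still in memory.
        self.client_offsets[client_id] = self.base

    def advance(self, client_id, nbytes):
        # Record that `nbytes` more bytes were delivered to this client,
        # never moving past the data received so far.
        end = self.base + len(self.data)
        new_offset = min(self.client_offsets[client_id] + nbytes, end)
        self.client_offsets[client_id] = new_offset
        self.trim()

    def trim(self):
        # Free everything below the lowest offset of all clients.
        if not self.client_offsets:
            return
        lowest = min(self.client_offsets.values())
        drop = lowest - self.base
        if drop > 0:
            del self.data[:drop]
            self.base = lowest
```

Once `base` moves past zero, the start of the object is no longer in memory. In this sketch that is why a trimmed object can no longer be written to disk as a complete cache entry.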
A few months ago, I began development of a branch version called "NOVM." This version does not use VM for in-transit object data, but does still hold the entire cache metadata in VM. All objects are written to disk as they are retrieved from Web servers or neighbor caches. This version essentially trades off memory for file descriptors.
One might expect that the 1.NOVM.x version does not perform as well as 1.1.x, presumably because the object data can be accessed very quickly from virtual memory. In other words, the NOVM version might perform less well because it uses the disk for all transfers.
What is a good measure of performance? A number of things come to mind:
For this (simple) experiment, I have focused on the service-time as a measure of performance. This is simply the amount of elapsed time from client connection establishment to connection close. This value is logged as the second field in Squid's access.log.
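As a rough sketch of how this measure can be pulled from a log, the following reads the second field of each access.log line and computes the median. It assumes the field is an integer number of milliseconds, as in Squid's native log format. The sample lines are invented for illustration and are not taken from the experiment.

```python
import statistics


def service_times(lines):
    """Return service times in seconds from access.log lines.

    Assumes the second whitespace-separated field is the elapsed
    time in milliseconds; malformed lines are skipped.
    """
    times = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            times.append(int(fields[1]) / 1000.0)
        except ValueError:
            continue
    return times


# Invented sample lines (hypothetical hosts and URLs).
SAMPLE = [
    "862000000.100 3590 10.0.0.3 TCP_MISS/200 4096 GET http://server.example/1 - DIRECT/server.example -",
    "862000001.200 2160 10.0.0.3 TCP_HIT/200 4096 GET http://server.example/1 - NONE/- -",
    "862000002.300 2290 10.0.0.3 TCP_HIT/200 1024 GET http://server.example/2 - NONE/- -",
    "garbled line",
]

median_seconds = statistics.median(service_times(SAMPLE))
```

For a real log, `service_times(open("access.log"))` would work the same way, since a file object iterates over its lines.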
Three computers were used for this experiment: (1) a 75 MHz Pentium running Linux, (2) an SGI Indy running IRIX, and (3) a Sun Sparcstation 1+. These three systems were connected via a dedicated 10 Mb/s Ethernet segment (i.e. no other machines on the segment). The Sparcstation ran the HTTP client, the Pentium ran the HTTP server, and Squid ran on the SGI.
The simulated HTTP server was written to be a low-impact (non-forking) TCP server application. It accepts connections, reads requests (which are ignored), and then writes a simple HTTP reply followed by a random amount of bogus content. The object size is randomly chosen to match the real file size distributions we see on the NLANR caches.
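A server along these lines might look like the sketch below. It is not the program used in the experiment. The size table (`SIZES`, `WEIGHTS`) is a made-up stand-in for the real NLANR size distribution, the `Content-Length` header is an assumption about what the "simple HTTP reply" contained, and connections are handled strictly one at a time rather than multiplexed.

```python
import random
import socket

# Hypothetical object sizes (bytes) and relative weights; placeholders
# for a measured file-size distribution.
SIZES = [1024, 4096, 16384, 131072]
WEIGHTS = [50, 30, 15, 5]


def build_reply(rng):
    """Return a simple HTTP reply followed by a random amount of filler."""
    size = rng.choices(SIZES, weights=WEIGHTS)[0]
    header = (
        "HTTP/1.0 200 OK\r\n"
        "Content-Type: application/octet-stream\r\n"
        f"Content-Length: {size}\r\n"
        "\r\n"
    ).encode("ascii")
    return header + b"x" * size


def serve(listener, max_requests, rng):
    """Accept connections in a single process, without forking."""
    for _ in range(max_requests):
        conn, _addr = listener.accept()
        with conn:
            conn.recv(4096)                  # read the request and ignore it
            conn.sendall(build_reply(rng))   # closing the socket ends the reply
```

To try it, create a listening socket with `socket.create_server(("", 8080))` (the port is arbitrary) and pass it to `serve` together with a `random.Random()` instance.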
The HTTP client (tcp-banger2) was also written to be simple and low-impact. It reads URLs from stdin and generates Proxy HTTP requests. A command line parameter limits the number of simultaneous proxy connections.
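The sketch below shows the general shape of such a client. It is not tcp-banger2 itself. It uses a thread pool to cap the number of simultaneous connections, which is a substitute technique, since the source does not say how the real client enforces its limit. The function names and the extra `Accept` header are illustrative assumptions.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def proxy_request(url):
    """Build a proxy-style request: the request line carries the full URL."""
    return f"GET {url} HTTP/1.0\r\nAccept: */*\r\n\r\n".encode("ascii")


def fetch(proxy_addr, url):
    """Send one request through the proxy and return the bytes received."""
    with socket.create_connection(proxy_addr) as sock:
        sock.sendall(proxy_request(url))
        total = 0
        while True:
            chunk = sock.recv(65536)
            if not chunk:          # the proxy closed the connection
                break
            total += len(chunk)
    return total


def run(proxy_addr, urls, max_conns):
    """Fetch every URL, with at most `max_conns` connections open at once."""
    with ThreadPoolExecutor(max_workers=max_conns) as pool:
        return list(pool.map(lambda u: fetch(proxy_addr, u), urls))
```

A driver would read the URLs from stdin, for example `urls = [l.strip() for l in sys.stdin if l.strip()]`, and call `run((proxy_host, proxy_port), urls, max_conns)`, where the proxy address and the limit come from the command line.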
The experiment was run four times, twice with 1.NOVM.10 and twice with 1.1.10.
The same squid.conf file was used for all runs. Additionally, Squid was patched to log some statistics (page faults, VM usage from mallinfo(), and FD usage) once per second to cache.log.
Number of Requests vs Time
This graph shows how quickly each run was completed, and also the rate at which connections were handled. This table summarizes the graph:
RUN             TOTAL TIME      REQUEST RATE
--------------- --------------- ------------
NOVM/MISS       688 seconds     14.5 req/sec
NOVM/HIT        610 seconds     16.4 req/sec
VM/MISS         648 seconds*    15.4 req/sec
VM/HIT          593 seconds     16.9 req/sec
*The final connection of the VM/MISS run took an unusually long time (as you can see on the graph below). 648 seconds is when the 9999th connection completed.
Note that the two HIT cases are quite close to each other.
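The request rates in the table follow from the total times if each run consisted of 10,000 requests. That count is an assumption here: it is not stated directly, but it is consistent with the mention of the 9999th connection and with the reported rates. A quick check:

```python
TOTAL_REQUESTS = 10000  # assumed, not stated explicitly in the text

total_seconds = {
    "NOVM/MISS": 688,
    "NOVM/HIT": 610,
    "VM/MISS": 648,
    "VM/HIT": 593,
}

# Request rate = number of requests / elapsed seconds, to one decimal place.
rates = {run: round(TOTAL_REQUESTS / secs, 1) for run, secs in total_seconds.items()}

for run, rate in rates.items():
    print(f"{run:<10} {rate:.1f} req/sec")
```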
Service Times
This graph shows cumulative distribution histograms of the service times (2nd field of access.log). Here the two HIT runs are similar to each other, and the two MISS runs are similar to each other as well. Interestingly, the NOVM/HIT run has a better median service time than the VM/HIT case, but they are quite close.
RUN             MEDIAN
--------------- ------------
NOVM/MISS       3.59 seconds
NOVM/HIT        2.16 seconds
VM/MISS         3.46 seconds
VM/HIT          2.29 seconds
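For reference, a cumulative distribution histogram of this kind can be computed from a list of service times as sketched below. The function name, the bin width, and the use of integer milliseconds are choices made for this example and say nothing about how the original graphs were produced.

```python
def cumulative_histogram(values_ms, bin_width_ms):
    """Return (upper_edge_ms, fraction) pairs.

    `fraction` is the share of values strictly below `upper_edge_ms`.
    Values and bin width are non-negative integers in milliseconds.
    """
    if not values_ms:
        return []
    nbins = max(values_ms) // bin_width_ms + 1
    counts = [0] * nbins
    for v in values_ms:
        counts[v // bin_width_ms] += 1   # bin i covers [i*w, (i+1)*w)

    result = []
    running = 0
    for i, count in enumerate(counts):
        running += count
        result.append(((i + 1) * bin_width_ms, running / len(values_ms)))
    return result
```

The median is then roughly the point where the cumulative fraction crosses 0.5.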
Filedescriptor Usage
This graph shows filedescriptor usage over time. The NOVM/MISS case peaks at about 260 FDs, the NOVM/HIT and VM/MISS cases both peak at about 140 FDs, and the VM/HIT case peaks at around 110 FDs. Remember that there is a maximum of 63 simultaneous client connections.
VM Usage
This graph shows the amount of memory used by the Squid process over time. The value was obtained from the mallinfo(3) function call. Clearly the NOVM cases use much less memory. On the VM/MISS plot, you can see the 8 MB 'cache_mem' pool being filled up by in-memory objects in two stages.
Note that the two HIT cases do not begin at zero. This is because the HIT runs were made immediately following the MISS runs without killing and restarting the Squid process.