Extended Cache Statistics - Annex I to Contract
Project description: Extended Cache Statistics
Accepted for project funding by the TERENA Technical Committee, 4th December 1998.
1. Summary of proposal
A large number of administrators in today's caching community maintain a cache server based on the squid software. Over the last 6 months, squid development has produced new versions with remarkable improvements in performance, flexibility and, not least, stability. In contrast to the development of the caching software itself, there is a growing lack of accompanying tools for administering a squid cache and measuring its effectiveness. The purpose of this project is to develop tools that gather and extract statistics from a number of squid caches. A further goal is to make selected results easy to visualize.
The squid software generates logfiles that can give the administrator insight into many aspects of the caching service, provided the relevant figures can be extracted from the sheer volume of data. Because the logfiles are large and generating long-term statistics is slow, many cache maintainers look into these logfiles only occasionally to get an impression of the current situation.
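To illustrate what extracting those figures involves, the following C++ sketch parses a single line of squid's native access.log format. It assumes the usual field order (timestamp, elapsed time, client address, result code/HTTP status, bytes, method, URL, ident, hierarchy code/peer, content type). The names LogEntry, parseNativeLine and isHit are hypothetical, and counting every result code that contains "HIT" as a hit is a simplification, not calamaris' own definition.

```cpp
#include <cstdint>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// One request from a squid "native" access.log line (assumed field order:
// time elapsed client code/status bytes method URL ident hierarchy/peer type).
struct LogEntry {
    double timestamp = 0.0;      // Unix time, seconds with millisecond fraction
    long elapsedMs = 0;          // time the request took, in milliseconds
    std::string client;          // client address
    std::string resultCode;      // e.g. TCP_HIT, TCP_MISS
    int httpStatus = 0;          // e.g. 200
    std::uint64_t bytes = 0;     // bytes delivered to the client
    std::string method;          // e.g. GET
    std::string url;
    std::string ident;           // RFC 931 ident, "-" if unused
    std::string hierarchy;       // e.g. DIRECT/192.0.2.1
    std::string contentType;     // e.g. text/html
};

// Parses one line; returns std::nullopt for malformed lines instead of
// aborting, since large logs occasionally contain truncated entries.
std::optional<LogEntry> parseNativeLine(const std::string& line) {
    std::istringstream in(line);
    LogEntry e;
    std::string codeStatus;
    if (!(in >> e.timestamp >> e.elapsedMs >> e.client >> codeStatus >> e.bytes
             >> e.method >> e.url >> e.ident >> e.hierarchy >> e.contentType)) {
        return std::nullopt;
    }
    // Split "TCP_MISS/200" into result code and HTTP status.
    const auto slash = codeStatus.find('/');
    if (slash == std::string::npos) return std::nullopt;
    e.resultCode = codeStatus.substr(0, slash);
    try {
        e.httpStatus = std::stoi(codeStatus.substr(slash + 1));
    } catch (const std::exception&) {
        return std::nullopt;
    }
    return e;
}

// Simplification: any result code containing "HIT" (TCP_HIT, TCP_MEM_HIT,
// TCP_IMS_HIT, ...) is counted as a cache hit.
bool isHit(const LogEntry& e) {
    return e.resultCode.find("HIT") != std::string::npos;
}
```

A request hit ratio then follows by dividing the number of entries for which isHit returns true by the number of successfully parsed lines.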
Although logfiles are usually processed offline, this has advantages of its own. Online processing would require constantly polling squid for the data of interest, and the polling rate would have to be kept moderate, which limits the time resolution of the collected data. Offline processing, in contrast, allows an arbitrarily fine granularity when processing the file.
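The freedom of granularity in offline processing can be sketched as follows: each request is assigned to the interval that contains its timestamp, and the interval length is an ordinary parameter, so the same logfile can be summarised per minute, per hour or per day. This is a minimal C++ sketch assuming native-format lines; aggregate and Bucket are hypothetical names, and only the first five fields of each line are read.

```cpp
#include <cmath>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Per-interval totals for one cache.
struct Bucket {
    std::uint64_t requests = 0;
    std::uint64_t hits = 0;
    std::uint64_t bytes = 0;
};

// Groups native-format access.log lines into intervals of intervalSeconds
// (e.g. 300 for 5 minutes, 86400 for one day). The map key is the start of
// the interval as Unix time. Malformed lines are skipped.
std::map<long long, Bucket> aggregate(const std::vector<std::string>& lines,
                                      long long intervalSeconds) {
    std::map<long long, Bucket> buckets;
    for (const auto& line : lines) {
        std::istringstream in(line);
        double ts = 0.0;
        long elapsed = 0;
        std::string client, codeStatus;
        std::uint64_t bytes = 0;
        if (!(in >> ts >> elapsed >> client >> codeStatus >> bytes)) continue;
        // Round the timestamp down to the start of its interval.
        const long long start =
            static_cast<long long>(std::floor(ts / intervalSeconds)) * intervalSeconds;
        Bucket& b = buckets[start];
        ++b.requests;
        b.bytes += bytes;
        // Same simplification as before: result codes containing "HIT" are hits.
        if (codeStatus.find("HIT") != std::string::npos) ++b.hits;
    }
    return buckets;
}
```

Changing the granularity means calling aggregate again with a different interval length; squid itself does not need to be queried at all.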
Still, most solutions struggle with the volume of potentially interesting data. Moreover, most data becomes more meaningful when combined with related data. The results of logfile parsing should therefore be easy to load into an SQL database. From this database, different sets of simple data can be combined into a more complex view of the behaviour of a cache or a set of caches, using nothing more than simple SQL statements. Nevertheless, most users, and even administrators, prefer easy-to-handle tools. A custom-tailored web interface built on the database should therefore be able to return graphs on the fly for whatever combination of views the user selects.
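One simple way to make parsing results easy to load is to emit plain SQL statements. The sketch below assumes a hypothetical table cache_stats(cache, interval_start, requests, hits, bytes) holding one row per cache and interval; the table layout and the function names are illustrative only. Doubling single quotes is the standard SQL way to escape them inside string literals, although a production importer would more likely use the database's prepared statements or bulk-load facility.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Wraps a value in single quotes, doubling any embedded single quote,
// so it can be used as an SQL string literal.
std::string sqlQuote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "''";
        else out += c;
    }
    out += "'";
    return out;
}

// Builds one INSERT for the hypothetical table
//   cache_stats(cache, interval_start, requests, hits, bytes)
std::string insertStatement(const std::string& cache, long long intervalStart,
                            std::uint64_t requests, std::uint64_t hits,
                            std::uint64_t bytes) {
    std::ostringstream sql;
    sql << "INSERT INTO cache_stats (cache, interval_start, requests, hits, bytes) VALUES ("
        << sqlQuote(cache) << ", " << intervalStart << ", " << requests << ", "
        << hits << ", " << bytes << ");";
    return sql.str();
}
```

With one row per cache and interval in such a table, a combined view such as the hit ratio of each cache reduces to a single GROUP BY query that sums hits and requests per cache.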
A few logfile processors are currently available, but each is limited to its own particular focus: some report figures that others omit, some generate long ASCII-based reports, and others are designed to produce images with coloured graphs. Additionally, there is no squid statistics tool that provides an interface to standard databases.
A very well-known processor is 'calamaris', which suffers not least from being implemented in Perl. For example, our 10 caches in the DFN caching service accumulate well over 4 GB of logfile data each weekday, and calamaris needs over 18 hours on a high-performance workstation to process these data. Other logfile processors may be faster, but their output is less detailed.
For performance reasons, and to assess the potential of calamaris, we developed a prototype implementation of calamaris in C++. This port is considerably faster and could be sped up further in multiprocessor and/or multihost environments, but it currently lacks, among other things, support for the current squid version 2.x.
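How such a port might exploit a multiprocessor machine can be sketched by processing each cache's logfile in its own task and merging the per-cache results only at the end, so the tasks share no state and need no locking. The C++ sketch below assumes the logs are already in memory as lines (a real tool would stream the files), and countLog, countAll and merge are hypothetical names. In principle the same split-then-merge structure would carry over to a multihost setup, with per-cache totals sent to one collecting host.

```cpp
#include <cstdint>
#include <functional>
#include <future>
#include <map>
#include <sstream>
#include <string>
#include <vector>

struct Totals {
    std::uint64_t requests = 0;
    std::uint64_t hits = 0;
    std::uint64_t bytes = 0;
};

// Work for one task: count one cache's log lines without touching shared state.
Totals countLog(const std::vector<std::string>& lines) {
    Totals t;
    for (const auto& line : lines) {
        std::istringstream in(line);
        double ts = 0.0;
        long elapsed = 0;
        std::string client, codeStatus;
        std::uint64_t bytes = 0;
        if (!(in >> ts >> elapsed >> client >> codeStatus >> bytes)) continue;
        ++t.requests;
        t.bytes += bytes;
        if (codeStatus.find("HIT") != std::string::npos) ++t.hits;  // simplified hit test
    }
    return t;
}

// Starts one asynchronous task per cache, then collects the results.
// Each task writes only its own Totals, so no mutex is required.
std::map<std::string, Totals> countAll(
        const std::map<std::string, std::vector<std::string>>& logsByCache) {
    std::map<std::string, std::future<Totals>> pending;
    for (const auto& [cache, lines] : logsByCache) {
        pending.emplace(cache, std::async(std::launch::async, countLog, std::cref(lines)));
    }
    std::map<std::string, Totals> result;
    for (auto& [cache, fut] : pending) result[cache] = fut.get();  // waits for each task
    return result;
}

// Adds up the per-cache totals into one figure for the whole caching service.
Totals merge(const std::map<std::string, Totals>& perCache) {
    Totals all;
    for (const auto& entry : perCache) {
        const Totals& t = entry.second;
        all.requests += t.requests;
        all.hits += t.hits;
        all.bytes += t.bytes;
    }
    return all;
}
```

Splitting by cache is the most natural unit here because each cache already writes its own logfile; finer splits of a single large file would also be possible but require care at chunk boundaries.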
2. Objectives
A promising prototype port of the well-known calamaris tool to the better-performing C++ has already been completed, as far as squid 1.x logfiles are concerned. So far, this prototype produces only textual results, though comprehensive ones. The proposed project will investigate parsing squid 2.x and possibly netcache logfiles, and will return the results in a form that is easy to incorporate into a database. Furthermore, a prototype web interface module will be prepared to demonstrate how different on-the-fly views of the data can be produced.
3. Deliverables
The scope of the work comprises several items:
The project cut-off point will be 9 months after the project starts. Any follow-up work identified during the project will be treated as a new project.
4. Contribution commitments to the project
The TERENA caching task force (TF-CACHE) has a considerable number of members who are interested in the results of this project. All deliverables will be reviewed by the task force and, where possible, any recommendations will be taken into account.
5. Evaluation criteria
The results of the project will be evaluated by the TERENA caching task force (TF-CACHE) and their views reported back to the TERENA Technical Committee (TTC).
6. Change control mechanism
Modification of the project during its lifetime is subject to the following procedure: