Help with using logs for a simulator

From: <G.C.Jawaheer@dont-contact.us>
Date: Mon, 28 Sep 1998 16:11:17 +0000

Hello,

I am new to Squid. I am developing a program to simulate the
performance of different cache replacement policies for proxies.
However, being a newbie to the field, I need to check the correctness
of my reasoning and assumptions. In the following paragraphs I
therefore set out my objectives and my line of thought, and I'll be
most grateful to anybody who can answer my queries and correct my
mistakes.

I have at my disposal the access logs of a Squid 1.1 proxy in native
format, i.e., "time elapsed remotehost code/status bytes method URL
rfc931 peerstatus/peerhost". The proxy from which I obtained these
logs was configured as a peer cache. My aim is to use these logs to
obtain the following:

1] the requested URL
2] the time/date of the request
3] the size of the document returned to the client
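To make the question concrete, here is a minimal Python sketch of pulling those three items out of one native-format line. It assumes only what is stated above: whitespace-separated fields in the order "time elapsed remotehost code/status bytes method URL rfc931 peerstatus/peerhost", with any extra trailing fields ignored. The names LogRecord and parse_native_line are hypothetical, and whether "bytes" really is the document size is the open question below.

```python
from typing import NamedTuple, Optional


class LogRecord(NamedTuple):
    """The fields of one native-format access.log line a simulator needs."""
    timestamp: float  # "time" field: seconds since the UNIX epoch
    code: str         # left half of code/status, e.g. "TCP_HIT"
    status: int       # right half of code/status, e.g. 200
    size: int         # "bytes" field (meaning depends on code/status)
    method: str       # e.g. "GET" or "ICP_QUERY"
    url: str          # "URL" field


def parse_native_line(line: str) -> Optional[LogRecord]:
    """Parse a line laid out as
    time elapsed remotehost code/status bytes method URL rfc931 peerstatus/peerhost
    Returns None if the line has fewer than these nine fields or a
    numeric field does not parse."""
    fields = line.split()
    if len(fields) < 9:
        return None
    code, _, status = fields[3].partition("/")
    try:
        return LogRecord(
            timestamp=float(fields[0]),
            code=code,
            status=int(status),
            size=int(fields[4]),
            method=fields[5],
            url=fields[6],
        )
    except ValueError:
        return None
```

For example, a made-up line such as "906998477.123 250 10.0.0.1 TCP_HIT/200 4312 GET http://www.example.com/index.html - NONE/-" would yield url "http://www.example.com/index.html", size 4312 and code "TCP_HIT".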

I am not interested in knowing anything about the clients making the
requests. I am also not taking into consideration the consistency of
documents.

Is it possible to extract the above mentioned data from the native
format of access logs?

Does Squid make real-time decisions about which documents to cache?
In other words, is it correct to assume that for every requested
document there is, or eventually will be, a copy in a cache somewhere,
whether in the local proxy cache or in a parent cache?

Going back to the native format of the Squid 1.1 access log,
"time elapsed remotehost code/status bytes method URL rfc931
peerstatus/peerhost", the "time" field gives me the "time/date of
the request" (albeit as a UNIX timestamp), and I can get the
"requested URL" from the "URL" field. But what about the "size of the
document returned to the client"? The "bytes" field IS NOT the size
of the returned document, at least not under all circumstances.
However, from what I understand, there are situations where the
"bytes" field is the "size of the returned document" (perhaps when I
have a TCP_HIT). Am I right? For example, when I have an ICP_QUERY
with a UDP_MISS, the "bytes" field must be interpreted differently.
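The two points in the paragraph above can be sketched in Python: turning the UNIX-timestamp "time" field into a calendar date, and recognising ICP_QUERY entries so that their "bytes" value is not mistaken for a document size. This is only a sketch; request_time and is_icp_entry are hypothetical helper names, and it assumes the "time" field is plain epoch seconds, possibly with a fractional part.

```python
from datetime import datetime, timezone


def request_time(time_field: str) -> datetime:
    """Convert the "time" field (seconds since the UNIX epoch, possibly
    with a fractional part such as "906998477.123") to a UTC datetime."""
    return datetime.fromtimestamp(float(time_field), tz=timezone.utc)


def is_icp_entry(method_field: str) -> bool:
    """True for lines logged with method ICP_QUERY (e.g. code UDP_MISS),
    whose "bytes" field is assumed NOT to be an HTTP document size."""
    return method_field == "ICP_QUERY"
```

For instance, request_time("906998477") gives 1998-09-28 16:01:17 UTC. The timestamp is kept in UTC so that a simulation does not depend on the time zone of the machine replaying the log.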

Thus, in order to retrieve those records where the "bytes" field
represents the size of the document returned to the client, I need to
look for particular code/status combinations. Is that correct? Do I
also need to look for particular peerstatus/peerhost combinations?
Assuming that my reasoning above is correct, I need help interpreting
the code/status combinations, i.e., I don't know which code/status
combinations to look for.
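Once the right combinations are known, the filtering step itself is simple. The Python sketch below shows the idea. The accepted code and status sets are hypothetical placeholders, not an answer to the question above, and document_requests is likewise a made-up name. A check on the peerstatus/peerhost field (fields[8]) could be added in the same way if it turns out to matter.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical placeholders: code/status pairs ASSUMED to mean that
# "bytes" is the size of the document returned. Replace once confirmed.
ASSUMED_DOCUMENT_CODES = frozenset(
    {"TCP_HIT", "TCP_MISS", "TCP_REFRESH_HIT", "TCP_REFRESH_MISS"})
ASSUMED_DOCUMENT_STATUSES = frozenset({200})


def document_requests(
    lines: Iterable[str],
    codes: frozenset = ASSUMED_DOCUMENT_CODES,
    statuses: frozenset = ASSUMED_DOCUMENT_STATUSES,
) -> Iterator[Tuple[float, str, int]]:
    """Yield (time, URL, bytes) for native-format lines that are not
    ICP_QUERY entries and whose code/status pair is in the accepted sets."""
    for line in lines:
        fields = line.split()
        # Skip short/malformed lines and ICP traffic from peers.
        if len(fields) < 9 or fields[5] == "ICP_QUERY":
            continue
        code, _, status = fields[3].partition("/")
        if code not in codes or not status.isdigit():
            continue
        if int(status) not in statuses:
            continue
        yield float(fields[0]), fields[6], int(fields[4])
```

Keeping the accepted sets as parameters means the simulator does not have to change when the list of valid combinations is corrected.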

Last but not least, thanks to anybody who can pull me out of this
pool of ignorance.

Regards,

Gawesh
+++++++++++++++++++++++++++++++++++++++++++++++++++
 Gawesh C JAWAHEER
 MSc Information Systems and Technology, 1997/98
 City University, UK
+++++++++++++++++++++++++++++++++++++++++++++++++++
Received on Mon Sep 28 1998 - 08:12:53 MDT