Re: diskd Q1/Q2 parameters backwards

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Mon, 7 Jan 2002 23:22:10 +0100

OK. The error is neither in the documentation nor in the code, but a
conceptual one in how to address the problem.

Is it best to try to push the drives as hard as possible, assuming your
drives are always faster than the network, or do you want to bypass the
drives when the disk request queues grow too long for comfort, leveling
out at the speed of your network?

My gut feeling is that one does want to bypass the disk before starting to
block extensively on disk I/O queues. But at the same time, one does not
want to throw away a lot of disk I/O only because of small peaks in the
I/O queues.

You seem to want the reverse of my gut feeling: rather block on your disks
than touch your network. If you have a strong disk subsystem then this does
make sense. The stronger your disk system is, the higher Q1 should be,
eventually exceeding Q2. Once Q1 is above Q2, further increases of Q1 are
only likely to make an extremely minor difference, if any.
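
To make that trade-off concrete, here is a rough sketch in C of how I read
the two limits. This is not the actual diskd source; choose_action(), the
enum and the example values are made up for illustration, with "away"
standing for the number of requests sent to the diskd helper that have not
yet been acknowledged:

#include <stdio.h>

enum disk_action { DO_IO, BYPASS, BLOCK };

static enum disk_action choose_action(int away, int q1, int q2)
{
    if (away >= q1)
        return BYPASS;  /* the "desperate" limit: e.g. intentionally fail
                         * the open so the request goes to the network */
    if (away >= q2)
        return BLOCK;   /* the comfort limit: stop and wait for the helper
                         * to acknowledge part of the queue */
    return DO_IO;
}

int main(void)
{
    int away;
    /* 64/72 are just example values with Q1 below Q2, the relation Duane
     * describes as the current comments and defaults; the bypass limit is
     * then always reached before blocking ever happens */
    for (away = 55; away <= 80; away += 5)
        printf("away=%d action=%d\n", away, choose_action(away, 64, 72));
    return 0;
}

With Q1 above Q2 the blocking limit normally keeps the queue from ever
reaching the bypass limit, which is why raising Q1 further then makes so
little difference.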

Both diskd and async-io struggle to find the correct balance between these
concepts. It is not an easy task to find good generic defaults that fit
all, certainly not with the relatively small and dumb queue concepts both
are using. For "Q1" a better measure is probably a relatively quickly
decaying average of I/O latency rather than the actual queue length, with
the measurement (and priorities) split between swapins, swapouts and
deletes.
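
A minimal sketch of that idea, assuming an exponentially weighted moving
average per request type (ALPHA and the 50 ms limit are invented for the
example, not values from Squid):

#include <stdio.h>

#define ALPHA 0.25   /* weight of the newest sample; larger decays faster */

enum io_kind { SWAPIN, SWAPOUT, DELETE, IO_KINDS };

static double avg_ms[IO_KINDS];   /* smoothed latency per request type */

static void record_latency(enum io_kind kind, double ms)
{
    avg_ms[kind] = ALPHA * ms + (1.0 - ALPHA) * avg_ms[kind];
}

static int over_limit(enum io_kind kind, double limit_ms)
{
    return avg_ms[kind] > limit_ms;
}

int main(void)
{
    double samples[] = { 5, 6, 40, 120, 80, 10, 8 };
    int i;

    for (i = 0; i < (int) (sizeof(samples) / sizeof(samples[0])); i++) {
        record_latency(SWAPIN, samples[i]);
        printf("sample=%5.1f ms  avg=%6.2f ms  over=%d\n",
               samples[i], avg_ms[SWAPIN], over_limit(SWAPIN, 50.0));
    }
    return 0;
}

Keeping one average per kind is what would allow different priorities,
e.g. shedding swapouts before refusing swapins, though the exact ordering
is something to experiment with.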

This experience is based on playing with the parallels of the Q1 and Q2
limits in the async-io implementation, but there the situation is indeed a
bit different as the possible queue length is close to unlimited.
(async-io has parallels to Q1, Q1.5 and Q2.)

A further aspect one needs to consider here is the VM impact of the disk
I/O. This is an area I haven't explored very deeply yet, and it is most
likely quite dependent on the OS.

What has been seen very clearly in my benchmarks is that
a) A Squid with no cache performs quite well, even with many concurrent
clients.
b) A Squid with a cache performs much worse, even when there do not seem
to be any excessively large I/O queues. Aggressively bypassing the cache
helps throughput, but at the expense of hit ratio and total response time.

If you see that MISS response times are climbing significantly then
something really bad is going on which should not happen.

But these measurements are a bit tricky to get in a readable form, as most
benchmarks start pounding the system harder with more clients when response
times climb, making it a bit hard to analyze what is actually going on "at
the border".

A good way to find the correct relations between these is perhaps to run a
benchmark with a fixed number of concurrent client requests, measuring
response times and hit ratio, while playing with the I/O queue limits of
the proxy.
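
For what it is worth, a minimal fixed-concurrency client along those lines
could look like the sketch below. It is plain POSIX C, not anything from
the Squid tree; the proxy address, test URL, client count and the reliance
on Squid's X-Cache reply header are assumptions to adjust, and a real run
would of course need a proper URL working set rather than a single URL:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#define PROXY_HOST "127.0.0.1"   /* placeholders: point at the proxy    */
#define PROXY_PORT 3128          /* and a cachable test URL of your own */
#define TEST_URL   "http://example.com/"
#define NCLIENTS   20            /* fixed number of concurrent clients  */
#define NREQUESTS  200           /* requests issued by each client      */

struct stats { double total_ms; int hits, done; };

static double now_ms(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1000.0 + tv.tv_usec / 1000.0;
}

static void *client(void *arg)
{
    struct stats *st = arg;
    char req[512], buf[4096];
    int i;

    snprintf(req, sizeof(req),
             "GET %s HTTP/1.0\r\nConnection: close\r\n\r\n", TEST_URL);
    for (i = 0; i < NREQUESTS; i++) {
        struct sockaddr_in sa;
        double t0 = now_ms();
        ssize_t n;
        int first = 1, hit = 0;
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        memset(&sa, 0, sizeof(sa));
        sa.sin_family = AF_INET;
        sa.sin_port = htons(PROXY_PORT);
        inet_pton(AF_INET, PROXY_HOST, &sa.sin_addr);
        if (connect(fd, (struct sockaddr *) &sa, sizeof(sa)) < 0) {
            close(fd);
            continue;
        }
        if (write(fd, req, strlen(req)) < 0) {
            close(fd);
            continue;
        }
        while ((n = read(fd, buf, sizeof(buf) - 1)) > 0) {
            buf[n] = '\0';
            if (first && strstr(buf, "X-Cache: HIT"))
                hit = 1;        /* crude: assumes the header shows up in
                                 * the first read */
            first = 0;
        }
        close(fd);
        st->total_ms += now_ms() - t0;
        st->hits += hit;
        st->done++;
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[NCLIENTS];
    struct stats st[NCLIENTS];
    double ms = 0.0;
    int i, hits = 0, done = 0;

    memset(st, 0, sizeof(st));
    for (i = 0; i < NCLIENTS; i++)
        pthread_create(&tid[i], NULL, client, &st[i]);
    for (i = 0; i < NCLIENTS; i++) {
        pthread_join(tid[i], NULL);
        ms += st[i].total_ms;
        hits += st[i].hits;
        done += st[i].done;
    }
    printf("requests=%d  mean response=%.1f ms  hit ratio=%.2f\n",
           done, done ? ms / done : 0.0, done ? (double) hits / done : 0.0);
    return 0;
}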

Regards
Henrik

On Monday 07 January 2002 22.36, Duane Wessels wrote:

> In the implementation, Q1/magic1 is where Squid takes more desperate
> measures. For example, intentionally failing on open. We want to
> avoid Q1/magic1 if possible.

> I've been doing some benchmarks and banging my head on this for a couple
> of days. With Q1 < Q2 (as in the comments and defaults) really bad
> things happen. Squid hits the Q1 limit and hit ratio goes down,
> response time goes WAY up and doesn't come back down for a long time.