Re: Deferred reads

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Sat, 03 Jun 2000 21:23:06 +0200

Andres Kroonmaa wrote:

> I'm just wondering if there might be some difference whether the cpu
> bottleneck happens in userland vs kernel. If the kernel is written so
> that there are few shared locks, then pushing too much work on it
> may block some other tasks from proceeding. In this sense it has
> less impact on the whole system if bottleneck happens in userland.

That varies from OS to OS. For example, in a microkernel-based OS most
of the "kernelspace" work is actually done in "userspace" from the
kernel's point of view.

> we'd need to address a problem with that: what if neither socket ever
> blocks?

Easily manageable. For example, you could make sure reads are always
large enough to drain the queue, or, simpler, allow only one operation
per poll loop. Probably a combination of both is best ;-)

> naturally by polling. I think we should settle somewhere in between.
> One way to do it is to limit amount of work to few read+writes per
> socket in a pass. loop around all sockets, and if there is noone left
> for io, go to poll() them all together.

I think there should be a poll between two (large) reads of a
filedescriptor. This gives Squid a chance to detect that other
filedescriptors are ready, and gives them a fair chance to get
processed.
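The one-operation-per-poll-loop idea can be sketched like this (a toy
Python model where in-memory queues stand in for sockets and a list
comprehension stands in for poll(); FairLoop and all its names are
illustrative, not Squid's comm code):

```python
from collections import deque

class FairLoop:
    """Toy event loop: at most one read per connection per pass,
    re-"polling" between passes so other connections get a fair turn."""

    def __init__(self):
        self.queues = {}    # fd -> deque of pending data chunks
        self.served = []    # order in which chunks were handled

    def add(self, fd, chunks):
        self.queues[fd] = deque(chunks)

    def poll(self):
        # stand-in for poll(): report which fds still have data ready
        return [fd for fd, q in self.queues.items() if q]

    def run(self):
        while True:
            ready = self.poll()
            if not ready:
                break
            for fd in ready:
                # one operation per poll loop: a single read, then move on
                self.served.append((fd, self.queues[fd].popleft()))

loop = FairLoop()
loop.add(1, ["a1", "a2", "a3"])   # a busy connection
loop.add(2, ["b1"])               # a quiet one
loop.run()
# fd 2 is served right after fd 1's first chunk, not after all three
```

Even though fd 1 has three chunks queued, fd 2 gets a turn after the
first pass instead of starving behind it.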

> I disagree. If you have 1000 dialup users at about 33,6K each, you'd
> need 10-20M of international link and pretty good backbone. This moves
> per-session bottleneck to client-side most of the time.
> persistent connections are only increasing, meaning potentially lots
> of traffic via single client socket. actual servers can be pretty
> many, so objects being small doesn't mean too much any more.

Persistent connections are increasing, yes, but so far pipelining is
not. Persistent connections without pipelining have the same request
pattern as "normal" ones, except that the connection isn't
re-established between requests.

And even with pipelining, many pages will fit well inside reasonable
TCP send buffer sizes (32K+).

> Sure. I just try to look further. sure poll is required when socket
> gets EWOULDBLOCK. My guess is that we waste more CPU and gain little
> performance if we poll immediately all blocking sockets. at least one
> would probably be ready in under 1mSec, and we ask kernel to check for
> all of them.

Certainly, and if there is data in the TCP send queue then waiting a
little won't hurt latency at all. However, this will not always be the
case, and you also have a concurrency problem. For busy connections
waiting a little will not decrease performance; the delay will be most
noticeable during initial request processing.

> > True. However here it is quite likely more important to optimize the
> > sizes of read() operations to keep a nice saw-tooth pattern in the TCP
> > window sizes when congested.
>
> not sure what you mean. as I understand it, tcp is most efficient
> when receiving buffers are empty and transmit buffers are full.
> I'd make read size as large as possible. I think socket buffers
> are not taking any considerable memory, so I'd increase them if
> that helps.

Sorry, I did not express myself well. What I meant is that it is
important to control the reads to avoid shrinking the TCP window below
one packet size. Otherwise we may force the origin server to send tiny
packets while we are busy sending already-received data down to the
client.
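One way to picture this read-size control (a pure sketch; the MSS
value, buffer size and helper name are assumptions for illustration,
not Squid's actual values):

```python
MSS = 1460            # assumed maximum segment size
BUF_SIZE = 32 * 1024  # assumed per-connection buffer

def next_read_size(buffered):
    """How much to read from the server socket, given 'buffered' bytes
    already held for the client.  Returning 0 defers the read entirely
    rather than opening a window smaller than one segment
    (silly-window avoidance on the receive side)."""
    space = BUF_SIZE - buffered
    return space if space >= MSS else 0
```

With an empty buffer the full 32K is read in one go; once free space
drops below one MSS the read is deferred instead of trickling in
sub-packet amounts.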

> I agree that large latency is an issue. but small one? look at this
> as gathering: if we don't get additional traffic in few msec, we
> give up and send whats gathered so far. If we get more traffic in
> this time, we send more in one shot and feel efficient ;)

Well.. it depends on your application. If you have dialup users then
adding a few ms won't be noticed much unless there is a long chain of
proxies. For LAN users it will probably be noticed.
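The gathering idea being discussed could look roughly like this
(hypothetical Python sketch with an explicit clock parameter so the
timing is visible; the class name and thresholds are made up, not
Squid's):

```python
class Gatherer:
    """Coalesce small writes: flush when either max_delay has elapsed
    since the first queued byte, or the batch reaches flush_size."""

    def __init__(self, flush_size=4096, max_delay=0.005):
        self.flush_size = flush_size   # bytes
        self.max_delay = max_delay     # seconds ("a few msec")
        self.buf = b""
        self.first_at = None           # clock time of first queued byte
        self.flushed = []              # batches actually sent

    def write(self, data, now):
        if not self.buf:
            self.first_at = now
        self.buf += data
        if len(self.buf) >= self.flush_size:
            self.flush()               # big enough: send immediately

    def tick(self, now):
        # called from the event loop; gives up waiting after max_delay
        if self.buf and now - self.first_at >= self.max_delay:
            self.flush()

    def flush(self):
        self.flushed.append(self.buf)
        self.buf = b""
        self.first_at = None

g = Gatherer()
g.write(b"x" * 100, now=0.000)
g.tick(0.001)                  # only 1 ms elapsed: keep gathering
g.write(b"y" * 100, now=0.002)
g.tick(0.006)                  # deadline passed: 200 bytes go in one shot
```

If more traffic arrives within the window, it is sent in one batch and
we "feel efficient"; otherwise the deadline bounds the added latency.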

> not sure either ;) perhaps its because I don't understand how its
> done right now.

In principle:

Data forwarding is done "randomly", but assignment of available
"bandwidth space" is done at intervals (currently once/second). If there
is no "space" available then the connection will be deferred until it
has been assigned more "space".
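The interval-based assignment can be modelled as follows (a toy
sketch loosely mirroring the once-per-second assignment described
above; DelayPool and the numbers are illustrative only):

```python
class DelayPool:
    """Toy model of interval-based rate control: a fresh bandwidth
    allotment is assigned once per interval; reads that find the pool
    empty are deferred until the next assignment."""

    def __init__(self, bytes_per_interval):
        self.bytes_per_interval = bytes_per_interval
        self.space = bytes_per_interval
        self.deferred = []             # connections waiting for space

    def request(self, conn, nbytes):
        if self.space >= nbytes:
            self.space -= nbytes
            return True                # go ahead and read
        self.deferred.append(conn)
        return False                   # deferred until more space

    def assign(self):
        # called once per interval (once/second in Squid's case)
        self.space = self.bytes_per_interval
        woken, self.deferred = self.deferred, []
        return woken                   # connections to re-enable

pool = DelayPool(1000)
ok1 = pool.request("c1", 800)         # fits in this interval's space
ok2 = pool.request("c2", 400)         # pool exhausted: deferred
woken = pool.assign()                 # next interval: c2 re-enabled
```

Deferred connections are not polled again individually; they simply
wait until the controller hands out the next interval's space.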

> yes, but deferral based on rate limiting is tied to time. And this
> will need to be evaluated.

Well.. yes, but not in any way resembling how it is done today. The
re-evaluation is best done on a global basis, when the rate controller
re-evaluates the bandwidth assignments.

> this means that delay-poll time reaches zero and squid is saturated.
> I'd stop accepting more requests in that case, for example.

When to stop accepting more requests depends on the application and
service level guarantees.

/Henrik
Received on Sat Jun 03 2000 - 13:29:56 MDT
