Re: commloops and modio development

From: Henrik Nordstrom <hno@dont-contact.us>
Date: Sat, 13 Jan 2001 20:07:14 +0100

Robert Collins wrote:

> > I'll second this I think, and kill the feature that there can be
> > multiple clients listening to one server connection. Having the
> > possibility of multiple clients makes things quite complicated.
>
> It may make it complex, but on low-bandwidth sites it's a must
> (in my book). Plus, auto-downloading programs (like the Mozilla
> setup) sometimes think there's been an error and restart, ending
> up with two parallel downloads. No, leave this in please. Really.

Ok. I'll give up and go back to my original thinking, which is almost
the same thing but with reference counters on the returned buffers,
allowing the stream to be split.

The complexity is not so much in the actual stream splitting as in how
to cope when the attached clients run at different speeds, with one or
more lagging behind the first or starting at different times. Combine
this with virtually unlimited object sizes and you get a quite messy
situation.
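
Roughly what I mean with reference counters, as a sketch only (the
types and names below are invented for illustration, they are not
existing Squid code): every buffer handed up from the server side
carries a reference count, each attached client keeps its own offset
into the reply, and a buffer is freed only when the slowest client has
consumed it:

#include <stdlib.h>
#include <sys/types.h>

/* Sketch only -- invented types, not existing Squid code. */

typedef struct _StreamBuf {
    char *data;                /* bytes received from the server */
    size_t length;             /* number of valid bytes in data */
    off_t offset;              /* position of data[0] in the reply body */
    int refcount;              /* one ref per client still needing it */
    struct _StreamBuf *next;   /* next buffer in the stream */
} StreamBuf;

typedef struct _StreamClient {
    off_t sent_offset;         /* how far this client has been sent */
    StreamBuf *current;        /* next buffer to deliver to this client */
} StreamClient;

/* A client takes a reference before the buffer is queued to it. */
static void
streamBufRef(StreamBuf *b)
{
    b->refcount++;
}

/* A client is done with a buffer; it is released only when the last
 * (slowest) client has consumed it. */
static void
streamBufUnref(StreamBuf *b)
{
    if (--b->refcount == 0) {
        free(b->data);
        free(b);
    }
}

A lagging client then simply holds references further back in the
chain. How much a late-starting client can be given depends on how much
of the earlier data is still held somewhere, which is where it gets
messy in combination with large objects.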

> > I'd also kill the ability to reattach to a "not-yet-aborted"
> > request, based on the assumption that we can implement storage
> > of partial objects.
>
> I don't agree with this. Henrik, I think you raised the point that
> early abort should still exist to control when to stop the object
> download and cache the partial object, and when to finish it anyway.
> However, it's not a big loss compared to the first item, so I'm not
> going to stress as much :-]. The point here is that dynamic objects
> won't be satisfiable by combining ranges, but they are satisfiable
> by fully cached objects.

Quite few dynamic objects are cacheable today, but hopefully that will
change as more and more of the web becomes database driven. A truly
dynamic object is rarely cacheable anyway.

Note: at least IE has implemented Range support for completing
partially downloaded objects. This should raise awareness among server
providers of the need to support Range for images and other objects
where completion of partially downloaded objects is interesting.
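
(For reference, resuming a partial download with Range looks roughly
like this on the wire; the URL, validator and sizes are of course just
an example:)

  GET /images/logo.gif HTTP/1.1
  Host: www.example.com
  Range: bytes=5000-
  If-Range: "v1.27"

  HTTP/1.1 206 Partial Content
  Content-Range: bytes 5000-9999/10000
  Content-Length: 5000
  Content-Type: image/gif
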
>
> > So exactly how simple can "single client per server connection" get:
> > * no need for deferred reading
> > * no need for buffering
> >
> > Why no need for deferred reads:
> >
> > It is handled automatically by the dataflow. When the client wants
> > data, you try to read. If there is no data, then register for
> > notification of available data.
> >
>
> Uhmm, I am actively (read: in my copious spare time) preparing to
> put hooks into squid to allow modification of data coming into
> squid, before it hits the store & clients, and also for data
> leaving squid to the clients. This is for eventual iCAP integration
> (so virus scanners can sit beside squid, not above it as parents).
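
(As an aside, the pull-driven scheme quoted above boils down to
something like the sketch below; the helper names are placeholders
invented here, not the real comm interface.)

#include <errno.h>
#include <unistd.h>

/* Placeholders for the surrounding code -- invented for illustration. */
typedef struct _ServerState ServerState;
struct _ServerState {
    int fd;                   /* server socket, set non-blocking */
    void *client;             /* the single attached client */
    char buf[4096];
};
extern void client_deliver(void *client, const char *buf, size_t len);
extern void register_read_callback(int fd,
                                   void (*cb)(ServerState *),
                                   ServerState *srv);
extern void server_done(ServerState *srv, int error);

/* Only read from the server when the client asks for more data.
 * No deferred reads and no buffering: flow control falls out of the
 * dataflow itself. */
static void
server_read(ServerState *srv)
{
    ssize_t n = read(srv->fd, srv->buf, sizeof(srv->buf));

    if (n > 0)
        client_deliver(srv->client, srv->buf, (size_t) n);
    else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        /* No data yet: get called back when the socket is readable. */
        register_read_callback(srv->fd, server_read, srv);
    else
        server_done(srv, n < 0);
}

/* The client side calls this whenever it wants more data. */
void
client_wants_data(ServerState *srv)
{
    server_read(srv);
}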

Those iCAP hooks bring their own set of problems. Data modification is
one thing: it has to insert itself in the data path somewhere, and
where depends on whether the modification is global or differs per
client. Any needed buffering should, I think, be performed by the
modifier itself.

Virus scanning and similar things require a quite different approach,
I think. Virus scanning requires the whole object to be downloaded and
then verified by the scanner. To do this we will most likely have to
insert a hook between the protocols and the store/client(s), spooling
the reply in its own store and sending it to the virus checker. If the
object is OK, the spooled reply is replayed to the rest of the code to
resume processing.
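
As a rough sketch of what such a hook could look like (all names are
invented here, nothing of this exists today): collect the reply in a
temporary spool, hand the complete object to the scanner, and replay it
downstream only on a clean verdict.

#include <stdio.h>

/* Placeholders -- invented for illustration only. */
typedef enum { SCAN_CLEAN, SCAN_INFECTED, SCAN_ERROR } scan_result;
typedef struct {
    FILE *spool;           /* temporary file holding the complete reply */
    void *downstream;      /* the store/client side to replay into */
} ScanHook;
extern scan_result scanner_check(FILE *spool);           /* external checker */
extern void replay_spool(void *downstream, FILE *spool); /* resume processing */
extern void abort_reply(void *downstream, scan_result why);

/* Called for each chunk of reply data arriving from the server side;
 * nothing is passed on yet, it is only spooled. */
void
scanHookAppend(ScanHook *h, const char *buf, size_t len)
{
    fwrite(buf, 1, len, h->spool);
}

/* Called once the whole reply has been received. */
void
scanHookComplete(ScanHook *h)
{
    scan_result r = scanner_check(h->spool);

    rewind(h->spool);
    if (r == SCAN_CLEAN)
        replay_spool(h->downstream, h->spool);  /* resume processing */
    else
        abort_reply(h->downstream, r);          /* refuse the object */
}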

Neither of these two needs to make the overall framework more complex.
The scanning most likely needs some extensions to be manageable and to
allow the user to abort the download somehow. HTTP is not well designed
for long delays in data processing (not even 1.1).

/Henrik
Received on Sat Jan 13 2001 - 12:54:12 MST
