Re: commloops and modio development

From: Robert Collins <robert.collins@dont-contact.us>
Date: Sun, 14 Jan 2001 10:04:32 +1100

----- Original Message -----
From: "Henrik Nordstrom" <hno@hem.passagen.se>
To: "Robert Collins" <robert.collins@itdomain.com.au>
Cc: <squid-dev@squid-cache.org>
Sent: Sunday, January 14, 2001 6:07 AM
Subject: Re: commloops and modio development

> Ok. I'll give up and go back to my original thinking which is almost the
> same thing, but with reference counters on returned buffers allowing the
> stream to be split.
>
> The complexity is not so much in the actual stream splitting, but in how
> to manage when the attached clients are at different speeds, one or more
> lagging behind the first, or starting at different times. Then combine
> this with virtually unlimited object sizes and you get a quite messy
> situation.

I don't think it's that bad. I just had an idea about this - it's not fully thought out yet, but I think what needs to change is the
current push-only model.

Currently data is only sent to the client via an event (clientSendMoreData).
That means that when we have two unbalanced downloads, one storeclient spends 99% of its time saying "I'm not ready to send, hang on
and call me again".

I think the same situation applies to disk based objects?

How does this sound as an _idea_:

we allow a push & pull model:

If we can get informed by TCP when our out buffer is empty (or better yet hits a low water mark), client_side checks the store to
see if it can read more data, and if it can, writes immediately. If it cannot, it just returns.

If a client is getting backlogged by the data available from the store, it simply tells the store that it doesn't want to be called
when more data is available: it'll pull the data out itself.
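As a very rough sketch of what I mean, in pseudo-Squid C - storeClientCopyAvailable(), storeClientSetPullMode(), storeBytesFetched(),
clientWriteBufferLow(), CLIENT_SOCK_SZ and CLIENT_BACKLOG_HIGH are all invented names for the sake of the example, and the real
structs and comm calls are used loosely:

/*
 * Sketch only - none of these helpers exist today.
 */

/* called when the client socket's out buffer hits the low water mark */
static void
clientWriteBufferLow(int fd, void *data)
{
    clientHttpRequest *http = data;
    char *buf;
    ssize_t n;

    buf = xmalloc(CLIENT_SOCK_SZ);
    /* pull whatever the store already holds past our current offset;
     * returns 0 if nothing is ready yet (invented helper) */
    n = storeClientCopyAvailable(http->sc, http->out.offset, buf, CLIENT_SOCK_SZ);
    if (n > 0) {
        http->out.offset += n;
        comm_write(fd, buf, n, clientWriteComplete, http, xfree);
    } else {
        xfree(buf);
        /* nothing ready: just return.  The store will still push to us
         * when new data arrives, because we stay registered for that. */
    }
}

/* the reverse case: we are the slow side */
static void
clientCheckBacklog(clientHttpRequest *http)
{
    /* lagging well behind what the store has already fetched?  Then stop
     * being pushed at: switch this store client to pull-only until we
     * have caught up. */
    if (storeBytesFetched(http->entry) - http->out.offset > CLIENT_BACKLOG_HIGH)
        storeClientSetPullMode(http->sc, 1);
}

The point being that the catch-up path is driven by the client's own socket draining, rather than by yet another call into
clientSendMoreData that mostly answers "not ready".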

Probably fundamentally broken as a concept, but what I am trying to address is that we have several conflicting
requirements:

Utilise upstream bandwidth efficiently
Utilise downstream bandwidth per client efficiently (keep the out buffer full & pipeline requests)
Minimise 'useless' code looping, i.e. the clientSendMoreData check for client agent status.

> > > I'd also kill the ability to reattach to a "not-yet-aborted"
> > > request, based on the assumption that we can implement storage
> > > of partial objects.
> >
> > I don't agree with this. Henrik I think you raised the point that
> > early abort should still exist to control when to stop the object
> > download and cache the partial, and when to finish it anyway.
> > However it's not a big loss compared to the first item, so I'm not
> > going to stress as much :-]. The point about this is that dynamic
> > objects won't be satisfiable by combining ranges, but are
> > satisfiable by fully cached objects.
>
> Quite few dynamic objects are cacheable, but hopefully that will change
> as more and more of the web is database driven. A real dynamic object is
> rarely cacheable anyway.

Web developers need to be RFC 2616 aware and make use of the Cache-Control functionality for private revalidatable data,
dynamically cacheable pages and the like. Dynamic for most applications does not mean immediately stale (for example a stock
quote isn't going to change for 15 minutes, or whatever interval that provider/country uses). I hope we'll see increasingly many
dynamic-but-cacheable responses.
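For instance, a quote page could be served with headers along these lines (purely illustrative, not taken from any real site):

HTTP/1.1 200 OK
Cache-Control: public, max-age=900
Content-Type: text/html

max-age=900 lets any cache reuse the page for 15 minutes; "Cache-Control: private, must-revalidate" covers the per-user
revalidatable case.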

> Note: At least IE has implemented Range support for completing partially
> downloaded objects. This will up the awareness of the server providers
> to support Range for images and other objects where download
> completion of partially downloaded objects is interesting.

Neato.

> >
> > > So exactly how simple can "single client per server connection" get:
> > > * no need for deferred reading
> > > * no need for buffering
> > >
> > > Why no need for deferred reads:
> > >
> > > It is handled automatically by the dataflow. When the client wants data
> > > you try to read. If there is no data then register for notification of
> > > available data.
> > >
> >
> > Uhmm, I am actively (read in my copious spare time) preparing to put
> > hooks into squid to allow modification of data coming into
> > squid, before it hits the store & clients, and also for data
> > leaving squid to the clients. This is for eventual iCAP integration.
> > (so virus scanners can sit beside squid, not above it as parents).
>
> Which brings its own set of problems. Data modification is one thing,
> which must insert itself in the path somewhere. Where to depends on if
> the modification is global, or differs per client. Any needed buffering
> should be performed by the modifier I think.

Yes, agreed. And there are four hooking points: incoming requests, outgoing requests, incoming responses, and outgoing responses.
Incoming requests and outgoing responses are for things like adding Accept-Language headers or altering cookie headers;
outgoing requests and incoming responses are for things like redirecting downloads to mirror sites or scanning for viruses or
'illegal' content.
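In rough terms I picture the hook table looking something like this (every identifier here is invented for the example; nothing like
it exists in the tree yet):

/* Illustrative only - hook_point_t, HOOKCB and hookRegister() are made-up names. */
typedef enum {
    HOOK_REQUEST_IN,    /* client -> squid: e.g. add Accept-Language, munge cookies */
    HOOK_REQUEST_OUT,   /* squid -> origin: e.g. redirect to a mirror site */
    HOOK_REPLY_IN,      /* origin -> squid: e.g. virus / content scanning */
    HOOK_REPLY_OUT      /* squid -> client: e.g. per-client response rewriting */
} hook_point_t;

/* a modifier module registers a callback for the points it cares about;
 * msg is whichever of request / reply applies at that point */
typedef void HOOKCB(hook_point_t point, void *msg, void *cbdata);

void hookRegister(hook_point_t point, HOOKCB *callback, void *cbdata);

Any buffering a given modifier needs would then live inside that module, as you say.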

>
> Virus scanning and similar things requires a quite different approach I
> think. Virus scanning requires the whole object to be downloaded, and
> then verified with the scanner. To do this we most likely will have to
> insert a hook between the protocols and the store/client(s), spooling
> the reply in its own store and sending it to the virus checker. If OK
> replay the spooled reply to the rest of the code to resume the
> processing.

No, it doesn't require full downloading :-]
iCAP defines a preview (I think it's 4KB), from which the virus scanner can either request the rest of the object via a Continue
response, or indicate that it has seen enough of that object and we should carry on.
So we only need to buffer ~4K between the origin and the store. Then we either release the stream into the store, or loop it through
the virus scanner without spooling it to the store at all. (From memory iCAP considered this issue carefully: if the iCAP server
requests the whole object it must return the whole object to us, and if no changes are made we MAY return our previously cached
object - so whether to spool the intermediate copy becomes a load/performance choice for us.)
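The decision point would look roughly like this (sketch only: IcapState, icapPreviewDone() and friends are invented names, and the
preview size and status codes are from memory of the draft):

#define ICAP_PREVIEW_SIZE 4096      /* from memory; check the current draft */

/* called once the iCAP server has answered our preview */
static void
icapPreviewDone(IcapState *icap, int status)
{
    switch (status) {
    case 204:       /* "seen enough, no modification needed" */
        /* release the buffered preview into the store and carry on */
        icapReleasePreview(icap);
        break;
    case 100:       /* Continue: the scanner wants the whole object */
        /* stream the remainder through the iCAP server; whether we also
         * spool a local copy is the load/performance choice above */
        icapStreamBody(icap);
        break;
    default:
        icapAbort(icap);            /* couldn't talk to the scanner */
        break;
    }
}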

> Neither of these two needs to make the overall framework more complex.
> The scanning most likely needs some extensions to be manageable and
> allow the user to abort the download somehow. HTTP (not even 1.1) is
> not well designed for long delays in data processing.

Again, that has been fairly well considered in the particular spec I am interested in - iCAP (see the IETF OPES working group). I am
interested in more than iCAP, for example in-process data modification (like the gif animation breaking code) and data-stream
protocol-layer access rules on responses (i.e. break open CONNECT links when they are not SSL). Some of those are covered by iCAP too.

Rob