Re: Data --> object store --> client from Robert Collins on 2002-08-09 (squid-dev)

From: Robert Collins <robertc@dont-contact.us>
Date: 09 Aug 2002 22:51:41 +1000

On Fri, 2002-08-09 at 23:35, Joey Coco wrote:
> That's kinda what I figured, that it was generic for that reason alone. I
> imagine a 50 meg file would cause a problem if it didn't throttle every
> little bit at a time. But smaller content-types of text/html shouldn't
> cause problems. Too bad the size of the request cannot really be
> detected..

This point comes up for discussion quite often. I've answered it on this
list before; for more detail, have a look in the archives (I'm not sure
where though :} )

> I'd like to be able to scan the entire html contents of the request before
> it is sent to the client, but not care about large media objects..

Ok, there are two scenarios here:
1) Read only, you are generating statistics or something similar.
2) Read/write, you are inserting/deleting/altering the content.

I'm going to address 1) now; 2) is much harder (although some work is
underway now that will make developing patches for 2) much easier in the
future).

There is no generic way to do 1) in HEAD (but there may be soon).

What you need to do for 1) is:
* Hook into clientSendMoreData, in two places:
  a) where the if (http->out.offset != 0) {
  b) where the assert(rep || (body_buf && body_size));
  statements appear.
  At these points, call a non-blocking function that parses the data,
looking for *whatever*.
* Add context to clientHttpRequest to store your parser state. I
*strongly* recommend adding a pointer to a private state object, rather
than adding multiple fields to the clientHttpRequest. This allows a
couple of things:
  1) you can check MIME types via an access list and only parse
appropriate replies (set the parser context to NULL and skip the parser
calls).
  2) the parser is now independent of the client side code and can be
reused.
* Use a parser that parses to the end of the current stream, saves its
context and returns. Then you simply push data at it, and it will
seamlessly carry on. There are such parsers around, and it's not that
hard to build a simple, fairly efficient one. The important point is that
functions like strstr are *NOT* suited to manipulating streams of data,
which is what squid deals with.
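
To make the last two points concrete, here is a minimal sketch of a
resumable scanner. It is not squid code: stream_scan_t, stream_scan_init
and stream_scan_feed are hypothetical names, and the fixed pattern limit
is an arbitrary assumption. It uses a Knuth-Morris-Pratt failure table so
that a partial match at the end of one buffer carries over into the next
one. The state lives in a single struct, which is the sort of private
object a clientHttpRequest could point at (NULL meaning "do not parse
this reply").

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SCAN_MAX_PATTERN 64     /* arbitrary limit for this sketch */

/* Hypothetical private parser state, kept separate from the request. */
typedef struct {
    const char *pattern;            /* text to look for */
    size_t len;                     /* strlen(pattern) */
    size_t fail[SCAN_MAX_PATTERN];  /* KMP failure table */
    size_t matched;                 /* pattern bytes matched so far;
                                     * survives between chunks */
    size_t hits;                    /* complete matches seen so far */
} stream_scan_t;

/* Prepare the scanner for one pattern; builds the failure table. */
static void
stream_scan_init(stream_scan_t *s, const char *pattern)
{
    size_t i, k = 0;

    s->pattern = pattern;
    s->len = strlen(pattern);
    assert(s->len > 0 && s->len <= SCAN_MAX_PATTERN);
    s->matched = 0;
    s->hits = 0;
    s->fail[0] = 0;
    for (i = 1; i < s->len; i++) {
        while (k > 0 && pattern[i] != pattern[k])
            k = s->fail[k - 1];
        if (pattern[i] == pattern[k])
            k++;
        s->fail[i] = k;
    }
}

/* Consume one chunk and return. Never blocks, never looks backwards,
 * and leaves its position in s->matched for the next call. */
static void
stream_scan_feed(stream_scan_t *s, const char *buf, size_t size)
{
    size_t i;

    for (i = 0; i < size; i++) {
        while (s->matched > 0 && buf[i] != s->pattern[s->matched])
            s->matched = s->fail[s->matched - 1];
        if (buf[i] == s->pattern[s->matched])
            s->matched++;
        if (s->matched == s->len) {
            s->hits++;
            s->matched = s->fail[s->matched - 1];
        }
    }
}
```

Feeding "<html><scr" and then "ipt src=x>" to a scanner initialised with
"<script" records one hit, even though the pattern straddles the two
buffers. Calling strstr on each buffer separately would miss it, and
strstr also stops at the first NUL byte, which body data may contain.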

Rob

Received on Fri Aug 09 2002 - 06:51:46 MDT
