Re: "lookahead caching"

From: Robert Collins <robert.collins@dont-contact.us>
Date: Tue, 6 Mar 2001 08:33:11 +1100

----- Original Message -----
From: "Joe Cooper" <joe@swelltech.com>
To: "Brian Szymanski" <bks10@cornell.edu>
Cc: <squid-dev@squid-cache.org>
Sent: Tuesday, March 06, 2001 3:50 AM
Subject: Re: "lookahead caching"

> Yes it's been discussed at length on this very mailing list about 9
> months ago, maybe? A search of the archives will probably turn it up.
> All of the potential problems have been hashed out, or at least
> touched on...and the Squid developers were generally clear on where
> they stand on the issue (they aren't going to do it, as there are
> better areas for their work).
>
> That being said, Moez has written an html parsing content filter
> (based on Robert's new module framework)..the next step for that
> parser will be a parsing pre-fetch module. However, that's probably a
> good ways off. We have other tasks to tackle first, and bugs to be
> worked out of what we're already doing. (And I have to find the
> funding for Moez's continuing work. ;-)

And I need to find the time to implement the intermediary 'module' in
squid - it sits between the server side protocols, the client side
protocols and the object store. The functionality for it today is
spread around a half dozen files and lacks a consistent API. This is
needed to make the filter code robust & reliable (i.e. to deal with
range requests & content processing, or range requests & transfer
encoding).
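A rough way to picture that intermediary: one chain that every reply
body flows through, so range handling and content filtering meet in a
single place rather than a half dozen files. This is purely an
illustrative sketch in Python (the names Filter, Intermediary and the
upper-casing filter are invented for the example, not Squid code):

```python
# Illustrative sketch of the 'intermediary' idea: a single chain that
# sits between the server side, the content filters, and the object
# store, so range requests are answered consistently.
class Filter:
    """Base content filter: transforms reply body chunks."""
    def process(self, chunk: bytes) -> bytes:
        return chunk

class UpperCaseFilter(Filter):
    """Stands in for a real content filter (e.g. an html rewriter)."""
    def process(self, chunk: bytes) -> bytes:
        return chunk.upper()

class Intermediary:
    def __init__(self, filters):
        self.filters = filters
        self.store = bytearray()  # stands in for the object store

    def receive(self, chunk: bytes) -> bytes:
        # Server-side protocol hands chunks in here; they pass through
        # every filter before being stored and sent to the client side.
        for f in self.filters:
            chunk = f.process(chunk)
        self.store.extend(chunk)
        return chunk

    def serve_range(self, start: int, end: int) -> bytes:
        # Ranges are cut from the *filtered* object, so a content
        # filter is never asked to process a partial body.
        return bytes(self.store[start:end])
```

The point of the sketch is the ordering: because filtering happens
before the store, a later range request never reaches the filter at
all, which is exactly the interaction that is fragile when the logic
is scattered.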

>
> The target for his work will likely be pre-fetching of images and
> content that is contained within the current page. This would allow
> the cache to load everything that the client will soon be requesting
> (because browsers usually only open 4 connections at once). Further,
> it makes things like satellite bandwidth really nicely usable for web
> browsing.
>
> The idea of pre-fetching everything linked from a page is almost
> nightmarish in its bandwidth-eating proportions. I have no intention
> of moving in that direction, as it would only be useful for a single
> client cache and even then it is questionable (and probably very
> upsetting to origin server admins). We don't sell single user web
> caches, so we're not working on features for those kinds of purposes.
> It's probably possible to add a lot of smarts to the module to choose
> what gets loaded and what doesn't, and to limit the number of links
> to pull. But that's big work.

Or perhaps have a FIFO queue of requests per cache client ((IP,
username) pairs), and pull them out of there in groups of 8 or 10?
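That queueing idea could be sketched like this - one FIFO per (IP,
username) pair, drained in fixed-size batches so prefetching never
monopolises the cache's outbound connections. Illustrative Python only
(the class and method names are invented, and the batch size of 8 is
the "groups of 8 or 10" from above):

```python
# Sketch of per-client prefetch queues: one FIFO per (ip, username)
# pair, drained in small batches rather than all at once.
from collections import deque

class PrefetchQueues:
    def __init__(self, batch_size=8):
        self.batch_size = batch_size
        self.queues = {}  # (ip, username) -> deque of URLs

    def enqueue(self, client, url):
        self.queues.setdefault(client, deque()).append(url)

    def next_batch(self, client):
        """Pull up to batch_size queued URLs for one client, FIFO order."""
        q = self.queues.get(client)
        if not q:
            return []
        batch = [q.popleft() for _ in range(min(self.batch_size, len(q)))]
        if not q:
            del self.queues[client]  # drop empty queues
        return batch
```

Keeping the queues keyed per client means one user browsing a huge
page can't starve the prefetching done on behalf of everyone else.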

> You're welcome to take a look at Moez's module (I think Robert has
> included it in his modules CVS tree, am I right Robert? If not,
> would we be able to add it there, as another 'working module
> example'?). It already does some limited form of parsing of the html
> coming through Squid, using a subset of the functions in
> libhtmlparse. I don't think anyone would complain if you came up
> with a derivative module that does what you envision. (Though I
> would strongly suggest just pre-fetching the images on the current
> page, rather than pre-fetching every link on the page--imagine a
> page of bookmarks...things get ugly real fast).
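The images-only prefetch Joe suggests amounts to scanning the html as
it passes through and collecting just the inline <img> URLs, ignoring
<a href> links entirely. A sketch using Python's stdlib HTMLParser as
a stand-in for libhtmlparse (the class name is invented for the
example):

```python
# Stand-in sketch for the parsing pre-fetch idea: collect only the
# inline image URLs from a page, not every link on it.
from html.parser import HTMLParser
from urllib.parse import urljoin

class ImagePrefetchParser(HTMLParser):
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.to_prefetch = []

    def handle_starttag(self, tag, attrs):
        # Only <img src=...> is queued; <a href=...> is deliberately
        # ignored, so a page of bookmarks costs nothing extra.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.to_prefetch.append(urljoin(self.base_url, src))

p = ImagePrefetchParser("http://example.com/page.html")
p.feed('<html><body><img src="logo.gif">'
       '<a href="other.html">link</a></body></html>')
# p.to_prefetch now holds the image URLs the client will soon request
```

Everything in to_prefetch is something the browser is about to ask for
anyway, which is what makes this safe in a way that link-following
prefetch is not.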

the htmldemo module is in the CVS branch rbcollins_filters. There are
four odd filters there - a straight string replacement one, a spy
(which counts traffic to the log), the htmldemo from Moez, and, uh...
yeah. Anyway, I don't think I've seen Moez's main module but I'd be
happy to have it sit in the same branch.

<SNIP>
Rob
Received on Mon Mar 05 2001 - 14:29:49 MST
