Re: "lookahead caching" from Joe Cooper on 2001-03-05 (squid-dev)

From: Joe Cooper <joe@dont-contact.us>
Date: Mon, 05 Mar 2001 10:50:29 -0600

Yes, it was discussed at length on this very mailing list, maybe about 9
months ago. A search of the archives will probably turn it up.
All of the potential problems have been hashed out, or at least touched
on...and the Squid developers were generally clear on where they stand
on the issue (they aren't going to do it, as there are better areas for
their work).

That being said, Moez has written an HTML-parsing content filter (based
on Robert's new module framework). The next step for that parser will be
a parsing pre-fetch module. However, that's probably a good ways off.
We have other tasks to tackle first, and bugs to be worked out of what
we're already doing. (And I have to find the funding for Moez's
continuing work. ;-)

The target for his work will likely be pre-fetching of images and
other content embedded in the current page. This would allow the
cache to load everything that the client will soon be requesting, ahead
of the browser itself (browsers usually open only 4 connections at
once, so they fetch a page's objects a few at a time). Further,
it makes things like satellite bandwidth really nicely usable for web
browsing.
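
To make the distinction concrete, here is a rough sketch of picking out
only the objects embedded in a page (images, scripts, stylesheets) while
leaving ordinary links alone. It is not Moez's module and does not use
libhtmlparse; it uses Python's standard html.parser purely for
illustration, and the names EmbeddedResourceParser and
embedded_resources are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class EmbeddedResourceParser(HTMLParser):
    """Collect URLs the browser needs in order to render the page itself.

    Hypothetical helper for illustration; not part of Squid or libhtmlparse.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []  # embedded objects: candidates for pre-fetch
        self.links = []      # <a href> targets: recorded, but NOT pre-fetched

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self._add(self.resources, attrs["src"])
        elif tag == "link" and attrs.get("href"):
            # Only stylesheets count as embedded content here.
            if "stylesheet" in (attrs.get("rel") or "").lower():
                self._add(self.resources, attrs["href"])
        elif tag == "a" and attrs.get("href"):
            self._add(self.links, attrs["href"])

    def _add(self, bucket, url):
        # Resolve relative references against the page URL, skip duplicates.
        absolute = urljoin(self.base_url, url)
        if absolute not in bucket:
            bucket.append(absolute)


def embedded_resources(html, base_url):
    """Return absolute URLs of the objects embedded in the given HTML."""
    parser = EmbeddedResourceParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.resources
```

A pre-fetcher built on this would request only the URLs returned by
embedded_resources() and ignore the anchors collected in links.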

The idea of pre-fetching everything linked from a page is almost
nightmarish in its bandwidth-eating proportions. I have no intention
of moving in that direction, as it would only be useful for a
single-client cache, and even then it is questionable (and probably very
upsetting to origin server admins). We don't sell single-user web
caches, so we're not working on features for those kinds of purposes.
It's probably possible to add a lot of smarts to the module to choose
what gets loaded and what doesn't, and to limit the number of links to
pull. But that's big work.
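
As a very small illustration of what "smarts" could mean, the sketch
below caps the number of links pulled and applies a couple of filters.
The policy itself (same host only, no query strings, at most max_links)
is just an assumption picked for the example, not something anyone has
implemented or agreed on, and select_links is a made-up name.

```python
from urllib.parse import urlsplit


def select_links(page_url, links, max_links=5):
    """Pick a bounded subset of links to pre-fetch.

    Illustrative policy only (an assumption, not an existing Squid feature):
    keep http/https links on the same host as the page, skip URLs with a
    query string (likely dynamic content), and stop after max_links.
    """
    host = urlsplit(page_url).hostname
    chosen = []
    for link in links:
        if len(chosen) >= max_links:
            break
        parts = urlsplit(link)
        if parts.scheme not in ("http", "https"):
            continue
        if parts.hostname != host:
            continue
        if parts.query:
            continue
        if link not in chosen:
            chosen.append(link)
    return chosen
```

Even with filters like these, a page of bookmarks would still hit the
cap on every view, so the limit is doing most of the work here.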

You're welcome to take a look at Moez's module (I think Robert has
included it in his modules CVS tree, am I right Robert? If not, would
we be able to add it there, as another 'working module example'?). It
already does some limited form of parsing of the HTML coming through
Squid, using a subset of the functions in libhtmlparse. I don't think
anyone would complain if you came up with a derivative module that does
what you envision. (Though I would strongly suggest just pre-fetching
the images on the current page, rather than pre-fetching every link on
the page. Imagine a page of bookmarks...things get ugly real fast.)

BTW, Cacheflow systems have a parsing pre-fetch like the one I've
described, wherein all content on the current page is loaded at once.

Questions answered?

Brian Szymanski wrote:

> hi,
>
> i was wondering if anyone's ever considered the idea of adding
> "lookahead caching" to squid. that is, for speed (definitely not
> bandwidth reduction purposes), do something like the following:
>
> whenever a new page is added to the cache, try to download any page
> linked from that page and put it in a (smaller) different cache called
> the transient cache. when a page in the transient cache is viewed, it
> moves to the stable cache. the stable cache is managed in squid's
> standard fashion. the transient cache can be managed in an LRU fashion
> indexed on the date that this webpage's (most recent) parent was viewed.
>
> that way, the typical user's experience of clicking on some page, reading
> for a while, and then clicking on one of the links will be greatly
> accelerated. this idea is basically what a product called peak net.jet
> was doing a couple of years back (it doesn't look like they're still in
> business though), but only with netscape's builtin cache.
>
> so my questions are as follows:
> would this be difficult to implement?
> would it violate netiquette to implement this? (imagine a single
> user on a cable modem sucking around 10 times as much bandwidth on the
> net)
> any other thoughts?
>
> thanks for reading...
> brian szymanski
> bks10@cornell.edu
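
For reference, the two-cache scheme Brian describes above could be
sketched roughly as follows. This is only a toy model of the idea, not
Squid code: the stable cache is a plain dict standing in for Squid's
normal replacement policy, a counter stands in for the view date, and
the class and method names (LookaheadCache, add_viewed_page, get) are
invented for the example.

```python
import itertools


class LookaheadCache:
    """Toy model of a stable cache plus a smaller transient cache."""

    def __init__(self, transient_capacity):
        self.transient_capacity = transient_capacity
        self.stable = {}     # url -> body; stand-in for the normal cache
        self.transient = {}  # url -> (body, time its latest parent was viewed)
        self._clock = itertools.count()  # logical clock instead of real dates

    def add_viewed_page(self, url, body, prefetched):
        """Store a page a client viewed, plus pages pre-fetched from its links.

        prefetched maps each linked URL to its already-downloaded body.
        """
        now = next(self._clock)
        self.transient.pop(url, None)
        self.stable[url] = body
        for child_url, child_body in prefetched.items():
            if child_url not in self.stable:
                # A newer parent refreshes the child's eviction timestamp.
                self.transient[child_url] = (child_body, now)
        self._evict()

    def get(self, url):
        """Return a cached body or None; a transient hit is promoted."""
        if url in self.stable:
            return self.stable[url]
        if url in self.transient:
            body, _ = self.transient.pop(url)
            self.stable[url] = body
            return body
        return None

    def _evict(self):
        # Drop the entries whose most recent parent was viewed longest ago.
        while len(self.transient) > self.transient_capacity:
            oldest = min(self.transient, key=lambda u: self.transient[u][1])
            del self.transient[oldest]
```

Note that this says nothing about the bandwidth cost of filling the
transient cache in the first place, which is the real objection.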

                                   --
                      Joe Cooper <joe@swelltech.com>
                  Affordable Web Caching Proxy Appliances
                         http://www.swelltech.com
Received on Mon Mar 05 2001 - 10:38:40 MST
