Yes, this was discussed at length on this very mailing list, maybe about
9 months ago.  A search of the archives will probably turn it up.
All of the potential problems were hashed out, or at least touched
on, and the Squid developers were generally clear on where they stand
on the issue (they aren't going to do it, as there are better areas for
their work).
That being said, Moez has written an HTML-parsing content filter (based
on Robert's new module framework).  The next step for that parser will be
a parsing pre-fetch module.  However, that's probably a good ways off. 
We have other tasks to tackle first, and bugs to be worked out of what 
we're already doing.  (And I have to find the funding for Moez's 
continuing work. ;-)
The target for his work will likely be pre-fetching of images and other
content that is contained within the current page.  This would let the
cache load everything that the client will soon be requesting before
the browser gets around to asking for it (browsers usually open only
about 4 connections at once, so the rest of the objects on a page wait
their turn).  Further, it makes high-latency links like satellite much
more usable for web browsing.
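To make the distinction concrete, here is a rough sketch (in Python,
not Squid module code, and not Moez's implementation) of picking out
only the objects embedded in a page.  The class name, the function
name, and the list of tags treated as "embedded" are my own
assumptions for illustration; ordinary <a href> links are left alone
on purpose.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class EmbeddedResourceParser(HTMLParser):
    """Collect URLs a browser would fetch on its own to render the page."""

    # Hypothetical (tag, attribute) pairs treated as embedded content.
    # <a href> is deliberately absent, so ordinary links are never followed.
    EMBEDDED = {("img", "src"), ("script", "src"), ("frame", "src"),
                ("iframe", "src"), ("embed", "src"), ("link", "href")}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Only stylesheet <link> tags are page content; rel="next" etc. are not.
        if tag == "link" and "stylesheet" not in (attr_map.get("rel") or "").lower():
            return
        for name, value in attrs:
            if value and (tag, name) in self.EMBEDDED:
                url = urljoin(self.base_url, value)  # resolve relative URLs
                if url not in self.urls:             # keep first occurrence only
                    self.urls.append(url)


def embedded_urls(html, base_url):
    """Return absolute URLs of objects embedded in html, in document order."""
    parser = EmbeddedResourceParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

A pre-fetcher would then request each returned URL through the cache
while the browser is still working through its first few connections.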
The idea of pre-fetching everything linked from a page is almost
nightmarish in its bandwidth-eating proportions.  I have no intention
of moving in that direction, as it would only be useful for a
single-client cache, and even then it is questionable (and probably very
upsetting to origin server admins).  We don't sell single-user web
caches, so we're not working on features for those kinds of purposes.
It's probably possible to add a lot of smarts to the module to choose 
what gets loaded and what doesn't, and to limit the number of links to 
pull.  But that's big work.
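Just to show the flavor of the "smarts" I mean, here is a toy sketch in
Python.  Every rule in it is an assumption of mine for illustration
(same host only, skip anything that looks dynamic, hard cap on the
count), and select_links is a made-up name, not anything in Squid or in
Moez's module.

```python
from urllib.parse import urljoin, urlsplit


def select_links(page_url, hrefs, max_links=5):
    """Pick at most max_links pre-fetch candidates from a page's links."""
    page_host = urlsplit(page_url).hostname
    chosen = []
    for href in hrefs:
        if len(chosen) >= max_links:        # hard cap on links to pull
            break
        url = urljoin(page_url, href)
        parts = urlsplit(url)
        if parts.scheme != "http":          # skip mailto:, ftp:, and so on
            continue
        if parts.hostname != page_host:     # stay on the server already in use
            continue
        if parts.query or "cgi-bin" in parts.path:
            continue                        # probably dynamic and uncacheable
        url = url.split("#", 1)[0]          # fragments name the same object
        if url == page_url or url in chosen:
            continue
        chosen.append(url)
    return chosen
```

Even with rules like these, a real module would still need to decide
when to back off under load, which is where the big work comes in.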
You're welcome to take a look at Moez's module (I think Robert has
included it in his modules CVS tree, am I right Robert?  If not, would
we be able to add it there, as another 'working module example'?).  It
already does some limited form of parsing of the HTML coming through
Squid, using a subset of the functions in libhtmlparse.  I don't think
anyone would complain if you came up with a derivative module that does
what you envision.  (Though I would strongly suggest just pre-fetching
the images on the current page, rather than pre-fetching every link on
the page--imagine a page of bookmarks...things get ugly real fast).
BTW, Cacheflow systems have a parsing pre-fetch like the one I've
described, wherein all content on the current page is loaded at once.
Questions answered?
Brian Szymanski wrote:
> hi,
> 
> i was wondering if anyone's ever considered the idea of adding
> "lookahead caching" to squid. that is, for speed (definitely not
> bandwidth reduction purposes), do something like the following:
> 
> whenever a new page is added to the cache, try to download any page
> linked from that page and put it in a (smaller) different cache called
> the transient cache. when a page in the transient cache is viewed, it
> moves to the stable cache. the stable cache is managed in squid's
> standard fashion. the transient cache can be managed in an LRU fashion
> indexed on the date that this webpage's (most recent) parent was viewed.
> 
> that way, the typical user's experience of clicking on some page, reading
> for a while, and then clicking on one of the links will be greatly
> accelerated. this idea is basically what a product called peak net.jet
> was doing a couple of years back (it doesn't look like they're still in
> business though), but only with netscape's builtin cache.
> 
> so my questions are as follows:
>     would this be difficult to implement?
>     would it violate netiquette to implement this? (imagine a single
> user on a cable modem sucking around 10 times as much bandwidth on the
> net)
>     any other thoughts?
> 
> thanks for reading...
> brian szymanski
> bks10@cornell.edu
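For reference, the transient/stable scheme proposed in the quoted
message could be sketched roughly like this in Python.  This is only an
illustration of the idea as described there, not anything that exists
in Squid: the class and method names are made up, and a plain dict
stands in for the stable cache that Squid would manage in its standard
fashion.

```python
from collections import OrderedDict


class LookaheadCache:
    """Two-tier cache: pre-fetched pages wait in a small transient tier."""

    def __init__(self, transient_capacity):
        self.transient_capacity = transient_capacity
        self.transient = OrderedDict()  # url -> body, stalest parent view first
        self.stable = {}                # url -> body; stand-in for the real cache

    def parent_viewed(self, children):
        """Record a view of a parent page; children maps linked URL -> body."""
        for url, body in children.items():
            if url in self.stable:
                continue                         # already a real cache entry
            self.transient[url] = body
            self.transient.move_to_end(url)      # its parent was just viewed
        while len(self.transient) > self.transient_capacity:
            self.transient.popitem(last=False)   # evict the stalest entry

    def view(self, url):
        """Return the cached body or None; a transient hit is promoted."""
        if url in self.transient:
            self.stable[url] = self.transient.pop(url)
        return self.stable.get(url)
```

Note that the sketch says nothing about the bandwidth cost of filling
the transient tier, which is the real objection.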
                                   --
                      Joe Cooper <joe@swelltech.com>
                  Affordable Web Caching Proxy Appliances
                         http://www.swelltech.com
Received on Mon Mar 05 2001 - 10:38:40 MST