Re: Implementing a new caching algorithm.

From: Nick Lewycky <nicholas@dont-contact.us>
Date: Fri, 11 Mar 2005 20:17:57 -0500

Chris Mai wrote:
> No, I don't think that we are actually analyzing the HTML that is
> retrieved. What we are doing is reading the access.log file to
> create a prediction model, so that if the user's most recent request
> history matches a path in the prediction model then we will try to
> predict the next page that will be requested. For example, if after
> analyzing the access.log file one of the paths in the prediction model
> is a,b,c,d and the user's most recent request history consists of a, b,
> c, we will fetch d and store it in the cache, expecting the user to
> request d next. So I don't think we would be able to give you a hand.
> Sorry. But if it is not too much trouble for you, would you be able to
> help us by letting us know where the request for the page is located
> as well as where the page returned is processed?

Ok, I wasn't sure as your description lacked details.
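
To check that I have understood the model, here is a minimal sketch of
the path matching as I read your description. It is not taken from your
code: build_model, predict_next and max_prefix are names I made up, and
I assume the access.log has already been split into one list of
requested pages per client (the log parsing itself is left out).

```python
from collections import defaultdict


def build_model(sessions, max_prefix=3):
    """Map each request prefix (a tuple) to counts of the page that followed it."""
    model = defaultdict(lambda: defaultdict(int))
    for pages in sessions:
        for i in range(1, len(pages)):
            # Record the 1..max_prefix pages that came just before pages[i].
            for n in range(1, max_prefix + 1):
                if i - n < 0:
                    break
                prefix = tuple(pages[i - n:i])
                model[prefix][pages[i]] += 1
    return model


def predict_next(model, history, max_prefix=3):
    """Return the most frequent follower of the longest matching recent history."""
    for n in range(min(max_prefix, len(history)), 0, -1):
        followers = model.get(tuple(history[-n:]))
        if followers:
            return max(followers, key=followers.get)
    return None  # nothing in the model matches, so do not prefetch


# Hypothetical sessions standing in for paths extracted from access.log.
sessions = [["a", "b", "c", "d"], ["a", "b", "c", "d"], ["b", "c", "e"]]
model = build_model(sessions)
prediction = predict_next(model, ["a", "b", "c"])
```

With the path a,b,c,d in the model and a recent history of a, b, c, the
prediction is d, which is the page you would then fetch into the cache.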

Assuming you start with Squid 3 (still in development, CVS) then you
need to deal with ClientStreams. These are horribly underdocumented[1],
but you can follow the example from the ESI code. (Or my code[2], so far
as it works.)

When the data comes in from the origin server, a chain of observers is
called to process it. You need to implement four functions:
bufferData, streamRead, streamDetach and streamStatus.[3] Then you
install your stream node in client_side_reply.cc:1926 (right above the
"#if ESI" line).
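
To give a feel for how such a chain behaves, here is a toy model. It is
not Squid's interface: the real nodes are C++ with different signatures,
and the classes, the link helper and the status strings below are all
invented for illustration. Only the four method names mirror the
functions listed above. Read requests travel toward the data source, and
data travels back toward the client, passing through every node on the
way.

```python
class StreamNode:
    """Toy stand-in for a client stream node; not Squid's real interface."""

    def __init__(self, name):
        self.name = name
        self.prev = None  # toward the data source
        self.next = None  # toward the client

    def buffer_data(self, data):
        # Data arriving from upstream: pass it on unchanged by default.
        if self.next is not None:
            self.next.buffer_data(data)

    def stream_read(self):
        # Downstream wants more data: forward the request upstream.
        if self.prev is not None:
            self.prev.stream_read()

    def stream_detach(self):
        # Unlink this node and join its neighbours together.
        if self.prev is not None:
            self.prev.next = self.next
        if self.next is not None:
            self.next.prev = self.prev
        self.prev = self.next = None

    def stream_status(self):
        return self.prev.stream_status() if self.prev is not None else "none"


class Source(StreamNode):
    """Head of the chain: hands out one chunk per read request."""

    def __init__(self, chunks):
        super().__init__("source")
        self.chunks = list(chunks)

    def stream_read(self):
        if self.chunks:
            self.buffer_data(self.chunks.pop(0))

    def stream_status(self):
        return "more" if self.chunks else "complete"


class Observer(StreamNode):
    """Middle node: looks at the returned page as it goes by."""

    def __init__(self):
        super().__init__("observer")
        self.seen = []

    def buffer_data(self, data):
        self.seen.append(data)
        super().buffer_data(data)


class Sink(StreamNode):
    """Tail of the chain: stands in for the client connection."""

    def __init__(self):
        super().__init__("sink")
        self.received = []

    def buffer_data(self, data):
        self.received.append(data)


def link(*nodes):
    for a, b in zip(nodes, nodes[1:]):
        a.next, b.prev = b, a


source, observer, sink = Source([b"<html>", b"</html>"]), Observer(), Sink()
link(source, observer, sink)
while sink.stream_status() != "complete":
    sink.stream_read()
```

The observer node is where your page processing would hook in; the
client still receives every chunk untouched.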

You will need to implement two client stream nodes, one to process the
returned page, and one to fetch with.

I expect that you'll run into the same problems that I have. My page
fetching is implemented in PrefetchStream.cc. You'll need your own page
fetcher; feel free to copy mine, and send me patches for any bugs you
fix.

Squid isn't smart enough to collapse incoming requests. So if you go out
and fetch URL X, and a client request then comes in for URL X before the
origin server responds, Squid will send a second request to the origin
server. It will wait for that second request to complete before sending
the data back to the client. I haven't dealt with this yet and we'll both
need a solution. There is a branch called "collapsed forwarding"[4], but
it currently only applies to accelerator setups. It could perhaps be
adapted.
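
The behaviour we both want looks roughly like the sketch below: keep a
table of fetches that are still in flight, and attach a later request
for the same URL to the pending fetch instead of starting a new one.
This is only an illustration of the idea, not how the collapsed
forwarding branch works and not Squid code. CollapsingFetcher,
start_fetch and complete are hypothetical names, and the sketch is
single-threaded with no error handling.

```python
from concurrent.futures import Future


class CollapsingFetcher:
    """Share one origin fetch among all requests for the same URL."""

    def __init__(self, start_fetch):
        # start_fetch(url) is a hypothetical hook that begins the origin fetch.
        self._start_fetch = start_fetch
        self._in_flight = {}

    def request(self, url):
        future = self._in_flight.get(url)
        if future is None:
            # First request for this URL: go to the origin server.
            future = Future()
            self._in_flight[url] = future
            self._start_fetch(url)
        # Later requests get the same pending future, so no second fetch.
        return future

    def complete(self, url, body):
        # Called when the origin server responds; wakes every waiter at once.
        self._in_flight.pop(url).set_result(body)


origin_requests = []  # records each fetch actually sent to the origin
fetcher = CollapsingFetcher(origin_requests.append)
prefetch = fetcher.request("http://example.com/x")  # our prefetch
client = fetcher.request("http://example.com/x")    # client asks meanwhile
fetcher.complete("http://example.com/x", b"page body")
```

Here the prefetch and the client request share one future, so the origin
server sees a single request and the client is answered as soon as the
prefetch completes.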

Good luck.

Nick Lewycky

[1] - http://squidwiki.kinkie.it/squidwiki/ClientStreams
[2] - http://devel.squid-cache.org/cgi-bin/diff/prefetching
[3] - http://www.squid-cache.org/Doc/Prog-Guide/prog-guide-8.html#ss8.3
[4] - http://devel.squid-cache.org/projects.html#collapsed_forwarding
Received on Fri Mar 11 2005 - 18:46:12 MST
