Re: squid cache acceleration from Robert Collins on 2003-04-25 (squid-dev)

From: Robert Collins <robertc@dont-contact.us>
Date: 26 Apr 2003 08:26:42 +1000

On Fri, 2003-04-25 at 18:49, atit_ldce wrote:
> Hello Squid developer,
> I am using squid 2.5 stable1 on redhat linux 7.1
> can any one tell me does squid provides object prefetching facility?
> my concern is as follow:
> squid get request GET www.yahoo.com from client X
> now i want squid to fetch all other web object at home page of yahoo
> site into memory [ either going to fetch from disk or from network]
> even before that objects are requested..
> this feature allows to have better hit ratio and lower median service
> time , thus overall improved response time.
>
> does squid support this?

No. It's been discussed several times, but AFAIK no-one has implemented
it - mainly due to the theoretical issues with it.

> if so how?
> if not on which part of squid code i should concentrate to add this
> feature...

You'll want to hook into the data stream, perhaps in the store, or else
in client_side, to parse the HTML stream. Then you'll need to create a
list of objects to retrieve, with headers *exactly* matching those the
client will use, and then spawn your pre-fetch requests. Note that
you'll want to limit the rate of the pre-fetch to a certain number of
parallel requests, and you'll want to stop pre-fetching once the client
is active. You'll also want to abort non-cachable replies.
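
The steps above can be sketched roughly as follows. This is Python
rather than Squid's C, purely for illustration, and none of it is Squid
code: EmbeddedObjectParser, plan_prefetch, run_prefetch and the fetch
callback are all hypothetical names. It also assumes the client will
send the same headers for embedded objects as it sent for the page,
which is only a guess.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tags whose URL a browser fetches automatically when rendering the page.
# Plain <a href> links are skipped on purpose: most are never followed.
EMBEDDED = {"img": "src", "script": "src", "iframe": "src"}


class EmbeddedObjectParser(HTMLParser):
    """Collects absolute URLs of objects embedded in one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "stylesheet":
            ref = attrs.get("href")
        else:
            ref = attrs.get(EMBEDDED.get(tag, ""))
        if ref:
            url = urljoin(self.base_url, ref)
            if url not in self.urls:  # request each object only once
                self.urls.append(url)


def plan_prefetch(base_url, html, client_headers, max_objects=8):
    """Return (url, headers) pairs, reusing the client's headers unchanged."""
    parser = EmbeddedObjectParser(base_url)
    parser.feed(html)
    return [(url, dict(client_headers)) for url in parser.urls[:max_objects]]


def run_prefetch(plan, fetch, max_parallel=2):
    """Call fetch(url, headers) for each entry, at most max_parallel at once."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(lambda item: fetch(*item), plan))
```

Stopping once the client becomes active and aborting non-cachable
replies are left out here; both would have to live inside whatever is
passed in as fetch.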

Lastly, here is my recollection of the issues relating to this:
* Varying objects: You won't save any time grabbing a client-negotiated
object if your headers result in the server sending a different entity
(e.g. getting the English version of a page when the client decides to
ask for Spanish). In fact, you will increase your bandwidth
requirements, and decrease your hit rate. How common will this be? I
don't know - an interesting research point.
* Limiting excess overhead: A number of URLs in documents are never
fetched: A page with 5 images and 50 links should not result in 55
pre-fetch requests. So your heuristic on what to retrieve will be very
important. And - it's not trivial.
* Dynamic URLs: You will need a full DOM emulator in squid to calculate
every dynamic URL if you choose to prefetch them.
* Parallelism: The client will be requesting these same objects
fractions of a second later. If the headers for your request's response
haven't been parsed and found cachable by the time the client request's
response arrives, then the requests won't be joined, and you'll download
objects twice.
* Overhead: You will be making squid perform much more work, raising the
hardware requirements to support a given link. If the link is more
easily expanded than the hardware, then you'll have a net loss with
prefetching.
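
To make the varying-objects issue concrete, here is a small Python
sketch of the check a cache would need before handing a pre-fetched
reply to the real client. The function name vary_matches is
hypothetical, and the comparison is deliberately simplified to exact
matching of header values (a real cache may normalise values first).

```python
def vary_matches(vary, prefetch_headers, client_headers):
    """True if a reply fetched with prefetch_headers may serve the client."""
    # Vary is a comma-separated list of request header names.
    names = [n.strip().lower() for n in vary.split(",") if n.strip()]
    if "*" in names:
        return False  # "Vary: *" never matches a later request
    # Header names are case-insensitive, so compare on lower-cased keys.
    pre = {k.lower(): v.strip() for k, v in prefetch_headers.items()}
    cli = {k.lower(): v.strip() for k, v in client_headers.items()}
    return all(pre.get(n) == cli.get(n) for n in names)
```

If the pre-fetch went out with "Accept-Language: en" and the client
then asks with "Accept-Language: es" for a reply carrying
"Vary: Accept-Language", the check fails and the object has to be
fetched again, which is the wasted bandwidth described above.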

Uhm, that's from memory :}.

Rob

-- 
GPG key available at: <http://users.bigpond.net.au/robertc/keys.txt>.


Received on Fri Apr 25 2003 - 16:27:27 MDT
