Re: [squid-users] Caching Web sites

From: Andrew Reid <andrew.reid@dont-contact.us>
Date: Tue, 11 Dec 2001 15:38:20 +1030

On Mon, Dec 10, 2001 at 08:54:08PM -0800, sean.upton@uniontrib.com wrote:

> Or, the answer would be have a redirector write the redirected (cached) URLs
> to a database, which can be queried for unique values by a script automating
> wget. Or use the redirector to fork off wget to preemptively follow links.
> Serves the same end. A script based redirector like Pyredir should be easy
> enough to hack to do this.

Is there an echo in here? You've basically expanded on what I said
here, elaborating on the specifics of the approach.

> > I'd guess that you could either implement some sort of redirection
> > (such as squidGuard) that redirects users to locally cached copies of
> > the data.

However, as I said in the following paragraph, it's probably not the
best way to go about it.

> > .. but that's not the ideal way of doing it. It would be good if it
> > was cached to the Squid cache store. However, many objects would
> > probably expire (or worse still, never make it to the cache) before
> > they were able to be used.

Moreover, calling things like wget to do the work for Squid is messy,
especially as Squid has internal functions for retrieving and caching
objects (that's what it does, remember?).

What's probably required is a module or add-on to Squid which
maintains its own internal database of frequently used sites. Part of
this functionality should include an automated daemon that
periodically updates that list of sites so that it always has a fresh
copy (e.g., when an object in the cache that is marked as "Frequently
Used" expires, the daemon must retrieve a fresh version of the
object).
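
To make the refresh idea a bit more concrete, here is a minimal sketch in
Python. It is not Squid code and does not use any Squid API: FrequentList,
refresh_pass and the fetch callback are names invented for illustration, and
fetch is assumed to re-request the object through the proxy and return its new
lifetime in seconds.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class FrequentList:
    """Hypothetical store of "Frequently Used" URLs, ordered by expiry time."""
    _heap: list = field(default_factory=list)  # entries are (expires_at, url)

    def mark(self, url, expires_at):
        # Record that `url` is frequently used and expires at `expires_at`.
        heapq.heappush(self._heap, (expires_at, url))

    def pop_expired(self, now):
        # Remove and return every URL whose expiry time has passed.
        expired = []
        while self._heap and self._heap[0][0] <= now:
            expired.append(heapq.heappop(self._heap)[1])
        return expired


def refresh_pass(store, now, fetch):
    """One pass of the daemon: re-fetch expired objects and reschedule them.

    `fetch(url)` is an assumed callback that retrieves the object again
    and returns its new lifetime in seconds.
    """
    refreshed = []
    for url in store.pop_expired(now):
        ttl = fetch(url)
        store.mark(url, now + ttl)
        refreshed.append(url)
    return refreshed
```

A real daemon would call refresh_pass on a timer and would also need to cope
with the same URL being marked twice, which this sketch ignores.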

Even that has issues to think about, though they could possibly be
worked around through configuration file options and such. What happens
if an object that is marked as "Frequently Used" expires every 5
minutes? You'll be getting a fresh copy so regularly it may have an
impact on your bandwidth bill.
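
For a rough feel of the numbers, the sketch below estimates the daily cost of
refreshing one object, with and without a minimum refresh interval. The
min_refresh setting is a hypothetical configuration option, not an existing
Squid directive, and the estimate assumes the worst case where every refresh
transfers the whole object.

```python
SECONDS_PER_DAY = 86400


def effective_interval(object_ttl, min_refresh=0):
    """Never refresh more often than the (hypothetical) min_refresh setting."""
    interval = max(object_ttl, min_refresh)
    if interval <= 0:
        raise ValueError("refresh interval must be positive")
    return interval


def daily_refresh_bytes(object_size, object_ttl, min_refresh=0):
    """Worst-case bytes fetched per day to keep one object fresh."""
    fetches_per_day = SECONDS_PER_DAY // effective_interval(object_ttl, min_refresh)
    return fetches_per_day * object_size


# A 50,000-byte object that expires every 5 minutes (300 seconds):
print(daily_refresh_bytes(50_000, 300))        # 288 fetches -> 14400000
# The same object with refreshes limited to once an hour:
print(daily_refresh_bytes(50_000, 300, 3600))  # 24 fetches -> 1200000
```

Multiply that by however many objects end up marked "Frequently Used" and the
need for some kind of limit becomes fairly obvious.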

I'd be interested in having a play with the concept, possibly coming
up with a prototype and going from there. There are some fairly large
issues which need to be explored first, though.

   - andrew

-- 
Andrew J. Reid                    "Catapultam habeo. Nisi pecuniam omnem  
andrew.reid@plug.cx               mihi dabis, ad caput tuum saxum immane 
+61 401 946 813                   mittam"                                
Received on Mon Dec 10 2001 - 22:09:34 MST
