Re: [squid-users] Ahead Caching

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Tue, 05 Oct 2010 22:40:30 +0000

On Tue, 5 Oct 2010 16:16:06 +0300, Isaac Witmer <isaaclw_at_gmail.com> wrote:
> How would you do it?
> with wget, the only way of having it crawl through websites, is to
> recurse... isn't it?

For the wget command line, yes. But that is not the only piece involved.

You need to make two access_log entries for Squid: one that pipes only the
client requests to the wget script, and one that records only the wget
requests. See http://www.squid-cache.org/Doc/config/access_log for details
on log ACLs.
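
For example, something along these lines in squid.conf (an untested
sketch: the log file names are made up, and matching wget by its default
User-Agent header via a "browser" ACL is an assumption -- any ACL that
reliably identifies the wget traffic will do):

```
# Hypothetical ACL: match requests whose User-Agent is wget's default.
acl wget_requests browser Wget

# Real client traffic only -- this is the log the wget script should read:
access_log /var/log/squid3/client.log squid !wget_requests

# wget's own requests, kept separate for the bandwidth comparison:
access_log /var/log/squid3/wget.log squid wget_requests
```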

This method gives you two very important effects:
 1) wget requests are not passed back to wget and fetched twice or more;
 2) you have logs to compare the bandwidth consumption of wget vs
non-wget traffic.
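
Once the two logs exist, the comparison itself is a small awk job. A
sketch, assuming Squid's native log format (reply size is field 5) and
the hypothetical log paths from the config example above:

```shell
#!/bin/sh
# Sum the reply sizes (field 5 of Squid's native log format) in one log.
total_bytes() {
    awk '{ sum += $5 } END { print sum + 0 }' "$1"
}

# Compare client traffic against wget traffic. Paths are assumptions.
for log in /var/log/squid3/client.log /var/log/squid3/wget.log; do
    [ -r "$log" ] && printf '%s: %s bytes\n' "$log" "$(total_bytes "$log")"
done
```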

What I think you will find is that pre-caching wastes more bandwidth
overall than not pre-caching, possibly to the point of slowing access for
the real requests. It is also useless on Web 2.0 websites with AJAX etc.

NP: you may want to write a logging daemon to do all this instead of
tailing the access log. That will give you log data in real-time without
any problems during rotation/reconfigure/restart.
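
A minimal shape for such a daemon (a sketch only: it assumes a Squid
version whose access_log supports the "daemon" log module, configured
with something like "access_log daemon:/var/log/squid3/access.log", and
my recollection is that Squid prefixes each log line it sends with an
"L" command byte -- verify the protocol against the logfile-daemon
documentation for your version before relying on this):

```shell
#!/bin/sh
# Pull the URL (field 7 of the native log format) out of one "L" line.
handle_line() {
    case "$1" in
        L*) set -- ${1#L}; printf '%s\n' "$7" ;;
    esac
}

# Main loop: Squid feeds log commands on stdin, so there is no tailing
# and no race with log rotation or reconfigure.
while IFS= read -r line; do
    url=$(handle_line "$line")
    [ -n "$url" ] || continue
    # Hand the URL to the prefetcher; uncomment to actually fetch:
    # wget -q -p -l 1 --delete-after "$url" >/dev/null 2>&1 &
    printf '%s\n' "$url"
done
```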

Amos

>
> I tried screwing around, and the best I came up with was this:
>
>>#!/bin/bash
>>log="/var/log/squid3/access.log"
>>
>>while (true); do
>> echo "reading started: `date`, log file: $log"
>> sudo tail -n 80 $log | grep -P "/200 [0-9]+ GET" | grep "text/html" |
>> awk '{print $7}' | wget -q -rp -nd -l 1 --delete-after -i -
>> sleep 5
>> echo
>>done
>
>
> It's not so clean...
>
> On Tue, Oct 5, 2010 at 11:51 AM, John Doe <jdmls_at_yahoo.com> wrote:
>>
>> From: flaviane athayde <flavianeathayde_at_gmail.com>
>>
>> > I tried to write a shell script that reads the Squid log and uses it
>> > to run wget with the "-r -l1 -p" flags, but wget also fetches its own
>> > pages, creating an infinite loop, and I can't resolve it.
>>
>> Why recurse?
>> If you take your list from the log files, you will get all accessed
>> files already... no?
>>
>> JD
>>
>>
>>
Received on Tue Oct 05 2010 - 22:40:37 MDT

This archive was generated by hypermail 2.2.0 : Wed Oct 06 2010 - 12:00:02 MDT