Re: [squid-users] Caching Expired Objects - One Small Step Forward

From: Solomon Asare <solomonasare@dont-contact.us>
Date: Sun, 7 Oct 2007 07:02:23 -0700 (PDT)

Hi All,
the long skeletal howto:

THE ENVIRONMENT.
Ubuntu, Squid 2.6, Apache 2.2, at a small ISP where about
11% of traffic is www.youtube.com. 2Mbps on submarine fibre
to the US costs about US$5,000 - US$7,000 over here in
Accra, Ghana.

THE PROBLEM
Some YouTube clips are not cacheable, even with the
overrides provided in Squid 2.6. The objects are also
already expired, so even if you manage to cache them (for
example with Squid 3), you still cannot get HITs.
Moreover, YouTube serves the same video from several
servers with different URLs, so even when a clip is cached
and not expired, you still may not get a HIT.

PROPOSED SOLUTION, BRIEF:
I use a non-caching Apache proxy on the same machine
as Squid, as a parent to Squid for YouTube videos. The
Apache proxy sanitises the YouTube clips so that Squid
can cache them. Using the video ID as a key, I then use
jesred to rewrite the URLs of YouTube video objects that
are already in my Squid cache so that I get a HIT.
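
To illustrate the "video ID as a key" idea, here is a minimal sketch (not part of the setup itself) that reduces a get_video URL to its video ID. The function name video_key and the second hostname are made up for illustration; the first URL is the one used in the rules example further down.

```shell
# Sketch: reduce a get_video URL to its video ID, the key described above.
video_key() {
    rest=${1#*video_id=}              # drop everything up to "video_id="
    [ "$rest" != "$1" ] || return 1   # no video_id parameter in this URL
    printf '%s\n' "${rest%%&*}"       # cut at the next "&", if any
}

# Two URLs for the same clip; the second hostname is hypothetical.
video_key 'http://74.125.10.23/get_video?video_id=yDC9iJyTUmc&origin=ash-v57.ash.youtube.com'
video_key 'http://ash-v57.ash.youtube.com/get_video?video_id=yDC9iJyTUmc'
```

Both calls print the same ID, which is why one cached copy can serve requests aimed at different servers.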

1 SQUID
i. Modify squid.conf to Cache Youtube Clips.
acl utubevids urlpath_regex get_video\?video
cache allow utubevids
acl QUERY urlpath_regex cgi-bin \?
cache deny QUERY

ii. Modify squid.conf to handle the size of video
clips.
maximum_object_size 128 MB
cache_dir ufs /var/spool/squid 70000 16 256

iii. Modify squid.conf to redirect youtube videos for
url_rewriting.
url_rewrite_program /usr/lib/squid/jesred
acl utuberedir urlpath_regex get\_video\?video\_id\=.*
url_rewrite_access allow utuberedir
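
A quick way to see which requests an ACL like this would catch is to try the pattern outside Squid. The sketch below only approximates urlpath_regex with grep -E, using a simplified pattern without the extra backslashes; the function name is_utube_video and the watch URL are assumptions for illustration.

```shell
# Sketch: approximate Squid's urlpath_regex test with grep -E.
# The pattern is a simplified equivalent of get\_video\?video\_id\=.*
is_utube_video() {
    path="/${1#*://*/}"    # strip scheme and host, keep path plus query
    printf '%s\n' "$path" | grep -Eq 'get_video\?video_id='
}

is_utube_video 'http://74.125.10.23/get_video?video_id=yDC9iJyTUmc' && echo "rewrite"
is_utube_video 'http://www.youtube.com/watch?v=yDC9iJyTUmc' || echo "pass through"
```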

iv. Modify squid.conf to use a parent proxy for
youtube videos.
cache_peer 192.168.0.20 parent 8000 3130 no-query default no-netdb-exchange no-digest
acl utubevidsdirect urlpath_regex get\_video\?video\_id\=.*
always_direct allow !utubevidsdirect

v. Modify refresh_pattern in squid.conf for youtube
video clips.
refresh_pattern -i get\_video\?video\_id\=.*youtube\.com 10080 990% 999999 reload-into-ims ignore-no-cache override-expire ignore-private
refresh_pattern -i youtube\.com\/get\_video\?video\_id\=.* 10080 990% 999999 reload-into-ims ignore-no-cache override-expire ignore-private

You could merge the two into one by dropping the
youtube.com part. I am not sure whether other sites use
the same syntax, so I kept it there.
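
The min and max values in refresh_pattern are given in minutes, so it may help to see what 10080 and 999999 mean in days. A small sketch; the helper name minutes_to_days is made up for illustration:

```shell
# refresh_pattern takes its min and max ages in minutes.
minutes_to_days() {
    echo $(( $1 / 1440 ))   # 1440 minutes per day, integer division
}

minutes_to_days 10080     # min: prints 7
minutes_to_days 999999    # max: prints 694
```

So these rules treat a clip as fresh for at least a week, and for at most roughly 694 days.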

2 APACHE
i. Activate the following modules:
cd /etc/apache2/mods-enabled
ln -s ../mods-available/expires.load
ln -s ../mods-available/headers.load
ln -s ../mods-available/proxy.conf
ln -s ../mods-available/proxy_http.load
ln -s ../mods-available/proxy.load

ii. Configure Apache proxy
(/etc/apache2/mods-enabled/proxy.conf)
<IfModule mod_proxy.c>
        ProxyRequests On
        <Proxy *>
                AddDefaultCharset off
                Order deny,allow
                Deny from all
                Allow from 192.168.0
                Allow from 127.0.0
                Allow from 127.0.1
        </Proxy>
        ProxyVia On
        Header unset Cache-Control
        Header unset Expires
        Header unset Pragma
</IfModule>

iii. Configure Apache proxy port in
/etc/apache2/ports.conf
Listen 8000
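
To check that the three Header unset lines are taking effect, one can look at the response headers coming back through the Apache proxy. The sketch below filters a response read from stdin; the function name cache_blocking_headers and the canned response are made up for illustration, and the curl line in the comment assumes the proxy address and port used above.

```shell
# Sketch: list any cache-blocking headers found in a response read from stdin.
# Exit status is 0 when at least one is present, 1 when all three are gone.
cache_blocking_headers() {
    grep -iE '^(Cache-Control|Expires|Pragma):'
}

# Canned response for illustration; a real check would read from the proxy:
#   curl -s -o /dev/null -D - -x http://192.168.0.20:8000 "$url" | cache_blocking_headers
printf 'HTTP/1.1 200 OK\r\nContent-Type: video/flv\r\nExpires: Thu, 01 Jan 1970 00:00:00 GMT\r\n\r\n' |
    cache_blocking_headers || echo "clean"
```

If the proxy is sanitising responses as intended, the real check should print nothing and fall through to "clean".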

3 CONFIGURE THE JESRED REDIRECTOR (url_rewriter).

i. Configure /etc/jesred.conf
allow = /etc/jesred.acl
rules = /etc/jesred.youtube.rules

ii. Configure /etc/jesred.acl
0.0.0.0/0

iii. Configure /etc/jesred.youtube.rules
touch /etc/jesred.youtube.rules

A typical line in your /etc/jesred.youtube.rules
produced by the script below will look like this (all on
one line):
regex get\_video\?video\_id\=yDC9iJyTUmc.* http://74.125.10.23/get_video?video_id=yDC9iJyTUmc&origin=ash-v57.ash.youtube.com

4 SCRIPT TO UPDATE /etc/jesred.youtube.rules kept in
/usr/local/bin/buildrules
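#!/bin/bash
# Scratch directory for the intermediate files.
mkdir -p /tmp/jesred
# Existing youtube video rules, with comments and blank lines dropped.
grep -v '^#' /etc/jesred.youtube.rules | grep -v '^$' | grep 'regex get\\_video\\?video\\_id\\=' > /tmp/jesred/jesred.rules.2.tmp
# All other rules are kept as they are.
grep -v 'regex get\\_video\\?video\\_id\\=' /etc/jesred.youtube.rules > /tmp/jesred/jesred.rules.1.tmp
# One new rule per youtube flv swapped out to disk; field 13 of store.log is the URL.
# Backslashes are doubled inside the awk string so that they reach the rules file with any awk.
grep youtube.com /var/log/squid/store.log | grep flv | grep SWAPOUT | awk '{split($13,idarray,"="); split(idarray[2],idarr,"&"); print "regex get\\_video\\?video\\_id\\=" idarr[1] ".*", $13;}' >> /tmp/jesred/jesred.rules.2.tmp
sort -u /tmp/jesred/jesred.rules.2.tmp > /tmp/jesred/jesred.rules
# Keep a backup, then rebuild the rules file: other rules first, video rules after.
cp /etc/jesred.youtube.rules /tmp/jesred/jesred.youtube.rules.old
mv /tmp/jesred/jesred.rules.1.tmp /etc/jesred.youtube.rules
cat /tmp/jesred/jesred.rules >> /etc/jesred.youtube.rules
rm /tmp/jesred/jesred.rules
rm /tmp/jesred/jesred.rules.2.tmp
# Rotate the logs so the next run starts from a fresh store.log.
squid -k rotate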
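
The core of the script is the step that turns a store.log entry into a rule. Here is a stand-alone sketch of just that step, safe to run without touching any real files. The function name store_to_rules and the sample log line are made up for illustration (only the URL is taken from the rules example above), and the backslashes in the awk string are doubled so that they are written out literally whichever awk is installed.

```shell
# Sketch: turn store.log lines on stdin into jesred rules.
store_to_rules() {
    grep youtube.com | grep flv | grep SWAPOUT |
        awk '{ split($13, idarray, "="); split(idarray[2], idarr, "&")
               print "regex get\\_video\\?video\\_id\\=" idarr[1] ".*", $13 }'
}

# Made-up store.log line; field 13 holds the URL.
sample='1191765743.123 SWAPOUT 00 0000A1B2 0123456789ABCDEF0123456789ABCDEF 200 1191765700 1191000000 -1 video/flv 1234567/1234567 GET http://74.125.10.23/get_video?video_id=yDC9iJyTUmc&origin=ash-v57.ash.youtube.com'
printf '%s\n' "$sample" | store_to_rules
```

Running the same function over a copy of the real store.log is a cheap way to preview the rules before letting cron rewrite /etc/jesred.youtube.rules.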

5. ENTRY IN CRONTAB with crontab -e to update my
redirector rules on the 13th minute of every hour.
13 * * * * /usr/local/bin/buildrules

6. RESULTS
My byte hit rate for *.youtube.com has climbed from
about 3% to 30% over 8 days. I expect it to rise further
as the number of YouTube video objects in the cache
increases from a few hundred to several thousand
over the next several months.
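
For anyone wanting to track the same figure, here is a rough sketch of a byte hit rate calculation over Squid's native access.log format (field 4 is the result code, field 5 the byte count, field 7 the URL). The function name byte_hit_rate and the sample lines are made up, and this is not necessarily how the numbers above were obtained. It counts every result code containing HIT.

```shell
# Sketch: byte hit rate (percent) for youtube.com from access.log lines on stdin.
byte_hit_rate() {
    awk '$7 ~ /youtube\.com/ { total += $5; if ($4 ~ /HIT/) hit += $5 }
         END { if (total > 0) printf "%.1f\n", 100 * hit / total; else print "0.0" }'
}

# Made-up log lines: one miss, one hit, and one unrelated site that is ignored.
printf '%s\n' \
  '1191765743.123 950 192.168.0.55 TCP_MISS/200 7000 GET http://74.125.10.23/get_video?video_id=yDC9iJyTUmc&origin=ash-v57.ash.youtube.com - DEFAULT_PARENT/192.168.0.20 video/flv' \
  '1191769343.456 40 192.168.0.56 TCP_HIT/200 3000 GET http://74.125.10.23/get_video?video_id=yDC9iJyTUmc&origin=ash-v57.ash.youtube.com - NONE/- video/flv' \
  '1191769400.789 12 192.168.0.56 TCP_HIT/200 5000 GET http://www.example.com/logo.gif - NONE/- image/gif' |
  byte_hit_rate    # prints 30.0
```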

7. TODO
i. Update the script to expunge regexp rules for
expired objects.
ii. House Cleaning (which may never get done).
iii. A howto to share.
iv. Should the script be real time using tail -F? I
like watching videos a few more times after the first
view, except that most of the time I watch them from
the browser's cache.

8. COMMENTS
This is only one of many ways to achieve this goal, and
certainly not the best, since I am a non-guru, although
a determined Linux user. I guess the features I couldn't
find in squid2.6-stable that made me add Apache may be
included in later stable releases, making this
irrelevant.

Unfortunately, I did not keep any logs whilst doing
this, so I may have skipped a few steps. If I have, it
will show sooner or later. I have tried to put
together as much of the info as I think someone
might need. The squid mailing list proved very
helpful, and I am very grateful. There are many on the
list prepared to help, although you may come across a
few who will repeatedly tell you how easy it is to do
what you want to do without sharing how. Don't despair
if you bump into them.

Regards,
solomon.
Received on Sun Oct 07 2007 - 08:02:30 MDT