Re: [squid-users] Peering caches (squid and 3rd parties) - How to

From: Eliezer Croitoru <eliezer_at_ngtech.co.il>
Date: Tue, 11 Jun 2013 23:49:58 +0300

On 6/11/2013 11:24 PM, Guillermo Javier Nardoni - Grupo GERYON wrote:
> Hello everyone,
>
> We have this situation and we tried a lot of configurations without success.
>
> • 1000 Customers
> • 4 cache boxes running Squid 2.7 on Debian Squeeze
> • Caches are full-meshed to each other
> • Every Squid is running in transparent mode (http_port 3128 transparent)
> • Every Squid is running HAARPCACHE on localhost at port 8080 (HAARPCACHE
>   is a Thundercache 3.1 fork which works PERFECTLY for caching sites like
>   youtube with lots of HITS).
> • Every Squid is connected to the Internet through RB1
> • RB2 (Mikrotik RouterOS) is doing round-robin selection across the Squids,
>   redirecting all traffic destined for port 80 on the Internet to port 3128
>   on Squid
>
> cat /etc/haarp/haarp.lst
> root_at_cpe-58-1-26-172:/etc/haarp# cat /etc/haarp/haarp.lst
> http.*\.4shared\.com.*(\.exe|\.iso|\.torrent|\.zip|\.rar|\.pdf|\.doc|\.tar|\.mp3|\.mp4|\.avi|\.wmv)
> http.*\.avast\.com.*(\.def|\.vpu|\.vpaa|\.stamp)
> http.*(\.avg\.com|\.grisoft\.com|\.grisoft\.cz).*(\.bin|\.exe)
> http.*(\.avgate\.com|\.avgate\.net|\.freeav\.net|\.freeav\.com).*(\.gz)
> http.*\.bitgravity\.com.*(\.flv\.mp4)
> http.*\.etrustdownloads\.ca\.com.*(\.tar|\.zip|\.exe|\.pkg)
> http.*flashvideo\.globo\.com.*(\.mp4|\.flv)
> http.{1,4}vsh\.r7\.com\/.*(\.mp4)$
> 74\.125\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
> #http.*\.googlevideo\.com.*videoplayback
> #http.*fpatch\.grandchase\.com\.br.*(\.kom|\.mkom|\.mp3)
> http.*(\.kaspersky-labs\.com|\.geo\.kaspersky\.com|kasperskyusa\.com).*(\.avc|\.kdc|\.klz|\.bz2|\.dat|\.dif)
> #http.*\.mccont\.com.*\.flv
> http.*\.metacafe\.com.*\.flv
> http.{1,4}media\w*\.justin.tv\/archives\/(\w|\/|-)*\.flv(\?.*|$)
> http.{1,4}\w*juegos\w*\.juegosdiarios\.com\/(\w|\/|-)*\.swf$
> http.{1,4}\w*\.juegosjuegos\.com\/games(\w|\/|-)*\.swf$
> ##http.*(\.windowsupdate\.com|(\.microsoft\.com)).*(\.cab|\.exe|\.iso|\.zip|\.psf)
> http.*(\.windowsupdate\.com|(update|download|dlservice|windowsupdate)\.microsoft\.com)\/.*(\.cab|\.exe|\.iso|\.zip|\.psf|\.txt|\.crt)$
> http.*\.pornotube\.com.*\.flv
> http.*\.terra\.com.*\.flv
> #http.*uol\.com\.br.*\.flv
> http.*\.viddler\.com.*\.flv
> #http.*\.video\.msn\.com.*\.flv
> http.*(porn|img).*\.xvideos\.com\/videos\/(thumbs\/)?.*(\.jpg|\.flv\?.*|\.mp4\?.*)$
> http.*\.youtube\.com.*videoplayback\?
> http.*\.ziddu\.com.*(\.exe|\.iso|\.torrent|\.zip|\.rar|\.pdf|\.doc|\.tar|\.mp3|\.mp4|\.avi|\.wmv)
> http.*edgecastcdn\.net/.*(\.mp4|\.flv)
> http.*adobe\.com/.*(\.cab|\.aup|\.exe|\.msi|\.upd|\.msp)
> http.*\.eset\.com.*\.nup
> http.*\.nai\.com.*(\.zip|\.tar|\.exe|\.gem)
> http.*\.pop6\.com.*(\.flv)
> http.*\.symantecliveupdate\.com.*(\.zip|\.exe)
> #http.*\.xpg\.com\.br.*
> http.{1,4}\w*\.ytimg\.com.*(hqdefault(\.jpg|\.mp4)$|M[0-9]+\.jpg\?sigh=)
> http.{1,4}\w*google(\.\w|\w)*\.doubleclick\.net\/pagead\/ads\?.*
> http.*img[0-9]\.submanga\.com\/(hd)?pages\/.*(\.jpg|\.webp)
> http.*(profile|s?photos|video).{0,5}\.ak\.fbcdn\.net\/.*(\.mp4\?.*|\_[a-z]\.jpg$|\.mp4$|\_[a-z]\.png$)
> #http.*(profile|s?photos|video).{0,5}\.ak\.fbcdn\.net\/.*(\.mp4\?.*|\_n\.jpg$|\.mp4$|\_n\.png$)
> http.*\.video\.pornhub\.\w*\.com\/videos\/.*\.flv\?.*
> http.*\.(publicvideo|publicphoto)\.xtube\.com\/(videowall\/)?videos?\/.*(\.flv\?.*|\_Thumb\.flv$)
> http.*public\.tube8\.com\/.*\.mp4.*
> http.*videos\..*\.redtubefiles\.com\/.*\.flv
> (205\.196\.|199\.91\.)[0-9]{2,3}\.[0-9]{1,3}\/.*
> #http.*\.rapidshare\.com\/cgi-bin\/.*\.cgi\?.*sub=download
> http.*\.vimeo.com\/.*\.mp4(\?.*)?$
> http.*images\.orkut\.com\/orkut\/photos\/.*\.jpg$
> http.{1,4}(\w|\/|\.|-)*media\.tumblr\.com\/(\w|\/|-|\.)*tumblr(\w|\/|-)*(\.png|\.jpg)$
> #http.{1,7}speedtest(\w|-)*(\.|\w)+\/speedtest\/(random.*\.jpg|latency\.txt)\?.*
> #http.{1,10}testdevelocidad.{1,5}\/speedtest\/(random.*\.jpg|latency\.txt)\?.*
> #http.{1,7}(\.|[a-z]|[0-9]|-)+(\/\w+)?(\/speedtest)+\/(random[0-9]+x[0-9]+\.jpg|latency\.txt)
>
> As you can see, youtube and many other sites have their content cached by
> HAARPCACHE and not by Squid itself. BTW, it works GREAT.
>
> Configuration on every squid.conf at /etc/squid
>
> Proxy1:
> IP: 192.168.1.1
>
> cache_peer 192.168.2.1 sibling 3128 3130 proxy-only
> cache_peer 192.168.3.1 sibling 3128 3130 proxy-only
> cache_peer 192.168.4.1 sibling 3128 3130 proxy-only
>
> acl haarp_lst url_regex -i "/etc/haarp/haarp.lst"
> cache deny haarp_lst
> cache_peer 127.0.0.1 parent 8080 0 proxy-only no-digest
> dead_peer_timeout 2 seconds
> cache_peer_access 127.0.0.1 allow haarp_lst
> cache_peer_access 127.0.0.1 deny all
>
>
> Proxy2:
> IP: 192.168.2.1
>
> cache_peer 192.168.1.1 sibling 3128 3130 proxy-only
> cache_peer 192.168.3.1 sibling 3128 3130 proxy-only
> cache_peer 192.168.4.1 sibling 3128 3130 proxy-only
>
> acl haarp_lst url_regex -i "/etc/haarp/haarp.lst"
> cache deny haarp_lst
> cache_peer 127.0.0.1 parent 8080 0 proxy-only no-digest
> dead_peer_timeout 2 seconds
> cache_peer_access 127.0.0.1 allow haarp_lst
> cache_peer_access 127.0.0.1 deny all
>
> Proxy3:
> IP: 192.168.3.1
>
> cache_peer 192.168.2.1 sibling 3128 3130 proxy-only
> cache_peer 192.168.1.1 sibling 3128 3130 proxy-only
> cache_peer 192.168.4.1 sibling 3128 3130 proxy-only
>
> acl haarp_lst url_regex -i "/etc/haarp/haarp.lst"
> cache deny haarp_lst
> cache_peer 127.0.0.1 parent 8080 0 proxy-only no-digest
> dead_peer_timeout 2 seconds
> cache_peer_access 127.0.0.1 allow haarp_lst
> cache_peer_access 127.0.0.1 deny all
>
> Proxy4:
> IP: 192.168.4.1
>
> cache_peer 192.168.2.1 sibling 3128 3130 proxy-only
> cache_peer 192.168.3.1 sibling 3128 3130 proxy-only
> cache_peer 192.168.1.1 sibling 3128 3130 proxy-only
>
> acl haarp_lst url_regex -i "/etc/haarp/haarp.lst"
> cache deny haarp_lst
> cache_peer 127.0.0.1 parent 8080 0 proxy-only no-digest
> dead_peer_timeout 2 seconds
> cache_peer_access 127.0.0.1 allow haarp_lst
> cache_peer_access 127.0.0.1 deny all
>
>
>
>
> Everything “works” fine when you browse sites, but the requests that MUST
> go through the HAARPCACHE peer don’t.
> Let’s picture that.
>
> Client 1 asks for http://www.youtube.com/watch?v=juqyzgnbspY . RB2, through
> its round-robin selection, redirects this request to Proxy1.
>
> Proxy1 accepts this connection and, according to “acl haarp_lst”, sends it
> to cache_peer 127.0.0.1.
>
> Proxy1 -> Peer 127.0.0.1:
> • Is http://www.youtube.com/watch?v=juqyzgnbspY in the local cache?
> o If YES: RETURN HIT with the FILE
> o If NO: RETURN MISS, download the file from the Internet, save it to disk
> and serve the file.
>
>
>
> Client 80 asks for http://www.youtube.com/watch?v=juqyzgnbspY . RB2, through
> its round-robin selection, redirects this request to Proxy3.
>
> Proxy3 accepts this connection and, according to “acl haarp_lst”, sends it
> to cache_peer 127.0.0.1.
>
> Proxy3 -> Peer 127.0.0.1:
> • Is http://www.youtube.com/watch?v=juqyzgnbspY in the local cache?
> o If YES: RETURN HIT with the FILE
> o If NO: RETURN MISS, download the file from the Internet, save it to disk
> and serve the file.
>
>
> As you can see, the same file is downloaded twice (at least) if the request
> is not redirected to the same cache box.
> How can I make each box ask every other cache first, so that a file already
> cached on any sibling or parent is served from that cache instead of being
> downloaded from the Internet again?
>
> Note 1: I can run HAARPCACHE on 0.0.0.0/0 if this is a solution.
>
> The schematic below shows how clients, caches and routers are connected.
> http://picpaste.com/njTPeEBb.jpg
>
>
> Please forgive my errors writing in English.
>
Nice to know about your setup.
For what you want, the cache you have (HAARPCACHE) would need to support
ICP or HTCP, so that Squid can find out whether the file is in its cache.
I think that is not possible right now, because of the dynamic links YouTube
uses and the poor support for hierarchy protocols in these caches.
This might not be the case, but it's a good direction to check.
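
To make the ICP part concrete, here is a rough Python sketch of what a sibling
lookup looks like on the wire, following the ICP v2 layout in RFC 2186. It is
only an illustration: the function names are mine, the request number is
arbitrary, and it assumes the peer listens on UDP port 3130 as in the
cache_peer lines above. A cache that does not answer such queries cannot take
part in the sibling lookup.

```python
import socket
import struct

# ICP v2 opcodes from RFC 2186 (only the ones used here).
ICP_OP_QUERY = 1
ICP_OP_HIT = 2
ICP_OP_MISS = 3
ICP_VERSION = 2
# Fixed 20-byte header: opcode, version, length, request number,
# options, option data, sender host address.
ICP_HEADER = struct.Struct("!BBHIII4s")


def build_icp_query(url: str, request_number: int) -> bytes:
    """Build an ICP_OP_QUERY datagram asking a peer about `url`."""
    # Query payload: 4-byte requester address (zeroed here) + NUL-terminated URL.
    payload = b"\x00" * 4 + url.encode("ascii") + b"\x00"
    header = ICP_HEADER.pack(
        ICP_OP_QUERY,
        ICP_VERSION,
        ICP_HEADER.size + len(payload),  # total message length
        request_number,
        0,  # options
        0,  # option data
        b"\x00" * 4,  # sender host address, left zero in this sketch
    )
    return header + payload


def parse_icp_opcode(datagram: bytes) -> int:
    """Return the opcode of an ICP message (e.g. ICP_OP_HIT or ICP_OP_MISS)."""
    fields = ICP_HEADER.unpack(datagram[: ICP_HEADER.size])
    return fields[0]


def query_peer(host: str, url: str, port: int = 3130, timeout: float = 2.0) -> int:
    """Send one ICP query over UDP and return the reply opcode."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_icp_query(url, request_number=1), (host, port))
        reply, _addr = sock.recvfrom(65535)
    return parse_icp_opcode(reply)
```

For example, query_peer("192.168.2.1", url) == ICP_OP_HIT would mean that
sibling claims to have the object. Note that the query carries the exact URL,
which is why per-request YouTube links rarely match between boxes.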

If you can try the newest Squid version from HEAD, which has StoreID in it,
you might find it very powerful in your situation.
There is a small "bug": when StoreID is in use, the proxy sends only the
StoreID URL to the sibling in its ICP requests.
If you ask me, it should work this way in your setup, but in a setup where
you have a parent proxy it should send the original request.
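
As a rough idea of what a StoreID helper does, here is a minimal Python
sketch. Squid hands the helper one request per line (the URL first, then
optional extras) and expects "OK store-id=..." or "ERR" back. Everything
specific in it is an assumption for illustration: the internal prefix
youtube.squid.internal is a made-up name, and the choice of the id, itag and
range parameters as the stable part of a videoplayback URL is a guess that
must be checked against real traffic. It also assumes the helper runs without
concurrency (no channel-ID prefix on each line).

```python
import sys
from urllib.parse import parse_qs, urlsplit

# Hypothetical internal host name used only to build stable cache keys.
STORE_ID_PREFIX = "http://youtube.squid.internal/"


def store_id_for(url: str):
    """Return a stable store ID for a videoplayback URL, or None to leave it alone."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host.endswith((".youtube.com", ".googlevideo.com")):
        return None
    if not parts.path.startswith("/videoplayback"):
        return None
    query = parse_qs(parts.query)
    # Assumption: 'id', 'itag' and 'range' identify the content; the other
    # parameters (signature, expiry, and so on) vary per request.
    if "id" not in query or "itag" not in query:
        return None
    key = "id={}&itag={}&range={}".format(
        query["id"][0], query["itag"][0], query.get("range", [""])[0]
    )
    return STORE_ID_PREFIX + key


def handle_line(line: str) -> str:
    """Map one helper request line to one helper reply line."""
    fields = line.split()
    if not fields:
        return "ERR"
    store_id = store_id_for(fields[0])
    return "OK store-id=" + store_id if store_id else "ERR"


def main() -> None:
    # Squid writes one request per line and expects one reply per line.
    for line in sys.stdin:
        sys.stdout.write(handle_line(line) + "\n")
        sys.stdout.flush()
```

To use it as a helper, call main() at the end of the script and point the
store_id_program directive at it. Two requests for the same video served from
different hosts with different signatures then map to the same key, so the
second one can be a HIT.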

Do you want to try this feature, which will reduce the need for an
upper-layer cache proxy?

If you do, I will be happy to guide you and make sure the setup works
well.

Regards,
Eliezer
Received on Tue Jun 11 2013 - 20:50:37 MDT
