Re: [squid-users] Cache Chrome updates

From: Nick Hill <nick_at_nickhill.co.uk>
Date: Wed, 16 Apr 2014 07:36:33 +0100

Hi Jasper
I have compiled Squid 3.4 to get the store_id functionality implemented
by Eliezer.

I have it running in a heterogeneous production environment.
I'm still checking for bugs, but it seems to work well.

#squid.conf file for Squid Cache: Version 3.4.4
#compiled on Ubuntu with configure options:
#'--enable-async-io=8' '--enable-storeio=ufs,aufs,diskd'
#'--enable-removal-policies=lru,heap' '--enable-delay-pools'
#'--enable-underscores' '--enable-icap-client'
#'--enable-follow-x-forwarded-for' '--with-logdir=/var/log/squid3'
#'--with-pidfile=/var/run/squid3.pid' '--with-filedescriptors=65536'
#'--with-large-files' '--with-default-user=proxy'
#'--enable-linux-netfilter' '--enable-storeid-rewrite-helpers=file'

#Recommendations: in full production, you may want to lower the
#debug_options level from 2 to 1 or 0.
#You may also want to comment out "strip_query_terms off" for user privacy

logformat squid %tg.%03tu %6tr %>a %Ss/%03>Hs %<st %rm %ru %[un %Sh/%<a %mt

#Explicitly define logs for my compiled version
cache_store_log /var/log/squid3/store.log
access_log /var/log/squid3/access.log
cache_log /var/log/squid3/cache.log

#Let's have a fair bit of debugging info
debug_options ALL,2
#Include query strings in logs
strip_query_terms off

acl all src all
#Which domains do windows updates come from?
acl windowsupdate dstdomain .ws.microsoft.com
acl windowsupdate dstdomain .download.windowsupdate.com

acl QUERY urlpath_regex cgi-bin \?

#I'm behind a NAT firewall, so I don't need to restrict access
http_access allow all

#Uncomment these if you have web apps on the local server which
#authenticate by local IP
#acl to_localhost dst 127.0.0.0/8 0.0.0.0/32
#http_access deny to_localhost

visible_hostname myclient.hostname.com
http_port 3128

#Always optimise bandwidth over hits
cache_replacement_policy heap LFUDA

#Windows update files are HUGE! I have set this to 6 GB.
#A recent (as of Apr 2014) Windows 8 update file is 4 GB
maximum_object_size 6 GB

#Set these according to your file system
cache_dir ufs /home/smb/squid/squid 70000 16 256
coredump_dir /home/smb/squid/squid

#Guaranteed static content from Microsoft. Usually fetched with range
#requests, so let's not revalidate.
#Pattern: underscore, 40 hex digits (SHA1 hash), dot, extension
refresh_pattern _[0-9a-f]{40}\.(cab|exe|esd|psf|zip|msi|appx) 518400 80% 518400 override-lastmod override-expire ignore-reload ignore-must-revalidate ignore-private
#Otherwise potentially variable
refresh_pattern -i ws.microsoft.com/.*\.(cab|exe|ms[iuf]|asf|wm[va]|dat|zip|psf|appx|esd) 43200 80% 43200 reload-into-ims
refresh_pattern -i download.windowsupdate.com/.*\.(cab|exe|ms[iuf]|asf|wm[va]|dat|zip|psf|appx|esd) 43200 80% 43200 reload-into-ims
#Default refresh patterns last if no others match
refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern . 0 20% 4320

#Option sets I have been experimenting with:
#override-lastmod override-expire ignore-reload ignore-must-revalidate ignore-private
#reload-into-ims

#Windows updates use a lot of range requests. The only way to deal with this
#in Squid is to fetch the whole file as soon as requested
range_offset_limit -1 windowsupdate
#quick_abort_min is a global setting and does not take an ACL
quick_abort_min -1 KB

#My internet connection is not just used for Squid. I want to leave
#responsive bandwidth for other services. This limits D/L speed
delay_pools 1
delay_class 1 1
delay_access 1 allow all
delay_parameters 1 1200000/1200000

#We use the store_id helper to convert windows update file hashes to bare URLs.
#This way, any fetch for a given hash embedded in the URL will deliver
#the same data.
#You must create your own /etc/squid3/storeid_rewrite; instructions are at the end.
#Change the helper program location from
#/usr/local/squid/libexec/storeid_file_rewrite to wherever yours is.
#It is written in Perl, so on most Linux systems you can put it somewhere
#convenient and chmod 755 it.
store_id_program /usr/local/squid/libexec/storeid_file_rewrite /etc/squid3/storeid_rewrite
store_id_children 10 startup=5 idle=3 concurrency=0
store_id_access allow windowsupdate
store_id_access deny all

#We want to cache windowsupdate URLs which include queries,
#but only those queries which act on an installable file.
#We don't want to cache queries on asp files, as these are genuine
#server-side queries as opposed to just cache breakers
acl wupdatecachablequery urlpath_regex (cab|exe|ms[iuf]|asf|wm[va]|dat|zip|psf|appx|appxbundle|esd)\?

cache allow windowsupdate wupdatecachablequery
cache deny QUERY

#Given windows update is un-cooperative towards third party
#methods to reduce network bandwidth, it is safe to presume
#cache-specific headers or dates significantly differing from
#system date will be unhelpful
reply_header_access Date deny windowsupdate
reply_header_access Age deny windowsupdate

#Put the following line in /etc/squid3/storeid_rewrite, omitting the
#leading hash. A single tab separates the two fields.
#_([0-9a-f]{40})\.(cab|exe|ms[iuf]|asf|wm[va]|dat|zip|psf|appx|esd)	http://wupdate.squid.local/$1
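
To check what the storeid_rewrite rule does before deploying it, here is a rough Python sketch of the mapping. It assumes the helper applies the pattern as an unanchored regex search and substitutes the first capture group into the replacement URL. The function name and the example URLs are made up for illustration; only the pattern and the wupdate.squid.local prefix come from the config above.

```python
import re

# Mirrors the storeid_rewrite rule: underscore, 40 hex digits (the SHA1
# hash), a dot, then one of the known update-file extensions.
STORE_ID_RULE = re.compile(
    r"_([0-9a-f]{40})\.(cab|exe|ms[iuf]|asf|wm[va]|dat|zip|psf|appx|esd)"
)

# Replacement prefix from the rule; the hash ($1) is appended to it.
STORE_ID_PREFIX = "http://wupdate.squid.local/"


def store_id(url):
    """Return the cache key for url, or None when the rule does not match."""
    match = STORE_ID_RULE.search(url)
    if match is None:
        return None
    return STORE_ID_PREFIX + match.group(1)


if __name__ == "__main__":
    # Hypothetical URLs: same hash, different mirror host and query string.
    sha1 = "0123456789abcdef0123456789abcdef01234567"
    first = "http://a.download.windowsupdate.com/d/update_" + sha1 + ".cab?client=1"
    second = "http://b.ws.microsoft.com/c/other_" + sha1 + ".cab?client=2"
    print(store_id(first))
    print(store_id(second))
```

Both URLs should come out with the same key, which is why the second client gets a cache hit even though its host and query string differ.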

On 16 April 2014 07:26, Jasper Van Der Westhuizen
<jvdwesthuiz_at_shoprite.co.za> wrote:
>
>
> On Tue, 2014-04-15 at 14:38 +0100, Nick Hill wrote:
>> URLs with query strings have traditionally returned dynamic content.
>> Consequently, http caches by default tend not to cache content when
>> the URL has a query string.
>>
>> In recent years, notably Microsoft and indeed many others have adopted
>> a habit of putting query strings on static content.
>>
>> This could be somewhat inconvenient on days where Microsoft push out a
>> new 4Gb update for windows 8, and you have many such devices connected
>> to your nicely cached network. Each device will download exactly the
>> same content, but with its own query string.
>>
>> The nett result is generation of a huge amount of network traffic.
>> Often for surprisingly minor updates.
>>
>> I am currently testing a new configuration for squid which identifies
>> the SHA1 hash of the windows update in the URL, then returns the bit
>> perfect cached content, irrespective of a wide set of URL changes. I
>> have it in production in a busy computer repair centre. I am
>> monitoring the results. So far, very promising.
>
> Hi Nick
>
> As you rightly said, Windows 8 devices are becoming more and more common
> now, especially in the workplace. I don't want to download the same 4GB
> update multiple times. Would you mind sharing your SHA1 hash
> configuration or is it perhaps available somewhere?
>
> Regards
> Jasper
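
As a rough illustration of how the two cache rules in the configuration above treat URLs with query strings, the sketch below restates them in Python. It assumes Squid's usual first-match-wins evaluation, that the default when no rule matches is the opposite of the last rule (here: allow), and that urlpath_regex sees the query string. The function names and sample URLs are hypothetical.

```python
import re

# The ACLs from the configuration, restated as Python patterns.
QUERY = re.compile(r"cgi-bin|\?")
CACHABLE_QUERY = re.compile(
    r"(cab|exe|ms[iuf]|asf|wm[va]|dat|zip|psf|appx|appxbundle|esd)\?"
)
UPDATE_DOMAINS = (".ws.microsoft.com", ".download.windowsupdate.com")


def is_windowsupdate(host):
    """dstdomain with a leading dot matches the domain and its subdomains."""
    return any(host == d[1:] or host.endswith(d) for d in UPDATE_DOMAINS)


def is_cachable(url):
    """Apply 'cache allow windowsupdate wupdatecachablequery', then
    'cache deny QUERY'; anything left over is allowed."""
    rest = url.split("://", 1)[-1]
    host, slash, tail = rest.partition("/")
    host = host.split(":")[0].lower()
    urlpath = slash + tail  # path plus query string
    if is_windowsupdate(host) and CACHABLE_QUERY.search(urlpath):
        return True
    if QUERY.search(urlpath):
        return False
    return True


if __name__ == "__main__":
    for url in (
        "http://download.windowsupdate.com/d/update_x.cab?abc=1",
        "http://download.windowsupdate.com/v9/selfupdate.asp?x=1",
        "http://example.com/file.cab?x=1",
    ):
        print(url, is_cachable(url))
```

Only the first URL is cached: it is on an update domain and the query follows an installable file. The asp query and the query on an unrelated site both fall through to the deny rule.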
Received on Wed Apr 16 2014 - 06:36:40 MDT
