Re: [squid-users] Re: Cache Windows Updates ONLY

From: Nick Hill <nick_at_nickhill.co.uk>
Date: Sun, 13 Apr 2014 16:11:16 +0100

Dear Amos

Thank you for reviewing the config and giving your deeply considered comments.

On 13 April 2014 09:56, Amos Jeffries <squid3_at_treenet.co.nz> wrote:
> Did your tests find any actual benefits in these "override-lastmod
> override-expire ignore-reload ignore-must-revalidate ignore-private"
> settings ?
>
> My tests earlier showed the reload-into-ims option was all that was
> needed to make update caching behave nicely. It is also the only one of
> those options which produces RFC compliant behaviour by the proxy.

Yes! Clients generate zillions of range requests. This creates loads
of revalidation.

I have adopted the assumption that exe, cab and such files on windows
update servers are static. A different file will take a different URL.

Perhaps there are border cases where this assumption would fail, and
maybe this needs more thought. I do think it is fair to assume that
URLs with an embedded SHA1 checksum will always deliver the same
content.

I might rewrite this part to use reload-into-ims for URL patterns
which don't include a checksum, and use the full override and
never-expire settings for those URLs which do embed a checksum.
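
A rough sketch of the split I have in mind, written in Python purely to
illustrate the classification (it is not Squid configuration). The
function name, the policy labels and the example URLs are made up for
this sketch, and it assumes the checksum is 40 hex digits sitting
between an underscore and the file extension:

```python
import re

# Assumption: the SHA1 appears as "_<40 hex digits>.<extension>" in the URL path.
CHECKSUM_RE = re.compile(
    r"_([0-9a-f]{40})\.(cab|exe|ms[iuf]|asf|wm[va]|dat|zip|psf|appx|esd)(\?|$)",
    re.IGNORECASE,
)

def refresh_policy(url):
    """Pick a policy label for a URL (labels are hypothetical, not Squid options)."""
    if CHECKSUM_RE.search(url):
        # Content is addressed by its checksum, so treat it as static.
        return "full-override"
    # No checksum in the URL: let the proxy revalidate instead.
    return "reload-into-ims"

if __name__ == "__main__":
    sha1 = "a1b2c3d4e5" * 4  # placeholder 40-digit hex string, not a real hash
    print(refresh_policy("http://download.windowsupdate.com/x/update_" + sha1 + ".cab"))
    print(refresh_policy("http://download.windowsupdate.com/x/update.cab"))
```

Each label would then map onto its own refresh_pattern line.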

> NP: Squid understands byte units whenever you see "KB" being used in config.
>
> So:
> maximum_object_size 200 MB
> maximum_object_size 6 GB
>
> Which is the first "howler". That directive does not take an access
> list and only the last value set matters. So adding " windowsupdate" to the
> 6GB line and setting the 200MB value are both just useless text in the
> config file.

Ok. I really would like to limit object size per ACL, but will have to
live with that!

>
>
>>
>> #My internet connection is not just used for Squid. I want to leave
>> #responsive bandwidth for other services. This limits D/L speed
>> delay_pools 1
>> delay_class 1 1
>> delay_access 1 allow all
>> delay_parameters 1 1200000/1200000
>
> It is better to use QoS controls in the system network settings that
> limit Squid (usually by PID number) than applying a class-1 delay pool
> to everything.

I do have an iptables firewall set up and will perhaps add that to the
bottom of my to-do list, unless I find it ineffectual and problematic.
>
>>
>> #We use the store_id helper to convert windows update file hashes to bare URLs.
>> #This way, any fetch for a given hash embedded in the URL will deliver the same data
>> #You must make your own /etc/squid3/storeid_rewrite instructions at end.
>> #change the helper program location from /usr/local/squid/libexec/storeid_file_rewrite to wherever yours is
>> #It is written in PERL, so on most Linux systems, put it somewhere convenient, chmod 755 filename
>> store_id_program /usr/local/squid/libexec/storeid_file_rewrite /etc/squid3/storeid_rewrite
>> store_id_children 10 startup=5 idle=3 concurrency=0
>> store_id_access allow windowsupdate
>> store_id_access deny all
>>
>
> concurrency=0 is bad. Although I see this is due to a lack of
> concurrency in the helper. That's a bug which should get fixed.
>
>
>> #We want to cache windowsupdate URLs which include queries
>> #but only those queries which act on an installable file.
>> #we don't want to cache queries on asp files as this is a genuine server
>> #side query as opposed to a cache breaker
>> acl wupdatecachablequery urlpath_regex (cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip|psf|appx|appxbundle|esd)\?
>>
>> #Deny caching for URLs matching query but not windowsupdate
>> cache deny QUERY !windowsupdate
>> #Deny caching for URLs matching query and windowsupdate but not cachable updates
>> cache deny QUERY windowsupdate !wupdatecachablequery
>
> What does this help with exactly? Current Squid versions are perfectly capable of
> caching despite query-string presence.
> In fact we recommend dropping acl QUERY entirely and adding this right
> above the '.' refresh_pattern:
> refresh_pattern -i (/cgi-bin/|\?) 0 0% 0

I have three classes.
Any URL with a query string.
Any URL to a windows update server.
Any URL to a windows update server which is specifically cache-able

To paraphrase the logic coded here:
Don't cache anything with a query string UNLESS it is a windows update
URL that also matches the ACL wupdatecachablequery.

Another way to write this more succinctly might be:
cache deny QUERY
cache allow wupdatecachablequery

But I am not certain whether the deny clause will take a higher
priority than the allow clause in cases where both ACLs match. The
fandangled logic avoids this.
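
If Squid walks the cache lines top-down and stops at the first line
whose ACLs all match, which is how its access lists are generally
described, then the order of the two short lines decides the outcome.
Here is a small Python model of that evaluation to make the cases
concrete. It is only a sketch: the function and the rule lists are my
own illustration, and the fall-through default (the opposite of the
last line's action) is assumed to apply to the cache directive as it
does to other access lists.

```python
def first_match(rules, matched_acls):
    """Return the action of the first rule whose ACLs all match.

    rules: list of (action, [acl names]); a leading "!" negates an ACL.
    matched_acls: set of ACL names that match the request.
    """
    for action, acls in rules:
        ok = True
        for name in acls:
            if name.startswith("!"):
                ok = ok and (name[1:] not in matched_acls)
            else:
                ok = ok and (name in matched_acls)
        if ok:
            return action
    # Assumed default: the opposite of the last line's action.
    return "deny" if rules[-1][0] == "allow" else "allow"

# The two-line version as written above.
short_rules = [("deny", ["QUERY"]), ("allow", ["wupdatecachablequery"])]
# The same two lines with the allow placed first.
swapped_rules = [("allow", ["wupdatecachablequery"]), ("deny", ["QUERY"])]
# The longer logic from my config.
long_rules = [
    ("deny", ["QUERY", "!windowsupdate"]),
    ("deny", ["QUERY", "windowsupdate", "!wupdatecachablequery"]),
]

# A cachable windows update URL with a query string matches all three ACLs.
update_file = {"QUERY", "windowsupdate", "wupdatecachablequery"}
for rules in (short_rules, swapped_rules, long_rules):
    print(first_match(rules, update_file))
```

Under those assumptions the deny line wins when it comes first, so the
allow line would have to precede it for the short form to work.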

>
>
>>
>> #Given windows update is un-cooperative towards third party
>> #methods to reduce network bandwidth, it is safe to presume
>> #cache-specific headers or dates significantly differing from
>> #system date will be unhelpful
>> reply_header_access Date deny windowsupdate
>> reply_header_access Age deny windowsupdate
>
> The "given" actually is not true IME. So not a safe assumption.
>
> Bad behaviour in the HTTP/1.1 revalidation by clients is a common side
> effect of the override-* and ignore-* options being used on refresh_pattern.
> The overrides used above make Squid ignore the caching boundary
> conditions about when objects become stale or expire. So the client
> fetch can a) MISS earlier than necessary, or b) HIT on a stale object
> with headers indicating it is obsolete well before delivery time -
> clients DO resolve that by re-fetching with a forced reload. In (a)
> refreshing uses full-object bandwidth more frequently than necessary, in
> (b) repairing the corrupted objects costs 2x the bandwidth a normal MISS
> would have cost.
>
> When reload-into-ims is used Squid translates annoying reload behaviour
> into friendlier refresh behaviour. At worst Squid is required to do a
> revalidation (almost no cost in bandwidth) to update the timestamps on
> content delivered to the client. Avoiding problem (b) above entirely is
> well worth that (very small) extra time delay on occasional WU.
>
> Caching and revalidation seem in my experience to be performed properly
> by the windows update tools. At least in WindowsXP SP2 and Windows 7
> which I have tested on.
>>
>> #Put the two following lines in /etc/squid3/storeid_rewrite omitting the starting hash
>> #^http:\/\/.+?\.ws\.microsoft\.com\/.+?_([0-9a-z]{40})\.(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip|psf|appx|esd)
>> http://wupdate.squid.local/$1
>> #^http:\/\/.+?\.windowsupdate\.com\/.+?_([0-9a-z]{40})\.(cab|exe|ms[i|u|f]|asf|wm[v|a]|dat|zip|psf|appx|esd)
>> http://wupdate.squid.local/$1

I'll update these patterns to be server-agnostic.
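
Roughly what I mean, modelled in Python rather than in the helper's
own rule file so the matching can be checked by hand. This is a sketch
under assumptions: the host part is relaxed to "any host", the hash is
taken to be 40 lower-case hex digits, and the example URLs are
invented. I have also written the character classes as ms[iuf] and
wm[va], since inside brackets the "|" is matched literally.

```python
import re

# Any host, then a path ending in "_<40 hex digits>.<known extension>".
STORE_ID_RE = re.compile(
    r"^http://[^/]+/.+?_([0-9a-f]{40})"
    r"\.(cab|exe|ms[iuf]|asf|wm[va]|dat|zip|psf|appx|esd)"
)

def store_id(url):
    """Map a URL with an embedded hash to a bare store ID, else None."""
    m = STORE_ID_RE.match(url)
    if m is None:
        return None
    return "http://wupdate.squid.local/" + m.group(1)

if __name__ == "__main__":
    sha1 = "0123456789abcdef0123456789abcdef01234567"  # placeholder, not a real hash
    # Two hypothetical mirrors serving the same file should share one store ID.
    print(store_id("http://au.download.windowsupdate.com/d/file_" + sha1 + ".cab"))
    print(store_id("http://mirror.example.net/other/path/file_" + sha1 + ".cab"))
```

Both example URLs map to the same store ID, which is the point of
dropping the host name from the pattern.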

I'll update the refresh pattern to account for whether a URL has an
embedded checksum. If it has none, use reload-into-ims; otherwise
treat the content as guaranteed static.
Received on Sun Apr 13 2014 - 15:11:25 MDT
