Re: [squid-users] Re: Caching large files (i.e .ipsw)

From: Eliezer Croitoru <eliezer_at_ngtech.co.il>
Date: Thu, 07 Nov 2013 02:09:59 +0200

Hey Archer,

I analyzed a couple of the addresses, and it seems the "edgesuite.net"
domain is simply a CDN.
From the addresses you sent, take these two:
http://appldnld.apple.com/content.info.apple.com/iPod/SBML/osx/bundles/061-2967.20080313.Cnvkg/iPod_25.1.3.ipsw

http://appldnld.apple.com.edgesuite.net/content.info.apple.com/iPod/SBML/osx/bundles/061-2967.20080313.Cnvkg/iPod_25.1.3.ipsw

which are the exact same object by MD5 hash and ETag.
The only difference I have seen is in the expiration date.
Also, the XML file is quite an asset:
http://ax.phobos.apple.com.edgesuite.net/WebObjects/MZStore.woa/wa/com.apple.jingle.appserver.client.MZITunesClientCheck/version/

These are some of the addresses that will respond with that same XML:
http://ax.itunes.apple.com/WebObjects/MZStore.woa/wa/com.apple.jingle.appserver.client.MZITunesClientCheck/version/
http://itunes.apple.com/WebObjects/MZStore.woa/wa/com.apple.jingle.appserver.client.MZITunesClientCheck/version

This means that the above URLs can be mirrored using the StoreID
feature: from the content point of view they are two or more URLs that
lead to the same object, even at the MD5 and ETag level.
The only obstacle in these cases is the header "Cache-Control: max-age=0,
no-cache, no-store", which makes these URLs uncacheable/unfriendly.
These are pretty big files...
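
To repeat this kind of check, a minimal Python sketch could compare the
validators of two responses. The names fetch_headers and same_object are
illustrative only, and the sketch assumes that matching ETag and
Content-MD5 headers are enough evidence that two URLs serve the same
object:

```python
import urllib.request


def fetch_headers(url, timeout=10):
    """Send a HEAD request and return the response headers as a dict."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Lower-case the names so later lookups are case-insensitive.
        return {name.lower(): value for name, value in response.headers.items()}


def same_object(headers_a, headers_b):
    """Return True when both responses carry identical, non-empty validators.

    Only ETag and Content-MD5 are compared. Expires and Cache-Control are
    deliberately ignored, since those may differ between mirrors.
    """
    for name in ("etag", "content-md5"):
        value_a = headers_a.get(name)
        value_b = headers_b.get(name)
        if not value_a or value_a != value_b:
            return False
    return True
```

The intended use would be same_object(fetch_headers(url_a),
fetch_headers(url_b)) on the two .ipsw URLs above. The result depends on
what the servers return at the time of the request.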

There is an issue with the 304 responses from the server, which should
be ignored by default since we do trust a server that responds with a
304 when a cached file is verified.

I would say that there is an option to use these patterns pretty safely:

^http:\/\/([a-z0-9\.]+)\.apple\.com\.edgesuite\.net\/content\.info\.apple\.com\/((iOS|iPhone)[a-zA-Z0-9\/\.\,\_\-]+\.(ipsw|ipd|ipcc))$
                              http://appledl.squid.internal/$2

^http:\/\/([a-z0-9\.]+)\.apple\.com\/((iOS|iPhone)[a-zA-Z0-9\/\.\,\_\-]+\.(ipsw|ipd|ipcc))$
                              http://appledl.squid.internal/$2

^http:\/\/([a-z0-9\.]+)\.apple\.com\.edgesuite\.net\/((iOS|iPhone)[a-zA-Z0-9\/\.\,\_\-]+\.(ipsw|ipd|ipcc))$
                              http://appledl.squid.internal/$2
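
As a rough illustration of how the three patterns above could be applied,
here is a minimal sketch of a StoreID helper in Python. It assumes helper
concurrency is disabled (so no channel ID precedes the URL), the function
names are made up for this example, and the regexes are transcribed
without the redundant backslashes:

```python
import re
import sys

# Shared tail of the three "safe" patterns. Once appended to a host
# prefix, it becomes group 2: the path that is kept in the internal URL.
_FILE = r"((iOS|iPhone)[a-zA-Z0-9/.,_-]+\.(ipsw|ipd|ipcc))$"

PATTERNS = [
    re.compile(r"^http://([a-z0-9.]+)\.apple\.com\.edgesuite\.net"
               r"/content\.info\.apple\.com/" + _FILE),
    re.compile(r"^http://([a-z0-9.]+)\.apple\.com/" + _FILE),
    re.compile(r"^http://([a-z0-9.]+)\.apple\.com\.edgesuite\.net/" + _FILE),
]


def store_id_for(url):
    """Return the internal store ID for url, or None when nothing matches."""
    for pattern in PATTERNS:
        match = pattern.match(url)
        if match:
            return "http://appledl.squid.internal/" + match.group(2)
    return None


def main():
    """Answer one request per input line, as a StoreID helper would."""
    for line in sys.stdin:
        fields = line.split()
        store_id = store_id_for(fields[0]) if fields else None
        if store_id:
            sys.stdout.write("OK store-id=%s\n" % store_id)
        else:
            sys.stdout.write("ERR\n")
        sys.stdout.flush()  # Squid waits for each answer, so never buffer.
```

To run it as a helper, the script would call main() at the bottom and be
referenced from store_id_program. Note that with these patterns the
iPod_25.1.3.ipsw example URLs near the top would not be rewritten, since
only paths starting with iOS or iPhone are accepted.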

These are pretty risky ones:
http://ax.phobos.apple.com.edgesuite.net/WebObjects/MZStore.woa/wa/com.apple.jingle.appserver.client.MZITunesClientCheck/version/

^http:\/\/([a-z0-9\.]+)\.apple\.com\.edgesuite\.net\/(WebObjects/MZStore.woa/wa/com.apple.jingle.appserver.client.MZITunesClientCheck/version/[a-zA-Z0-9\.\-\_\/\?]*)$
                              http://appledlxml.squid.internal/$2

^http:\/\/([a-z0-9\.]+)\.apple\.com\/(WebObjects/MZStore.woa/wa/com.apple.jingle.appserver.client.MZITunesClientCheck/version/[a-zA-Z0-9\.\-\_\/\?]*)$
                              http://appledlxml.squid.internal/$2

All of the above will need a refresh_pattern, something like this (it
must be a single line in squid.conf):
refresh_pattern ^http://(appledlxml|appledl)\.squid\.internal/.* 10080 80% 79900 refresh-ims override-expire ignore-reload ignore-private ignore-no-store reload-into-ims
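
For readers unfamiliar with the field order, the following Python sketch
splits such a directive into its parts and converts the min and max
values, which refresh_pattern takes in minutes, into days. The function
name and the returned dictionary keys are only illustrative:

```python
def parse_refresh_pattern(line):
    """Split a one-line refresh_pattern directive into its fields."""
    tokens = line.split()
    if not tokens or tokens[0] != "refresh_pattern":
        raise ValueError("not a refresh_pattern line")
    tokens = tokens[1:]

    # An optional -i flag makes the regex case-insensitive.
    case_insensitive = bool(tokens) and tokens[0] == "-i"
    if case_insensitive:
        tokens = tokens[1:]
    if len(tokens) < 4:
        raise ValueError("expected: regex min percent max [options]")

    regex, min_minutes, percent, max_minutes = tokens[:4]
    return {
        "regex": regex,
        "case_insensitive": case_insensitive,
        "min_days": int(min_minutes) / 1440,  # 1440 minutes per day
        "percent": int(percent.rstrip("%")),
        "max_days": int(max_minutes) / 1440,
        "options": tokens[4:],
    }
```

Applied to the line above, min comes out as exactly 7 days and max as
roughly 55.5 days.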

If anyone plans to use this pattern, note that it can lead to some
strange cache behavior.
You can remove the appledlxml patterns, since they are very sensitive.

Also, since edgesuite caches these files, I would assume you should have
pretty fast access to all of them.
In case you do like these suites and just want to cache what you can,
you can try these refresh_pattern rules:
refresh_pattern ^http://([a-z0-9\.]+)\.apple\.com\.edgesuite\.net\/((iOS|iPhone)[a-zA-Z0-9\/\.\,\_\-]+\.(ipsw|ipd|ipcc))$ 10080 80% 79900 refresh-ims override-expire ignore-reload ignore-private ignore-no-store reload-into-ims

refresh_pattern ^http:\/\/([a-z0-9\.]+)\.apple\.com\.edgesuite\.net\/content\.info\.apple\.com\/((iOS|iPhone)[a-zA-Z0-9\/\.\,\_\-]+\.(ipsw|ipd|ipcc))$ 10080 80% 79900 refresh-ims override-expire ignore-reload ignore-private ignore-no-store reload-into-ims

#end of patterns.
The above are hand-crafted and only partially tested, so be careful:
I am human, and while I do not always make mistakes, it happens.

If I had more URLs for these domains:
http://swdownload.apple.com
http://swcdn.apple.com

I might be able to find a pattern for them also.

If you haven't seen StoreID until now, feel free to look at:
http://wiki.squid-cache.org/ConfigExamples/DynamicContent/Coordinator
http://wiki.squid-cache.org/Features/StoreID

These should clarify almost all your doubts about what De-Duplication
vs. Dynamic-Content means.

Also, I wanted to mention that the headers of the iOS files mentioned
above are very nice, since they include Content-MD5, which makes things
easier to verify.
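
A small Python sketch of such a verification follows. Content-MD5 carries
the base64 encoding of the raw MD5 digest of the body. The function names
are hypothetical, and the chunked variant is there because these files
are too large to hold in memory comfortably:

```python
import base64
import hashlib


def content_md5(body):
    """Return the Content-MD5 value for a bytes body."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def content_md5_of_chunks(chunks):
    """Same as content_md5, but fed from an iterable of bytes chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def matches_content_md5(body, header_value):
    """Return True when the body hashes to the given Content-MD5 header."""
    return content_md5(body) == header_value.strip()
```

For a downloaded .ipsw file, the chunks could come from reading the file
in fixed-size blocks, and the result would then be compared with the
header value the server sent.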

As a side note:
An ICAP service that can validate the full-content MD5 hash for these
specific domains+URLs could let a web cache respond with more cached
objects, using a couple of twists along the way to make sure that Squid's
check of request cacheability is tested by outside logic.

Eliezer

On 11/06/2013 10:49 PM, Archer wrote:
> Hopefully these two url's should be of some help;
>
> This link is used by iTunes every time a software update/restore is done.
>
> http://ax.phobos.apple.com.edgesuite.net/WebObjects/MZStore.woa/wa/com.apple.jingle.appserver.client.MZITunesClientCheck/version/
>
> The content on this server is only added to and almost never changed, so
> I'm hoping I can tell squid that all content (except the initial xml file)
> is always fresh so that it is never deleted.
>
> i.e.
> http://appldnld.apple.com/iOS6.1/091-2397.20130319.EEae9/iPad2,1_6.1.3_10B329_Restore.ipsw
> http://appldnld.apple.com/iOS7/031-1020.20131022.14lik/iPad2,1_7.0.3_11B511_Restore.ipsw
>
> when new software is brought out, it is simply added to the list rather than
> old software being removed.
>
>
>
> The following links are used for OS X software updates:
>
> http://swdownload.apple.com
> http://swcdn.apple.com
>
> Honestly, I'm not entirely sure how these ones work, but i suspect it is
> fairly similar to the iOS one above.
>
>
>
> --
> View this message in context:http://squid-web-proxy-cache.1019090.n4.nabble.com/Caching-large-files-i-e-ipsw-tp4662838p4663155.html
> Sent from the Squid - Users mailing list archive at Nabble.com.
Received on Thu Nov 07 2013 - 00:10:16 MST
