Re: [squid-users] Caching Pandora

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Sun, 26 Jul 2009 19:08:23 +1200

Jason Spegal wrote:
> I am currently using the following for the items in question.
>
> refresh_pattern pandora.com 0 300% 31536000
> refresh_pattern . 0 80% 3156000

The dot (.) pattern matches every URL in existence.
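
Squid checks refresh_pattern lines in the order listed and uses the first one that matches, so a "." line shadows anything placed after it. A minimal Python sketch of that first-match behaviour follows. The rule table and labels are hypothetical stand-ins for real refresh_pattern lines, and Python's re module stands in for Squid's regex engine.

```python
import re

# Hypothetical rule table: (regex, label). Each label stands in for the
# min/percent/max values and options of one refresh_pattern line.
RULES = [
    (r"pandora\.com", "pandora rule"),
    (r".", "catch-all rule"),
]

def first_match(url, rules=RULES):
    """Return the label of the first rule whose regex matches the URL."""
    for pattern, label in rules:
        if re.search(pattern, url):
            return label
    return None

# Specific rule listed first: Pandora URLs get the Pandora settings.
print(first_match("http://audio-sjl-t3-2.pandora.com/access/7008639604707703825.mp4?"))
# -> pandora rule

# Everything else falls through to the catch-all.
print(first_match("http://www.example.org/index.html"))
# -> catch-all rule

# Catch-all listed first: the Pandora rule is never reached.
print(first_match("http://audio-sjl-t3-2.pandora.com/x", rules=list(reversed(RULES))))
# -> catch-all rule
```

This is why any override-* or ignore-* options placed on the "." line end up applying to every site.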

For the Pandora files you don't need to go as high as 300%, but you do need
to add all of the available override-* and ignore-* violation options to
the "pandora.com" pattern.

I'd also try making the pandora pattern:
   -i http://[^/]*pandora\.com/?

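To see why a host-anchored pattern is safer than a bare "pandora.com", here is a rough Python check against two URLs from the access.log in this thread plus two made-up ones. The stricter regex below is an assumed variant written for illustration, not necessarily the exact pattern to deploy, and Python's re module is only an approximation of the POSIX regex matching Squid uses.

```python
import re

# Bare pattern as used in the original config: matches anywhere in the URL.
LOOSE = re.compile(r"pandora.com", re.IGNORECASE)

# Assumed host-anchored variant: optional subdomain labels, then pandora.com/.
HOST_ANCHORED = re.compile(r"^http://([a-z0-9-]+\.)*pandora\.com/", re.IGNORECASE)

def matches(regex, url):
    """True if the regex matches somewhere in the URL (search semantics)."""
    return regex.search(url) is not None

URLS = [
    # Taken from the access.log in this thread.
    "http://audio-sjl-t3-2.pandora.com/access/7008639604707703825.mp4?",
    "http://images-sjl-1.pandora.com/images/public/amz/1/2/0/4/727361124021_500W_495H.jpg",
    # Hypothetical URLs that merely mention the name.
    "http://www.example.org/redirect?to=pandora.com",
    "http://notpandora.com/index.html",
]

for url in URLS:
    print(matches(LOOSE, url), matches(HOST_ANCHORED, url), url)
```

The loose pattern matches all four URLs, while the anchored one matches only the two real Pandora hosts.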
>
> With violations off these work well. However they fail to cache all the
> items I would like. When I had violations on I had tried refresh_pattern
> . 0 0% 0 as well as setting all refresh_pattern to 0 0% 0 which still
> failed to refresh the pages properly. I had also tried rebuilding the
> cache from scratch several times.
>
> Other relevant patterns I am using:
>
> #Dynamic Content
> refresh_pattern -i cgi-bin 0 0% 0 refresh-ims

The following is a violation even if it works with violations not enabled.
> refresh_pattern -i \? 0 0% 3156000 refresh-ims
> refresh_pattern -i .(asp|aspx|php|pl|xml|rss|kml|cgi|py|pyc) 0 0% 0 refresh-ims

> #HTML
> refresh_pattern text/html 0 80% 2592000 refresh-ims
> refresh_pattern text/css 0 80% 2592000 refresh-ims
>
> #Java & Javascript
> refresh_pattern -i .(js|jar|java) 0 100% 31536000
>
> #By MIME-Type
> refresh_pattern application/* 0 300% 31536000
> refresh_pattern audio/* 0 300% 31536000
> refresh_pattern images/* 0 300% 31536000
> refresh_pattern text/* 0 300% 31536000
> refresh_pattern video/* 0 300% 31536000
>

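MIME type patterns? Matched against the URL? With Squid?

refresh_pattern regexes are matched against the URL, not the Content-Type,
so these lines will only catch URLs that happen to contain those strings.

Do you have a patch that does this? If so please consider contributing
back to the project.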

>
> When I had violations on the Pandora entry was similar to this...
>
> refresh_pattern pandora.com 0 300% 31536000 override-expire
> reload-into-ims ignore-reload ignore-no-cache ignore-private
> ignore-no-store ignore-auth

A single pattern like that should be all you need to add.

Some of the non-caching parameters are only able to be overridden in the
2.HEAD code though. You may need to grab a copy of the HEAD code and use
that.
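
As a rough illustration, the Cache-Control header from the Wireshark capture quoted further down can be split up and compared against the options on that line. The mapping below is my reading of the option names and should be checked against the squid.conf documentation for your version. The helper name is made up for this sketch.

```python
# Response Cache-Control directives and the refresh_pattern option from this
# thread that is named for each. Assumed mapping, based on the option names.
# (Expires is a separate header; override-expire is the option named for it.)
OPTION_FOR_DIRECTIVE = {
    "no-cache": "ignore-no-cache",
    "no-store": "ignore-no-store",
    "private": "ignore-private",
}

def classify(cache_control):
    """Split a Cache-Control value into (covered, uncovered) directives."""
    covered, uncovered = [], []
    for raw in cache_control.split(","):
        # Drop any "=value" part and normalise case.
        name = raw.strip().split("=", 1)[0].lower()
        if not name:
            continue
        if name in OPTION_FOR_DIRECTIVE:
            covered.append((name, OPTION_FOR_DIRECTIVE[name]))
        else:
            uncovered.append(name)
    return covered, uncovered

# Header value copied from the capture in this thread.
covered, uncovered = classify("no-cache, no-store, must-revalidate, max-age=-1")
print(covered)    # -> [('no-cache', 'ignore-no-cache'), ('no-store', 'ignore-no-store')]
print(uncovered)  # -> ['must-revalidate', 'max-age']
```

The directives left in the second list have no matching option among those named in this thread, which is the kind of gap that may need newer code.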

PS. All of your file extension patterns above use the very unsafe .XX
syntax. The pattern is a regex, so the unescaped dot matches any character
and the pattern matches anywhere in the URL. It's likely catching a whole
lot of URLs which it should not.

  Please use: \.XX(\?.*)?$ instead, i.e. \.(js|jar|java)(\?.*)?$
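
A quick way to see the difference is to run both forms over a few URLs. This is only a sketch: the example.org URLs are invented, and Python's re module is used as a stand-in for Squid's regex matching, which behaves the same way for these simple patterns.

```python
import re

# Unescaped dot, no anchor: "any character followed by js, jar or java".
UNSAFE = re.compile(r".(js|jar|java)", re.IGNORECASE)

# Literal dot, optional query string, anchored at the end of the URL.
SAFE = re.compile(r"\.(js|jar|java)(\?.*)?$", re.IGNORECASE)

def hit(regex, url):
    """True if the regex matches somewhere in the URL."""
    return regex.search(url) is not None

# Hypothetical URLs for illustration.
URLS = [
    "http://www.example.org/static/app.js",       # real script
    "http://www.example.org/static/app.js?v=2",   # script with a query string
    "http://www.example.org/ajs-report.html",     # HTML page containing "ajs"
    "http://www.example.org/jars/list.html",      # HTML page under /jars/
]

for url in URLS:
    print(hit(UNSAFE, url), hit(SAFE, url), url)
```

The unsafe form matches all four URLs, including the two HTML pages. The anchored form matches only the two script URLs.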

Amos

> Amos Jeffries wrote:
>> Jason Spegal wrote:
>>> I would wager it's content control given what they are. However with
>>> violations on they can be cached. Without they cannot. I just haven't
>>> been able to figure out how to get squid to behave with violations
>>> turned on. The only other option I can see is to set up a second squid
>>> with violations and filter all the traffic to/from Pandora through it.
>>
>> Use refresh_pattern with a regex that only matches Pandora URLs.
>>
>> I'll wager you have either added all the overrides to the . pattern,
>> or have an overly greedy regex in use.
>>
>> Amos
>>
>>>
>>> Adrian Chadd wrote:
>>>> This doesn't surprise me. They may be trying to maximise outbound
>>>> bits, or trying to retain control over content, or they may not
>>>> understand caching, or some combination of the above.
>>>>
>>>> I'd suggest contacting them and asking.
>>>>
>>>>
>>>>
>>>>
>>>> adrian
>>>>
>>>> 2009/7/26 Jason Spegal <jspegal_at_comcast.net>:
>>>>
>>>>> A little bit messy but here are some snippets.
>>>>>
>>>>> ###Access.log
>>>>>
>>>>> 1248572380.275 178 10.10.122.248 TCP_REFRESH_UNMODIFIED/304 232 GET http://images-sjl-1.pandora.com/images/public/amz/1/2/0/4/727361124021_500W_495H.jpg - DIRECT/208.85.40.13 -
>>>>> 1248572409.144 8472 10.10.122.241 TCP_MISS/200 1581181 GET http://audio-sjl-t3-2.pandora.com/access/7008639604707703825.mp4? - DIRECT/208.85.41.38 application/octet-stream
>>>>> 1248572439.512 94 10.10.122.241 TCP_MEM_HIT/200 55396 GET http://images-sjl-2.pandora.com/images/public/amz/3/0/2/3/602498413203_500W_499H.jpg - NONE/- image/jpeg
>>>>> 1248572570.898 300 10.10.122.248 TCP_MISS/200 6521 GET http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg - DIRECT/208.85.41.23 image/jpeg
>>>>> 1248572600.538 29937 10.10.122.248 TCP_MISS/200 7704188 GET http://audio-sjl-t3-2.pandora.com/access/3642267922875646389.mp3? - DIRECT/208.85.41.38 application/octet-stream
>>>>> 1248572615.735 11507 10.10.122.241 TCP_MISS/200 2109481 GET http://audio-sjl-t2-2.pandora.com/access/5722981497105294607.mp4? - DIRECT/208.85.41.36 application/octet-stream
>>>>> 1248572635.903 179 10.10.122.248 TCP_REFRESH_UNMODIFIED/304 232 GET http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg - DIRECT/208.85.41.23 -
>>>>> 1248572641.444 40 10.10.122.241 TCP_HIT/200 21616 GET http://images-sjl-2.pandora.com/images/public/amz/8/7/6/1/602498611678_300W_273H.jpg - NONE/- image/jpeg
>>>>>
>>>>> ###Store.log
>>>>>
>>>>> 1248572380.275 RELEASE -1 FFFFFFFF 097EAE1108DCEF192ED1C3BFF1F6C1B5 304 1248572380 -1 -1 unknown -1/0 GET http://images-sjl-1.pandora.com/images/public/amz/1/2/0/4/727361124021_500W_495H.jpg
>>>>> 1248572409.144 RELEASE -1 FFFFFFFF 6B93B1BF958703B3FC3CD1ADDD515695 200 1248572400 -1 1248572400 application/octet-stream 1580815/1580815 GET http://audio-sjl-t3-2.pandora.com/access/7008639604707703825.mp4?
>>>>> 1248572570.897 SWAPOUT 00 0004CF23 BEEE111A39B596B14903743011AF2C36 200 1248572570 1248490006 -1 image/jpeg 6181/6181 GET http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg
>>>>> 1248572600.538 RELEASE -1 FFFFFFFF 070416ED935AD18DCA793569D2C6A652 200 1248572570 -1 1248572570 application/octet-stream 7703822/7703822 GET http://audio-sjl-t3-2.pandora.com/access/3642267922875646389.mp3?
>>>>> 1248572615.735 RELEASE -1 FFFFFFFF B0EB42B39131DF028BA3BE9A39CC24E4 200 1248572604 -1 1248572604 application/octet-stream 2109115/2109115 GET http://audio-sjl-t2-2.pandora.com/access/5722981497105294607.mp4?
>>>>> 1248572635.903 RELEASE -1 FFFFFFFF CDCA0D3510080D121E5578310976676E 304 1248572635 -1 -1 unknown -1/0 GET http://images-sjl-3.pandora.com/images/public/amz/2/2/4/4/039841434422_130W_130H.jpg
>>>>> 1248572886.822 RELEASE -1 FFFFFFFF A95C86074129546301911C2FC251071D 200 1248572872 -1 1248572872 application/octet-stream 2086824/2086824 GET http://audio-sjl-t1-1.pandora.com/access/5188159311574708305.mp4?
>>>>>
>>>>> ###Wireshark
>>>>>
>>>>> Hypertext Transfer Protocol
>>>>> HTTP/1.0 200 OK\r\n
>>>>> Date: Sun, 26 Jul 2009 05:12:58 GMT\r\n
>>>>> Server: Apache\r\n
>>>>> Content-Length: 6137729\r\n
>>>>> Cache-Control: no-cache, no-store, must-revalidate, max-age=-1\r\n
>>>>> Pragma: no-cache, no-store\r\n
>>>>> Expires: -1\r\n
>>>>> Content-Type: application/octet-stream\r\n
>>>>> X-Cache: MISS from ichiban\r\n
>>>>> X-Cache-Lookup: MISS from ichiban:3128\r\n
>>>>> Via: 1.0 ichiban (squid)\r\n
>>>>> Proxy-Connection: keep-alive\r\n
>>>>> \r\n
>>>>>
>>>>> Amos Jeffries wrote:
>>>>>
>>>>>> Jason Spegal wrote:
>>>>>>
>>>>>>> I was able to cache Pandora by compiling with --enable-http-violations
>>>>>>> and using a refresh_pattern to cache everything regardless. This however
>>>>>>> broke everything by preventing proper refreshing of any site. If it could
>>>>>>> be worked where violations only happened as directly specified in the
>>>>>>> configuration it would be a workable solution. I did some testing and I
>>>>>>> could not confirm that it was anything in the configuration file itself
>>>>>>> that was causing the issue. I wouldn't recommend using this as such.
>>>>>>>
>>>>>>>
>>>>>> Which indicates that there is fine tuning possible to cache just Pandora.
>>>>>> Find yourself one of the Pandora URLs in your access.log and pay a visit
>>>>>> to www.redbot.org or the ircache.org cacheability engine.
>>>>>>
>>>>>>
>>>>>> Amos
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Henrik Nordstrom wrote:
>>>>>>>
>>>>>>>> Sat 2009-07-25 at 12:05 -0600, Brett Glass wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> One of the largest consumers of our HTTP bandwidth is Pandora, the free
>>>>>>>>> music service. Unfortunately, Pandora marks its streams as non-cacheable
>>>>>>>>> and also puts question marks in the URLs, which is a huge waste of
>>>>>>>>> bandwidth. How can this be overridden?
>>>>>>>>>
>>>>>>>>>
>>>>>>>> The question mark can be ignored. See the "cache" directive. But if there
>>>>>>>> are other parameters behind it (normally not logged) that just may not
>>>>>>>> help.
>>>>>>>>
>>>>>>>> Regarding non-cacheable: most crap can be overridden by refresh_pattern.
>>>>>>>>
>>>>>>>> But if it's a streaming service (I know nothing about Pandora) then you
>>>>>>>> are quite likely out of luck.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Henrik
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>
>>
>>
>

-- 
Please be using
   Current Stable Squid 2.7.STABLE6 or 3.0.STABLE16
   Current Beta Squid 3.1.0.10 or 3.1.0.11
Received on Sun Jul 26 2009 - 07:08:36 MDT