Re: [squid-users] Problem with HTTP Headers

From: Ghassan Gharabli <sounarose_at_googlemail.com>
Date: Sun, 13 Nov 2011 19:14:48 +0200

Dear Amos,

After allowing the HEAD method in the Squid config,

I deleted www.facebook.com from the cache and then tried executing:

squidclient -m head http://www.facebook.com

Results:

HTTP/1.0 302 Moved Temporarily
Location: http://www.facebook.com/common/browser.php
P3P: CP="Facebook does not have a P3P policy. Learn why here: http://fb.me/p3p"
Set-Cookie: datr=hfW_TtrAQmi_2SxwAUY4EjPH; expires=Tue, 12-Nov-2013 16:51:17 GMT; path=/; domain=.facebook.com; httponly
Content-Type: text/html; charset=utf-8
X-FB-Server: 10.53.10.59
X-Cnection: close
Content-Length: 0
Date: Sun, 13 Nov 2011 16:51:17 GMT
X-Cache: MISS from Peer6.skydsl.net
X-Cache-Lookup: MISS from Peer6.skydsl.net:3128
Connection: close

I am not seeing any Pragma, Cache-Control, or Expires headers, but
redbot shows the correct info there!
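
One way to tell whether those headers are being stripped upstream of
Squid or inside it is to fetch the same URL both through the proxy and
straight from the origin (a rough sketch; squidclient talks to the
local proxy on port 3128 by default):

  # through the proxy (default localhost:3128):
  squidclient -m HEAD http://www.facebook.com/

  # straight from the origin server, bypassing the proxy:
  squidclient -h www.facebook.com -p 80 -m HEAD http://www.facebook.com/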

By the way, I am also using store_url, but I am sure nothing is wrong
there. I am only rewriting dynamic URLs for the picture and video
extensions, so there is only one thing left for me to try, which I am
reluctant to do:

acl facebookPages urlpath_regex -i /(\?.*|$)

First, does this rule affect store_url?

For example, when we have a URL like

http://www.example.com/1.gif?v=1244&y=n

As far as I can tell, urlpath_regex matches against the URL path and
query only; so would this rule also match a URL like:

http://www.example.com/stat?date=11
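
(A quick way to test the pattern outside Squid, assuming GNU grep, and
feeding it only the path-plus-query part that urlpath_regex actually
sees:)

  echo '/1.gif?v=1244&y=n' | grep -E '/(\?.*|$)'   # no match
  echo '/stat?date=11'     | grep -E '/(\?.*|$)'   # no match
  echo '/?ref=home'        | grep -E '/(\?.*|$)'   # matches
  echo '/photos/'          | grep -E '/(\?.*|$)'   # matches (trailing /)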

I will set this rule aside and focus on the Facebook problem, since
more than 60% of our traffic is to Facebook.

Let me test denying only PHP and HTML for a day, to see whether the
Facebook HTTP headers are still being saved in the cache.
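
A minimal sketch of that test, roughly along the lines of your
suggestion below (the acl names are my own):

  acl fb dstdomain .facebook.com
  acl fbHtml urlpath_regex -i \.(php|html?)(\?.*|$)
  cache deny fb fbHtml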

Ghassan

On Sun, Nov 13, 2011 at 2:26 AM, Amos Jeffries <squid3_at_treenet.co.nz> wrote:
> On 13/11/2011 12:15 p.m., Ghassan Gharabli wrote:
>>
>> Hello Amos,
>>
>> I understand what you wrote to me, but I really do not have any rule
>> that tells Squid to cache the www.facebook.com headers.
>
> According to http://redbot.org/?uri=http%3A%2F%2Fwww.facebook.com%2F
>
> FB front page has Expires, no-store, private, and must-revalidate. Squid
> should not be caching these at all unless somebody has maliciously erased
> the control headers, or your Squid has ignore-* and override-*
> refresh_patterns for them (I did not see any in your config, which is good).
>
> Can you use:
>   squidclient -m HEAD http://www.facebook.com/
>
> to see if those headers you get match the ones apparently being sent by the
> FB server.
>
>>
>> I only used refresh_pattern to match pictures, videos, and certain
>> other extensions, using ignore-must-revalidate, ignore-no-store,
>> ignore-no-cache, store-stale, etc.
>>
>> And how come this rule does not work?
>>
>> refresh_pattern -i \.(htm|html|jhtml|mhtml|php)(\?.*|$)               0 0% 0
>>
>> This rule tells Squid not to cache these extensions, whether the URL
>> is static or dynamic.
>
> The refresh_pattern algorithm only gets used *if* there are no Expires or
> Cache-Control headers stating specific information.
>
> Such as "private" or "no-store" or "Expires: Sat, 01 Jan 2000 00:00:00 GMT".
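>
> As an illustration (made-up replies): a response carrying no explicit
> freshness information falls through to the refresh_pattern heuristics:
>
>   HTTP/1.1 200 OK
>   Content-Type: image/gif
>
> while one with explicit instructions, like the FB front page, never
> reaches them:
>
>   HTTP/1.1 200 OK
>   Cache-Control: private, no-store, must-revalidate
>   Expires: Sat, 01 Jan 2000 00:00:00 GMT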
>
>
>>
>> As I noticed, every time you open a website, for example
>> www.mtv.com.lb, and then open it again the next day, you still get
>> yesterday's news. That confused me and made me think that maybe Squid
>> ignores all of a website's headers once you have cached, for example,
>> its pictures and multimedia objects. That is why I was asking which
>> rule might be affecting websites.
>>
>> I cannot spend my time adding a "cache deny" list for every website
>> that gets cached, so I thought of simply removing whichever rule
>> caused Squid to cache those websites.
>>
>> How can I stop the www.facebook.com main page headers (HTML/PHP)
>> from being cached, while still caching pictures, FLV videos, CSS,
>> and JS?
>
> With this config:
>   acl facebook dstdomain .facebook.com
>   acl facebookPages urlpath_regex -i \.([jm]?htm[l]?|php)(\?.*|$)
>   acl facebookPages urlpath_regex -i /(\?.*|$)
>   cache deny facebook facebookPages
>
> and remove all the refresh_patterns you had for FB content.
>
> This will cause any FB HTML objects which *might* have been cacheable
> to be skipped by your Squid cache.
>
> Note that FLV videos in FB often come directly from youtube, so are not
> easily cached. The JS and CSS will retain the static/dynamic properties they
> are assigned by FB. You have generic refresh_pattern rules later on in your
> config which extend their normal storage times a lot.
>
>>
>> refresh_pattern ^http:\/\/www\.facebook\.com$             0 0% 0
>>
>> I tried to use $ after .com because I only wanted to stop caching
>> the main page of Facebook, while still caching pictures and videos on
>> Facebook and on other websites.
>
> And I said the main page is not "http://www.facebook.com" but
> "http://www.facebook.com/"
>
> so you should have added "/$" instead of just "$".
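>
> i.e. something like:
>
>   refresh_pattern -i ^http:\/\/www\.facebook\.com\/$             0 0% 0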
>
> BUT, using "cache deny" as above, this becomes irrelevant anyway.
>
> Amos
>