Re: [squid-users] Cannot get content from msnbc that have # in UR

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Wed, 12 Nov 2008 18:31:55 +1300

Nicole wrote:
> On 11-Nov-08 My Secret NSA Wiretap Overheard Nicole Saying :
>>
>> Hello all
>>
>> I have started to receive complaints from people trying to get videos from
>> msnbc.com that use a # character in the URL.
>>
>> Such as:
>>
>> http://www.msnbc.msn.com/id/22425001/vp/27657223#27657223
>> http://www.msnbc.msn.com/id/22425001/vp/27652443#27652443
>>
>>
>> The access log shows that it is removing the pound sign and everything after.
>>
>> 7 TCP_MISS:DIRECT
>> 9.2.2.7 - - [11/Nov/2008:09:59:30 -0800] "GET
>> http://www.msnbc.msn.com/id/22425001/vp/27657223 HTTP/1.1" 200 477
>> TCP_MISS:DIRECT
>> 9.2.2.7 - - [11/Nov/2008:10:00:18 -0800] "GET
>> http://www.msnbc.msn.com/id/22425001/vp/27652443 HTTP/1.1" 200 477
>> TCP_MISS:DIRECT
>>
>>
>> I cannot see in my config why it would be truncating out the pound
>> character.
>>
>>
>> Any assistance greatly appreciated.
>>
>>
>
> In addition, I forgot to include:
> This seems to be true for Squid 2.6 and 2.7-stable5
>
>
> cache.log:
> 2008/11/11 16:33:28| Oversized chunk header on port 59375, url
> http://www.msnbc.msn.com/id/3036677
>
>
> This seems to be true on every browser I test. With the proxy enabled, the
> URL will not load. With the proxy disabled (in the browser), the URL loads.
>

Ah. Bingo.
This is a combination of two problems:
  1) the msnbc stream software is sending a chunked-encoded response to
Squid when it should not be.
  2) the hack in Squid-2 to cope with that bad behavior has a limit
on the header size it can handle.

You might have to use the Accept-Encoding hack on them:

  # Fix broken sites by removing Accept-Encoding header
  acl broken dstdomain ...
  header_access Accept-Encoding deny broken
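
For illustration only, a filled-in sketch of the lines above. It assumes
the only affected host is www.msnbc.msn.com (taken from the URLs quoted in
this thread), so adjust the dstdomain list to whichever sites actually
break for you. header_access is the Squid 2.6/2.7 directive; Squid 3.x
uses request_header_access for request headers, shown commented out:

  # Assumption: www.msnbc.msn.com is the only broken site
  acl broken dstdomain www.msnbc.msn.com
  # Squid 2.6 / 2.7: strip Accept-Encoding from requests to those sites
  header_access Accept-Encoding deny broken
  # Squid 3.x equivalent (use instead of the line above):
  # request_header_access Accept-Encoding deny broken

Run "squid -k reconfigure" after editing squid.conf so the change takes
effect.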

PS. An upgrade to the 3.1 beta might also be an option for you.

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE5 or 3.0.STABLE10
   Current Beta Squid 3.1.0.2
Received on Wed Nov 12 2008 - 05:32:00 MST
