Re: your suggestion for range_offset_limit

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Thu, 26 Nov 2009 17:32:02 +1300

Matthew Morgan wrote:
> On Wed, Nov 25, 2009 at 7:09 PM, Amos Jeffries <squid3_at_treenet.co.nz> wrote:
>> Matthew Morgan wrote:
>>> Sorry it's taking me so long to get this done, but I do have a question.
>>>
>>> You suggested making getRangeOffsetLimit a member of HttpReply. There are
>>> two places where this method currently needs to be called: one is
>>> CheckQuickAbort2() in store_client.cc. This one will be easy, as I can just
>>> do entry->getReply()->getRangeOffsetLimit().
>>>
>>> The other is HttpStateData::decideIfWeDoRanges in http.cc. Here, all we
>>> have access to is an HttpRequest object. I looked through the source to see
>>> if I could find where a request owned or had access to a reply, but I don't
>>> see anything like that. If getRangeOffsetLimit were a member of HttpReply,
>>> what do you suggest doing here? I could make a static version of the
>>> method, but that wouldn't allow caching the result.
>> Ah. I see. Quite right.
>>
>> After a bit more thought I find my original request a bit weird.
>>
>> Yes it should be a _Request_ member and do its caching there. You can go
>> ahead with that now while we discuss whether to do a slight tweak on top of
>> the basic feature.
>>
>>
>> [cc'ing squid-dev so others can provide input]
>>
>> I'm not certain of the behavior we want here if we do open the ACLs to reply
>> details. Some discussion is in order.
>>
>> The simple way would be to not cache the lookup the first time, when
>> reply details are not yet provided.
>>
>> It would mean making it return potentially two different values across the
>> transaction.
>>
>> 1) first, based only on request details, to decide if a range request
>> is possible, and then
>> 2) based on additional reply details, to see if the abort could be done.
>>
>> No problem if the reply details cause an increase in the limit. But if they
>> restrict it we enter grounds of potentially making a request then canceling
>> it and being unable to store the results.
>>
>>
>> Or, taking the maximum of the two across two calls, so it can only
>> increase? That would be slightly trickier, involving a flag as well to
>> short-circuit the reply lookups instead of just a magic cache value.
>>
>> Am I seriously over-thinking things today?
>>
>>
>> Amos
>
> Here's a question, too: is this feature going to benefit anyone? I
> realized later that it will not solve my problem, because all the
> traffic that was getting force downloaded ended up being from windows
> updates. The urls showing up in netstat and such were just weird
> because the windows update traffic was actually coming from limelight.
> My ultimate solution was to write a script that reads access.log,
> checks for windows update urls that are not cached, and manually
> downloads them one at a time after hours.
>
> If there is anyone at all who would benefit from this I would still be
> *more* than glad to code it (as I said, it would be my first real open
> source contribution...very exciting), but I just wondered if anyone
> will actually use it.

I believe people will find more control here useful.

Windows update service packs are a big reason, but there are also
similar range issues with Adobe Reader online PDFs, google maps/earth,
and flash videos when paused/resumed. Potentially other things too, but
I have not heard of problems there.

This will allow anyone to fine-tune the places where ranges are
permitted or forced to fully cache, avoiding the problems a blanket
limit adds.
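For example, the kind of fine-tuning this would enable might look like
the following squid.conf fragment (hypothetical syntax sketch; the exact
directive form with an ACL list is what is being discussed here, and the
domain names are illustrative):

```
# Fetch whole objects (no range limit) for Windows Update traffic,
# so aborted partial downloads still end up fully cached:
acl wu dstdomain .windowsupdate.com .update.microsoft.com
range_offset_limit -1 wu

# Everything else keeps the default: never fetch more than requested.
range_offset_limit 0
```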

>
> As to which approach would be better, I don't know enough about that
> data path to really suggest. When I initially made my changes, I just
> replaced each reference to Config.range_offset_limit or whatever.
> Today I went back and read some more of the code, but I'm still
> figuring it out. How often would the limit change based on the
> request vs. the reply?

Just the once: the first time it is checked against the reply. And most
likely only in the case of testing a reply mime type; the other useful
details I can think of are all request data.

You can ignore it if you like; I'm just worrying over a borderline case.
Someone else can code a fix if they find it a problem or need to do mime
checks.

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE7 or 3.0.STABLE20
   Current Beta Squid 3.1.0.15
Received on Thu Nov 26 2009 - 04:32:26 MST
