Re: your suggestion for range_offset_limit from Matthew Morgan on 2009-11-27 (squid-dev)

From: Matthew Morgan <lytithwyn_at_gmail.com>
Date: Fri, 27 Nov 2009 10:49:15 -0500

Amos Jeffries wrote:
> Matthew Morgan wrote:
>> On Wed, Nov 25, 2009 at 7:09 PM, Amos Jeffries <squid3_at_treenet.co.nz>
>> wrote:
>>> Matthew Morgan wrote:
>>>> Sorry it's taking me so long to get this done, but I do have a question.
>>>>
>>>> You suggested making getRangeOffsetLimit a member of HttpReply. There are
>>>> two places where this method currently needs to be called: one is
>>>> CheckQuickAbort2() in store_client.cc. This one will be easy, as I can just
>>>> do entry->getReply()->getRangeOffsetLimit().
>>>>
>>>> The other is HttpStateData::decideIfWeDoRanges in http.cc. Here, all we
>>>> have access to is an HttpRequest object. I looked through the source to see
>>>> if I could find where a request owned or had access to a reply, but I don't
>>>> see anything like that. If getRangeOffsetLimit were a member of HttpReply,
>>>> what do you suggest doing here? I could make a static version of the
>>>> method, but that wouldn't allow caching the result.
>>> Ah. I see. Quite right.
>>>
>>> After a bit more thought I find my original request a bit weird.
>>>
>>> Yes it should be a _Request_ member and do its caching there. You can go
>>> ahead with that now while we discuss whether to do a slight tweak on top of
>>> the basic feature.
>>>
>>>
>>> [cc'ing squid-dev so others can provide input]
>>>
>>> I'm not certain of the behavior we want here if we do open the ACLs to reply
>>> details. Some discussion is in order.
>>>
>>> Simple way would be to not cache the lookup the first time when reply
>>> details are not provided.
>>>
>>> It would mean making it return potentially two different values across the
>>> transaction:
>>>
>>> 1) based on only request details, to decide if a range request is possible,
>>> and then
>>> 2) based on additional reply details, to see if the abort could be done.
>>>
>>> No problem if the reply details cause an increase in the limit. But if they
>>> restrict it we enter grounds of potentially making a request then canceling
>>> it and being unable to store the results.
>>>
>>>
>>> Or, taking the maximum of the two across the two calls, so it can only
>>> increase? That would be slightly trickier, involving a flag as well to
>>> short-circuit the reply lookups instead of just a magic cache value.
>>>
>>> Am I seriously over-thinking things today?
>>>
>>>
>>> Amos
>>
>> Here's a question, too: is this feature going to benefit anyone? I
>> realized later that it will not solve my problem, because all the
>> traffic that was getting force downloaded ended up being from Windows
>> updates. The URLs showing up in netstat and such were just weird
>> because the Windows Update traffic was actually coming from Limelight.
>> My ultimate solution was to write a script that reads access.log,
>> checks for Windows Update URLs that are not cached, and manually
>> downloads them one at a time after hours.
>>
>> If there is anyone at all who would benefit from this I would still be
>> *more* than glad to code it (as I said, it would be my first real open
>> source contribution...very exciting), but I just wondered if anyone
>> will actually use it.
>
> I believe people will find more control here useful.
>
> Windows update service packs are a big reason, but there are also
> similar range issues with Adobe Reader online PDFs, google maps/earth,
> and flash videos when paused/resumed. Potentially other stuff, but I
> have not heard of problems.
>
> This will allow anyone to fine tune the places where ranges are
> permitted or forced to fully cache, avoiding the problems a blanket
> limit adds.
>
>>
>> As to which approach would be better, I don't know enough about that
>> data path to really suggest. When I initially made my changes, I just
>> replaced each reference to Config.range_offset_limit or whatever.
>> Today I went back and read some more of the code, but I'm still
>> figuring it out. How often would the limit change based on the
>> request vs. the reply?
>
> Just the once, the first time it is checked for the reply, and
> most likely in the case of testing for a reply mime type. The other
> useful info I can think of is all request data.
>
> You can ignore if you like. I'm just worrying over a borderline case.
> Someone else can code a fix if they find it a problem or need to do mime
> checks.
>
> Amos

Great! I'll see what I can do. I really appreciate the help and encouragement!
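
For reference, here is a rough sketch of the request-side caching discussed above, including the "maximum of the two, with a flag" variant. Everything in it is an assumption for illustration: RequestSketch, ReplyInfo, and the Lookup callback are hypothetical stand-ins, not Squid's actual HttpRequest, HttpReply, or ACL code, and the limit is treated as a plain byte count without modelling any special "no limit" value.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical stand-in for the reply details an ACL might inspect.
// This is NOT Squid's HttpReply.
struct ReplyInfo {
    bool isVideo = false;
};

// Hypothetical stand-in for a request object that caches its own limit.
// This is NOT Squid's HttpRequest.
class RequestSketch {
public:
    // Stands in for the ACL-driven limit lookup. It receives the reply
    // details, or nullptr when only request details are available.
    using Lookup = std::function<std::int64_t(const ReplyInfo *)>;

    explicit RequestSketch(Lookup lookup) : lookup_(std::move(lookup)) {}

    // First call: look up the limit from request details and cache it.
    // First call with a reply: look up once more and keep the larger value,
    // so the limit can only increase during the transaction.
    // The two flags short-circuit any further lookups.
    std::int64_t getRangeOffsetLimit(const ReplyInfo *reply = nullptr) {
        if (!requestChecked_) {
            limit_ = lookup_(nullptr);
            requestChecked_ = true;
        }
        if (reply != nullptr && !replyChecked_) {
            limit_ = std::max(limit_, lookup_(reply));
            replyChecked_ = true;
        }
        return limit_;
    }

private:
    Lookup lookup_;
    std::int64_t limit_ = 0;
    bool requestChecked_ = false; // request-only lookup already done
    bool replyChecked_ = false;   // reply-aware lookup already done
};
```

With something of this shape, the request-only call site (decideIfWeDoRanges) would presumably call getRangeOffsetLimit() with no argument, and the abort check (CheckQuickAbort2) would pass the reply once one exists. Because the second lookup can only raise the cached value, a range decision made from request details alone is never undercut later.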
Received on Sat Nov 28 2009 - 02:15:14 MST
