Re: [squid-users] Re: ICP and HTCP and StoreID

From: Nikolai Gorchilov <niki_at_x3me.net>
Date: Fri, 14 Feb 2014 13:47:02 +0200

OK, Amos. Completely agree with your points. I didn't want to enter
into such a lengthy discussion regarding a small optional feature that
brings little CPU optimisation. As I said earlier, I don't mind
rewriting the same URL twice (once on HTCP, then on HTTP request). Peace!
:-)

Let's discuss working solutions for:
a/ No StoreID is used outside Squid
b/ StoreID normalization on incoming ICP/HTCP requests
c/ false-negative HTTP revalidation

Best,
Niki

On Fri, Feb 14, 2014 at 5:20 AM, Amos Jeffries <squid3_at_treenet.co.nz> wrote:
> On 14/02/2014 2:20 p.m., Nikolai Gorchilov wrote:
>> On Fri, Feb 14, 2014 at 2:04 AM, Amos Jeffries <squid3_at_treenet.co.nz> wrote:
>>> On 2014-02-14 09:04, Alex Rousskov wrote:
>>>>
>>>> On 02/13/2014 05:11 AM, Nikolai Gorchilov wrote:
>>>>
>>>>> I'd suggest first to review all possible StoreID use cases involving
>>>>> cache peers before proceeding further.
>>>>>
>>>>> Let's define A as the originating proxy and B as the next-hop proxy in
>>>>> the request forwarding chain. UDP is an alias for either an ICP or HTCP
>>>>> query, while TCP is a synonym for the HTTP request that follows.
>>>>>
>>>>> Here are all valid usage scenarios I could think of:
>>>>> 1. A & B use same StoreID rewriting logic
>>>>> - No StoreID processing for incoming UDP on B is necessary
>>>>> - UDP request uses StoreID
>>>>> - TCP request uses URL
>>>>> 2. A & B use different StoreID rewriting logic
>>>>> - StoreID processing on incoming UDP on B
>>>>> - UDP request uses URL
>>>>> - TCP request uses URL
>>>>> 3. A with StoreID enabled, B - disabled
>>>>> - UDP request uses URL
>>>>> - TCP request uses URL
>>>>> 4. A with StoreID disabled, B - enabled
>>>>> - StoreID processing on incoming UDP on B
>>>>> - UDP request uses URL
>>>>> - TCP request uses URL
>>>>>
>>>>> In order to support all of the above we need the following two config
>>>>> options:
>>>>> - configuration switch to enable or disable StoreID processing on
>>>>> incoming UDP
>>>>> - cache_peer option to enable/disable querying the respective peer
>>>>> using StoreID instead of URL
>>>>
>>>>
>>>>
>>>>> If you see any rifts in the above logic, please say.
>>>>
>>>>
>>>> I question the value of supporting the implied "no StoreID processing"
>>>> optimization above. AFAICT, if Squid always uses URLs for anything
>>>> outside internal storage, everything would work correctly and all use
>>>> cases will be supported well, without any additional options.
>>>>
>>>> If somebody wants to extend ICP/HTCP to include StoreId in the request
>>>> (as an optional additional field), they may do so, but that optional
>>>> optimization does not change the overall design principle: StoreId for
>>>> the internal storage; URL for everything else.
>>>
>>>
>>> Exactly.
>>>
>>>
>>> Keeping two distinct cache_peer internal index representations in-sync with
>>> regards to how some helper service is producing the IDs is not as trivial a
>>> job as implied by the proposal.
>>> Consider the process of upgrading either Squid or the helper on server A
>>> simply *10 seconds* earlier than server B. For that period one of the
>>> services may be pushing garbage cache IDs into the other. In that same time
>>> the latest Squid could process several thousand requests - not exactly a
>>> trivial amount of cache churn.
>>
>> UDP requests don't push anything. They just check if the peer has an
>> object. If wrong (not in sync) cache ID is used - not a big deal.
>> UDP_MISS response will be generated. And the originating peer will
>> decide what to do next.
>
> But during this period there will be that huge amount of false-negative
> results. Causing a desync in the frontend proxy as it believes either
> that the object is not cached (adding to its own cache and bumping out
> other existing content), or to fetch via some other route (possibly
> causing cache of alternative path to churn).
>
> Either way it's a waste of resources and work just so a small
> optimization can take place in ICP/HTCP packet handling. Since chances
> are high that the expensive store-ID lookup in the peer will be
> short-circuited by the helper response cache anyway.
>
>
>>
>>> Also, the connection between those peers is not necessarily a direct 1-hop
>>> connection. It may involve any kind of HTTP interception software
>>> (firewalls, deep packet inspectors, etc) overlooked by even the most well
>>> intended administrator.
>>
>> We're talking ICP/HTCP here. HTTP request shall always go with URL....
>
> You just made the mistake of assuming "HTTP interception software" means
> TCP. It does not.
>
> HTTP is transported over both TCP and UDP. HTCP for example has full
> headers and is used at times for cache invalidation. Then there is the
> COAP protocols.
>
>>
>> I really don't understand your logic. Both you and Alex seem to be OK
>> with the fact Squid is using StoreID during HTTP with cache peers
>> (let's call it "known limitation"), but using StoreID for ICP/HTCP
>> queries is considered a bug that needs a fix.
>
> No. We are both *not okay* with using StoreID for the HTTP requests
> between peers.
>
> Alex said the overall design principle:
> "StoreId for the internal storage; URL for everything else."
>
>
> "internal storage" != HTTP.
>
>
>>
>> For me it's quite the opposite - StoreID over HTTP shall be fixed
>> ASAP, StoreID over ICP/HTCP shall be considered "known limitation".
>
> There is no "StoreID over X", never was.
>
> StoreID leaving the Squid instance in traffic is a bug.
>
> The known limitation for the StoreID model is that it leads to a high
> false-negative rate for HTTP revalidation.
> It causes a disconnect between the original request used to cache the
> object and the current request. So ETag header for the cached object does
> not always match the current requested URL and causes a refresh update
> with new content.
>
> Amos
>
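
The principle quoted above ("StoreId for the internal storage; URL for
everything else") and the false-negative concern can be illustrated with a
small sketch. This is not Squid code: the Proxy class, both store_id_*
rules, and the example.com hosts are hypothetical names made up for the
illustration. It assumes peer B normalizes an incoming URL with its own
rule, and compares that with A sending its own StoreID while the two
helpers are briefly out of sync.

```python
import re

def store_id_v1(url):
    """Hypothetical StoreID rule: collapse numbered mirror hosts into one key."""
    return re.sub(r"^http://cdn\d+\.example\.com/",
                  "http://cdn.example.internal/", url)

def store_id_v2(url):
    """Hypothetical rule after a helper upgrade: same idea, different key format."""
    return re.sub(r"^http://cdn\d+\.example\.com/",
                  "http://example-cdn.squid.internal/", url)

class Proxy:
    def __init__(self, store_id_fn):
        self.store_id_fn = store_id_fn
        self.cache = {}  # internal storage, keyed by StoreID only

    def store(self, url, body):
        self.cache[self.store_id_fn(url)] = body

    def udp_query_by_url(self, url):
        """Query carries the URL; the receiver normalizes it locally."""
        return "HIT" if self.store_id_fn(url) in self.cache else "MISS"

    def udp_query_by_key(self, key):
        """Query carries the sender's StoreID verbatim (the debated shortcut)."""
        return "HIT" if key in self.cache else "MISS"

a = Proxy(store_id_v2)  # helper already upgraded
b = Proxy(store_id_v1)  # helper not yet upgraded
b.store("http://cdn1.example.com/video.mp4", b"...")

url = "http://cdn7.example.com/video.mp4"
by_url = b.udp_query_by_url(url)                 # B applies its own rule
by_key = b.udp_query_by_key(a.store_id_fn(url))  # A's key, unknown to B
print(by_url, by_key)
```

With the URL on the wire, B still finds the object even though the two
helpers disagree; with A's StoreID on the wire, B reports a miss for
content it actually holds.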
Received on Fri Feb 14 2014 - 11:47:52 MST
