Re: [squid-users] Re: ICP and HTCP and StoreID from Amos Jeffries on 2014-02-13 (squid-users)

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Fri, 14 Feb 2014 16:20:21 +1300

On 14/02/2014 2:20 p.m., Nikolai Gorchilov wrote:
> On Fri, Feb 14, 2014 at 2:04 AM, Amos Jeffries <squid3_at_treenet.co.nz> wrote:
>> On 2014-02-14 09:04, Alex Rousskov wrote:
>>>
>>> On 02/13/2014 05:11 AM, Nikolai Gorchilov wrote:
>>>
>>>> I'd suggest first to review all possible StoreID use cases involving
>>>> cache peers before proceeding further.
>>>>
>>>> Let's define A as the originating proxy and B as the next-hop proxy in
>>>> the request forwarding chain. "UDP" stands for either an ICP or HTCP query,
>>>> while "TCP" stands for the subsequent HTTP request.
>>>>
>>>> Here are all valid usage scenarios I could think of:
>>>> 1. A & B use the same StoreID rewriting logic
>>>> - No StoreID processing for incoming UDP on B is necessary
>>>> - UDP request uses StoreID
>>>> - TCP request uses URL
>>>> 2. A & B use different StoreID rewriting logic
>>>> - StoreID processing on incoming UDP on B
>>>> - UDP request uses URL
>>>> - TCP request uses URL
>>>> 3. A with StoreID enabled, B with it disabled
>>>> - UDP request uses URL
>>>> - TCP request uses URL
>>>> 4. A with StoreID disabled, B with it enabled
>>>> - StoreID processing on incoming UDP on B
>>>> - UDP request uses URL
>>>> - TCP request uses URL
>>>>
>>>> In order to support all of the above we need the following two config
>>>> options:
>>>> - configuration switch to enable or disable StoreID processing on
>>>> incoming UDP
>>>> - cache_peer option to enable/disable querying the respective peer
>>>> using StoreID instead of URL
>>>
>>>
>>>
>>>> If you see any flaws in the above logic, please say.
>>>
>>>
>>> I question the value of supporting the implied "no StoreID processing"
>>> optimization above. AFAICT, if Squid always uses URLs for anything
>>> outside internal storage, everything would work correctly and all use
>>> cases would be supported well, without any additional options.
>>>
>>> If somebody wants to extend ICP/HTCP to include StoreId in the request
>>> (as an optional additional field), they may do so, but that optional
>>> optimization does not change the overall design principle: StoreId for
>>> the internal storage; URL for everything else.
>>
>>
>> Exactly.
>>
>>
>> Keeping two distinct cache_peer internal index representations in sync with
>> regard to how some helper service is producing the IDs is not as trivial a
>> job as implied by the proposal.
>> Consider the process of upgrading either Squid or the helper on server A
>> simply *10 seconds* earlier than server B. For that period one of the
>> services may be pushing garbage cache IDs into the other. In that same time
>> the latest Squid could process several thousand requests - not exactly a
>> trivial amount of cache churn.
>
> UDP requests don't push anything. They just check whether the peer has an
> object. If a wrong (out-of-sync) cache ID is used, it's not a big deal:
> a UDP_MISS response will be generated, and the originating peer will
> decide what to do next.

But during this period there will be a huge number of false-negative
results, causing a desync in the frontend proxy: it believes the object
is not cached and either adds it to its own cache (bumping out other
existing content) or fetches it via some other route (possibly causing
the cache on that alternative path to churn).

Either way it's a waste of resources and work, just so a small
optimization can take place in ICP/HTCP packet handling. The saving is
small anyway, since chances are high that the expensive store-ID lookup
in the peer will be short-circuited by the helper response cache.
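A toy model of why the per-query cost is small. This sketch assumes the peer memoizes StoreID helper answers per URL, modelled here with Python's `functools.lru_cache` rather than Squid's actual helper cache; `store_id_helper`, `cached_store_id`, the rewrite rule and the URLs are all hypothetical. Once a URL has been seen, answering a URL-based ICP/HTCP query costs a table lookup, not a helper round trip.

```python
import functools

# Counts how often the (hypothetical) helper is actually consulted.
helper_calls = 0

def store_id_helper(url: str) -> str:
    """Stand-in for a round trip to an external StoreID helper process.
    The rewrite rule (drop the query string) is purely illustrative."""
    global helper_calls
    helper_calls += 1
    return url.split("?", 1)[0]

@functools.lru_cache(maxsize=4096)
def cached_store_id(url: str) -> str:
    # Memoize helper answers so repeat lookups skip the helper entirely.
    return store_id_helper(url)

# 1500 incoming queries, but only two distinct URLs among them.
queries = (["http://mirror1.example.com/pkg.tar?a=1"] * 1000
           + ["http://mirror1.example.com/doc.html"] * 500)
for url in queries:
    cached_store_id(url)

print(helper_calls)                       # 2
print(cached_store_id.cache_info().hits)  # 1498
```

Real hit rates depend on traffic mix and cache size; the point is only that repeated URLs rarely reach the helper.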

>
>> Also, the connection between those peers is not necessarily a direct 1-hop
>> connection. It may involve any kind of HTTP interception software
>> (firewalls, deep packet inspectors, etc.) overlooked by even the most
>> well-intentioned administrator.
>
> We're talking ICP/HTCP here. HTTP requests shall always go with the URL...

You just made the mistake of assuming "HTTP interception software" means
TCP. It does not.

HTTP is transported over both TCP and UDP. HTCP, for example, carries
full HTTP headers and is at times used for cache invalidation. Then
there is the CoAP protocol.

>
> I really don't understand your logic. Both you and Alex seem to be OK
> with the fact that Squid uses StoreID during HTTP exchanges with cache
> peers (let's call it a "known limitation"), but using StoreID for
> ICP/HTCP queries is considered a bug that needs a fix.

No. We are both *not okay* with using StoreID for the HTTP requests
between peers.

Alex stated the overall design principle:
"StoreId for the internal storage; URL for everything else."

"internal storage" != HTTP.
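To make the principle concrete, here is a rough sketch that replays Nikolai's four scenarios. It assumes two hypothetical rewrite rules (`to_mirrors`, `to_cdn`) in place of real `store_id_program` helpers and models each proxy as a set of storage keys; the `Proxy` class and URLs are illustrative only. B has cached mirror1's copy and A asks about mirror2's. When the query carries the URL, B applies its own mapping (or none) and answers exactly as it would for a direct client request; when it carries A's StoreID, B misses objects it does hold.

```python
import re
from typing import Callable, Dict, Optional, Tuple

Rewrite = Optional[Callable[[str], str]]

# Hypothetical StoreID rewrite rules standing in for helper programs.
def to_mirrors(url: str) -> str:
    return re.sub(r"^http://mirror\d+\.example\.com/", "http://mirrors.squid.internal/", url)

def to_cdn(url: str) -> str:
    return re.sub(r"^http://mirror\d+\.example\.com/", "http://cdn.squid.internal/", url)

class Proxy:
    def __init__(self, rewrite: Rewrite = None):
        self.rewrite = rewrite
        self.index = set()  # local storage, keyed by StoreID (or URL if disabled)

    def store_id(self, url: str) -> str:
        # StoreID is only ever used to key this proxy's own storage.
        return self.rewrite(url) if self.rewrite else url

    def cache(self, url: str) -> None:
        self.index.add(self.store_id(url))

    def answer_url_query(self, url: str) -> bool:
        # Query carries the URL; this proxy applies its own mapping.
        return self.store_id(url) in self.index

    def answer_key_query(self, key: str) -> bool:
        # Query carries the sender's StoreID; looked up as-is.
        return key in self.index

CACHED = "http://mirror1.example.com/pkg.tar"  # what B holds
WANTED = "http://mirror2.example.com/pkg.tar"  # what A is asked for

SCENARIOS: Dict[str, Tuple[Rewrite, Rewrite]] = {
    "1 same logic": (to_mirrors, to_mirrors),
    "2 different logic": (to_mirrors, to_cdn),
    "3 A on, B off": (to_mirrors, None),
    "4 A off, B on": (None, to_mirrors),
}

def run() -> Dict[str, Tuple[bool, bool]]:
    results = {}
    for name, (rewrite_a, rewrite_b) in SCENARIOS.items():
        a, b = Proxy(rewrite_a), Proxy(rewrite_b)
        b.cache(CACHED)
        by_url = b.answer_url_query(WANTED)
        by_key = b.answer_key_query(a.store_id(WANTED))
        results[name] = (by_url, by_key)
    return results

for name, (by_url, by_key) in run().items():
    print(f"{name}: URL query hit={by_url}, StoreID query hit={by_key}")
```

The URL query hits in scenarios 1, 2 and 4; in scenario 3 a miss is the correct answer, since B (StoreID disabled) never stored anything under mirror2's URL. Sending A's StoreID gives false negatives in scenarios 2 and 4, and the URL variant needs neither of the two proposed options.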

>
> For me it's quite the opposite: StoreID over HTTP shall be fixed
> ASAP, and StoreID over ICP/HTCP shall be considered a "known limitation".

There is no "StoreID over X", and there never was.

StoreID leaving the Squid instance in traffic is a bug.

The known limitation of the StoreID model is that it leads to a high
false-negative rate for HTTP revalidation.
It creates a disconnect between the original request used to cache the
object and the current request, so the ETag of the cached object does
not always match the currently requested URL, and revalidation triggers
a refresh with new content.
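A toy model of that revalidation problem, under stated assumptions: two hypothetical mirrors serve the same file with different ETags, a hypothetical `store_id()` rule maps both to one storage key, and the cache sends the stored ETag as `If-None-Match` to whichever mirror the current request names. Only a request to the mirror the stored copy came from earns a 304; alternating mirrors turns each revalidation into a full 200 refresh.

```python
# Hypothetical origin data: two mirrors serve the same file, but each
# mirror assigns its own ETag, which HTTP allows since ETags are per URL.
ORIGIN_ETAGS = {
    "http://mirror1.example.com/pkg.tar": '"m1-abc"',
    "http://mirror2.example.com/pkg.tar": '"m2-xyz"',
}

def origin_get(url, if_none_match=None):
    """Toy origin: 304 if the validator matches this URL's ETag, else 200."""
    etag = ORIGIN_ETAGS[url]
    return (304 if if_none_match == etag else 200), etag

def store_id(url):
    # Hypothetical StoreID rule: every mirror maps to one shared key.
    return "http://mirrors.squid.internal/" + url.rsplit("/", 1)[1]

cache = {}  # StoreID -> ETag of the stored copy

def fetch(url):
    key = store_id(url)
    status, etag = origin_get(url, cache.get(key))
    if status == 200:
        cache[key] = etag  # full body re-downloaded and stored again
    return status

statuses = [fetch(u) for u in [
    "http://mirror1.example.com/pkg.tar",
    "http://mirror1.example.com/pkg.tar",
    "http://mirror2.example.com/pkg.tar",
    "http://mirror1.example.com/pkg.tar",
]]
print(statuses)  # [200, 304, 200, 200]
```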

Amos
Received on Fri Feb 14 2014 - 03:20:31 MST