Re: [PATCH] experiment in private caching

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Tue, 13 May 2014 22:39:29 +1200

On 13/05/2014 9:04 a.m., Alex Rousskov wrote:
> On 05/11/2014 05:29 AM, Amos Jeffries wrote:
>> This patch seeks to reduce the constant MISS situation we have from
>> false failures doing Host: header verify.
>
> General comments:
>
> FWIW, I have a general feeling we are going to regret the proposed
> change, but I do not have enough specific reasons to object to it. If
> verification failures are so prevalent that we need this change, I
> wonder whether the verification itself is such a good idea...

So far it is just a few CDN networks (Google and Akamai
predominantly). Unfortunately they are also popular, and the kind where
an attacker can host their server as easily as the victim can. Thus
similar-IP is no protection at all.

These CDNs are so large that AFAICT they use an entire /24 worth of IPs
per end-user network yet return only 10-20 of them in any given DNS
lookup, leaving ~230-240 IPs "unverified". If the admin does not follow
the recommendation of ensuring Squid and the clients use the same
recursive resolvers, this explodes to a full /16 or more. Clients using
Google's recursive resolvers get shunted randomly all over the world
depending on their resolver exit point, which plays havoc with geo-DNS
based CDN responses like these.

>
> One of the problems with this patch is that it implicitly adds a second,
> optional store key to a transaction. This may look innocent enough
> because the code changes are simple, but I suspect it may create
> significant Store API/design headaches long-term, in an area where the
> current Store API is already quite bad. Below, I will suggest a
> different implementation vector that does not solve all the problems but
> keeps Store API principles intact.
>
> Performance concerns: To determine whether the requested object is in
> the cache, SMP Store requires semi-expensive lookups (one for each SMP
> cache_dir and one for the shared cache_mem). AFAICT, this patch doubles
> the number of such lookups per typical miss response due to a new
> http.cc loop. The implementation using existing Store APIs (suggested
> below) avoids this additional overhead.

There are also large(r) performance issues with the Store-ID approach.

Using Store-ID it becomes impossible for unverified requests to ever
HIT on existing content cached from verified sources. That actively
reduces both the HIT ratio and the safety of the responses, a HIT being
safer than a MISS in these cases.

In theory the approach I picked allows a higher overall HIT ratio (see
the sketch below):
 * best case, all unverified traffic gets HITs on already-cached
verified content.
 * worst case, all of it MISSes for the unverified IPs and the ratio
drops to one cached object/variant per unique dstIP, which is the best
the Store-ID approach can ever achieve.

>
> If somebody is complaining about these misses now, I bet somebody will
> later ask you to merge several destination IPs together, to further
> reduce miss ratio in cases where _multiple_ destination IPs correspond
> to the same host and all fail validation. I do not like using "slippery
> slope" arguments so I am not using this as an objection, but do keep
> such possible requests in mind as you evaluate different design choices.
>

Oh yes, that one has already come up. See top comment.

>
> Specific comments follow.
>
>> + /// An additional context to add to the store-ID key for security.
>> + String storeSecurityKey;
>
> Can we reuse a part of the existing Store ID API here, instead of adding
> one more way to form a cache key? That is, instead of appending
> storeSecurityKey to the computed key hash, can we just make it a part of
> the stored Store ID value itself?

We can. But see above. I think we would end up with a worse HIT ratio
than we have right now.

>
> Reusing Store ID will not allow requests that fail validation to hit
> cached entries that correspond to successfully validated requests, but
> (a) that could be a good thing in some cases and (b) I wonder whether
> that is a small price to pay for avoiding new uncertainties related to a
> new way to generate a store key?

Regarding (a) - which cases?

>
> Today there are two major store key paths:
>
> 1. From store_id, if any.
> 2. From Request URI if there is no store_id.
>
> If you adjust your patch to use Store ID API, we will have:
>
> 1. From store_id, if any.
> 2. From Request URI if there is no store_id.
>
> No changes!
>
> Today there is one way to get store_id set:
>
> i. If there is store_id helper:
> From store_id helper response.
>
> If you adjust your patch to use Store ID API, we will have:
>
> i. If there is store_id helper:
> From store_id helper response,
> with storeSecurityKey appended when needed.
>
> ii. If there is no store_id helper:
> From Request URI with storeSecurityKey appended
> (always needed if store ID is set without the store_id helper)
>
> The added benefit of the Store ID API approach is that the
> storeSecurityKey information may not even need to go into the
> HttpRequest in this case. You may be able to keep it inside
> ClientHttpRequest.
>
> I am sure the above sketch is missing some details, but I think it can
> work. What do you think?

I know it can work. It is just that this is being driven by the "higher
HIT ratio" group, so options that decrease the HIT ratio are not
acceptable, and we may as well do nothing if those are the only options
available. Personally I am happy not adding any of this patch.
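
For concreteness, the Store-ID route sketched above amounts to roughly
this, written as a standalone function (made-up names, not the real
code paths):

#include <iostream>
#include <string>

// helperStoreId is empty when no store_id helper is configured or it
// gave no answer; securitySuffix is empty when Host verification passed.
static std::string effectiveStoreId(const std::string &requestUri,
                                    const std::string &helperStoreId,
                                    const std::string &securitySuffix)
{
    // i. helper response, with the suffix appended when verification failed
    if (!helperStoreId.empty())
        return helperStoreId + securitySuffix;

    // ii. no helper: a store ID is only needed at all when a suffix is needed
    if (!securitySuffix.empty())
        return requestUri + securitySuffix;

    return std::string(); // no store ID; key comes from the request URI as today
}

int main()
{
    // failed verification, no helper: request URI + suffix
    std::cout << effectiveStoreId("http://example.com/x", "", "|192.0.2.1:80") << "\n";
    // failed verification, helper answered: helper ID + suffix
    std::cout << effectiveStoreId("http://example.com/x", "http://cdn.example/x", "|192.0.2.1:80") << "\n";
    return 0;
}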

>
> BTW, are you sure that serving content cached by regular requests to
> requests that fail validation is always desirable? Is it possible that
> some failed-validation requests will actually get to a different/newer
> version of the site (e.g., when DNS changed but Squid does not know
> about that yet)? If you are not sure, please note that the patch will
> introduce breakage in these cases that does not exist today. Using the
> Store ID approach avoids that AFAICT.

That is possible without this patch. If it is a problem then either the
site's cache controls are broken or Squid has been explicitly
configured with refresh_pattern abuses along the lines shown below.
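
By "abuses" I mean overrides that force reuse regardless of what the
origin says, for example (illustrative squid.conf fragment, not a
recommendation):

  # force caching and reuse regardless of origin cache controls
  refresh_pattern . 0 80% 10080 override-expire override-lastmod ignore-reload ignore-no-store ignore-private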

I am skipping the code specifics on a few items below while we discuss
the design. One more comment at the end...

>
>> + /// An additional context to add to the store-ID key for security.
>> + String storeSecurityKey;
>
> Please consider using a less grandiose name for this data member. This
> is not really store key or a key to store security. Something like
> storeIdSuffix may work better.
>
>
>> + // add a security context (the original dst IP) to the cache key
>
> Please note that you are not using just the destination IP address. You
> are using both the IP address and the port number. That is probably the
> right thing to do, but the documentation should reflect what you do.
>
>
>> +/* XXX
>> // NP: it is tempting to use 'flags.noCache' but that is all about READing cache data.
>> // The problems here are about WRITE for new cache content, which means flags.cachable
>> http->request->flags.cachable = false; // MUST NOT cache (for now)
>> - // XXX: when we have updated the cache key to base on raw-IP + URI this cacheable limit can go.
>> +*/
>
> This needs polishing, of course. Replacing the above with a comment that
> we no longer set flags.cachable because we allow destination-specific
> caching is probably sufficient. There will be no XXX here if your
> changes are committed.
>
>
>> The expected behaviour by Squid for the troublesome servers is the same
>> as if they were emitting broken Vary: header. Squid will cache a variant
>> for each destination IP that fails verification, with a mix of HIT/MISS,
>> until one suddenly passes and all traffic becomes HITs on the
>> non-varying entry.
>
>> cf.data.pre:
>> When set to OFF (the default):
>> Squid allows suspicious requests to continue but logs a
>> security warning and blocks caching of the response.
>
> Please do not forget to update the above documentation.
>
>
>> + // storeGetPublicByRequest() returns best-match from a set of entries.
>> + // So we loop to invalidate all the entries it can find.
>> + do {
>
> Is this to allow requests that failed validation to purge responses to
> any request that did not fail validation? For example,
>
> a) Validated request for google IP with Host: google.com
> b) "Invalid" request for badguys IP with Host: google.com
>
> Will a response to (b) purge the already cached response to (a)? Did
> unpatched Squid suffer from that kind of problem as well?
>
>
>> NOTE: due to my sparse level of knowledge about the store API complexity
>> I am needing help testing that the patch attached actually does the
>> above behaviour.
>> Specifically that it does not cache one of the suspect objects in a way
>> that leaves it able to be HIT on without the dst-IP security key.
>
> Personally, I cannot give assurances about Store API changes because
> Squid creates and uses store keys in many contexts. Tracking all of them
> down and making sure they work as intended is a lot of work! In many
> ways, your change is similar to the Store ID change (it is also a new
> way to generate a store key), and we already know that Store ID broke a
> few things in unexpected ways. This is one of the reasons I strongly
> encourage you to investigate using the Store ID APIs to implement this
> feature.
>
> Request adaptation happens after host validation. Request adaptation may
> change the request URI and/or the Host header. However, storeSecurityKey
> will remain set if it was set before adaptation and will not be set if
> the new request actually becomes invalid. Are both behaviors desirable?
>

That we need to discuss some more.

The adaptation output is created on the assumption that the URL
generated from the Host header is correct/valid. How does that square
with an explicit verification failure? Is the adaptation result any
more trustworthy than the original Host header? (GIGO etc.)

> How will this change work with cache peering? My understanding is that
> peer queries from requests that failed Host validation may result in
> HITs and may be cached by the requesting peer, right? Will the
> requesting peer use the special key (with storeSecurityKey appended) to
> cache the response from the remote peer?

No changes to the peer situation from this.

Amos