RE: [squid-users] Strange misses of cacheable objects [SOLVED]

From: Anatoli <me_at_anatoli.ws>
Date: Tue, 22 Apr 2014 07:46:54 -0300

Amos,

I thought CVE-2009-0801 was about something slightly different. Now, after
researching it in more detail, I believe I understand what you mean.

For the sake of other people investigating the same issues, here is the
CVE-2009-0801 overview:

Squid, when transparent interception mode is enabled, uses the HTTP Host
header to determine the remote endpoint, which allows remote attackers to
bypass access controls for Flash, Java, Silverlight, and probably other
technologies, and possibly communicate with restricted intranet sites, via a
crafted web page that causes a client to send HTTP requests with a modified
Host header.

So the problem is that if a client tries to connect (either a direct
forgery by the client or one generated by a web page) to, say,
host1.rogue.xxx, and it resolves to 192.168.0.2 (our local server with
sensitive information), the request fulfillment with the patch would look
like this:

1. The client (whose IP is 192.168.2.20/255 - no direct connection to the
sensitive server) tries to establish an outgoing connection via the
router/firewall to a random or well-known IP (e.g. 189.190.191.192 or an IP
of google.com)
2. The firewall checks the rules, finds that it's a public IP, so it allows
the connection and forwards it to Squid
3. Squid checks the IP vs. the Host header, finds a forgery attempt and
proceeds to the patch code
4. There it picks the IP resolved from the Host header and tries to connect
to the sensitive server
5. It will succeed at this IF there is no firewall check behind Squid

If I understand it correctly, the only possible attack scenario is when a
Squid server has unrestricted access to sensitive information and at the
same time the affected client does NOT have direct access to it (if the
client DOES have it, it doesn't even need to go through the proxy).

Looks like an incomplete firewall setup to me. Also please take a look at
the comments on this article (one of the original articles about the issue):
http://forums.theregister.co.uk/forum/1/2009/02/23/serious_proxy_server_flaw/
Almost all of them agree it's a very UNcommon attack vector.

So, if the risk is so narrow and can easily be mitigated by correct
firewalling of Squid's outgoing connections (which should be done in any
case - think of some unknown vulnerability in Squid combined with its
unrestricted access to sensitive information; and why should Squid have
access to the intranet at all?), why not make an option (possibly with the
default value set to off) to permit this functionality? I would clearly
prefer rock-solid caching and one additional firewall rule over
unpredictable cache behavior and lots of cacheable traffic going out to the
external world.

I believe the only real risk with this issue is when Squid admins are
unaware of it. To mitigate this there could be explicit warnings in the
documentation and in Squid's startup output (maybe at -d 2, when this option
is turned on) stating that the option requires firewalling of Squid's
outgoing access.

The main problem is that (at least in my setup) the unpatched version
sometimes doesn't even work with Windows updates, not to mention the
maxobjsize issue with almost no caching at all. When I installed Squid for
the first time 3 weeks ago (solid caching was needed for one infrastructure
with bandwidth issues) and saw how low the hit rate was (around 10% of the
traffic), I thought it might not be worth the time spent on its correct
configuration. Now, with more than 90% of traffic cached, it's clearly a
win. And my assumption is that a lot of Squid users are in a similar
situation.

The rationale is that there are many ways to secure Squid, but without this
patch there is no way to get stable caching.

-------

With respect to my security note about the patch, I wanted to make it clear
that it's just a working concept of the idea. If the developers find the
idea applicable, they would need to perform their own security checks, as
I'm not an official contributor and may be unaware of some particular
implementation issues. Sysadmins who decide to use it in production before
an official release are doing so at their own risk.

-------

With respect to maximum_object_size: 'make' creates the src/cf_parser.cci
file, which contains calls to the config option processing functions, like:

// built-in default, applied before squid.conf is read:
default_line("maximum_object_size 4 MB");

...

if (!strcmp(token, "cache_dir")) {
    cfg_directive = "cache_dir";
    parse_cachedir(&Config.cacheSwap);
    cfg_directive = NULL;
    return 1;
};
if (!strcmp(token, "store_dir_select_algorithm")) {
    cfg_directive = "store_dir_select_algorithm";
    parse_string(&Config.store_dir_select_algorithm);
    cfg_directive = NULL;
    return 1;
};

...

if (!strcmp(token, "maximum_object_size")) {
    cfg_directive = "maximum_object_size";
    parse_b_int64_t(&Config.Store.maxObjectSize);
    cfg_directive = NULL;
    return 1;
};

parse_cachedir is defined in src/cache_cf.cc at line 1914, and at line 1958
it calls update_maxobjsize(), which limits the store_maxobjsize variable
(the internal maximum_object_size of the store data structure) to the
maximum_object_size value in effect at the moment this function runs, for
all stores (all cache directories). So if parse_cachedir is called before
parse_b_int64_t(&Config.Store.maxObjectSize), we get the effect of
default_line("maximum_object_size 4 MB"). BUT when we reach the
parse_b_int64_t(&Config.Store.maxObjectSize) line, the option is still
processed and shown on the cachemgr config page, even though it no longer
affects the already-configured stores.
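
To make the ordering effect easier to see, here is a minimal standalone C++
sketch (hypothetical names, NOT the actual Squid code) that mimics the
sequence: the built-in default is applied first, parse_cachedir() snapshots
whatever maximum_object_size is in effect at that moment, and a later
maximum_object_size line updates only the global setting, not the
already-configured store:

// toy_parse_order.cc - simplified illustration only, not Squid code
#include <cstdint>
#include <iostream>

static int64_t configuredMaxObjectSize = 0; // role of Config.Store.maxObjectSize
static int64_t store_maxobjsize = -1;       // role of the store's internal limit

// stands in for parse_b_int64_t(&Config.Store.maxObjectSize)
static void parse_maximum_object_size(int64_t bytes) {
    configuredMaxObjectSize = bytes;
}

// stands in for parse_cachedir() -> update_maxobjsize(): the store limit is
// frozen to whatever value is configured right now
static void parse_cache_dir() {
    store_maxobjsize = configuredMaxObjectSize;
}

int main() {
    parse_maximum_object_size(4 << 20);    // default_line("maximum_object_size 4 MB")

    // squid.conf order: cache_dir first, maximum_object_size second
    parse_cache_dir();                     // store limit frozen at 4 MB here
    parse_maximum_object_size(1LL << 30);  // "maximum_object_size 1 GB" - too late

    std::cout << "configured: " << configuredMaxObjectSize
              << ", effective store limit: " << store_maxobjsize << "\n";
    // prints: configured: 1073741824, effective store limit: 4194304
}

With the two directives swapped (maximum_object_size processed before
cache_dir), the snapshot is taken after the 1 GB value is set and both
numbers agree - which matches the two squid.conf orderings discussed further
down in the quoted message.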

The src/cf_parser.cci file is generated by src/cf_gen.cc, which make
compiles and then runs. The compiled cf_gen takes all the instructions for
generating src/cf_parser.cci (as well as the squid.conf.documented and
squid.conf.default files) from src/cf.data.pre.

The order of the initialization functions in src/cf_parser.cci depends on
the order of the config entries in src/cf.data.pre. A very simple fix would
be to move the cache_dir entry in src/cf.data.pre to the end of the file,
but this would also affect the generated squid.conf.documented and
squid.conf.default files (nothing serious compared to not processing
maximum_object_size correctly, but not a clean solution).

A better solution, I believe, would be to group all related options in
src/cf.data.pre and sort them according to their processing dependencies.
For the cache-related options, that means grouping them and placing the
cache_dir entry at the end of the group. This way the documentation stays
logically grouped/sorted and the maximum_object_size problem is fixed.

Regards,
Anatoli

-----Original Message-----
From: Amos Jeffries [mailto:squid3_at_treenet.co.nz]
Sent: Tuesday, April 22, 2014 01:44
To: squid-users_at_squid-cache.org
Subject: Re: [squid-users] Strange misses of cacheable objects [SOLVED]

On 22/04/2014 11:04 a.m., Anatoli wrote:
> OK, found the problem. All the "problematic" objects are from multi-IP
> domains and sometimes the browser resolves them and sends the request to an
> IP that is not in the list (this is for intercept mode).
>
> So, in the browser with http_watch I see that the request for
> http://www.googleadservices.com/pagead/conversion_async.js is sent to
> 173.194.118.122, but in nslookup with set debug option I see:
>
> Name: pagead.l.doubleclick.net
> Addresses: 173.194.118.45
> 173.194.118.58
> 173.194.118.57
> Aliases: www.googleadservices.com
>
> The IP resolved by the browser is not in the list!
>
> So, squid interprets this as a destination IP forgery and doesn't cache the
> response. This behavior is documented at host_verify_strict option. By
> default it's set to off, that's why it's difficult to discover the reason.
> If you set it to on and try to download a problematic object, squid will
> return URI Host Conflict (409 Conflict) and in the access.log you'll see
> TAG_NONE/409 (additionally, with increased debug levels, you'll also see
> security alerts).

The beta releases optimistically had strict verification enabled by
default. Sadly, we had to disable it by default due to a high number of
issues seen with Google and Akamai hosted sites.

>
> This should partly explain the numerous complaints about more-than-expected
> misses.
>
> This is actually a problem, as the IP mismatches are not due to an
> artificially crafted request, but a normal functioning of the DNS and
> different levels of its caching. The reason for IP mismatch should be the
> frequency of DNS updates for these multi-IP domains. Actually, you can see
> with nslookup in debug mode that www.googleadservices.com has the default
> TTL of just 5 min, cdn.clicktale.net of 2 min, google.com of 1 min 25 sec
> and global.ssl.fastly.net of 25 sec. When I restart DNS Client service, I
> get a HIT from squid for almost all of the originally published problematic
> objects without any security alerts, until the IP discrepancies start to
> appear again.
>
> So, it looks like the destination IP forgery check should be relaxed somehow
> (for example, with /24 mask as the majority of the mismatches in the IPs are
> in the last octet) or squid should cache for a long time all the IPs for all
> the domains, just for this forgery check.
>
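
For illustration only, here is a minimal sketch of the /24 relaxation idea
quoted above (hypothetical names, not Squid code or a proposed patch): the
intercepted destination is treated as verified when it shares its first
three octets with any address Squid itself resolved for the Host header.

// slash24_check.cc - illustrative only
#include <arpa/inet.h>
#include <cstdint>
#include <vector>

// true when both IPv4 addresses fall in the same /24
static bool sameSlash24(const in_addr &a, const in_addr &b) {
    const uint32_t mask = htonl(0xFFFFFF00);   // 255.255.255.0
    return (a.s_addr & mask) == (b.s_addr & mask);
}

// relaxed verification: the client's destination only has to be "near" one
// of the addresses Squid resolved for the Host header
static bool hostVerifyRelaxed(const in_addr &clientDst,
                              const std::vector<in_addr> &resolvedForHost) {
    for (const in_addr &ip : resolvedForHost)
        if (sameSlash24(clientDst, ip))
            return true;
    return false;
}

int main() {
    in_addr dst{}, a{}, b{};
    inet_pton(AF_INET, "173.194.118.122", &dst);   // where the browser connected
    inet_pton(AF_INET, "173.194.118.45", &a);      // what Squid's DNS returned
    inet_pton(AF_INET, "173.194.118.58", &b);
    return hostVerifyRelaxed(dst, {a, b}) ? 0 : 1; // 0: same /24, accepted
}

As the reply below explains, weakening the verification this way trades back
the cache-poisoning risk the check exists to prevent, so the sketch is only
meant to make the discussion concrete.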

Unfortunately, we are already walking a very thin line between safe and
unsafe actions in this security code.

NP: It took over 2 years with multiple people getting involved and
counter-checking each other on use-cases and testing on live traffic to
reach the state we have today. So do not be discouraged by what I'm
about to say below.

> Another (at least as a temporary workaround) option would be to disable this
> check completely as it actually poses very little risk for a correctly
> configured squid with trusted clients. At the same time, an untrusted client
> could request a virus for some known file via his own host and make squid
> this way cache and distribute an infected file to the rest of the clients.

This is not an option. The biggest hurdle in resolving this vulnerability
is that *all* clients can be hijacked or subverted - so there are no
trusted clients at all.

>
> The best option, I think, would be for the requests considered as forgery to
> overwrite the destination IP provided by the client with one of the resolved
> IPs for the domain in the Host field (like with client_dst_passthru off).
>

Doing exactly that is the vulnerability described in CVE-2009-0801.
Any client can send a forged Host header and cause the proxy to resolve
the IP to a different one, bypassing *firewall* IP-level protections.

How do you know the Host header contains accurate data?
There are only two guarantees:
 1) that the client was *definitely* fetching from the TCP IP:port.
 2) that the IP:port in #1 does *not* match the server DNS records.

The implication is that this is either a hijacking, or the server has moved.

> And here is a patch for this. Please note I haven't done extensive security
> issues verifications,

Please do that before posting patches to bypass security restrictions.
Particularly security restrictions which are so obviously annoying to
many people. We don't exactly like being annoying so there is always a
good reason for it when we are.
<snip>

>
> After applying this patch the hit rate increased significantly for all types
> of objects, not only for those that match refresh_pattern options. No more
> random misses, then hits, then misses again.

NOTE: All clients behind your network are now vulnerable to a 15-line
JavaScript or 6-line Flash applet which can be embedded in any web page.
All it takes is one client with scripting enabled to run it and the
entire network is hijacked.

As you already found, at least one of the major sources of verification
failures is an advertising service (googleadservices). Given that ad
services commonly present scriptlets written by unknown third parties...

There are infections out there which use this vulnerability. Also, a
forwarding-loop DoS is just as easy to trigger as cache corruption and
has far more immediate side effects - this effect is used by at least
one piece of security scanning software (by Trend Micro) to detect
vulnerable proxies [by crashing them].

Since you seem to have the ability to find and make patches:
 The only way we know of to safely cache these files is to add the
destination IP+port of the server where the object was fetched to the
cache key. That is expected to raise the HIT ratio somewhat by allowing
"bad" clients to get HITs without corrupting anything for "good"
clients. Lack of time to focus on it has been the main blocker in adding
that.
 Note this will still cause some extra MISS when the DNS used by Squid
and the client are out of sync - as the "bad" objects get cached one for
each untrusted origin.
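
A minimal sketch of that cache-key idea (hypothetical types and names, not
Squid's actual store-key code): the origin IP:port the object was really
fetched from becomes part of the key, so a forged destination can only ever
populate its own entry.

// cache_key_sketch.cc - illustration of keying by URL plus origin endpoint
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>

struct CacheKey {
    std::string method;    // e.g. "GET"
    std::string url;       // canonical request URL
    std::string originIp;  // TCP destination the object was actually fetched from
    uint16_t originPort;
};

// compose a flat string and hash it; Squid would use its own key scheme
static std::size_t hashKey(const CacheKey &k) {
    const std::string flat = k.method + ' ' + k.url + '@' + k.originIp + ':' +
                             std::to_string(k.originPort);
    return std::hash<std::string>{}(flat);
}

int main() {
    CacheKey good{"GET", "http://www.example.com/a.js", "93.184.216.34", 80};
    CacheKey bad = good;
    bad.originIp = "192.0.2.66";   // client forged its destination

    // different keys: the "bad" fetch cannot overwrite the "good" entry
    std::cout << hashKey(good) << " vs " << hashKey(bad) << "\n";
}

The cost is the duplication mentioned above: the same object may be stored
once per origin endpoint, and clients whose DNS disagrees with Squid's will
still see some extra MISSes.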

Also note that the verification should not place any restrictions on HITs
for content already in the cache. A "bad" fetch can safely be delivered a
HIT cached by an earlier "good" fetch. So sites which are cache-friendly
to begin with have a much reduced likelihood of encountering a MISS from
this problem even if they do move IPs.
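
In control-flow terms that policy looks roughly like the sketch below
(hypothetical, stubbed helper names; not Squid internals): the cache is
consulted before any verification, and the Host-vs-destination check only
decides whether a freshly fetched reply may be stored.

// hit_before_verify.cc - control-flow sketch with stubbed helpers
#include <optional>
#include <string>

struct Object { std::string body; };

// stubbed helpers, just enough to make the sketch self-contained
static std::optional<Object> cacheLookup(const std::string &) { return std::nullopt; }
static Object fetchFromClientDestination(const std::string &url) { return {"<reply for " + url + ">"}; }
static bool destinationMatchesHost(const std::string &) { return false; }  // pretend the IPs disagree
static void cacheStore(const std::string &, const Object &) {}

static Object handleRequest(const std::string &url) {
    if (auto hit = cacheLookup(url))   // HIT: content cached by an earlier
        return *hit;                   // "good" fetch is safe to serve as-is

    Object obj = fetchFromClientDestination(url);  // MISS: honor the client's TCP destination
    if (destinationMatchesHost(url))               // store only when destination and Host agree
        cacheStore(url, obj);
    return obj;                                    // otherwise deliver without caching
}

int main() {
    return handleRequest("http://www.example.com/a.js").body.empty() ? 1 : 0;
}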

Unfortunately there are prices to be paid for violating protocols (in
this case TCP). Extra MISS on some traffic is one of them. Just like
losing the ability to authenticate users.

>
> Still, the adobe .exe file was not caching. So I decided to continue the
> investigations and finally found what the problem was.
>
> With adequate debug_options enabled, squid was saying that the object size
> was too big (I've added the CL (Content-Length), SMOS (store_maxobjsize) and
> EO (endOffset) variables to the log line).
>
> 2014/04/21 00:35:35.429| store.cc(1020) checkCachable:
> StoreEntry::checkCachable: NO: too big (CL = 33560984; SMOS = 4194304; EO =
> 268)
>
> Clearly, something was wrong with the maxobjsize, that was set in the config
> to 1Gb and the log was reporting it being set to 4Mb (what I discovered
> later to be the default value).
>
> After some additional research, I found that in the src/cf_parser.cci file
> (generated by make) there are 2 calls to the configuration initialization
> functions for almost all the configuration options - the first one is for
> the predefined (default) values and the second one for the config file
> values. There is a function parse_cachedir (defined in src/cache_cf.cc) that
> initializes the store data structure with the options related to the store
> (like maxobjsize), and it is called when the config parser finds cache_dir
> option in the config and it's not called again when it finds all other cache
> related options. So, if you put in your config something like this (like it
> was in mine):
>
> cache_dir aufs /var/cache 140000 16 256
> maximum_object_size 1 GB
>
> then the maximum_object_size option is processed and you see it at the
> cachemgr config page but it has no effect as the store data structure
> parameter maxobjsize was already initialized (with the default value) by
> parse_cachedir before parsing the "maximum_object_size 1 GB" line, so we
> have 4Mb (default) effective maximum_object_size.
>
> If we have a config with
>
> maximum_object_size 1 GB
> cache_dir aufs /var/cache 140000 16 256
>
> we get the effective maximum_object_size for the store set to 1Gb as
> expected.

Aha. Thank you for tracking this one down. That is a behaviour we have
been looking for for a while.
 I'm still a little unfamiliar with the store internals though. Can you
please point me at the place you found the early initialization being done?

>
> There are warnings in the documentation that the order of config options is
> important, but it is only explained in the context of ACLs and other
> unrelated settings. In my opinion, this is a huge problem as it is not at all
> obvious what should precede what. There should be at least a note in the
> documentation for each option affected by the order of config processing and
> there should be a final "all effective values" output at squid
> initialization (maybe with -d 2 and higher) and of course cachemgr config
> page should show correct (effective) values.

Some people complain that dumping over 16KB to the logs (possibly
syslog) on each daemon startup is a bit unfriendly.

The cachemgr "config" report should contain all finalized configuration
settings. Unfortunately that does not show toggle-like and repeated
configuration values nicely.
If it is showing anything inaccurate for the cache_dir max-size=
parameters, that is a bug that needs fixing.

>
> Now it is:
> maximum_object_size @ cachemgr config page: 2147483648 bytes
> Effective maximum_object_size: 4194304 bytes
>
> And a better solution would be to call parse_cachedir (and similar
> functions) at the end of the config file processing (an extremely simple fix
> in the src/cf_parser.cci generation).

FYI: parse_*() and similar *are* the config file processing.

>
> Now, with the patch and the "correct" order of maximum_object_size and
> cache_dir (put cache_dir after all the cache-related options, including
> memory cache ones), all "problematic" objects are cached as expected and
> there is a huge (like 10-fold on average and more than 100-fold for WU and
> similar) increase in the hit rate. Rock-solid caching!
>
> Regards,
> Anatoli
>

Cheers
Amos