Re: [squid-users] Mixing cached and non-cached access of same URLs by session-id

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Fri, 21 Aug 2009 12:36:59 +1200

Schermuly-Koch, Achim wrote:
> Hi there,
>
> I am trying to configure Squid for the following use case:
>
> We are using squid as a reverse-proxy cache to speed up our website.
> A large area of the website is public. But there is also a
> personalized area. If a user logs into his personal site, we maintain
> a session for the user (using the standard Tomcat JSESSIONID cookie
> with optional URL rewriting).

Fine.

>
> I can easily tell the private and public areas apart by examining the
> URL, so configuring caching for the private area is no problem.
>
> However, the pages in the public area have a small caveat: if the user
> has logged in to the private area, we maintain the "logged-in" state and
> reflect that state on public pages too (outputting "Welcome John
> Doe" in a small box). Of course we must not cache these pages.
>
> # Recognizes mysite
> acl MYSITE url_regex ^http://[^.]*\.mysite\.de
>
> # Don't cache pages, if user sends or gets a cookie
> acl JSESSIONID1 req_header Cookie -i jsessionid
> cache deny MYSITE JSESSIONID1
>
> acl JSESSIONID2 rep_header Set-Cookie -i jsessionid
> cache deny MYSITE JSESSIONID2
>
>
> This seemed to work fine, until I ran a JMeter test mixing requests
> with and without session-id cookies. It seems that if I request an
> already cached URL with a session cookie, the cached document is
> flushed. This is correct from a security point of view: we are not
> leaking any private data. But a subsequent request without a cookie to
> the very same URL reports a cache miss (according to the response
> headers), which is not so good for performance. The hit rate degrades
> with the number of requests with cookies.
>

This is what you configured:
  If request OR reply contains the session cookie.
   -> The URL is not allowed to be cached.

Of course, if Squid finds that it already has a cached copy, it will erase
it, because the _URL_ is not to be cached. Content is not considered.
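To make the effect concrete, here is a toy model in Python of what those two "cache deny" lines do to a single URL. It is a simplified sketch, not Squid's actual code, and the ToyCache class and its handle() method are hypothetical: a request carrying the session cookie releases any stored copy, and a fetched reply that sets the cookie is not stored, so the next anonymous request is a miss again.

```python
class ToyCache:
    """Toy model of 'cache deny MYSITE JSESSIONID1/2' (hypothetical, simplified)."""

    def __init__(self):
        self.store = {}  # URL -> cached body

    def handle(self, url, cookie_in_request=False, set_cookie_in_reply=False):
        if cookie_in_request:
            # The request matches a "cache deny" line: any stored copy of this
            # URL is released and the request is forwarded to the origin.
            self.store.pop(url, None)
            return "MISS"
        if url in self.store:
            return "HIT"
        # Cache miss: fetch from the origin. A reply carrying Set-Cookie also
        # matches "cache deny", so it is not stored.
        if not set_cookie_in_reply:
            self.store[url] = "<page body>"
        return "MISS"


cache = ToyCache()
results = [
    cache.handle("/news"),                          # first anonymous request
    cache.handle("/news"),                          # anonymous, served from cache
    cache.handle("/news", cookie_in_request=True),  # logged-in user flushes the copy
    cache.handle("/news"),                          # anonymous again, but a miss
    cache.handle("/news"),                          # re-stored, so a hit again
]
print(results)  # ['MISS', 'HIT', 'MISS', 'MISS', 'HIT']
```

Each logged-in request costs the next anonymous visitor a miss, which matches the falling hit rate seen in the JMeter test.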

This is NOT the right way to do privacy caching. See below for why and
how to do it.

>
> My next idea was to use "always_direct allow" instead of "cache deny"
> for private access on public pages. My understanding was that Squid
> then bypasses all other internal processing:
>
> # Recognizes mysite
> acl MYSITE url_regex ^http://[^.]*\.mysite\.de
>
> # Don't cache pages, if user sends or gets a cookie
> acl JSESSIONID1 req_header Cookie -i jsessionid
> always_direct allow MYSITE JSESSIONID1
>
> acl JSESSIONID2 rep_header Set-Cookie -i jsessionid
> always_direct allow MYSITE JSESSIONID2
>
>
> Big surprise: even requests without a cookie are always cache misses.
> OK, there is no explicit "cache allow" rule. But there wasn't one in
> the former example either. So what happened?

Squid ACLs have a small behavioural catch:
   The implicit default last rule is always the opposite allow/deny of
the last configured rule.

So this:
  squid.conf: cache deny blah
  implies:
    cache deny blah
    cache allow all

  squid.conf: cache allow blah
  implies:
    cache allow blah
    cache deny all

This works nicely most of the time with http_access and similar
directives, but can be a bit weird with "cache" and others.
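The first-match rule plus that implicit default can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not Squid's code: the evaluate() function is hypothetical, each rule is an (action, [ACL names]) pair in squid.conf order, a line matches only when all of its ACLs match, and the empty-list case is deliberately not modelled.

```python
def evaluate(rules, matched_acls):
    """Return 'allow' or 'deny' for an access list such as "cache" (hypothetical).

    rules: non-empty list of (action, [acl names]) in squid.conf order.
    matched_acls: set of ACL names that match the current request.
    A line matches only if ALL of its ACLs match; the first matching line wins.
    """
    if not rules:
        raise ValueError("this sketch does not model an empty access list")
    for action, acls in rules:
        if all(name in matched_acls for name in acls):
            return action
    # No line matched: the implicit last rule is the opposite of the last line.
    return "deny" if rules[-1][0] == "allow" else "allow"


cache_rules = [("deny", ["MYSITE", "JSESSIONID1"])]
print(evaluate(cache_rules, {"MYSITE", "JSESSIONID1"}))  # deny
print(evaluate(cache_rules, {"MYSITE"}))                 # allow (implicit "cache allow all")
print(evaluate([("allow", ["blah"])], {"MYSITE"}))       # deny (implicit "cache deny all")

# A lone "cache allow MYSITE" also matches requests that carry the cookie:
print(evaluate([("allow", ["MYSITE"])], {"MYSITE", "JSESSIONID1"}))  # allow
```

The last call shows why an explicit "cache allow MYSITE" on its own lets cookie-bearing requests be served from the cache.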

> Anyway, I added an
> explicit rule at the end, hoping it would be used if all
> "always_direct" rules were evaluated to "false":
>
> cache allow MYSITE
>
> Even bigger surprise:
>
> Now the requests containing a session-cookie are also served from the
> cache (indicated by a Cache-Hit header, and lacking the "Welcome"
> box). Which is not acceptable, because we might leak private data.

Agreed.

>
> One last effort (now poking in the dark) adding both directives:
>
> always_direct allow MYSITE JSESSIONID1
> cache deny MYSITE JSESSIONID1
>
> Now the result is the same as with the very first configuration, as if
> always_direct wasn't used at all.

It is being used but not like you think it is.

always_direct prevents Squid from routing requests to a configured
cache_peer server. Instead Squid is forced to perform DNS lookups and send
out a regular forward-proxy request to any IPs found.
It has nothing to do with storage.

In a reverse-proxy configuration, doing "always_direct allow" is usually a
Very Bad Thing(tm) (DNS points at Squid as the public face of the
website, right?). It prevents the marked requests from being routed to any
of the back-end peers. If they get there at all, it's a routing fluke or a
weird, possibly broken, DNS configuration.

>
> Please can anyone help? Is my problem solvable at all? Can someone
> shed some light on what "always_direct" is meant for (I have read
> something about cache-hierarchies...)?

Hope the above helps.

>
> Regards
>
>
> achim

The biggest surprise of all is still hiding unseen by you:

   Every other cache around the Internet that your visitors use may be
storing the private-area pages!!

   This is because you use a local configuration completely internal to
your Squid to determine what is cacheable and what is not.

The correct way to do this is to:

  * have the web server which generates the pages add a header
("Cache-Control: private") to all pages which are in the private area of
the website. This tells every shared cache (your Squid included) not to
store the private info.

  * have the personal adjustments to the public pages done as small
includes so that the main body and content of the page can be cached
normally, but the small modifications are not.

   For example I like including a small CSS/AJAX script which changes a
generic HTML div on the page from saying "log in <blah>" to "hello X".
Only the small script needs to be private/non-cacheable.
  Exactly how to do that is a decision for the web page authors,
though.
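Both points above come down to the application setting the header itself. The site here runs on Tomcat, so the real fix would live in a servlet filter or the application code; the sketch below uses a Python WSGI (PEP 3333) middleware purely as a language-neutral illustration. The URL layout is an assumption: "/private/" for the private area and "/greeting" as a hypothetical endpoint for the small personalized include.

```python
# Hypothetical URL layout: the private area lives under /private/, and the
# small personalized include (the "Welcome John Doe" box) under /greeting.
PRIVATE_PREFIXES = ("/private/", "/greeting")


class PrivateCacheControl:
    """WSGI middleware that marks private responses with Cache-Control: private."""

    def __init__(self, app, private_prefixes=PRIVATE_PREFIXES):
        self.app = app
        self.private_prefixes = tuple(private_prefixes)

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        is_private = path.startswith(self.private_prefixes)

        def _start_response(status, headers, exc_info=None):
            if is_private:
                # Replace any Cache-Control the application set, so that no
                # shared cache (Squid included) stores this response.
                headers = [(k, v) for k, v in headers if k.lower() != "cache-control"]
                headers.append(("Cache-Control", "private"))
            return start_response(status, headers, exc_info)

        return self.app(environ, _start_response)


def site(environ, start_response):
    """Stand-in for the real application (hypothetical)."""
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html>...</html>"]


application = PrivateCacheControl(site)
```

Public pages pass through untouched and stay cacheable; only the private area and the small greeting include are marked private.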

The HTTP way to achieve something similar is to add an "ETag:" header
with some hash of the page content in it, so each unique copy of the page
is stored separately. The personalized pages get "Cache-Control: private"
added as well, so that the whole response is discarded.
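A minimal Python sketch of that header logic follows. The page_headers() helper is hypothetical, and SHA-256 is an assumed choice where the text only says "some hash"; the ETag value is wrapped in double quotes because HTTP requires ETags to be quoted strings.

```python
import hashlib


def page_headers(body: bytes, personalized: bool) -> list:
    """Build response headers for a rendered page (hypothetical helper).

    The ETag is derived from a hash of the body, so each distinct version of
    the page gets a distinct tag. Personalized renderings are additionally
    marked private so that shared caches do not store them.
    """
    etag = '"' + hashlib.sha256(body).hexdigest() + '"'  # ETag values are quoted strings
    headers = [("ETag", etag)]
    if personalized:
        headers.append(("Cache-Control", "private"))
    return headers


public = page_headers(b"<p>Log in</p>", personalized=False)
personal = page_headers(b"<p>Welcome John Doe</p>", personalized=True)
print(public)
print(personal)
```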

Some material suggests the "Vary:" header for this, but basing it on the
Cookie header, with a session ID inside, is another very bad idea that
will destroy your HIT rates.
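Why it destroys them can be seen with a toy cache model in Python. This is a simplification, not Squid's variant handling: the hit_rate() function is hypothetical, and keying the cache on (URL, Cookie value) stands in for "Vary: Cookie". With a unique session ID per visitor, every request becomes its own variant and nothing is ever reused.

```python
def hit_rate(requests, key_includes_cookie):
    """Fraction of requests served from a toy cache (hypothetical model).

    requests: list of (url, cookie_header) pairs.
    key_includes_cookie=True approximates 'Vary: Cookie': every distinct
    Cookie value becomes its own cached variant.
    """
    stored = set()
    hits = 0
    for url, cookie in requests:
        key = (url, cookie) if key_includes_cookie else url
        if key in stored:
            hits += 1
        else:
            stored.add(key)  # miss: fetch from the origin and store
    return hits / len(requests)


# 100 visitors, each with its own (hypothetical) session ID, all requesting one public page.
traffic = [("/news", f"JSESSIONID={n}") for n in range(100)]
print(hit_rate(traffic, key_includes_cookie=True))   # 0.0
print(hit_rate(traffic, key_includes_cookie=False))  # 0.99
```

The second figure is only a baseline for comparison, not a suggestion to share cookie-bearing responses between users.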

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE6 or 3.0.STABLE18
   Current Beta Squid 3.1.0.13
Received on Fri Aug 21 2009 - 00:37:07 MDT
