AW: [squid-users] Mixing cached and non-cached access of same URLs by session-id

From: Schermuly-Koch, Achim <a.schermuly-koch_at_cassini.de>
Date: Mon, 31 Aug 2009 16:20:28 +0200

Hi Amos,

thanks for your advice so far. I am still not sure which path to follow...

>> We are using squid as a reverse-proxy cache to speed up our website.
>> A large area of the website is public. But there is also a
>> personalized area. If a user logs into his personal site, we maintain
>> a session for the user (using standard tomcat features jsession-id
>> cookie with optional url-rewriting).

>> [...] the pages in the public area have a small caveat: If the user
>> is logged in to the private area, we maintain the "logged-in" state and
>> reflect that state on public pages also (outputting "Welcome John
>> Doe" in a small box). Of course we must not cache these pages.

>> # Recognizes mysite
>> acl MYSITE url_regex ^http://[^.]*\.mysite\.de
>>
>> # Don't cache pages, if user sends or gets a cookie
>> acl JSESSIONID1 req_header Cookie -i jsessionid
>> cache deny MYSITE JSESSIONID1
>>
>> acl JSESSIONID2 rep_header Set-Cookie -i jsessionid
>> cache deny MYSITE JSESSIONID2

>> This seemed to work fine, until I did a JMeter test, mixing requests
>> with and without session-id cookies. It seems that if I request an
>> already cached URL with a session cookie, the cached document is
>> flushed.

>[...]

>Of course, if Squid finds that it has a cached copy, it will erase it,
>because the _URL_ is not to be cached. Content is not considered.

>This is NOT the right way to do privacy caching. See below for why and
>how to do it.
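
The flushing described above can be reproduced in a toy model. This is a
minimal Python sketch, assuming the behaviour is exactly as stated in this
thread (a request matching "cache deny" drops any stored copy of the URL).
The names ToyReverseProxy and fake_origin are hypothetical; this is not
Squid's code.

```python
class ToyReverseProxy:
    """Toy model only; not how Squid is implemented internally."""

    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin
        self.store = {}  # URL -> cached body

    def handle(self, url, has_session_cookie):
        if has_session_cookie:
            # "cache deny" matched: the URL counts as non-cacheable for this
            # request, so any stored copy is dropped, whatever its content.
            self.store.pop(url, None)
            return self.fetch_from_origin(url)
        if url not in self.store:
            self.store[url] = self.fetch_from_origin(url)
        return self.store[url]


origin_calls = []  # records every request that reaches the origin


def fake_origin(url):
    # Hypothetical stand-in for the Tomcat backend.
    origin_calls.append(url)
    return "body of " + url


proxy = ToyReverseProxy(fake_origin)
proxy.handle("/index.jsp", has_session_cookie=False)  # miss, copy stored
proxy.handle("/index.jsp", has_session_cookie=False)  # hit
proxy.handle("/index.jsp", has_session_cookie=True)   # bypass, copy dropped
proxy.handle("/index.jsp", has_session_cookie=False)  # miss again
```

In this model three of the four requests reach the origin, which matches
the effect seen in the JMeter test: every session request costs the
anonymous users one extra miss.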

[...]

> The biggest surprise of all is still hiding unseen by you:

> Every other cache around the Internet that your visitors use may be
> storing the private area pages!!

> This is because you use a local configuration completely internal to
> your Squid to determine what is cacheable and what is not.

> The correct way to do this is to:

> * have the web server which generates the pages add a header
> ("Cache-Control: private") to all pages which are in the private area of
> the website. This tells every shared cache (your Squid included) not to
> store the private info.
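
The rule a shared cache applies here can be sketched in a few lines of
Python. This is only an illustration of the decision, assuming the
response headers arrive as a dict keyed by the exact name "Cache-Control";
the function name shared_cache_may_store is hypothetical and nothing here
is taken from Squid.

```python
def shared_cache_may_store(response_headers):
    """Return False if a shared cache must not store this response."""
    cache_control = response_headers.get("Cache-Control", "")
    # Directives are comma-separated and case-insensitive. Any "=value"
    # part is dropped, so private="Set-Cookie" is treated like a plain
    # "private" (a simplification that errs on the safe side).
    directives = {
        part.strip().split("=", 1)[0].lower()
        for part in cache_control.split(",")
        if part.strip()
    }
    return not ({"private", "no-store"} & directives)
```

A browser cache is not a shared cache, so "private" still lets the
user's own browser keep the page.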

I agree with that. Do I have to configure the reverse-proxy *explicitly*
to avoid caching pages marked "Cache-Control: private"?

A problem I foresee with that solution is that if I set "Cache-Control:
private" for pages containing personalized content, they will bounce
(evict) the cached pages with the same URL that have no personalized
content (remember: the page is rendered differently, depending on whether
the user is in a session).

> * have the personal adjustments to the public pages done as small
> includes so that the main body and content of the page can be cached
> normally, but the small modifications are not.
> For example I like including a small CSS/AJAX script which changes a
> generic HTML div [..]

I have thought of that, too. But I would prefer not to touch
the application.

> The HTTP way to achieve something similar is to add an "ETag:" header
> with some hash of the page content in it, so each unique copy of the
> page is stored separately. The personalized pages get "Cache-Control:
> private" added as well so that the whole request gets discarded.
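
On the web server side, the suggestion above might look roughly like this.
It is a minimal Python sketch, assuming SHA-256 as the hash (the thread
only says "some hash"); the helper names make_etag and response_headers
are hypothetical.

```python
import hashlib


def make_etag(body: bytes) -> str:
    # A strong ETag is a quoted string; here it is a hash of the content.
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def response_headers(body: bytes, personalized: bool) -> dict:
    headers = {"ETag": make_etag(body)}
    if personalized:
        # Personalized renderings must not be stored by shared caches.
        headers["Cache-Control"] = "private"
    return headers


public = response_headers(b"<html>Welcome</html>", personalized=False)
personal = response_headers(b"<html>Welcome John Doe</html>", personalized=True)
```

Because the two renderings differ in content, they get different ETags
even though they share one URL, and only the personalized one carries
"Cache-Control: private".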

That sounds interesting... Are the following assumptions correct:

The ETag would be generated by the webserver. A public page (/index.jsp)
would have _one_ ETag if rendered without a session-cookie and a different,
unique ETag for each request (to the same /index.jsp) with a session-cookie.
The cache entry for the publicly cached page would be left untouched if the
response bears a "Cache-Control: private" header but a different ETag. That
implies that the cache is flushed when the webserver responds, not when the
client requests.

Does the ETag have to be unique across all resources, or is it also
possible to use the same ETag for different resources (since they have
different URLs)?

Is it another "very bad idea (tm)" to reuse the same ETag for each
personalized page? I would assume it doesn't matter, since they are
marked "private" anyway.

> Some details indicate "Vary:" header for this, but basing it on the
> cookie header with a session ID inside is another very bad idea that
> will destroy your HIT rates.
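
The effect on HIT rates can be shown with a toy cache. This is a minimal
Python sketch, assuming the cache key is the URL plus the values of the
request headers named in "Vary:"; simulate_hits is a hypothetical helper
and the 100 session IDs are made up.

```python
def simulate_hits(requests, vary_on):
    """Count hits in a toy cache keyed by URL plus the headers in vary_on."""
    stored = set()
    hits = 0
    for url, headers in requests:
        key = (url, tuple(headers.get(name, "") for name in vary_on))
        if key in stored:
            hits += 1
        else:
            stored.add(key)
    return hits


# 100 visitors request the same page, each with a different session ID.
requests = [("/index.jsp", {"Cookie": f"JSESSIONID={n}"}) for n in range(100)]

hits_plain = simulate_hits(requests, vary_on=())                  # URL only
hits_vary_cookie = simulate_hits(requests, vary_on=("Cookie",))   # Vary: Cookie
```

With the URL alone as key, 99 of the 100 requests are hits; with the
cookie in the key, every session ID creates its own variant and no request
is a hit. The sketch only counts hits and says nothing about whether
serving the stored copy would be correct.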

> Amos

Achim
Received on Mon Aug 31 2009 - 14:20:49 MDT
