Re: AW: [squid-users] Mixing cached and non-cached access of same URLs by session-id

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Tue, 01 Sep 2009 22:29:38 +1200

Schermuly-Koch, Achim wrote:
> Hi Amos,
>
> thanks for your advice so far. I am still not sure which path to follow...
>
>
>>> We are using squid as a reverse-proxy cache to speed up our website.
>>> A large area of the website is public. But there is also a
>>> personalized area. If a user logs into his personal site, we maintain
>>> a session for the user (using standard tomcat features jsession-id
>>> cookie with optional url-rewriting).
>
>>> [...] the pages in the public area have a small caveat: If the user
>>> has logged in to the private area, we maintain the "logged-in" state and
>>> reflect that state on public pages also (outputting "Welcome John
>>> Doe" in a small box). Of course we must not cache these pages.
>
>>> # Recognizes mysite
>>> acl MYSITE url_regex ^http://[^.]*\.mysite\.de
>>>
>>> # Don't cache pages, if user sends or gets a cookie
>>> acl JSESSIONID1 req_header Cookie -i jsessionid
>>> cache deny MYSITE JSESSIONID1
>>>
>>> acl JSESSIONID2 rep_header Set-Cookie -i jsessionid
>>> cache deny MYSITE JSESSIONID2
>
>>> This seemed to work fine, until I did a JMeter test mixing requests
>>> with and without session-id cookies. It seems that if I request an
>>> already cached URL with a session cookie, the cached document is
>>> flushed.
>
>
>> [...]
>
>> Of course, if Squid finds that it has a cached copy, it will erase it,
>> because the _URL_ is not to be cached. Content is not considered.
>
>> This is NOT the right way to do privacy caching. See below for why and
>> how to do it.
>
> [...]
>
>> The biggest surprise of all is still hiding unseen by you:
>
>> Every other cache around the Internet that your visitors use may be
>> storing the private area pages!!
>
>> This is because you use a local configuration completely internal to
>> your Squid to determine what is cacheable and what is not.
>
>> The correct way to do this is to:
>
>> * have the web server which generates the pages add a header
>> ("Cache-Control: private") to all pages which are in the private area of
>> the website. This tells every shared cache (your Squid included) not to
>> store the private info.
>
> I agree with that. Do I have to configure the reverse-proxy *explicitly*
> to avoid caching pages marked "Cache-Control: private"?

No, the proxy will avoid caching them by default.
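
As a rough illustration of that default, here is a minimal Python sketch of
the storability check a shared cache applies to Cache-Control. It is not
Squid's code: the function name and the plain-dict header representation are
assumptions for the example, and it ignores freshness, Vary, and the
field-name form of "private".

```python
def shared_cache_may_store(response_headers):
    """Return False when Cache-Control forbids storage in a shared cache.

    Hypothetical helper: headers are assumed to be a dict keyed by the
    exact name "Cache-Control".
    """
    cache_control = response_headers.get("Cache-Control", "")
    directives = set()
    for part in cache_control.split(","):
        # Keep only the directive name, dropping any "=value" argument.
        name = part.strip().split("=", 1)[0].strip().lower()
        if name:
            directives.add(name)
    # "private" and "no-store" both rule out storage by a shared cache.
    return not ({"private", "no-store"} & directives)


print(shared_cache_may_store({"Cache-Control": "private"}))
print(shared_cache_may_store({"Cache-Control": "public, max-age=300"}))
```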

>
> A problem I foresee with that solution is that if I set "Cache-Control:
> private" for pages containing personalized content, they will bounce
> cached pages with the same URL but without personalized content
> (remember: the page is rendered differently, depending on whether the
> user is in a session).

Yes, this is a problem in some versions of Squid. A Squid with proper ETag
support does not have this problem, though at present the Squid-2 series
handles ETag better than Squid-3.

>
>> * have the personal adjustments to the public pages done as small
>> includes so that the main body and content of the page can be cached
>> normally, but the small modifications are not.
>> For example I like including a small CSS/AJAX script which changes a
>> generic HTML div [..]
>
> I have thought of that, too. But I would prefer not to touch
> the application.

Okay, then you are stuck with CC:private and ETag to work with.

>
>> The HTTP way to achieve something similar is to add an "ETag:" header
>> with some hash of the page content in it, so each unique copy of the page
>> is stored separately. The personalized pages get "Cache-Control: private"
>> added as well so that the whole request gets discarded.
>
> That sounds interesting... Are the following assumptions correct:
>
> The ETag would be generated by the webserver. A public page (/index.jsp)
> would have _one_ ETag if rendered without and a different unique ETag for
> each request (to the same /index.jsp) with a session-cookie. The cache
> for the publicly cached page would be left untouched, if the response
> bears a "Cache-Control: private" header but with a different ETag. That
> implies, the cache is flushed when the webserver responds, not when the
> client requests.
>
> Does the ETag have to be unique resource-wide, or is it also possible
> to use the same ETag for different resources (since they have
> different URLs)?
>
> Is it another "very bad idea (tm)" to reuse the same ETag for each
> personalized page? I would assume it doesn't matter, since they are
> marked "private" anyway.

Theoretically you are right, it _should_ not matter. However, in practice
proxies, when seeing 'private', may discard all copies of objects at
the URL. Squid uses its limited ETag support to get around that issue:
the variants whose ETag is marked private always get discarded, even if
previously marked public, but the others are not discarded.
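
A toy Python model of that behaviour, for illustration only. It is not how
Squid is implemented; the class and method names are made up for the sketch,
and variants are assumed to be keyed by URL plus ETag.

```python
class VariantCache:
    """Toy store keyed by URL, then ETag (hypothetical, not Squid's design)."""

    def __init__(self):
        self._store = {}  # url -> {etag: body}

    def handle_response(self, url, etag, cache_control, body):
        """Store or discard one variant; return True if it was stored."""
        variants = self._store.setdefault(url, {})
        directives = {
            d.strip().split("=", 1)[0].strip().lower()
            for d in cache_control.split(",")
            if d.strip()
        }
        if "private" in directives:
            # Drop only the variant carrying this ETag, even if an earlier
            # response stored it as public. Other variants stay cached.
            variants.pop(etag, None)
            return False
        variants[etag] = body
        return True

    def cached_etags(self, url):
        return sorted(self._store.get(url, {}))


cache = VariantCache()
cache.handle_response("/index.jsp", '"public-v1"', "public, max-age=300", b"generic page")
cache.handle_response("/index.jsp", '"user-john"', "private", b"Welcome John Doe")
print(cache.cached_etags("/index.jsp"))
```

In this model, reusing the public page's ETag on a personalized response
would evict the public copy, which is one reason to keep the two distinct.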

ETag is meant to identify a unique copy of each object at a URL: for
example, the compressed vs non-compressed version, or the personalized vs
non-personalized version.
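
On the web server side, the idea could be sketched in Python like this. The
function name, the use of SHA-256, and the bare "public" directive are
assumptions for the example, not anything Tomcat or Squid provides.

```python
import hashlib


def build_cache_headers(body, personalized):
    """Return ETag and Cache-Control headers for one rendered page.

    Hypothetical helper: 'body' is the exact bytes sent to the client, so
    a compressed and an uncompressed rendering get different ETags.
    """
    digest = hashlib.sha256(body).hexdigest()
    headers = {"ETag": '"%s"' % digest}
    # Personalized renderings must not be stored by shared caches.
    headers["Cache-Control"] = "private" if personalized else "public"
    return headers


print(build_cache_headers(b"<html>generic page</html>", personalized=False))
print(build_cache_headers(b"<html>Welcome John Doe</html>", personalized=True))
```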

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE6 or 3.0.STABLE18
   Current Beta Squid 3.1.0.13
Received on Tue Sep 01 2009 - 10:36:06 MDT
