Re: [squid-users] General Squid setup

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Fri, 31 Aug 2012 01:26:57 +1200

On 30/08/2012 2:13 a.m., Farkas H wrote:
> Hi Amos,
> thanks for your response.
> My part is the web server in the middle, [WS], which provides services
> to process data. Users send requests to the web server via http-post,
> with http-get requests embedded in them. I don't want to touch this
> part for the moment.
>
> The web server sends the embedded http-get requests to remote servers
> (not mine), receives the requested data, processes the data and
> returns the result.
> I want to cache the data of the remote servers. I think it's necessary
> to redirect the http-get requests of the web server through Squid. I
> would say Squid should be behind the web server, not in front of it
> like a reverse proxy, but I'm not a specialist. What is your opinion?
> Is there a chance to do this (without coding)?
> I appreciate any advice.
> Thanks, Farkas

Squid does not do any semantic re-writing such as you are wanting. It is
too dangerous in HTTP, even if you happen to have stumbled on an
application which does not break when it is done.

Your proposed configuration #1 should do what you want (#2 will not).
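
As a rough sketch (addresses and sizes below are placeholders; this
assumes [WS] and Squid sit on the same host or LAN), configuration #1 is
just a plain forward proxy locked down so [WS] is the only permitted
client:

    # squid.conf - forward proxy with [WS] as the only client
    http_port 3128
    acl ws_host src 192.0.2.10            # placeholder: address of [WS]
    http_access allow ws_host
    http_access deny all
    cache_dir ufs /var/spool/squid 1024 16 256

The "without coding" part is usually just pointing [WS] at the proxy:
many HTTP client libraries honour the http_proxy environment variable,
e.g. http_proxy=http://192.0.2.20:3128 (placeholder address), exported
before the server process starts.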

Alternatively, an ICAP service doing the [WS] re-writing operation and
supplying Squid with adapted requests might be another way to achieve
this. But I'm not so sure about ICAP being workable either; POST
requests are marked non-cacheable by Squid for now.
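
If you do experiment with it, the Squid side of that wiring is only a
few lines (the service name, port and URL below are placeholders; the
ICAP server doing the actual re-writing is something you would have to
provide):

    # squid.conf - hand each request to an external ICAP REQMOD service
    icap_enable on
    icap_service ws_rewrite reqmod_precache icap://127.0.0.1:1344/rewrite bypass=0
    adaptation_access ws_rewrite allow all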

Amos

>
>
> On 28 August 2012 12:20, Amos Jeffries <squid3_at_treenet.co.nz> wrote:
>> On 25/08/2012 8:41 a.m., Farkas H wrote:
>>> Hi list,
>>>
>>> I'm a little confused about the various configuration options of
>>> Squid. I have the following setup:
>>> Internet clients <-> remote Web server [WS] <-> different remote Web
>>> servers [R1], ..., [Rn]
>>> [WS] processes the data; [R1], ..., [Rn] provide the data
>>>
>>> The clients send requests via http-post to [WS].
>>> [WS] translates the requests and retrieves the required data from
>>> [R1], ..., [Rn] via http-get. [WS] processes the data and sends the
>>> responses to the clients.
>>>
>>> The (requests of [WS] and) the responses of [R1], ..., [Rn] should be
>>> cached close to [WS].
>>> The number of web servers [R1], ..., [Rn] is relatively small. This
>>> should lead to many cache hits.
>>
>> Cache HITs are related to the range of the URL space, not the server count.
>> For example, Wikipedia has a great many servers all serving the same content;
>> they sometimes get a HIT ratio up near 100%, since the client-requested URLs
>> are all for the one website and usually for a few "trending" articles.
>>
>> But since these are "delivery" operations which are being cached and served
>> from cache ... the server will never receive the HITs and will never be able
>> to update its state according to their receipt. The result can be very
>> broken, very client-visible behaviour unintended by the site designer(s).
>>
>>
>>> I have two suggestions for discussion:
>>> (1) normal Squid cache; [WS] acts as a kind of client; [WS] is the
>>> only client of Squid Proxy; the requests of [WS] would have to be
>>> redirected programmatically to Squid Proxy,
>>> (2) reverse proxy (with httpd-accelerator mode).
>>>
>>> Are these options suitable? Which (other) squid setup would you recommend?
>>> Is (1) possible without programming?
>>> Which configuration (from http://wiki.squid-cache.org/ConfigExamples)
>>> should be chosen for (1) or (2)?
>>
>> Do you own those websites, or are you providing CDN services to their
>> owners? Choose (2) - it will pass the requests through unchanged.
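>>
>> A minimal accelerator sketch (hostname and port are placeholders for a
>> single origin server; repeat the cache_peer lines per origin):
>>
>>     # squid.conf - reverse proxy / httpd-accelerator
>>     http_port 80 accel defaultsite=www.example.com
>>     cache_peer www.example.com parent 80 0 no-query originserver name=origin1
>>     acl origin1_sites dstdomain www.example.com
>>     cache_peer_access origin1 allow origin1_sites
>>     http_access allow origin1_sites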
>>
>> Are you the ISP for those clients? Choose (1), but...
>>
>>
>> Are you aware of the difference between HTTP POST and GET semantics, and how
>> that determines very different caching, security, and failure recovery
>> models?
>> Why are you re-writing these critical semantics in a relay?
>>
>> Amos