RE: [squid-users] Squid to cache a DB?

From: <sean.upton@dont-contact.us>
Date: Mon, 20 Aug 2001 11:03:01 -0700

Many good points, especially regarding the advantages of a proxy when
scaling to multiple machines for static content. With something like Tux, I
would need lots of fast, expensive local storage on my web server nodes,
and I would have to run rsync regularly between the nodes (otherwise NFS
would be my bottleneck); I avoid all of that with squid... I am sure that
my 2 squid accelerator nodes with decent hardware will outperform my
inexpensive web nodes in this case.

I suspected that my application was what you considered "semi-dynamic,"
which is a pretty good way to describe its nature, now that I think about
it... Where squid really shines is that I can accelerate both this and
static content well with one general-purpose framework; the only constraint
is making sure the application design is relatively cache-safe, which is
something every web-app developer should think about anyway (but most
likely don't).
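
Concretely, "cache-safe" for us mostly comes down to getting the response
headers right per page type. Something like this sketch (a hypothetical
helper, not our actual Zope code; all names made up):

import time
from email.utils import formatdate

def cache_headers(max_age, personalized=False):
    """Headers telling an accelerator like squid what it may cache."""
    if personalized:
        # per-user pages must never be stored by the accelerator
        return {"Cache-Control": "no-store"}
    now = time.time()
    return {
        "Cache-Control": "public, max-age=%d" % max_age,
        "Last-Modified": formatdate(now, usegmt=True),
        "Expires": formatdate(now + max_age, usegmt=True),
    }

# a search-result page that is safe to share for 5 minutes:
#   cache_headers(300)
# a shopping-cart view that is not:
#   cache_headers(0, personalized=True)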

>mod_gzip is neat. I've alpha-quality transfer-encoding running with
>squid, as content encoding alteration is not recommended for proxies. In
>acceleration mode however, it's much more allowable - someday I'll get
>onto that :].

That's really cool! What would be neat, perhaps in the future, is if squid
in http_accel mode could support whatever encodings the browser requests,
and cache the translation to each encoding on a per-object basis... I think
mod_gzip just added a concept of pre-compressed pages as an alternative to
on-the-fly compression, but it looks like that has some issues with the way
Apache serves files; I'm sure something like this would be much more
graceful if (hypothetically) implemented in a caching http accelerator
instead...
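
The idea, roughly: key the object store on (URL, encoding) so each
translation is computed once and then served straight from cache. A toy
sketch in Python (all names hypothetical, not squid internals):

import gzip
import io

cache = {}  # (url, encoding) -> response body

def fetch(url, accept_encoding, origin_fetch):
    # normalize to the one encoding worth caching separately
    encoding = "gzip" if "gzip" in accept_encoding else "identity"
    key = (url, encoding)
    if key not in cache:
        body = origin_fetch(url)      # single trip to the origin server
        if encoding == "gzip":
            buf = io.BytesIO()
            with gzip.GzipFile(fileobj=buf, mode="wb") as f:
                f.write(body)
            body = buf.getvalue()     # store the compressed variant
        cache[key] = body
    return cache[key]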

Sean

-----Original Message-----
From: Robert Collins [mailto:robert.collins@itdomain.com.au]
Sent: Friday, August 17, 2001 6:44 PM
To: sean.upton@uniontrib.com
Cc: squid-users@squid-cache.org
Subject: RE: [squid-users] Squid to cache a DB?

On 17 Aug 2001 12:05:43 -0700, sean.upton@uniontrib.com wrote:
> Robert Collins wrote:
> >The _only_ content worth accelerating is static
> >content. Dynamic content - changing content - will never have the same
> >hit ratio in a http-accelerator, and thus does not make as effective use
> >of the acceleration. There is a class of semi-dynamic data that is also
> >worth accelerating, but that is a different discussion.
>
> I'm not sure if I completely agree with this... I get the feeling that for
> purely static content, unless you can benefit from a cache hierarchy, an
> accelerated http server like TUX w/ Zero Copy kernel patches is going to
> serve those static files quicker (or for that matter a farm of nodes like
> this).

Sure. That's getting into the boundary between web server and content
distribution, though :]. I'm not saying that squid is a panacea. Let's
split this logically into client-side nodes and data-storing nodes. And
let's also propose that we have 50GB of data, of which 5GB is in active
use - but we can't predict day to day which 5GB :}.

* Flexible disk allocation
Tux, while soooo fast, doesn't adjust its content automatically, so you
need a lovely disk virtualization environment or many servers with >50GB
of disk. (This may or may not be an issue - the nice thing about the 1RU
servers available today is that while you only get 36-72GB per node
(depending on whether you mirror or not), you can get a ton of them into
a rack.)
Squid: give each node 10GB to play with, and you'll stay very close to
optimal (see the squid.conf sketch after these two points). Now, squid
isn't as fast as tux - but that's being worked on :}. For smaller datasets
squid will probably be just as cost-effective - but on to the next point -

* Content management
Tux - rsync or virtualised disks or something similar to one of those
two.
Squid - nothing needed.
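
Per node, the squid side of that is just a few lines of squid.conf
(2.4-era directives; the host name and sizes are made up):

http_port 80
httpd_accel_host backend.example.com    # the origin web server
httpd_accel_port 80
httpd_accel_uses_host_header on

# ~10GB of cache; squid evicts cold objects on its own, so each node
# converges on the hot 5GB with no rsync at all
cache_dir ufs /var/spool/squid 10000 16 256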

> Or for static text/html, Apache with mod_gzip. There must be a

mod_gzip is neat. I've alpha-quality transfer-encoding running with
squid, as content encoding alteration is not recommended for proxies. In
acceleration mode however, it's much more allowable - someday I'll get
onto that :].
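
The distinction, for anyone who hasn't bumped into it: Content-Encoding
describes the entity end-to-end, while Transfer-Encoding is strictly
hop-by-hop, which is why a proxy may apply it freely. Roughly, per RFC
2616 (header dicts shown in Python for illustration only):

# the browser offers a transfer-coding for this hop only
request_headers = {
    "TE": "gzip",
    "Connection": "TE",     # TE is hop-by-hop, so it is listed here
}

# the proxy may gzip the body for this hop without touching the entity
# itself; "chunked" must come last on a persistent connection
response_headers = {
    "Content-Type": "text/html",
    "Transfer-Encoding": "gzip, chunked",
}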

> reason some of us running caching accelerators are doing just that, given
> all the other options available out there: that reason is "predictable"
> dynamic content, of which much is, in fact, cacheable. Perhaps this is
> what you mean by 'semi-dynamic' data? IHMO, using Squid as an accelerator
> provides the best balance for accelerating the widest range of content for
> many applications, including static and dynamic content.

Yes. Search-result caching for example, or the data from a discussion
group - it changes, but not on every request. I agree completely. The
point I was making about static data is that it carries a moderate
management overhead in all non-acceleration solutions, whereas
acceleration solutions quite neatly auto-adjust to changing conditions.

> My company, for example, uses app servers that dynamically publish content,
> which generally is the same for all users who browse or search the site.
> Everything, for example, in one of our newest applications is
> cache-friendly: search results and browsing are all dynamic, CPU-intensive
> database-driven events, and we use GET requests for everything, which means
> nearly everything is cacheable.
>
> The difficulty, of course, is that a certain _class_ of dynamic data is not
> cacheable: anything heavily personalized; some of this limitation can be
> overcome. Small amounts of personalization can (in a limited sense) be done
> on the client side with Javascript and cookies. For example, in e-commerce,
> someone's shopping cart view page is NOT cached, and it sets a cookie for
> the number of items in the cart every time it is refreshed. Other 'catalog
> viewing' pages (i.e. looking at an entry for a book on Amazon) on the site
> can be cached, but a message at the top of the page saying 'you have 7 items
> in your cart' could be done from the client side (via scripting) from a
> cached page because of a previously set cookie... I guess what I am saying
> is that caching requires app design considerations in dynamic content, but
> that this is a very appropriate use-case for a proxy cache as an http
> accelerator.

Yes. Here you've really crossed the boundary into truly dynamic data.
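
That cart trick, spelled out (all names hypothetical - the uncacheable
cart page sets the cookie, and every cached, shared catalog page carries
a scrap of client-side script that reads it):

def cart_page_headers(item_count):
    return {
        "Cache-Control": "no-store",                   # never cached
        "Set-Cookie": "cart_items=%d; path=/" % item_count,
    }

# embedded in the (cached, shared) catalog pages:
CATALOG_SNIPPET = """
<script>
  var m = document.cookie.match(/cart_items=(\\d+)/);
  document.write('You have ' + (m ? m[1] : 0) + ' items in your cart');
</script>
"""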

> And the HIT ratios are good: we have an online newspaper classified ad
> search system that searches about 18-20k ads at any given time... that setup
> behind squid as an accelerator averages about an 88% HIT ratio (including,
> of course, images); I would estimate that at least 80% of the most popular
> 'entry' and 'browse' page views are cached, and 30% of search result lists
> are cached. I don't think this is too bad (especially since most page views
> in our application are search/browse result lists involving catalog queries
> / BTree traversals in an object database), because the ones that do get HITs
> are the most demanded by our users: the most popular content will also be
> the fastest.

Excellent stats there.

> One might say, caching like this should be done within the app server you
> are using.

That's of only marginal use - when you find you have to scale to
multiple app servers, you lose 50% of that benefit :].

> Sure, but why not cached at the proxy too? The app server we
> use (Zope) has cache managers for both internal RAM-based caching of
> executed code, as well as cache managers for HTTP headers used in an http
> accelerator like Squid.
>
> I guess I see a lot of value in using Squid as an accelerator for dynamic
> content. I'm sure others' mileage varies...

I was a bit harsh in my comment - but what you are describing as dynamic
I would describe as semi-dynamic - this is my other discussion. And I
see *huge* value in accelerating dynamically generated content that does
not change on every request.

Rob

> Sean
>
> =========================
> Sean Upton
> Senior Programmer/Analyst
> SignOnSanDiego.com
> The San Diego Union-Tribune
> 619.718.5241
> sean.upton@uniontrib.com
> =========================