Re: [squid-users] Squid to cache a DB?

From: Robert Collins <robert.collins@dont-contact.us>
Date: Sat, 18 Aug 2001 00:47:49 +1000

----- Original Message -----
From: "Mikhail Teterin" <mi@misha.privatelabs.com>
To: <squid-users@squid-cache.org>
Sent: Friday, August 17, 2001 11:55 PM
Subject: Re: [squid-users] Squid to cache a DB?

> On 16 Aug, Scott Baker wrote:
> > I'm doing some work for a rather large website (www.livejournal.com)
> > and we're using a DB to store some image files. They're
> > stored as blob files, but will always have the same type URL
> > http://www.livejournal.com/userpic/3837
>
> Although you are not asking for evaluation, I can't resist to point out,
> that this is a very flawed arrangement. Many people are doing this --
> for programming/administrating convenience, but it adds tremendous
> overhead to web-serving -- as you already found out. Instead of serving
> the file directly off the disk, the web server has to transfer it from
> the DB-server, which also has to find it within a large file...

The additional overhead from the presence of a database engine _can_ be
minimal. There is no essential difference between an efficient
filesystem and an efficient database server (those large objects are often
allocated sequentially, which means there is effectively no performance
overhead once the file is open). (Consider a database table to be a
custom filesystem, and the parallels in data storage and manipulation
should be clear.)

Having said that, all low-end database engines I know of are noticeably
less efficient at retrieving such objects than the native filesystem on
the same machine. After all, databases are designed to query and index
their data, which is not the primary purpose of a filesystem.

> Putting http-accelerator in front of the web-server (Apache has its own
> acceleration module, BTW) will speed it up, but will make the whole
> thing even uglier (and bulkier) -- even more resources will be needed by
> this machine. And why? So some lame admin can have it easier? It is
> mighty stupid to http-accel _static_ content...

This is _wrong_. The _only_ content worth accelerating is static
content. Dynamic content - changing content - will never achieve the same
hit ratio in an http-accelerator, and thus does not make as effective use
of the acceleration. There is a class of semi-dynamic data that is also
worth accelerating, but that is a different discussion.

Also, you seem to have missed the point:
let's say the apache+db server can serve 100 static requests and 20
dynamic requests per second. Put a separate squid box - or two - in
front of that server, and given that the static content headers are set
right, at that 'peak load' the apache+db server will only be getting 50
static requests and 20 dynamic requests per second. If the static data
has a local access pattern, then the load reduction may be even more
pronounced. The end result? A more scalable system, without having to
reengineer the apache+db system. It's also quite clean:
cache--web content server/generator--database
as opposed to your suggestion
web content server/generator--database
                            \-file system for blobs
which will likely have serious problems with security if those blobs are
able to have new content uploaded.
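By "headers set right" I mean the image responses carry explicit
freshness information, so squid can answer repeat requests without
revalidating against the backend. A minimal sketch using Apache's
mod_expires (the /userpic URL prefix comes from the original post; the
one-day lifetime is an arbitrary example):

```
# httpd.conf sketch -- assumes mod_expires is loaded
ExpiresActive On
<Location /userpic>
    # Cached copies of the user pictures stay fresh for a day
    ExpiresDefault "access plus 1 day"
</Location>
```

Pick the lifetime to match how often users actually change their
pictures; anything nonzero already lets the accelerator absorb the
repeat hits.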

> Consider storing the files in the _file system_, as it was meant
> from the beginning. You can store the _file names_ in the database
> for administrative purposes, but the blobs themselves can be served
> directly...
>
> > Basically this is causing a bottleneck on the database server. It's
> > making a lot more requests than is necessary on the already overworked
> > DB server. So what I want to do is put a squid box in front of the DB
> > to cache the images as they are requested and reduce some load on the
> > DB server...

Should work fine. Many folk do this already.
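For reference, a minimal squid.conf sketch for that setup, using the
squid 2.x accelerator directives (the backend hostname is a placeholder
for your own apache+db machine):

```
# squid.conf sketch -- accelerator mode, squid 2.x style
http_port 80
# The real apache+db server squid fetches misses from (placeholder name)
httpd_accel_host backend.example.com
httpd_accel_port 80
# Dedicated accelerator: don't also act as a general proxy
httpd_accel_with_proxy off
```

Point the www.livejournal.com DNS record at the squid box and the
/userpic hits never reach the database once cached.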

Rob

> -mi
>
Received on Fri Aug 17 2001 - 08:47:32 MDT

This archive was generated by hypermail pre-2.1.9 : Tue Dec 09 2003 - 17:01:42 MST