Re: [squid-users] Ramdisks

From: Joe Cooper <joe@dont-contact.us>
Date: Wed, 21 Nov 2001 18:04:56 -0600

Henrik Nordstrom wrote:

> Actually, as said before I think a storage model can be found that does not
> penalize small objects, mostly eliminating the need for that ramdisk and
> instead allowing the RAM to be used for proper hot object caching.
>
> The more I think about these issues, the more appealing I find my old "multi
> level cyclic filesystem" idea, which is basically a log structured file
> system with metadata optimized for log structures and the volatility of a
> cache, and log cleaners running over the log at various intervals. Such a
> system can be made to operate close to the I/O bandwidth of modern drives.
> Certainly so for writes, and with some intelligent data ordering and
> prefetching, reads can also be improved a lot. Maybe not in Polygraph
> testing, but quite likely in real life.. I'm not sure how realistic a model
> of locality of reference Polymix-4 represents. The nature of log cleaning
> also allows one to use idle time to improve hit rate and to increase
> performance during peak usage.

Let me see if I get what you're pointing at by the term 'log structured':

The log stores the metadata and object locations in a fixed position on
disk (pre-allocated large enough for the full cache_dir).
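
Something like this is what I picture for a fixed-size log slot (the field
names and sizes here are only illustrative, not anything that exists today):

#include <stdint.h>

/* One fixed-size entry in the pre-allocated metadata log; the slot
 * index implicitly gives the entry's position on disk. */
typedef struct {
    uint8_t  key[16];       /* MD5 of method+URL, as Squid's store key */
    uint64_t disk_offset;   /* where the object's data lives on the raw disk */
    uint32_t object_size;   /* bytes */
    uint32_t expires;       /* expiry timestamp */
    uint32_t lastref;       /* last reference timestamp */
} log_entry;

With fixed-size entries the whole log can be pre-allocated for the maximum
cache_dir size and rewritten in place as objects come and go.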

Object blocks are pre-striped onto the raw disk (possibly allocated and
used in stripes larger than the FS block size), or simply 'imagined' on the
disk by the Squid log system if we are still sitting atop a traditional UFS.

The writer threads order object writes to fill in expired spots on the
disk near where the reader threads are going to be acting next.

Possibly a separate logging thread is notified of the completion of the
object write when the write thread returns, so that the object can be
added to the log.

If there is currently no 'easy' way to write the object, the writer
thread buffers it, or drops it based on write-buffer pressure at the
time and potentially other object characteristics (perhaps a shorter
expiry time, or current popularity while it sits in the memory cache).
This would happen if reads are saturating the disk bandwidth, or if
there is no read 'near' the spot where we want to put the new object.

Am I about right, or are you envisioning a more traditional logged FS
that logs data alongside metadata into the log area in big chunks, which
can then be written out to the normal disk area as read movement allows?
I lean towards the former, but I expect both would be effective.
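
To make the former concrete, here is a rough sketch of the decision I
imagine the writer thread making. None of this is existing Squid code;
the names, thresholds, and structures are all invented for illustration:

#include <stdlib.h>
#include <sys/types.h>
#include <time.h>

#define NEAR_THRESHOLD (8 * 1024 * 1024)   /* "near" = within 8MB of the read head (made up) */
#define EXPIRES_SOON   (5 * 60)            /* "expiring soon" = within 5 minutes (made up) */

typedef struct {
    size_t size;        /* object size in bytes */
    time_t expires;     /* expiry time */
    int mem_refs;       /* references while in the memory cache */
} pending_object;

enum write_action { WRITE_NOW, HOLD_IN_BUFFER, DROP_OBJECT };

/* Decide what to do with a freshly completed object, given where the
 * reader threads are working and how full the write buffer is. */
static enum write_action
schedule_write(const pending_object *obj,
               off_t read_head,       /* where reads are clustered right now */
               off_t nearest_free,    /* nearest expired slot on disk */
               size_t buffered_bytes,
               size_t buffer_limit)
{
    long long distance = llabs((long long)nearest_free - (long long)read_head);

    /* An expired slot near the read head: the write is nearly free. */
    if (distance < NEAR_THRESHOLD)
        return WRITE_NOW;

    /* No cheap slot yet, but the buffer has room: hold the object and
     * hope the read pattern brings the head somewhere useful. */
    if (buffered_bytes + obj->size < buffer_limit)
        return HOLD_IN_BUFFER;

    /* Buffer pressure: shed the least promising objects first, e.g.
     * ones expiring soon or never re-referenced while in memory. */
    if (obj->expires < time(NULL) + EXPIRES_SOON || obj->mem_refs < 2)
        return DROP_OBJECT;

    return HOLD_IN_BUFFER;
}

The point is only that the policy stays local and cheap: a distance test,
a buffer test, and a couple of object attributes.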

Anyway, pretty cool. To be honest, though, at this point I think the
CPU usage problem is one of the biggest Squid faces. When we can get
80-100 reqs/sec from a single IDE disk (depending on workload), only
130 from two, and only 150 from three, four, or five--even though each
has a dedicated bus--the law of diminishing returns gets pretty ugly,
and it really hurts scalability. Of course, some CPU could be saved by
reducing disk overhead.

> With the COSS approach and segment based space recycling one can achieve
> similar results (not too unexpected, as the basic idea is the same), or
> perhaps even slightly better from an I/O perspective as there is less
> metadata, but at a higher penalty in memory usage and hit rate. The beauty
> of COSS is its simplicity, as it mainly needs to care about data, not
> metadata. In-memory metadata is vastly simpler to deal with than any
> on-disk equivalent.

True. I like the idea of COSS for a lot of reasons...and I'm doubtful
that any more traditional FS type could do better for small objects.
The ideal balance, I expect, will come from the following layout:

A real hot object memory cache with lazy flushes to -> disk I/O layer

Disk I/O then selects either COSS or a more traditional FS based on
object size (small objects go to COSS, while big objects feed into a UFS
or logged FS).
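
A minimal sketch of that dispatch, just to show what I mean (the cutoff
value and function names are invented, and the COSS/UFS stores are
stand-ins for whatever the real store modules end up looking like):

#include <stddef.h>

#define SMALL_OBJECT_MAX (64 * 1024)   /* hypothetical 64KB cutoff; needs tuning */

typedef enum { STORE_COSS, STORE_UFS } store_type;

/* The disk I/O layer routes by object size: small objects go where
 * write clustering pays off, large objects to a traditional or
 * logged filesystem. */
static store_type
select_store(size_t object_size)
{
    return (object_size <= SMALL_OBJECT_MAX) ? STORE_COSS : STORE_UFS;
}

Where exactly the cutoff sits would have to come out of benchmarking
against a realistic object size distribution.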

Lazy writes can be implemented pretty easily, I think, and would give a
good improvement in throughput while hopefully only marginally impacting
hit rate. Right now, Squid is compelled to write every object to disk,
leading to a /lot/ of outgoing disk bandwidth for objects that, as you
said, never get used. Someone on squid-users mentioned last week that he
had implemented 'picky writes' for his accelerators, which is similar--it
writes an object to disk only if the object is accessed a second time
while it still resides in cache_mem. This is a simple way to achieve it,
and probably just as effective as more complicated flushing decisions.
It does, of course, require a reasonably sized cache_mem and a policy to
decide what to write and when. I will look into adding a 'picky writes'
option to Squid to give it some testing.
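
For what it's worth, the policy itself is trivial; something along these
lines (the field and function names are mine, not taken from that patch):

#include <stdbool.h>

typedef struct {
    int mem_hits;       /* hits served from cache_mem since the object arrived */
    bool cachable;      /* passed the usual cachability checks */
} mem_object;

/* 'Picky writes': only swap an object out to disk if it has been
 * referenced again while still sitting in cache_mem.  The first
 * reference only filled the object in; a second one is the evidence
 * that it is worth the disk bandwidth. */
static bool
should_swap_out(const mem_object *o)
{
    return o->cachable && o->mem_hits >= 1;
}

The hard part is sizing cache_mem and deciding when to flush, not the
test itself.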

> On Wednesday 21 November 2001 20.57, Joe Cooper wrote:
>
>
>>More tuning is needed on the RAM disk+hard disk balance. But with RAM
>>prices being what they are, I'm definitely going to work on it some
>>more. It might allow us to get into the networks where the money is
>>(and the only networks that are still coming online in the US--small
>>ISPs are becoming a thing of the past here). We need at least 48Mbits
>>uplink support from a single reasonably sized box to get into those
>>markets...we're getting closer (22Mbits is our largest machine in
>>service and it's sweating most of the day), but still not there.

-- 
Joe Cooper <joe@swelltech.com>
http://www.swelltech.com
Web Caching Appliances and Support