Henrik Nordstrom wrote:
> Actually, as said before I think a storage model can be found that does not 
> penalize small objects, mostly eliminating the need for that ramdisk, instead 
> allowing the ram to be used for proper hot object caching.
> 
> The more I think of these issues, the more I am drawn to my old "multi 
> level cyclic filesystem" idea, which basically is a log structured file 
> system with metadata optimized for log structures and the volatility of a 
> cache, and log cleaners at various time intervals of the log. Such a system 
> can be made to operate close to the I/O bandwidth of modern drives. Certainly 
> so for writes, and with some intelligent data ordering and prefetching reads 
> can also be improved a lot. Maybe not in Polygraph testing, but quite 
> likely in real life.. not sure how realistic a model of locality of reference 
> Polymix-4 represents. The nature of log cleaning also allows one to use idle 
> time to improve hit rate and to increase performance during peak usage.
Let me see if I get what you're pointing at by the term 'log structured':
The log stores the meta-data and object location in a fixed position on 
disk (pre-allocated large enough for the full cache_dir).
Object blocks are pre-striped onto the RAW disk (possibly allocated and 
used in bigger than FS block-size stripes), or simply 'imagined' on the 
disk by the Squid log system if we are still sitting atop a traditional UFS.
The writer threads order object writes to fill in expired spots on the 
disk near where the reader threads are going to be acting next.
Possibly a separate logging thread is notified of the completion of the 
object write when the write thread returns, so that it can be added to 
the log.
If there is currently no 'easy' way to write the object the writer 
thread buffers it, or drops it based on write buffer pressure at the 
time and potentially other object characteristics (perhaps shorter 
expires time, or current popularity while it's in the mem cache).  This 
would happen if the reads are saturating the disk bandwidth, or if there 
is no read 'near' the spot we want to put the new object in.
Am I about right, or are you envisioning a more traditional logged FS 
that logs data alongside metadata into the log area in big chunks, which 
can then be written out to the normal disk area as read movement allows? 
I lean towards the former...but I expect both would be effective.
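
To make the first variant concrete, here is a rough sketch of the writer-side decision as I described it: write into an expired slot if one sits near the read position, otherwise buffer, otherwise drop. Everything named here (Writer, near_window, buffer_limit, the block-offset model) is made up for illustration and is not anything in Squid; a real version would also weigh expiry time and popularity before dropping.

```python
from dataclasses import dataclass, field


@dataclass
class Writer:
    """Toy model of a writer thread; all names and limits are hypothetical."""
    free_slots: list                 # block offsets of expired (reusable) slots
    near_window: int = 1024          # how many blocks away still counts as 'near'
    buffer_limit: int = 4            # write-buffer pressure threshold (objects)
    buffered: list = field(default_factory=list)
    log: list = field(default_factory=list)   # (key, slot) entries for the logger

    def submit(self, key, read_head, disk_saturated):
        """Return 'written', 'buffered' or 'dropped' for one new object."""
        if not disk_saturated:
            near = [s for s in self.free_slots
                    if abs(s - read_head) <= self.near_window]
            if near:
                # Easy write: reuse the expired slot closest to the read head,
                # then hand the completed write to the log.
                slot = min(near, key=lambda s: abs(s - read_head))
                self.free_slots.remove(slot)
                self.log.append((key, slot))
                return "written"
        # No easy write: buffer while there is room, drop under pressure.
        if len(self.buffered) < self.buffer_limit:
            self.buffered.append(key)
            return "buffered"
        return "dropped"
```

With free slots at blocks 100 and 5000 and the read head at 120, the first object lands in slot 100; the next one finds nothing within the window and is buffered, or dropped once the buffer is full.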
Anyway, pretty cool.  To be honest, though, at this point I think the 
CPU usage problem is one of the biggest that Squid faces.  When we can 
get 80-100 reqs/sec from a single IDE disk (dependent on workload), and 
only 130 from two, and only 150 from three or four or five--even though 
they each have a dedicated bus...the law of diminishing returns is 
getting pretty ugly and it really hurts scalability.  Of course, CPU 
could be saved somewhat by reducing disk overhead.
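
The diminishing returns are easier to see per disk. A quick bit of arithmetic on the figures above, with one assumption: I take 90 reqs/sec as the midpoint of the 80-100 single-disk range, and treat three, four and five disks as all giving 150.

```python
# Observed totals in reqs/sec; the 1-disk value is an assumed midpoint of 80-100.
OBSERVED = {1: 90, 2: 130, 3: 150, 4: 150, 5: 150}


def scaling(observed):
    """Return (disks, total, per_disk, fraction_of_linear) rows."""
    single = observed[1]
    rows = []
    for disks in sorted(observed):
        total = observed[disks]
        per_disk = total / disks
        # 1.0 would mean each added disk contributes as much as the first one.
        fraction = total / (disks * single)
        rows.append((disks, total, per_disk, fraction))
    return rows


if __name__ == "__main__":
    for disks, total, per_disk, fraction in scaling(OBSERVED):
        print(f"{disks} disk(s): {total} total, {per_disk:.1f} per disk, "
              f"{fraction:.0%} of linear scaling")
```

By the fifth disk each spindle is delivering 30 reqs/sec, a third of what a single disk manages alone.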
> With the COSS approach and segment based space recycling one can achieve 
> similar results (not too unexpected as the basic idea is the same), or 
> perhaps even slightly better from an I/O perspective as there is less meta 
> data, but at a higher penalty in memory usage and hit rate. The beauty of 
> COSS is simplicity as it mainly needs to care about data, not metadata. 
> In-memory metadata is vastly simpler to deal with than any on-disk 
> equivalent.
True.  I like the idea of COSS for a lot of reasons...And I'm doubtful 
that any more traditional FS type could do better for small objects. 
The ideal balance, I expect, will be from the following layout:
A real hot object memory cache with lazy flushes to -> disk I/O layer
Disk I/O then selects either COSS or a more traditional FS based on 
object size (small objects go to COSS, while big objects feed into a UFS 
or logged FS).
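
As a sketch only, the selection step could be as small as this. The 64 KB cutoff and the store names are placeholders I picked for the example; where the line between small and big objects belongs is exactly the kind of thing that would need testing.

```python
SMALL_OBJECT_LIMIT = 64 * 1024  # hypothetical cutoff in bytes, not a tuned value


def select_store(object_size):
    """Pick a backing store for an object being flushed from the memory cache."""
    if object_size <= SMALL_OBJECT_LIMIT:
        return "coss"   # small objects: data-only cyclic store
    return "ufs"        # big objects: traditional or logged filesystem


def route(objects):
    """Group (key, size) pairs by the store each one would be written to."""
    placement = {"coss": [], "ufs": []}
    for key, size in objects:
        placement[select_store(size)].append(key)
    return placement
```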
Lazy writes can be implemented pretty easily, I think, and would give a 
pretty good improvement to throughput while hopefully only marginally 
impacting hit rate.  Right now, Squid is compelled to write every object 
to disk, leading to a /lot/ of outgoing disk bandwidth for objects that, 
as you said, never get used.  Someone on squid-users mentioned last week 
that he had implemented 'picky writes' for his accelerators which is 
similar--it writes an object only if the object is accessed a second 
time while it still resides in the cache_mem.  This is a simple way to 
achieve it, and probably just as effective as more complicated flushing 
decisions.  This, of course, requires a reasonably sized cache_mem and a 
policy to decide what to write and when.  I will look into adding a 
'picky writes' option to Squid to give it some testing.
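
For reference, this is how I understand the 'picky writes' behaviour, as a toy model rather than the poster's actual patch (which I have not seen). The class name, the plain LRU memory cache counted in objects, and the dict standing in for the disk are all assumptions of the sketch.

```python
from collections import OrderedDict


class PickyCache:
    """Toy model: an object reaches disk only on a hit while still in memory."""

    def __init__(self, mem_capacity):
        self.mem = OrderedDict()      # key -> body, least recently used first
        self.mem_capacity = mem_capacity
        self.disk = {}                # stand-in for the cache_dir

    def store(self, key, body):
        """First access (a miss fetched from the origin): memory only."""
        self.mem[key] = body
        self.mem.move_to_end(key)
        while len(self.mem) > self.mem_capacity:
            # Evicted without a second access: never costs a disk write.
            self.mem.popitem(last=False)

    def fetch(self, key):
        """Return the body, or None on a miss."""
        if key in self.mem:
            self.mem.move_to_end(key)
            if key not in self.disk:
                # Second access while still in memory: now worth writing out.
                self.disk[key] = self.mem[key]
            return self.mem[key]
        return self.disk.get(key)
```

An object that is stored and then pushed out of memory before anyone asks for it again is simply gone, which is the hit rate cost being traded for disk bandwidth.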
> On Wednesday 21 November 2001 20.57, Joe Cooper wrote:
> 
> 
>>More tuning is needed on the RAM disk+hard disk balance.  But with RAM
>>prices being what they are, I'm definitely going to work on it some
>>more.  It might allow us to get into the networks where the money is
>>(and the only networks that are still coming online in the US--small
>>ISPs are becoming a thing of the past here).  We need at least 48Mbits
>>uplink support from a single reasonable sized box to get into those
>>markets...we're getting closer (22Mbits is our largest machine in
>>service and it's sweating most of the day), but still not there.
-- 
Joe Cooper <joe@swelltech.com>
http://www.swelltech.com
Web Caching Appliances and Support

Received on Wed Nov 21 2001 - 17:02:03 MST