Re: Cobalt from Gideon Glass on 1998-10-03 (squid-users)

From: Gideon Glass <gid@dont-contact.us>
Date: Sat, 3 Oct 1998 20:26:26 -0700 (PDT)

>
> The pII box would probably do better than the single CacheRaQ. It has
> more disk bandwidth so disk writes would go faster, leaving the disks
> less busy and hence more ready to handle disk reads. The added memory
> would also help -- more object data would be cached in main memory.
> However, for disk reads, including disk reads necessary for open
> calls, squid 1.1.x will block, so more spindles isn't necessarily
> going to buy you much. My understanding is that with Squid 2 and
> async I/O, this is no longer a problem.

    async I/O as in an async filesystem? Is that a reality on linux yet?

I believe the squid async I/O stuff works fine on linux. I haven't
tried it myself since I've been busy with other things, but I am
pretty sure that the threaded version works fine. More info is at

http://squid.nlanr.net/Squid/FAQ/FAQ-19.html#ss19.4
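
To illustrate the general idea behind the threaded approach (this is not
Squid's code, just a minimal Python sketch; read_object, fetch_all and demo
are names made up for the example): blocking opens and reads are handed to a
pool of worker threads, so several disk operations can be outstanding at
once and the submitting thread isn't stuck inside a single open call.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_object(path):
    """Blocking open + read of one cached object; runs in a worker thread."""
    with open(path, "rb") as f:
        return f.read()


def fetch_all(paths, workers=4):
    """Issue all reads to a thread pool and return the contents in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # submit() returns immediately; the open/read happens in a worker.
        futures = [pool.submit(read_object, p) for p in paths]
        # result() waits only for that particular read to finish.
        return [f.result() for f in futures]


def demo():
    """Write three small files to a temp directory and read them back."""
    with tempfile.TemporaryDirectory() as d:
        paths = []
        for i in range(3):
            p = os.path.join(d, f"obj{i}")
            with open(p, "wb") as f:
                f.write(b"object %d" % i)
            paths.append(p)
        return fetch_all(paths)
```

With more than one spindle, reads queued this way can keep several disks
busy at the same time, which a single blocking open call cannot do.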

    Ultimately, the plan is to grow a "farm" of Cobalts. I like the idea of
    multiple processors/machines for speed and redundancy. Could you comment
    for example, would 2 CacheRaQ's of the above config do better than my
    squid box? I know my RAID0 Ultra/Wide disk array is superior to an

Assuming you're running squid 1.1.x, I think 2 cacheraqs would do
better than the single PII due to synchronous opens/reads. I don't
know what the answer is if you run squid 2. Those 4 disks may still
be better than 2 disks + 2 cpus, but with 3 cacheraqs it seems less
likely and with 4 I really doubt it.

    UltraATA single drive, but I was thinking some of what's lost in the cobalt
    could be made up with the 64bit processor, dma io, and possibly a farm of
    cobalts (a la alteon).

Dave Miller has tuned our Ultra DMA disk I/O (and other things) quite
well. As an example, we do better on SPECweb than a lot of other
single-disk systems (including x86 linux 2.0.x on a 200MHz Pentium with
the same version of apache); I don't have numbers convenient but if
you're curious email me and I can dig them up. Our processors aren't
actually that powerful, but for almost all of our applications
it doesn't matter. (One exception is big perl cgi scripts.)

Actually, I don't know that aggressive UDMA buys us much. Web caching
inherently involves accessing a large number of objects with limited
locality of reference. As a result, web caching is mostly disk seek
limited. (Benchmarks I've done comparing 7200RPM drives, which are
what we actually ship in the CacheQube, with 5400RPM drives show a very
modest performance improvement for 7200RPM, on the order of a couple
percent. But we got the drives for a decent price, and they were
about the right size, so we used them...) With squid 1.1.x, we're
also limited by the synchronous open calls, but this problem will go
away fairly soon as we move to Squid 2.
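
A back-of-the-envelope sketch of why spindle speed alone helps only
modestly: a faster spindle shortens rotational delay, not head movement or
any other per-access cost. The function names and the other_ms parameter
are made up for illustration, and any value passed as other_ms is an
assumption, not a measurement from our benchmarks.

```python
def rotational_latency_ms(rpm):
    """Average rotational delay: half a revolution, in milliseconds."""
    return 0.5 * 60_000.0 / rpm


def relative_gain(slow_rpm, fast_rpm, other_ms):
    """Fraction of per-access time saved by the faster spindle.

    other_ms is everything that does not depend on spindle speed
    (seek, queueing, and so on); it is an assumed input, not a measurement.
    """
    slow = other_ms + rotational_latency_ms(slow_rpm)
    fast = other_ms + rotational_latency_ms(fast_rpm)
    return (slow - fast) / slow
```

Average rotational delay is half a revolution: about 5.6 ms at 5400RPM
versus about 4.2 ms at 7200RPM, a saving of roughly 1.4 ms per access. The
larger the speed-independent part of each access, the smaller the relative
gain.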

> In the following benchmark conditions
>
> - 200MB of content through a 64MB RAM CacheQube
> - 40% hit rate
> - 10KB mean document size

    We see about 43% on our box (above squid config). What I am concerned
    with mostly, is disk thrash and latency. Do you feel the single disk of
    the CacheRaQ is going to lead to hellacious disk thrashing that will cause
    pages to be served with high latency?

With high throughput, yes, you could become disk bound and start to see
high latency. The stats on the cacheqube/cacheraq give you hit vs miss
latency, and also if it's really bad, you can tell from direct
experience with a browser. If this happens, the solution is to
increase the number of cache servers.

    Also, when using multiple CacheRaQ's, say in a farm off an Alteon, how do
    they communicate? Is it standard ICP?

The switch takes care of everything (no ICP is required). When a
new HTTP request comes in, it looks at the destination IP address
(i.e. the HTTP server's IP address). The switch runs that address
through a hash function which picks one of your cache servers.
The switch then forwards the HTTP request to that server. The cache
server, operating transparently, gets the request and handles it as it
normally would. For all active HTTP connections (i.e. TCP connections
on port 80 from a given client to a given server), the switch knows
which cache server is involved, so it can route the IP packets for any
given connection through to the proper cache server.

This scheme requires that the cache servers support transparent
caching, but all the cache vendors I know about do support
transparent caching. I'm not sure about all-software proxy caches
(MS Proxy, NS Proxy, Netcache SW version).
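
The hashing scheme described above can be sketched roughly as follows. This
is only an illustration in Python, not what any particular switch actually
runs: the CRC32 hash, the cache names, and the Switch class with its
connection table are all assumptions made for the example.

```python
import ipaddress
import zlib


def pick_cache(dest_ip, caches):
    """Hash the origin server's IP address to choose one cache server."""
    packed = ipaddress.ip_address(dest_ip).packed
    return caches[zlib.crc32(packed) % len(caches)]


class Switch:
    """Toy model of a switch redirecting port-80 traffic to caches."""

    def __init__(self, caches):
        self.caches = list(caches)
        # Active TCP connections: (client ip, client port, server ip,
        # server port) -> cache server handling that connection.
        self.connections = {}

    def route(self, client_ip, client_port, dest_ip, dest_port=80):
        """Return the cache server that should receive this packet."""
        key = (client_ip, client_port, dest_ip, dest_port)
        if key not in self.connections:
            # New connection: choose a cache from the destination address.
            self.connections[key] = pick_cache(dest_ip, self.caches)
        return self.connections[key]

    def close(self, client_ip, client_port, dest_ip, dest_port=80):
        """Forget a connection once it has been torn down."""
        self.connections.pop((client_ip, client_port, dest_ip, dest_port), None)
```

Because the choice depends only on the destination address, every client
asking for a given HTTP server lands on the same cache server, with no
communication needed between the caches.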

>
> squid spends about 60% of its time in open(2). Some of the calls
> take upwards of 250ms. Clearly there is motivation to move to Squid 2.

    And I am assuming that Cobalt is moving to squid2, which will be a free
    upgrade to those of us that buy the Cobalt with a squid 1.1.22 on it? Do
    you have a timeline?

I don't have a timeline for Cobalt moving to Squid 2, but I doubt we
would charge for it.

    I like the idea of the Cobalt being inexpensive, and that it allows you to grow a
    farm instead of outright buying a box (such as a NetCache, CacheFlow,
    sparc+inktomi, etc). I am very curious on benchmarking statistics against
    those platforms. Say a 2x, 3x or 4x Cobalt CacheRaQ against a single

I think a standardized benchmark a la SPECweb would be great to compare
all these systems. Hopefully such a benchmark would be harder to
circumvent than SPECweb96, where you can get away with bad disk I/O
by adding lots of memory. Anyway, when something is available we'll
certainly release numbers.

    CacheFlow, NetCache etc. The price would be cheaper, the aggregate disk
    space, processing power etc would be greater in favor of the Cobalt, and
    you would have the redundancy in case of failure.

Indeed..

gid
Received on Sat Oct 03 1998 - 20:28:28 MDT