Re: [squid-users] High load server Disk problem

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Thu, 19 Aug 2010 21:32:18 +1200

Robert Pipca wrote:
> Hi,
>
> 2010/8/18 Jose Ildefonso Camargo Tolosa <ildefonso.camargo_at_gmail.com>:
>> Yeah, I missed that last night (I was sleepy, I guess), thank God you
>> people are around! Still, he would need faster disk access, unless
>> he is talking about 110Mbps (~12MB/s) instead of 110MB/s (~1Gbps).
>>
>> So, Robert, is that 110Mbps or 1Gbps?
>
> We have 110Mbps of HTTP network traffic (the actual bandwidth is around
> 250Mbps, but I'm talking about HTTP only).
>
> But aufs doesn't seem to behave that well. I have it on XFS mounted
> with noatime. I ran "squid -z" on the aufs cache_dir to see if aufs
> behaves better with fewer objects. It does, but I still get quite a lot
> of these:
>
> 2010/08/18 21:22:20| squidaio_queue_request: WARNING - Disk I/O overloading
> 2010/08/18 21:22:35| squidaio_queue_request: WARNING - Disk I/O overloading
> 2010/08/18 21:22:35| squidaio_queue_request: Queue Length:
> current=533, high=777, low=321, duration=20
> 2010/08/18 21:22:50| squidaio_queue_request: WARNING - Disk I/O overloading
> 2010/08/18 21:22:50| squidaio_queue_request: Queue Length:
> current=669, high=777, low=321, duration=35
> 2010/08/18 21:23:05| squidaio_queue_request: WARNING - Disk I/O overloading
> 2010/08/18 21:23:05| squidaio_queue_request: Queue Length:
> current=422, high=777, low=321, duration=50
> 2010/08/18 21:23:22| squidaio_queue_request: WARNING - Disk I/O overloading
> 2010/08/18 21:41:46| squidaio_queue_request: WARNING - Queue congestion
>
> So, duration keeps growing... so the problem will occur again.
>
> Now, it seems that COSS behaves very nicely.
>
> I'd like to know if I can adjust the max-size option of coss, with
> something like "--with-coss-membuf-size"? Or is it really hard-coded?

It can be altered but not to anything big...
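
For reference, that membuf size is a build-time setting, so changing it
means recompiling. A rough sketch only - the 2MB value (in bytes) is just
an illustration, check ./configure --help on your source tree for the
accepted range:

   # hypothetical rebuild with a 2MB COSS membuf instead of the default
   ./configure --with-coss-membuf-size=2097152 ...your other options...
   make && make install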

>
> I use the aufs cache_dir for the youtube and windowsupdate caches. So if
> I could increase max-size of the coss cache_dirs to around 100MB, I
> could leave the aufs cache_dir to windowsupdate files only (which are
> around 300MB+). Is it possible?

No. The "buf"/slices are equivalent to swap pages for COSS. Each is
swapped in/out of disk as a single slice of the total cache. Objects are
arranged on them with temporal locality so that ideally requests from
one website or webpage all end up together on a single slice.
  The theory being that clients only have to wait for the relevant COSS
slice for their requested webpage to be swapped into RAM, and all their
small follow-up requests for .js, .css, images etc. get served directly
from there.

Your COSS dirs are already sized at nearly 64GB each (65520 MB), with
objects up to 1MB stored there. That covers most Windows updates, which
are usually only a few hundred KB each.
I'm not sure what your slice size is, but 15 of them are stored in RAM
at any given time. You may want to increase that membuf= parameter a
bit, or reduce the individual COSS dir size (requires a COSS dir erase
and rebuild).
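
Off the top of my head the cache_dir line would look something like the
below. This is only a sketch: the path and membuf count are made up, and
the exact option spelling (membufs= is my recollection) is whatever your
2.7 squid.conf.default documents:

   # hypothetical: keep more COSS stripes in RAM for the same 65520 MB dir
   cache_dir coss /cache/coss1 65520 max-size=1048576 membufs=30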

The rule-of-thumb for effective swap management seems to be storing 5min
of data throughput in memory to avoid overly long disk I/O wait times.
Assuming an average hit-rate of around 20%, that comes to needing roughly
1min of full HTTP bandwidth held in memory (combined: cache_mem RAM cache
+ COSS membufs) at any given time.

Disclaimer: that's just my second-rate interpretation of a recent thesis
presentation on memory vs flash vs disk service, so testing is recommended.
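
As very rough arithmetic for your numbers (indicative only):

   110 Mbps HTTP traffic  ~= 13.75 MB/s
   1 minute of that       ~= 13.75 MB/s * 60s ~= 825 MB
   => cache_mem + COSS membufs (all dirs combined) somewhere near 800MB-1GB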

It may also be time for you to perform an HTTP object analysis.

  This involves grabbing a period of the logs and counting how many
objects go through your proxy, grouped in doubling size brackets (0-512
bytes, 512B-1KB, 1-2KB, 2-4KB, 4-8KB, 8-16KB, 16-32KB, ...).
[There are likely tools out there that do this for you.]
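
If you want to roll your own, a quick awk sketch along these lines gives
the rough shape. It assumes the native access.log format where field 5 is
the reply size in bytes; adjust the field number if you use a custom
logformat:

   # bucket reply sizes into doubling brackets, labelled by upper bound
   awk '{ b = 512; while ($5 > b) b *= 2; count[b]++ }
        END { for (b in count) print b, count[b] }' access.log | sort -n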

  There are three peaks that appear in these counts: one usually near
zero for the IMS requests; one in the low 4KB-128KB range for the general
image/page/script content; and one around the low 1MB-50MB range for video
media objects. Between the last two peaks there is a dip. IMO the
min-size/max-size boundary between COSS and AUFS should sit somewhere
around the low point of that dip.

  The bigger group of objects are popular but too large for COSS to
swap in/out efficiently; AUFS handles these very nicely. The objects in
the smaller bump are the reverse: too small to be worth waiting for
individual AUFS swap-in/out, and likely to be clustered in the highly
inter-related webpage bunches that COSS handles very well.
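
As a purely illustrative split - the 128KB cut-over is a placeholder until
you have done the analysis, the paths and sizes are made up, and min-size=
on the AUFS dir is an assumption (check your squid.conf.default for it):

   # hypothetical: small objects to COSS, large objects to AUFS
   cache_dir coss /cache/coss1 65520 max-size=131072 membufs=30
   cache_dir aufs /cache/aufs1 300000 64 256 min-size=131072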

Amos

-- 
Please be using
   Current Stable Squid 2.7.STABLE9 or 3.1.6
   Beta testers wanted for 3.2.0.1
Received on Thu Aug 19 2010 - 09:32:26 MDT

This archive was generated by hypermail 2.2.0 : Mon Aug 23 2010 - 12:00:02 MDT