Re: [squid-users] Hypothetically comparing SATA\SAS to NAS\SAN for squid.

From: Marcus Kool <marcus.kool_at_urlfilterdb.com>
Date: Wed, 22 Jan 2014 13:06:19 -0200

IOs have a variable size, and when writing an object to a file with the aufs store,
the OS writes metadata to the file system log, updates the inode table and writes the data to a new file.
So with aufs, one logical 'write object to disk' costs 3 IOs.
I do not know the internals of the rock store, but most likely it does only one IO per 'write object to disk'.
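
As a rough back-of-the-envelope illustration of why that ratio matters, here is a small
Python sketch. The 3-vs-1 IO counts come from the above; the 600 IOPS figure is only an
assumed example for a small disk subsystem:

# Back-of-the-envelope illustration of how the IOs-per-object ratio
# limits the number of object writes per second.
# disk_iops is an assumed example capability of the disk subsystem.
disk_iops = 600

aufs_ios_per_object = 3   # journal write + inode update + data write
rock_ios_per_object = 1   # single write into the rock store

print("aufs objects/sec:", disk_iops // aufs_ios_per_object)   # 200
print("rock objects/sec:", disk_iops // rock_ios_per_object)   # 600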

The thing is that one does not really need to dig into the nitty-gritty details to know what
the expected performance of the disk subsystem is, and it is pretty difficult to interpret the numbers
anyway since there are many layers involved (file system, LVM, block device, physical disk).
Looking only at IOPS is not appropriate either, since latency and using a correctly sized test/production
system are equally important. E.g. a disk that delivers 5000 IOPS will be considered slow if the
application demands 9000 IOPS.

Keep in mind that smart RAID systems use their cache to combine IOs, buffer small peaks, do readahead
and a lot more to get a higher overall throughput.
For an internal disk controller: the more cache and the more physical disks one uses, the better the
overall IO throughput. There are knobs to turn, like the choice of file system and the choice of RAID level,
and they make a difference, but a relatively small one.
For a NAS or SAN: the algorithms are usually smarter, and the larger disk arrays can present
a virtual disk to a host with very high throughput, high IOPS and low latency.

When testing, one should test the desired configuration: if you want a Squid
instance that can handle N requests/sec, you should set up a test system that is
subjected to N requests/sec and start with a disk configuration that you can
expect to cope with N requests/sec, i.e. N requests/sec produce at least N 16 KB writes
per second and N/10 16 KB reads per second (see the sketch below). Then do various tests with various
configurations: vary the number of file systems (2/4/8) and vary the configuration of
the [virtual] disk.
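
A minimal Python sketch of that sizing rule; the target of 1000 requests/sec is only an
assumed example value that you would replace with your own goal:

# Sizing rule from above: N requests/sec implies at least
# N 16 KB writes/sec plus N/10 16 KB reads/sec on the cache disks.
target_rps = 1000                    # assumed desired Squid requests/sec
io_size_kb = 16                      # average object / IO size

write_iops = target_rps              # >= N writes per second
read_iops = target_rps / 10          # >= N/10 reads per second
total_iops = write_iops + read_iops
throughput_mb_s = total_iops * io_size_kb / 1024

print(f"required IOPS       : {total_iops:.0f}")
print(f"required throughput : {throughput_mb_s:.1f} MB/s")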

In a previous email you showed some worry about TCP packet loss, but for a NAS one should
not have to worry about it: packet loss is normal on a WAN but occurs rarely on a LAN,
and if it does happen on a LAN, either the LAN is undersized and must be rightsized,
or there is a hardware problem like a faulty cable or a faulty interface.
In many environments the LAN for storage is separated from the rest of the LAN traffic,
and consequently the hosts have multiple LAN cards.

The configuration options are endless, and one can even choose to use LVM to do disk striping.
It works, but striping inside a SAN or NAS generally works much better since there are more disks,
and the disk array is smarter and can place a virtual disk with high IO demands on the
physical disks that deliver the highest IOPS.

Maybe I gave the wrong impression that IOPS is 'the holy performance parameter'; it is not.
IOPS is a good starting point for calculating the minimum configuration that
is required.
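
For example, a rough minimum-spindle estimate in Python, assuming a requirement of
1100 IOPS (an example value) and the 75-200 IOPS per rotating disk range quoted further
down in the thread:

# Rough minimum-configuration estimate starting from an IOPS requirement.
# required_iops would come from a sizing calculation like the one above;
# per_disk_iops is an assumed mid-range figure for one rotating disk.
import math

required_iops = 1100
per_disk_iops = 150

min_disks = math.ceil(required_iops / per_disk_iops)
print("minimum number of disks (before RAID overhead):", min_disks)   # 8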

A SAN or NAS with a large battery-backed cache and 128+ physical disks is most likely
to be the best performer for Squid. Of course Squid will not use 128+ disks but only a small
portion of them, so this option is only viable for medium/large companies where many
computers already use the SAN or NAS.

Marcus

On 01/22/2014 10:08 AM, Eliezer Croitoru wrote:
> Thanks Marcus,
>
> I indeed understood the subject in more depth.
> For now I am looking for a couple of answers about the IOPS tools I can use.
> I will just say that on the next squid RPM release I will write about some things related to these subjects.
> At what level would be the right way to test\verify the IOPS?
> To me it seems like it's not the FS but the lower levels of the devices.
> I am asking myself: when testing IOPS, what would be the way to measure it?
> Since we have the maximum per device and the current usage, it is fairly weird to even test.
> The basic assumption when testing should be first ACCEPT and only then REJECT, from my understanding.
> I heard a couple of things about the kernel this and the kernel that, but it requires more than just the basics to say anything definite.
> It is very confusing which tool to use to measure disk quality or disk performance.
> The kernel might be to blame for a couple of things, but I would not expect a 5400 RPM drive to be faster than the speed of light, which from what I have understood is something users "want".
> Is there anyone who can actually read this thing:
> $ iostat /dev/sda -d 2
> Linux 3.11.0-15-generic (eliezer-HP) 22/01/14 _x86_64_ (4 CPU)
>
> Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
> sda 5.85 95.47 136.58 128853206 184347015
>
> Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
> sda 0.00 0.00 0.00 0 0
>
> Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
> sda 1.00 0.00 20.00 0 40
>
> Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
> sda 0.00 0.00 0.00 0 0
>
>
> I am trying to understand: how do kB, kb or tps convert into IOPS?
> Bits I know, bytes I know, kilo I know, and a disk that cuts metal I know.
> one byte = 8 bits
> one Kbyte = 8 bits * 1024
> one IO = ??
>
> Thanks In Advance,
> Eliezer
>
> On 20/01/14 03:21, Marcus Kool wrote:
>>
>> The raw transfer speed of a disk is only interesting when an application
>> does very large sequential I/Os, and Squid does not do that.
>> Squid writes a lot to disk and reads relatively little, and since the
>> average object size is often around 13 KB, this is also the average I/O size.
>> A better performance parameter for disks is I/Os per second (IOPS).
>> Average latency is also an interesting parameter, but usually IOPS is
>> the more important one.
>>
>> The following numbers indicate the speed of disk systems for random 16K I/O:
>> individual disk: 75-200 IOPS
>> individual SSD: 1000-60000 IOPS
>> internal RAID disk array with 12 disks and battery-backed cache: 600-2000 IOPS
>> high-end SAN or NAS with RAID: 600-20000+ IOPS
>
>
>