Re: [squid-users] SSD trim and rock store

From: Niki Gorchilov <niki_at_gorchilov.com>
Date: Sun, 2 Mar 2014 23:56:50 +0200

On Sun, Mar 2, 2014 at 1:40 AM, Alex Rousskov
<rousskov_at_measurement-factory.com> wrote:
> On 03/01/2014 09:11 AM, Niki Gorchilov wrote:
>> I'm musing on the performance implications of using rock store on SSD.
>
> There is very little experience with SSDs for Squid, especially when
> using Rock store (a relatively new feature). Folks naturally expect SSDs
> to be faster, but I have not seen (or do not recall) any high-quality
> comparisons specific to Squid and Rock store. Someday, we will make one.

I'm planning to conduct a basic performance study very soon.

>> As per my understanding the underlying filesystem is unaware of any
>> unused blocks in the big rock file, thereby using fstrim or "discard"
>> mount option will have no effect.
>
> Some Linux file systems know that the blocks are unused (at least the
> blocks at the end of the file), but unused blocks ought to be irrelevant
> for a cache in a steady state (the common/interesting case) because
> there are no unused blocks in that state.

Yep, sparse files have value for a limited time - till the rock file is full.

>> Once the rock file is full, SSD io performance will degrade
>> considerably, due to the read-erase-modify-write cycle on every rock
>> change.
>
> There should be no erase-modify steps if your rock slot size is a
> multiple of OS page and disk block sizes. Only read-write. If that is
> not what you see, it may be a bug.
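
For reference, the alignment condition described above can be checked
with a few lines of Python. This is only a sketch: the function name
is_aligned and the sizes used below (4096-byte OS page, 512-byte disk
block, 32768- and 6000-byte slots) are illustrative assumptions, not
values taken from any Squid configuration.

```python
def is_aligned(slot_size: int, page_size: int, block_size: int) -> bool:
    """Return True if slot_size is a positive multiple of both sizes."""
    return (
        slot_size > 0
        and slot_size % page_size == 0
        and slot_size % block_size == 0
    )


# Assumed sizes, for illustration only.
os_page = 4096     # hypothetical OS page size in bytes
disk_block = 512   # hypothetical disk logical block size in bytes

print(is_aligned(32768, os_page, disk_block))  # 32768 = 8 * 4096
print(is_aligned(6000, os_page, disk_block))   # 6000 is not a multiple of 4096
```

On a real system the OS page size can be read with
os.sysconf("SC_PAGE_SIZE") instead of being hard-coded.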

Unfortunately, it's not that simple due to the following two NAND
flash oddities:
1. data pages cannot be overwritten. They have to be erased first
2. data is written by pages (usually 4K), but erased by blocks
(usually 64 pages or 256K)

Wikipedia has quite a good article on the topic:
http://en.wikipedia.org/wiki/Write_amplification.
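
To put rough numbers on the two properties above, here is a small
Python sketch of the worst case. It assumes the typical sizes mentioned
above (4K pages, 64 pages per erase block), a write that starts on an
erase-block boundary, and a naive controller that rewrites every erase
block it touches in full. Real SSD firmware uses spare blocks and
garbage collection, so actual figures should be lower; the function
name is made up for illustration.

```python
PAGE_SIZE = 4 * 1024                      # bytes per flash page (assumed typical)
PAGES_PER_BLOCK = 64                      # pages per erase block (assumed typical)
BLOCK_SIZE = PAGE_SIZE * PAGES_PER_BLOCK  # 256K erase block


def worst_case_write_amplification(host_bytes: int) -> float:
    """Physical bytes written divided by host bytes written.

    Assumes the write starts at an erase-block boundary and that every
    touched erase block is read, erased and rewritten in full.
    """
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    blocks_touched = -(-host_bytes // BLOCK_SIZE)  # ceiling division
    return blocks_touched * BLOCK_SIZE / host_bytes


for size in (4 * 1024, 32 * 1024, 256 * 1024):
    print(size, worst_case_write_amplification(size))
```

Under these assumptions a single 4K update costs a full 256K rewrite
(factor 64), a 32K update costs a factor of 8, and only a write
covering a whole erase block reaches a factor of 1.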

> Also, for a typical large cache, it is probably more like
> write-write-write-read-write-... cycle because a high portion of cache
> hits should come from the memory cache but nearly all cachable misses
> are written to disk.

Which is even worse, as write amplification hurts random writes the most.

Best,
Niki
Received on Sun Mar 02 2014 - 21:57:38 MST
