Re: memory mapped store entries

From: Stephen R. van den Berg <srb@dont-contact.us>
Date: Tue, 25 Aug 1998 04:19:47 +0200

Stewart Forster wrote:
>Stephen van den Berg wrote:
>> The process shouldn't be waiting on this, since the kernel would be
>> doing this in the background (if at all; since we're the only one opening
>> the file, the system could keep all pages in memory until we terminate
>> and then write them back).

> I'm unclear as to how the kernel handles the flushing of memory
>pages out to disk. If this is in progress will the kernel block the
>current page access until that flush is complete? I thought it did,

How would it do this? The kernel doesn't really have a way of arresting
a process when it writes to an mmapped page that is currently being
written back to disk. Also, why would the kernel want to do this?
The kernel simply sees a dirty page, schedules a write, and is done with
it. The write takes effect in the background, grabbing whatever
data is in the page at that moment. If the user program scribbles into
the page in the meantime, then so be it; the new scribbles go out to
the disk as well.
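
For the record, the semantics I'm relying on are just those of a plain
MAP_SHARED mapping. A minimal sketch (illustrative only, not the actual
patch; the filename is made up):

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t len = 4096;     /* one page, for illustration */
        int fd = open("swaplog.dat", O_RDWR | O_CREAT, 0600);
        if (fd < 0 || ftruncate(fd, (off_t)len) < 0) {
            perror("open/ftruncate");
            return 1;
        }
        char *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        /* Scribble into the page; no write() call is needed.  The
         * kernel sees the dirty page and schedules the disk write
         * itself, in the background. */
        memcpy(map, "entry", 5);
        munmap(map, len);    /* remaining dirty data still gets flushed */
        close(fd);
        return 0;
    }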

>since to do otherwise would require the kernel to copy the data and then
>write that out.

Why copy?

>> The process would stall if it had to be paged in from disk, yes. But,
>> since we're the only ones that have the file open, apart from the initial
>> read when we read in the file the first time, there shouldn't really
>> be any disk reads since everything can be kept in memory.

>So you're saying that you want to mlock() all these pages into RAM so they
>will never get paged out and cause delays?

It would be an option, but I don't think it's necessary.
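
If we ever did want that option, it would amount to a single mlock() on
the mapped region; a minimal sketch (the helper name is mine, not from
the patch):

    #include <sys/mman.h>

    /* Map the store file shared, then try to pin it in RAM.
     * mlock() failure is non-fatal: the mapping still works, the
     * pages are merely eligible for paging out again. */
    void *map_and_pin(int fd, size_t len)
    {
        void *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (map != MAP_FAILED)
            (void)mlock(map, len);
        return map;
    }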

Original scenario (no mmap):
- We have a large array of entries which are accessed frequently by
  the program and therefore have a low probability of being swapped out
  given the fact that we have more memory than is needed to fit
  everything in. Occasionally some page out of this large array might
  get written to swap anyway, but these will be random incidents.

New scenario (with mmap):
- We have a large array of entries which are accessed frequently by
  the program and therefore have a low probability of being dropped out
  of the buffer cache. In fact, the probability of something being dropped
  out of the buffer cache is reduced even further because we effectively
  have more memory available than in the case without mmap (we share some
  things with the kernel's buffer cache under an intelligent kernel).
  Occasionally some page out of this large array might get dropped from
  memory anyway, but these will be random incidents.

Correct me if I'm wrong, but didn't I just demonstrate that the
page-in/page-out behaviour of the mmapped case is going to be
(marginally) better than the behaviour without mmap?

>> >I vote VERY strongly against any form of mmap() data access unless it is
>> >wrapped within a thread to separate page fault activity away from the main
>> >thread.

>> Do you know of any kernels which do not deal with these mmap'd files
>> intelligently as conjectured above?

>I'm concerned with mmap() as a whole for any OS.

I'd agree that there are some *very* bad mmap() implementations around,
but they're steadily improving all over the place, so we might as well
start taking advantage of the good ones (carefully).

>> I guess we'd have to maintain separate mmap/no-mmap cases anyway.
>> BTW, on Linux (2.0.35), this particular mmap approach appears to be showing
>> benefits only, no noticeable drawbacks so far.

>Got some figures? How have you measured this?

Well, in the low-memory situation (tested back in December 1997), a squid
*with* these patches ran noticeably faster (mostly because of the
reduced memory footprint; i.e. instead of occasionally hitting swap, it
stayed out of swap entirely). Other than that, I've simply assumed that
since it reduces memory usage, it must be good for the speed of the
system overall.

> What about consistency in
>the event of a crash?

Works like a charm. I can "kill -9" my squid any time I want; it
comes back smooth and fast with *no* false URLs and no race conditions.
As to kernel-crash consistency, I have to admit that I don't have
much experience with that, due to the simple fact that my proxy server's
kernel (Linux) has not crashed in more than a year.
If you're concerned, you could build in checkpoints, where squid tells
the kernel to sync the mmap and disk contents *now* (this happens
asynchronously again).
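
Such a checkpoint would be a single msync() call with MS_ASYNC,
assuming `map` and `len` describe the mmapped swap log (the names are
mine, for illustration):

    #include <sys/mman.h>

    /* Ask the kernel to start flushing all dirty pages of the mapping
     * now; MS_ASYNC returns immediately, the writes happen behind us. */
    int checkpoint(void *map, size_t len)
    {
        return msync(map, len, MS_ASYNC);
    }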

> What about needing to extend the file which mmap()
>can't do transparently? (Sure we can just pre-allocate that one)

Not much of a problem. I preallocate a bit (adaptively); when the
limit is reached, a simple munmap() and a new mmap() do the job nicely.
This repeats a few times while your cache is filling for the first
time. After it reaches a steady state, there is no fluctuation there
anymore.
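
The grow step amounts to something like this sketch (illustrative
names, not the actual patch code):

    #include <sys/mman.h>
    #include <unistd.h>

    /* Drop the old mapping, extend the file, and map the larger size.
     * Any still-dirty pages of the old mapping get flushed by the
     * kernel on its own schedule; nothing is lost by the munmap(). */
    void *grow_mapping(int fd, void *old_map,
                       size_t old_len, size_t new_len)
    {
        if (munmap(old_map, old_len) != 0)
            return MAP_FAILED;
        if (ftruncate(fd, (off_t)new_len) != 0)
            return MAP_FAILED;
        return mmap(NULL, new_len, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);
    }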

> Are we
>happy to mlock() pages into RAM to ensure mmap()'s performance?

Like I said, I don't think this will make much of a difference.

>How about under extremely high loads where disks are busy and take up to
>100ms to get a page in/page out and that 100ms means 10 requests that you
>can't service while that happens?

This will be just as bad in the case of the occasional swapped-out page.

>I'm happy to admit that mmap() will provide some benefits under lowish loads.
>My concern is always at the bleeding edge of performance and I'm ever happy
>to sacrifice some low-end speed by 5-10% if it means high end speed is 10%
>faster.

I'm servicing 400 requests per minute on average at peak time, using a
Pentium 133 and 192MB of RAM:
Available buffercache: 118MB
Resident size of squid: 29MB+46MB
Mmapped storeSwapLogData + shared libs: 29MB

Memory in use according to malloc: 39MB
Allocated memory according to malloc: 46MB
Total accounted for by squid: 35MB

Storage swap size: 6.9GB
Storage Mem size: 14MB
StoreEntries in use: 568552 15MB (28 bytes each)
StoreEntries allocated: 573117 15MB
storeSwapLogData (mmapped): 568552 23MB (44 bytes each)

I'm not sure if this qualifies as low load or high load. I can tell you
that it works, and that it would most probably also work with 96MB of
RAM, but not without the mmap patch.

-- 
Sincerely,                                                          srb@cuci.nl
           Stephen R. van den Berg (AKA BuGless).
This signature third word omitted, yet is comprehensible.