Re: memory mapped store entries

From: Michael O'Reilly <michael@dont-contact.us>
Date: 26 Aug 1998 09:51:30 +0800

"Stephen R. van den Berg" <srb@cuci.nl> writes:
> Michael O'Reilly wrote:
> >Disk writes are ALWAYS done in a process context, and that process is
> >blocked for the duration.
>
> What happens if several processes/threads hold the mmap? In which
> context is the write done? Are the others allowed to continue?

It's pretty random. Thread X will block on any page fault. The key
question is for how long, and that will depend a lot on how easy it is
to get a free page and how long a disk operation takes.
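
For concreteness, here's a minimal sketch (not Squid's actual code; the
file name is made up) of the kind of MAP_SHARED index mapping being
discussed. Whichever thread touches a page the kernel has reclaimed
takes the fault and blocks in its own context until the page comes back
from disk:

    /* Hedged sketch, not Squid source: map an index file MAP_SHARED so
     * that updates become plain memory writes.  A thread touching a
     * reclaimed page takes a page fault and blocks until the data is
     * read back in. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("store.index", O_RDWR);    /* hypothetical file */
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

        char *map = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) { perror("mmap"); return 1; }

        map[0]++;    /* may fault; this thread blocks for the duration */

        munmap(map, st.st_size);
        close(fd);
        return 0;
    }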
 
> >See above. Consider the case when squid is doing lots of writing, a
> >portion of the mmap is paged out, a page fault is taken, it needs a
> >page to page it back in, but all the available blocks are dirty....
> >This happens more often than it sounds, particularly on boxes that are
> >very busy.
>
> Hmmm..., so you're telling me that it is likely to happen that when
> squid writes many things and fills up the buffer cache faster than
> it can be flushed to disk, *then* the kernel might decide to nibble
> off one of our mmapped pages, and the process might block for an
> extended period when this page is accessed and has to be paged back
> in?

No. The kernel may nibble a page at any time. It's constantly trying
to grow the disk cache by unmapping pages that haven't been touched
for some time.
 
> So, how is this going to be worse than the same scenario without the
> mmapped file in there? [ .. ]

Good question. I don't know how much harder the kernel tries to unmap
mmapped memory, vs. how hard it tries to swap.
 
> >I think you've been lucky so far... There are two cases here: SIGKILL
> >when a structure is only half written,
>
> This has been catered for by putting the swap_file_number field
> at the *end* of the struct. Also, by making sure that the swap_file_number
> field is updated first upon entry-deletion, and updated last upon
> entry-creation. This should make it 100% SIGKILL (or SEGV :-) proof.

Neat. :)
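
For illustration, a hedged sketch of that ordering (field names other
than swap_file_number are invented, and real code may also want a
compiler barrier or volatile to pin the store order):

    /* Sketch only: swap_file_number sits at the end of the struct, is
     * the last field written on creation and the first one cleared on
     * deletion, so a SIGKILL in the middle leaves either a complete
     * entry or one that reads as unused. */
    #include <stddef.h>
    #include <string.h>
    #include <time.h>

    #define SWAP_FILE_UNUSED (-1)

    typedef struct {
        time_t timestamp;          /* invented fields, for illustration */
        size_t object_size;
        int    swap_file_number;   /* kept last in the struct */
    } SwapEntry;

    void entry_create(SwapEntry *e, time_t ts, size_t size, int sfn)
    {
        e->timestamp = ts;
        e->object_size = size;
        e->swap_file_number = sfn;              /* written last: now valid */
    }

    void entry_delete(SwapEntry *e)
    {
        e->swap_file_number = SWAP_FILE_UNUSED; /* cleared first: now unused */
        memset(e, 0, offsetof(SwapEntry, swap_file_number));
    }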
 
> > and something fatal when a structure
> >crosses a page boundary and only one page gets written to disk.
>
> Yes, when the kernel crashes. Normally not a very frequent event.

True.
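
One way to sidestep that case entirely, assuming fixed-size entries
(a sketch, not necessarily what Squid does), is to lay the entries out
so that none ever straddles a page boundary:

    /* Hedged sketch: pad the tail of each page so a whole number of
     * fixed-size entries fits per page, and no entry can span two pages
     * (and hence be half-written if only one page reaches the disk). */
    #include <stdio.h>
    #include <unistd.h>

    #define ENTRY_SIZE 48L          /* hypothetical on-disk entry size */

    /* Byte offset of slot i in the index file. */
    static long slot_offset(long i, long page_size)
    {
        long per_page = page_size / ENTRY_SIZE; /* whole entries per page */
        return (i / per_page) * page_size + (i % per_page) * ENTRY_SIZE;
    }

    int main(void)
    {
        long page = sysconf(_SC_PAGESIZE);
        printf("entry 0 at %ld, entry %ld at %ld (start of page 2)\n",
               slot_offset(0, page), page / ENTRY_SIZE,
               slot_offset(page / ENTRY_SIZE, page));
        return 0;
    }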

[..]
> >When you blithely talk about unmapping and remapping 200 and 300 meg
> >structures I get a bit worried. :) This would want to be a VERY rare
> >occurrence.
>
> On production caches which have been filled, this *is* an event which
> does not occur anymore.

Sounds good.
 
> >See above. I would think it's [mlock] essential.
>
> Ok, using mlock() changes the odds a bit in the write overflow case. I.e. it
> would force the kernel to be happy with a smaller buffer cache to play
> with; it might even make the kernel page out some of squid's other data
> structures (not part of the mmap), unless we mlock() them too.
> We end up with the same problem here, though. We give the kernel
> fewer choices; is that good? Won't it stall squid regardless, only
> this time on a buffer write?

I guess it'll be one of those suck-it-and-see things.
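
If it does get tried, the experiment is only a couple of calls, e.g.
this hedged sketch (file name hypothetical; needs root or a large
enough RLIMIT_MEMLOCK):

    /* Sketch of pinning the mmapped index with mlock(): page faults on
     * the index no longer hit disk, at the price of leaving the kernel
     * a smaller buffer cache for everything else. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("store.index", O_RDWR);    /* hypothetical file */
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

        void *map = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) { perror("mmap"); return 1; }

        if (mlock(map, st.st_size) != 0)
            perror("mlock");                     /* RLIMIT_MEMLOCK too low? */

        /* ... use the index; its pages stay resident ... */

        munlock(map, st.st_size);
        munmap(map, st.st_size);
        close(fd);
        return 0;
    }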
 
> >Yes, but the squid executable is MUCH smaller and more frequently
> >accessed than the index data. Also the swap is on a separate disk, not
> >on the VERY busy cache disks.
>
> The squid swaplog file we're mmapping *should* be on a separate disk
> as well (it is here, it's on the log disk, it could be put on the swap
> disk).

Hmm. Isn't there one swaplog per cache disk?
 
> >Hmm. I'm doing over 8000 requests/min, on a Pentium-II 300, with
> >512Meg of ram. I'm running at over 80% CPU. Using async-io with 32
> >threads (16 won't keep up) with a 40Gig cache over a 6 disk array.
>
> Interesting, what is your byte-hit-ratio?

Depending on how you measure it, about 15%.

Michael.

PS. Don't get the idea that I think mmap shouldn't be added. I'd love
to give it a go. :) These 5-minute squid restarts are a serious pain,
and squid 1.2b23 at least has a lot of bugs that get triggered on busy
caches during restart.