"Stephen R. van den Berg" <srb@cuci.nl> writes:
> Michael O'Reilly wrote:
> >Disk writes are ALWAYS done in a process context, and that process is
> >blocked for the duration. 
> 
> What happens if several processes/threads hold the mmap.  In which
> context is the write done.  Are the others allowed to continue?
It's pretty random. Thread X will block on any page fault. The key
question is for how long, and that'll depend a lot on how easy it is
to get a free page, and how long a disk operation takes.
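
To make that concrete, here's roughly what such a blocking access
looks like (a toy sketch, not squid code; the file name and sizes
are invented):

    /* Toy sketch (not squid code): one read from an mmap()ed file
     * can fault and put this thread to sleep on disk I/O if the
     * kernel reclaimed the page in the meantime. */
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("swap.index", O_RDWR);  /* invented name */
        size_t len = 200UL * 1024 * 1024;     /* ~200MB index */
        char *idx;
        volatile char c;

        if (fd < 0)
            return 1;
        idx = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
        if (idx == MAP_FAILED)
            return 1;
        /* If this page isn't resident, the load below page-faults
         * and this thread sleeps until a free page is found and the
         * disk read completes; other threads keep running. */
        c = idx[1234 * 4096];
        (void)c;
        munmap(idx, len);
        close(fd);
        return 0;
    }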
 
> >See above.  Consider the case when squid is doing lots of writing, a
> >portion of the mmap is paged out, a page fault is taken, it needs a
> >page to page it back in, but all the available blocks are dirty.... 
> >This happens more often than it sounds, particularly on boxes that are 
> >very busy.
> 
> Hmmm..., so you're telling me that it is likely that when squid
> writes a lot and fills up the buffer cache faster than it can be
> flushed to disk, the kernel might *then* decide to nibble off one
> of our mmapped pages, and the process might block for an extended
> period when that page is accessed and has to be paged back in?
No. The kernel may nibble a page at any time. It's constantly trying
to grow the disk cache by unmapping pages that haven't been touched
for some time.
 
> So, how is this going to be worse than the same scenario without the
> mmapped file in there? [ .. ]
Good question. I don't know how much harder the kernel tries to unmap
mmapped memory, vs how hard it tries to swap.
 
> >I think you've been lucky so far... There are two cases here: SIGKILL
> >when a structure is only half written,
> 
> This has been catered for by putting the swap_file_number field
> at the *end* of the struct.  Also, by making sure that the swap_file_number
> field is updated first upon entry-deletion, and updated last upon
> entry-creation.  This should make it 100% SIGKILL (or SEGV :-) proof.
Neat. :)
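
For the archives, I imagine it looks roughly like this (a sketch with
invented field names, not the actual squid structs): the entry only
counts as live while swap_file_number is set, so a SIGKILL in the
middle of an update can never leave a live-but-half-written entry
behind.

    /* Sketch only -- the field names are invented, not squid's.
     * swap_file_number doubles as the "entry is valid" flag. */
    #include <sys/types.h>
    #include <time.h>

    struct store_entry {
        off_t  offset;
        size_t length;
        time_t timestamp;
        int    swap_file_number;   /* last field; -1 means unused */
    };

    static void entry_create(struct store_entry *e,
                             off_t off, size_t len, int sfn)
    {
        e->offset = off;           /* fill everything else first... */
        e->length = len;
        e->timestamp = time(NULL);
        /* ...then mark the entry valid as the very last store; a
         * SIGKILL before this line leaves it safely marked unused. */
        e->swap_file_number = sfn;
    }

    static void entry_delete(struct store_entry *e)
    {
        /* Invalidate first; whatever happens after this store, the
         * half-cleared fields are never trusted again. */
        e->swap_file_number = -1;
    }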
 
> > and something fatal when a structure
> >crosses a page boundary and only one page gets written to disk. 
> 
> Yes, when the kernel crashes.  Normally not a very frequent event.
True.
[..] 
> >When you blithely talk about unmapping and remapping 200 and 300 meg
> >structures I get a bit worried. :) This would want to be a VERY rare
> >occurrence.
> 
> On production caches which have been filled, this *is* an event which
> no longer occurs.
Sounds good. 
 
> >See above. I would think it's [mlock] essential.
> 
> Ok, using mlock() changes the odds a bit in the write-overflow case.
> I.e. it would force the kernel to be happy with a smaller buffer
> cache to play with; it might even make the kernel page out some of
> squid's other data structures (not part of the mmap), unless we
> mlock() them too.
> We end up with the same problem here, though.  We give the kernel
> fewer choices; is that good?  Won't it stall squid regardless, only
> this time on a buffer write?
I guess it'll be one of those suck-it-and-see things.
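
If somebody wants to run the experiment, the locking itself is
trivial (a sketch; assumes the box has enough RAM for the whole index
and squid runs with the privilege to lock it):

    /* Sketch: pin the mmap()ed index in core so faults on it never
     * have to go to disk.  mlock() normally needs root, and fails
     * if the region doesn't fit in lockable memory. */
    #include <stddef.h>
    #include <sys/mman.h>

    static int pin_index(void *idx, size_t len)
    {
        if (mlock(idx, len) != 0)
            return -1;  /* not privileged, or not enough lockable RAM */
        return 0;
    }

Whether giving the kernel fewer choices actually helps under write
pressure is the part that needs measuring.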
 
> >Yes, but the squid executable is MUCH smaller, and more frequently
> >accessed than the index data. Also the swap is on a separate disk, not 
> >on the VERY busy cache disks.
> 
> The squid swaplog file we're mmapping *should* be on a separate disk
> as well (it is here, it's on the log disk, it could be put on the swap
> disk).
Hmm. Isn't there one swaplog per cache disk?
 
> >Hmm. I'm doing over 8000 requests/min, on a Pentium-II 300, with
> >512Meg of ram. I'm running at over 80% CPU. Using async-io with 32
> >threads (16 won't keep up) with a 40Gig cache over a 6 disk array.
> 
> Interesting, what is your byte-hit-ratio?
Depending on how you measure it, about 15%.
Michael.
PS. Don't get the idea that I think mmap shouldn't be added. I'd love
to give it a go. :) These 5-minute squid restarts are a serious pain,
and squid 1.2b23 at least has a lot of bugs that get triggered on busy
caches during restart.