Re: [PATCH] SMP Caching: Core, IPC, Shared memory cache, and Rock Store

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Sun, 11 Sep 2011 18:33:39 -0600

On 09/08/2011 10:46 PM, Amos Jeffries wrote:
> On 07/09/11 12:08, Alex Rousskov wrote:
>> Attached compressed patch contains changes related to SMP shared
>> memory cache and Rock Store support. ...
>> The patch preamble is several pages long and attempts to document all
>> major changes.

Hi Amos,

    Thank you for looking through such a big set of changes. The
attached compressed patch should address your comments sent so far. I
disagreed with just a couple of items, hopefully not controversial.
Please let me know if I missed anything. Per-comment details are below.

> src/cache_cf.cc:
> * free_YesNoNone() is useless. Please use a macro to elide:
> #define free_YesNoNone(x) // do nothing.

Replaced with an equivalent inline. We should not use macros where a
non-macro solution works well.

> src/cf.data.pre:
> * please use the IFDEF: config functionality when compiler build
> dependencies are involved.
> * please add the new DEFAULT_DOC: tag to clarify the default value.
>
> NAME: memory_cache_shared
> IFDEF: HAVE_ATOMIC_OPS
> DEFAULT_DOC: Automatic detection in SMP mode. Disabled otherwise.

I do not think we should #ifdef memory_cache_shared. The option itself
works fine whether HAVE_ATOMIC_OPS are defined or not, and the actual
list of required features to support shared memory caching is already
complex (you need more than atomics) and will probably become even more
so as we support more IPC primitives on various platforms.

Besides, having a stable set of options is better for admins that work
with Squids on multiple platforms or multiple build environments.

In general, I think we should use the following heuristics when deciding
whether a given squid.conf option should be IFDEFed:

1. If it is a part of an --enable feature, use IFDEF.
2. If it cannot build or properly function without IFDEF, use IFDEF.
3. Otherwise, do not use IFDEF.

I did add a DEFAULT_DOC line as you requested. I do not know where it
will be used so perhaps the wording should be polished further.

I also polished and detailed this option's documentation.

> NAME: disk_io_timeout
> IFDEF: HAVE_FS_ROCK
> DEFAULT_DOC: No disk I/O time limit.

I moved this option to cache_dir rock where it belongs. This was an old
TODO for other obvious reasons, and placing it in cache_dir also avoids
IFDEF complications.

> ** NP: this IFDEF addition may impact removal of the #if
> !USE_DNSSERVERS macros from src/cache_cf.cc (addition of conditions
> instead of full removal).

I reversed that change as Rock Store no longer needs time_msec parsing
functions.

> src/fs/Module.cc:
> * s/#ifdef/#if/

Fixed.

> src/fs/rock/RockFile.h:
> * please implement or erase the "XXX: rename" before merge.
> * the class also appears to be an exact on-disk bit format or not.
> Please document that clearly on the class itself. With similar warnings
> to StoreMeta about not using virtuals etc.

Done.

> src/fs/rock/RockRebuild.cc:
> * s/debugs(47,0, "/debugs(47, DBG_CRITICAL, "ERROR:/

Done.

> src/fs/rock/RockSwapDir.cc:
> * please move <iomanip> include below the squid "local" files. If it is
> required by any of the headers it should be included there.

Done.

> * s/debugs(47,0, "/debugs(47, DBG_CRITICAL, "/

Done.

> * also all the nearby critical "Failed to" need an ERROR: or FATAL:
> prefix for log monitors.

Done.

> * drop the _SQUID_MSWIN_ around mkdir(). We have a portability wrapper
> now that maps from the unix version.

Done.

> * re: "XXX: should we support resize?" - I think right now we should at
> minimum drop max_objsize down to what the map supports and warn about
> the change.

After some consideration, I no longer think we should consider
supporting dynamic resizing of a rock cache_dir because (a) it is
difficult to do safely in SMP mode and (b) it should not be normally
needed. I think we should support off-line resizing though, but that is
future work.

The above applies to both cache_dir size (which the XXX above was
talking about) and the maximum cache_dir entry size (which you commented
on) changes.

I have replaced that XXX with TODOs to handle reconfiguration better:

> Rock::SwapDir::reconfigure(int, char *)
> {
> // TODO: do not update a parameter if we cannot propagate that change
> // TODO: warn if reconfigure changes any parameter that we cannot update

> * in Rock::SwapDir::validateOptions() - the wasted space warning and
> dump should be DBG_IMPORTANT by the looks of it. Possibly using
> opt_parse_cfg_only to bump up to critical on -k parse if its that
> important.

I agree (added another XXX), but the above comment refers to the
commented out code. That code came from non-SMP Squid v3.1 where it
worked fine, but it needs significant changes before it can be enabled now.

I made the exclusion of that code clearer using #if/#endif guards.

FWIW, I consider the warning itself very useful and do intend to
reintroduce it soon. However, since other file system modules do not
warn about similar problems (which are not specific to Rock Store), I
hope we can add that feature later :-).

> * HERE << "Rock::SwapDir::createStoreIO:' - is redundant output

Removed.

> * 'DBG_IMPORTANT, "Internal ERROR:' - IMO we should display "BUG:
> blah" for this type of message. Change if you agree.

Changed.

> * "XXX: otherwise too expensive to count" - 100 cache entries is way
> too small if this is expected to be useful info. Why bother even
> producing these stats if its not?

I found them useful for debugging mapping algorithms/bugs when working
with very small cache_dirs (which usually expose many synchronization
bugs due to high contention for slots).

It is possible to constantly maintain stats for large maps as well (so
that we do not have to compute stats from scratch when asked for them).
However, since these stats are shared and, hence, have to be atomic, I
was worried about the performance impact of such stats maintenance.

I left this code and XXX as they are for now because while we lack stats
for large maps, the code is still useful for debugging and as a template
for future code if we decide to add stats for large maps and measure its
performance impact.

> src/fs/rock/RockSwapDir.h:
> * re: "return 0xFFFFFF; } /// Core sfileno maximum" - please define as
> MAX_SFILENO in Store.h. Could be useful elsewhere.

Defined as StoreFilenMax in Store.h and used everywhere(?) instead of
0xFFFFFF.

> src/ipc/Kid.h:
> * re: "TODO: processes may not be kids; is there a better place to put this?"
> - if you want to create a string array for debugs() display it will
> need a file of its own with same name as the enum and to be in the
> Ipc namespace. Otherwise Kid.h is a good place for now.

Noted, thank you. I left this as is for now because the current code is
very simple, and we may be better off waiting for "not kid" Squid
processes to start doing IPC before we try to accommodate their needs.

> src/ipc/StrandSearch.h:
> * sys/types.h is guaranteed to be included by config.h in the .cc please remove

Done. I did not change src/eui/Eui48.cc which also includes sys/types.h.

> src/ipc/mem/Pages.cc:
> * please implement the "TODO: adjust cache_mem description to say ..." and remove.

Done. There is now also more documentation about the shared memory cache
in memory_cache_shared.

> src/refresh.cc:
> * !! been looking for the details of this bug recently. Please commit
> separately so I can port to 3.1.

Done as trunk rr11721.

> src/store.cc:
> * in StoreController::maintain() please update the WARNING: to DBG_CRITICAL

Done.

> src/structs.h:
> * "///< number of disk processes required ..."
> - the "<" refers doxygen to variable on previous logical line
> (n_configured). I think you mean just /// to refer to following line
> (n_strands)

Indeed. Fixed.

> src/tests/stub_MemStore.cc and src/tests/stub_Port.cc:
> * the STUB macros provide STUB_RETVAL(x) for functions with return
> values. You may want to use that and cut out a lot of lines.

Done.

> src/tests/testRock.cc and src/tests/testRock.h:
> * ??

Removed placeholder. We will add a few tests shortly.

> I noticed quite a few TODO comments about splitting in-transit
> objects out of memory cache and the global store_table (that the
> index?). I think its about time we discussed how that and the changes
> in this patch will impact on our future goal of implementing
> collapsed forwarding in squid-3.

Let's start a new thread for that. In short, I suspect shared memory
cache and Rock Store will not prohibit collapsed forwarding. In the
worst case (which I hope we can avoid), users will have a choice between
SMP caching and collapsed forwarding, which is still better than no
choice at all.

> Extra:
> My understanding from last year was that rock would be designed to
> replace COSS. Is that still correct or are we going to have to still
> port the COSS fixes from 2.7?

Just like COSS, Rock Store is meant for high-performance environments.
My goal is to offer Rock Store as a COSS replacement option for folks
migrating from Squid2/COSS. However, depending on the deployment
environment, more work will be needed to reach that status. I cannot
claim that Rock Store is equivalent to COSS, nor can I promise that I
can satisfy every COSS user need.

Making COSS work in Squid3 would not hurt, of course, but I do not
currently plan working on that. I do have some first-hand experience
with Squid2 COSS. At the beginning of the Rock Store project, I have
decided it would be more cost-efficient to implement Rock Store rather
than rewrite COSS to fix known COSS problems. Others must balance
different constraints and may reach a different conclusion, of course.

Thank you,

Alex.

Received on Mon Sep 12 2011 - 00:34:17 MDT

This archive was generated by hypermail 2.2.0 : Mon Sep 12 2011 - 12:00:03 MDT