Re: [RFC] shutdown stages

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Tue, 09 Aug 2011 10:49:33 +1200

 On Mon, 08 Aug 2011 11:08:17 -0600, Alex Rousskov wrote:
> On 08/07/2011 05:22 AM, Amos Jeffries wrote:
>
>> At present shutdown happens over two stages: the server sockets and
>> auth are shut down immediately, and everything else 30 seconds or so
>> later.
>>
>> We have a number of bugs open about this, ranging from complaints that
>> idle Squid instances still take the full 30sec to shut down when they
>> should fast-track the process and close quickly, through to crashes
>> when helper shutdown or the swap.state dump is started too late and
>> not given enough time to complete.
>>
>> I would like to split that process into several steps, with as much
>> stuff being shut down as fast as is reasonable.
>>
>> stage 1:
>> * TIMEOUT: shutdown_lifetime/2 converted to a timeout, to trigger
>> stage-2 early if needed.
>> * close ICP, HTCP, HTTP, HTTPS ports, but leave DNS open
>> * prevent keep-alive, and close all connections on completion
>> * disable the background cyclic events (ie garbage collection). we
>> can trigger these "manually" as part of the shutdown if they are
>> needed at certain points.
>> * begin swap.state dump of index
>> * call stage 2 immediately IF no active client connections AND
>> stage<2.
>>
>> stage 2:
>> * TIMEOUT: shutdown_lifetime/2 converted to a timeout, to trigger
>> stage-3 early if needed.
>> * disable new reads.
>> * disconnect delay pools. push any buffered information out to
>> clients at full speed and terminate. (bad idea? good idea?)
>> * call stage 3 immediately IF no active client connections AND
>> stage<3.
>>
>> stage 3:
>> * terminate remaining incomplete active clients using their close
>> handlers.
>> * loop until async call queue is empty.
>> * skip to stage N immediately.
>>
>> stage N:
>> * terminate the event loop and exit.
>>
>>
>> Attached is a proposed patch which alters shutting_down into an
>> incremental stage counter and updates the documentation, but does not
>> alter the fundamentals of the current state.
>>
>> I believe we can easily add several shutdown functions to the
>> SignalEngine, or move these to the cache manager as a series of async
>> calls, with the SignalEngine simply triggering the cache manager
>> shutdown event.
>
> I do not know whether the proposed stages are right or wrong overall
> because I do not see what specific principle was used to divide things
> up this way and not some other way. I do think the concept of stages
> is applicable, but I would define it differently:

 So far my thinking was to start the operations which may take several
 cycles of async calls as early as possible, like closing idle helpers
 and connections. Those take one, maybe two, loops of AsyncCall plus a
 TCP RTT. Kicking off as many as possible during the grace period brings
 TIME_WAIT and RAM usage down quickly.
  Similar to what you have below, but with more focus on the
 resource-usage growth curve.
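
 Roughly the shape I have in mind, as a standalone C++ sketch (not
 actual Squid code; ShutdownStateMachine and the per-stage actions are
 illustrative only):

// Sketch of staged shutdown: each timed stage gets shutdown_lifetime/2 as
// a deadline, and we skip ahead as soon as there are no active clients.
#include <chrono>
#include <iostream>

enum class Stage { Running = 0, Stage1, Stage2, Stage3, Exit };

struct ShutdownStateMachine {
    int shutdownLifetimeSec = 30;   // squid.conf shutdown_lifetime
    int activeClients = 0;          // stand-in for the real connection count
    Stage stage = Stage::Running;
    std::chrono::steady_clock::time_point deadline;

    void enter(Stage next) {
        stage = next;
        // each timed stage gets half of shutdown_lifetime
        deadline = std::chrono::steady_clock::now() +
                   std::chrono::seconds(shutdownLifetimeSec / 2);
        switch (stage) {
        case Stage::Stage1:
            std::cout << "stage 1: close ports, stop keep-alive, "
                         "begin swap.state dump\n";
            break;
        case Stage::Stage2:
            std::cout << "stage 2: stop scheduling new reads\n";
            break;
        case Stage::Stage3:
            std::cout << "stage 3: close remaining clients, drain queue\n";
            break;
        case Stage::Exit:
            std::cout << "stage N: terminate event loop and exit\n";
            break;
        default:
            break;
        }
    }

    // called from every pass of the event loop
    void tick() {
        if (stage == Stage::Running || stage == Stage::Exit)
            return;
        const bool timedOut = std::chrono::steady_clock::now() >= deadline;
        const bool idle = (activeClients == 0);
        if (timedOut || idle)   // an idle Squid fast-tracks the shutdown
            enter(static_cast<Stage>(static_cast<int>(stage) + 1));
    }
};

int main() {
    ShutdownStateMachine sm;
    sm.enter(Stage::Stage1);
    while (sm.stage != Stage::Exit)
        sm.tick();
}

 The real thing would hang off the SignalEngine or cache manager events
 rather than spinning in a loop, but the stage/deadline split is the
 same.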

>
> Stage 1; "Time-limited no-harm winding down": Stop accepting/creating
> new master transactions. All existing master transactions should start
> shutting down if possible. No harm to any existing transactions at
> this stage. The maximum duration of this stage is controlled by
> shutdown_lifetime in squid.conf.
>
> Stage 2; "Controlled or high-level tear down": All existing master
> transactions are forcefully terminated. All modules are asked to shut
> down if possible. If some cache_dirs need to start writing their index
> to disk, they will do this at this time. Event loop ends at the very
> end of this stage, when there are no new events to process. The
> maximum duration of this stage is unknown/unlimited because a module
> may take a long time to clean up (e.g., write its index file to disk).
>
> Stage 3; "Uncontrolled or low-level tear down": Squid main() exits.
> Static destruction/cleanup starts. The maximum duration of this stage
> is unknown/unlimited but no slow activities are expected at this
> stage.
>
>
> We could limit Stage 2+3 duration by forcefully killing our own
> process, I guess. I do not know whether that is needed. It would
> require a separate timeout value, IMO.

 I came to a similar conclusion, although I was hoping we could avoid
 both. The only thing which does not complete in a short period is the
 swap.state dump. IIRC you have that going in steps of N objects, so at
 the queue-drain stage it could still self-abort and log a failure
 encouraging a longer shutdown_lifetime.
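 If we do end up needing the forced kill, it need not be much code. A
 sketch, with shutdown_hard_timeout as a made-up name for the separate
 directive:

// Sketch of a separate hard limit on the tear-down stages: arm a watchdog
// when controlled tear-down starts; if it has not finished before the
// alarm fires, the process kills itself.
#include <csignal>
#include <iostream>
#include <unistd.h>

extern "C" void watchdogFired(int) {
    _exit(1);               // async-signal-safe; tear-down took too long
}

int main() {
    std::signal(SIGALRM, watchdogFired);
    alarm(10);              // hypothetical shutdown_hard_timeout value

    // ... controlled tear-down (index dump, module cleanup) happens here ...

    alarm(0);               // cancel: tear-down finished in time
    std::cout << "clean exit before the watchdog fired\n";
}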
  I see no reason why we can't push the swap.state dump to start as soon
 as the accept queue is drained; no new requests could add things to the
 index. That can be left until the rockstore diskers are merged, though.
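
 The chunked dump itself, in standalone sketch form (DumpState and the
 sizes are made up for illustration, not the actual store code):

// Sketch of dumping the index in steps of N entries per event-loop pass:
// the dump can start as soon as the accept queue is drained, and it can
// still self-abort cleanly if the shutdown deadline arrives first.
#include <cstddef>
#include <iostream>
#include <vector>

struct DumpState {
    std::vector<int> index;                   // stand-in for swap.state entries
    std::size_t next = 0;                     // next entry to write out
    static const std::size_t chunk = 1000;    // N objects per pass

    // returns true while there is more work to reschedule
    bool writeChunk(bool deadlinePassed) {
        if (deadlinePassed) {
            std::cerr << "swap.state dump aborted; a longer "
                         "shutdown_lifetime is needed\n";
            return false;
        }
        std::size_t end = next + chunk;
        if (end > index.size())
            end = index.size();
        for (; next < end; ++next) {
            // write index[next] to swap.state here
        }
        return next < index.size();
    }
};

int main() {
    DumpState dump;
    dump.index.resize(5000);
    // the event loop would interleave this with other shutdown work
    while (dump.writeChunk(/*deadlinePassed=*/false))
        ;
    std::cout << "dumped " << dump.next << " entries\n";
}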

 The main push now is just to get the shutdown global counting stages
 instead of being an on/off toggle. The change does not affect any of
 the code which depends on the on/off behaviour; that will keep
 happening in early stage-1 as it does now, until updated.
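
 In miniature the change looks like this (a sketch, not the patch
 itself; the consumer functions are just for illustration):

// Sketch of the shutting_down change: the global becomes an incrementing
// stage counter instead of an on/off flag. Code that only tests
// "if (shutting_down)" still works, since any non-zero stage is true;
// new code can compare against specific stages.
#include <iostream>

static int shutting_down = 0;      // 0 = running, 1..N = shutdown stage

static void startShutdownStage(int stage) {
    if (stage > shutting_down)     // stages only move forward
        shutting_down = stage;
}

static void legacyConsumer() {
    if (shutting_down)             // old on/off check, unchanged
        std::cout << "legacy code: shutting down\n";
}

static void stageAwareConsumer() {
    if (shutting_down >= 2)        // new stage-aware check
        std::cout << "stage-aware code: no new reads\n";
}

int main() {
    startShutdownStage(1);
    legacyConsumer();              // fires: any stage counts as shutting down
    stageAwareConsumer();          // silent: still stage 1
    startShutdownStage(2);
    stageAwareConsumer();          // fires at stage 2
}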

>
> A few specific low-level comments:
>
> 1) We must not disable delay pools. At least in some environments,
> those pools protect infrastructure or clients. In other environments
> the pools affect billing/expenses. They must be enforced at all times.

 Okay, no problem.

>
> 2) The event loop (including the async call queue) should run as long
> as somebody is using them. Their too-early termination is one of the
> biggest design flaws causing bugs now. I believe you have documented
> that they should run longer, but I am not sure. I tried to be more
> explicit about it in my stage descriptions.

 My stages 2+3 are your stage 2, due to those bug reports involving
 AsyncCalls which are queued but never happen. We need some time after
 the force-close to drain the AsyncCall queue(s) before we can start the
 forced exit of event->loop() and main().

 So the purpose of stage-3 is to wait for the queue to drain.
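
 As a sketch of that drain step (a plain std::queue standing in for the
 real AsyncCallQueue):

// Sketch of draining the queued calls before the event loop is allowed to
// exit, so calls scheduled by the stage-3 force-close are not dropped. The
// pass limit guards against handlers that keep re-queueing work forever.
#include <functional>
#include <iostream>
#include <queue>

int main() {
    std::queue<std::function<void()>> asyncQueue;

    // close handlers queued by the stage-3 force-close
    asyncQueue.push([] { std::cout << "client close handler ran\n"; });
    asyncQueue.push([] { std::cout << "helper shutdown handler ran\n"; });

    const int maxPasses = 1000;
    int passes = 0;
    while (!asyncQueue.empty() && passes++ < maxPasses) {
        auto call = std::move(asyncQueue.front());
        asyncQueue.pop();
        call();                    // a fired call may queue further calls
    }

    if (!asyncQueue.empty())
        std::cerr << "queue not drained; forcing exit anyway\n";
    std::cout << "event loop can now terminate\n";
}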

>
> 3) I would not "disable new reads" as a separate step. At some point
> (when the shutdown timeout expires), we have to kill all
> still-existing transactions, if any. At that point, all I/O activity
> will cease. This should be driven from transactions to Comm, and not
> by some global Comm flags, IMO.
>

 The read scheduler can check if (shutting_down > N) and return
 COMM_ERR_SHUTDOWN or somesuch. No new globals.
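 e.g. sketched outside the real comm layer (with COMM_ERR_SHUTDOWN as
 the hypothetical code, not an existing one):

// Sketch: the read scheduler consults the existing stage counter rather
// than any new global flag; past a threshold stage, new reads are refused
// with a shutdown error code.
#include <iostream>

enum CommErr { COMM_OK = 0, COMM_ERR_SHUTDOWN = -1 };

static int shutting_down = 0;           // the same stage counter as before
static const int noNewReadsStage = 2;   // stage at which reads stop

static CommErr scheduleRead(int fd) {
    if (shutting_down >= noNewReadsStage)
        return COMM_ERR_SHUTDOWN;       // too late in shutdown to read
    std::cout << "read scheduled on FD " << fd << "\n";
    return COMM_OK;
}

int main() {
    scheduleRead(10);                   // accepted while running
    shutting_down = 2;                  // stage 2 reached
    if (scheduleRead(10) == COMM_ERR_SHUTDOWN)
        std::cout << "read refused: shutting down\n";
}

 The threshold only needs to match whichever stage "disable new reads"
 ends up in.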
 No problem there either, yay.

 Amos
Received on Mon Aug 08 2011 - 22:49:40 MDT
