Robustness project adjustments

From: Alex Rousskov <rousskov@dont-contact.us>
Date: Sun, 09 Mar 2008 11:06:58 -0600

Hello,

    This email describes upcoming changes in the Squid3 robustness
project. The Squid3 robustness project has two related goals:

- Prevent Squid from crashing when a non-critical assumption about
transaction state fails. Currently, most such assumptions are expressed
using fatal assertions.

- Provide clean transaction termination code in the presence of errors.
Currently, many transactions are very difficult to terminate cleanly if
the termination condition is detected deep inside the calling stack.

Christos Tsantilas has implemented the first version of the relevant
code, with a few design bugs contributed by me. In the first version,
assert() code would optionally throw exceptions instead of aborting
Squid. Transactions that could deal with exceptions (e.g., ICAP) would
terminate cleanly. Others would abort Squid when the exception
propagated to the transaction boundary. Christos has also worked on
making non-ICAP transactions more exception-friendly. The feature was
enabled by default and controlled via squid.conf.
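
To illustrate the idea, here is a minimal sketch of an assertion that
either aborts or throws, depending on a runtime switch. It is not the
code Christos wrote: robust_assert(), g_throw_on_assert, and
AssertionFailure are hypothetical names made up for this example, and
the switch is a plain global rather than a parsed squid.conf option.

```cpp
#include <cstdio>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical runtime switch; a real build would set it from squid.conf.
static bool g_throw_on_assert = false;

// Hypothetical exception type describing the failed condition.
struct AssertionFailure : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Called only when the asserted condition is false.
inline void assertion_failed(const char *cond, const char *file, int line)
{
    const std::string msg = std::string(file) + ":" + std::to_string(line) +
                            ": assertion failed: " + cond;
    if (g_throw_on_assert)
        throw AssertionFailure(msg); // robustness mode: the job may clean up

    std::fprintf(stderr, "%s\n", msg.c_str());
    std::abort();                    // classic behavior: crash immediately
}

// Hypothetical stand-in for assert(); evaluates cond exactly once.
#define robust_assert(cond) \
    ((cond) ? (void)0 : assertion_failed(#cond, __FILE__, __LINE__))
```

With g_throw_on_assert set to false, a failed robust_assert() behaves
like a regular fatal assertion. With it set to true, the same failure
becomes an exception that exception-aware code can catch.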

The current design was discussed at the London meeting, and a few
adjustments were requested. This email attempts to summarize the
adjusted design.

1. The assert() code will not throw exceptions by default. It will
continue to call abort() as before. To enable the robustness feature, a
squid.conf option will need to be set. (In the future, that option may
contain a date value so that it automatically disables itself if the
administrator forgets to turn it off after fixing the problem.)

2. assert() calls that test local, transaction-specific conditions will
be manually converted to Must() calls. Must() always throws an exception
when its condition fails, regardless of configuration. It is already
used by the ICAP code. The name Must() comes from the MUST/SHOULD/MAY
terminology of IETF RFCs. Suggestions for a better name are welcome.
New code should use Must() whenever possible.

3. Transactions that can handle exceptions with a proper cleanup will
continue to handle them without aborting Squid. Other transactions will
abort Squid if an exception is thrown. This design remains unchanged
compared to the original version: we are changing when exceptions can be
thrown, not how they are handled.

4. We will continue to work on the transaction cleanup code.
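
The interplay between items 2 and 3 can be sketched as follows. This is
an illustration under stated assumptions, not Squid code: the Must()
macro below is a simplified stand-in for the real one, and MustFailure,
Job, and ParseJob are hypothetical names invented for the example.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical exception type; the class used by Squid may differ.
struct MustFailure : std::runtime_error {
    using std::runtime_error::runtime_error;
};

inline void must_failed(const char *cond, const char *file, int line)
{
    throw MustFailure(std::string(file) + ":" + std::to_string(line) +
                      ": check failed: " + cond);
}

// Simplified stand-in for Must(): always throws when cond is false.
#define Must(cond) \
    ((cond) ? (void)0 : must_failed(#cond, __FILE__, __LINE__))

// Hypothetical job base class. run() is the transaction boundary:
// exceptions thrown anywhere below it are caught here.
class Job {
public:
    virtual ~Job() = default;

    // Returns true if the step completed, false if the job was terminated.
    bool run() {
        try {
            step();
            return true;
        } catch (const std::exception &e) {
            cleanup(e.what()); // terminate this job only; keep running
            return false;
        }
    }

protected:
    virtual void step() = 0;
    virtual void cleanup(const char *reason) = 0;
};

// Hypothetical job whose check fails deep inside its call stack.
class ParseJob : public Job {
public:
    explicit ParseJob(int len) : length(len) {}
    bool cleaned = false;

protected:
    void step() override { parseHeader(); }
    void cleanup(const char *) override { cleaned = true; }

private:
    void parseHeader() { Must(length >= 0); } // local, job-specific check
    int length;
};
```

Here ParseJob(-1).run() returns false after cleanup() has run, while
ParseJob(10).run() returns true. A job without such a catch at its
boundary would let the exception escape, and an uncaught C++ exception
ends in std::terminate(), which by default calls abort().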

Corrections and improvements are welcome.

Thank you,

Alex.
P.S. The word "transaction" should be interpreted as "asynchronous job"
in the above. Most jobs are currently protocol transactions, but the
robustness code works at the AsyncJob level. As the AsyncJob API propagates
to other Squid areas, we may be able to handle, say, isolated cache disk
failures in a similar way.
Received on Sun Mar 09 2008 - 11:07:08 MDT