Re: Multiple issues in Squid-3.2.3 SMP + rock + aufs + a bit of load

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Fri, 30 Nov 2012 10:47:15 -0700

On 11/28/2012 04:07 PM, Henrik Nordström wrote:

> - each worker restarting at startup due to rock store DB open failures
> (timeout). This seems to settle once the rock store have completed
> rebuilding, but not 100% sure that's the cause as logs are a bit
> inclusive. Might also be related to the delay due to the kid rebuilding
> it's own aufs cache_dir.

IIRC, Rock store diskers should process queries while rebuilding so db
open should not fail due to rebuild itself. I bet this is actually
related to the problem discussed below.

> - kid registration failures at startup (timeout). This seem to be due to
> the local cache rebuild in each kid taking more than 6 seconds to
> complete, confusing the registration timeout management.

The registration code needs to be redesigned. It is not robust enough,
especially when we have legacy code that blocks Squid for more than 6
seconds at startup.

In general, longer registration delays help in some situations and hurt
in others. Consider a case where IPC registration cannot succeed due to
a configuration bug or somesuch. It is best for Squid to fail ASAP in
that case rather than keep waiting for 10 minutes.

> --- squid-3.2.3/src/DiskIO/IpcIo/IpcIoFile.cc 2012-10-20 15:39:49.000000000 +0300
> +++ squid-3.2.3/src/DiskIO/IpcIo/IpcIoFile.cc 2012-11-29 00:47:11.451848433 +0200
> @@ -30,7 +30,7 @@
> // TODO: make configurable or compute from squid.conf settings if possible
> static const int QueueCapacity = 1024;
>
> -const double IpcIoFile::Timeout = 7; // seconds; XXX: ALL,9 may require more
> +const double IpcIoFile::Timeout = 300; // seconds; XXX: ALL,9 may require more

Do not do that. The timeout is used for both file opening timeout and
regular/ongoing I/O timeout checks. The latter should happen more
frequently than once in 5 minutes, especially on busy Squids. Split this
into OpenTimeout and IoTimeout if you want to increase the former.

> IpcIoFile::IpcIoFileList IpcIoFile::WaitingForOpen;
> IpcIoFile::IpcIoFilesMap IpcIoFile::IpcIoFiles;
> std::auto_ptr<IpcIoFile::Queue> IpcIoFile::queue;
> @@ -218,7 +218,7 @@
> {
> bool ioError = false;
> if (!response) {
> - debugs(79, 3, HERE << "error: timeout");
> + debugs(79, 1, HERE << "error: timeout");

Do not do that. It is a common/benign case under overload.

> ioError = true; // I/O timeout does not warrant setting error_?
> } else {
> if (response->xerrno) {
> --- squid-3.2.3/src/HttpHeader.cc 2012-10-20 15:39:49.000000000 +0300
> +++ squid-3.2.3/src/HttpHeader.cc 2012-11-21 19:51:23.826807754 +0200
> @@ -548,7 +548,7 @@
>
> char *nulpos;
> if ((nulpos = (char*)memchr(header_start, '\0', header_end - header_start))) {
> - debugs(55, 1, "WARNING: HTTP header contains NULL characters {" <<
> + debugs(55, 2, "WARNING: HTTP header contains NULL characters {" <<
> getStringPrefix(header_start, nulpos) << "}\nNULL\n{" << getStringPrefix(nulpos+1, header_end));
> goto reset;
> }

While I agree that Squid should not warn about malformed headers by
default, others disagree, and this is unrelated to the rest of the changes.

> --- squid-3.2.3/src/ipc/Strand.cc 2012-10-20 15:39:49.000000000 +0300
> +++ squid-3.2.3/src/ipc/Strand.cc 2012-11-29 00:12:12.385439783 +0200
> @@ -53,7 +53,7 @@
> TypedMsgHdr message;
> ann.pack(message);
> SendMessage(coordinatorAddr, message);
> - setTimeout(6, "Ipc::Strand::timeoutHandler"); // TODO: make 6 configurable?
> + setTimeout(600, "Ipc::Strand::timeoutHandler"); // TODO: make 6 configurable?
> }

Please keep the TODO comment in sync.

This change is discussed at the beginning of this email. In summary, 10
minutes is sometimes worse than 6 seconds (and sometimes is not enough
anyway). The registration algorithm should be adjusted, and the code
that blocks Squid process for 9 minutes should be fixed. Until then, I
am not objecting to this specific change, but perhaps you can add a
comment discussing why this is so large.

Thank you,

Alex.
Received on Fri Nov 30 2012 - 17:47:25 MST

This archive was generated by hypermail 2.2.0 : Sat Dec 01 2012 - 12:00:32 MST