Re: Squid-3.2 status update

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Wed, 04 Jul 2012 16:00:17 -0600

On 06/27/2012 03:12 AM, Amos Jeffries wrote:

> A quick review of the other major bugs shows that each will take some
> large design and code changes to implement a proper fix or even a
> workaround.
>
>
> Are there any objections to ignoring these bugs when considering a 3.2
> stable release:

Our definition of a "stable release" has two criteria:

1. "Meant for production caches."

2. "begin when all known major bugs have been fixed [for 14 days]."

Criterion #1 should probably be interpreted as "Squid Project considers
the version suitable for production deployment". If you think we are
there, I have no objections -- I do not have enough information to say
whether enough users will be satisfied with current v3.2 code in
production today. Perhaps this is something we should ask on squid-users
after we close all bugs that we think should be closed?

As for Criterion #2, your question means that either we stop considering
those bugs as major OR we change criterion #2. IMHO, we should adjust
that criterion so that we do not have to play these games where we mark
something as a major bug but then decide that in the interest of a
speedier "stable" designation we are going to "ignore" it.

An adjusted initialization criterion could be phrased as

2'. "begin when #1 is satisfied for at least 14 days"

This gives us enough flexibility to release [what we consider
suitable-for-production] code that might have major bugs in some
environments. I added "at least" because otherwise we may have to
release v3.3 as stable 14 days after v3.2 is marked stable :-). In
practice, the version should have "enough improvements" to warrant its
numbering and its release but I do not want to digress into that discussion.

> 3124 - Cache manager stops responding when multiple workers used
> ** requires implementing non-blocking IPC packets between workers and
> coordinator.

Has this been discussed somewhere? IPC communication is already
non-blocking so I suspect some other issue is at play here. The specific
examples of mgr commands in the bug report (userhash, sourcehash,
client_list, and netdb) seem non-essential in most environments
and, hence, not justifying the "major" designation, but perhaps they
indicate some major implementation problem that must be fixed.

> 3389 - Auto-reconnect for tcp access_log
> ** requires asynchronous handling of log opening and blocking Squid
> operation

Since we have stable file-based logging, this bug does not have to block
a "stable" designation if TCP logging is declared "experimental". You
already have a patch that addresses 90% of the core problem for those
who care.

If you do not want to mark TCP logging as experimental and highlight
this shortcoming, then the bug ought to be fixed IMHO because there is
consensus that accurate logging is critical for many deployments.

> 3478 - Host verify catching dynamic CDN hosted sites
> ** requires designing a CONNECT and bump handling mechanism

I am not an expert on this, but it feels like we are trying to enforce a
[good] rule ignored by the [bad] real world, especially in interception
environments. As a result, Squid lies and scares admins for no good
reason (in most cases). We will not win this battle.

I suggest that the "host_verify_strict off" behavior is adjusted to
cause no harm, even if some malicious requests will get through.

If you do not want to do that, please add a [fast] ACL so that admins
are not stuck without a solution and can whitelist bad (or all) sites.

That said, the bug report itself does not explicitly say that something
is _seriously_ broken, does it? I bet the cache.log messages are
excessive on any busy site with a diverse user population, but we can
rate-limit these messages and downgrade the severity of the bug while
waiting for a real use case where these new checks break things (despite
host_verify_strict being off).
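
To illustrate the rate-limiting idea, here is a minimal fixed-window
sketch. MessageRateLimiter and its members are hypothetical names made up
for this example; this is not an existing Squid class and not necessarily
how the cache.log code should do it. The caller passes the current time
explicitly so the logic is easy to test.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (an assumption for illustration, not Squid code):
// allows at most `limit` messages per time window and counts the rest.
class MessageRateLimiter {
public:
    using Clock = std::chrono::steady_clock;

    MessageRateLimiter(int maxPerWindow, std::chrono::seconds window)
        : limit(maxPerWindow), windowLength(window)
    {
        assert(maxPerWindow > 0);
    }

    // Returns true if the caller may emit one more message at time `now`.
    bool allow(Clock::time_point now)
    {
        // Start a fresh window on first use or once the old one expired.
        if (!started || now - windowStart >= windowLength) {
            windowStart = now;
            emitted = 0;
            started = true;
        }
        if (emitted < limit) {
            ++emitted;
            return true;
        }
        ++suppressed; // remembered so a summary line can report it later
        return false;
    }

    // Total number of messages suppressed so far (never reset).
    int suppressedCount() const { return suppressed; }

private:
    int limit;
    std::chrono::seconds windowLength;
    Clock::time_point windowStart{};
    int emitted = 0;
    int suppressed = 0;
    bool started = false;
};
```

The host verification warning would then be wrapped in something like
"if (limiter.allow(now))", with the suppressed count reported
occasionally so that admins still see that mismatches are happening.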

> 3517 - Workers ldap digest
> ** requires SMP atomic access support for all user credentials

This is not a blocker IMO. SMP has several known limitations, complex
authentication schemes being one of them. This does not affect stability
of supported SMP configurations.

> Which would leave us with only these to locate (any takers?) :
>
> 3551 - store_rebuild.cc:116: "store_errors == 0" assertion

It would be nice to figure this one out, at least for ufs, because many
folks will try ufs with SMP and there is clearly some kind of corruption
problem there. I assigned the bug to myself for now.

However, if I cannot reproduce it, I will not be able to make much
progress. Please note that the original reporter moved on to rock store
and does not consider this bug to be affecting him any more (per comment
#10).

> 3556 - assertion failed: comm.cc:1093: "isOpen(fd)"

I recommend adding a guard for the comm_close() call in the Connection
destructor to avoid the call for !isOpen(fd) orphan connections. And
print the value of isOpen() in the BUG message.
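
A rough sketch of the guard I have in mind follows. Everything in the
"sketch" namespace is a simplified, hypothetical stand-in (a toy
descriptor table and toy isOpen()/comm_close() functions), not the real
Squid comm API or the real Connection class; only the shape of the
destructor matters here.

```cpp
#include <cassert>
#include <iostream>

namespace sketch {

// Toy descriptor table standing in for the real fd table (assumption).
constexpr int MaxFds = 16;
bool fdOpen[MaxFds] = {};
int closeCalls = 0; // how many times comm_close() was actually reached

bool isOpen(int fd) { return fd >= 0 && fd < MaxFds && fdOpen[fd]; }

// Toy comm_close(): asserts on a closed descriptor, like the bug report.
void comm_close(int fd)
{
    assert(isOpen(fd));
    fdOpen[fd] = false;
    ++closeCalls;
}

class Connection {
public:
    explicit Connection(int aFd) : fd(aFd) {}

    ~Connection()
    {
        if (fd < 0)
            return; // the owner already closed and reset the descriptor

        // Orphan connection: report it, including the isOpen() value.
        std::cerr << "BUG: orphan connection fd=" << fd
                  << " isOpen=" << isOpen(fd) << "\n";

        // The guard: call comm_close() only if the descriptor is open.
        if (isOpen(fd))
            comm_close(fd);
        fd = -1;
    }

    int fd;
};

} // namespace sketch
```

With the guard in place, destroying an orphan whose descriptor was already
closed elsewhere logs the BUG line but no longer trips the assertion.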

> 3562 - StoreEntry::kickProducer Segmentation fault

I suspect Squid is corrupting its own memory somewhere so this specific
core dump cannot be trusted. This might even be the same problem as bug
3551 above. This could be considered a blocker at least until we know
more, I guess.

Thank you,

Alex.
Received on Wed Jul 04 2012 - 22:00:21 MDT
