Re: Squid-3.2 status update

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Mon, 16 Jul 2012 18:17:03 +1200

On 5/07/2012 10:00 a.m., Alex Rousskov wrote:
> On 06/27/2012 03:12 AM, Amos Jeffries wrote:
>
>> A quick review of the other major bugs shows that each will take some
>> large design and code changes to implement a proper fix or even a
>> workaround.
>>
>>
>> Are there any objections to ignoring these bugs when considering a 3.2
>> stable release:
> Our definition of a "stable release" has two criteria:
>
> 1. "Meant for production caches."
>
> 2. "begin when all known major bugs have been fixed [for 14 days]."
>
> Criterion #1 should probably be interpreted as "Squid Project considers
> the version suitable for production deployment". If you think we are
> there, I have no objections -- I do not have enough information to say
> whether enough users will be satisfied with current v3.2 code in
> production today. Perhaps this is something we should ask on squid-users
> after we close all bugs that we think should be closed?
>
> As for Criterion #2, your question means that either we stop considering
> those bugs as major OR we change criterion #2. IMHO, we should adjust
> that criterion so that we do not have to play these games where we mark
> something as a major bug but then decide that in the interest of a
> speedier "stable" designation we are going to "ignore" it.
>
> An adjusted initialization criterion could be phrased as
>
> 2'. "begin when #1 is satisfied for at least 14 days"
>
>
> This gives us enough flexibility to release [what we consider
> suitable-for-production] code that might have major bugs in some
> environments. I added "at least" because otherwise we may have to
> release v3.3 as stable 14 days after v3.2 is marked stable :-). In
> practice, the version should have "enough improvements" to warrant its
> numbering and its release but I do not want to digress in that discussion.
>
>
>
>> 3124 - Cache manager stops responding when multiple workers used
>> ** requires implementing non-blocking IPC packets between workers and
>> coordinator.
> Has this been discussed somewhere? IPC communication is already
> non-blocking so I suspect some other issue is at play here. The specific
> examples of mgr commands in the bug report (userhash, sourcehash,
> client_list, and netdb) seem non-essential in most environments
> and, hence, do not justify the "major" designation, but perhaps they
> indicate some major implementation problem that must be fixed.
>
>
>> 3389 - Auto-reconnect for tcp access_log
>> ** requires asynchronous handling of log opening and blocking Squid
>> operation
> Since we have stable file-based logging, this bug does not have to block
> a "stable" designation if TCP logging is declared "experimental". You
> already have a patch that addresses 90% of the core problem for those
> who care.
>
> If you do not want to mark TCP logging as experimental and highlight
> this shortcoming, then the bug ought to be fixed IMHO because there is
> consensus that accurate logging is critical for many deployments.
>
>
>> 3478 - Host verify catching dynamic CDN hosted sites
>> ** requires designing a CONNECT and bump handling mechanism
> I am not an expert on this, but it feels like we are trying to enforce a
> [good] rule ignored by the [bad] real world, especially in interception
> environments. As a result, Squid lies and scares admins for no good
> reason (in most cases). We will not win this battle.

Better way to think of this:
   We are trying to prevent poisoning our cache due to easily detected
client lies and also from generating lies ourselves (which would poison
our peers). Nothing more. In as many cases as possible the idea is to
relay the request on in a safe manner even if it is "bad".
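To make the "relay it on safely" idea concrete, here is a minimal C++ sketch of the decision for one intercepted request. It assumes the Host header has already been resolved to a list of IP addresses (the DNS lookup is out of scope) and that the original destination IP of the intercepted connection is known. `verifyHost` and `HostVerdict` are hypothetical names, not Squid's API, and addresses are compared as canonical strings only for brevity; real code would compare parsed addresses. In this sketch a mismatch never rejects the request: it is still relayed, but only to the original destination IP and with the response kept out of the cache, so a lying client cannot poison it.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical outcome of the check; not Squid's actual types.
enum class HostVerdict {
    CacheNormally,       // Host header agrees with where the client was really going
    RelayWithoutCaching  // mismatch: relay to the original destination IP, never cache
};

// resolvedIps:   addresses the client's Host header resolved to (DNS assumed done).
// originalDstIp: destination IP of the intercepted TCP connection.
HostVerdict verifyHost(const std::vector<std::string> &resolvedIps,
                       const std::string &originalDstIp)
{
    assert(!originalDstIp.empty()); // interception always records a destination
    const bool agrees = std::find(resolvedIps.begin(), resolvedIps.end(),
                                  originalDstIp) != resolvedIps.end();
    // Either way the request is relayed; only the caching decision differs.
    return agrees ? HostVerdict::CacheNormally : HostVerdict::RelayWithoutCaching;
}
```

With dynamically hosted CDN sites the client's DNS answer and the proxy's can legitimately differ, which lands in the mismatch branch; under this policy that costs cacheability, not the request itself.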

Christos discovered that relaying over a PINNED connection is also safe.

I'm now trying to implement the outbound CONNECT-wrapping that makes
other peers safe, or at least responsible for any unwrapping and
poisoning of themselves. The bumping of those CONNECT requests when
received can happen later if this takes too long to get right.
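A rough sketch of the request such outbound CONNECT-wrapping might send to a cache_peer, assuming the tunnel targets the original destination IP and port of the intercepted connection rather than the possibly false Host header. `buildConnectRequest` is a hypothetical helper; the message itself is a plain HTTP CONNECT request, with IPv6 literals bracketed in the authority as URI syntax requires.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: open a CONNECT tunnel through a peer to the
// intercepted connection's original destination (not the Host header).
std::string buildConnectRequest(const std::string &dstIp, unsigned short port)
{
    assert(!dstIp.empty() && port != 0);
    // IPv6 literals must be bracketed inside an authority component.
    const bool isV6 = dstIp.find(':') != std::string::npos;
    const std::string authority =
        (isV6 ? "[" + dstIp + "]" : dstIp) + ":" + std::to_string(port);
    return "CONNECT " + authority + " HTTP/1.1\r\n"
           "Host: " + authority + "\r\n"
           "\r\n";
}
```

Once the peer answers with a 2xx status, the intercepted request would be written through the tunnel unchanged, so any unwrapping, and any caching of the result, becomes the peer's responsibility.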

>
>> 3517 - Workers ldap digest
>> ** requires SMP atomic access support for all user credentials
> This is not a blocker IMO. SMP has several known limitations, complex
> authentication schemes being one of them. This does not affect stability
> of supported SMP configurations.

Okay. Release notes reference open bugs on SMP support.

>> Which would leave us with only these to locate (any takers?):
>>
>> 3551 - store_rebuild.cc:116: "store_errors == 0" assertion
> It would be nice to figure this one out, at least for ufs, because many
> folks will try ufs with SMP and there is clearly some kind of corruption
> problem there. I assigned the bug to self for now.
>
> However, if I cannot reproduce it, I will not be able to make much
> progress. Please note that the original reporter moved on to rock store
> and no longer considers this bug to affect him (per comment
> #10).
>
>
>> 3556 - assertion failed: comm.cc:1093: "isOpen(fd)"
> I recommend adding a guard for the comm_close() call in the Connection
> destructor to avoid the call for !isOpen(fd) orphan connections, and
> printing the value of isOpen() in the BUG message.

Done. For a closed FD, comm_close() now logs at level 1 and returns without doing anything.
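The guard recommended above for bug 3556 can be sketched as a toy model. This is not Squid's code: `openFds`, `closeCalls`, `isOpen()` and `comm_close()` are hypothetical stand-ins for the descriptor table and the Comm layer, and `comm_close()` deliberately keeps the original `isOpen(fd)` assertion so that the guard, not a relaxed assertion, is what keeps a stale orphan from crashing. Placing the same check inside `comm_close()` instead has the same effect on orphan connections.

```cpp
#include <cassert>
#include <iostream>
#include <set>

// Hypothetical stand-ins for Squid's fd table and Comm layer.
static std::set<int> openFds; // descriptors currently open
static int closeCalls = 0;    // number of real closes performed

bool isOpen(int fd) { return openFds.count(fd) != 0; }

void comm_close(int fd)
{
    assert(isOpen(fd)); // mirrors the comm.cc "isOpen(fd)" assertion
    openFds.erase(fd);
    ++closeCalls;
}

class Connection
{
public:
    explicit Connection(int aFd) : fd(aFd) {}

    // Normal path: the owner closes explicitly before destruction.
    void close()
    {
        if (fd >= 0 && isOpen(fd))
            comm_close(fd);
        fd = -1;
    }

    ~Connection()
    {
        if (fd >= 0) {
            // Orphan: destroyed without close(). Report isOpen() in the BUG line.
            std::cerr << "BUG: orphan Connection FD " << fd
                      << " isOpen=" << isOpen(fd) << '\n';
            if (isOpen(fd)) // the guard: skip comm_close() on a stale descriptor
                comm_close(fd);
        }
    }

private:
    int fd;
};
```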

>> 3562 - StoreEntry::kickProducer Segmentation fault
> I suspect Squid is corrupting its own memory somewhere so this specific
> core dump cannot be trusted. This might even be the same problem as bug
> 3551 above. This could be considered a blocker at least until we know
> more, I guess.
>

Amos
Received on Mon Jul 16 2012 - 06:17:15 MDT
