Re: [squid-users] SMP-Rock-frequent FATAL: Received Segment Violation...dying."only on kid3"

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Tue, 19 Nov 2013 13:55:29 +1300

On 2013-11-19 12:18, Eliezer Croitoru wrote:
> Hey Dr,
>
> (notes inside)
>
> On 19/11/13 00:54, Amos Jeffries wrote:
>> Either event may have corrupted it slightly. Squid is supposed to
>> contain sufficient checksum protection in rock to cope with most forms
>> of corruption, but nobody's perfect.
>>
>> So, please try to get a core dump, or stack trace of the problem before
>> going any further. This will help us to isolate where the problem is
>> occurring. If it is corruption related we will be needing to try and add
>> better protection for that case.
>>
>> *after* that, please try:
>>
>> * shutting down your Squid by any means necessary to ensure there are 0
>> processes running.
> "pgrep squid"
>
>> * *move* the caches to somewhere they can be analysed later if necessary.
>
>> * rebuild the configured cache_dir with squid -z
>
>> * wait until -z process completed *AND* there are 0 processes still
>> running in the background
> And no traffic at all on the server.
>

?? Not relevant. When the squid process is not running it does not matter
if the server is handling traffic to/from other services.

>> * restart the main Squid
>>
>> This entire process should not take more than a minute.
>>
>> If the problem remains after doing that you will have successfully
>> eliminated cache corruption as a cause and we go back to needing a
>> backtrace to figure it out.
>
> The same result/test can be achieved by running the service in "RAM
> only" cache mode.
>

No, this is wrong. We need to isolate whether corruption of the disk
storage from previous actions is the cause. Eliminating disk storage
entirely is a different test, one that distinguishes between rock store
and memory store problems. It does not demonstrate whether a clean rock
store works or fails.

We need all three tests to take place, plus whatever follows from them:
1) recording of the backtrace/core in the non-working condition to investigate
2) elimination of the effects of previous actions on the disk cache
3) test of whether proper disk cache behaviour is at all part of the problem
4) other things based on details learned from the above
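
For step 2, the quoted procedure could be scripted roughly as below. This is only a sketch under assumptions, not something shipped with Squid: the function names, the keep_root location and the timeouts are made up for illustration, and it assumes "pgrep" and "squid" are on the PATH. It moves the contents of each cache_dir aside (leaving the directory itself and its ownership in place), runs squid -z, and checks for 0 squid processes before and after.

```python
import shutil
import subprocess
import time
from pathlib import Path


def squid_pids():
    """Return PIDs of running squid processes, using `pgrep squid`."""
    # pgrep prints nothing (and exits non-zero) when no process matches.
    result = subprocess.run(["pgrep", "squid"], capture_output=True, text=True)
    return [int(pid) for pid in result.stdout.split()]


def wait_for_no_squid(list_pids=squid_pids, timeout=60.0, interval=1.0):
    """Poll until list_pids() is empty; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if not list_pids():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def move_cache_aside(cache_dir, keep_root):
    """Move (not delete) the old cache contents so they can be analysed later."""
    cache_dir = Path(cache_dir)
    target = Path(keep_root) / cache_dir.name
    # exist_ok=False: refuse to mix this copy with an earlier saved cache.
    target.mkdir(parents=True, exist_ok=False)
    # Snapshot the listing first; we modify the directory while moving.
    for entry in list(cache_dir.iterdir()):
        shutil.move(str(entry), str(target / entry.name))
    return target


def rebuild(cache_dirs, keep_root, squid="squid"):
    """Hypothetical driver: verify Squid is stopped, move caches, run squid -z."""
    # Stopping Squid is left to the operator; this only verifies the result.
    if not wait_for_no_squid():
        raise RuntimeError("squid processes still running; stop them first")
    for cache_dir in cache_dirs:
        move_cache_aside(cache_dir, keep_root)
    subprocess.run([squid, "-z"], check=True)
    # Wait until any background -z processes have finished as well.
    if not wait_for_no_squid():
        raise RuntimeError("squid -z processes still running in the background")
```

Restarting the main Squid is deliberately left out, since how that is done depends on the init system in use. Calling rebuild() with the cache_dir paths from squid.conf and some scratch location for keep_root is the intended use; both are placeholders here.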

Amos
Received on Tue Nov 19 2013 - 00:55:34 MST
