Re: Regression introduced in trunk r13201 (large rock merge)

From: Nikolai Gorchilov <niki_at_x3me.net>
Date: Sat, 8 Feb 2014 20:24:47 +0200

Hi,

Unfortunately, the rock store issue I've found isn't related to the
r13201 regression, as Amos presumed.

It is still present in both the stable (3.4.x) and development
(3.HEAD) branches.

As it is a show stopper for everyone who wants to use the rock store,
even with the stable branch, let me bring the problem to your
attention again!

A quick summary of the issue, for those of you who didn't follow the
initial squid-users discussion (http://tinyurl.com/rockprob):
- some concurrent persistent connections to a single host sent via
Squid freeze forever while waiting for the rock store to deliver
cache HIT results;
- the stalled requests stay in this state seemingly forever, until
the connection is closed by the client (e.g. the user pressing ESC in
the browser). I tested leaving such connections open for a full night
before hitting ESC: neither the browser nor Squid did anything about
them for 8 hours; the connection was still open, still waiting for a
result;
- once the stalled connection is closed (i.e. on ESC), a
TCP_HIT_ABORTED entry shows up in access.log.

My findings so far:
- the problem seems to disappear when concurrent connections are
turned off in the browser, i.e. via about:config in Firefox:
* network.http.max-persistent-connections-per-proxy=1, in case the
browser is aware of the proxy in a regular proxy setup (default is 32);
* network.http.max-persistent-connections-per-server=1, in case of a
TPROXY setup (default is 6);
- the problem exists only when using the rock store via IPC with a disker;
- the problem does not exist when running the rock store in non-daemon
mode (squid -N), thus no disker and no IPC (see the command sketch
right after this list);
- the problem does not exist when using aufs or diskd as storage, so
diskd's IPC works correctly.
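
For reference, the non-daemon check above boils down to starting Squid
in the foreground with the same /tmp/squid.conf used in the
reproduction steps below (a sketch, not an exact transcript):

~# squid -N -f /tmp/squid.conf

and then, from another terminal, repeating steps 4 and 5; with -N
there is no disker and no IPC, nothing stalls, and no TCP_HIT_ABORTED
entries appear in access.log.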

All of the above was tested with my own builds of Squid 3.4.1, 3.4.2,
3.4.3 and 3.HEAD under Ubuntu 12.04.4, as well as with Eliezer's Squid
3.4.2 build under CentOS 6.5. So neither the collapsed forwarding nor
the large rock merge regressions seem to be involved, as those changes
are not part of 3.4.x.

The problem seems to be easily reproducible. My original steps and
config triggered some irrelevant discussions, so here is a more
straightforward approach:

1. Create a simple config file in /tmp/squid.conf:

===[begin]===
http_port 3128
http_access allow all
cache allow all
cache_mem 0 MB
cache_dir rock /tmp 10 max-size=31000
===[eof]===

2. Create the swap directories, then wait until the operation has
completed successfully and all the Squid-spawned processes have stopped:

~# squid -f /tmp/squid.conf -z
2014/02/08 17:58:53 kid2| Current Directory is /root
2014/02/08 17:58:53 kid2| Creating missing swap directories
2014/02/08 17:58:53 kid2| Skipping existing Rock db: /tmp/rock
2014/02/08 17:58:53 kid1| Current Directory is /root
2014/02/08 17:58:53 kid1| Creating missing swap directories
2014/02/08 17:58:53 kid3| Current Directory is /root
2014/02/08 17:58:53 kid3| Creating missing swap directories

3. Start Squid:

~# squid -f /tmp/squid.conf

4. Fetch the complete www.gnu.org homepage:

~# wget -pqO /tmp/delme -e use_proxy=yes -e http_proxy=127.0.0.1:3128
http://www.gnu.org/

5. Now it is time to stress the rock store by creating concurrent
connections to the same host. As wget can't fetch concurrently within
a single job, call it multiple times in the background:

~# for i in {1..6}; do bash -c "wget -pqO /tmp/delme -e use_proxy=yes
-e http_proxy=127.0.0.1:3128 http://www.gnu.org/ &"; done; sleep 30;
killall wget

NB! I'm using 6 concurrent connections, matching the Firefox default.
The higher the concurrency, the more stalled requests there will be
(see the variant below).
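
For illustration only (the numbers are arbitrary), the same loop can
be widened, e.g. to the 32 connections of the per-proxy Firefox
default, to produce more stalled requests:

~# for i in {1..32}; do bash -c "wget -pqO /tmp/delme -e use_proxy=yes
-e http_proxy=127.0.0.1:3128 http://www.gnu.org/ &"; done; sleep 30;
killall wget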

6. After the 30-second sleep, the TCP_HIT_ABORTED results show up in access.log:

~# tail -n10 /var/log/squid/access.log
1391883641.642 0 127.0.0.1 TCP_HIT/200 6303 GET
http://www.gnu.org/graphics/topbanner.png - HIER_NONE/- image/png
1391883641.643 0 127.0.0.1 TCP_HIT/200 467 GET
http://www.gnu.org/graphics/bullet.gif - HIER_NONE/- image/gif
1391883641.643 0 127.0.0.1 TCP_HIT/200 5983 GET
http://www.gnu.org/graphics/topbanner.ar.png - HIER_NONE/- image/png
1391883641.644 0 127.0.0.1 TCP_HIT/200 8585 GET
http://www.gnu.org/graphics/rms2005chrys.jpg - HIER_NONE/- image/jpeg
1391883641.645 0 127.0.0.1 TCP_HIT/200 3439 GET
http://www.gnu.org/graphics/dogear.png - HIER_NONE/- image/png
1391883641.646 0 127.0.0.1 TCP_HIT/200 467 GET
http://www.gnu.org/graphics/bullet.gif - HIER_NONE/- image/gif
1391883646.568 0 127.0.0.1 TCP_MISS/000 0 GET
cache_object://127.0.0.1/active_requests - HIER_NONE/- -
1391883671.022 29413 127.0.0.1 TCP_HIT_ABORTED/000 0 GET
http://www.gnu.org/robots.txt - HIER_NONE/- -
1391883671.022 29404 127.0.0.1 TCP_HIT_ABORTED/000 0 GET
http://www.gnu.org/mini.css - HIER_NONE/- -
1391883671.022 29422 127.0.0.1 TCP_HIT_ABORTED/000 0 GET
http://www.gnu.org/layout.css - HIER_NONE/- -

7. Final note: during the sleep, mgr:active_requests shows the
following (the query used is sketched after the dump):
===[cut]===
Connection: 0x1a17b88
FD 19, read 897, wrote 40854
FD desc: Reading next request
in: buf 0x1a14b50, offset 0, size 4096
remote: 127.0.0.1:55384
local: 127.0.0.1:3128
nrequests: 5
uri http://www.gnu.org/mini.css
logType TCP_HIT
out.offset 0, out.size 0
req_sz 192
entry 0x1a09770/2FE7872FED01885F6F1C554EAD858BAE
start 1391883641.618508 (4.950253 seconds ago)
username -

Connection: 0x19f5ec8
FD 15, read 318, wrote 19096
FD desc: Reading next request
in: buf 0x19fd920, offset 0, size 4096
remote: 127.0.0.1:55381
local: 127.0.0.1:3128
nrequests: 2
uri http://www.gnu.org/robots.txt
logType TCP_HIT
out.offset 0, out.size 0
req_sz 164
entry 0x1a14a10/32FF32767DBBCCD9FED813952D39FD43
start 1391883641.609347 (4.959414 seconds ago)
username -

Connection: 0x1a3da98
FD 21, read 705, wrote 26789
FD desc: Reading next request
in: buf 0x1a405a0, offset 0, size 4096
remote: 127.0.0.1:55386
local: 127.0.0.1:3128
nrequests: 4
uri http://www.gnu.org/layout.css
logType TCP_HIT
out.offset 0, out.size 0
req_sz 194
entry 0x1a4b6c0/600B59D76FBF6BF64EE7F8B16391F159
start 1391883641.600655 (4.968106 seconds ago)
username -
===[cut]===
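
For reference, the dump above comes from querying the cache manager
with the bundled squidclient tool during the 30-second sleep,
something like:

~# squidclient -h 127.0.0.1 -p 3128 mgr:active_requests

That query is also what produced the
cache_object://127.0.0.1/active_requests entry in the access.log
excerpt above.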

That's it! Please, help!

Thank you in advance for your cooperation!

Best,
Niki

On Thu, Feb 6, 2014 at 1:04 PM, Kinkie <gkinkie_at_gmail.com> wrote:
>
> Hi,
> Apparently my checked-out tree had some problems. I've performed a
> follow-up commit with the actual changes.
>
> On Thu, Feb 6, 2014 at 11:53 AM, Amos Jeffries <squid3_at_treenet.co.nz> wrote:
> > On 6/02/2014 9:54 p.m., Kinkie wrote:
> >> Hi,
> >> Maybe there's some issue with launchpad mirroring?
> >> Can you try the branch at http://bzr.squid-cache.org/bzr/squid3/trunk ?
> >>
> >
> >
> > I dont think so. The master server shows the same things on the checkout
> > used to generate changesets and do maintenance.
> >
> > http://master.squid-cache.org/Versions/v3/3.HEAD/changesets/merge.html
> > http://master.squid-cache.org/Versions/v3/3.HEAD/changesets/squid-3-13257.patch
> >
> > Amos
> >
>
>
>
> --
> Francesco