目录

sysbench benchmark MySQL 时候,为什么 kill 连接后,sysbench 没有重连

文章简介:探索为什么 sysbench 没有重连 Reconnect

背景

plantegg 提供了一个案例,大体上是阿里云上 sysbench 压测 MySQL 的时候,使用 MySQL 客户端 kill sysbench 发起的压测连接,发现 sysbench 没有重连,QPS 始终为 0。plantegg 给出了复现方法。

问题复现

在我的 M1 mac 上使用 vagrant + vmware_fusion 启动了一个虚拟机,Vagrantfile 如下,虚拟机启动的环境内核版本更可控;

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
# -*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant.configure("2") do |config|
  config.vm.box = "bento/ubuntu-22.04"
  config.vm.box_architecture = "arm64"
  config.vm.provider :vmware_fusion do |vm|
    # https://gist.github.com/jtopper/8588263
  end
  config.vm.provision "shell", inline: <<-SHELL
    # https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu/
    sed -E -i -e 's/(archive|ports).ubuntu.com/mirrors.tuna.tsinghua.edu.cn/g' -e '/security.ubuntu.com/d' /etc/apt/sources.list
    apt update
    apt install -y tshark tcpdump net-tools gdb dstat
    apt autoremove
    apt install -y linux-tools-common linux-tools-$(uname -r)
  SHELL
end
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
# 先在 vm 中启动一个 mysql,后文以 plantegg# 开头
docker run -it -d --net=host -e MYSQL_ROOT_PASSWORD=123 --name=plantegg docker.m.daocloud.io/library/mysql:8
plantegg# mysql --ssl-mode=DISABLED -uroot -p123

# 在本地启动一个容器,用于执行 sysbench,后文以 sysbench# 开头
docker run --privileged -it --name sysbench registry.dockermirror.com/747301585/alinux3:20220901 bash
#sysbench# uname -a
Linux e590c969f57f 6.6.16-linuxkit #1 SMP Fri Feb 16 11:54:02 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

sysbench# yum install -y sysstat sysbench iproute net-tools strace ltrace perf gdb

sysbench# export MYSQL_HOST='192.168.245.145'
sysbench# sysbench --mysql-user='root' --mysql-password='123' --mysql-db='test' --mysql-host=$MYSQL_HOST --mysql-port='3306' --tables='16'  --table-size='10000' --range-size='5' --db-ps-mode='disable' --skip-trx='on' --mysql-ignore-errors='all' --time='1180' --report-interval='1' --histogram='on' --threads=1 oltp_read_only prepare # 命令有些报错,不影响问题复现
sysbench# sysbench --mysql-user='root' --mysql-password='123' --mysql-db='test' --mysql-host=$MYSQL_HOST --mysql-port='3306' --tables='16'  --table-size='10000' --range-size='5' --db-ps-mode='disable' --skip-trx='on' --mysql-ignore-errors='all' --time='1180' --report-interval='1' --histogram='on' --threads=1 oltp_read_only run

plantegg# mysql --ssl-mode=DISABLED -uroot -p123
plantegg#mysql# select version();
+-----------+
| version() |
+-----------+
| 8.4.2     |
+-----------+
plantegg#mysql# show processlist;
plantegg#mysql# kill some-pid; -- 复现关键点在这里
plantegg#mysql# show processlist;

复现后的现象:

./images/mysql_process_bad_conn.png

sysbench 吞吐跌零:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
sysbench#
[ 15s ] thds: 1 tps: 68.07 qps: 2033.94 (r/w/o: 2033.94/0.00/0.00) lat (ms,95%): 36.89 err/s: 146.14 reconn/s: 0.00
[ 16s ] thds: 1 tps: 55.94 qps: 1998.85 (r/w/o: 1998.85/0.00/0.00) lat (ms,95%): 42.61 err/s: 168.82 reconn/s: 0.00
[ 17s ] thds: 1 tps: 24.81 qps: 1079.68 (r/w/o: 1079.68/0.00/0.00) lat (ms,95%): 74.46 err/s: 97.25 reconn/s: 0.00
[ 18s ] thds: 1 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 19s ] thds: 1 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 20s ] thds: 1 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 21s ] thds: 1 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 22s ] thds: 1 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 23s ] thds: 1 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 24s ] thds: 1 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 25s ] thds: 1 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 26s ] thds: 1 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 27s ] thds: 1 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 28s ] thds: 1 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00            

网络连接大量 CLOSE_WAIT:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
sysbench# ss -tnp | head -n 30
State      Recv-Q Send-Q Local Address:Port     Peer Address:PortProcess                               
CLOSE-WAIT 28     0         172.17.0.2:55265 192.168.245.145:3306 users:(("sysbench",pid=207,fd=22757))
CLOSE-WAIT 28     0         172.17.0.2:48464 192.168.245.145:3306 users:(("sysbench",pid=207,fd=10234))
CLOSE-WAIT 28     0         172.17.0.2:47621 192.168.245.145:3306 users:(("sysbench",pid=207,fd=20395))
CLOSE-WAIT 28     0         172.17.0.2:38447 192.168.245.145:3306 users:(("sysbench",pid=207,fd=18580))
CLOSE-WAIT 28     0         172.17.0.2:39019 192.168.245.145:3306 users:(("sysbench",pid=207,fd=25344))
CLOSE-WAIT 132    0         172.17.0.2:43514 192.168.245.145:3306 users:(("sysbench",pid=207,fd=7519)) 

sysbench# ss -tnp | grep CLOSE-WAIT | wc -l
28232

sysbench# ss -s
Total: 28234
TCP:   28257 (estab 125, closed 25, orphaned 0, timewait 0)

Transport Total     IP        IPv6
RAW       0         0         0        
UDP       0         0         0        
TCP       28232     28232     0        
INET      28232     28232     0        
FRAG      0         0         0  

sysbench# netstat -nap | head -n 1                   
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
sysbench# netstat -nap | grep CLOSE_WAIT | head -n 20
tcp       28      0 172.17.0.2:48464        192.168.245.145:3306    CLOSE_WAIT  486/sysbench        
tcp      132      0 172.17.0.2:39019        192.168.245.145:3306    CLOSE_WAIT  486/sysbench        
tcp       28      0 172.17.0.2:39986        192.168.245.145:3306    CLOSE_WAIT  486/sysbench        
tcp       28      0 172.17.0.2:43514        192.168.245.145:3306    CLOSE_WAIT  486/sysbench        
tcp       28      0 172.17.0.2:32877        192.168.245.145:3306    CLOSE_WAIT  486/sysbench        
tcp       28      0 172.17.0.2:36728        192.168.245.145:3306    CLOSE_WAIT  486/sysbench        
tcp       28      0 172.17.0.2:50506        192.168.245.145:3306    CLOSE_WAIT  486/sysbench        
tcp       28      0 172.17.0.2:38603        192.168.245.145:3306    CLOSE_WAIT  486/sysbench        
tcp       28      0 172.17.0.2:35640        192.168.245.145:3306    CLOSE_WAIT  486/sysbench        
tcp       28      0 172.17.0.2:59714        192.168.245.145:3306    CLOSE_WAIT  486/sysbench        
tcp       28      0 172.17.0.2:36783        192.168.245.145:3306    CLOSE_WAIT  486/sysbench        
tcp       28      0 172.17.0.2:33157        192.168.245.145:3306    CLOSE_WAIT  486/sysbench        
tcp      132      0 172.17.0.2:54836        192.168.245.145:3306    CLOSE_WAIT  486/sysbench        
tcp      132      0 172.17.0.2:41472        192.168.245.145:3306    CLOSE_WAIT  486/sysbench        
tcp       28      0 172.17.0.2:45345        192.168.245.145:3306    CLOSE_WAIT  486/sysbench        
tcp       28      0 172.17.0.2:43855        192.168.245.145:3306    CLOSE_WAIT  486/sysbench        
tcp       28      0 172.17.0.2:43543        192.168.245.145:3306    CLOSE_WAIT  486/sysbench        
tcp       28      0 172.17.0.2:33528        192.168.245.145:3306    CLOSE_WAIT  486/sysbench        
tcp       28      0 172.17.0.2:44096        192.168.245.145:3306    CLOSE_WAIT  486/sysbench        
tcp       28      0 172.17.0.2:49250        192.168.245.145:3306    CLOSE_WAIT  486/sysbench        
sysbench# netstat -nap | grep CLOSE_WAIT | wc -l     
28232

sysbench# cat /proc/sys/net/ipv4/ip_local_port_range
32768   60999
echo $((60999-32768+1))
28232

sysbench 的 cpu 占用 67%,因为是一个线程在跑,已经比较高了。

./images/sysbench-top.png

分析一下原因

先看看 cpu 到被消耗在了内核还是进程

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
pidstat 1
Linux 6.6.16-linuxkit (e590c969f57f)    10/17/24        _aarch64_       (4 CPU)

09:22:11      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
09:22:12        0       115    0.00   64.08    0.00    0.00   64.08     0  sysbench
09:22:12        0       259    0.00    3.88    0.00    0.00    3.88     1  pidstat

09:22:12      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
09:22:13        0       115    0.00   67.00    0.00    0.00   67.00     0  sysbench

09:22:13      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
09:22:14        0       115    0.00   70.00    0.00    0.00   70.00     0  sysbench

09:22:14      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
09:22:15        0       115    0.00   69.00    0.00    0.00   69.00     0  sysbench

可以看到所有的 cpu 都在内核态。实际跑下来,刚复现跌零时候 cpu 比较少,随着 TCP 连接的变多,内核 cpu 越来越高。

看看内核在干什么:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
perf top -g -p `pidof sysbench`

+   98.36%     0.01%  [kernel]                [k] __sys_connect
+   97.73%     0.05%  libpthread-2.32.so      [.] __libc_connect
+   97.35%     5.06%  [kernel]                [k] __inet_hash_connect
+   92.17%     0.01%  [kernel]                [k] __inet_stream_connect
+   86.99%    59.51%  [kernel]                [k] __inet_check_established
+   86.84%     0.00%  [kernel]                [k] tcp_v4_connect
+   67.97%     0.01%  [kernel]                [k] do_el0_svc
+   28.19%    28.19%  [kernel]                [k] inet_ehashfn
+   19.36%     0.00%  libc-2.32.so            [.] thread_start
+   19.36%     0.00%  libpthread-2.32.so      [.] start_thread
+   19.35%     0.00%  sysbench                [.] 0x0000aaaade71bce0
+   19.35%     0.00%  libluajit-5.1.so.2.1.0  [.] lua_pcall
+   19.35%     0.00%  [JIT] tid 207           [.] 0x0000ffffac384eb8
+   19.35%     0.00%  sysbench                [.] db_execute
+   19.35%     0.00%  sysbench                [.] 0x0000aaaade72b744
+   19.35%     0.00%  [kernel]                [k] el0t_64_sync
+   19.35%     0.00%  [kernel]                [k] el0t_64_sync_handler
+   19.35%     0.01%  [kernel]                [k] el0_svc
+   19.33%     0.00%  sysbench                [.] 0x0000aaaade72affc
+   19.33%     0.00%  sysbench                [.] 0x0000aaaade72a078
+   19.33%     0.00%  libmariadb.so.3         [.] 0x0000ffffae08ce54
+   19.33%     0.00%  [kernel]                [k] invoke_syscall.constprop.0
+   19.27%     0.00%  libmariadb.so.3         [.] 0x0000ffffae084f10
+   19.27%     0.00%  libmariadb.so.3         [.] 0x0000ffffae084600
+   19.26%     0.00%  [kernel]                [k] __arm64_sys_connect
+   19.24%     0.00%  [kernel]                [k] inet_stream_connect
+   19.24%     0.00%  [kernel]                [k] inet_hash_connect
+    2.81%     0.98%  [kernel]                [k] __cond_resched
+    2.81%     2.26%  [kernel]                [k] __local_bh_enable_ip

__inet_check_established, inet_ehashfn 内核函数执行占用过多 cpu

/why-sysbench-dont-reconnect-after-kill/images/sysben-perf-top.png

可以看到 __inet_check_established 和 inet_ehashfn 两个函数占用比较高;

TODO: 需要研究一下 kernel 函数 __inet_check_established 和 inet_ehashfn 性能消耗在了哪里。

现在看看进程在做什么事情:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
strace -f -p `pidof sysbench`

437   sendto(3, "5\0\0\0\3SELECT c FROM sbtest7 WHERE"..., 57, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 57
437   recvfrom(3, 0xffff80027600, 16384, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
437   ppoll([{fd=3, events=POLLIN}], 1, NULL, NULL, 0) = 1 ([{fd=3, revents=POLLIN}])
437   recvfrom(3, "\1\0\0\1\1*\0\0\2\3def\4test\7sbtest7\7sbtes"..., 16384, MSG_DONTWAIT, NULL, NULL) = 689
437   sendto(3, ";\0\0\0\3SELECT SUM(k) FROM sbtest15"..., 63, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 63
437   recvfrom(3, 0xffff80027600, 16384, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
437   ppoll([{fd=3, events=POLLIN}], 1, NULL, NULL, 0) = 1 ([{fd=3, revents=POLLIN}])
437   recvfrom(3, ",\0\0\1\377z\4#42S02Table 'test.sbtest1"..., 16384, MSG_DONTWAIT, NULL, NULL) = 48
437   sendto(3, "%\0\0\0\3SELECT c FROM sbtest10 WHER"..., 41, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 41
437   recvfrom(3, 0xffff80027600, 16384, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
437   ppoll([{fd=3, events=POLLIN}], 1, NULL, NULL, 0) = 1 ([{fd=3, revents=POLLIN}])
437   recvfrom(3, "", 16384, MSG_DONTWAIT, NULL, NULL) = 0
437   close(3)                          = 0
437   getpid()                          = 435
437   socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
437   fcntl(3, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
437   connect(3, {sa_family=AF_INET, sin_port=htons(3306), sin_addr=inet_addr("192.168.245.145")}, 16) = -1 EINPROGRESS (Operation now in progress)
437   ppoll([{fd=3, events=POLLOUT}], 1, NULL, NULL, 0) = 1 ([{fd=3, revents=POLLOUT}])
437   getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
437   fcntl(3, F_SETFL, O_RDONLY)       = 0
437   clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
437   getpid()                          = 435
437   socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 4
437   fcntl(4, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
437   connect(4, {sa_family=AF_INET, sin_port=htons(3306), sin_addr=inet_addr("192.168.245.145")}, 16) = -1 EINPROGRESS (Operation now in progress)
437   ppoll([{fd=4, events=POLLOUT}], 1, NULL, NULL, 0) = 1 ([{fd=4, revents=POLLOUT}])
437   getsockopt(4, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
437   fcntl(4, F_SETFL, O_RDONLY)       = 0
437   clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
437   getpid()                          = 435
437   mprotect(0xffff80035000, 8192, PROT_READ|PROT_WRITE) = 0
437   socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 5
437   fcntl(5, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
437   connect(5, {sa_family=AF_INET, sin_port=htons(3306), sin_addr=inet_addr("192.168.245.145")}, 16) = -1 EINPROGRESS (Operation now in progress)
437   ppoll([{fd=5, events=POLLOUT}], 1, NULL, NULL, 0) = 1 ([{fd=5, revents=POLLOUT}])
437   getsockopt(5, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
437   fcntl(5, F_SETFL, O_RDONLY)       = 0
437   mprotect(0xffff80037000, 8192, PROT_READ|PROT_WRITE) = 0
437   clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
437   getpid()                          = 435
437   mprotect(0xffff80039000, 8192, PROT_READ|PROT_WRITE) = 0
437   socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 6
437   fcntl(6, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
437   connect(6, {sa_family=AF_INET, sin_port=htons(3306), sin_addr=inet_addr("192.168.245.145")}, 16) = -1 EINPROGRESS (Operation now in progress)
437   ppoll([{fd=6, events=POLLOUT}], 1, NULL, NULL, 0) = 1 ([{fd=6, revents=POLLOUT}])
437   getsockopt(6, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
437   fcntl(6, F_SETFL, O_RDONLY)       = 0
437   mprotect(0xffff8003b000, 8192, PROT_READ|PROT_WRITE) = 0
437   clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
437   getpid()                          = 435
437   mprotect(0xffff8003d000, 8192, PROT_READ|PROT_WRITE) = 0
437   socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 7
437   fcntl(7, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
437   connect(7, {sa_family=AF_INET, sin_port=htons(3306), sin_addr=inet_addr("192.168.245.145")}, 16) = -1 EINPROGRESS (Operation now in progress)
437   ppoll([{fd=7, events=POLLOUT}], 1, NULL, NULL, 0) = 1 ([{fd=7, revents=POLLOUT}])
437   getsockopt(7, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
437   fcntl(7, F_SETFL, O_RDONLY)       = 0
437   mprotect(0xffff8003f000, 8192, PROT_READ|PROT_WRITE) = 0
437   clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
437   getpid()                          = 435
437   mprotect(0xffff80041000, 8192, PROT_READ|PROT_WRITE) = 0
437   socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 8
437   fcntl(8, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
437   connect(8, {sa_family=AF_INET, sin_port=htons(3306), sin_addr=inet_addr("192.168.245.145")}, 16) = -1 EINPROGRESS (Operation now in progress)
437   ppoll([{fd=8, events=POLLOUT}], 1, NULL, NULL, 0) = 1 ([{fd=8, revents=POLLOUT}])
437   getsockopt(8, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
437   fcntl(8, F_SETFL, O_RDONLY)       = 0
437   mprotect(0xffff80043000, 8192, PROT_READ|PROT_WRITE) = 0
437   clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0

可以看到在反复的循环的 syscall getpid->socket->connect->clock_nanosleep

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
strace -f -c -p `pidof sysbench`

strace: Process 207 attached with 3 threads
^Cstrace: Process 207 detached
strace: Process 208 detached
strace: Process 209 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 49.55   16.874015        3734      4518           clock_nanosleep
 49.22   16.759397        3731      4491      4491 connect
  0.56    0.192136      192136         1           restart_syscall
  0.25    0.085898          19      4492           close
  0.18    0.061314          13      4491           socket
  0.11    0.037854           8      4491           fcntl
  0.11    0.037465           8      4491           getpid
  0.01    0.004097         146        28           write
------ ----------- ----------- --------- --------- ----------------
100.00   34.052176        1261     27003      4491 total

syscall 的调用时间消耗在了 clock_nanosleep 和 connect 上。到这里,基本可以猜测程序在循环执行 connect 和 clock_nanosleep。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
ltrace -f -p `pidof sysbench` -C -o /dev/stdout | tee sysbench_ltrace.txt
20 mysql_real_connect(0xffff8c020a40, 0xaaaacb78d8b0, 0xaaaacb78be30, 0xaaaacb78d830)                                                           = 0
20 usleep(1000, 0, -24, 0xffff8c020e80)                                                                                                         = 0
20 mysql_real_connect(0xffff8c020a40, 0xaaaacb78d8b0, 0xaaaacb78be30, 0xaaaacb78d830)                                                           = 0
20 usleep(1000, 0, -24, 0xffff8c020e80)                                                                                                         = 0
20 mysql_real_connect(0xffff8c020a40, 0xaaaacb78d8b0, 0xaaaacb78be30, 0xaaaacb78d830)                                                           = 0
20 usleep(1000, 0, -24, 0xffff8c020e80)                                                                                                         = 0
20 mysql_real_connect(0xffff8c020a40, 0xaaaacb78d8b0, 0xaaaacb78be30, 0xaaaacb78d830)                                                           = 0
20 usleep(1000, 0, -24, 0xffff8c020e80)                                                                                                         = 0
20 mysql_real_connect(0xffff8c020a40, 0xaaaacb78d8b0, 0xaaaacb78be30, 0xaaaacb78d830)                                                           = 0
20 usleep(1000, 0, -24, 0xffff8c020e80)                                                                                                         = 0
20 mysql_real_connect(0xffff8c020a40, 0xaaaacb78d8b0, 0xaaaacb78be30, 0xaaaacb78d830)                                                           = 0
20 usleep(1000, 0, -24, 0xffff8c020e80)                                                                                                         = 0
20 mysql_real_connect(0xffff8c020a40, 0xaaaacb78d8b0, 0xaaaacb78be30, 0xaaaacb78d830)                                                           = 0
20 usleep(1000, 0, -24, 0xffff8c020e80)                                                                                                         = 0

可以看到当前调用的第三方库的函数是,mysql connect 跟 syscall 中的 connect 和 clock_nanosleep 也是匹配的;

尝试看看能不能抓到程序执行流程大概是怎样的

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
pstack `pidof sysbench`

Thread 3 (Thread 0xffff992ccf30 (LWP 20)):
#0  0x0000ffff9a505b58 in connect () from /lib64/libpthread.so.0
#1  0x0000ffff9a777600 in pvio_socket_connect_sync_or_async () from /lib64/libmariadb.so.3
#2  0x0000ffff9a777f10 in pvio_socket_connect () from /lib64/libmariadb.so.3
#3  0x0000ffff9a77fe54 in mthd_my_real_connect () from /lib64/libmariadb.so.3
#4  0x0000aaaac06da078 in mysql_drv_real_connect ()
#5  0x0000aaaac06daffc in mysql_drv_reconnect ()
#6  0x0000aaaac06db744 in mysql_drv_execute ()
#7  0x0000aaaac06ce688 in db_execute ()
#8  0x0000ffff98a7c85c in ?? ()
#9  0x00007d178ca2c3d8 in ?? ()

打开 sysbench 的 debug 日志,看看有没有方便定位到代码行的信息

1
2
3
4
5
sysbench# sysbench --mysql-user='root' --mysql-password='123' --mysql-db='test' --mysql-host=$MYSQL_HOST --mysql-port='3306' --tables='16'  --table-size='10000' --range-size='5' --db-ps-mode='disable' --skip-trx='on' --mysql-ignore-errors='all' --time='1180' --report-interval='1' --histogram='on' --threads=1 --debug=on oltp_read_only run

DEBUG: Reconnecting
[ 11s ] thds: 1 tps: 7.99 qps: 408.55 (r/w/o: 408.55/0.00/0.00) lat (ms,95%): 48.34 err/s: 40.96 reconn/s: 0.00
[ 12s ] thds: 1 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,95%): 0.00 err/s: 0.00 reconn/s: 0.00

结合下载的代码,可以看到 DEBUG: Reconnecting 而未输出 DEBUG: Reconnected,可以确认 sysbench 在执行 mysql_drv_reconnect 后边这个循环。再结合上述的调用栈,可以确认调用路径。

1
git clone https://github.com/akopytov/sysbench.git
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
static int mysql_drv_reconnect(db_conn_t *sb_con)
{
  db_mysql_conn_t *db_mysql_con = (db_mysql_conn_t *) sb_con->ptr;
  MYSQL *con = db_mysql_con->mysql;

  log_text(LOG_DEBUG, "Reconnecting");

  DEBUG("mysql_close(%p)", con);
  mysql_close(con);

  while (mysql_drv_real_connect(db_mysql_con))
  {
    if (sb_globals.error)
      return DB_ERROR_FATAL;

    usleep(1000);
  }

  log_text(LOG_DEBUG, "Reconnected");

  return DB_ERROR_IGNORABLE;
}

去翻了下 mariadb 跟 Mysql mysql_real_connect 的文档发现 mariadb 的文档 中有一句话:

The connection handle can’t be reused for establishing a new connection. It must be closed and reinitialized before.

而 Mysql 的文档中对应的章节并未提到。

直接去掉 mysql_close(con) 去掉后使用 mariadb 的依赖重新编译发现,kill 链接后 QPS 能够自动恢复。

根据其他人的实验报告,大搞如此执行的原因是 mariadb-connect-c 的库函数参数兼容,但是行为跟 mysql 的客户端行为有差异导致的。

这里这里 是修复方式

比起如何修复更想关注为什么再循环执行这个函数

具体是如何工作的可以看 [这里](./docs/999我的sysbench 压测MySQL时 kill 链接后为什么不会重连?.htm)

TODO: gdb debug 试试看具体的执行流程及各参数返回值,确认如此执行的原因。

问题

为什么 MySQL 8.4.2 有的 Recv-Q 是 28,有的是 132

1
2
3
4
State      Recv-Q Send-Q Local Address:Port     Peer Address:PortProcess
CLOSE-WAIT        28            0                       172.17.0.2:37863               192.168.245.145:3306        users:(("sysbench",pid=207,fd=20236))       
CLOSE-WAIT        28            0                       172.17.0.2:55403               192.168.245.145:3306        users:(("sysbench",pid=207,fd=16731))       
CLOSE-WAIT        132           0                       172.17.0.2:53963               192.168.245.145:3306        users:(("sysbench",pid=207,fd=25906))       

TODO: 详细学习一下 Recv-Q Send-Q 分别意味着发生了什么事情

尝试抓 mysqld 的 syscall 发现,正常的登录过程,sendto 发送了 77(正常登录回显)+54(超时回显)+1(close 后的 fin 占用一个 byte)=132

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
strace -ff -p `pidof mysqld` -o mysqld_syscal.txt

14595 <... accept resumed>{sa_family=AF_INET6, sin6_port=htons(65517), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::ffff:192.168.245.1", &sin6_addr), sin6_scope_id=0}, [128 => 28]) = 49
19382 sendto(49, "2\0\0\1\377\207\4#08S01Got timeout reading"..., 54, MSG_DONTWAIT, NULL, 0 <unfinished ...>
19382 setsockopt(49, SOL_TCP, TCP_NODELAY, [1], 4) = 0
19382 getpeername(49, {sa_family=AF_INET6, sin6_port=htons(64695), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::ffff:192.168.245.1", &sin6_addr), sin6_scope_id=0}, [128 => 28]) = 0
19382 setsockopt(49, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
19382 sendto(49, "I\0\0\0\n8.4.2\0\2448\0\0W%59p.\21Y\0\377\377\377\2\0\377\337\25"..., 77, MSG_DONTWAIT, NULL, 0) = 77
19382 recvfrom(49, 0xffff5c1d0db0, 4, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
19382 sendto(49, "2\0\0\1\377\207\4#08S01Got timeout reading"..., 54, MSG_DONTWAIT, NULL, 0) = 54
19382 shutdown(49, SHUT_RDWR)           = 0
19382 close(49)                         = 0
19373 setsockopt(49, SOL_TCP, TCP_NODELAY, [1], 4 <unfinished ...>
19373 getpeername(49, {sa_family=AF_INET6, sin6_port=htons(65517), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::ffff:192.168.245.1", &sin6_addr), sin6_scope_id=0}, [128 => 28]) = 0
19373 setsockopt(49, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
19373 sendto(49, "I\0\0\0\n8.4.2\0;9\0\0\n(V\0201D_\26\0\377\377\377\2\0\377\337\25"..., 77, MSG_DONTWAIT, NULL, 0) = 77
19373 recvfrom(49, 0xffff78034a00, 4, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)

echo $((54+77+1))
132

尝试抓 mysqld 的 syscall 发现,连接过多时,sendto 发送了 27(too many conn)+1(close 后的 fin 占用一个 byte)=28

1
2
3
4
5
6
7
8
14595 accept(20, {sa_family=AF_INET6, sin6_port=htons(64852), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::ffff:192.168.245.1", &sin6_addr), sin6_scope_id=0}, [128 => 28]) = 197
14595 setsockopt(197, SOL_TCP, TCP_NODELAY, [1], 4) = 0
14595 sendto(197, "\27\0\0\0\377\20\4Too many connections", 27, MSG_DONTWAIT, NULL, 0) = 27
14595 shutdown(197, SHUT_RDWR)          = 0
14595 close(197)                        = 0

echo $((27+1))
28

为什么 CPU 这么高,CPU 都在忙什么

/why-sysbench-dont-reconnect-after-kill/images/sysbench-top.png

./images/perf-top-tree.png

perf top 可以看到,kernel 在执行 __inet_check_established;

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
strace -f -p `pidof sysbench`

437   mprotect(0xffff80037000, 8192, PROT_READ|PROT_WRITE) = 0
437   clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
437   getpid()                          = 435
437   mprotect(0xffff80039000, 8192, PROT_READ|PROT_WRITE) = 0
437   socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 6
437   fcntl(6, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
437   connect(6, {sa_family=AF_INET, sin_port=htons(3306), sin_addr=inet_addr("192.168.245.145")}, 16) = -1 EINPROGRESS (Operation now in progress)
437   ppoll([{fd=6, events=POLLOUT}], 1, NULL, NULL, 0) = 1 ([{fd=6, revents=POLLOUT}])
437   getsockopt(6, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
437   fcntl(6, F_SETFL, O_RDONLY)       = 0
437   mprotect(0xffff8003b000, 8192, PROT_READ|PROT_WRITE) = 0
437   clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
437   getpid()                          = 435
437   mprotect(0xffff8003d000, 8192, PROT_READ|PROT_WRITE) = 0
437   socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 7
437   fcntl(7, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
437   connect(7, {sa_family=AF_INET, sin_port=htons(3306), sin_addr=inet_addr("192.168.245.145")}, 16) = -1 EINPROGRESS (Operation now in progress)
437   ppoll([{fd=7, events=POLLOUT}], 1, NULL, NULL, 0) = 1 ([{fd=7, revents=POLLOUT}])
437   getsockopt(7, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
437   fcntl(7, F_SETFL, O_RDONLY)       = 0
437   mprotect(0xffff8003f000, 8192, PROT_READ|PROT_WRITE) = 0
437   clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
437   getpid()                          = 435
437   mprotect(0xffff80041000, 8192, PROT_READ|PROT_WRITE) = 0
437   socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 8
437   fcntl(8, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
437   connect(8, {sa_family=AF_INET, sin_port=htons(3306), sin_addr=inet_addr("192.168.245.145")}, 16) = -1 EINPROGRESS (Operation now in progress)
437   ppoll([{fd=8, events=POLLOUT}], 1, NULL, NULL, 0) = 1 ([{fd=8, revents=POLLOUT}])
437   getsockopt(8, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
437   fcntl(8, F_SETFL, O_RDONLY)       = 0
437   mprotect(0xffff80043000, 8192, PROT_READ|PROT_WRITE) = 0
437   clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
strace -f -c -p `pidof sysbench`
strace: Process 207 attached with 3 threads
^Cstrace: Process 207 detached
strace: Process 208 detached
strace: Process 209 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 49.55   16.874015        3734      4518           clock_nanosleep
 49.22   16.759397        3731      4491      4491 connect
  0.56    0.192136      192136         1           restart_syscall
  0.25    0.085898          19      4492           close
  0.18    0.061314          13      4491           socket
  0.11    0.037854           8      4491           fcntl
  0.11    0.037465           8      4491           getpid
  0.01    0.004097         146        28           write
------ ----------- ----------- --------- --------- ----------------
100.00   34.052176        1261     27003      4491 total

strace 可以看出 sysbench 在执行 connect,但是一直 errors,且可以看到几个 syscall calls 相差不大,echo $((4518-4491)) = 27

什么原因会导致 CLOSE_WAIT 状态

wiki 中搬一张图:

./images/TCP_CLOSE.svg

被动关闭方接收到 fin 后,没有发送 fin,被动关闭方状态停留在 CLOSE_WAIT。此例子中 CLOSE_WAIT 出现在 sysbench 中,说明是由于 mysql server 关闭发起 close,而客户端被动关闭未 close,客户端(sysbench)连接状态为 CLOSE_WAIT;

在 sysbench 机器中执行 nc -vz ip 3306,报错 Cannot assign requested address. 也说明是 sysbench 的问题。

1
2
3
bash-4.4# nc -v 192.168.245.145 3306
Ncat: Version 7.92 ( https://nmap.org/ncat )
Ncat: Cannot assign requested address.

mysql 这里在做什么事情?mysql 多久会触发 close 行为呢?

mysql 接收到请求,后超时的包在这里

/why-sysbench-dont-reconnect-after-kill/images/mysql-conn-then-timeout-conn.png

可以看到,mysql 在 0.00618s mysql 返回了登录验证的 prompt,此时 mysql 中的连接状态是: ./images/mysql_process_bad_conn.png

/why-sysbench-dont-reconnect-after-kill/images/mysql_process_bad_timeout.png

10.008458 mysql 返回超时。说明 mysql 这里的登录超时时间是 10 s。

后续超时的连接被内核 reset 掉。

为什么 Sysbench 要疯狂创建 4 万多个连接

TODO: 需要翻翻 sysbench 的代码,确认一下原因了,看到 syscall 中大量的 connect,大概是到了一个死循环处

TODO: perf top 中展开可以看到很多函数地址,并无执行的函数名字,猜测是 sysbench 的函数,需要编译 sysbench 后确认一下

TODO: EADDRNOTAVAIL; //Cannot assign requested address 这个错误的含意是什么,何时出现