Message ID | 1505111262-12620-1-git-send-email-liujian56@huawei.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
Series | tcp: TCP_USER_TIMEOUT can not work in tcp_probe_timer() | expand |
On Mon, 2017-09-11 at 14:27 +0800, liujian56@huawei.com wrote: > From: liujian <liujian56@huawei.com> > > After the tcp socket go to ESTABLISHED stat, change IP address (server > side), > then the tcp socket will go tcp_probe_timer process. > > [root@localhost net]# netstat -toe > Active Internet connections (w/o servers) > Proto Recv-Q Send-Q Local Address Foreign Address State User Inode Timer > tcp 0 1104 9.81.254:personal-agent 9.84.201.213:23597 ESTABLISHED root 12819 probe (4.36/0/7) > [root@localhost net]# cat /proc/net/tcp > sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode > 3: B1FE5109:15B3 D5C95409:5C2D 01 00000495:00000000 04:0000005E 00000000 0 7 12819 2 ffff95cdcf45a000 20 4 1 10 -1 > > In my test case, tcp_write_queue_head(sk) and tcp_send_head(sk) is same > SKB. > And ((s32)(tcp_time_stamp(tp) - start_ts) > > jiffies_to_msecs(icsk->icsk_user_timeout)) > always is false. Interesting. > Here use keepalive_time_elapsed(tp) to do the compare as > tcp_keepalive_timer do. But zero window probe and TCP_USER_TIMEOUT can be used without keepalives... A packetdrill test would help, I will write one. Thanks.
On Mon, 2017-09-11 at 07:49 -0700, Eric Dumazet wrote: > On Mon, 2017-09-11 at 14:27 +0800, liujian56@huawei.com wrote: > > From: liujian <liujian56@huawei.com> > > > > After the tcp socket go to ESTABLISHED stat, change IP address (server > > side), > > then the tcp socket will go tcp_probe_timer process. > > > > [root@localhost net]# netstat -toe > > Active Internet connections (w/o servers) > > Proto Recv-Q Send-Q Local Address Foreign Address State User Inode Timer > > tcp 0 1104 9.81.254:personal-agent 9.84.201.213:23597 ESTABLISHED root 12819 probe (4.36/0/7) > > [root@localhost net]# cat /proc/net/tcp > > sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode > > 3: B1FE5109:15B3 D5C95409:5C2D 01 00000495:00000000 04:0000005E 00000000 0 7 12819 2 ffff95cdcf45a000 20 4 1 10 -1 > > > > In my test case, tcp_write_queue_head(sk) and tcp_send_head(sk) is same > > SKB. > > And ((s32)(tcp_time_stamp(tp) - start_ts) > > > jiffies_to_msecs(icsk->icsk_user_timeout)) > > always is false. > > Interesting. > > > Here use keepalive_time_elapsed(tp) to do the compare as > > tcp_keepalive_timer do. > > > But zero window probe and TCP_USER_TIMEOUT can be used without > keepalives... > > > A packetdrill test would help, I will write one. So existing code seems to work : # cat window-probe-without-data-user-timeout.pkt 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +0 < S 0:0(0) win 0 <mss 1460,sackOK,nop,nop,nop,wscale 7> +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8> // Client advertises a zero receive window, so we can't send. +.1 < . 1:1(0) ack 1 win 0 +0 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0 +0 write(4, ..., 2920) = 2920 // Window probes are scheduled just like RTOs. +.3~+.31 > . 0:0(0) ack 1 +.6~+.62 > . 0:0(0) ack 1 +1.2~+1.24 > . 0:0(0) ack 1 +2.4~+2.48 > . 0:0(0) ack 1 # ./packetdrill window-probe-without-data-user_timeout.pkt 08:10:39.306137 IP 192.0.2.1.58149 > 192.168.79.31.8080: Flags [S], seq 0, win 0, options [mss 1460,sackOK,nop,nop,nop,wscale 7], length 0 08:10:39.306166 IP 192.168.79.31.8080 > 192.0.2.1.58149: Flags [S.], seq 3982794529, ack 1, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 8], length 0 08:10:39.406296 IP 192.0.2.1.58149 > 192.168.79.31.8080: Flags [.], ack 1, win 0, length 0 08:10:39.716004 IP 192.168.79.31.8080 > 192.0.2.1.58149: Flags [.], ack 1, win 115, length 0 08:10:40.327133 IP 192.168.79.31.8080 > 192.0.2.1.58149: Flags [.], ack 1, win 115, length 0 08:10:41.540243 IP 192.168.79.31.8080 > 192.0.2.1.58149: Flags [.], ack 1, win 115, length 0 You can see we got only 3 probes, not 4.
On Mon, 2017-09-11 at 08:13 -0700, Eric Dumazet wrote:
> You can see we got only 3 probes, not 4.
Here is complete packetdrill test showing that code behaves as expected.
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 0 <mss 1460,sackOK,nop,nop,nop,wscale 7>
+0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
// Client advertises a zero receive window, so we can't send.
+.1 < . 1:1(0) ack 1 win 0
+0 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0
+0 write(4, ..., 2920) = 2920
// Window probes are scheduled just like RTOs.
+.3~+.31 > . 0:0(0) ack 1
+.6~+.62 > . 0:0(0) ack 1
+1.2~+1.24 > . 0:0(0) ack 1
// Peer opens its window too late !
+3 < . 1:1(0) ack 1 win 1000
+0 > R 1:1(0)
Hi, In the scenario, tcp server side IP changed, and at that memont, userspace application still send data continuously; tcp_send_head(sk)'s timestamp always be refreshed. Here is the packetdrill script: 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +0 < S 0:0(0) win 0 <mss 1460,sackOK,nop,nop,nop,wscale 7> +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7> +.1 < . 1:1(0) ack 1 win 65530 +0 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0 +0 write(4, ..., 24) = 24 +0 > P. 1:25(24) ack 1 win 229 +.1 < . 1:1(0) ack 25 win 65530 //change the ipaddress +1 `ifconfig tun0 192.168.0.10/16` +1 write(4, ..., 24) = 24 +1 write(4, ..., 24) = 24 +1 write(4, ..., 24) = 24 +1 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +0 `ifconfig tun0 192.168.0.1/16` +0 < . 1:1(0) ack 1 win 1000 +0 write(4, ..., 24) = -1 [root@localhost ~]# time ./gtests/net/packetdrill/packetdrill test.pkt test.pkt:50: runtime error in write call: Expected result -1 but got 24 with errno 2 (No such file or directory) real 1m11.364s user 0m0.028s sys 0m0.106s [root@localhost ~]# netstat -toen Active Internet connections (w/o servers) Proto Recv-Q Send-Q Local Address Foreign Address State User Inode Timer tcp 0 504 192.168.0.1:8080 192.0.2.1:33993 ESTABLISHED 0 45453 probe (22.38/0/7) since the script didn't wait for enough time, here only got 7 probes. 在 2017/9/11 23:22, Eric Dumazet 写道: > On Mon, 2017-09-11 at 08:13 -0700, Eric Dumazet wrote: > >> You can see we got only 3 probes, not 4. > > Here is complete packetdrill test showing that code behaves as expected. > > 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 > +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 > +0 bind(3, ..., ...) = 0 > +0 listen(3, 1) = 0 > > +0 < S 0:0(0) win 0 <mss 1460,sackOK,nop,nop,nop,wscale 7> > +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8> > > // Client advertises a zero receive window, so we can't send. > +.1 < . 1:1(0) ack 1 win 0 > +0 accept(3, ..., ...) = 4 > > +0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0 > +0 write(4, ..., 2920) = 2920 > > // Window probes are scheduled just like RTOs. > +.3~+.31 > . 0:0(0) ack 1 > +.6~+.62 > . 0:0(0) ack 1 > +1.2~+1.24 > . 0:0(0) ack 1 > > // Peer opens its window too late ! > +3 < . 1:1(0) ack 1 win 1000 > +0 > R 1:1(0) > > > > . >
On Tue, 2017-09-12 at 14:08 +0800, liujian wrote: > Hi, > > In the scenario, tcp server side IP changed, and at that memont, > userspace application still send data continuously; > tcp_send_head(sk)'s timestamp always be refreshed. > > Here is the packetdrill script: > > 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 > +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 > +0 bind(3, ..., ...) = 0 > +0 listen(3, 1) = 0 > > +0 < S 0:0(0) win 0 <mss 1460,sackOK,nop,nop,nop,wscale 7> > +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7> > > +.1 < . 1:1(0) ack 1 win 65530 > +0 accept(3, ..., ...) = 4 > > +0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0 > +0 write(4, ..., 24) = 24 > +0 > P. 1:25(24) ack 1 win 229 > +.1 < . 1:1(0) ack 25 win 65530 > > //change the ipaddress > +1 `ifconfig tun0 192.168.0.10/16` > > +1 write(4, ..., 24) = 24 > +1 write(4, ..., 24) = 24 > +1 write(4, ..., 24) = 24 > +1 write(4, ..., 24) = 24 > +3 write(4, ..., 24) = 24 > +3 write(4, ..., 24) = 24 > +3 write(4, ..., 24) = 24 > +3 write(4, ..., 24) = 24 > +3 write(4, ..., 24) = 24 > +3 write(4, ..., 24) = 24 > +3 write(4, ..., 24) = 24 > +3 write(4, ..., 24) = 24 > +3 write(4, ..., 24) = 24 > +3 write(4, ..., 24) = 24 > +3 write(4, ..., 24) = 24 > +3 write(4, ..., 24) = 24 > +3 write(4, ..., 24) = 24 > +3 write(4, ..., 24) = 24 > +3 write(4, ..., 24) = 24 > +3 write(4, ..., 24) = 24 > +3 write(4, ..., 24) = 24 > +3 write(4, ..., 24) = 24 > +3 write(4, ..., 24) = 24 > +3 write(4, ..., 24) = 24 > +3 write(4, ..., 24) = 24 > +3 write(4, ..., 24) = 24 > > +0 `ifconfig tun0 192.168.0.1/16` > +0 < . 1:1(0) ack 1 win 1000 > +0 write(4, ..., 24) = -1 > > This has nothing to do with the code patch you have changed. How have you tested your patch exactly ? > [root@localhost ~]# time ./gtests/net/packetdrill/packetdrill test.pkt > test.pkt:50: runtime error in write call: Expected result -1 but got 24 with errno 2 (No such file or directory) > > real 1m11.364s > user 0m0.028s > sys 0m0.106s > > [root@localhost ~]# netstat -toen > Active Internet connections (w/o servers) > Proto Recv-Q Send-Q Local Address Foreign Address State User Inode Timer > tcp 0 504 192.168.0.1:8080 192.0.2.1:33993 ESTABLISHED 0 45453 probe (22.38/0/7) > > since the script didn't wait for enough time, here only got 7 probes. > > 在 2017/9/11 23:22, Eric Dumazet 写道: > > On Mon, 2017-09-11 at 08:13 -0700, Eric Dumazet wrote: > > > >> You can see we got only 3 probes, not 4. > > > > Here is complete packetdrill test showing that code behaves as expected. > > > > 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 > > +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 > > +0 bind(3, ..., ...) = 0 > > +0 listen(3, 1) = 0 > > > > +0 < S 0:0(0) win 0 <mss 1460,sackOK,nop,nop,nop,wscale 7> > > +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8> > > > > // Client advertises a zero receive window, so we can't send. > > +.1 < . 1:1(0) ack 1 win 0 > > +0 accept(3, ..., ...) = 4 > > > > +0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0 > > +0 write(4, ..., 2920) = 2920 > > > > // Window probes are scheduled just like RTOs. > > +.3~+.31 > . 0:0(0) ack 1 > > +.6~+.62 > . 0:0(0) ack 1 > > +1.2~+1.24 > . 0:0(0) ack 1 > > > > // Peer opens its window too late ! > > +3 < . 1:1(0) ack 1 win 1000 > > +0 > R 1:1(0) > > > > > > > > . > > >
On Tue, 2017-09-12 at 08:05 -0700, Eric Dumazet wrote: > On Tue, 2017-09-12 at 14:08 +0800, liujian wrote: > > Hi, > > > > In the scenario, tcp server side IP changed, and at that memont, > > userspace application still send data continuously; > > tcp_send_head(sk)'s timestamp always be refreshed. > > > > Here is the packetdrill script: > > > > 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 > > +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 > > +0 bind(3, ..., ...) = 0 > > +0 listen(3, 1) = 0 > > > > +0 < S 0:0(0) win 0 <mss 1460,sackOK,nop,nop,nop,wscale 7> > > +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7> > > > > +.1 < . 1:1(0) ack 1 win 65530 > > +0 accept(3, ..., ...) = 4 > > > > +0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0 > > +0 write(4, ..., 24) = 24 > > +0 > P. 1:25(24) ack 1 win 229 > > +.1 < . 1:1(0) ack 25 win 65530 > > > > //change the ipaddress > > +1 `ifconfig tun0 192.168.0.10/16` > > > > +1 write(4, ..., 24) = 24 > > +1 write(4, ..., 24) = 24 > > +1 write(4, ..., 24) = 24 > > +1 write(4, ..., 24) = 24 > > +3 write(4, ..., 24) = 24 > > +3 write(4, ..., 24) = 24 > > +3 write(4, ..., 24) = 24 > > +3 write(4, ..., 24) = 24 > > +3 write(4, ..., 24) = 24 > > +3 write(4, ..., 24) = 24 > > +3 write(4, ..., 24) = 24 > > +3 write(4, ..., 24) = 24 > > +3 write(4, ..., 24) = 24 > > +3 write(4, ..., 24) = 24 > > +3 write(4, ..., 24) = 24 > > +3 write(4, ..., 24) = 24 > > +3 write(4, ..., 24) = 24 > > +3 write(4, ..., 24) = 24 > > +3 write(4, ..., 24) = 24 > > +3 write(4, ..., 24) = 24 > > +3 write(4, ..., 24) = 24 > > +3 write(4, ..., 24) = 24 > > +3 write(4, ..., 24) = 24 > > +3 write(4, ..., 24) = 24 > > +3 write(4, ..., 24) = 24 > > +3 write(4, ..., 24) = 24 > > > > +0 `ifconfig tun0 192.168.0.1/16` > > +0 < . 1:1(0) ack 1 win 1000 > > +0 write(4, ..., 24) = -1 > > > > > > This has nothing to do with the code patch you have changed. > > How have you tested your patch exactly ? > lpaa23:~# ss -toenmi src :8080 State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 144 192.168.134.161:8080 192.0.2.1:51165 timer:(persist,8.262ms,5) ino:1 82083 sk:3 <-> skmem:(r0,rb359040,t0,tb46080,f1792,w2304,o0,bl0,d0) sack cubic wscale:7,8 rto:301 backoff:5 rtt:100.127/37.576 mss:1460 rcvmss:536 advmss:1460 cwnd:10 bytes_acked:24 segs_out:12 segs_in:3 data_segs_out:12 send 1.2Mbps lastsnd:1370 l astrcv:13348 lastack:13248 pacing_rate 2.3Mbps delivery_rate 116.7Kbps app_limited busy:11346ms rcv_space:29200 notsent:1 44 minrtt:100.043 This is the typical RTO timer, not zero window probe.
在 2017/9/12 23:38, Eric Dumazet 写道: > On Tue, 2017-09-12 at 08:05 -0700, Eric Dumazet wrote: >> On Tue, 2017-09-12 at 14:08 +0800, liujian wrote: >>> Hi, >>> >>> In the scenario, tcp server side IP changed, and at that memont, >>> userspace application still send data continuously; >>> tcp_send_head(sk)'s timestamp always be refreshed. >>> >>> Here is the packetdrill script: >>> >>> 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 >>> +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 >>> +0 bind(3, ..., ...) = 0 >>> +0 listen(3, 1) = 0 >>> >>> +0 < S 0:0(0) win 0 <mss 1460,sackOK,nop,nop,nop,wscale 7> >>> +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7> >>> >>> +.1 < . 1:1(0) ack 1 win 65530 >>> +0 accept(3, ..., ...) = 4 >>> >>> +0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0 >>> +0 write(4, ..., 24) = 24 >>> +0 > P. 1:25(24) ack 1 win 229 >>> +.1 < . 1:1(0) ack 25 win 65530 >>> >>> //change the ipaddress >>> +1 `ifconfig tun0 192.168.0.10/16` >>> >>> +1 write(4, ..., 24) = 24 >>> +1 write(4, ..., 24) = 24 >>> +1 write(4, ..., 24) = 24 >>> +1 write(4, ..., 24) = 24 >>> +3 write(4, ..., 24) = 24 >>> +3 write(4, ..., 24) = 24 >>> +3 write(4, ..., 24) = 24 >>> +3 write(4, ..., 24) = 24 >>> +3 write(4, ..., 24) = 24 >>> +3 write(4, ..., 24) = 24 >>> +3 write(4, ..., 24) = 24 >>> +3 write(4, ..., 24) = 24 >>> +3 write(4, ..., 24) = 24 >>> +3 write(4, ..., 24) = 24 >>> +3 write(4, ..., 24) = 24 >>> +3 write(4, ..., 24) = 24 >>> +3 write(4, ..., 24) = 24 >>> +3 write(4, ..., 24) = 24 >>> +3 write(4, ..., 24) = 24 >>> +3 write(4, ..., 24) = 24 >>> +3 write(4, ..., 24) = 24 >>> +3 write(4, ..., 24) = 24 >>> +3 write(4, ..., 24) = 24 >>> +3 write(4, ..., 24) = 24 >>> +3 write(4, ..., 24) = 24 >>> +3 write(4, ..., 24) = 24 >>> >>> +0 `ifconfig tun0 192.168.0.1/16` >>> +0 < . 1:1(0) ack 1 win 1000 >>> +0 write(4, ..., 24) = -1 >>> >>> >> >> This has nothing to do with the code patch you have changed. >> >> How have you tested your patch exactly ? >> I tested the patch, it can work. [root@localhost ~]# time ./gtests/net/packetdrill/packetdrill test.pkt test.pkt:24: runtime error in write call: Expected result 24 but got -1 with errno 110 (Connection timed out) real 0m5.356s user 0m0.026s sys 0m0.104s [root@localhost ~]# ss -toenmi src :8080 State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 48 192.168.0.1:8080 192.0.2.1:39559 timer:(persist,186ms,2) ino:37178 sk:6 <-> skmem:(r0,rb369280,t0,tb87040,f1792,w2304,o0,bl0) sack cubic wscale:7,7 rto:301 backoff:2 rtt:100.253/37.643 mss:1460 cwnd:10 bytes_acked:24 segs_out:5 segs_in:3 send 1.2Mbps lastsnd:976 lastrcv:4082 lastack:3982 pacing_rate 2.3Mbps rcv_space:29200 if change the TCP_USER_TIMEOUT to 30s, test result as below: [root@localhost ~]# time ./gtests/net/packetdrill/packetdrill test.pkt test.pkt:37: runtime error in write call: Expected result 24 but got -1 with errno 110 (Connection timed out) real 0m44.362s user 0m0.018s sys 0m0.110s ESTAB 0 360 192.168.0.1:8080 192.0.2.1:47577 timer:(persist,516ms,6) ino:18806 sk:5 <-> skmem:(r0,rb369280,t0,tb87040,f1792,w2304,o0,bl0) sack cubic wscale:7,7 rto:301 backoff:6 rtt:100.228/37.623 mss:10 bytes_acked:24 segs_out:22 segs_in:3 send 1.2Mbps lastsnd:2343 lastrcv:40450 lastack:40350 pacing_rate 2.3Mbps rcv_sp > > lpaa23:~# ss -toenmi src :8080 > State Recv-Q Send-Q Local Address:Port Peer > Address:Port > ESTAB 0 144 192.168.134.161:8080 > 192.0.2.1:51165 timer:(persist,8.262ms,5) ino:1 > 82083 sk:3 <-> > skmem:(r0,rb359040,t0,tb46080,f1792,w2304,o0,bl0,d0) sack cubic > wscale:7,8 rto:301 backoff:5 rtt:100.127/37.576 > mss:1460 rcvmss:536 advmss:1460 cwnd:10 bytes_acked:24 segs_out:12 > segs_in:3 data_segs_out:12 send 1.2Mbps lastsnd:1370 l > astrcv:13348 lastack:13248 pacing_rate 2.3Mbps delivery_rate 116.7Kbps > app_limited busy:11346ms rcv_space:29200 notsent:1 > 44 minrtt:100.043userspace application still send data continuously > > This is the typical RTO timer, not zero window probe. > with the script, it is not zero window; but the code enter tcp_probe_timer. ->tcp_sendmsg -->tcp_push ---->__tcp_push_pending_frames ------>tcp_write_xmit -------->tcp_transmit_skb (at this function, tcp_send_head(sk)'s timestamp was be refreshed) ---------->ip_queue_xmit (can not find route, return -EHOSTUNREACH) -------->tcp_transmit_skb (send fail, do not advance send head) ------->tcp_check_probe_timer (reset ICSK_TIME_PROBE0 timer) Here userspace application still send data continuously, it triggered ICSK_TIME_PROBE0 timer and tcp_send_head(sk)'s timestamp refresh.
在 2017/9/13 14:56, liujian 写道: > > > 在 2017/9/12 23:38, Eric Dumazet 写道: >> On Tue, 2017-09-12 at 08:05 -0700, Eric Dumazet wrote: >>> On Tue, 2017-09-12 at 14:08 +0800, liujian wrote: >>>> Hi, >>>> >>>> In the scenario, tcp server side IP changed, and at that memont, >>>> userspace application still send data continuously; >>>> tcp_send_head(sk)'s timestamp always be refreshed. >>>> >>>> Here is the packetdrill script: >>>> >>>> 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 >>>> +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 >>>> +0 bind(3, ..., ...) = 0 >>>> +0 listen(3, 1) = 0 >>>> >>>> +0 < S 0:0(0) win 0 <mss 1460,sackOK,nop,nop,nop,wscale 7> >>>> +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7> >>>> >>>> +.1 < . 1:1(0) ack 1 win 65530 >>>> +0 accept(3, ..., ...) = 4 >>>> >>>> +0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0 >>>> +0 write(4, ..., 24) = 24 >>>> +0 > P. 1:25(24) ack 1 win 229 >>>> +.1 < . 1:1(0) ack 25 win 65530 >>>> >>>> //change the ipaddress >>>> +1 `ifconfig tun0 192.168.0.10/16` >>>> >>>> +1 write(4, ..., 24) = 24 >>>> +1 write(4, ..., 24) = 24 >>>> +1 write(4, ..., 24) = 24 >>>> +1 write(4, ..., 24) = 24 >>>> +3 write(4, ..., 24) = 24 >>>> +3 write(4, ..., 24) = 24 >>>> +3 write(4, ..., 24) = 24 >>>> +3 write(4, ..., 24) = 24 >>>> +3 write(4, ..., 24) = 24 >>>> +3 write(4, ..., 24) = 24 >>>> +3 write(4, ..., 24) = 24 >>>> +3 write(4, ..., 24) = 24 >>>> +3 write(4, ..., 24) = 24 >>>> +3 write(4, ..., 24) = 24 >>>> +3 write(4, ..., 24) = 24 >>>> +3 write(4, ..., 24) = 24 >>>> +3 write(4, ..., 24) = 24 >>>> +3 write(4, ..., 24) = 24 >>>> +3 write(4, ..., 24) = 24 >>>> +3 write(4, ..., 24) = 24 >>>> +3 write(4, ..., 24) = 24 >>>> +3 write(4, ..., 24) = 24 >>>> +3 write(4, ..., 24) = 24 >>>> +3 write(4, ..., 24) = 24 >>>> +3 write(4, ..., 24) = 24 >>>> +3 write(4, ..., 24) = 24 >>>> >>>> +0 `ifconfig tun0 192.168.0.1/16` >>>> +0 < . 1:1(0) ack 1 win 1000 >>>> +0 write(4, ..., 24) = -1 >>>> >>>> >>> >>> This has nothing to do with the code patch you have changed. >>> >>> How have you tested your patch exactly ? >>> > I tested the patch, it can work. > > [root@localhost ~]# time ./gtests/net/packetdrill/packetdrill test.pkt > test.pkt:24: runtime error in write call: Expected result 24 but got -1 with errno 110 (Connection timed out) > > real 0m5.356s > user 0m0.026s > sys 0m0.104s > > [root@localhost ~]# ss -toenmi src :8080 > State Recv-Q Send-Q Local Address:Port Peer Address:Port > ESTAB 0 48 192.168.0.1:8080 192.0.2.1:39559 timer:(persist,186ms,2) ino:37178 sk:6 <-> > skmem:(r0,rb369280,t0,tb87040,f1792,w2304,o0,bl0) sack cubic wscale:7,7 rto:301 backoff:2 rtt:100.253/37.643 mss:1460 cwnd:10 bytes_acked:24 segs_out:5 segs_in:3 send 1.2Mbps lastsnd:976 lastrcv:4082 lastack:3982 pacing_rate 2.3Mbps rcv_space:29200 > > > if change the TCP_USER_TIMEOUT to 30s, test result as below: > > [root@localhost ~]# time ./gtests/net/packetdrill/packetdrill test.pkt > test.pkt:37: runtime error in write call: Expected result 24 but got -1 with errno 110 (Connection timed out) > > real 0m44.362s > user 0m0.018s > sys 0m0.110s > > ESTAB 0 360 192.168.0.1:8080 192.0.2.1:47577 timer:(persist,516ms,6) ino:18806 sk:5 <-> > skmem:(r0,rb369280,t0,tb87040,f1792,w2304,o0,bl0) sack cubic wscale:7,7 rto:301 backoff:6 rtt:100.228/37.623 mss:10 bytes_acked:24 segs_out:22 segs_in:3 send 1.2Mbps lastsnd:2343 lastrcv:40450 lastack:40350 pacing_rate 2.3Mbps rcv_sp > >> >> lpaa23:~# ss -toenmi src :8080 >> State Recv-Q Send-Q Local Address:Port Peer >> Address:Port >> ESTAB 0 144 192.168.134.161:8080 >> 192.0.2.1:51165 timer:(persist,8.262ms,5) ino:1 >> 82083 sk:3 <-> >> skmem:(r0,rb359040,t0,tb46080,f1792,w2304,o0,bl0,d0) sack cubic >> wscale:7,8 rto:301 backoff:5 rtt:100.127/37.576 >> mss:1460 rcvmss:536 advmss:1460 cwnd:10 bytes_acked:24 segs_out:12 >> segs_in:3 data_segs_out:12 send 1.2Mbps lastsnd:1370 l >> astrcv:13348 lastack:13248 pacing_rate 2.3Mbps delivery_rate 116.7Kbps >> app_limited busy:11346ms rcv_space:29200 notsent:1 >> 44 minrtt:100.043userspace application still send data continuously >> >> This is the typical RTO timer, not zero window probe. >> > with the script, it is not zero window; but the code enter tcp_probe_timer. > ->tcp_sendmsg > -->tcp_push > ---->__tcp_push_pending_frames > ------>tcp_write_xmit > -------->tcp_transmit_skb (at this function, tcp_send_head(sk)'s timestamp was be refreshed) > ---------->ip_queue_xmit (can not find route, return -EHOSTUNREACH) > -------->tcp_transmit_skb (send fail, do not advance send head) > ------->tcp_check_probe_timer (reset ICSK_TIME_PROBE0 timer) sorry, change to ->tcp_sendmsg -->tcp_push ---->__tcp_push_pending_frames ------>tcp_write_xmit -------->tcp_transmit_skb (at this function, tcp_send_head(sk)'s timestamp was be refreshed) ---------->ip_queue_xmit (can not find route, return -EHOSTUNREACH) ------>tcp_write_xmit (send fail, do not advance send head) ------>tcp_check_probe_timer (reset ICSK_TIME_PROBE0 timer) > Here userspace application still send data continuously, > it triggered ICSK_TIME_PROBE0 timer and tcp_send_head(sk)'s timestamp refresh. > >
On Wed, 2017-09-13 at 15:15 +0800, liujian wrote: > > 在 2017/9/13 14:56, liujian 写道: > > > > > > 在 2017/9/12 23:38, Eric Dumazet 写道: > >> On Tue, 2017-09-12 at 08:05 -0700, Eric Dumazet wrote: > >>> On Tue, 2017-09-12 at 14:08 +0800, liujian wrote: > >>>> Hi, > >>>> > >>>> In the scenario, tcp server side IP changed, and at that memont, > >>>> userspace application still send data continuously; > >>>> tcp_send_head(sk)'s timestamp always be refreshed. > >>>> > >>>> Here is the packetdrill script: > >>>> > >>>> 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 > >>>> +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 > >>>> +0 bind(3, ..., ...) = 0 > >>>> +0 listen(3, 1) = 0 > >>>> > >>>> +0 < S 0:0(0) win 0 <mss 1460,sackOK,nop,nop,nop,wscale 7> > >>>> +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7> > >>>> > >>>> +.1 < . 1:1(0) ack 1 win 65530 > >>>> +0 accept(3, ..., ...) = 4 > >>>> > >>>> +0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0 > >>>> +0 write(4, ..., 24) = 24 > >>>> +0 > P. 1:25(24) ack 1 win 229 > >>>> +.1 < . 1:1(0) ack 25 win 65530 > >>>> > >>>> //change the ipaddress > >>>> +1 `ifconfig tun0 192.168.0.10/16` > >>>> > >>>> +1 write(4, ..., 24) = 24 > >>>> +1 write(4, ..., 24) = 24 > >>>> +1 write(4, ..., 24) = 24 > >>>> +1 write(4, ..., 24) = 24 > >>>> +3 write(4, ..., 24) = 24 > >>>> +3 write(4, ..., 24) = 24 > >>>> +3 write(4, ..., 24) = 24 > >>>> +3 write(4, ..., 24) = 24 > >>>> +3 write(4, ..., 24) = 24 > >>>> +3 write(4, ..., 24) = 24 > >>>> +3 write(4, ..., 24) = 24 > >>>> +3 write(4, ..., 24) = 24 > >>>> +3 write(4, ..., 24) = 24 > >>>> +3 write(4, ..., 24) = 24 > >>>> +3 write(4, ..., 24) = 24 > >>>> +3 write(4, ..., 24) = 24 > >>>> +3 write(4, ..., 24) = 24 > >>>> +3 write(4, ..., 24) = 24 > >>>> +3 write(4, ..., 24) = 24 > >>>> +3 write(4, ..., 24) = 24 > >>>> +3 write(4, ..., 24) = 24 > >>>> +3 write(4, ..., 24) = 24 > >>>> +3 write(4, ..., 24) = 24 > >>>> +3 write(4, ..., 24) = 24 > >>>> +3 write(4, ..., 24) = 24 > >>>> +3 write(4, ..., 24) = 24 > >>>> > >>>> +0 `ifconfig tun0 192.168.0.1/16` > >>>> +0 < . 1:1(0) ack 1 win 1000 > >>>> +0 write(4, ..., 24) = -1 > >>>> > >>>> > >>> > >>> This has nothing to do with the code patch you have changed. > >>> > >>> How have you tested your patch exactly ? > >>> > > I tested the patch, it can work. > > > > [root@localhost ~]# time ./gtests/net/packetdrill/packetdrill test.pkt > > test.pkt:24: runtime error in write call: Expected result 24 but got -1 with errno 110 (Connection timed out) > > > > real 0m5.356s > > user 0m0.026s > > sys 0m0.104s > > > > [root@localhost ~]# ss -toenmi src :8080 > > State Recv-Q Send-Q Local Address:Port Peer Address:Port > > ESTAB 0 48 192.168.0.1:8080 192.0.2.1:39559 timer:(persist,186ms,2) ino:37178 sk:6 <-> > > skmem:(r0,rb369280,t0,tb87040,f1792,w2304,o0,bl0) sack cubic wscale:7,7 rto:301 backoff:2 rtt:100.253/37.643 mss:1460 cwnd:10 bytes_acked:24 segs_out:5 segs_in:3 send 1.2Mbps lastsnd:976 lastrcv:4082 lastack:3982 pacing_rate 2.3Mbps rcv_space:29200 > > > > > > if change the TCP_USER_TIMEOUT to 30s, test result as below: > > > > [root@localhost ~]# time ./gtests/net/packetdrill/packetdrill test.pkt > > test.pkt:37: runtime error in write call: Expected result 24 but got -1 with errno 110 (Connection timed out) > > > > real 0m44.362s > > user 0m0.018s > > sys 0m0.110s > > > > ESTAB 0 360 192.168.0.1:8080 192.0.2.1:47577 timer:(persist,516ms,6) ino:18806 sk:5 <-> > > skmem:(r0,rb369280,t0,tb87040,f1792,w2304,o0,bl0) sack cubic wscale:7,7 rto:301 backoff:6 rtt:100.228/37.623 mss:10 bytes_acked:24 segs_out:22 segs_in:3 send 1.2Mbps lastsnd:2343 lastrcv:40450 lastack:40350 pacing_rate 2.3Mbps rcv_sp > > > >> > >> lpaa23:~# ss -toenmi src :8080 > >> State Recv-Q Send-Q Local Address:Port Peer > >> Address:Port > >> ESTAB 0 144 192.168.134.161:8080 > >> 192.0.2.1:51165 timer:(persist,8.262ms,5) ino:1 > >> 82083 sk:3 <-> > >> skmem:(r0,rb359040,t0,tb46080,f1792,w2304,o0,bl0,d0) sack cubic > >> wscale:7,8 rto:301 backoff:5 rtt:100.127/37.576 > >> mss:1460 rcvmss:536 advmss:1460 cwnd:10 bytes_acked:24 segs_out:12 > >> segs_in:3 data_segs_out:12 send 1.2Mbps lastsnd:1370 l > >> astrcv:13348 lastack:13248 pacing_rate 2.3Mbps delivery_rate 116.7Kbps > >> app_limited busy:11346ms rcv_space:29200 notsent:1 > >> 44 minrtt:100.043userspace application still send data continuously > >> > >> This is the typical RTO timer, not zero window probe. > >> > > with the script, it is not zero window; but the code enter tcp_probe_timer. > > ->tcp_sendmsg > > -->tcp_push > > ---->__tcp_push_pending_frames > > ------>tcp_write_xmit > > -------->tcp_transmit_skb (at this function, tcp_send_head(sk)'s timestamp was be refreshed) > > ---------->ip_queue_xmit (can not find route, return -EHOSTUNREACH) > > -------->tcp_transmit_skb (send fail, do not advance send head) > > ------->tcp_check_probe_timer (reset ICSK_TIME_PROBE0 timer) > sorry, change to > ->tcp_sendmsg > -->tcp_push > ---->__tcp_push_pending_frames > ------>tcp_write_xmit > -------->tcp_transmit_skb (at this function, tcp_send_head(sk)'s timestamp was be refreshed) > ---------->ip_queue_xmit (can not find route, return -EHOSTUNREACH) > ------>tcp_write_xmit (send fail, do not advance send head) > ------>tcp_check_probe_timer (reset ICSK_TIME_PROBE0 timer) OK but : 1) keepalive_time_elapsed() returns jiffies, and you compare it to a ms value. It will break on kernels with HZ=100 2) keepalive_time_elapsed() is reset every time a packet is received. With following packetdrill your patch (even if we fix 1)) breaks again # ./packetdrill --local_ip 192.168.102.64 user_timeout.pkt # cat user_timeout.pkt 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +0 `ifconfig tun0 192.168.102.64/16; ip ro add 192.0.2.1 dev tun0` +0 < S 0:0(0) win 0 <mss 1460,sackOK,nop,nop,nop,wscale 7> +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8> +.1 < . 1:1(0) ack 1 win 65530 +0 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_USER_TIMEOUT, [3000], 4) = 0 +0 write(4, ..., 24) = 24 +0 > P. 1:25(24) ack 1 win 115 +.1 < . 1:1(0) ack 25 win 65530 //change the ipaddress +1 `ifconfig tun0 192.168.0.10/16` +1 write(4, ..., 24) = 24 +1 write(4, ..., 24) = 24 +1 write(4, ..., 24) = 24 +1 write(4, ..., 24) = 24 +0 `ifconfig tun0 192.168.102.64/16` +0 < . 1:2(1) ack 25 win 65530 +0 `ifconfig tun0 192.168.0.10/16` +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +3 write(4, ..., 24) = 24 +0 `ifconfig tun0 192.168.0.1/16` +0 < . 1:1(0) ack 1 win 1000 +0 write(4, ..., 24) = -1 I would instead try to keep a meaningful value in tcp_send_head(sk)->skb_mstamp if the transmit failed. Or more exactly change skb->skb_mstamp only on (successfully) transmitted skb. I will provide a patch ( fixing net/ipv4/tcp_output.c )
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index 655dd8d..2a28a03 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -325,7 +325,7 @@ static void tcp_probe_timer(struct sock *sk) if (!start_ts) tcp_send_head(sk)->skb_mstamp = tp->tcp_mstamp; else if (icsk->icsk_user_timeout && - (s32)(tcp_time_stamp(tp) - start_ts) > + keepalive_time_elapsed(tp) >= jiffies_to_msecs(icsk->icsk_user_timeout)) goto abort;