From patchwork Thu Dec 7 21:41:34 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yousuk Seung X-Patchwork-Id: 845865 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.b="bNbRTDWa"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3yt88C42s0z9s81 for ; Fri, 8 Dec 2017 08:42:03 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751002AbdLGVmA (ORCPT ); Thu, 7 Dec 2017 16:42:00 -0500 Received: from mail-it0-f67.google.com ([209.85.214.67]:46513 "EHLO mail-it0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750895AbdLGVl6 (ORCPT ); Thu, 7 Dec 2017 16:41:58 -0500 Received: by mail-it0-f67.google.com with SMTP id t1so473266ite.5 for ; Thu, 07 Dec 2017 13:41:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=from:to:cc:subject:date:message-id; bh=eXUSkFt6JAqBUAfNc1qUMfgHqJcVW8JdaJv+Xb2KhBs=; b=bNbRTDWavujio2b0QgUzpS15qIGY2awfpICAA9mi0NhTiyBc+QAjNgGSOafJypMsRU Mtah3J5n60//av3n76qamF2rwz643w3EOMO5++R1LrLKb24Hpz50qSg36A04lnkyINPY 2Ycvhoynmz4zKlNDPr7jIh0OdAR2qy7awDF/eiC9ZmcylfM1WE57aROiCqa879wCV9Bd AAIPr+T4K1IZ5g7AlefCXNJvrsi8DwdeSzlXKYiyLmNndlM0E+5IK4+bTtsHXgzN8S81 7m/eALbe76Cxvv8i1UpFvsd/EVZiNYCBywoI5k+Uw+DMta3eet8U7V2N/CJZH7gPPKXs VuuQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id; bh=eXUSkFt6JAqBUAfNc1qUMfgHqJcVW8JdaJv+Xb2KhBs=; b=NVGzPmav5BEEJchDcxEiHm0zLkURQ3ZQ9sLtT0BEyW+6aQJh1zSUQJpB5KTJW2P1VY /A0AoUBZkxhz6HR+CPJbcVgd/swcYaVGHJujKCTltouWm6+O/5X6Irv2kDqXGmP6M0iJ FWSHvfRvQsZ6AL1D9QQZ7fFrfABatZZbltHPgK4UPVmDj+uH1fSemge1iAc4C6vQM/nM cmrffFfIVcC9UHryAZXKdSyaFr0VetBQEWzh8KbEsPLcnRwJNR8+2JlYu/NTpgSSiR8e L1K3Y0nqs+1FvDsIqZ23DJRrdartARx1pPWm0dAsJiG33GWZEINnE0ctIzkuWiMliSHs NduQ== X-Gm-Message-State: AKGB3mIqnwVRIHw9h5vilz170URH/hhs1+PGrhWIn8MG1IMQgC3nS0Xi a+dQzcJM2p9pc6dgBH8k33ocAQ== X-Google-Smtp-Source: ACJfBovVKiVcQ0r4it0TLPhH1rJ3QJxYyG3Tp1JoGHFEEQwTGU9vA114oJCWeIMX8MUJ+6UT1medKg== X-Received: by 10.107.186.193 with SMTP id k184mr868225iof.242.1512682917778; Thu, 07 Dec 2017 13:41:57 -0800 (PST) Received: from ysseung.svl.corp.google.com ([100.116.92.22]) by smtp.gmail.com with ESMTPSA id r17sm2846616ioe.88.2017.12.07.13.41.56 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Thu, 07 Dec 2017 13:41:56 -0800 (PST) From: Yousuk Seung To: David Miller Cc: netdev@vger.kernel.org, Yousuk Seung , Neal Cardwell , Yuchung Cheng Subject: [PATCH net] tcp: invalidate rate samples during SACK reneging Date: Thu, 7 Dec 2017 13:41:34 -0800 Message-Id: <20171207214134.90015-1-ysseung@google.com> X-Mailer: git-send-email 2.15.1.424.g9478a66081-goog Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Mark tcp_sock during a SACK reneging event and invalidate rate samples while marked. Such rate samples may overestimate bw by including packets that were SACKed before reneging. < ack 6001 win 10000 sack 7001:38001 < ack 7001 win 0 sack 8001:38001 // Reneg detected > seq 7001:8001 // RTO, SACK cleared. < ack 38001 win 10000 In above example the rate sample taken after the last ack will count 7001-38001 as delivered while the actual delivery rate likely could be much lower i.e. 7001-8001. This patch adds a new field tcp_sock.sack_reneg and marks it when we declare SACK reneging and entering TCP_CA_Loss, and unmarks it after the last rate sample was taken before moving back to TCP_CA_Open. This patch also invalidates rate samples taken while tcp_sock.is_sack_reneg is set. Fixes: b9f64820fb22 ("tcp: track data delivery rate for a TCP connection") Signed-off-by: Yousuk Seung Signed-off-by: Neal Cardwell Signed-off-by: Yuchung Cheng Acked-by: Soheil Hassas Yeganeh Acked-by: Eric Dumazet Acked-by: Priyaranjan Jha --- include/linux/tcp.h | 3 ++- include/net/tcp.h | 2 +- net/ipv4/tcp.c | 1 + net/ipv4/tcp_input.c | 10 ++++++++-- net/ipv4/tcp_rate.c | 10 +++++++--- 5 files changed, 19 insertions(+), 7 deletions(-) diff --git a/include/linux/tcp.h b/include/linux/tcp.h index df5d97a85e1a..ca4a6361389b 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -224,7 +224,8 @@ struct tcp_sock { rate_app_limited:1, /* rate_{delivered,interval_us} limited? */ fastopen_connect:1, /* FASTOPEN_CONNECT sockopt */ fastopen_no_cookie:1, /* Allow send/recv SYN+data without a cookie */ - unused:3; + is_sack_reneg:1, /* in recovery from loss with SACK reneg? */ + unused:2; u8 nonagle : 4,/* Disable Nagle algorithm? */ thin_lto : 1,/* Use linear timeouts for thin streams */ unused1 : 1, diff --git a/include/net/tcp.h b/include/net/tcp.h index 6998707e81f3..6da880d2f022 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1055,7 +1055,7 @@ void tcp_rate_skb_sent(struct sock *sk, struct sk_buff *skb); void tcp_rate_skb_delivered(struct sock *sk, struct sk_buff *skb, struct rate_sample *rs); void tcp_rate_gen(struct sock *sk, u32 delivered, u32 lost, - struct rate_sample *rs); + bool is_sack_reneg, struct rate_sample *rs); void tcp_rate_check_app_limited(struct sock *sk); /* These functions determine how the current flow behaves in respect of SACK diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index bf97317e6c97..f08eebe60446 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2412,6 +2412,7 @@ int tcp_disconnect(struct sock *sk, int flags) tp->snd_cwnd_cnt = 0; tp->window_clamp = 0; tcp_set_ca_state(sk, TCP_CA_Open); + tp->is_sack_reneg = 0; tcp_clear_retrans(tp); inet_csk_delack_init(sk); /* Initialize rcv_mss to TCP_MIN_MSS to avoid division by 0 diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 734cfc8ff76e..7b9e8b99b28d 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1941,6 +1941,8 @@ void tcp_enter_loss(struct sock *sk) if (is_reneg) { NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPSACKRENEGING); tp->sacked_out = 0; + /* Mark SACK reneging until we recover from this loss event. */ + tp->is_sack_reneg = 1; } tcp_clear_all_retrans_hints(tp); @@ -2364,6 +2366,7 @@ static bool tcp_try_undo_recovery(struct sock *sk) return true; } tcp_set_ca_state(sk, TCP_CA_Open); + tp->is_sack_reneg = 0; return false; } @@ -2397,8 +2400,10 @@ static bool tcp_try_undo_loss(struct sock *sk, bool frto_undo) NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPSPURIOUSRTOS); inet_csk(sk)->icsk_retransmits = 0; - if (frto_undo || tcp_is_sack(tp)) + if (frto_undo || tcp_is_sack(tp)) { tcp_set_ca_state(sk, TCP_CA_Open); + tp->is_sack_reneg = 0; + } return true; } return false; @@ -3495,6 +3500,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag) struct tcp_sacktag_state sack_state; struct rate_sample rs = { .prior_delivered = 0 }; u32 prior_snd_una = tp->snd_una; + bool is_sack_reneg = tp->is_sack_reneg; u32 ack_seq = TCP_SKB_CB(skb)->seq; u32 ack = TCP_SKB_CB(skb)->ack_seq; bool is_dupack = false; @@ -3611,7 +3617,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag) delivered = tp->delivered - delivered; /* freshly ACKed or SACKed */ lost = tp->lost - lost; /* freshly marked lost */ - tcp_rate_gen(sk, delivered, lost, sack_state.rate); + tcp_rate_gen(sk, delivered, lost, is_sack_reneg, sack_state.rate); tcp_cong_control(sk, ack, delivered, flag, sack_state.rate); tcp_xmit_recovery(sk, rexmit); return 1; diff --git a/net/ipv4/tcp_rate.c b/net/ipv4/tcp_rate.c index 3330a370d306..c61240e43923 100644 --- a/net/ipv4/tcp_rate.c +++ b/net/ipv4/tcp_rate.c @@ -106,7 +106,7 @@ void tcp_rate_skb_delivered(struct sock *sk, struct sk_buff *skb, /* Update the connection delivery information and generate a rate sample. */ void tcp_rate_gen(struct sock *sk, u32 delivered, u32 lost, - struct rate_sample *rs) + bool is_sack_reneg, struct rate_sample *rs) { struct tcp_sock *tp = tcp_sk(sk); u32 snd_us, ack_us; @@ -124,8 +124,12 @@ void tcp_rate_gen(struct sock *sk, u32 delivered, u32 lost, rs->acked_sacked = delivered; /* freshly ACKed or SACKed */ rs->losses = lost; /* freshly marked lost */ - /* Return an invalid sample if no timing information is available. */ - if (!rs->prior_mstamp) { + /* Return an invalid sample if no timing information is available or + * in recovery from loss with SACK reneging. Rate samples taken during + * a SACK reneging event may overestimate bw by including packets that + * were SACKed before the reneg. + */ + if (!rs->prior_mstamp || is_sack_reneg) { rs->delivered = -1; rs->interval_us = -1; return;