From patchwork Sat Sep 17 17:35:47 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Neal Cardwell
X-Patchwork-Id: 671247
X-Patchwork-Delegate: davem@davemloft.net
From: Neal Cardwell
To: David Miller
Cc: netdev@vger.kernel.org, Yuchung Cheng, Van Jacobson, Neal Cardwell,
 Nandita Dukkipati, Eric Dumazet, Soheil Hassas Yeganeh
Subject: [PATCH v2 net-next 14/16] tcp: new CC hook to set sending rate
 with rate_sample in any CA state
Date: Sat, 17 Sep 2016 13:35:47 -0400
Message-Id: <1474133749-12895-15-git-send-email-ncardwell@google.com>
X-Mailer: git-send-email 2.8.0.rc3.226.g39d4020
In-Reply-To: <1474133749-12895-1-git-send-email-ncardwell@google.com>
References: <1474133749-12895-1-git-send-email-ncardwell@google.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

From: Yuchung Cheng

This commit introduces an optional new "omnipotent" hook,
cong_control(), for congestion control modules. The cong_control()
function is called at the end of processing an ACK (i.e., after
updating sequence numbers, the SACK scoreboard, and loss
detection). At that moment we have precise delivery rate information
the congestion control module can use to control the sending behavior
(using cwnd, TSO skb size, and pacing rate) in any CA state.

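As a rough illustration of the paragraph above, here is a user-space
sketch of what a cong_control()-style callback could do with a delivery
rate sample. Everything in it is an assumption made for illustration:
the struct names, their fields, the function name and the "2 x BDP with
a floor of 4 packets" policy are hypothetical stand-ins, not the kernel's
struct rate_sample or struct sock, and not the algorithm of any real
congestion control module.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's struct rate_sample. */
struct rate_sample_sketch {
	int32_t delivered;	/* packets newly delivered over the interval */
	long interval_us;	/* length of the sampling interval, in usec */
};

/* Hypothetical stand-in for the per-connection state a module controls. */
struct conn_sketch {
	uint32_t mss;		/* bytes per segment */
	uint32_t min_rtt_us;	/* assumed RTT estimate, in usec */
	uint32_t cwnd;		/* congestion window, in packets */
	uint64_t pacing_rate;	/* bytes per second */
};

/* Sketch of a cong_control()-style callback: derive a bandwidth estimate
 * from the rate sample, then set both the pacing rate and cwnd from it.
 * The policy (pace at the measured rate, cwnd = 2 x BDP, floor of 4) is
 * an arbitrary choice for this example.
 */
void sketch_cong_control(struct conn_sketch *c,
			 const struct rate_sample_sketch *rs)
{
	uint64_t bw, bdp_pkts;

	assert(c && rs);

	/* Ignore samples that carry no usable rate information. */
	if (rs->delivered <= 0 || rs->interval_us <= 0)
		return;

	/* Delivery rate in bytes/sec: delivered * mss * 1e6 / interval_us. */
	bw = (uint64_t)rs->delivered * c->mss * 1000000ULL /
	     (uint64_t)rs->interval_us;
	c->pacing_rate = bw;

	/* Bandwidth-delay product, converted from bytes to packets. */
	bdp_pkts = bw * c->min_rtt_us / 1000000ULL / c->mss;
	c->cwnd = (2 * bdp_pkts < 4) ? 4 : (uint32_t)(2 * bdp_pkts);
}
```

The callback runs the same way whatever state the connection is in,
which is the point of the hook: there is no separate code path for
recovery versus normal operation in this sketch.
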
This function can also be used by a congestion control that prefers
not to use the default cwnd reduction approach (i.e., the PRR
algorithm) during CA_Recovery to control the cwnd and sending rate
during loss recovery.

We take advantage of the fact that recent changes defer the
retransmission or transmission of new data (e.g. by F-RTO) in recovery
until the new tcp_cong_control() function is run.

With this commit, we only run tcp_update_pacing_rate() if the
congestion control is not using this new API. New congestion controls
which use the new API do not want the TCP stack to run the default
pacing rate calculation and overwrite whatever pacing rate they have
chosen at initialization time.

Signed-off-by: Van Jacobson
Signed-off-by: Neal Cardwell
Signed-off-by: Yuchung Cheng
Signed-off-by: Nandita Dukkipati
Signed-off-by: Eric Dumazet
Signed-off-by: Soheil Hassas Yeganeh
---
 include/net/tcp.h    |  4 ++++
 net/ipv4/tcp_cong.c  |  2 +-
 net/ipv4/tcp_input.c | 17 ++++++++++++++---
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 1aa9628..f83b7f2 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -919,6 +919,10 @@ struct tcp_congestion_ops {
 	u32 (*tso_segs_goal)(struct sock *sk);
 	/* returns the multiplier used in tcp_sndbuf_expand (optional) */
 	u32 (*sndbuf_expand)(struct sock *sk);
+	/* call when packets are delivered to update cwnd and pacing rate,
+	 * after all the ca_state processing. (optional)
+	 */
+	void (*cong_control)(struct sock *sk, const struct rate_sample *rs);
 	/* get info for inet_diag (optional) */
 	size_t (*get_info)(struct sock *sk, u32 ext, int *attr,
 			   union tcp_cc_info *info);
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index 882caa4..1294af4 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -69,7 +69,7 @@ int tcp_register_congestion_control(struct tcp_congestion_ops *ca)
 	int ret = 0;
 
 	/* all algorithms must implement ssthresh and cong_avoid ops */
-	if (!ca->ssthresh || !ca->cong_avoid) {
+	if (!ca->ssthresh || !(ca->cong_avoid || ca->cong_control)) {
 		pr_err("%s does not implement required ops\n", ca->name);
 		return -EINVAL;
 	}
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a134e66..931fe32 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2536,6 +2536,9 @@ static inline void tcp_end_cwnd_reduction(struct sock *sk)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 
+	if (inet_csk(sk)->icsk_ca_ops->cong_control)
+		return;
+
 	/* Reset cwnd to ssthresh in CWR or Recovery (unless it's undone) */
 	if (inet_csk(sk)->icsk_ca_state == TCP_CA_CWR ||
 	    (tp->undo_marker && tp->snd_ssthresh < TCP_INFINITE_SSTHRESH)) {
@@ -3312,8 +3315,15 @@ static inline bool tcp_may_raise_cwnd(const struct sock *sk, const int flag)
  * information. All transmission or retransmission are delayed afterwards.
  */
 static void tcp_cong_control(struct sock *sk, u32 ack, u32 acked_sacked,
-			     int flag)
+			     int flag, const struct rate_sample *rs)
 {
+	const struct inet_connection_sock *icsk = inet_csk(sk);
+
+	if (icsk->icsk_ca_ops->cong_control) {
+		icsk->icsk_ca_ops->cong_control(sk, rs);
+		return;
+	}
+
 	if (tcp_in_cwnd_reduction(sk)) {
 		/* Reduce cwnd if state mandates */
 		tcp_cwnd_reduction(sk, acked_sacked, flag);
@@ -3683,7 +3693,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 	delivered = tp->delivered - delivered; /* freshly ACKed or SACKed */
 	lost = tp->lost - lost; /* freshly marked lost */
 	tcp_rate_gen(sk, delivered, lost, &now, &rs);
-	tcp_cong_control(sk, ack, delivered, flag);
+	tcp_cong_control(sk, ack, delivered, flag, &rs);
 	tcp_xmit_recovery(sk, rexmit);
 	return 1;
 
@@ -5981,7 +5991,8 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 		} else
 			tcp_init_metrics(sk);
 
-		tcp_update_pacing_rate(sk);
+		if (!inet_csk(sk)->icsk_ca_ops->cong_control)
+			tcp_update_pacing_rate(sk);
 
 		/* Prevent spurious tcp_cwnd_restart() on first data packet */
 		tp->lsndtime = tcp_time_stamp;
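
For readers who want to see the control flow outside the kernel, the
following user-space model mirrors two pieces of the diff above: the
relaxed registration check in tcp_register_congestion_control() and the
early dispatch at the top of tcp_cong_control(). It is only a sketch.
All type and function names in it are hypothetical stand-ins, and the
counter default_path_runs merely represents the stack's default path
(the code that starts with the tcp_in_cwnd_reduction() branch), which is
not modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct conn_model;				/* stand-in for struct sock */
struct rate_sample_model { int delivered; };	/* stand-in for struct rate_sample */

/* Stand-in for the relevant members of struct tcp_congestion_ops. */
struct cc_ops_model {
	unsigned int (*ssthresh)(struct conn_model *c);
	void (*cong_avoid)(struct conn_model *c);
	void (*cong_control)(struct conn_model *c,
			     const struct rate_sample_model *rs);
};

struct conn_model {
	const struct cc_ops_model *ops;
	int default_path_runs;	/* times the stack's default logic ran */
	int hook_runs;		/* times the module's cong_control ran */
};

/* Same condition as the patched registration check: ssthresh is always
 * required, plus at least one of cong_avoid or cong_control.
 */
int register_check_model(const struct cc_ops_model *ca)
{
	if (!ca->ssthresh || !(ca->cong_avoid || ca->cong_control))
		return -EINVAL;
	return 0;
}

/* Same shape as the patched tcp_cong_control(): if the module provides
 * cong_control, call it and skip the default path entirely.
 */
void ack_cong_control_model(struct conn_model *c,
			    const struct rate_sample_model *rs)
{
	assert(c && c->ops);

	if (c->ops->cong_control) {
		c->ops->cong_control(c, rs);
		return;
	}
	c->default_path_runs++;	/* placeholder for the default cwnd logic */
}

/* Trivial callbacks used only to populate the ops tables in examples. */
unsigned int demo_ssthresh(struct conn_model *c) { (void)c; return 2; }
void demo_cong_avoid(struct conn_model *c) { (void)c; }
void demo_cong_control(struct conn_model *c,
		       const struct rate_sample_model *rs)
{
	(void)rs;
	c->hook_runs++;
}
```

A module that sets only ssthresh and cong_control passes the check and
never reaches the default path in this model; a module that sets only
ssthresh is rejected with -EINVAL, as in the patched kernel check.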