From patchwork Wed Jan 23 20:04:53 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Priyaranjan Jha X-Patchwork-Id: 1030149 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.b="qnx5A968"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 43lGXt4zXsz9sCX for ; Thu, 24 Jan 2019 07:07:26 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726299AbfAWUHZ (ORCPT ); Wed, 23 Jan 2019 15:07:25 -0500 Received: from mail-io1-f73.google.com ([209.85.166.73]:51525 "EHLO mail-io1-f73.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726101AbfAWUHZ (ORCPT ); Wed, 23 Jan 2019 15:07:25 -0500 Received: by mail-io1-f73.google.com with SMTP id q207so2636546iod.18 for ; Wed, 23 Jan 2019 12:07:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=4tEPOL6MIcoFfgFMMUgcelx+nxlgZfoXvROj9asS3qQ=; b=qnx5A9689J4BaHW4hxOMqq4PsLVIGkpCn8pE1GET8pSjsBeaon253LY1t1HBmuOzls IRYgTsG8CiRb4ci6hfxau7KqheWoOdQ0jz5VD0SFZ8JfXbsaqnfPDNVKBWN42ZKeRE9I MCbptLNDx2orK1Hg8aaf3DKid6kVvOaIYkKX7NPObcHoRq2eUxbYqCpyqoCmvTx0FWiF KHzvrEYL0q2Dxz5QaXWOsAr2UlDq6L+GCCezqF2/UHeslhxSah5/8yWlewMZgcld4S7Q bPvEhpgcoBfsgxTt/9WT+ypA2hB8d9DGnvE99qPixfeqpT5xL5QW3a7ktcbeFYudRNcf 69ZQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=4tEPOL6MIcoFfgFMMUgcelx+nxlgZfoXvROj9asS3qQ=; b=O9qkX/9EN3UgVjFMHYLOrNlt4kaPopmdG/GTG/bnZAy+wuopSmdVqvRxEARppth9JO P3+KabbIMnyezqaFjI6HsTPGbkWDy6nM82sN4wcCcA3KJlGwDxqIYn1n8wE/W6ajsrox WoqgJYDSBONj6CTKJ6DP4NyX6s290qFmd1F8Jw2yKD2SNUXKe6R2XBoxt6MvgieY77ih wsrm+0rpSj7LvFZUIhVQGQ1bn6ykM540CPTcprnEnsxF5NRBFF7Hi35V1DRiVFnQPYNK R3GTOZSdoc4f79YNweviPAeDP+8SlwLx/aJxfK5ReVs77pom3Cu68jXEKIjTy0vYsHcd 9h2Q== X-Gm-Message-State: AHQUAuZlE8RFG0EGEm0EPwZistP0PTzPeMynuZW2mzcj5s7mivi8J3+N ViIF5dsiAJcQjb6GuZLSN2bqMN1YH7hSrUk= X-Google-Smtp-Source: ALg8bN73nG3P5TRKOA5vGoENK6EXMhWFcCwExn2sQmt+bL2/esI3TtU+kJpzASNsyJmlKy2cvm9mem5uevIXZP0= X-Received: by 2002:a24:7381:: with SMTP id y123mr3069948itb.32.1548274044104; Wed, 23 Jan 2019 12:07:24 -0800 (PST) Date: Wed, 23 Jan 2019 12:04:53 -0800 In-Reply-To: <20190123200454.260121-1-priyarjha@google.com> Message-Id: <20190123200454.260121-2-priyarjha@google.com> Mime-Version: 1.0 References: <20190123200454.260121-1-priyarjha@google.com> X-Mailer: git-send-email 2.20.1.321.g9e740568ce-goog Subject: [PATCH net-next 1/2] tcp_bbr: refactor bbr_target_cwnd() for general inflight provisioning From: Priyaranjan Jha To: David Miller Cc: netdev@vger.kernel.org, Priyaranjan Jha , Neal Cardwell , Yuchung Cheng Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Because bbr_target_cwnd() is really a general-purpose BBR helper for computing some volume of inflight data as a function of the estimated BDP, refactor it into following helper functions: - bbr_bdp() - bbr_quantization_budget() - bbr_inflight() Signed-off-by: Priyaranjan Jha Signed-off-by: Neal Cardwell Signed-off-by: Yuchung Cheng --- net/ipv4/tcp_bbr.c | 60 ++++++++++++++++++++++++++++++---------------- 1 file changed, 39 insertions(+), 21 deletions(-) diff --git a/net/ipv4/tcp_bbr.c b/net/ipv4/tcp_bbr.c index 0f497fc49c3f..6b6c7f14ccf9 100644 --- a/net/ipv4/tcp_bbr.c +++ b/net/ipv4/tcp_bbr.c @@ -315,30 +315,19 @@ static void bbr_cwnd_event(struct sock *sk, enum tcp_ca_event event) } } -/* Find target cwnd. Right-size the cwnd based on min RTT and the - * estimated bottleneck bandwidth: +/* Calculate bdp based on min RTT and the estimated bottleneck bandwidth: * - * cwnd = bw * min_rtt * gain = BDP * gain + * bdp = bw * min_rtt * gain * * The key factor, gain, controls the amount of queue. While a small gain * builds a smaller queue, it becomes more vulnerable to noise in RTT * measurements (e.g., delayed ACKs or other ACK compression effects). This * noise may cause BBR to under-estimate the rate. - * - * To achieve full performance in high-speed paths, we budget enough cwnd to - * fit full-sized skbs in-flight on both end hosts to fully utilize the path: - * - one skb in sending host Qdisc, - * - one skb in sending host TSO/GSO engine - * - one skb being received by receiver host LRO/GRO/delayed-ACK engine - * Don't worry, at low rates (bbr_min_tso_rate) this won't bloat cwnd because - * in such cases tso_segs_goal is 1. The minimum cwnd is 4 packets, - * which allows 2 outstanding 2-packet sequences, to try to keep pipe - * full even with ACK-every-other-packet delayed ACKs. */ -static u32 bbr_target_cwnd(struct sock *sk, u32 bw, int gain) +static u32 bbr_bdp(struct sock *sk, u32 bw, int gain) { struct bbr *bbr = inet_csk_ca(sk); - u32 cwnd; + u32 bdp; u64 w; /* If we've never had a valid RTT sample, cap cwnd at the initial @@ -353,7 +342,24 @@ static u32 bbr_target_cwnd(struct sock *sk, u32 bw, int gain) w = (u64)bw * bbr->min_rtt_us; /* Apply a gain to the given value, then remove the BW_SCALE shift. */ - cwnd = (((w * gain) >> BBR_SCALE) + BW_UNIT - 1) / BW_UNIT; + bdp = (((w * gain) >> BBR_SCALE) + BW_UNIT - 1) / BW_UNIT; + + return bdp; +} + +/* To achieve full performance in high-speed paths, we budget enough cwnd to + * fit full-sized skbs in-flight on both end hosts to fully utilize the path: + * - one skb in sending host Qdisc, + * - one skb in sending host TSO/GSO engine + * - one skb being received by receiver host LRO/GRO/delayed-ACK engine + * Don't worry, at low rates (bbr_min_tso_rate) this won't bloat cwnd because + * in such cases tso_segs_goal is 1. The minimum cwnd is 4 packets, + * which allows 2 outstanding 2-packet sequences, to try to keep pipe + * full even with ACK-every-other-packet delayed ACKs. + */ +static u32 bbr_quantization_budget(struct sock *sk, u32 cwnd, int gain) +{ + struct bbr *bbr = inet_csk_ca(sk); /* Allow enough full-sized skbs in flight to utilize end systems. */ cwnd += 3 * bbr_tso_segs_goal(sk); @@ -368,6 +374,17 @@ static u32 bbr_target_cwnd(struct sock *sk, u32 bw, int gain) return cwnd; } +/* Find inflight based on min RTT and the estimated bottleneck bandwidth. */ +static u32 bbr_inflight(struct sock *sk, u32 bw, int gain) +{ + u32 inflight; + + inflight = bbr_bdp(sk, bw, gain); + inflight = bbr_quantization_budget(sk, inflight, gain); + + return inflight; +} + /* With pacing at lower layers, there's often less data "in the network" than * "in flight". With TSQ and departure time pacing at lower layers (e.g. fq), * we often have several skbs queued in the pacing layer with a pre-scheduled @@ -462,7 +479,8 @@ static void bbr_set_cwnd(struct sock *sk, const struct rate_sample *rs, goto done; /* If we're below target cwnd, slow start cwnd toward target cwnd. */ - target_cwnd = bbr_target_cwnd(sk, bw, gain); + target_cwnd = bbr_bdp(sk, bw, gain); + target_cwnd = bbr_quantization_budget(sk, target_cwnd, gain); if (bbr_full_bw_reached(sk)) /* only cut cwnd if we filled the pipe */ cwnd = min(cwnd + acked, target_cwnd); else if (cwnd < target_cwnd || tp->delivered < TCP_INIT_CWND) @@ -503,14 +521,14 @@ static bool bbr_is_next_cycle_phase(struct sock *sk, if (bbr->pacing_gain > BBR_UNIT) return is_full_length && (rs->losses || /* perhaps pacing_gain*BDP won't fit */ - inflight >= bbr_target_cwnd(sk, bw, bbr->pacing_gain)); + inflight >= bbr_inflight(sk, bw, bbr->pacing_gain)); /* A pacing_gain < 1.0 tries to drain extra queue we added if bw * probing didn't find more bw. If inflight falls to match BDP then we * estimate queue is drained; persisting would underutilize the pipe. */ return is_full_length || - inflight <= bbr_target_cwnd(sk, bw, BBR_UNIT); + inflight <= bbr_inflight(sk, bw, BBR_UNIT); } static void bbr_advance_cycle_phase(struct sock *sk) @@ -762,11 +780,11 @@ static void bbr_check_drain(struct sock *sk, const struct rate_sample *rs) if (bbr->mode == BBR_STARTUP && bbr_full_bw_reached(sk)) { bbr->mode = BBR_DRAIN; /* drain queue we created */ tcp_sk(sk)->snd_ssthresh = - bbr_target_cwnd(sk, bbr_max_bw(sk), BBR_UNIT); + bbr_inflight(sk, bbr_max_bw(sk), BBR_UNIT); } /* fall through to check if in-flight is already small: */ if (bbr->mode == BBR_DRAIN && bbr_packets_in_net_at_edt(sk, tcp_packets_in_flight(tcp_sk(sk))) <= - bbr_target_cwnd(sk, bbr_max_bw(sk), BBR_UNIT)) + bbr_inflight(sk, bbr_max_bw(sk), BBR_UNIT)) bbr_reset_probe_bw_mode(sk); /* we estimate queue is drained */ }