From patchwork Wed Jan 16 23:05:35 2019
X-Patchwork-Submitter: Yuchung Cheng
X-Patchwork-Id: 1026258
X-Patchwork-Delegate: davem@davemloft.net
From: Yuchung Cheng
To: davem@davemloft.net, edumazet@google.com
Cc: netdev@vger.kernel.org, ncardwell@google.com, soheil@google.com, Yuchung Cheng
Subject: [PATCH net-next 8/8] tcp: less aggressive window probing on local congestion
Date: Wed, 16 Jan 2019 15:05:35 -0800
Message-Id: <20190116230535.162758-9-ycheng@google.com>
In-Reply-To: <20190116230535.162758-1-ycheng@google.com>
References: <20190116230535.162758-1-ycheng@google.com>
List-ID: <netdev.vger.kernel.org>

Previously, when the sender fails to send an (original) data packet or
a window probe due to congestion in the local host (e.g.
throttling in the qdisc), it retries within an RTO or two, up to 500ms.
In low-RTT networks such as data centers, the RTO is often far below
the default minimum of 200ms, so local host congestion can trigger a
retry storm that pours gas on the fire. Worse yet, the probe counter
(icsk_probes_out) is not properly updated, so the aggressive retrying
may exceed the system limit (15 rounds) until the packet finally slips
through. On such rare events, it is wiser to retry more conservatively
(every 500ms) and to update the stats properly, so that these incidents
are reflected and the system limit is honored. Note that this is
consistent with the existing behavior when a keep-alive probe or an
RTO retransmission is dropped due to local congestion.

Signed-off-by: Yuchung Cheng
Signed-off-by: Eric Dumazet
Reviewed-by: Neal Cardwell
Reviewed-by: Soheil Hassas Yeganeh
---
 net/ipv4/tcp_output.c | 22 +++++++---------------
 1 file changed, 7 insertions(+), 15 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index d2d494c74811..6527f61f59ff 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3749,7 +3749,7 @@ void tcp_send_probe0(struct sock *sk)
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct net *net = sock_net(sk);
-	unsigned long probe_max;
+	unsigned long timeout;
 	int err;
 
 	err = tcp_write_wakeup(sk, LINUX_MIB_TCPWINPROBE);
@@ -3761,26 +3761,18 @@ void tcp_send_probe0(struct sock *sk)
 		return;
 	}
 
+	icsk->icsk_probes_out++;
 	if (err <= 0) {
 		if (icsk->icsk_backoff < net->ipv4.sysctl_tcp_retries2)
 			icsk->icsk_backoff++;
-		icsk->icsk_probes_out++;
-		probe_max = TCP_RTO_MAX;
+		timeout = tcp_probe0_when(sk, TCP_RTO_MAX);
 	} else {
 		/* If packet was not sent due to local congestion,
-		 * do not backoff and do not remember icsk_probes_out.
-		 * Let local senders to fight for local resources.
-		 *
-		 * Use accumulated backoff yet.
+		 * Let senders fight for local resources conservatively.
 		 */
-		if (!icsk->icsk_probes_out)
-			icsk->icsk_probes_out = 1;
-		probe_max = TCP_RESOURCE_PROBE_INTERVAL;
-	}
-	tcp_reset_xmit_timer(sk, ICSK_TIME_PROBE0,
-			     tcp_probe0_when(sk, probe_max),
-			     TCP_RTO_MAX,
-			     NULL);
+		timeout = TCP_RESOURCE_PROBE_INTERVAL;
+	}
+	tcp_reset_xmit_timer(sk, ICSK_TIME_PROBE0, timeout, TCP_RTO_MAX, NULL);
 }
 
 int tcp_rtx_synack(const struct sock *sk, struct request_sock *req)