From: Yuchung Cheng
To: davem@davemloft.net, edumazet@google.com
Cc: netdev@vger.kernel.org, ncardwell@google.com, soheil@google.com,
    Yuchung Cheng
Subject: [PATCH net-next 0/8] improving TCP behavior on host congestion
Date: Wed, 16 Jan 2019 15:05:27 -0800
Message-Id: <20190116230535.162758-1-ycheng@google.com>

This patch set aims to improve how TCP handles local qdisc congestion
by simplifying the previous implementation. Previously, when an skb
failed to (re)transmit due to local qdisc congestion or another
resource issue, TCP refrained from setting the skb timestamp or the
recovery start time.
This design made it harder to determine when to abort a stalled
socket, because the timestamps of these transmission attempts were
missing and the stack had to infer when the original attempt happened.
A by-product is that a socket may disregard the system timeout limit
(i.e. sysctl net.ipv4.tcp_retries2 or the USER_TIMEOUT option) and
continue to retry until the transmission succeeds.

In data-center environments, where the TCP RTO is small, this could
cause the socket to retry frequently for a long time during qdisc
congestion.

The solution is to first unconditionally timestamp every skb
transmission and recovery attempt, then retry more conservatively
(twice a second) on local qdisc congestion while still aborting
sockets according to the system limit.

Yuchung Cheng (8):
  tcp: exit if nothing to retransmit on RTO timeout
  tcp: always timestamp on every skb transmission
  tcp: always set retrans_stamp on recovery
  tcp: properly track retry time on passive Fast Open
  tcp: create a helper to model exponential backoff
  tcp: simplify window probe aborting on USER_TIMEOUT
  tcp: retry more conservatively on local congestion
  tcp: less aggressive window probing on local congestion

 net/ipv4/tcp_output.c | 47 ++++++++++--------------
 net/ipv4/tcp_timer.c  | 83 ++++++++++++++++++-------------------------
 2 files changed, 54 insertions(+), 76 deletions(-)
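
For readers who want to see the arithmetic behind "abort according to
the system limit", here is a minimal userspace sketch. It is not the
code from this series. All function names and the macro are
hypothetical, and the fixed 500 ms interval is only one reading of
"twice a second". It models the time budget implied by a retry count
under exponential backoff, the retry delay on local congestion, and the
abort check that becomes a simple comparison once every attempt is
timestamped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: "twice a second" is modeled as a fixed 500 ms interval. */
#define LOCAL_CONGESTION_RETRY_MS 500u

/*
 * Total time, in ms, that elapses if the original transmission and
 * `retries` retransmissions all time out. The RTO doubles after each
 * timeout and is capped at rto_max_ms.
 */
uint64_t model_timeout_ms(unsigned int retries, uint64_t rto_base_ms,
                          uint64_t rto_max_ms)
{
    uint64_t rto = rto_base_ms;
    uint64_t total = 0;

    assert(rto_base_ms > 0 && rto_base_ms <= rto_max_ms);

    for (unsigned int i = 0; i <= retries; i++) {
        total += rto;
        /* Double the RTO, saturating at the cap without overflowing. */
        rto = (rto > rto_max_ms / 2) ? rto_max_ms : rto * 2;
    }
    return total;
}

/*
 * Delay before the next retry. On local congestion the (possibly very
 * small) RTO is ignored in favor of the fixed, conservative interval.
 */
uint64_t next_retry_delay_ms(bool local_congestion, uint64_t rto_ms)
{
    return local_congestion ? LOCAL_CONGESTION_RETRY_MS : rto_ms;
}

/*
 * With a timestamp recorded on the first attempt, the abort decision
 * only compares the elapsed time against the limit.
 */
bool should_abort(uint64_t first_attempt_ms, uint64_t now_ms,
                  uint64_t limit_ms)
{
    return now_ms - first_attempt_ms >= limit_ms;
}
```

As a usage example, assuming the common Linux defaults of a 200 ms
minimum RTO, a 120 s maximum RTO and tcp_retries2 = 15 (none of which
are stated in this cover letter), model_timeout_ms(15, 200, 120000)
evaluates to 924600 ms, i.e. about 924.6 seconds. A socket whose first
attempt was timestamped at time T would then be aborted once
should_abort(T, now, 924600) returns true, regardless of how many
retries were throttled locally in between.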