From patchwork Tue Oct 24 17:32:38 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tim Gardner
X-Patchwork-Id: 1854581
From: Tim Gardner
To: kernel-team@lists.ubuntu.com
Subject: [PATCH][lunar linux-azure] tcp: Set pingpong threshold via sysctl
Date: Tue, 24 Oct 2023 11:32:38 -0600
Message-Id: <20231024173239.13141-3-tim.gardner@canonical.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20231024173239.13141-1-tim.gardner@canonical.com>
References: <20231024173239.13141-1-tim.gardner@canonical.com>
List-Id: Kernel team discussions

From: Haiyang Zhang

BugLink: https://bugs.launchpad.net/bugs/2040300

TCP pingpong threshold is 1 by default. But some applications, like SQL DB
may prefer a higher pingpong threshold to activate delayed acks in quick
ack mode for better performance.

The pingpong threshold and related code were changed to 3 in the year
2019 in:
  commit 4a41f453bedf ("tcp: change pingpong threshold to 3")
And reverted to 1 in the year 2022 in:
  commit 4d8f24eeedc5 ("Revert "tcp: change pingpong threshold to 3"")

There is no single value that fits all applications.
Add net.ipv4.tcp_pingpong_thresh sysctl tunable, so it can be tuned for
optimal performance based on the application needs.

Signed-off-by: Haiyang Zhang
Reviewed-by: Simon Horman
Reviewed-by: Eric Dumazet
Acked-by: Neal Cardwell
Reviewed-by: Kuniyuki Iwashima
Link: https://lore.kernel.org/r/1697056244-21888-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Jakub Kicinski
(backported from commit 562b1fdf061bff9394ccd884456ed1173c224fdc linux-next)
[rtg - context adjustments]
Signed-off-by: Tim Gardner
---
 Documentation/networking/ip-sysctl.rst | 13 +++++++++++++
 include/net/inet_connection_sock.h     | 16 ++++++++++++----
 include/net/netns/ipv4.h               |  2 ++
 net/ipv4/sysctl_net_ipv4.c             |  8 ++++++++
 net/ipv4/tcp_ipv4.c                    |  2 ++
 net/ipv4/tcp_output.c                  |  4 ++--
 6 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index a69581725a1d..288ba84c40cc 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -1146,6 +1146,19 @@ tcp_plb_cong_thresh - INTEGER
 
 	Default: 128
 
+tcp_pingpong_thresh - INTEGER
+	The number of estimated data replies sent for estimated incoming data
+	requests that must happen before TCP considers that a connection is a
+	"ping-pong" (request-response) connection for which delayed
+	acknowledgments can provide benefits.
+
+	This threshold is 1 by default, but some applications may need a higher
+	threshold for optimal performance.
+
+	Possible Values: 1 - 255
+
+	Default: 1
+
 UDP variables
 =============
 
diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index c2b15f7e5516..b3e7eeb4cdfa 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -324,11 +324,10 @@ void inet_csk_update_fastreuse(struct inet_bind_bucket *tb,
 
 struct dst_entry *inet_csk_update_pmtu(struct sock *sk, u32 mtu);
 
-#define TCP_PINGPONG_THRESH	1
-
 static inline void inet_csk_enter_pingpong_mode(struct sock *sk)
 {
-	inet_csk(sk)->icsk_ack.pingpong = TCP_PINGPONG_THRESH;
+	inet_csk(sk)->icsk_ack.pingpong =
+		READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_pingpong_thresh);
 }
 
 static inline void inet_csk_exit_pingpong_mode(struct sock *sk)
@@ -338,7 +337,16 @@ static inline void inet_csk_exit_pingpong_mode(struct sock *sk)
 
 static inline bool inet_csk_in_pingpong_mode(struct sock *sk)
 {
-	return inet_csk(sk)->icsk_ack.pingpong >= TCP_PINGPONG_THRESH;
+	return inet_csk(sk)->icsk_ack.pingpong >=
+	       READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_pingpong_thresh);
+}
+
+static inline void inet_csk_inc_pingpong_cnt(struct sock *sk)
+{
+	struct inet_connection_sock *icsk = inet_csk(sk);
+
+	if (icsk->icsk_ack.pingpong < U8_MAX)
+		icsk->icsk_ack.pingpong++;
 }
 
 static inline bool inet_csk_has_ulp(struct sock *sk)
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index db762e35aca9..b5f4b52f8780 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -131,6 +131,8 @@ struct netns_ipv4 {
 	u8 sysctl_tcp_syncookies;
 	u8 sysctl_tcp_migrate_req;
 	u8 sysctl_tcp_comp_sack_nr;
+	u8 sysctl_tcp_pingpong_thresh;
+
 	int sysctl_tcp_reordering;
 	u8 sysctl_tcp_retries1;
 	u8 sysctl_tcp_retries2;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 88dfe51e68f3..478ac0ec6b26 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -1470,6 +1470,14 @@ static struct ctl_table ipv4_net_table[] = {
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= &tcp_plb_max_cong_thresh,
 	},
+	{
+		.procname	= "tcp_pingpong_thresh",
+		.data		= &init_net.ipv4.sysctl_tcp_pingpong_thresh,
+		.maxlen		= sizeof(u8),
+		.mode		= 0644,
+		.proc_handler	= proc_dou8vec_minmax,
+		.extra1		= SYSCTL_ONE,
+	},
 	{ }
 };
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 8396cc37f8e2..84a42ef8d355 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -3280,6 +3280,8 @@ static int __net_init tcp_sk_init(struct net *net)
 	else
 		net->ipv4.tcp_congestion_control = &tcp_reno;
 
+	net->ipv4.sysctl_tcp_pingpong_thresh = 1;
+
 	return 0;
 }
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index ba550a216c9f..251330a9a269 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -170,10 +170,10 @@ static void tcp_event_data_sent(struct tcp_sock *tp,
 	tp->lsndtime = now;
 
 	/* If it is a reply for ato after last received
-	 * packet, enter pingpong mode.
+	 * packet, increase pingpong count.
 	 */
 	if ((u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato)
-		inet_csk_enter_pingpong_mode(sk);
+		inet_csk_inc_pingpong_cnt(sk);
 }
 
 /* Account for an ACK we sent. */