From patchwork Tue Oct 24 17:32:39 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tim Gardner
X-Patchwork-Id: 1854582
From: Tim Gardner
To: kernel-team@lists.ubuntu.com
Subject: [PATCH][jammy linux-azure] tcp: Set pingpong threshold via sysctl
Date: Tue, 24 Oct 2023 11:32:39 -0600
Message-Id: <20231024173239.13141-4-tim.gardner@canonical.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20231024173239.13141-1-tim.gardner@canonical.com>
References: <20231024173239.13141-1-tim.gardner@canonical.com>
List-Id: Kernel team discussions

From: Haiyang Zhang

BugLink: https://bugs.launchpad.net/bugs/2040300

TCP pingpong threshold is 1 by default. But some applications, like SQL DB
may prefer a higher pingpong threshold to activate delayed acks in quick
ack mode for better performance.

The pingpong threshold and related code were changed to 3 in the year
2019 in:
  commit 4a41f453bedf ("tcp: change pingpong threshold to 3")
And reverted to 1 in the year 2022 in:
  commit 4d8f24eeedc5 ("Revert "tcp: change pingpong threshold to 3"")

There is no single value that fits all applications.
Add net.ipv4.tcp_pingpong_thresh sysctl tunable, so it can be tuned for
optimal performance based on the application needs.

Signed-off-by: Haiyang Zhang
Reviewed-by: Simon Horman
Reviewed-by: Eric Dumazet
Acked-by: Neal Cardwell
Reviewed-by: Kuniyuki Iwashima
Link: https://lore.kernel.org/r/1697056244-21888-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Jakub Kicinski
(backported from commit 562b1fdf061bff9394ccd884456ed1173c224fdc linux-next)
[rtg - context adjustments]
Signed-off-by: Tim Gardner
---
 Documentation/networking/ip-sysctl.rst | 13 +++++++++++++
 include/net/inet_connection_sock.h     | 16 ++++++++++++----
 include/net/netns/ipv4.h               |  2 ++
 net/ipv4/sysctl_net_ipv4.c             |  8 ++++++++
 net/ipv4/tcp_ipv4.c                    |  2 ++
 net/ipv4/tcp_output.c                  |  4 ++--
 6 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index 7890b395e629..bf5e9e1bcb4e 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -999,6 +999,19 @@ tcp_rx_skb_cache - BOOLEAN
 
 	Default: 0 (disabled)
 
+tcp_pingpong_thresh - INTEGER
+	The number of estimated data replies sent for estimated incoming data
+	requests that must happen before TCP considers that a connection is a
+	"ping-pong" (request-response) connection for which delayed
+	acknowledgments can provide benefits.
+
+	This threshold is 1 by default, but some applications may need a higher
+	threshold for optimal performance.
+
+	Possible Values: 1 - 255
+
+	Default: 1
+
 UDP variables
 =============
 
diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index 695ed45841f0..bfe479fed38f 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -315,11 +315,10 @@ void inet_csk_update_fastreuse(struct inet_bind_bucket *tb,
 
 struct dst_entry *inet_csk_update_pmtu(struct sock *sk, u32 mtu);
 
-#define TCP_PINGPONG_THRESH	1
-
 static inline void inet_csk_enter_pingpong_mode(struct sock *sk)
 {
-	inet_csk(sk)->icsk_ack.pingpong = TCP_PINGPONG_THRESH;
+	inet_csk(sk)->icsk_ack.pingpong =
+		READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_pingpong_thresh);
 }
 
 static inline void inet_csk_exit_pingpong_mode(struct sock *sk)
@@ -329,7 +328,16 @@ static inline void inet_csk_exit_pingpong_mode(struct sock *sk)
 
 static inline bool inet_csk_in_pingpong_mode(struct sock *sk)
 {
-	return inet_csk(sk)->icsk_ack.pingpong >= TCP_PINGPONG_THRESH;
+	return inet_csk(sk)->icsk_ack.pingpong >=
+	       READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_pingpong_thresh);
+}
+
+static inline void inet_csk_inc_pingpong_cnt(struct sock *sk)
+{
+	struct inet_connection_sock *icsk = inet_csk(sk);
+
+	if (icsk->icsk_ack.pingpong < U8_MAX)
+		icsk->icsk_ack.pingpong++;
 }
 
 static inline bool inet_csk_has_ulp(struct sock *sk)
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index d60a10cfc382..bd5d6dc2d2bf 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -126,6 +126,8 @@ struct netns_ipv4 {
 	u8 sysctl_tcp_synack_retries;
 	u8 sysctl_tcp_syncookies;
 	u8 sysctl_tcp_migrate_req;
+	u8 sysctl_tcp_pingpong_thresh;
+
 	int sysctl_tcp_reordering;
 	u8 sysctl_tcp_retries1;
 	u8 sysctl_tcp_retries2;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 1f22e72074fd..00ee27caf387 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -1362,6 +1362,14 @@ static struct ctl_table ipv4_net_table[] = {
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= &two,
 	},
+	{
+		.procname	= "tcp_pingpong_thresh",
+		.data		= &init_net.ipv4.sysctl_tcp_pingpong_thresh,
+		.maxlen		= sizeof(u8),
+		.mode		= 0644,
+		.proc_handler	= proc_dou8vec_minmax,
+		.extra1		= SYSCTL_ONE,
+	},
 	{ }
 };
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index b5cb674eca1c..23efe57c626e 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -3213,6 +3213,8 @@ static int __net_init tcp_sk_init(struct net *net)
 	else
 		net->ipv4.tcp_congestion_control = &tcp_reno;
 
+	net->ipv4.sysctl_tcp_pingpong_thresh = 1;
+
 	return 0;
 }
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index d46fb6d7057b..91a1e525b798 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -170,10 +170,10 @@ static void tcp_event_data_sent(struct tcp_sock *tp,
 	tp->lsndtime = now;
 
 	/* If it is a reply for ato after last received
-	 * packet, enter pingpong mode.
+	 * packet, increase pingpong count.
 	 */
 	if ((u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato)
-		inet_csk_enter_pingpong_mode(sk);
+		inet_csk_inc_pingpong_cnt(sk);
 }
 
 /* Account for an ACK we sent. */
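
Illustration only, not part of the patch: a minimal userspace sketch of the
counter behaviour the hunks above introduce. A saturating u8 counter is bumped
on each quick reply, and the connection counts as "ping-pong" once the counter
reaches the threshold. The `model_*` names and `struct model_sock` are
hypothetical stand-ins for the kernel helpers and fields. Resetting the
counter to 0 on exit is assumed from the existing
inet_csk_exit_pingpong_mode(), whose body this diff does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the two fields the patch touches. */
struct model_sock {
	uint8_t pingpong;	/* models icsk_ack.pingpong */
	uint8_t thresh;		/* models sysctl_tcp_pingpong_thresh (1..255) */
};

/* Models inet_csk_inc_pingpong_cnt(): saturate at the u8 maximum. */
static inline void model_inc_pingpong_cnt(struct model_sock *sk)
{
	if (sk->pingpong < UINT8_MAX)
		sk->pingpong++;
}

/* Models inet_csk_in_pingpong_mode(): compare counter with threshold. */
static inline bool model_in_pingpong_mode(const struct model_sock *sk)
{
	return sk->pingpong >= sk->thresh;
}

/* Assumed behaviour of inet_csk_exit_pingpong_mode(): reset the counter. */
static inline void model_exit_pingpong_mode(struct model_sock *sk)
{
	sk->pingpong = 0;
}

/*
 * Count how many quick replies a fresh connection must send before it is
 * treated as ping-pong for a given threshold.
 */
static inline unsigned int model_replies_until_pingpong(uint8_t thresh)
{
	struct model_sock sk = { .pingpong = 0, .thresh = thresh };
	unsigned int replies = 0;

	assert(thresh >= 1);	/* the sysctl entry sets extra1 = SYSCTL_ONE */

	while (!model_in_pingpong_mode(&sk)) {
		model_inc_pingpong_cnt(&sk);
		replies++;
	}
	return replies;
}
```

In this model a threshold of 1 enters ping-pong mode after the first quick
reply, matching the previous fixed TCP_PINGPONG_THRESH behaviour. A threshold
of 3 needs three such replies. Because the counter saturates at 255, a
threshold of 255 stays reachable.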