From patchwork Tue Jul 21 06:15:30 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kuniyuki Iwashima
X-Patchwork-Id: 1332816
X-Patchwork-Delegate: davem@davemloft.net
From: Kuniyuki Iwashima
To: "David S. Miller", Alexey Kuznetsov, Hideaki YOSHIFUJI, Jakub Kicinski
CC: Willem de Bruijn, Eric Dumazet, Craig Gallek, Paolo Abeni,
 Kuniyuki Iwashima, Benjamin Herrenschmidt
Subject: [PATCH net 1/2] udp: Copy has_conns in reuseport_grow().
Date: Tue, 21 Jul 2020 15:15:30 +0900
Message-ID: <20200721061531.94236-2-kuniyu@amazon.co.jp>
In-Reply-To: <20200721061531.94236-1-kuniyu@amazon.co.jp>
References: <20200721061531.94236-1-kuniyu@amazon.co.jp>
X-Mailing-List: netdev@vger.kernel.org

If an unconnected socket in a UDP reuseport group connect()s, has_conns is
set to 1. Then, when a packet is received, udp[46]_lib_lookup2() scans all
sockets in udp_hslot looking for the connected socket with the highest
score.

However, when the number of sockets bound to the port exceeds max_socks,
reuseport_grow() resets has_conns to 0.
This can cause udp[46]_lib_lookup2() to return without scanning all
sockets, so packets sent to connected sockets may be distributed to
unconnected sockets.

Therefore, reuseport_grow() should copy has_conns.

Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets")
CC: Willem de Bruijn
Reviewed-by: Benjamin Herrenschmidt
Signed-off-by: Kuniyuki Iwashima
Acked-by: Willem de Bruijn
---
 net/core/sock_reuseport.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
index adcb3aea576d..bbdd3c7b6cb5 100644
--- a/net/core/sock_reuseport.c
+++ b/net/core/sock_reuseport.c
@@ -101,6 +101,7 @@ static struct sock_reuseport *reuseport_grow(struct sock_reuseport *reuse)
 	more_reuse->prog = reuse->prog;
 	more_reuse->reuseport_id = reuse->reuseport_id;
 	more_reuse->bind_inany = reuse->bind_inany;
+	more_reuse->has_conns = reuse->has_conns;
 
 	memcpy(more_reuse->socks, reuse->socks,
 	       reuse->num_socks * sizeof(struct sock *));

From patchwork Tue Jul 21 06:15:31 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kuniyuki Iwashima
X-Patchwork-Id: 1332817
X-Patchwork-Delegate: davem@davemloft.net
From: Kuniyuki Iwashima
To: "David S. Miller", Alexey Kuznetsov, Hideaki YOSHIFUJI, Jakub Kicinski
CC: Willem de Bruijn, Eric Dumazet, Craig Gallek, Paolo Abeni,
 Kuniyuki Iwashima, Benjamin Herrenschmidt
Subject: [PATCH net 2/2] udp: Improve load balancing for SO_REUSEPORT.
Date: Tue, 21 Jul 2020 15:15:31 +0900
Message-ID: <20200721061531.94236-3-kuniyu@amazon.co.jp>
In-Reply-To: <20200721061531.94236-1-kuniyu@amazon.co.jp>
References: <20200721061531.94236-1-kuniyu@amazon.co.jp>
X-Mailing-List: netdev@vger.kernel.org

Currently, SO_REUSEPORT does not work well if connected sockets are in a
UDP reuseport group. In that case, reuseport_has_conns() returns true and
the result of reuseport_select_sock() is discarded. Also, unconnected
sockets all have the same score, so the first unconnected socket in
udp_hslot always receives all packets sent to unconnected sockets.

So, the result of reuseport_select_sock() should be used for load
balancing.

The noteworthy point is that the unconnected sockets placed after
connected sockets in sock_reuseport.socks will receive more packets than
others because of the algorithm in reuseport_select_sock().

    index | connected | reciprocal_scale | result
    ---------------------------------------------
    0     | no        | 20%              | 40%
    1     | no        | 20%              | 20%
    2     | yes       | 20%              | 0%
    3     | no        | 20%              | 40%
    4     | yes       | 20%              | 0%

If most of the sockets are connected, this can be a problem, but it still
works better than now.
Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets")
CC: Willem de Bruijn
Reviewed-by: Benjamin Herrenschmidt
Signed-off-by: Kuniyuki Iwashima
Acked-by: Willem de Bruijn
---
 net/ipv4/udp.c | 15 +++++++++------
 net/ipv6/udp.c | 15 +++++++++------
 2 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 1b7ebbcae497..99251d3c70d0 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -416,7 +416,7 @@ static struct sock *udp4_lib_lookup2(struct net *net,
 				     struct udp_hslot *hslot2,
 				     struct sk_buff *skb)
 {
-	struct sock *sk, *result;
+	struct sock *sk, *result, *reuseport_result;
 	int score, badness;
 	u32 hash = 0;
 
@@ -426,17 +426,20 @@ static struct sock *udp4_lib_lookup2(struct net *net,
 		score = compute_score(sk, net, saddr, sport,
 				      daddr, hnum, dif, sdif);
 		if (score > badness) {
+			reuseport_result = NULL;
+
 			if (sk->sk_reuseport &&
 			    sk->sk_state != TCP_ESTABLISHED) {
 				hash = udp_ehashfn(net, daddr, hnum,
 						   saddr, sport);
-				result = reuseport_select_sock(sk, hash, skb,
-							sizeof(struct udphdr));
-				if (result && !reuseport_has_conns(sk, false))
-					return result;
+				reuseport_result = reuseport_select_sock(sk, hash, skb,
+									 sizeof(struct udphdr));
+				if (reuseport_result && !reuseport_has_conns(sk, false))
+					return reuseport_result;
 			}
+
+			result = reuseport_result ? : sk;
 			badness = score;
-			result = sk;
 		}
 	}
 	return result;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 7d4151747340..9503c87ac0b3 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -148,7 +148,7 @@ static struct sock *udp6_lib_lookup2(struct net *net,
 		int dif, int sdif, struct udp_hslot *hslot2,
 		struct sk_buff *skb)
 {
-	struct sock *sk, *result;
+	struct sock *sk, *result, *reuseport_result;
 	int score, badness;
 	u32 hash = 0;
 
@@ -158,17 +158,20 @@ static struct sock *udp6_lib_lookup2(struct net *net,
 		score = compute_score(sk, net, saddr, sport,
 				      daddr, hnum, dif, sdif);
 		if (score > badness) {
+			reuseport_result = NULL;
+
 			if (sk->sk_reuseport &&
 			    sk->sk_state != TCP_ESTABLISHED) {
 				hash = udp6_ehashfn(net, daddr, hnum,
 						    saddr, sport);
 
-				result = reuseport_select_sock(sk, hash, skb,
-							sizeof(struct udphdr));
-				if (result && !reuseport_has_conns(sk, false))
-					return result;
+				reuseport_result = reuseport_select_sock(sk, hash, skb,
+									 sizeof(struct udphdr));
+				if (reuseport_result && !reuseport_has_conns(sk, false))
+					return reuseport_result;
 			}
-			result = sk;
+
+			result = reuseport_result ? : sk;
 			badness = score;
 		}
 	}