From patchwork Wed Nov 18 08:25:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xuan Zhuo X-Patchwork-Id: 1402039 Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4CbbW32QSbz9sVC for ; Wed, 18 Nov 2020 19:26:23 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727331AbgKRIZS (ORCPT ); Wed, 18 Nov 2020 03:25:18 -0500 Received: from out30-44.freemail.mail.aliyun.com ([115.124.30.44]:60180 "EHLO out30-44.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726270AbgKRIZQ (ORCPT ); Wed, 18 Nov 2020 03:25:16 -0500 X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R761e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04420;MF=xuanzhuo@linux.alibaba.com;NM=1;PH=DS;RN=12;SR=0;TI=SMTPD_---0UFn3IGa_1605687911; Received: from localhost(mailfrom:xuanzhuo@linux.alibaba.com fp:SMTPD_---0UFn3IGa_1605687911) by smtp.aliyun-inc.com(127.0.0.1); Wed, 18 Nov 2020 16:25:11 +0800 From: Xuan Zhuo To: bjorn.topel@intel.com Cc: Magnus Karlsson , Jonathan Lemon , "David S. Miller" , Jakub Kicinski , Alexei Starovoitov , Daniel Borkmann , Jesper Dangaard Brouer , John Fastabend , netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 1/3] xsk: replace datagram_poll by sock_poll_wait Date: Wed, 18 Nov 2020 16:25:08 +0800 Message-Id: X-Mailer: git-send-email 1.8.3.1 In-Reply-To: References: <3306b4d8-8689-b0e7-3f6d-c3ad873b7093@intel.com> In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org datagram_poll will judge the current socket status (EPOLLIN, EPOLLOUT) based on the traditional socket information (eg: sk_wmem_alloc), but this does not apply to xsk. So this patch uses sock_poll_wait instead of datagram_poll, and the mask is calculated by xsk_poll. Signed-off-by: Xuan Zhuo --- net/xdp/xsk.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index cfbec39..7f0353e 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -477,11 +477,13 @@ static int xsk_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len) static __poll_t xsk_poll(struct file *file, struct socket *sock, struct poll_table_struct *wait) { - __poll_t mask = datagram_poll(file, sock, wait); + __poll_t mask = 0; struct sock *sk = sock->sk; struct xdp_sock *xs = xdp_sk(sk); struct xsk_buff_pool *pool; + sock_poll_wait(file, sock, wait); + if (unlikely(!xsk_is_bound(xs))) return mask; From patchwork Wed Nov 18 08:25:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xuan Zhuo X-Patchwork-Id: 1402040 Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4CbbW42pM6z9sVS for ; Wed, 18 Nov 2020 19:26:24 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727584AbgKRIZU (ORCPT ); Wed, 18 Nov 2020 03:25:20 -0500 Received: from out30-133.freemail.mail.aliyun.com ([115.124.30.133]:40324 "EHLO out30-133.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726274AbgKRIZT (ORCPT ); Wed, 18 Nov 2020 03:25:19 -0500 X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R441e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04426;MF=xuanzhuo@linux.alibaba.com;NM=1;PH=DS;RN=12;SR=0;TI=SMTPD_---0UFnFPS6_1605687912; Received: from localhost(mailfrom:xuanzhuo@linux.alibaba.com fp:SMTPD_---0UFnFPS6_1605687912) by smtp.aliyun-inc.com(127.0.0.1); Wed, 18 Nov 2020 16:25:12 +0800 From: Xuan Zhuo To: bjorn.topel@intel.com Cc: Magnus Karlsson , Jonathan Lemon , "David S. Miller" , Jakub Kicinski , Alexei Starovoitov , Daniel Borkmann , Jesper Dangaard Brouer , John Fastabend , netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 2/3] xsk: change the tx writeable condition Date: Wed, 18 Nov 2020 16:25:09 +0800 Message-Id: X-Mailer: git-send-email 1.8.3.1 In-Reply-To: References: <3306b4d8-8689-b0e7-3f6d-c3ad873b7093@intel.com> In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Modify the tx writeable condition from the queue is not full to the number of remaining tx queues is less than the half of the total number of queues. Because the tx queue not full is a very short time, this will cause a large number of EPOLLOUT events, and cause a large number of process wake up. Signed-off-by: Xuan Zhuo --- net/xdp/xsk.c | 20 +++++++++++++++++--- net/xdp/xsk_queue.h | 6 ++++++ 2 files changed, 23 insertions(+), 3 deletions(-) diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 7f0353e..bc3d4ece 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -211,6 +211,17 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len, return 0; } +static bool xsk_writeable(struct xdp_sock *xs) +{ + if (!xs->tx) + return false; + + if (xskq_cons_left(xs->tx) > xs->tx->nentries / 2) + return false; + + return true; +} + static bool xsk_is_bound(struct xdp_sock *xs) { if (READ_ONCE(xs->state) == XSK_BOUND) { @@ -296,7 +307,8 @@ void xsk_tx_release(struct xsk_buff_pool *pool) rcu_read_lock(); list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) { __xskq_cons_release(xs->tx); - xs->sk.sk_write_space(&xs->sk); + if (xsk_writeable(xs)) + xs->sk.sk_write_space(&xs->sk); } rcu_read_unlock(); } @@ -442,7 +454,8 @@ static int xsk_generic_xmit(struct sock *sk) out: if (sent_frame) - sk->sk_write_space(sk); + if (xsk_writeable(xs)) + sk->sk_write_space(sk); mutex_unlock(&xs->mutex); return err; @@ -499,7 +512,8 @@ static __poll_t xsk_poll(struct file *file, struct socket *sock, if (xs->rx && !xskq_prod_is_empty(xs->rx)) mask |= EPOLLIN | EPOLLRDNORM; - if (xs->tx && !xskq_cons_is_full(xs->tx)) + + if (xsk_writeable(xs)) mask |= EPOLLOUT | EPOLLWRNORM; return mask; diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h index cdb9cf3..82a5228 100644 --- a/net/xdp/xsk_queue.h +++ b/net/xdp/xsk_queue.h @@ -264,6 +264,12 @@ static inline bool xskq_cons_is_full(struct xsk_queue *q) q->nentries; } +static inline __u64 xskq_cons_left(struct xsk_queue *q) +{ + /* No barriers needed since data is not accessed */ + return READ_ONCE(q->ring->producer) - READ_ONCE(q->ring->consumer); +} + /* Functions for producers */ static inline bool xskq_prod_is_full(struct xsk_queue *q) From patchwork Wed Nov 18 08:25:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Xuan Zhuo X-Patchwork-Id: 1402038 Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4CbbW22ynpz9sT6 for ; Wed, 18 Nov 2020 19:26:22 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727527AbgKRIZS (ORCPT ); Wed, 18 Nov 2020 03:25:18 -0500 Received: from out30-133.freemail.mail.aliyun.com ([115.124.30.133]:54216 "EHLO out30-133.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726296AbgKRIZR (ORCPT ); Wed, 18 Nov 2020 03:25:17 -0500 X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R171e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e01424;MF=xuanzhuo@linux.alibaba.com;NM=1;PH=DS;RN=12;SR=0;TI=SMTPD_---0UFn7TNs_1605687913; Received: from localhost(mailfrom:xuanzhuo@linux.alibaba.com fp:SMTPD_---0UFn7TNs_1605687913) by smtp.aliyun-inc.com(127.0.0.1); Wed, 18 Nov 2020 16:25:13 +0800 From: Xuan Zhuo To: bjorn.topel@intel.com Cc: Magnus Karlsson , Jonathan Lemon , "David S. Miller" , Jakub Kicinski , Alexei Starovoitov , Daniel Borkmann , Jesper Dangaard Brouer , John Fastabend , netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH 3/3] xsk: set tx/rx the min entries Date: Wed, 18 Nov 2020 16:25:10 +0800 Message-Id: X-Mailer: git-send-email 1.8.3.1 In-Reply-To: References: <3306b4d8-8689-b0e7-3f6d-c3ad873b7093@intel.com> In-Reply-To: References: Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org We expect tx entries to be greater than twice the number of packets that the network card can send at a time, so that when the remaining number of the tx queue is less than half of the queue, it can be guaranteed that there are recycled items in the cq that can be used. At the same time, rx will not cause packet loss because it cannot receive the packets uploaded by the network card at one time. Of course, the 1024 here is only an estimated value, and the number of packets sent by each network card at a time may be different. Signed-off-by: Xuan Zhuo --- include/uapi/linux/if_xdp.h | 2 ++ net/xdp/xsk.c | 2 ++ 2 files changed, 4 insertions(+) diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h index a78a809..d55ba79 100644 --- a/include/uapi/linux/if_xdp.h +++ b/include/uapi/linux/if_xdp.h @@ -64,6 +64,8 @@ struct xdp_mmap_offsets { #define XDP_STATISTICS 7 #define XDP_OPTIONS 8 +#define XDP_RXTX_RING_MIN_ENTRIES 1024 + struct xdp_umem_reg { __u64 addr; /* Start of packet data area */ __u64 len; /* Length of packet data area */ diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index bc3d4ece..e62c795 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -831,6 +831,8 @@ static int xsk_setsockopt(struct socket *sock, int level, int optname, return -EINVAL; if (copy_from_sockptr(&entries, optval, sizeof(entries))) return -EFAULT; + if (entries < XDP_RXTX_RING_MIN_ENTRIES) + return -EINVAL; mutex_lock(&xs->mutex); if (xs->state != XSK_READY) {