From patchwork Sat Jul 11 10:10:38 2020
X-Patchwork-Submitter: Yahui Chen
X-Patchwork-Id: 1327264
X-Patchwork-Delegate: bpf@iogearbox.net
From: Yahui Chen
Miller" , Jakub Kicinski , Alexei Starovoitov , Daniel Borkmann , Jesper Dangaard Brouer , John Fastabend , intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org Cc: steven.zou@intel.com, Yahui Chen Subject: [PATCH] xsk: ixgbe: solve the soft interrupt 100% CPU usage when xdp rx traffic congestion Date: Sat, 11 Jul 2020 18:10:38 +0800 Message-Id: <1594462239-19596-1-git-send-email-goodluckwillcomesoon@gmail.com> X-Mailer: git-send-email 1.8.3.1 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org If the user mode APP of drv-mode xsk does not process packets fast enough, the packet receiving speed is lower than the flow rate. For example, the flow rate is 1Mpps, and the APP processing speed is 10Kpss only, this will cause a series of problems: 1. If the APP doesn't use wakeup mechanism, in other world, the `XDP_UMEM_USES_NEED_WAKEUP` flag is not set. Then the CPU usage of the soft interrupt corresponding to the network card queue using xsk has always been 100%. The reason is that the APP is slow to receive packets, the packet desc can not return to the fill queue in time, resulting in `ixgbe_alloc_rx_buffers_zc` fail to obtain the desc, so `ixgbe_clean_rx_irq_zc` returns 64, which further causes its corresponding NAPI to be always running status, so the CPU has been busy performing soft interrupts. 2. If the wakeup mechanism is used, that is, use the `XDP_UMEM_USES_NEED_WAKEUP` flag. This method takes advantage of the interrupt delay function of ixgbe skillfully, thus solving the problem that the si CPU is always 100%. However, it will cause other problems. The port-level flow control will be triggered on 82599, and the pause frame will be sent to the upstream sender. This will affect the other packet receiving queues of the network card, resulting in the packet receiving rate of all queues dropping to 10Kpps. This situation can be understood as rx traffic congestion. The reason of congestion is that XSK queue cannot get DMA address in time, so packet cannot be delivered to the host memory which result in packets congestion happening on the hardware FIFO. If the number of packets exceeds the FIFO waterline, the network card will send pause frames. This patch design another way. When the queue corresponding to xsk is initializing, a piece of DMA memory, named congestion memory, is applied for traffic congestion. If failing to obtain the desc from the fill queue, point the xsk ring packet DMA address to the congestion memory. Let the network card hardware DMA the packet to the congetion memory, and the APP will not receive packet from the congestion memory. This way can solve the two problems mentioned above. However, the wakeup mechanism should also be retained. When the network card has only one queue, the wakeup mechanism will have advantages. 
TODO: Apply this modification to all drivers that support xsk drv mode.

Signed-off-by: Yahui Chen
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h     |  3 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 61 ++++++++++++++++++++++++----
 2 files changed, 56 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 5ddfc83..33601c1 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -351,6 +351,9 @@ struct ixgbe_ring {
 	};
 	struct xdp_rxq_info xdp_rxq;
 	struct xdp_umem *xsk_umem;
+#define XDP_CONGESTION_MEM_SIZE (4096)
+	void *congestion_addr;     /* address of buffer when traffic congestion */
+	dma_addr_t congestion_dma; /* dma address of packet when traffic congestion */
 	u16 ring_idx;		/* {rx,tx,xdp}_ring back reference idx */
 	u16 rx_buf_len;
 } ____cacheline_internodealigned_in_smp;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index be9d2a8..f4a8237 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -27,6 +27,8 @@ static int ixgbe_xsk_umem_enable(struct ixgbe_adapter *adapter,
 	struct net_device *netdev = adapter->netdev;
 	bool if_running;
 	int err;
+	int size = XDP_CONGESTION_MEM_SIZE;
+	struct ixgbe_ring *rx_ring;
 
 	if (qid >= adapter->num_rx_queues)
 		return -EINVAL;
@@ -49,6 +51,21 @@ static int ixgbe_xsk_umem_enable(struct ixgbe_adapter *adapter,
 
 	if (if_running) {
 		ixgbe_txrx_ring_enable(adapter, qid);
+		rx_ring = adapter->rx_ring[qid];
+		rx_ring->congestion_addr = kmalloc(size, GFP_DMA | GFP_KERNEL);
+		if (!rx_ring->congestion_addr) {
+			rx_ring->congestion_addr = kmalloc(size, GFP_DMA | GFP_KERNEL);
+			if (!rx_ring->congestion_addr)
+				return -ENOMEM;
+		}
+		rx_ring->congestion_dma = dma_map_single(rx_ring->dev, rx_ring->congestion_addr,
+							 size,
+							 DMA_FROM_DEVICE);
+		/* sync the buffer for use by the device */
+		dma_sync_single_range_for_device(rx_ring->dev, rx_ring->congestion_dma,
+						 0,
+						 size,
+						 DMA_FROM_DEVICE);
 
 		/* Kick start the NAPI context so that receiving will start */
 		err = ixgbe_xsk_wakeup(adapter->netdev, qid, XDP_WAKEUP_RX);
@@ -63,6 +80,8 @@ static int ixgbe_xsk_umem_disable(struct ixgbe_adapter *adapter, u16 qid)
 {
 	struct xdp_umem *umem;
 	bool if_running;
+	int size = XDP_CONGESTION_MEM_SIZE;
+	struct ixgbe_ring *rx_ring;
 
 	umem = xdp_get_umem_from_qid(adapter->netdev, qid);
 	if (!umem)
@@ -71,9 +90,18 @@ static int ixgbe_xsk_umem_disable(struct ixgbe_adapter *adapter, u16 qid)
 	if_running = netif_running(adapter->netdev) &&
 		     ixgbe_enabled_xdp_adapter(adapter);
 
-	if (if_running)
+	if (if_running) {
 		ixgbe_txrx_ring_disable(adapter, qid);
+		rx_ring = adapter->rx_ring[qid];
+		dma_sync_single_range_for_cpu(rx_ring->dev, rx_ring->congestion_dma,
+					      0,
+					      size,
+					      DMA_FROM_DEVICE);
+		dma_unmap_single(rx_ring->dev, rx_ring->congestion_dma, size, DMA_FROM_DEVICE);
+		kfree(rx_ring->congestion_addr);
+	}
+
 	clear_bit(qid, adapter->af_xdp_zc_qps);
 	xsk_buff_dma_unmap(umem, IXGBE_RX_DMA_ATTR);
@@ -152,10 +180,12 @@ bool ixgbe_alloc_rx_buffers_zc(struct ixgbe_ring *rx_ring, u16 count)
 		bi->xdp = xsk_buff_alloc(rx_ring->xsk_umem);
 		if (!bi->xdp) {
 			ok = false;
-			break;
+			/* rx traffic congestion, we do not use umem DMA address */
+			dma = rx_ring->congestion_dma;
+		} else {
+			dma = xsk_buff_xdp_get_dma(bi->xdp);
 		}
 
-		dma = xsk_buff_xdp_get_dma(bi->xdp);
 
 		/* Refresh the desc even if buffer_addrs didn't change
 		 * because each write-back erases this info.
@@ -231,14 +261,15 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
 			  struct ixgbe_ring *rx_ring,
 			  const int budget)
 {
-	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
+	unsigned int total_rx_bytes = 0, total_rx_packets = 0,
+		     handled_bytes = 0, handled_packets = 0;
 	struct ixgbe_adapter *adapter = q_vector->adapter;
 	u16 cleaned_count = ixgbe_desc_unused(rx_ring);
 	unsigned int xdp_res, xdp_xmit = 0;
 	bool failure = false;
 	struct sk_buff *skb;
 
-	while (likely(total_rx_packets < budget)) {
+	while (likely(handled_packets < budget)) {
 		union ixgbe_adv_rx_desc *rx_desc;
 		struct ixgbe_rx_buffer *bi;
 		unsigned int size;
@@ -263,6 +294,16 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
 		dma_rmb();
 
 		bi = &rx_ring->rx_buffer_info[rx_ring->next_to_clean];
+
+		/* rx traffic congestion, ignore this packet */
+		if (unlikely(!bi->xdp)) {
+			handled_packets++;
+			handled_bytes += size;
+
+			cleaned_count++;
+			ixgbe_inc_ntc(rx_ring);
+			continue;
+		}
 
 		if (unlikely(!ixgbe_test_staterr(rx_desc,
 						 IXGBE_RXD_STAT_EOP))) {
@@ -296,6 +337,8 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
 				xsk_buff_free(bi->xdp);
 
 			bi->xdp = NULL;
+			handled_packets++;
+			handled_bytes += size;
 			total_rx_packets++;
 			total_rx_bytes += size;
 
@@ -317,6 +360,8 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
 		if (eth_skb_pad(skb))
 			continue;
 
+		handled_packets++;
+		handled_bytes += size;
 		total_rx_bytes += skb->len;
 		total_rx_packets++;
 
@@ -341,8 +386,8 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
 	rx_ring->stats.packets += total_rx_packets;
 	rx_ring->stats.bytes += total_rx_bytes;
 	u64_stats_update_end(&rx_ring->syncp);
-	q_vector->rx.total_packets += total_rx_packets;
-	q_vector->rx.total_bytes += total_rx_bytes;
+	q_vector->rx.total_packets += handled_packets;
+	q_vector->rx.total_bytes += handled_bytes;
 
 	if (xsk_umem_uses_need_wakeup(rx_ring->xsk_umem)) {
 		if (failure || rx_ring->next_to_clean == rx_ring->next_to_use)
@@ -352,7 +397,7 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
 		return (int)total_rx_packets;
 	}
 
-	return failure ? budget : (int)total_rx_packets;
+	return (int)handled_packets;
 }
 
 void ixgbe_xsk_clean_rx_ring(struct ixgbe_ring *rx_ring)