From patchwork Thu Mar 4 18:27:05 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: William Tu
X-Patchwork-Id: 1447471
From: William Tu
To: dev@openvswitch.org
Date: Thu, 4 Mar 2021 10:27:05 -0800
Message-Id: <1614882425-52800-1-git-send-email-u9012063@gmail.com>
X-Mailer: git-send-email 2.7.4
Cc: brouer@redhat.com, i.maximets@ovn.org, vedang.patel@intel.com,
 dsahern@gmail.com, bjorn.topel@intel.com, saeedm@nvidia.com,
 magnus.karlsson@intel.com
Subject: [ovs-dev] [PATCH] RFC: netdev-afxdp: Support for XDP metadata HW hints.

One big problem with netdev-afxdp is that it gets no metadata from the
hardware at all.  For example, OVS netdev-afxdp has to compute the
rxhash or the TCP checksum in software, which adds significant
per-packet overhead.

A generic metadata type for XDP frames, described using BTF, has been
proposed[1], and there is a sample implementation[2][3].  This patch
experiments with enabling XDP metadata, also called HW hints, and shows
the potential performance improvement.

The patch uses only the rxhash value provided by the HW, which avoids
the software hash calculation in lib/dpif-netdev.c:

    if (!dp_packet_rss_valid(execute->packet)) {
        dp_packet_set_rss_hash(execute->packet,
                               flow_hash_5tuple(execute->flow, 0));
    }

Using '$ ovs-appctl dpif-netdev/pmd-stats-show', the 'avg processing
cycles per packet' drops from 402 to 272, about 32% fewer cycles.
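The fallback described above can be sketched in isolation as follows.
This is only an illustration under stated assumptions: struct pkt_meta,
struct five_tuple, set_hw_hash() and ensure_rss_hash() are hypothetical
stand-ins for the RSS fields of struct dp_packet and for struct flow,
and the toy FNV-1a hash is not OVS's flow_hash_5tuple().  It shows that
once a HW-provided hash is marked valid, the software hash is skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 5-tuple fields of OVS's struct flow. */
struct five_tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t proto;
};

/* Hypothetical stand-in for the RSS state kept in struct dp_packet. */
struct pkt_meta {
    bool rss_valid;
    uint32_t rss_hash;
};

/* Folds 'n' bytes into an FNV-1a hash state. */
static uint32_t
fnv1a_bytes(uint32_t h, const void *data, size_t n)
{
    const unsigned char *p = data;

    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Toy software hash, hashed field by field so that struct padding is
 * never read.  An assumption for illustration, NOT flow_hash_5tuple(). */
static uint32_t
sw_hash_5tuple(const struct five_tuple *t)
{
    uint32_t h = 2166136261u;

    h = fnv1a_bytes(h, &t->src_ip, sizeof t->src_ip);
    h = fnv1a_bytes(h, &t->dst_ip, sizeof t->dst_ip);
    h = fnv1a_bytes(h, &t->src_port, sizeof t->src_port);
    h = fnv1a_bytes(h, &t->dst_port, sizeof t->dst_port);
    h = fnv1a_bytes(h, &t->proto, sizeof t->proto);
    return h;
}

/* Records a hash delivered by the hardware and marks it valid. */
static void
set_hw_hash(struct pkt_meta *m, uint32_t hw_hash)
{
    m->rss_hash = hw_hash;
    m->rss_valid = true;
}

/* Computes the hash in software only when no valid hash is present.
 * Returns true if the software path was taken. */
static bool
ensure_rss_hash(struct pkt_meta *m, const struct five_tuple *t)
{
    assert(m && t);
    if (m->rss_valid) {
        return false;           /* HW hint present: skip the SW hash. */
    }
    m->rss_hash = sw_hash_5tuple(t);
    m->rss_valid = true;
    return true;
}
```

Calling set_hw_hash() on receive and ensure_rss_hash() later in the
datapath mirrors the split in this patch: the receive path fills in the
hash, so the later validity check never takes the software branch.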
More details below.

Reference:
----------
[1] https://www.kernel.org/doc/html/latest/bpf/btf.html
[2] https://netdevconf.info/0x14/pub/slides/54/[1]%20XDP%20meta%20data%20acceleration.pdf
[3] https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/log/?h=topic/xdp_metadata4

Testbed:
--------
Two Xeon E5-2620 v3 2.4GHz machines connected back-to-back using
Mellanox ConnectX-6Dx 25GbE NICs.  Before starting OVS, enable the
metadata (MD) with:

$ bpftool net xdp show
xdp:
enp2s0f0np0(4) md_btf_id(1) md_btf_enabled(0)
enp2s0f1np1(5) md_btf_id(2) md_btf_enabled(0)
$ bpftool net xdp set dev enp2s0f0np0 md_btf on
$ bpftool net xdp
xdp:
enp2s0f0np0(4) md_btf_id(1) md_btf_enabled(1)

Limitations/TODO:
-----------------
1. Only AF_XDP native mode is supported, not zero-copy mode.
2. Currently only three fields are available: vlan, hash, and
   flow_mark, and only the receive side supports XDP metadata.
3. The control plane (how to enable and probe the metadata structure)
   is not upstream yet.

OVS rxdrop without HW hints:
----------------------------
Drop rate: 4.8Mpps

pmd thread numa_id 0 core_id 3:
  packets received: 196592006
  packet recirculations: 0
  avg. datapath passes per packet: 1.00
  emc hits: 196592006
  smc hits: 0
  megaflow hits: 0
  avg. subtable lookups per megaflow hit: 0.00
  miss with success upcall: 0
  miss with failed upcall: 0
  avg. packets per output batch: 0.00
  idle cycles: 56009063835 (41.43%)
  processing cycles: 79164971931 (58.57%)
  avg cycles per packet: 687.59 (135174035766/196592006)
  avg processing cycles per packet: 402.69 (79164971931/196592006)

pmd thread numa_id 0 core_id 3:

  Iterations:           339607649  (0.23 us/it)
  - Used TSC cycles: 188620512777  ( 99.9 % of total cycles)
  - idle iterations:    330697002  ( 40.3 % of used cycles)
  - busy iterations:      8910647  ( 59.7 % of used cycles)
  Rx packets:           285140031  (3624 Kpps, 395 cycles/pkt)
  Datapath passes:      285140031  (1.00 passes/pkt)
  - EMC hits:           285139999  (100.0 %)
  - SMC hits:                   0  (  0.0 %)
  - Megaflow hits:              0  (  0.0 %, 0.00 subtbl lookups/hit)
  - Upcalls:                    0  (  0.0 %, 0.0 us/upcall)
  - Lost upcalls:               0  (  0.0 %)
  Tx packets:                   0

Perf report:
  17.56%  pmd-c03/id:11  ovs-vswitchd     [.] netdev_afxdp_rxq_recv
  14.39%  pmd-c03/id:11  ovs-vswitchd     [.] dp_netdev_process_rxq_port
  14.17%  pmd-c03/id:11  ovs-vswitchd     [.] pmd_thread_main
  10.86%  pmd-c03/id:11  [vdso]           [.] __vdso_clock_gettime
  10.19%  pmd-c03/id:11  ovs-vswitchd     [.] pmd_perf_end_iteration
   7.71%  pmd-c03/id:11  ovs-vswitchd     [.] time_timespec__
   5.64%  pmd-c03/id:11  ovs-vswitchd     [.] time_usec
   3.88%  pmd-c03/id:11  ovs-vswitchd     [.] netdev_get_class
   2.95%  pmd-c03/id:11  ovs-vswitchd     [.] netdev_rxq_recv
   2.78%  pmd-c03/id:11  libbpf.so.0.2.0  [.] xsk_socket__fd
   2.74%  pmd-c03/id:11  ovs-vswitchd     [.] pmd_perf_start_iteration
   2.11%  pmd-c03/id:11  libc-2.27.so     [.] __clock_gettime
   1.32%  pmd-c03/id:11  ovs-vswitchd     [.] xsk_socket__fd@plt

OVS rxdrop with HW hints:
-------------------------
rxdrop rate: 4.73Mpps

pmd thread numa_id 0 core_id 7:
  packets received: 13686880
  packet recirculations: 0
  avg. datapath passes per packet: 1.00
  emc hits: 13686880
  smc hits: 0
  megaflow hits: 0
  avg. subtable lookups per megaflow hit: 0.00
  miss with success upcall: 0
  miss with failed upcall: 0
  avg. packets per output batch: 0.00
  idle cycles: 3182105544 (46.02%)
  processing cycles: 3732023844 (53.98%)
  avg cycles per packet: 505.16 (6914129388/13686880)
  avg processing cycles per packet: 272.67 (3732023844/13686880)

pmd thread numa_id 0 core_id 7:

  Iterations:           392909539  (0.18 us/it)
  - Used TSC cycles: 167697342678  ( 99.9 % of total cycles)
  - idle iterations:    382539861  ( 46.0 % of used cycles)
  - busy iterations:     10369678  ( 54.0 % of used cycles)
  Rx packets:           331829656  (4743 Kpps, 273 cycles/pkt)
  Datapath passes:      331829656  (1.00 passes/pkt)
  - EMC hits:           331829656  (100.0 %)
  - SMC hits:                   0  (  0.0 %)
  - Megaflow hits:              0  (  0.0 %, 0.00 subtbl lookups/hit)
  - Upcalls:                    0  (  0.0 %, 0.0 us/upcall)
  - Lost upcalls:               0  (  0.0 %)
  Tx packets:                   0

Perf record/report:
  22.96%  pmd-c07/id:8  ovs-vswitchd  [.] netdev_afxdp_rxq_recv
  10.43%  pmd-c07/id:8  ovs-vswitchd  [.] miniflow_extract
   7.20%  pmd-c07/id:8  ovs-vswitchd  [.] dp_packet_init__
   7.00%  pmd-c07/id:8  ovs-vswitchd  [.] dp_netdev_input__
   6.79%  pmd-c07/id:8  ovs-vswitchd  [.] dp_netdev_process_rxq_port
   6.62%  pmd-c07/id:8  ovs-vswitchd  [.] pmd_thread_main
   5.65%  pmd-c07/id:8  ovs-vswitchd  [.] pmd_perf_end_iteration
   5.04%  pmd-c07/id:8  [vdso]        [.] __vdso_clock_gettime
   3.60%  pmd-c07/id:8  ovs-vswitchd  [.] time_timespec__
   3.10%  pmd-c07/id:8  ovs-vswitchd  [.] umem_elem_push
   2.74%  pmd-c07/id:8  libc-2.27.so  [.] __memcmp_avx2_movbe
   2.62%  pmd-c07/id:8  ovs-vswitchd  [.] time_usec
   2.14%  pmd-c07/id:8  ovs-vswitchd  [.] dp_packet_use_afxdp
   1.58%  pmd-c07/id:8  ovs-vswitchd  [.] netdev_rxq_recv
   1.47%  pmd-c07/id:8  ovs-vswitchd  [.] netdev_get_class
   1.34%  pmd-c07/id:8  ovs-vswitchd  [.] pmd_perf_start_iteration

Signed-off-by: William Tu
---
 lib/netdev-afxdp.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
index 482400d8d135..49881a8cc0cb 100644
--- a/lib/netdev-afxdp.c
+++ b/lib/netdev-afxdp.c
@@ -169,6 +169,17 @@ struct netdev_afxdp_tx_lock {
     );
 };
 
+/* FIXME:
+ * This should be done dynamically by query the device's
+ * XDP metadata structure. Ex:
+ * $ bpftool net xdp md_btf cstyle dev enp2s0f0np0
+ */
+struct xdp_md_desc {
+    uint32_t flow_mark;
+    uint32_t hash32;
+    uint16_t vlan;
+};
+
 #ifdef HAVE_XDP_NEED_WAKEUP
 static inline void
 xsk_rx_wakeup_if_needed(struct xsk_umem_info *umem,
@@ -849,6 +860,7 @@ netdev_afxdp_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch,
         struct dp_packet_afxdp *xpacket;
         const struct xdp_desc *desc;
         struct dp_packet *packet;
+        struct xdp_md_desc *md;
         uint64_t addr, index;
         uint32_t len;
         char *pkt;
@@ -858,6 +870,7 @@ netdev_afxdp_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch,
         len = desc->len;
 
         pkt = xsk_umem__get_data(umem->buffer, addr);
+        md = pkt - sizeof *md;
         index = addr >> FRAME_SHIFT;
         xpacket = &umem->xpool.array[index];
         packet = &xpacket->packet;
@@ -868,6 +881,12 @@ netdev_afxdp_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch,
                             OVS_XDP_HEADROOM);
         dp_packet_set_size(packet, len);
 
+        /* FIXME: This should be done by detecting whether
+         * XDP MD is enabled or not. Ex:
+         * $ bpftool net xdp set dev enp2s0f0np0 md_btf on
+         */
+        dp_packet_set_rss_hash(packet, md->hash32);
+
         /* Add packet into batch, increase batch->count. */
         dp_packet_batch_add(batch, packet);
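
The receive hunk reads the metadata from the bytes immediately in front
of the packet data.  A minimal standalone sketch of that access pattern,
with the guards the second FIXME asks for, might look like the code
below.  It makes several assumptions: read_xdp_md(), its 'headroom'
argument, and the 'md_enabled' flag are hypothetical and not part of
OVS or libbpf, and struct xdp_md_desc simply repeats the hard-coded
layout from this patch (the real layout is device-specific and would
have to be queried).  The sketch copies with memcpy() rather than
casting the 'char *', which sidesteps alignment and pointer-type
concerns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same hard-coded layout as in the patch; an assumption, since the
 * real layout depends on the device and its BTF description. */
struct xdp_md_desc {
    uint32_t flow_mark;
    uint32_t hash32;
    uint16_t vlan;
};

/* Copies the metadata stored directly in front of 'pkt' into 'md'.
 *
 * 'headroom' is the number of bytes available in the frame before
 * 'pkt'.  'md_enabled' is a hypothetical flag standing in for whatever
 * mechanism eventually reports that the device writes metadata.
 *
 * Returns false, leaving 'md' untouched, if metadata is disabled or
 * the headroom is too small to hold the descriptor. */
static bool
read_xdp_md(const char *pkt, size_t headroom, bool md_enabled,
            struct xdp_md_desc *md)
{
    assert(pkt && md);
    if (!md_enabled || headroom < sizeof *md) {
        return false;
    }
    memcpy(md, pkt - sizeof *md, sizeof *md);
    return true;
}
```

With a helper of this shape, the receive loop would call
dp_packet_set_rss_hash() only when the read succeeds and otherwise
leave the hash unset, so that the existing software fallback in
lib/dpif-netdev.c still applies.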