From patchwork Thu Jul 14 17:51:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Finn, Emma" X-Patchwork-Id: 1656585 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.a=rsa-sha256 header.s=Intel header.b=GPrGSDgz; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.136; helo=smtp3.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4LkMXx2Ssqz9sFs for ; Fri, 15 Jul 2022 03:53:21 +1000 (AEST) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 79F5B6166E; Thu, 14 Jul 2022 17:53:19 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp3.osuosl.org 79F5B6166E Authentication-Results: smtp3.osuosl.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=intel.com header.i=@intel.com header.a=rsa-sha256 header.s=Intel header.b=GPrGSDgz X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id XGeNR22OGCUj; Thu, 14 Jul 2022 17:53:17 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp3.osuosl.org (Postfix) with ESMTPS id 532C261662; Thu, 14 Jul 2022 17:53:15 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp3.osuosl.org 532C261662 Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id E7549C007E; Thu, 14 Jul 2022 17:53:12 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp2.osuosl.org (smtp2.osuosl.org [140.211.166.133]) by lists.linuxfoundation.org (Postfix) with ESMTP id 96282C0035 for ; Thu, 14 Jul 2022 17:53:10 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id A0C4F41896 for ; Thu, 14 Jul 2022 17:52:39 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 smtp2.osuosl.org A0C4F41896 Authentication-Results: smtp2.osuosl.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.a=rsa-sha256 header.s=Intel header.b=GPrGSDgz X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp2.osuosl.org ([127.0.0.1]) by localhost (smtp2.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id NWkBCohTSkn4 for ; Thu, 14 Jul 2022 17:52:36 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 DKIM-Filter: OpenDKIM Filter v2.11.0 smtp2.osuosl.org CAE534189C Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by smtp2.osuosl.org (Postfix) with ESMTPS id CAE534189C for ; Thu, 14 Jul 2022 17:52:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1657821153; x=1689357153; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=GnpZyZV9e6XOTadjX+ILhKEYNGDBQKyd6M66PdIB6DI=; b=GPrGSDgzHLklSOZZKjguS/q3fBX63u7A6GbwJaE++qEBtjL52EIbP0X5 KUaBAi+7RdBOyqPE5ZeTr13Epu1HPisjAC9XikH+pSiz3hgoQCk+zfObr ops2pQOEickBJbq/Dd7GB8gyJcPFkNyWroH2C9zuS/nled5eoMgZgNCfy s1drFWK8ZGL9HlCT96drYbaNUhf4dPIjXhEa8dSBW7h5fsNjapNrzdjeN ooVaJpMiy/q2MsFDqt+pC/WIuffoN0DRImprhUtATAHzwo9Zm0TMXAq3k rqAIVyboql0Sh50pZEAa7yrCT/w3GAL2DfT3LvGg3vsQ/xpxe19HTJa0E Q==; X-IronPort-AV: E=McAfee;i="6400,9594,10408"; a="265380380" X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="265380380" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Jul 2022 10:52:21 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.92,272,1650956400"; d="scan'208";a="842232209" Received: from silpixa00401384.ir.intel.com ([10.243.22.75]) by fmsmga006.fm.intel.com with ESMTP; 14 Jul 2022 10:52:19 -0700 From: Emma Finn To: dev@openvswitch.org, echaudro@redhat.com, harry.van.haaren@intel.com, kumar.amber@intel.com Date: Thu, 14 Jul 2022 17:51:55 +0000 Message-Id: <20220714175158.3709150-8-emma.finn@intel.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220714175158.3709150-1-emma.finn@intel.com> References: <20220713182807.3416578-1-harry.van.haaren@intel.com> <20220714175158.3709150-1-emma.finn@intel.com> MIME-Version: 1.0 Cc: i.maximets@ovn.org Subject: [ovs-dev] [v11 07/10] odp-execute: Add ISA implementation of pop_vlan action. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Harry van Haaren This commit adds the AVX512 implementation of the pop_vlan action. Signed-off-by: Emma Finn --- lib/automake.mk | 4 + lib/odp-execute-avx512.c | 186 ++++++++++++++++++++++++++++++++++++++ lib/odp-execute-private.c | 32 ++++++- lib/odp-execute-private.h | 4 + 4 files changed, 225 insertions(+), 1 deletion(-) create mode 100644 lib/odp-execute-avx512.c diff --git a/lib/automake.mk b/lib/automake.mk index 5c3b05f6b..a76de6dbf 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -45,6 +45,10 @@ lib_libopenvswitchavx512_la_CFLAGS += \ lib_libopenvswitchavx512_la_SOURCES += \ lib/dpif-netdev-extract-avx512.c \ lib/dpif-netdev-lookup-avx512-gather.c +if HAVE_GCC_AVX512VL_GOOD +lib_libopenvswitchavx512_la_SOURCES += \ + lib/odp-execute-avx512.c +endif # HAVE_GCC_AVX512VL_GOOD endif # HAVE_AVX512VL endif # HAVE_AVX512BW lib_libopenvswitchavx512_la_LDFLAGS = \ diff --git a/lib/odp-execute-avx512.c b/lib/odp-execute-avx512.c new file mode 100644 index 000000000..d929abe68 --- /dev/null +++ b/lib/odp-execute-avx512.c @@ -0,0 +1,186 @@ +/* + * Copyright (c) 2022 Intel. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifdef __x86_64__ +/* Sparse cannot handle the AVX512 instructions. */ +#if !defined(__CHECKER__) + +#include +#include + +#include "dp-packet.h" +#include "immintrin.h" +#include "odp-execute-private.h" +#include "odp-netlink.h" +#include "openvswitch/vlog.h" + +VLOG_DEFINE_THIS_MODULE(odp_execute_avx512); + +/* The below three build asserts make sure that l2_5_ofs, l3_ofs, and l4_ofs + * fields remain in the same order and offset to l2_padd_size. This is needed + * as the avx512_dp_packet_resize_l2() function will manipulate those fields at + * a fixed memory index based on the l2_padd_size offset. */ +BUILD_ASSERT_DECL(offsetof(struct dp_packet, l2_pad_size) + + MEMBER_SIZEOF(struct dp_packet, l2_pad_size) == + offsetof(struct dp_packet, l2_5_ofs)); + +BUILD_ASSERT_DECL(offsetof(struct dp_packet, l2_5_ofs) + + MEMBER_SIZEOF(struct dp_packet, l2_5_ofs) == + offsetof(struct dp_packet, l3_ofs)); + +BUILD_ASSERT_DECL(offsetof(struct dp_packet, l3_ofs) + + MEMBER_SIZEOF(struct dp_packet, l3_ofs) == + offsetof(struct dp_packet, l4_ofs)); + +/* The below build assert makes sure it's safe to read/write 128-bits starting + * at the l2_pad_size location. */ +BUILD_ASSERT_DECL(sizeof(struct dp_packet) - + offsetof(struct dp_packet, l2_pad_size) >= sizeof(__m128i)); + +static inline void ALWAYS_INLINE +avx512_dp_packet_resize_l2(struct dp_packet *b, int resize_by_bytes) +{ + /* Update packet size/data pointers, same as the scalar implementation. */ + if (resize_by_bytes >= 0) { + dp_packet_push_uninit(b, resize_by_bytes); + } else { + dp_packet_pull(b, -resize_by_bytes); + } + + /* The next step is to update the l2_5_ofs, l3_ofs and l4_ofs fields which + * the scalar implementation does with the dp_packet_adjust_layer_offset() + * function. */ + + /* Set the v_zero register to all zero's. */ + const __m128i v_zeros = _mm_setzero_si128(); + + /* Set the v_u16_max register to all one's. */ + const __m128i v_u16_max = _mm_cmpeq_epi16(v_zeros, v_zeros); + + /* Each lane represents 16 bits in a 12-bit register. In this case the + * first three 16-bit values, which will map to the l2_5_ofs, l3_ofs and + * l4_ofs fields. */ + const uint8_t k_lanes = 0b1110; + + /* Set all 16-bit words in the 128-bits v_offset register to the value we + * need to add/substract from the l2_5_ofs, l3_ofs, and l4_ofs fields. */ + __m128i v_offset = _mm_set1_epi16(abs(resize_by_bytes)); + + /* Load 128 bits from the dp_packet structure starting at the l2_pad_size + * offset. */ + void *adjust_ptr = &b->l2_pad_size; + __m128i v_adjust_src = _mm_loadu_si128(adjust_ptr); + + /* Here is the tricky part, we only need to update the value of the three + * fields if they are not UINT16_MAX. The following function will return + * a mask of lanes (read fields) that are not UINT16_MAX. It will do this + * by comparing only the lanes we requested, k_lanes, and if they match + * v_u16_max, the bit will be set. */ + __mmask8 k_cmp = _mm_mask_cmpneq_epu16_mask(k_lanes, v_adjust_src, + v_u16_max); + + /* Based on the bytes adjust (positive, or negative) it will do the actual + * add or subtraction. These functions will only operate on the lanes + * (fields) requested based on k_cmp, i.e: + * k_cmp = [l2_5_ofs, l3_ofs, l4_ofs] + * for field in kcmp + * v_adjust_src[field] = v_adjust_src[field] + v_offset + */ + __m128i v_adjust_wip; + + if (resize_by_bytes >= 0) { + v_adjust_wip = _mm_mask_add_epi16(v_adjust_src, k_cmp, + v_adjust_src, v_offset); + } else { + v_adjust_wip = _mm_mask_sub_epi16(v_adjust_src, k_cmp, + v_adjust_src, v_offset); + } + + /* Here we write back the full 128-bits. */ + _mm_storeu_si128(adjust_ptr, v_adjust_wip); +} + +/* This function performs the same operation on each packet in the batch as + * the scalar eth_pop_vlan() function. */ +static void +action_avx512_pop_vlan(struct dp_packet_batch *batch, + const struct nlattr *a OVS_UNUSED) +{ + struct dp_packet *packet; + + /* Set the v_zero register to all zero's. */ + const __m128i v_zeros = _mm_setzero_si128(); + + DP_PACKET_BATCH_FOR_EACH (i, packet, batch) { + struct vlan_eth_header *veh = dp_packet_eth(packet); + + if (veh && dp_packet_size(packet) >= sizeof *veh && + eth_type_vlan(veh->veth_type)) { + + /* Load the first 128-bits of l2 header into the v_ether register. + * This result in the veth_dst/src and veth_type/tci of the + * vlan_eth_header structure to be loaded. */ + __m128i v_ether = _mm_loadu_si128((void *) veh); + + /* This creates a 256-bit value containing the first four fields + * of the vlan_eth_header plus 128 zero-bit. The result will be the + * lowest 128-bits after the right shift, hence we shift the data + * 128(zero)-bits minus the VLAN_HEADER_LEN, so we are left with + * only the veth_dst and veth_src fields. */ + __m128i v_realign = _mm_alignr_epi8(v_ether, v_zeros, + sizeof(__m128i) - + VLAN_HEADER_LEN); + + /* Write back the modified ethernet header. */ + _mm_storeu_si128((void *) veh, v_realign); + + /* As we removed the VLAN_HEADER we now need to adjust all the + * offsets. */ + avx512_dp_packet_resize_l2(packet, -VLAN_HEADER_LEN); + } + } +} + +int +action_avx512_init(struct odp_execute_action_impl *self OVS_UNUSED) +{ + if (!action_avx512_isa_probe()) { + return -ENOTSUP; + } + + /* Set function pointers for actions that can be applied directly, these + * are identified by OVS_ACTION_ATTR_*. */ + self->funcs[OVS_ACTION_ATTR_POP_VLAN] = action_avx512_pop_vlan; + return 0; +} + +#endif /* Sparse */ + +#else /* __x86_64__ */ + +#include +#include "odp-execute-private.h" +/* Function itself is required to be called, even in e.g. 32-bit builds. + * This dummy init function ensures 32-bit builds succeed too. + */ + +int +action_avx512_init(struct odp_execute_action_impl *self OVS_UNUSED) +{ + return -ENOTSUP; +} + +#endif diff --git a/lib/odp-execute-private.c b/lib/odp-execute-private.c index feccdaa43..265e3205f 100644 --- a/lib/odp-execute-private.c +++ b/lib/odp-execute-private.c @@ -19,6 +19,7 @@ #include #include +#include "cpu.h" #include "dpdk.h" #include "dp-packet.h" #include "odp-execute-private.h" @@ -29,6 +30,35 @@ VLOG_DEFINE_THIS_MODULE(odp_execute_impl); static int active_action_impl_index; +#if ACTION_IMPL_AVX512_CHECK +/* Probe functions to check ISA requirements. */ +bool +action_avx512_isa_probe(void) +{ + static enum ovs_cpu_isa isa_required[] = { + OVS_CPU_ISA_X86_AVX512F, + OVS_CPU_ISA_X86_AVX512BW, + OVS_CPU_ISA_X86_BMI2, + OVS_CPU_ISA_X86_AVX512VL, + }; + for (int i = 0; i < ARRAY_SIZE(isa_required); i++) { + if (!cpu_has_isa(isa_required[i])) { + return false; + } + } + return true; +} + +#else + +bool +action_avx512_isa_probe(void) +{ + return false; +} + +#endif + static struct odp_execute_action_impl action_impls[] = { [ACTION_IMPL_AUTOVALIDATOR] = { .available = false, @@ -46,7 +76,7 @@ static struct odp_execute_action_impl action_impls[] = { [ACTION_IMPL_AVX512] = { .available = false, .name = "avx512", - .init_func = NULL, + .init_func = action_avx512_init, }, #endif }; diff --git a/lib/odp-execute-private.h b/lib/odp-execute-private.h index dc01a3f9b..5c0c5a25f 100644 --- a/lib/odp-execute-private.h +++ b/lib/odp-execute-private.h @@ -77,6 +77,8 @@ BUILD_ASSERT_DECL(ACTION_IMPL_AUTOVALIDATOR == 1); #define ACTION_IMPL_BEGIN (ACTION_IMPL_AUTOVALIDATOR + 1) +bool action_avx512_isa_probe(void); + /* Odp execute init handles setting up the state of the actions functions at * initialization time. It cannot return errors, as it must always succeed in * initializing the scalar/generic codepath. */ @@ -90,6 +92,8 @@ struct odp_execute_action_impl * odp_execute_action_set(const char *name); int action_autoval_init(struct odp_execute_action_impl *self); +int action_avx512_init(struct odp_execute_action_impl *self); + void odp_execute_action_get_info(struct ds *name); #endif /* ODP_EXTRACT_PRIVATE */