From patchwork Mon Jan 31 13:54:53 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Van Haaren, Harry"
X-Patchwork-Id: 1586798
From: Harry van Haaren
To: ovs-dev@openvswitch.org
Date: Mon, 31 Jan 2022 13:54:53 +0000
Message-Id:
 <20220131135453.3239792-1-harry.van.haaren@intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220128152033.3133613-1-harry.van.haaren@intel.com>
References: <20220128152033.3133613-1-harry.van.haaren@intel.com>
Cc: i.maximets@ovn.org, kumar.amber@intel.com
Subject: [ovs-dev] [PATCH v3] dpif-netdev: fix vlan and ipv4 parsing in avx512

This commit fixes the minimum packet size for the vlan/ipv4/tcp traffic
profile, which was previously incorrectly set.

This commit also disallows any fragmented IPv4 packets from being matched in
the optimized miniflow-extract, avoiding complexity of handling fragmented
packets and using scalar fallback instead. The DF (don't fragment) bit is now
ignored, and stripped from the resulting miniflow.

Fixes: aa85a25095 ("dpif-netdev/mfex: Add more AVX512 traffic profiles.")
Signed-off-by: Harry van Haaren
Tested-by: Kumar Amber
Acked-by: Eelco Chaudron
---

Testing this patch becomes easier if the MFEX/DPIF patch by Amber here is
applied, as it ensures the AVX512 DPIF is active (and hence MFEX-autovalidator
actually executes in the datapath always, or the test gets skipped if the ISA
is not available).
https://patchwork.ozlabs.org/project/openvswitch/patch/20220131105149.1471184-1-kumar.amber@intel.com/

v3:
- Rework AVX512 impl to be more generic, adding "strip_mask" to profile
- Use #define NC for 0xFF value generation in bitmask (Eelco)
- Use previous store method (not in separate function) (Eelco/Harry)
- Handle VLAN/Dot1Q appropriately to pass MFEX Autovalidation (Amber)

v2:
- Fixup the "frag-offset" mask from incorrect value, to ignore DF bit (Eelco)
- The OVS_UNLIKELY() is added as the extra instructions/inline-func-call
  was confusing the compiler here, resulting in slow code.
  By marking the branch as unlikely, the code sequence generated is optimal
  again.

---
 lib/dpif-netdev-extract-avx512.c | 36 +++++++++++++++++++++++++++-----
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/lib/dpif-netdev-extract-avx512.c b/lib/dpif-netdev-extract-avx512.c
index d23349482..c1c1fefb6 100644
--- a/lib/dpif-netdev-extract-avx512.c
+++ b/lib/dpif-netdev-extract-avx512.c
@@ -157,7 +157,7 @@ _mm512_maskz_permutexvar_epi8_wrap(__mmask64 kmask, __m512i idx, __m512i a)
     0, 0, 0, 0, /* Src IP */ \
     0, 0, 0, 0, /* Dst IP */
 
-#define PATTERN_IPV4_MASK PATTERN_IPV4_GEN(0xFF, 0xFE, 0xFF, 0xFF)
+#define PATTERN_IPV4_MASK PATTERN_IPV4_GEN(0xFF, 0xBF, 0xFF, 0xFF)
 #define PATTERN_IPV4_UDP PATTERN_IPV4_GEN(0x45, 0, 0, 0x11)
 #define PATTERN_IPV4_TCP PATTERN_IPV4_GEN(0x45, 0, 0, 0x06)
 
@@ -226,6 +226,25 @@ _mm512_maskz_permutexvar_epi8_wrap(__mmask64 kmask, __m512i idx, __m512i a)
 #define PATTERN_DT1Q_IPV4_TCP_KMASK \
     (KMASK_ETHER | (KMASK_DT1Q << 16) | (KMASK_IPV4 << 24) | (KMASK_TCP << 40))
 
+/* Miniflow Strip post-processing masks.
+ * This allows unsetting specific bits from the resulting miniflow. It is used
+ * for e.g. IPv4 where the "DF" bit is never pushed to the miniflow itself.
+ * The NC define is for "No Change", allowing the bits to pass through.
+ */
+#define NC 0xFF
+
+#define PATTERN_STRIP_IPV4_MASK \
+    NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, \
+    NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, 0xBF, NC, NC, NC, \
+    NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, \
+    NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC
+
+#define PATTERN_STRIP_DOT1Q_IPV4_MASK \
+    NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, \
+    NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, \
+    NC, NC, NC, NC, 0xBF, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, \
+    NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC, NC
+
 /* This union allows initializing static data as u8, but easily loading it
  * into AVX512 registers too. The union ensures proper alignment for the zmm.
  */
@@ -250,8 +269,9 @@ struct mfex_profile {
     union mfex_data probe_mask;
     union mfex_data probe_data;
 
-    /* Required for reshaping packet into miniflow. */
+    /* Required for reshaping packet into miniflow and post-processing it. */
     union mfex_data store_shuf;
+    union mfex_data strip_mask;
     __mmask64 store_kmsk;
 
     /* Constant data to set in mf.bits and dp_packet data on hit. */
@@ -319,6 +339,7 @@ static const struct mfex_profile mfex_profiles[PROFILE_COUNT] =
         .probe_data.u8_data = { PATTERN_ETHERTYPE_IPV4 PATTERN_IPV4_UDP},
 
         .store_shuf.u8_data = { PATTERN_IPV4_UDP_SHUFFLE },
+        .strip_mask.u8_data = { PATTERN_STRIP_IPV4_MASK },
         .store_kmsk = PATTERN_IPV4_UDP_KMASK,
 
         .mf_bits = { 0x18a0000000000000, 0x0000000000040401},
@@ -341,6 +362,7 @@ static const struct mfex_profile mfex_profiles[PROFILE_COUNT] =
         },
 
         .store_shuf.u8_data = { PATTERN_IPV4_TCP_SHUFFLE },
+        .strip_mask.u8_data = { PATTERN_STRIP_IPV4_MASK },
         .store_kmsk = PATTERN_IPV4_TCP_KMASK,
 
         .mf_bits = { 0x18a0000000000000, 0x0000000000044401},
@@ -359,6 +381,7 @@ static const struct mfex_profile mfex_profiles[PROFILE_COUNT] =
         },
 
         .store_shuf.u8_data = { PATTERN_DT1Q_IPV4_UDP_SHUFFLE },
+        .strip_mask.u8_data = { PATTERN_STRIP_DOT1Q_IPV4_MASK },
         .store_kmsk = PATTERN_DT1Q_IPV4_UDP_KMASK,
 
         .mf_bits = { 0x38a0000000000000, 0x0000000000040401},
@@ -383,13 +406,14 @@ static const struct mfex_profile mfex_profiles[PROFILE_COUNT] =
         },
 
         .store_shuf.u8_data = { PATTERN_DT1Q_IPV4_TCP_SHUFFLE },
+        .strip_mask.u8_data = { PATTERN_STRIP_DOT1Q_IPV4_MASK },
         .store_kmsk = PATTERN_DT1Q_IPV4_TCP_KMASK,
 
         .mf_bits = { 0x38a0000000000000, 0x0000000000044401},
         .dp_pkt_offs = {
             14, UINT16_MAX, 18, 38,
         },
-        .dp_pkt_min_size = 46,
+        .dp_pkt_min_size = 58,
     },
 };
 
@@ -471,6 +495,7 @@ mfex_avx512_process(struct dp_packet_batch *packets,
     __m512i v_vals = _mm512_loadu_si512(&profile->probe_data);
     __m512i v_mask = _mm512_loadu_si512(&profile->probe_mask);
     __m512i v_shuf = _mm512_loadu_si512(&profile->store_shuf);
+    __m512i v_strp = _mm512_loadu_si512(&profile->strip_mask);
 
     __mmask64 k_shuf = profile->store_kmsk;
     __m128i v_bits = _mm_loadu_si128((void *) &profile->mf_bits);
@@ -498,7 +523,7 @@ mfex_avx512_process(struct dp_packet_batch *packets,
 
         __m512i v_pkt0_masked = _mm512_and_si512(v_pkt0, v_mask);
         __mmask64 k_cmp = _mm512_cmpeq_epi8_mask(v_pkt0_masked, v_vals);
-        if (k_cmp != UINT64_MAX) {
+        if (OVS_UNLIKELY(k_cmp != UINT64_MAX)) {
             continue;
         }
 
@@ -526,8 +551,9 @@ mfex_avx512_process(struct dp_packet_batch *packets,
             v_blk0 = _mm512_maskz_permutex2var_epi8_skx(k_shuf, v_pkt0,
                                                         v_shuf, v512_zeros);
         }
-        _mm512_storeu_si512(&blocks[2], v_blk0);
 
+        __m512i v_blk0_strip = _mm512_and_si512(v_blk0, v_strp);
+        _mm512_storeu_si512(&blocks[2], v_blk0_strip);
         /* Perform "post-processing" per profile, handling details not easily
          * handled in the above generic AVX512 code. Examples include TCP flag