From patchwork Mon Aug 19 09:03:38 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Haochen Jiang X-Patchwork-Id: 1973739 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.a=rsa-sha256 header.s=Intel header.b=S45r75cV; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WnRWc1Xr3z1yg2 for ; Mon, 19 Aug 2024 19:05:16 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 632613864C64 for ; Mon, 19 Aug 2024 09:05:14 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.17]) by sourceware.org (Postfix) with ESMTPS id 9846538654AD for ; Mon, 19 Aug 2024 09:03:55 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 9846538654AD Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=intel.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 9846538654AD Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=192.198.163.17 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1724058238; cv=none; b=i0ZTaSkl6j0d/Pn7Eaf3q2y1lbvlHFDg8s9su9N1D7mDoFkIe+liliWMYqcIw3S1qTxxzxPEMdMzsDUT/GM2cXoZLBRSJtnHZwZuK0u1wFWLv4xkQofKyZsXYSwRk3SVHT1dYA30kFrbqYD9lQWL5dds0iC3Fa5Bf12Fa/j5hVA= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1724058238; c=relaxed/simple; bh=3LoqrWXlzZQskL+28EwWJ7JqD9AYkLHOR3KSClneXQU=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=rynaKFVPApQ6vSXaXX24GIskPSY7QR8wVBPagRzthN2E+6Ozr9Xlsy/NQZUq+uikZSTv0/Ce5dhGk24iXGOF4rJPG1gMKnzwQIiZ2ij8EIBFjmXifJ+9wQ4dVEnGJwKbQ0i1zsoB8FVlJ7HdO71xXr4MWG5vKW0acqWzxL5PIHY= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1724058236; x=1755594236; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=3LoqrWXlzZQskL+28EwWJ7JqD9AYkLHOR3KSClneXQU=; b=S45r75cVMcejViW4G8bTJKPXb7nYdDoi8JggQKBd9mc71a6VXIVA3pFQ iQ5DNQWy7Bo5JvjxLjZWzpURdhfW1W7IDKXxToaTutctHbffom6uVW/aD LqZ0trMS5LEM0lhx9m3GIaLW2v93OHIjbo03IYwznZl2tSWsGHHxzg9ir FLh2WEOaDLgvAndzmoYEomOuqPt+s5bhnfDCbdjeiwxpMBRgScF78xFiF 5Sp2unSPQqtNb/J4TQc46Y/u+OHrFie9tB9kc99u1XUwjK9VWIfddK56z t20ZJXuOhVTklfTokOYmzVag68JBHoX9d1gbL1cr8b9050iV/J6BuTcjB Q==; X-CSE-ConnectionGUID: xaOb/FtkT5uvoZ2jCcaakw== X-CSE-MsgGUID: Kf7VLojNS3y1dQuQfO0v5Q== X-IronPort-AV: E=McAfee;i="6700,10204,11168"; a="22174100" X-IronPort-AV: E=Sophos;i="6.10,158,1719903600"; d="scan'208";a="22174100" Received: from fmviesa006.fm.intel.com ([10.60.135.146]) by fmvoesa111.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Aug 2024 02:03:41 -0700 X-CSE-ConnectionGUID: MnSsBtfRRSuvYChB5ToEZQ== X-CSE-MsgGUID: K8KPwocLSmmV6TLpE1rgyw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,158,1719903600"; d="scan'208";a="59967118" Received: from scymds03.sc.intel.com ([10.148.94.166]) by fmviesa006.fm.intel.com with ESMTP; 19 Aug 2024 02:03:40 -0700 Received: from icl-spr-01.jf.intel.com (icl-spr-01.jf.intel.com [10.165.54.241]) by scymds03.sc.intel.com (Postfix) with ESMTP id 38A387E; Mon, 19 Aug 2024 02:03:40 -0700 (PDT) From: Haochen Jiang To: gcc-patches@gcc.gnu.org Cc: hongtao.liu@intel.com, ubizjak@gmail.com Subject: [PATCH 12/12] i386: Add bf8 -> fp16 intrin Date: Mon, 19 Aug 2024 02:03:38 -0700 Message-ID: <20240819090340.193463-1-haochen.jiang@intel.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20240819085717.193256-1-haochen.jiang@intel.com> References: <20240819085717.193256-1-haochen.jiang@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org Since BF8 and FP16 have same bits for exponent, the type conversion between them is just a cast for fraction part. We will use a sequence of instrctions instead of new instructions to do that. For convenience, intrins are also provided. gcc/ChangeLog: * config/i386/avx10_2-512convertintrin.h (_mm512_cvtpbf8_ph): New. (_mm512_mask_cvtpbf8_ph): Ditto. (_mm512_maskz_cvtpbf8_ph): Ditto. * config/i386/avx10_2convertintrin.h (_mm_cvtpbf8_ph): Ditto. (_mm_mask_cvtpbf8_ph): Ditto. (_mm_maskz_cvtpbf8_ph): Ditto. (_mm256_cvtpbf8_ph): Ditto. (_mm256_mask_cvtpbf8_ph): Ditto. (_mm256_maskz_cvtpbf8_ph): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-512-convert-1.c: Add tests for new intrin. * gcc.target/i386/avx10_2-convert-1.c: Ditto. --- gcc/config/i386/avx10_2-512convertintrin.h | 24 ++++++++++ gcc/config/i386/avx10_2convertintrin.h | 48 +++++++++++++++++++ .../gcc.target/i386/avx10_2-512-convert-1.c | 16 ++++++- .../gcc.target/i386/avx10_2-convert-1.c | 26 ++++++++-- 4 files changed, 109 insertions(+), 5 deletions(-) diff --git a/gcc/config/i386/avx10_2-512convertintrin.h b/gcc/config/i386/avx10_2-512convertintrin.h index 4ad339bbbf9..dfbdfc3e51b 100644 --- a/gcc/config/i386/avx10_2-512convertintrin.h +++ b/gcc/config/i386/avx10_2-512convertintrin.h @@ -540,6 +540,30 @@ _mm512_maskz_cvtnesph_phf8 (__mmask32 __U, __m512h __A) (__mmask32) __U); } +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtpbf8_ph (__m256i __A) +{ + return (__m512h) _mm512_castsi512_ph ((__m512i) _mm512_slli_epi16 ( + (__m512i) _mm512_cvtepi8_epi16 (__A), 8)); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtpbf8_ph (__m512h __S, __mmask16 __U, __m256i __A) +{ + return (__m512h) _mm512_castsi512_ph ((__m512i) _mm512_mask_slli_epi16 ( + (__m512i) __S, __U, (__m512i) _mm512_cvtepi8_epi16 (__A), 8)); +} + +extern __inline __m512h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtpbf8_ph (__mmask16 __U, __m256i __A) +{ + return (__m512h) _mm512_castsi512_ph ((__m512i) _mm512_slli_epi16 ( + (__m512i) _mm512_maskz_cvtepi8_epi16 (__U, __A), 8)); +} + #ifdef __DISABLE_AVX10_2_512__ #undef __DISABLE_AVX10_2_512__ #pragma GCC pop_options diff --git a/gcc/config/i386/avx10_2convertintrin.h b/gcc/config/i386/avx10_2convertintrin.h index ac62d1290a5..8d2c1a54147 100644 --- a/gcc/config/i386/avx10_2convertintrin.h +++ b/gcc/config/i386/avx10_2convertintrin.h @@ -970,6 +970,54 @@ _mm256_maskz_cvtnesph_phf8 (__mmask16 __U, __m256h __A) (__mmask16) __U); } +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtpbf8_ph (__m128i __A) +{ + return (__m128h) _mm_castsi128_ph ((__m128i) _mm_slli_epi16 ( + (__m128i) _mm_cvtepi8_epi16 (__A), 8)); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cvtpbf8_ph (__m128h __S, __mmask8 __U, __m128i __A) +{ + return (__m128h) _mm_castsi128_ph ((__m128i) _mm_mask_slli_epi16 ( + (__m128i) __S, __U, (__m128i) _mm_cvtepi8_epi16 (__A), 8)); +} + +extern __inline __m128h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_cvtpbf8_ph (__mmask8 __U, __m128i __A) +{ + return (__m128h) _mm_castsi128_ph ((__m128i) _mm_slli_epi16 ( + (__m128i) _mm_maskz_cvtepi8_epi16 (__U, __A), 8)); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtpbf8_ph (__m128i __A) +{ + return (__m256h) _mm256_castsi256_ph ((__m256i) _mm256_slli_epi16 ( + (__m256i) _mm256_cvtepi8_epi16 (__A), 8)); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_cvtpbf8_ph (__m256h __S, __mmask8 __U, __m128i __A) +{ + return (__m256h) _mm256_castsi256_ph ((__m256i) _mm256_mask_slli_epi16 ( + (__m256i) __S, __U, (__m256i) _mm256_cvtepi8_epi16 (__A), 8)); +} + +extern __inline __m256h +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskz_cvtpbf8_ph (__mmask8 __U, __m128i __A) +{ + return (__m256h) _mm256_castsi256_ph ((__m256i) _mm256_slli_epi16 ( + (__m256i) _mm256_maskz_cvtepi8_epi16 (__U, __A), 8)); +} + #ifdef __DISABLE_AVX10_2_256__ #undef __DISABLE_AVX10_2_256__ #pragma GCC pop_options diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-512-convert-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-512-convert-1.c index bbbff186d0a..f67138c237c 100644 --- a/gcc/testsuite/gcc.target/i386/avx10_2-512-convert-1.c +++ b/gcc/testsuite/gcc.target/i386/avx10_2-512-convert-1.c @@ -45,13 +45,17 @@ /* { dg-final { scan-assembler-times "vcvtneph2hf8s\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vcvtneph2hf8s\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vcvtneph2hf8s\[ \\t\]*%zmm\[0-9\]+,\[^\{\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpsllw\[ \t]\+\\\$8, %zmm\[0-9]\+, %zmm\[0-9]\+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vpsllw\[ \t]\+\\\$8, %zmm\[0-9]\+, %zmm\[0-9]\+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpmovsxbw\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*%zmm\[0-9\](?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vpmovsxbw\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ #include -volatile __m256i x256i; +volatile __m256i x256i, z1; volatile __m512i x512i; volatile __m512 x, a1, b1; -volatile __m512h y, x512h; +volatile __m512h y, x512h, z; volatile __mmask16 m16; volatile __mmask32 m32; volatile __mmask64 m64; @@ -174,3 +178,11 @@ avx10_2_512_vcvtneph2hf8s_test (void) x256i = _mm512_mask_cvtnesph_phf8 (x256i, m32, x512h); x256i = _mm512_maskz_cvtnesph_phf8 (m32, x512h); } + +void extern +avx10_2_512_cvtbf8_fp16_test (void) +{ + y = _mm512_cvtpbf8_ph (z1); + y = _mm512_mask_cvtpbf8_ph (z, m16, z1); + y = _mm512_maskz_cvtpbf8_ph (m16, z1); +} diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-convert-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-convert-1.c index 015474f8cf3..9c3e85718f2 100644 --- a/gcc/testsuite/gcc.target/i386/avx10_2-convert-1.c +++ b/gcc/testsuite/gcc.target/i386/avx10_2-convert-1.c @@ -87,14 +87,22 @@ /* { dg-final { scan-assembler-times "vcvtneph2hf8sy\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vcvtneph2hf8sy\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ /* { dg-final { scan-assembler-times "vcvtneph2hf8sy\[ \\t\]*%ymm\[0-9\]+,\[^\{\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpmovsxbw\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%ymm\[0-9\](?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vpsllw\[ \t]\+\\\$8, %ymm\[0-9]\+, %ymm\[0-9]\+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vpsllw\[ \t]\+\\\$8, %ymm\[0-9]\+, %ymm\[0-9]\+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpmovsxbw\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpmovsxbw\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vpsllw\[ \t]\+\\\$8, %xmm\[0-9]\+, %xmm\[0-9]\+(?:\n|\[ \\t\]+#)" 2 } } */ +/* { dg-final { scan-assembler-times "vpsllw\[ \t]\+\\\$8, %xmm\[0-9]\+, %xmm\[0-9]\+\{%k\[1-7\]\}(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpmovsxbw\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}(?:\n|\[ \\t\]+#)" 1 } } */ #include volatile __m128 x1,a1,b1; volatile __m256 x2,a2,b2; -volatile __m128h y,x128h; -volatile __m256h y2,x256h; -volatile __m128i x128i; +volatile __m128h y,x128h,z; +volatile __m256h y2,x256h,z2; +volatile __m128i x128i,z3; volatile __m256i x256i; volatile __mmask8 m8; volatile __mmask16 m16; @@ -272,3 +280,15 @@ avx10_2_vcvtneph2hf8s_test (void) x128i = _mm256_mask_cvtnesph_phf8 (x128i, m16, x256h); x128i = _mm256_maskz_cvtnesph_phf8 (m16, x256h); } + +void extern +avx10_2_cvtbf8_fp16_test (void) +{ + y = _mm_cvtpbf8_ph (z3); + y = _mm_mask_cvtpbf8_ph (z, m8, z3); + y = _mm_maskz_cvtpbf8_ph (m8, z3); + + y2 = _mm256_cvtpbf8_ph (z3); + y2 = _mm256_mask_cvtpbf8_ph (z2, m8, z3); + y2 = _mm256_maskz_cvtpbf8_ph (m8, z3); +}