From patchwork Thu Nov 7 05:58:20 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Hongyu Wang
X-Patchwork-Id: 2007845
From: Hongyu Wang
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com,
 ubizjak@gmail.com
Subject: [PATCH] i386: Support cstorebf4 with native bf16 comi
Date: Thu, 7 Nov 2024 13:58:20 +0800
Message-Id: <20241107055820.684921-1-hongyu.wang@intel.com>
X-Mailer: git-send-email 2.31.1

Hi,

We recently added support for cbranchbf4 using the AVX10_2 native bf16
comi instructions, so do the same for cstorebf4.

Bootstrapped & regtested on x86_64-pc-linux-gnu.

Ok for trunk?

gcc/ChangeLog:

        * config/i386/i386.md (cstorebf4): Use vcomsbf16 under
        TARGET_AVX10_2_256 and -fno-trapping-math.

gcc/testsuite/ChangeLog:

        * gcc.target/i386/avx10_2-comibf-3.c: New test.
        * gcc.target/i386/avx10_2-comibf-4.c: Likewise.
---
 gcc/config/i386/i386.md                       | 18 +++++---
 .../gcc.target/i386/avx10_2-comibf-3.c        | 27 ++++++++++++
 .../gcc.target/i386/avx10_2-comibf-4.c        | 41 +++++++++++++++++++
 3 files changed, 80 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-comibf-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-comibf-4.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index c492fe55881..b5ba75ef8e7 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1860,12 +1860,18 @@ (define_expand "cstorebf4"
                (const_int 0)]))]
   "TARGET_80387 || (SSE_FLOAT_MODE_P (SFmode) && TARGET_SSE_MATH)"
 {
-  rtx op1 = ix86_expand_fast_convert_bf_to_sf (operands[2]);
-  rtx op2 = ix86_expand_fast_convert_bf_to_sf (operands[3]);
-  rtx res = emit_store_flag_force (operands[0], GET_CODE (operands[1]),
-                                   op1, op2, SFmode, 0, 1);
-  if (!rtx_equal_p (res, operands[0]))
-    emit_move_insn (operands[0], res);
+  if (TARGET_AVX10_2_256 && !flag_trapping_math)
+    ix86_expand_setcc (operands[0], GET_CODE (operands[1]),
+                       operands[2], operands[3]);
+  else
+    {
+      rtx op1 = ix86_expand_fast_convert_bf_to_sf (operands[2]);
+      rtx op2 = ix86_expand_fast_convert_bf_to_sf (operands[3]);
+      rtx res = emit_store_flag_force (operands[0], GET_CODE (operands[1]),
+                                       op1, op2, SFmode, 0, 1);
+      if (!rtx_equal_p (res, operands[0]))
+        emit_move_insn (operands[0], res);
+    }
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-comibf-3.c b/gcc/testsuite/gcc.target/i386/avx10_2-comibf-3.c
new file mode 100644
index 00000000000..afa41a3f071
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-comibf-3.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-march=x86-64-v3 -O2" } */
+
+/* { dg-final { scan-assembler-times "vcomsbf16\[ \\t\]+\[^{}\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 6 } } */
+/* { dg-final { scan-assembler-times "set\[aeglnb\]+" 6 } } */
+
+#define AVX10_ATTR \
+__attribute__((noinline, __target__("avx10.2"), optimize("no-trapping-math")))
+
+AVX10_ATTR
+int foo1_avx10 (__bf16 a, __bf16 b, __bf16 c, __bf16 d)
+{
+  return a == b && c < d;
+}
+
+AVX10_ATTR
+int foo2_avx10 (__bf16 a, __bf16 b, __bf16 c, __bf16 d)
+{
+  return a > b || c != d;
+}
+
+AVX10_ATTR
+int foo3_avx10 (__bf16 a, __bf16 b, __bf16 c, __bf16 d)
+{
+  return (a >= b) * (c <= d);
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-comibf-4.c b/gcc/testsuite/gcc.target/i386/avx10_2-comibf-4.c
new file mode 100644
index 00000000000..18848ddb5e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-comibf-4.c
@@ -0,0 +1,41 @@
+/* { dg-do run { target { avx10_2 } } } */
+/* { dg-options "-march=x86-64-v3 -O2" } */
+
+#include "avx10_2-comibf-3.c"
+
+__attribute__((noinline))
+int foo1 (__bf16 a, __bf16 b, __bf16 c, __bf16 d)
+{
+  return a == b && c < d;
+}
+
+__attribute__((noinline))
+int foo2 (__bf16 a, __bf16 b, __bf16 c, __bf16 d)
+{
+  return a > b || c != d;
+}
+
+__attribute__((noinline))
+int foo3 (__bf16 a, __bf16 b, __bf16 c, __bf16 d)
+{
+  return (a >= b) * (c <= d);
+}
+
+
+int main (void)
+{
+  if (!__builtin_cpu_supports ("avx10.2"))
+    return 0;
+
+  __bf16 a = 0.5bf16, b = -0.25bf16, c = 1.75bf16, d = -0.125bf16;
+
+  if (foo1_avx10 (a, b, c, d) != foo1 (a, b, c, d))
+    __builtin_abort ();
+
+  if (foo2_avx10 (b, c, d, a) != foo2 (b, c, d, a))
+    __builtin_abort ();
+
+  if (foo3_avx10 (c, d, a, b) != foo3 (c, d, a, b))
+    __builtin_abort ();
+}
+
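For reference, the fallback branch kept by the patch widens both bf16
operands to SFmode and compares them there. The sketch below models that
idea in portable C, so the flag values expected by the run test can be
checked by hand. It assumes the widening is equivalent to placing the 16
bf16 bits in the upper half of a 32-bit IEEE single. All helper names
(bf16_bits_to_float, foo1_model, foo2_model, foo3_model,
check_test_operands) are hypothetical and are not GCC internals.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen raw bf16 bits to float.  Assumes bf16 is the
   upper 16 bits of an IEEE binary32 value, so the conversion is exact.  */
float
bf16_bits_to_float (uint16_t bits)
{
  uint32_t wide = (uint32_t) bits << 16;
  float f;
  memcpy (&f, &wide, sizeof f);   /* Reinterpret the bits without aliasing UB.  */
  return f;
}

/* Models of foo1/foo2/foo3 from the test, taking raw bf16 bit patterns
   and comparing after widening, as the fallback path does.  */
int
foo1_model (uint16_t a, uint16_t b, uint16_t c, uint16_t d)
{
  return bf16_bits_to_float (a) == bf16_bits_to_float (b)
         && bf16_bits_to_float (c) < bf16_bits_to_float (d);
}

int
foo2_model (uint16_t a, uint16_t b, uint16_t c, uint16_t d)
{
  return bf16_bits_to_float (a) > bf16_bits_to_float (b)
         || bf16_bits_to_float (c) != bf16_bits_to_float (d);
}

int
foo3_model (uint16_t a, uint16_t b, uint16_t c, uint16_t d)
{
  return (bf16_bits_to_float (a) >= bf16_bits_to_float (b))
         * (bf16_bits_to_float (c) <= bf16_bits_to_float (d));
}

/* Checks the operand encodings used below against their float values:
   0.5, -0.25, 1.75 and -0.125.  */
void
check_test_operands (void)
{
  assert (bf16_bits_to_float (0x3F00) == 0.5f);
  assert (bf16_bits_to_float (0xBE80) == -0.25f);
  assert (bf16_bits_to_float (0x3FE0) == 1.75f);
  assert (bf16_bits_to_float (0xBE00) == -0.125f);
}
```

With the operands from avx10_2-comibf-4.c encoded as a = 0x3F00,
b = 0xBE80, c = 0x3FE0, d = 0xBE00, the models give 0 for
foo1 (a, b, c, d), 1 for foo2 (b, c, d, a) and 0 for foo3 (c, d, a, b).
The native vcomsbf16 path is expected to produce the same values for
these non-NaN inputs.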