From patchwork Wed Oct 30 02:19:38 2024
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 2003969
From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: crazylht@gmail.com
Subject: [PATCH 1/2] [x86] Support vector float_truncate for SF to BF.
Date: Tue, 29 Oct 2024 19:19:38 -0700
Message-Id: <20241030021939.3041893-1-hongtao.liu@intel.com>

Generate the native instruction whenever possible; otherwise use a
vector permutation with odd indices.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.

gcc/ChangeLog:

	* config/i386/i386-expand.cc
	(ix86_expand_vector_sf2bf_with_vec_perm): New function.
	* config/i386/i386-protos.h
	(ix86_expand_vector_sf2bf_with_vec_perm): New declaration.
	* config/i386/mmx.md (truncv2sfv2bf2): New expander.
	* config/i386/sse.md (truncv4sfv4bf2): Ditto.
	(truncv8sfv8bf2): Ditto.
	(truncv16sfv16bf2): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512bf16-truncsfbf.c: New test.
	* gcc.target/i386/avx512bw-truncsfbf.c: New test.
	* gcc.target/i386/ssse3-truncsfbf.c: New test.
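
A scalar model of the permutation fallback may help when reading the
expander. This is only a sketch in portable C, not code from the patch:
the names sf_to_bf16_trunc and vec_sf_to_bf16_perm are hypothetical, and
it assumes a little-endian host so that the odd-indexed 16-bit lane of
each float is its upper half. Taking the upper half truncates the
mantissa, whereas vcvtneps2bf16 rounds to nearest even, so the two paths
can differ in the last bit of the result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: bf16 bit pattern of F obtained by dropping the
   low 16 bits of its IEEE-754 single-precision encoding (truncation,
   no rounding).  */
uint16_t
sf_to_bf16_trunc (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return (uint16_t) (bits >> 16);
}

/* Hypothetical model of the selector (2 * i + 1) % nelt: view N floats
   as nelt = 2 * N 16-bit lanes and pick the odd lanes.  Only the low N
   output lanes are kept, mirroring the lowpart taken by the expander.
   Assumes a little-endian host, as on x86.  */
void
vec_sf_to_bf16_perm (const float *src, uint16_t *dst, int n)
{
  uint16_t lanes[64];
  int nelt = 2 * n;

  assert (n > 0 && nelt <= 64);
  memcpy (lanes, src, (size_t) n * sizeof (float));
  for (int i = 0; i < n; i++)
    dst[i] = lanes[(2 * i + 1) % nelt];
}
```

For lanes below N the modulo never wraps, so output lane i is simply the
upper half of input float i; the wrap only affects the upper half of the
permuted vector, which is discarded.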
---
 gcc/config/i386/i386-expand.cc                | 38 +++++++++++++++
 gcc/config/i386/i386-protos.h                 |  1 +
 gcc/config/i386/mmx.md                        | 18 ++++++++
 gcc/config/i386/sse.md                        | 44 ++++++++++++++++++
 .../gcc.target/i386/avx512bf16-truncsfbf.c    |  5 ++
 .../gcc.target/i386/avx512bw-truncsfbf.c      | 46 +++++++++++++++++++
 .../gcc.target/i386/ssse3-truncsfbf.c         | 20 ++++++++
 7 files changed, 172 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bf16-truncsfbf.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-truncsfbf.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ssse3-truncsfbf.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 63f5e348d64..7138432659e 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -26817,4 +26817,42 @@ ix86_expand_trunc_with_avx2_noavx512f (rtx output, rtx input, machine_mode cvt_m
   emit_move_insn (output, gen_lowpart (out_mode, d.target));
 }
 
+/* Implement truncv8sfv8bf2 with vector permutation.  */
+void
+ix86_expand_vector_sf2bf_with_vec_perm (rtx dest, rtx src)
+{
+  machine_mode vperm_mode, src_mode = GET_MODE (src);
+  switch (src_mode)
+    {
+    case V16SFmode:
+      vperm_mode = V32BFmode;
+      break;
+    case V8SFmode:
+      vperm_mode = V16BFmode;
+      break;
+    case V4SFmode:
+      vperm_mode = V8BFmode;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  int nelt = GET_MODE_NUNITS (vperm_mode);
+  vec_perm_builder sel (nelt, nelt, 1);
+  sel.quick_grow (nelt);
+  for (int i = 0; i != nelt; i++)
+    sel[i] = (2 * i + 1) % nelt;
+  vec_perm_indices indices (sel, 1, nelt);
+
+  rtx target = gen_reg_rtx (vperm_mode);
+  rtx op0 = lowpart_subreg (vperm_mode,
+                            force_reg (src_mode, src),
+                            src_mode);
+  bool ok = targetm.vectorize.vec_perm_const (vperm_mode, vperm_mode,
+                                              target, op0, op0, indices);
+  gcc_assert (ok);
+  emit_move_insn (dest, lowpart_subreg (GET_MODE (dest), target, vperm_mode));
+}
+
+
 #include "gt-i386-expand.h"
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index c1f9147769c..55ffdb9dcf1 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -258,6 +258,7 @@ extern int ix86_ternlog_idx (rtx op, rtx *args);
 extern bool ix86_ternlog_operand_p (rtx op);
 extern rtx ix86_expand_ternlog (machine_mode mode, rtx op0, rtx op1, rtx op2,
                                 int idx, rtx target);
+extern void ix86_expand_vector_sf2bf_with_vec_perm (rtx, rtx);
 
 #ifdef TREE_CODE
 extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree, int);
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 506f4cab6a8..5c776ec0aba 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -2994,6 +2994,24 @@ (define_expand "truncv2sfv2hf2"
   DONE;
 })
 
+(define_expand "truncv2sfv2bf2"
+  [(set (match_operand:V2BF 0 "register_operand")
+        (float_truncate:V2BF
+          (match_operand:V2SF 1 "nonimmediate_operand")))]
+  "TARGET_SSSE3 && TARGET_MMX_WITH_SSE"
+{
+  rtx op1 = gen_reg_rtx (V4SFmode);
+  rtx op0 = gen_reg_rtx (V4BFmode);
+
+  emit_move_insn (op1, lowpart_subreg (V4SFmode,
+                                       force_reg (V2SFmode, operands[1]),
+                                       V2SFmode));
+  emit_insn (gen_truncv4sfv4bf2 (op0, op1));
+
+  emit_move_insn (operands[0], lowpart_subreg (V2BFmode, op0, V4BFmode));
+  DONE;
+})
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel integral arithmetic
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 6c28b74ac3f..7f7910383ae 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -30952,6 +30952,24 @@ (define_insn "avx512f_cvtne2ps2bf16_"
   "TARGET_AVX512BF16"
   "vcvtne2ps2bf16\t{%2, %1, %0|%0, %1, %2}")
 
+(define_expand "truncv4sfv4bf2"
+  [(set (match_operand:V4BF 0 "register_operand")
+        (float_truncate:V4BF
+          (match_operand:V4SF 1 "nonimmediate_operand")))]
+  "TARGET_SSSE3"
+{
+  if (!TARGET_AVXNECONVERT
+      && !(TARGET_AVX512BF16 && TARGET_AVX512VL))
+    ix86_expand_vector_sf2bf_with_vec_perm (operands[0], operands[1]);
+  else
+    {
+      rtx dest = gen_reg_rtx (V8BFmode);
+      emit_insn (gen_vcvtneps2bf16_v4sf (dest, operands[1]));
+      emit_move_insn (operands[0], lowpart_subreg (V4BFmode, dest, V8BFmode));
+    }
+  DONE;
+})
+
 (define_expand "vcvtneps2bf16_v4sf"
   [(set (match_operand:V8BF 0 "register_operand")
         (vec_concat:V8BF
@@ -31027,6 +31045,20 @@ (define_expand "avx512f_cvtneps2bf16__maskz"
   DONE;
 })
 
+(define_expand "truncv8sfv8bf2"
+  [(set (match_operand:V8BF 0 "register_operand")
+        (float_truncate:V8BF
+          (match_operand:V8SF 1 "nonimmediate_operand")))]
+  "TARGET_AVX2"
+{
+  if (!TARGET_AVXNECONVERT
+      && !(TARGET_AVX512BF16 && TARGET_AVX512VL))
+    {
+      ix86_expand_vector_sf2bf_with_vec_perm (operands[0], operands[1]);
+      DONE;
+    }
+})
+
 (define_insn "vcvtneps2bf16_v8sf"
   [(set (match_operand:V8BF 0 "register_operand" "=x,v")
         (float_truncate:V8BF
@@ -31039,6 +31071,18 @@ (define_insn "vcvtneps2bf16_v8sf"
    (set_attr "addr" "gpr16,*")
    (set_attr "prefix" "vex,evex")])
 
+(define_expand "truncv16sfv16bf2"
+  [(set (match_operand:V16BF 0 "register_operand")
+        (float_truncate:V16BF
+          (match_operand:V16SF 1 "nonimmediate_operand")))]
+  "TARGET_AVX512BW && TARGET_EVEX512"
+{
+  if (!TARGET_AVX512BF16)
+    {
+      ix86_expand_vector_sf2bf_with_vec_perm (operands[0], operands[1]);
+      DONE;
+    }
+})
 
 (define_insn "avx512f_cvtneps2bf16_"
   [(set (match_operand: 0 "register_operand" "=v")
diff --git a/gcc/testsuite/gcc.target/i386/avx512bf16-truncsfbf.c b/gcc/testsuite/gcc.target/i386/avx512bf16-truncsfbf.c
new file mode 100644
index 00000000000..da31bdba21b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bf16-truncsfbf.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -mavx512bf16 -O2" } */
+/* { dg-final { scan-assembler-times {(?n)vcvtneps2bf16} 6 } } */
+
+#include "avx512bw-truncsfbf.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-truncsfbf.c b/gcc/testsuite/gcc.target/i386/avx512bw-truncsfbf.c
new file mode 100644
index 00000000000..071db21cfb3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-truncsfbf.c
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -mavx512vl -mno-avx512bf16 -mno-avxneconvert -O2" } */
+/* { dg-final { scan-assembler-times {(?n)(?:vpermw|vpshufb)} 6 } } */
+
+typedef float v4sf __attribute__((vector_size(16)));
+typedef float v8sf __attribute__((vector_size(32)));
+typedef float v16sf __attribute__((vector_size(64)));
+typedef __bf16 v4bf __attribute__((vector_size(8)));
+typedef __bf16 v8bf __attribute__((vector_size(16)));
+typedef __bf16 v16bf __attribute__((vector_size(32)));
+
+v4bf
+foo (v4sf b, v4sf a)
+{
+  return __builtin_convertvector (a, v4bf);
+}
+
+v8bf
+foo2 (v8sf b, v8sf a)
+{
+  return __builtin_convertvector (a, v8bf);
+}
+
+v16bf
+foo3 (v16sf b, v16sf a)
+{
+  return __builtin_convertvector (a, v16bf);
+}
+
+v4bf
+foo_mem (v4sf* a)
+{
+  return __builtin_convertvector (*a, v4bf);
+}
+
+v8bf
+foo2_mem (v8sf* a)
+{
+  return __builtin_convertvector (*a, v8bf);
+}
+
+v16bf
+foo3_mem (v16sf* a)
+{
+  return __builtin_convertvector (*a, v16bf);
+}
diff --git a/gcc/testsuite/gcc.target/i386/ssse3-truncsfbf.c b/gcc/testsuite/gcc.target/i386/ssse3-truncsfbf.c
new file mode 100644
index 00000000000..70840c537f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/ssse3-truncsfbf.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-mssse3 -mno-avx512bf16 -mno-avxneconvert -O2" } */
+/* { dg-final { scan-assembler-times {(?n)pshufb} 2 { target { ! ia32 } } } } */
+
+typedef float v2sf __attribute__((vector_size(8)));
+typedef __bf16 v2bf __attribute__((vector_size(4)));
+
+v2bf
+foo (v2sf b, v2sf a)
+{
+  return __builtin_convertvector (a, v2bf);
+}
+
+
+v2bf
+foo_mem (v2sf* a)
+{
+  return __builtin_convertvector (*a, v2bf);
+}
+

From patchwork Wed Oct 30 02:19:39 2024
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 2003970
From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: crazylht@gmail.com
Subject: [PATCH 2/2] Support vector float_extend from __bf16 to float.
Date: Tue, 29 Oct 2024 19:19:39 -0700
Message-Id: <20241030021939.3041893-2-hongtao.liu@intel.com>
In-Reply-To: <20241030021939.3041893-1-hongtao.liu@intel.com>
References: <20241030021939.3041893-1-hongtao.liu@intel.com>

The extension is implemented as a vector permutation with a zero
vector.

gcc/ChangeLog:

	* config/i386/i386-expand.cc
	(ix86_expand_vector_bf2sf_with_vec_perm): New function.
	* config/i386/i386-protos.h
	(ix86_expand_vector_bf2sf_with_vec_perm): New declaration.
	* config/i386/mmx.md (extendv2bfv2sf2): New expander.
	* config/i386/sse.md (extend<sf_cvt_bf16_lower><mode>2): Ditto.
	(VF1_AVX512BW): New mode iterator.
	(sf_cvt_bf16): Add V4SF.
	(sf_cvt_bf16_lower): New mode attr.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512bw-extendbf2sf.c: New test.
	* gcc.target/i386/sse2-extendbf2sf.c: New test.
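
A scalar model of the zero-interleaving permutation may make the
selector easier to follow. This is only a sketch in portable C, not code
from the patch: the names bf16_to_sf and vec_bf16_to_sf_perm are
hypothetical, and it assumes a little-endian host so that the odd 16-bit
lane of each 32-bit element is its upper half. Selector indices below
nelt pick lanes of the zero vector and indices from nelt upward pick
lanes of the source, which places each bf16 value in the upper half of a
float with a zero lower half.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen a bf16 bit pattern to float by placing it
   in the upper 16 bits of the single-precision encoding.  The
   conversion is exact.  */
float
bf16_to_sf (uint16_t b)
{
  uint32_t bits = (uint32_t) b << 16;
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Hypothetical model of the two-input selector: even output lanes take
   lane k of the zero vector, odd output lanes take lane (j - nelt) of
   the source.  Assumes a little-endian host, as on x86.  */
void
vec_bf16_to_sf_perm (const uint16_t *src, float *dst, int n)
{
  uint16_t lanes[64];
  int nelt = 2 * n;

  assert (n > 0 && nelt <= 64);
  for (int i = 0, k = 0, j = nelt; i != nelt; i++)
    {
      int sel = (i & 1) ? j++ : k++;
      lanes[i] = sel < nelt ? 0 : src[sel - nelt];
    }
  memcpy (dst, lanes, (size_t) n * sizeof (float));
}
```

Only the low N lanes of the source are ever selected, which matches the
expander viewing the N-element BF vector as the low part of the wider
permutation mode.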
---
 gcc/config/i386/i386-expand.cc                | 39 ++++++++++++++++
 gcc/config/i386/i386-protos.h                 |  2 +
 gcc/config/i386/mmx.md                        | 18 ++++++++
 gcc/config/i386/sse.md                        | 20 +++++++-
 .../gcc.target/i386/avx512bw-extendbf2sf.c    | 46 +++++++++++++++++++
 .../gcc.target/i386/sse2-extendbf2sf.c        | 20 ++++++++
 6 files changed, 144 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-extendbf2sf.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-extendbf2sf.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 7138432659e..df9676b80d4 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -26854,5 +26854,44 @@ ix86_expand_vector_sf2bf_with_vec_perm (rtx dest, rtx src)
   emit_move_insn (dest, lowpart_subreg (GET_MODE (dest), target, vperm_mode));
 }
 
+/* Implement extendv8bfv8sf2 with vector permutation.  */
+void
+ix86_expand_vector_bf2sf_with_vec_perm (rtx dest, rtx src)
+{
+  machine_mode vperm_mode, src_mode = GET_MODE (src);
+  switch (src_mode)
+    {
+    case V16BFmode:
+      vperm_mode = V32BFmode;
+      break;
+    case V8BFmode:
+      vperm_mode = V16BFmode;
+      break;
+    case V4BFmode:
+      vperm_mode = V8BFmode;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  int nelt = GET_MODE_NUNITS (vperm_mode);
+  vec_perm_builder sel (nelt, nelt, 1);
+  sel.quick_grow (nelt);
+  for (int i = 0, k = 0, j = nelt; i != nelt; i++)
+    sel[i] = i & 1 ? j++ : k++;
+
+  vec_perm_indices indices (sel, 2, nelt);
+
+  rtx target = gen_reg_rtx (vperm_mode);
+  rtx op1 = lowpart_subreg (vperm_mode,
+                            force_reg (src_mode, src),
+                            src_mode);
+  rtx op0 = CONST0_RTX (vperm_mode);
+  bool ok = targetm.vectorize.vec_perm_const (vperm_mode, vperm_mode,
+                                              target, op0, op1, indices);
+  gcc_assert (ok);
+  emit_move_insn (dest, lowpart_subreg (GET_MODE (dest), target, vperm_mode));
+}
+
 
 #include "gt-i386-expand.h"
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 55ffdb9dcf1..c26ae5e4f1d 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -259,6 +259,8 @@ extern bool ix86_ternlog_operand_p (rtx op);
 extern rtx ix86_expand_ternlog (machine_mode mode, rtx op0, rtx op1, rtx op2,
                                 int idx, rtx target);
 extern void ix86_expand_vector_sf2bf_with_vec_perm (rtx, rtx);
+extern void ix86_expand_vector_bf2sf_with_vec_perm (rtx, rtx);
+
 
 #ifdef TREE_CODE
 extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree, int);
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 5c776ec0aba..021ac90ae2a 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -3012,6 +3012,24 @@ (define_expand "truncv2sfv2bf2"
   DONE;
 })
 
+(define_expand "extendv2bfv2sf2"
+  [(set (match_operand:V2SF 0 "register_operand")
+        (float_extend:V2SF
+          (match_operand:V2BF 1 "nonimmediate_operand")))]
+  "TARGET_SSE2 && TARGET_MMX_WITH_SSE"
+{
+  rtx op0 = gen_reg_rtx (V4SFmode);
+  rtx op1 = gen_reg_rtx (V4BFmode);
+
+  emit_move_insn (op1, lowpart_subreg (V4BFmode,
+                                       force_reg (V2BFmode, operands[1]),
+                                       V2BFmode));
+  emit_insn (gen_extendv4bfv4sf2 (op0, op1));
+
+  emit_move_insn (operands[0], lowpart_subreg (V2SFmode, op0, V4SFmode));
+  DONE;
+})
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel integral arithmetic
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 7f7910383ae..3d57a90fad7 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -530,6 +530,9 @@ (define_mode_iterator VF2_AVX512VL
 (define_mode_iterator VF1_AVX512VL
   [(V16SF "TARGET_EVEX512") (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")])
 
+(define_mode_iterator VF1_AVX512BW
+  [(V16SF "TARGET_AVX512BW && TARGET_EVEX512") (V8SF "TARGET_AVX2") V4SF])
+
 (define_mode_iterator VF1_AVX10_2
   [(V16SF "TARGET_AVX10_2_512") V8SF V4SF])
 
@@ -30925,7 +30928,11 @@ (define_mode_attr bf16_cvt_2sf
   [(V32BF "V16SF") (V16BF "V8SF") (V8BF "V4SF")])
 ;; Converting from SF to BF
 (define_mode_attr sf_cvt_bf16
-  [(V8SF "V8BF") (V16SF "V16BF")])
+  [(V4SF "V4BF") (V8SF "V8BF") (V16SF "V16BF")])
+
+(define_mode_attr sf_cvt_bf16_lower
+  [(V4SF "v4bf") (V8SF "v8bf") (V16SF "v16bf")])
+
 ;; Mapping from BF to SF
 (define_mode_attr sf_bf16
   [(V4SF "V8BF") (V8SF "V16BF") (V16SF "V32BF")])
@@ -31084,6 +31091,17 @@ (define_expand "truncv16sfv16bf2"
     }
 })
 
+(define_expand "extend<sf_cvt_bf16_lower><mode>2"
+  [(set (match_operand:VF1_AVX512BW 0 "register_operand")
+        (float_extend:VF1_AVX512BW
+          (match_operand:<sf_cvt_bf16> 1 "nonimmediate_operand")))]
+  "TARGET_SSE2"
+{
+  ix86_expand_vector_bf2sf_with_vec_perm (operands[0], operands[1]);
+  DONE;
+})
+
+
 (define_insn "avx512f_cvtneps2bf16_"
   [(set (match_operand: 0 "register_operand" "=v")
         (float_truncate:
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-extendbf2sf.c b/gcc/testsuite/gcc.target/i386/avx512bw-extendbf2sf.c
new file mode 100644
index 00000000000..5b59958151f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-extendbf2sf.c
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times {(?n)(?:vpermi2w|vpunpcklwd)} 6 } } */
+
+typedef float v4sf __attribute__((vector_size(16)));
+typedef float v8sf __attribute__((vector_size(32)));
+typedef float v16sf __attribute__((vector_size(64)));
+typedef __bf16 v4bf __attribute__((vector_size(8)));
+typedef __bf16 v8bf __attribute__((vector_size(16)));
+typedef __bf16 v16bf __attribute__((vector_size(32)));
+
+v4sf
+foo (v4bf b, v4bf a)
+{
+  return __builtin_convertvector (a, v4sf);
+}
+
+v8sf
+foo2 (v8bf b, v8bf a)
+{
+  return __builtin_convertvector (a, v8sf);
+}
+
+v16sf
+foo3 (v16bf b, v16bf a)
+{
+  return __builtin_convertvector (a, v16sf);
+}
+
+v4sf
+foo_mem (v4bf* a)
+{
+  return __builtin_convertvector (*a, v4sf);
+}
+
+v8sf
+foo2_mem (v8bf* a)
+{
+  return __builtin_convertvector (*a, v8sf);
+}
+
+v16sf
+foo3_mem (v16bf* a)
+{
+  return __builtin_convertvector (*a, v16sf);
+}
diff --git a/gcc/testsuite/gcc.target/i386/sse2-extendbf2sf.c b/gcc/testsuite/gcc.target/i386/sse2-extendbf2sf.c
new file mode 100644
index 00000000000..0f007df68f6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-extendbf2sf.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-msse2 -O2" } */
+/* { dg-final { scan-assembler-times {(?n)(?:vpermi2w|punpcklwd)} 2 { target { ! ia32 } } } } */
+
+typedef float v2sf __attribute__((vector_size(8)));
+typedef __bf16 v2bf __attribute__((vector_size(4)));
+
+v2sf
+foo (v2bf b, v2bf a)
+{
+  return __builtin_convertvector (a, v2sf);
+}
+
+
+v2sf
+foo_mem (v2bf* a)
+{
+  return __builtin_convertvector (*a, v2sf);
+}
+