From patchwork Wed Oct 30 02:19:38 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 2003969
From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: crazylht@gmail.com
Subject: [PATCH 1/2] [x86] Support vector float_truncate for SF to BF.
Date: Tue, 29 Oct 2024 19:19:38 -0700
Message-Id: <20241030021939.3041893-1-hongtao.liu@intel.com>
X-Mailer: git-send-email 2.34.1

Generate the native instruction (vcvtneps2bf16) whenever possible;
otherwise, use a vector permutation that keeps the odd-indexed 16-bit
elements, i.e. the upper half of each float.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.

gcc/ChangeLog:

	* config/i386/i386-expand.cc
	(ix86_expand_vector_sf2bf_with_vec_perm): New function.
	* config/i386/i386-protos.h
	(ix86_expand_vector_sf2bf_with_vec_perm): New declaration.
	* config/i386/mmx.md (truncv2sfv2bf2): New expander.
	* config/i386/sse.md (truncv4sfv4bf2): Ditto.
	(truncv8sfv8bf2): Ditto.
	(truncv16sfv16bf2): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512bf16-truncsfbf.c: New test.
	* gcc.target/i386/avx512bw-truncsfbf.c: New test.
	* gcc.target/i386/ssse3-truncsfbf.c: New test.
---
 gcc/config/i386/i386-expand.cc                | 38 +++++++++++++++
 gcc/config/i386/i386-protos.h                 |  1 +
 gcc/config/i386/mmx.md                        | 18 ++++++++
 gcc/config/i386/sse.md                        | 44 ++++++++++++++++++
 .../gcc.target/i386/avx512bf16-truncsfbf.c    |  5 ++
 .../gcc.target/i386/avx512bw-truncsfbf.c      | 46 +++++++++++++++++++
 .../gcc.target/i386/ssse3-truncsfbf.c         | 20 ++++++++
 7 files changed, 172 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bf16-truncsfbf.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-truncsfbf.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ssse3-truncsfbf.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 63f5e348d64..7138432659e 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -26817,4 +26817,42 @@ ix86_expand_trunc_with_avx2_noavx512f (rtx output, rtx input, machine_mode cvt_m
   emit_move_insn (output, gen_lowpart (out_mode, d.target));
 }
 
+/* Implement truncv{4,8,16}sfv{4,8,16}bf2 with vector permutation.  */
+void
+ix86_expand_vector_sf2bf_with_vec_perm (rtx dest, rtx src)
+{
+  machine_mode vperm_mode, src_mode = GET_MODE (src);
+  switch (src_mode)
+    {
+    case V16SFmode:
+      vperm_mode = V32BFmode;
+      break;
+    case V8SFmode:
+      vperm_mode = V16BFmode;
+      break;
+    case V4SFmode:
+      vperm_mode = V8BFmode;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  int nelt = GET_MODE_NUNITS (vperm_mode);
+  vec_perm_builder sel (nelt, nelt, 1);
+  sel.quick_grow (nelt);
+  for (int i = 0; i != nelt; i++)
+    sel[i] = (2 * i + 1) % nelt;
+  vec_perm_indices indices (sel, 1, nelt);
+
+  rtx target = gen_reg_rtx (vperm_mode);
+  rtx op0 = lowpart_subreg (vperm_mode,
+                            force_reg (src_mode, src),
+                            src_mode);
+  bool ok = targetm.vectorize.vec_perm_const (vperm_mode, vperm_mode,
+                                              target, op0, op0, indices);
+  gcc_assert (ok);
+  emit_move_insn (dest, lowpart_subreg (GET_MODE (dest), target, vperm_mode));
+}
+
+
 #include "gt-i386-expand.h"
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index c1f9147769c..55ffdb9dcf1 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -258,6 +258,7 @@ extern int ix86_ternlog_idx (rtx op, rtx *args);
 extern bool ix86_ternlog_operand_p (rtx op);
 extern rtx ix86_expand_ternlog (machine_mode mode, rtx op0, rtx op1, rtx op2,
                                int idx, rtx target);
+extern void ix86_expand_vector_sf2bf_with_vec_perm (rtx, rtx);
 
 #ifdef TREE_CODE
 extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree, int);
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 506f4cab6a8..5c776ec0aba 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -2994,6 +2994,24 @@ (define_expand "truncv2sfv2hf2"
   DONE;
 })
 
+(define_expand "truncv2sfv2bf2"
+  [(set (match_operand:V2BF 0 "register_operand")
+        (float_truncate:V2BF
+          (match_operand:V2SF 1 "nonimmediate_operand")))]
+  "TARGET_SSSE3 && TARGET_MMX_WITH_SSE"
+{
+  rtx op1 = gen_reg_rtx (V4SFmode);
+  rtx op0 = gen_reg_rtx (V4BFmode);
+
+  emit_move_insn (op1, lowpart_subreg (V4SFmode,
+                                       force_reg (V2SFmode, operands[1]),
+                                       V2SFmode));
+  emit_insn (gen_truncv4sfv4bf2 (op0, op1));
+
+  emit_move_insn (operands[0], lowpart_subreg (V2BFmode, op0, V4BFmode));
+  DONE;
+})
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel integral arithmetic
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 6c28b74ac3f..7f7910383ae 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -30952,6 +30952,24 @@ (define_insn "avx512f_cvtne2ps2bf16_"
   "TARGET_AVX512BF16"
   "vcvtne2ps2bf16\t{%2, %1, %0|%0, %1, %2}")
 
+(define_expand "truncv4sfv4bf2"
+  [(set (match_operand:V4BF 0 "register_operand")
+        (float_truncate:V4BF
+          (match_operand:V4SF 1 "nonimmediate_operand")))]
+  "TARGET_SSSE3"
+{
+  if (!TARGET_AVXNECONVERT
+      && !(TARGET_AVX512BF16 && TARGET_AVX512VL))
+    ix86_expand_vector_sf2bf_with_vec_perm (operands[0], operands[1]);
+  else
+    {
+      rtx dest = gen_reg_rtx (V8BFmode);
+      emit_insn (gen_vcvtneps2bf16_v4sf (dest, operands[1]));
+      emit_move_insn (operands[0], lowpart_subreg (V4BFmode, dest, V8BFmode));
+    }
+  DONE;
+})
+
 (define_expand "vcvtneps2bf16_v4sf"
   [(set (match_operand:V8BF 0 "register_operand")
         (vec_concat:V8BF
@@ -31027,6 +31045,20 @@ (define_expand "avx512f_cvtneps2bf16__maskz"
   DONE;
 })
 
+(define_expand "truncv8sfv8bf2"
+  [(set (match_operand:V8BF 0 "register_operand")
+        (float_truncate:V8BF
+          (match_operand:V8SF 1 "nonimmediate_operand")))]
+  "TARGET_AVX2"
+{
+  if (!TARGET_AVXNECONVERT
+      && !(TARGET_AVX512BF16 && TARGET_AVX512VL))
+    {
+      ix86_expand_vector_sf2bf_with_vec_perm (operands[0], operands[1]);
+      DONE;
+    }
+})
+
 (define_insn "vcvtneps2bf16_v8sf"
   [(set (match_operand:V8BF 0 "register_operand" "=x,v")
         (float_truncate:V8BF
@@ -31039,6 +31071,18 @@ (define_insn "vcvtneps2bf16_v8sf"
    (set_attr "addr" "gpr16,*")
    (set_attr "prefix" "vex,evex")])
 
+(define_expand "truncv16sfv16bf2"
+  [(set (match_operand:V16BF 0 "register_operand")
+        (float_truncate:V16BF
+          (match_operand:V16SF 1 "nonimmediate_operand")))]
+  "TARGET_AVX512BW && TARGET_EVEX512"
+{
+  if (!TARGET_AVX512BF16)
+    {
+      ix86_expand_vector_sf2bf_with_vec_perm (operands[0], operands[1]);
+      DONE;
+    }
+})
 
 (define_insn "avx512f_cvtneps2bf16_"
   [(set (match_operand: 0 "register_operand" "=v")
diff --git a/gcc/testsuite/gcc.target/i386/avx512bf16-truncsfbf.c b/gcc/testsuite/gcc.target/i386/avx512bf16-truncsfbf.c
new file mode 100644
index 00000000000..da31bdba21b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bf16-truncsfbf.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -mavx512bf16 -O2" } */
+/* { dg-final { scan-assembler-times {(?n)vcvtneps2bf16} 6 } } */
+
+#include "avx512bw-truncsfbf.c"
diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-truncsfbf.c b/gcc/testsuite/gcc.target/i386/avx512bw-truncsfbf.c
new file mode 100644
index 00000000000..071db21cfb3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512bw-truncsfbf.c
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -mavx512vl -mno-avx512bf16 -mno-avxneconvert -O2" } */
+/* { dg-final { scan-assembler-times {(?n)(?:vpermw|vpshufb)} 6 } } */
+
+typedef float v4sf __attribute__((vector_size(16)));
+typedef float v8sf __attribute__((vector_size(32)));
+typedef float v16sf __attribute__((vector_size(64)));
+typedef __bf16 v4bf __attribute__((vector_size(8)));
+typedef __bf16 v8bf __attribute__((vector_size(16)));
+typedef __bf16 v16bf __attribute__((vector_size(32)));
+
+v4bf
+foo (v4sf b, v4sf a)
+{
+  return __builtin_convertvector (a, v4bf);
+}
+
+v8bf
+foo2 (v8sf b, v8sf a)
+{
+  return __builtin_convertvector (a, v8bf);
+}
+
+v16bf
+foo3 (v16sf b, v16sf a)
+{
+  return __builtin_convertvector (a, v16bf);
+}
+
+v4bf
+foo_mem (v4sf* a)
+{
+  return __builtin_convertvector (*a, v4bf);
+}
+
+v8bf
+foo2_mem (v8sf* a)
+{
+  return __builtin_convertvector (*a, v8bf);
+}
+
+v16bf
+foo3_mem (v16sf* a)
+{
+  return __builtin_convertvector (*a, v16bf);
+}
diff --git a/gcc/testsuite/gcc.target/i386/ssse3-truncsfbf.c b/gcc/testsuite/gcc.target/i386/ssse3-truncsfbf.c
new file mode 100644
index 00000000000..70840c537f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/ssse3-truncsfbf.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-mssse3 -mno-avx512bf16 -mno-avxneconvert -O2" } */
+/* { dg-final { scan-assembler-times {(?n)pshufb} 2 { target { ! ia32 } } } } */
+
+typedef float v2sf __attribute__((vector_size(8)));
+typedef __bf16 v2bf __attribute__((vector_size(4)));
+
+v2bf
+foo (v2sf b, v2sf a)
+{
+  return __builtin_convertvector (a, v2bf);
+}
+
+
+v2bf
+foo_mem (v2sf* a)
+{
+  return __builtin_convertvector (*a, v2bf);
+}
+
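For illustration only (this is not part of the patch), the following C sketch emulates on scalars what the fallback path in ix86_expand_vector_sf2bf_with_vec_perm asks the vector permuter to do. It views N floats as 2N 16-bit lanes and picks lane (2 * i + 1) % 2N, which on a little-endian target such as x86 is the upper half of float i. The helper names sf_to_bf_bits_truncate and sf2bf_with_vec_perm are hypothetical, and the sketch assumes a little-endian host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference: the bfloat16 bit pattern obtained by
   keeping only the upper 16 bits of a float, i.e. dropping the low 16
   mantissa bits without rounding.  */
static uint16_t
sf_to_bf_bits_truncate (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return (uint16_t) (bits >> 16);
}

/* Hypothetical emulation of the permutation path for NSF floats
   (NSF = 4, 8 or 16, matching V4SF, V8SF and V16SF).  The input is
   viewed as NELT = 2 * NSF 16-bit lanes, like the lowpart_subreg to
   V8BF/V16BF/V32BF in the expander, and result lane i takes input lane
   (2 * i + 1) % NELT.  Assumes a little-endian host, where the upper
   half of float i is lane 2 * i + 1.  */
static void
sf2bf_with_vec_perm (const float *src, int nsf, uint16_t *dst)
{
  enum { MAX_NELT = 32 };
  int nelt = 2 * nsf;
  uint16_t lanes[MAX_NELT];
  uint16_t perm[MAX_NELT];

  assert (nsf == 4 || nsf == 8 || nsf == 16);
  memcpy (lanes, src, nsf * sizeof (float));

  /* Same selector as the patch: sel[i] = (2 * i + 1) % nelt.  */
  for (int i = 0; i != nelt; i++)
    perm[i] = lanes[(2 * i + 1) % nelt];

  /* The upper NSF lanes repeat the lower ones; like the final
     lowpart_subreg, keep only the low half as the narrowed result.  */
  memcpy (dst, perm, nsf * sizeof (uint16_t));
}
```

For a 4-float input {1.0f, -2.5f, 3.14159f, 65504.0f}, sf2bf_with_vec_perm yields the lanes 0x3f80, 0xc020, 0x4049 and 0x477f. Because the selected lanes are just the upper halves, this path drops the low 16 mantissa bits (rounding toward zero), whereas vcvtneps2bf16 rounds to nearest even, so results from the two paths can differ by one unit in the last place.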