From patchwork Wed Sep 4 03:23:48 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Levy Hsu
X-Patchwork-Id: 1980428
From: Levy Hsu
To: gcc-patches@gcc.gnu.org
Cc: admin@levyhsu.com, liwei.xu@intel.com, crazylht@gmail.com, ubizjak@gmail.com
Subject: [PATCH] i386: Support partial vectorized FMA for V2BF/V4BF
Date: Wed, 4 Sep 2024 03:23:48 +0000
Message-ID: <20240904033121.1895231-1-admin@levyhsu.com>
X-Mailer: git-send-email 2.43.0
List-Id: Gcc-patches mailing list

Hi,

Bootstrapped and tested on x86_64-pc-linux-gnu.
Ok for trunk?

This patch adds support for vectorized FMA operations on bf16 in V2BF
and V4BF modes for the i386 backend. The VBF_32_64 mode iterator is
moved earlier in mmx.md, and define_expand entries for fma, fnma, fms,
and fnms are added under TARGET_AVX10_2_256. Each expander widens its
operands to V8BF with lowpart_subreg, emits the corresponding V8BF
pattern, and moves the low part of the result back to the destination.

gcc/ChangeLog:

	* config/i386/mmx.md (VBF_32_64): Move mode iterator earlier in
	the file.
	(fma<mode>4): define_expand for V2BF/V4BF fma<mode>4.
	(fnma<mode>4): define_expand for V2BF/V4BF fnma<mode>4.
	(fms<mode>4): define_expand for V2BF/V4BF fms<mode>4.
	(fnms<mode>4): define_expand for V2BF/V4BF fnms<mode>4.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c: New test.
---
 gcc/config/i386/mmx.md                        | 84 ++++++++++++++++++-
 .../i386/avx10_2-partial-bf-vector-fma-1.c    | 57 +++++++++++++
 2 files changed, 139 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 10fcd2beda6..22aeb43f436 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -2636,6 +2636,88 @@
   DONE;
 })
 
+(define_mode_iterator VBF_32_64 [V2BF (V4BF "TARGET_MMX_WITH_SSE")])
+
+(define_expand "fma<mode>4"
+  [(set (match_operand:VBF_32_64 0 "register_operand")
+        (fma:VBF_32_64
+          (match_operand:VBF_32_64 1 "nonimmediate_operand")
+          (match_operand:VBF_32_64 2 "nonimmediate_operand")
+          (match_operand:VBF_32_64 3 "nonimmediate_operand")))]
+  "TARGET_AVX10_2_256"
+{
+  rtx op0 = gen_reg_rtx (V8BFmode);
+  rtx op1 = lowpart_subreg (V8BFmode, force_reg (<MODE>mode, operands[1]), <MODE>mode);
+  rtx op2 = lowpart_subreg (V8BFmode, force_reg (<MODE>mode, operands[2]), <MODE>mode);
+  rtx op3 = lowpart_subreg (V8BFmode, force_reg (<MODE>mode, operands[3]), <MODE>mode);
+
+  emit_insn (gen_fmav8bf4 (op0, op1, op2, op3));
+
+  emit_move_insn (operands[0], lowpart_subreg (<MODE>mode, op0, V8BFmode));
+  DONE;
+})
+
+(define_expand "fms<mode>4"
+  [(set (match_operand:VBF_32_64 0 "register_operand")
+        (fma:VBF_32_64
+          (match_operand:VBF_32_64 1 "nonimmediate_operand")
+          (match_operand:VBF_32_64 2 "nonimmediate_operand")
+          (neg:VBF_32_64
+            (match_operand:VBF_32_64 3 "nonimmediate_operand"))))]
+  "TARGET_AVX10_2_256"
+{
+  rtx op0 = gen_reg_rtx (V8BFmode);
+  rtx op1 = lowpart_subreg (V8BFmode, force_reg (<MODE>mode, operands[1]), <MODE>mode);
+  rtx op2 = lowpart_subreg (V8BFmode, force_reg (<MODE>mode, operands[2]), <MODE>mode);
+  rtx op3 = lowpart_subreg (V8BFmode, force_reg (<MODE>mode, operands[3]), <MODE>mode);
+
+  emit_insn (gen_fmsv8bf4 (op0, op1, op2, op3));
+
+  emit_move_insn (operands[0], lowpart_subreg (<MODE>mode, op0, V8BFmode));
+  DONE;
+})
+
+(define_expand "fnma<mode>4"
+  [(set (match_operand:VBF_32_64 0 "register_operand")
+        (fma:VBF_32_64
+          (neg:VBF_32_64
+            (match_operand:VBF_32_64 1 "nonimmediate_operand"))
+          (match_operand:VBF_32_64 2 "nonimmediate_operand")
+          (match_operand:VBF_32_64 3 "nonimmediate_operand")))]
+  "TARGET_AVX10_2_256"
+{
+  rtx op0 = gen_reg_rtx (V8BFmode);
+  rtx op1 = lowpart_subreg (V8BFmode, force_reg (<MODE>mode, operands[1]), <MODE>mode);
+  rtx op2 = lowpart_subreg (V8BFmode, force_reg (<MODE>mode, operands[2]), <MODE>mode);
+  rtx op3 = lowpart_subreg (V8BFmode, force_reg (<MODE>mode, operands[3]), <MODE>mode);
+
+  emit_insn (gen_fnmav8bf4 (op0, op1, op2, op3));
+
+  emit_move_insn (operands[0], lowpart_subreg (<MODE>mode, op0, V8BFmode));
+  DONE;
+})
+
+(define_expand "fnms<mode>4"
+  [(set (match_operand:VBF_32_64 0 "register_operand")
+        (fma:VBF_32_64
+          (neg:VBF_32_64
+            (match_operand:VBF_32_64 1 "nonimmediate_operand"))
+          (match_operand:VBF_32_64 2 "nonimmediate_operand")
+          (neg:VBF_32_64
+            (match_operand:VBF_32_64 3 "nonimmediate_operand"))))]
+  "TARGET_AVX10_2_256"
+{
+  rtx op0 = gen_reg_rtx (V8BFmode);
+  rtx op1 = lowpart_subreg (V8BFmode, force_reg (<MODE>mode, operands[1]), <MODE>mode);
+  rtx op2 = lowpart_subreg (V8BFmode, force_reg (<MODE>mode, operands[2]), <MODE>mode);
+  rtx op3 = lowpart_subreg (V8BFmode, force_reg (<MODE>mode, operands[3]), <MODE>mode);
+
+  emit_insn (gen_fnmsv8bf4 (op0, op1, op2, op3));
+
+  emit_move_insn (operands[0], lowpart_subreg (<MODE>mode, op0, V8BFmode));
+  DONE;
+})
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel half-precision floating point complex type operations
@@ -6670,8 +6752,6 @@
    (set_attr "modrm" "0")
    (set_attr "memory" "none")])
 
-(define_mode_iterator VBF_32_64 [V2BF (V4BF "TARGET_MMX_WITH_SSE")])
-
 ;; VDIVNEPBF16 does not generate floating point exceptions.
 (define_expand "<insn><mode>3"
   [(set (match_operand:VBF_32_64 0 "register_operand")
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c
new file mode 100644
index 00000000000..72e17e99603
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c
@@ -0,0 +1,57 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx10.2 -O2" } */
+/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+
+typedef __bf16 v4bf __attribute__ ((__vector_size__ (8)));
+typedef __bf16 v2bf __attribute__ ((__vector_size__ (4)));
+
+v4bf
+foo_madd_64 (v4bf a, v4bf b, v4bf c)
+{
+  return a * b + c;
+}
+
+v4bf
+foo_msub_64 (v4bf a, v4bf b, v4bf c)
+{
+  return a * b - c;
+}
+
+v4bf
+foo_nmadd_64 (v4bf a, v4bf b, v4bf c)
+{
+  return -a * b + c;
+}
+
+v4bf
+foo_nmsub_64 (v4bf a, v4bf b, v4bf c)
+{
+  return -a * b - c;
+}
+
+v2bf
+foo_madd_32 (v2bf a, v2bf b, v2bf c)
+{
+  return a * b + c;
+}
+
+v2bf
+foo_msub_32 (v2bf a, v2bf b, v2bf c)
+{
+  return a * b - c;
+}
+
+v2bf
+foo_nmadd_32 (v2bf a, v2bf b, v2bf c)
+{
+  return -a * b + c;
+}
+
+v2bf
+foo_nmsub_32 (v2bf a, v2bf b, v2bf c)
+{
+  return -a * b - c;
+}
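
For readers who want to check the lane-wise semantics of the four expanders, here is a minimal reference sketch in plain C. It is not part of the patch. It assumes float lanes stand in for bf16, models the widening to an 8-lane register with zero-filled upper lanes (the expanders themselves leave those lanes unspecified), and rounds the product and the sum separately instead of fusing them. The names fma_kind, lane_op, partial_fma and partial_fma_selfcheck are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum { WIDE_LANES = 8 };  /* stands in for the V8BF register width */

typedef enum { OP_FMA, OP_FMS, OP_FNMA, OP_FNMS } fma_kind;

/* One lane of each operation, following the RTL in the expanders.  */
static float
lane_op (fma_kind kind, float a, float b, float c)
{
  switch (kind)
    {
    case OP_FMA:  return a * b + c;    /* (fma a b c) */
    case OP_FMS:  return a * b - c;    /* (fma a b (neg c)) */
    case OP_FNMA: return -a * b + c;   /* (fma (neg a) b c) */
    default:      return -a * b - c;   /* OP_FNMS: (fma (neg a) b (neg c)) */
    }
}

/* Widen N-lane inputs (N is 2 or 4) to WIDE_LANES, apply the operation
   to every lane, then keep only the low N lanes of the result.  */
static void
partial_fma (fma_kind kind, int n, const float *a, const float *b,
             const float *c, float *out)
{
  float wa[WIDE_LANES] = { 0 }, wb[WIDE_LANES] = { 0 };
  float wc[WIDE_LANES] = { 0 }, wr[WIDE_LANES];
  int i;

  memcpy (wa, a, n * sizeof (float));
  memcpy (wb, b, n * sizeof (float));
  memcpy (wc, c, n * sizeof (float));

  for (i = 0; i < WIDE_LANES; i++)
    wr[i] = lane_op (kind, wa[i], wb[i], wc[i]);

  memcpy (out, wr, n * sizeof (float));
}

/* Small self-check on exactly representable values; returns 0.  */
static int
partial_fma_selfcheck (void)
{
  const float a[4] = { 1, 2, 3, 4 }, b[4] = { 2, 2, 2, 2 };
  const float c[4] = { 1, 1, 1, 1 };
  float r[4];

  partial_fma (OP_FMA, 4, a, b, c, r);
  assert (r[0] == 3 && r[3] == 9);
  partial_fma (OP_FNMS, 2, a, b, c, r);
  assert (r[0] == -3 && r[1] == -5);
  return 0;
}
```

Each lane is computed independently, so whatever sits in the upper lanes of the wide register cannot change the low lanes that are copied back. That is the property the lowpart_subreg widening relies on.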