From patchwork Mon Sep 2 08:32:32 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Levy Hsu
X-Patchwork-Id: 1979570
From: Levy Hsu
To: gcc-patches@gcc.gnu.org
Cc: admin@levyhsu.com, liwei.xu@intel.com, crazylht@gmail.com, ubizjak@gmail.com
Subject: [PATCH] i386: Support partial vectorized V2BF/V4BF plus/minus/mult/div/sqrt
Date: Mon, 2 Sep 2024 08:32:32 +0000
Message-ID: <20240902083300.1861771-1-admin@levyhsu.com>
X-Mailer: git-send-email 2.43.0

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

This patch adds a new mode iterator and new expanders to the i386
backend to support partial vectorization of bf16 operations with
AVX10.2 instructions.  The supported operations are addition,
subtraction, multiplication, division, and square root on the V2BF
and V4BF modes.

gcc/ChangeLog:

	* config/i386/mmx.md (VBF_32_64): New mode iterator for
	partial vectorized V2BF/V4BF.
	(<insn><mode>3): New define_expand for plusminusmultdiv.
	(sqrt<mode>2): New define_expand for sqrt.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx10_2-partial-bf-vector-fast-math-1.c: New test.
	* gcc.target/i386/avx10_2-partial-bf-vector-operations-1.c: New test.
---
 gcc/config/i386/mmx.md                        | 37 ++++++++++++
 .../avx10_2-partial-bf-vector-fast-math-1.c   | 22 +++++++
 .../avx10_2-partial-bf-vector-operations-1.c  | 57 +++++++++++++++++++
 3 files changed, 116 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fast-math-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-operations-1.c

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index e0065ed4d48..9116ddb5321 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -94,6 +94,8 @@
 (define_mode_iterator VHF_32_64
  [V2HF (V4HF "TARGET_MMX_WITH_SSE")])
 
+(define_mode_iterator VBF_32_64 [V2BF (V4BF "TARGET_MMX_WITH_SSE")])
+
 ;; Mapping from integer vector mode to mnemonic suffix
 (define_mode_attr mmxvecsize
   [(V8QI "b") (V4QI "b") (V2QI "b")
@@ -2036,6 +2038,26 @@
   DONE;
 })
 
+;; VDIVNEPBF16 does not generate floating point exceptions.
+(define_expand "<insn><mode>3"
+  [(set (match_operand:VBF_32_64 0 "register_operand")
+    (plusminusmultdiv:VBF_32_64
+      (match_operand:VBF_32_64 1 "nonimmediate_operand")
+      (match_operand:VBF_32_64 2 "nonimmediate_operand")))]
+  "TARGET_AVX10_2_256"
+{
+  rtx op0 = gen_reg_rtx (V8BFmode);
+  rtx op1 = lowpart_subreg (V8BFmode,
+                            force_reg (<MODE>mode, operands[1]), <MODE>mode);
+  rtx op2 = lowpart_subreg (V8BFmode,
+                            force_reg (<MODE>mode, operands[2]), <MODE>mode);
+
+  emit_insn (gen_<insn>v8bf3 (op0, op1, op2));
+
+  emit_move_insn (operands[0], lowpart_subreg (<MODE>mode, op0, V8BFmode));
+  DONE;
+})
+
 (define_expand "divv2hf3"
   [(set (match_operand:V2HF 0 "register_operand")
     (div:V2HF
@@ -2091,6 +2113,21 @@
   DONE;
 })
 
+(define_expand "sqrt<mode>2"
+  [(set (match_operand:VBF_32_64 0 "register_operand")
+    (sqrt:VBF_32_64 (match_operand:VBF_32_64 1 "vector_operand")))]
+  "TARGET_AVX10_2_256"
+{
+  rtx op0 = gen_reg_rtx (V8BFmode);
+  rtx op1 = lowpart_subreg (V8BFmode,
+                            force_reg (<MODE>mode, operands[1]), <MODE>mode);
+
+  emit_insn (gen_sqrtv8bf2 (op0, op1));
+
+  emit_move_insn (operands[0], lowpart_subreg (<MODE>mode, op0, V8BFmode));
+  DONE;
+})
+
 (define_expand "<code><mode>2"
   [(set (match_operand:VHF_32_64 0 "register_operand")
     (absneg:VHF_32_64
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fast-math-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fast-math-1.c
new file mode 100644
index 00000000000..fd064f17445
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fast-math-1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-mavx10.2 -O2" } */
+/* { dg-final { scan-assembler-times "vmulnepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vrcppbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+
+typedef __bf16 v4bf __attribute__ ((__vector_size__ (8)));
+typedef __bf16 v2bf __attribute__ ((__vector_size__ (4)));
+
+
+__attribute__((optimize("fast-math")))
+v4bf
+foo_div_fast_math_4 (v4bf a, v4bf b)
+{
+  return a / b;
+}
+
+__attribute__((optimize("fast-math")))
+v2bf
+foo_div_fast_math_2 (v2bf a, v2bf b)
+{
+  return a / b;
+}
diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-operations-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-operations-1.c
new file mode 100644
index 00000000000..e7ee08a20a9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-operations-1.c
@@ -0,0 +1,57 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-mavx10.2 -O2" } */
+/* { dg-final { scan-assembler-times "vmulnepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vaddnepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vdivnepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vsubnepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+
+typedef __bf16 v4bf __attribute__ ((__vector_size__ (8)));
+typedef __bf16 v2bf __attribute__ ((__vector_size__ (4)));
+
+v4bf
+foo_mul_4 (v4bf a, v4bf b)
+{
+  return a * b;
+}
+
+v4bf
+foo_add_4 (v4bf a, v4bf b)
+{
+  return a + b;
+}
+
+v4bf
+foo_div_4 (v4bf a, v4bf b)
+{
+  return a / b;
+}
+
+v4bf
+foo_sub_4 (v4bf a, v4bf b)
+{
+  return a - b;
+}
+
+v2bf
+foo_mul_2 (v2bf a, v2bf b)
+{
+  return a * b;
+}
+
+v2bf
+foo_add_2 (v2bf a, v2bf b)
+{
+  return a + b;
+}
+
+v2bf
+foo_div_2 (v2bf a, v2bf b)
+{
+  return a / b;
+}
+
+v2bf
+foo_sub_2 (v2bf a, v2bf b)
+{
+  return a - b;
+}