From: pan2.li@intel.com
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai, pan2.li@intel.com, yanzhang.wang@intel.com, kito.cheng@gmail.com
Subject: [PATCH v2] RISC-V: Support {U}INT64 to FP16 auto-vectorization
Date: Thu, 28 Sep 2023 22:15:37 +0800
Message-Id: <20230928141537.3570884-1-pan2.li@intel.com>
In-Reply-To: <20230928055913.1782465-1-pan2.li@intel.com>
References: <20230928055913.1782465-1-pan2.li@intel.com>
X-Patchwork-Submitter: "Li, Pan2"
X-Patchwork-Id: 1840889

From: Pan Li

Update in v2:

* Add math trap check.
* Adjust some test cases.

Original logs:

This patch supports auto-vectorization of the INT64 to FP16 conversion.
The conversion is performed in two steps:

* INT64 to FP32.
* FP32 to FP16.

Given the sample code below:

void
test_func (int64_t * __restrict a, _Float16 *b, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    b[i] = (_Float16) (a[i]);
}

Before this patch:
test.c:6:26: missed: couldn't vectorize loop
test.c:6:26: missed: not vectorized: unsupported data-type
  ld      a0,0(s0)
  call    __floatdihf
  fsh     fa0,0(s1)
  addi    s0,s0,8
  addi    s1,s1,2
  bne     s2,s0,.L3
  ld      ra,24(sp)
  ld      s0,16(sp)
  ld      s1,8(sp)
  ld      s2,0(sp)
  addi    sp,sp,32

After this patch:
  vsetvli a5,a2,e8,mf8,ta,ma
  vle64.v v1,0(a0)
  vsetvli a4,zero,e32,mf2,ta,ma
  vfncvt.f.x.w    v1,v1
  vsetvli zero,zero,e16,mf4,ta,ma
  vfncvt.f.f.w    v1,v1
  vsetvli zero,a2,e16,mf4,ta,ma
  vse16.v v1,0(a1)

Please note that VLS modes are also handled by this patch and covered by
the test cases.

	PR target/111506

gcc/ChangeLog:

	* config/riscv/autovec.md (2): New pattern.
	* config/riscv/vector-iterators.md: New iterator.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/unop/cvt-0.c: New test.
	* gcc.target/riscv/rvv/autovec/unop/cvt-1.c: New test.
	* gcc.target/riscv/rvv/autovec/vls/cvt-0.c: New test.
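
For reference, the two conversion steps listed above can be sketched in
scalar C.  This is a minimal sketch and not part of the patch: the function
names are hypothetical, float is assumed to be IEEE binary32, and the FP16
result is returned as raw binary16 bits so that the example does not depend
on _Float16 support in the host compiler.  The narrowing step assumes
round-to-nearest-even and only handles the zero or finite integer values
that the first step can produce.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Step 1, INT64 => FP32 (scalar counterpart of vfncvt.f.x.w).  */
float
int64_to_fp32 (int64_t x)
{
  return (float) x;
}

/* Step 2, FP32 => FP16 (scalar counterpart of vfncvt.f.f.w), returned as
   raw binary16 bits.  Assumes round-to-nearest-even and an input that is
   zero or a finite integer value, which is all that step 1 can produce.  */
uint16_t
fp32_to_fp16_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);

  uint16_t sign = (uint16_t) ((u >> 16) & 0x8000u);
  uint32_t exp = (u >> 23) & 0xffu;
  uint32_t man = u & 0x7fffffu;

  if (exp == 0 && man == 0)
    return sign;                        /* Zero.  */
  assert (exp >= 127 && exp != 255);    /* |f| >= 1 and finite.  */

  uint32_t e = exp - 127;               /* Unbiased exponent.  */
  if (e > 15)
    return (uint16_t) (sign | 0x7c00u); /* Too large for FP16: infinity.  */

  uint32_t half = ((e + 15) << 10) | (man >> 13);
  uint32_t rest = man & 0x1fffu;        /* The 13 bits being dropped.  */
  if (rest > 0x1000u || (rest == 0x1000u && (half & 1u)))
    half++;                             /* May carry up to infinity.  */
  return (uint16_t) (sign | half);
}

/* Hypothetical helper: both steps chained, as the split pattern does.  */
uint16_t
int64_to_fp16_bits (int64_t x)
{
  return fp32_to_fp16_bits (int64_to_fp32 (x));
}
```

Every integer with a magnitude below 2^24 converts to FP32 exactly, while
FP16 has a largest finite value of 65504, so under round-to-nearest the
intermediate FP32 step in this sketch yields the same final value as a
single correctly rounded INT64 to FP16 conversion.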

Signed-off-by: Pan Li
---
 gcc/config/riscv/autovec.md                   | 24 ++++++++++
 gcc/config/riscv/vector-iterators.md          | 38 +++++++++++++++
 .../gcc.target/riscv/rvv/autovec/unop/cvt-0.c | 21 +++++++++
 .../gcc.target/riscv/rvv/autovec/unop/cvt-1.c | 22 +++++++++
 .../gcc.target/riscv/rvv/autovec/vls/cvt-0.c  | 47 +++++++++++++++++++
 5 files changed, 152 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/cvt-0.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index cd0cbdd2889..d6cf376ebca 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -974,6 +974,30 @@ (define_insn_and_split "2"
   }
   [(set_attr "type" "vfncvtitof")])
 
+;; This operation can be performed in the loop vectorizer but unfortunately
+;; not applicable for now. We can remove this pattern after loop vectorizer
+;; is able to take care of INT64 to FP16 conversion.
+(define_insn_and_split "2"
+  [(set (match_operand: 0 "register_operand")
+        (any_float:
+          (match_operand:VWWCONVERTI 1 "register_operand")))]
+  "TARGET_VECTOR && TARGET_ZVFH && can_create_pseudo_p () && !flag_trapping_math"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+  {
+    rtx single = gen_reg_rtx (mode); /* Get vector SF mode. */
+
+    /* Step-1, INT64 => FP32. */
+    emit_insn (gen_2 (single, operands[1]));
+    /* Step-2, FP32 => FP16. */
+    emit_insn (gen_trunc2 (operands[0], single));
+
+    DONE;
+  }
+  [(set_attr "type" "vfncvtitof")]
+)
+
 ;; =========================================================================
 ;; == Unary arithmetic
 ;; =========================================================================
diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index b6cd872eb42..c9a7344b1bc 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -1247,6 +1247,24 @@ (define_mode_iterator VWCONVERTI [
   (V512DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 4096")
 ])
 
+(define_mode_iterator VWWCONVERTI [
+  (RVVM8DI "TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (RVVM4DI "TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (RVVM2DI "TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (RVVM1DI "TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+
+  (V1DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (V2DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (V4DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH")
+  (V8DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH && TARGET_MIN_VLEN >= 64")
+  (V16DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH && TARGET_MIN_VLEN >= 128")
+  (V32DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH && TARGET_MIN_VLEN >= 256")
+  (V64DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH && TARGET_MIN_VLEN >= 512")
+  (V128DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH && TARGET_MIN_VLEN >= 1024")
+  (V256DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH && TARGET_MIN_VLEN >= 2048")
+  (V512DI "TARGET_VECTOR_VLS && TARGET_VECTOR_ELEN_64 && TARGET_ZVFH && TARGET_MIN_VLEN >= 4096")
+])
+
 (define_mode_iterator VQEXTI [
   RVVM8SI RVVM4SI RVVM2SI RVVM1SI
   (RVVMF2SI "TARGET_MIN_VLEN > 32")
@@ -3243,6 +3261,26 @@ (define_mode_attr vnconvert [
   (V512DF "v512si")
 ])
 
+;; NN indicates narrow twice
+(define_mode_attr VNNCONVERT [
+  (RVVM8DI "RVVM2HF") (RVVM4DI "RVVM1HF") (RVVM2DI "RVVMF2HF")
+  (RVVM1DI "RVVMF4HF")
+
+  (V1DI "V1HF") (V2DI "V2HF") (V4DI "V4HF") (V8DI "V8HF") (V16DI "V16HF")
+  (V32DI "V32HF") (V64DI "V64HF") (V128DI "V128HF") (V256DI "V256HF")
+  (V512DI "V512HF")
+])
+
+;; nn indicates narrow twice
+(define_mode_attr vnnconvert [
+  (RVVM8DI "rvvm2hf") (RVVM4DI "rvvm1hf") (RVVM2DI "rvvmf2hf")
+  (RVVM1DI "rvvmf4hf")
+
+  (V1DI "v1hf") (V2DI "v2hf") (V4DI "v4hf") (V8DI "v8hf") (V16DI "v16hf")
+  (V32DI "v32hf") (V64DI "v64hf") (V128DI "v128hf") (V256DI "v256hf")
+  (V512DI "v512hf")
+])
+
 (define_mode_attr VDEMOTE [
   (RVVM8DI "RVVM8SI") (RVVM4DI "RVVM4SI") (RVVM2DI "RVVM2SI") (RVVM1DI "RVVM1SI")
   (V1DI "V1SI")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-0.c
new file mode 100644
index 00000000000..f08c1211723
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-0.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 -ftree-vectorize -ffast-math -fno-vect-cost-model -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include
+
+/*
+** test_int65_to_fp16:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*mf2,\s*ta,\s*ma
+** vfncvt\.f\.x\.w\s+v[0-9]+,\s*v[0-9]+
+** vsetvli\s+zero,\s*zero,\s*e16,\s*mf4,\s*ta,\s*ma
+** vfncvt\.f\.f\.w\s+v[0-9]+,\s*v[0-9]+
+** ...
+*/
+void
+test_int65_to_fp16 (int64_t * __restrict a, _Float16 *b, unsigned n)
+{
+  for (unsigned i = 0; i < n; i++)
+    b[i] = (_Float16) (a[i]);
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-1.c
new file mode 100644
index 00000000000..2d8ba8f45a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/cvt-1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 -ftree-vectorize -ffast-math -fno-vect-cost-model -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include
+
+/*
+** test_uint65_to_fp16:
+** ...
+** vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*mf2,\s*ta,\s*ma
+** vfncvt\.f\.xu\.w\s+v[0-9]+,\s*v[0-9]+
+** vsetvli\s+zero,\s*zero,\s*e16,\s*mf4,\s*ta,\s*ma
+** vfncvt\.f\.f\.w\s+v[0-9]+,\s*v[0-9]+
+** ...
+*/
+void
+test_uint65_to_fp16 (uint64_t * __restrict a, _Float16 *b, unsigned n)
+{
+  for (unsigned i = 0; i < n; i++)
+    b[i] = (_Float16) (a[i]);
+}
+
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/cvt-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/cvt-0.c
new file mode 100644
index 00000000000..5637b05ad6e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/cvt-0.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 -ffast-math --param=riscv-autovec-lmul=m8 -fdump-tree-optimized" } */
+
+#include "def.h"
+
+DEF_CONVERT (fp16, int64_t, _Float16, 1)
+DEF_CONVERT (fp16, int64_t, _Float16, 2)
+DEF_CONVERT (fp16, int64_t, _Float16, 4)
+DEF_CONVERT (fp16, int64_t, _Float16, 8)
+DEF_CONVERT (fp16, int64_t, _Float16, 16)
+DEF_CONVERT (fp16, int64_t, _Float16, 32)
+DEF_CONVERT (fp16, int64_t, _Float16, 64)
+DEF_CONVERT (fp16, int64_t, _Float16, 128)
+DEF_CONVERT (fp16, int64_t, _Float16, 256)
+DEF_CONVERT (fp16, int64_t, _Float16, 512)
+DEF_CONVERT (fp16, int64_t, _Float16, 1024)
+DEF_CONVERT (fp16, int64_t, _Float16, 2048)
+
+DEF_CONVERT (fp16, uint64_t, _Float16, 1)
+DEF_CONVERT (fp16, uint64_t, _Float16, 2)
+DEF_CONVERT (fp16, uint64_t, _Float16, 4)
+DEF_CONVERT (fp16, uint64_t, _Float16, 8)
+DEF_CONVERT (fp16, uint64_t, _Float16, 16)
+DEF_CONVERT (fp16, uint64_t, _Float16, 32)
+DEF_CONVERT (fp16, uint64_t, _Float16, 64)
+DEF_CONVERT (fp16, uint64_t, _Float16, 128)
+DEF_CONVERT (fp16, uint64_t, _Float16, 256)
+DEF_CONVERT (fp16, uint64_t, _Float16, 512)
+DEF_CONVERT (fp16, uint64_t, _Float16, 1024)
+DEF_CONVERT (fp16, uint64_t, _Float16, 2048)
+
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-not "1,1" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "2,2" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "4,4" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "16,16" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "32,32" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "64,64" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "128,128" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "256,256" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "512,512" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "1024,1024" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "2048,2048" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "4096,4096" "optimized" } } */
+/* { dg-final { scan-assembler-times {vfncvt\.f\.x\.w\s+v[0-9]+,\s*v[0-9]+} 15 } } */
+/* { dg-final { scan-assembler-times {vfncvt\.f\.xu\.w\s+v[0-9]+,\s*v[0-9]+} 15 } } */
+/* { dg-final { scan-assembler-times {vfncvt\.f\.f\.w\s+v[0-9]+,\s*v[0-9]+} 30 } } */
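
The VLS test above instantiates DEF_CONVERT from def.h, which is not part of
this diff.  As a rough illustration only, a macro of that kind could expand
to a fixed-trip-count conversion loop, one function per element count.  The
macro name DEMO_CONVERT, its parameter order, the generated function names
and the helper demo_usage are all assumptions made for this sketch, not the
contents of def.h, and float stands in for _Float16 so that the example
builds on hosts without FP16 support.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for DEF_CONVERT; the real macro lives in def.h.
   Expands to PREFIX_TYPE1_TYPE2_NUM, a loop with a constant trip count.  */
#define DEMO_CONVERT(PREFIX, TYPE1, TYPE2, NUM)                        \
  void PREFIX##_##TYPE1##_##TYPE2##_##NUM (TYPE2 *__restrict dst,      \
                                           const TYPE1 *__restrict src) \
  {                                                                    \
    for (int i = 0; i < NUM; i++)                                      \
      dst[i] = (TYPE2) src[i];                                         \
  }

DEMO_CONVERT (demo, int64_t, float, 4)
DEMO_CONVERT (demo, uint64_t, float, 4)

/* Usage: convert four values with each variant and check the results.  */
int
demo_usage (void)
{
  int64_t a[4] = { 0, -1, 1024, 65504 };
  uint64_t b[4] = { 0, 1, 2048, 65504 };
  float d[4];

  demo_int64_t_float_4 (d, a);
  assert (d[0] == 0.0f && d[1] == -1.0f && d[2] == 1024.0f && d[3] == 65504.0f);

  demo_uint64_t_float_4 (d, b);
  assert (d[0] == 0.0f && d[1] == 1.0f && d[2] == 2048.0f && d[3] == 65504.0f);
  return 0;
}
```

With the trip count known at compile time, each instantiation gives the
vectorizer one candidate loop per element count, which is the shape the
VLS (fixed-length) modes in the new iterator are meant to cover.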