From patchwork Tue Nov 28 03:29:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jiahao Xu X-Patchwork-Id: 1869091 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4SfSdF71nGz23mg for ; Tue, 28 Nov 2023 14:30:07 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id DED8A3858C36 for ; Tue, 28 Nov 2023 03:30:04 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail.loongson.cn (mail.loongson.cn [114.242.206.163]) by sourceware.org (Postfix) with ESMTP id 09B6A3858C53 for ; Tue, 28 Nov 2023 03:29:49 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 09B6A3858C53 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=loongson.cn Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=loongson.cn ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 09B6A3858C53 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=114.242.206.163 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701142193; cv=none; b=sfJJrj9Mh0qzY0RFJjmLut4tJDc79hSEa8+taGaKBAM7KJl05ufnycWkjUgUy3Dtbi0XHxQzwfRNwSnUAT+tgj0kLr0KQNMeafgaQhnZC2cSzmJ1O/6iNwlw5Ev4RBtL0iOWhp4oTrkFnO7iP3cwpDZHDTHAVeYZaAHOtnAqJD4= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701142193; c=relaxed/simple; bh=rqTr8qjIOQ3RxJelIb7DSvj6dFJhcSelo6LOo5bRLF4=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=VPReK4aoVlWw/luOMML1+S+VN6Oh2TXpF91rYiEYWgExV6HPrc6XAqNcsx5okLPcu0eFTZ2vmXTE7SaLwa5fMAkRxee8iTVwkIO8orXM3Sp+aZnuEr5miZA61gaL7A2aJ5n8nDB50F3hw3qVlJkksS0zg6Vn2KujpSG39wTBF5U= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from loongson.cn (unknown [10.10.130.252]) by gateway (Coremail) with SMTP id _____8BxNuiqXmVlBE49AA--.2648S3; Tue, 28 Nov 2023 11:29:46 +0800 (CST) Received: from slurm-master.loongson.cn (unknown [10.10.130.252]) by localhost.localdomain (Coremail) with SMTP id AQAAf8Dxnd6lXmVlM2ZOAA--.43637S5; Tue, 28 Nov 2023 11:29:45 +0800 (CST) From: Jiahao Xu To: gcc-patches@gcc.gnu.org Cc: xry111@xry111.site, i@xen0n.name, chenglulu@loongson.cn, xuchenghua@loongson.cn, Jiahao Xu Subject: [PATCH 1/5] LoongArch: Add support for approximate instructions. Date: Tue, 28 Nov 2023 11:29:34 +0800 Message-Id: <20231128032938.17202-2-xujiahao@loongson.cn> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20231128032938.17202-1-xujiahao@loongson.cn> References: <20231128032938.17202-1-xujiahao@loongson.cn> MIME-Version: 1.0 X-CM-TRANSID: AQAAf8Dxnd6lXmVlM2ZOAA--.43637S5 X-CM-SenderInfo: 50xmxthkdrqz5rrqw2lrqou0/ X-Coremail-Antispam: 1Uk129KBj9fXoWfGr17Ar1fXrWftr1fAr4kKrX_yoW8Gw15Zo Z3JFsrtF4xWFyrAa9xtr1fZrWUXayakFs7AFW5XFs5C3WfJ3s0kw17Wa1Yva42qFWkW3WD C3s3W3sxXFyfXFs5l-sFpf9Il3svdjkaLaAFLSUrUUUUbb8apTn2vfkv8UJUUUU8wcxFpf 9Il3svdxBIdaVrn0xqx4xG64xvF2IEw4CE5I8CrVC2j2Jv73VFW2AGmfu7bjvjm3AaLaJ3 UjIYCTnIWjp_UUUYU7kC6x804xWl14x267AKxVWUJVW8JwAFc2x0x2IEx4CE42xK8VAvwI 8IcIk0rVWrJVCq3wAFIxvE14AKwVWUXVWUAwA2ocxC64kIII0Yj41l84x0c7CEw4AK67xG Y2AK021l84ACjcxK6xIIjxv20xvE14v26r1j6r1xM28EF7xvwVC0I7IYx2IY6xkF7I0E14 v26r1j6r4UM28EF7xvwVC2z280aVAFwI0_Gr0_Cr1l84ACjcxK6I8E87Iv6xkF7I0E14v2 6r4j6r4UJwAS0I0E0xvYzxvE52x082IY62kv0487Mc804VCY07AIYIkI8VC2zVCFFI0UMc 02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7IYx2IY67AKxVWUXVWUAwAv7VC2z280aVAF wI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0Y48IcxkI7VAKI48JMxAIw28IcxkI7V AKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2IqxVCj r7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVWUAVWUtwCIc40Y0x0EwIxGrwCI42IY6x IIjxv20xvE14v26r1j6r1xMIIF0xvE2Ix0cI8IcVCY1x0267AKxVWUJVW8JwCI42IY6xAI w20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Jr0_Gr1lIxAIcVC2z280aVCY1x 0267AKxVWUJVW8JbIYCTnIWIevJa73UjIFyTuYvjxU7_MaUUUUU X-Spam-Status: No, score=-13.0 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org LA664 introduces new instructions for reciprocal approximation and reciprocal square root approximation. It includes the scalar instructions frecipe and frsrte, as well as their corresponding vector instructions [x]vfrecipe and [x]vfrsqrte. This patch adds define_insn/builtins/intrinsics for these instructions. gcc/ChangeLog: * config/loongarch/lasx.md (lasx_xvfrecipe_): New insn pattern. (lasx_xvfrsqrte_): Ditto. * config/loongarch/lasxintrin.h (__lasx_xvfrecipe_s): New intrinsic. (__lasx_xvfrecipe_d): Ditto. (__lasx_xvfrsqrte_s): Ditto. (__lasx_xvfrsqrte_d): Ditto. * config/loongarch/loongarch-builtins.cc: Add new builtin functions. * config/loongarch/loongarch.md (recipe2): New insn pattern. (rsqrte): Ditto. * config/loongarch/lsx.md (lsx_vfrecipe_): Ditto. (lsx_vfrsqrte_): Ditto. * config/loongarch/lsxintrin.h (__lsx_vfrecipe_s): New intrinsic. (__lsx_vfrecipe_d): Ditto. (__lsx_vfsqrte_s): Ditto. (__lsx_vfsqrte_d): Ditto. diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md index 2e11f061202..dd60d2bfed3 100644 --- a/gcc/config/loongarch/lasx.md +++ b/gcc/config/loongarch/lasx.md @@ -40,8 +40,10 @@ (define_c_enum "unspec" [ UNSPEC_LASX_XVFCVTL UNSPEC_LASX_XVFLOGB UNSPEC_LASX_XVFRECIP + UNSPEC_LASX_XVFRECIPE UNSPEC_LASX_XVFRINT UNSPEC_LASX_XVFRSQRT + UNSPEC_LASX_XVFRSQRTE UNSPEC_LASX_XVFCMP_SAF UNSPEC_LASX_XVFCMP_SEQ UNSPEC_LASX_XVFCMP_SLE @@ -1688,6 +1690,17 @@ (define_insn "lasx_xvfrecip_" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) +;; Approximate Reciprocal Instructions. + +(define_insn "lasx_xvfrecipe_" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] + UNSPEC_LASX_XVFRECIPE))] + "ISA_HAS_LASX" + "xvfrecipe.\t%u0,%u1" + [(set_attr "type" "simd_fdiv") + (set_attr "mode" "")]) + (define_insn "lasx_xvfrint_" [(set (match_operand:FLASX 0 "register_operand" "=f") (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] @@ -1706,6 +1719,17 @@ (define_insn "lasx_xvfrsqrt_" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) +;; Approximate Reciprocal Square Root Instructions. + +(define_insn "lasx_xvfrsqrte_" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] + UNSPEC_LASX_XVFRSQRTE))] + "ISA_HAS_LASX" + "xvfrsqrte.\t%u0,%u1" + [(set_attr "type" "simd_fdiv") + (set_attr "mode" "")]) + (define_insn "lasx_xvftint_s__" [(set (match_operand: 0 "register_operand" "=f") (unspec: [(match_operand:FLASX 1 "register_operand" "f")] diff --git a/gcc/config/loongarch/lasxintrin.h b/gcc/config/loongarch/lasxintrin.h index 7bce2c757f1..3017361a924 100644 --- a/gcc/config/loongarch/lasxintrin.h +++ b/gcc/config/loongarch/lasxintrin.h @@ -2399,6 +2399,22 @@ __m256d __lasx_xvfrecip_d (__m256d _1) return (__m256d)__builtin_lasx_xvfrecip_d ((v4f64)_1); } +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfrecipe_s (__m256 _1) +{ + return (__m256)__builtin_lasx_xvfrecipe_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfrecipe_d (__m256d _1) +{ + return (__m256d)__builtin_lasx_xvfrecipe_d ((v4f64)_1); +} + /* Assembly instruction format: xd, xj. */ /* Data types in instruction templates: V8SF, V8SF. */ extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) @@ -2431,6 +2447,22 @@ __m256d __lasx_xvfrsqrt_d (__m256d _1) return (__m256d)__builtin_lasx_xvfrsqrt_d ((v4f64)_1); } +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V8SF, V8SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256 __lasx_xvfrsqrte_s (__m256 _1) +{ + return (__m256)__builtin_lasx_xvfrsqrte_s ((v8f32)_1); +} + +/* Assembly instruction format: xd, xj. */ +/* Data types in instruction templates: V4DF, V4DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m256d __lasx_xvfrsqrte_d (__m256d _1) +{ + return (__m256d)__builtin_lasx_xvfrsqrte_d ((v4f64)_1); +} + /* Assembly instruction format: xd, xj. */ /* Data types in instruction templates: V8SF, V8SF. */ extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) diff --git a/gcc/config/loongarch/loongarch-builtins.cc b/gcc/config/loongarch/loongarch-builtins.cc index db02aacdc3f..47f658d6ab5 100644 --- a/gcc/config/loongarch/loongarch-builtins.cc +++ b/gcc/config/loongarch/loongarch-builtins.cc @@ -1195,10 +1195,14 @@ static const struct loongarch_builtin_description loongarch_builtins[] = { LSX_BUILTIN (vfsqrt_d, LARCH_V2DF_FTYPE_V2DF), LSX_BUILTIN (vfrecip_s, LARCH_V4SF_FTYPE_V4SF), LSX_BUILTIN (vfrecip_d, LARCH_V2DF_FTYPE_V2DF), + LSX_BUILTIN (vfrecipe_s, LARCH_V4SF_FTYPE_V4SF), + LSX_BUILTIN (vfrecipe_d, LARCH_V2DF_FTYPE_V2DF), LSX_BUILTIN (vfrint_s, LARCH_V4SF_FTYPE_V4SF), LSX_BUILTIN (vfrint_d, LARCH_V2DF_FTYPE_V2DF), LSX_BUILTIN (vfrsqrt_s, LARCH_V4SF_FTYPE_V4SF), LSX_BUILTIN (vfrsqrt_d, LARCH_V2DF_FTYPE_V2DF), + LSX_BUILTIN (vfrsqrte_s, LARCH_V4SF_FTYPE_V4SF), + LSX_BUILTIN (vfrsqrte_d, LARCH_V2DF_FTYPE_V2DF), LSX_BUILTIN (vflogb_s, LARCH_V4SF_FTYPE_V4SF), LSX_BUILTIN (vflogb_d, LARCH_V2DF_FTYPE_V2DF), LSX_BUILTIN (vfcvth_s_h, LARCH_V4SF_FTYPE_V8HI), @@ -1901,10 +1905,14 @@ static const struct loongarch_builtin_description loongarch_builtins[] = { LASX_BUILTIN (xvfsqrt_d, LARCH_V4DF_FTYPE_V4DF), LASX_BUILTIN (xvfrecip_s, LARCH_V8SF_FTYPE_V8SF), LASX_BUILTIN (xvfrecip_d, LARCH_V4DF_FTYPE_V4DF), + LASX_BUILTIN (xvfrecipe_s, LARCH_V8SF_FTYPE_V8SF), + LASX_BUILTIN (xvfrecipe_d, LARCH_V4DF_FTYPE_V4DF), LASX_BUILTIN (xvfrint_s, LARCH_V8SF_FTYPE_V8SF), LASX_BUILTIN (xvfrint_d, LARCH_V4DF_FTYPE_V4DF), LASX_BUILTIN (xvfrsqrt_s, LARCH_V8SF_FTYPE_V8SF), LASX_BUILTIN (xvfrsqrt_d, LARCH_V4DF_FTYPE_V4DF), + LASX_BUILTIN (xvfrsqrte_s, LARCH_V8SF_FTYPE_V8SF), + LASX_BUILTIN (xvfrsqrte_d, LARCH_V4DF_FTYPE_V4DF), LASX_BUILTIN (xvflogb_s, LARCH_V8SF_FTYPE_V8SF), LASX_BUILTIN (xvflogb_d, LARCH_V4DF_FTYPE_V4DF), LASX_BUILTIN (xvfcvth_s_h, LARCH_V8SF_FTYPE_V16HI), diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index cd4ed495697..7b09926d1a7 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -59,6 +59,12 @@ (define_c_enum "unspec" [ ;; Stack tie UNSPEC_TIE + ;; RSQRT + UNSPEC_RSQRTE + + ;; RECIP + UNSPEC_RECIPE + ;; CRC UNSPEC_CRC UNSPEC_CRCC @@ -220,6 +226,7 @@ (define_attr "qword_mode" "no,yes" ;; fmadd floating point multiply-add ;; fdiv floating point divide ;; frdiv floating point reciprocal divide +;; frecipe floating point approximate reciprocal ;; fabs floating point absolute value ;; flogb floating point exponent extract ;; fneg floating point negation @@ -229,6 +236,7 @@ (define_attr "qword_mode" "no,yes" ;; fscaleb floating point scale ;; fsqrt floating point square root ;; frsqrt floating point reciprocal square root +;; frsqrte floating point approximate reciprocal square root ;; multi multiword sequence (or user asm statements) ;; atomic atomic memory update instruction ;; syncloop memory atomic operation implemented as a sync loop @@ -238,8 +246,8 @@ (define_attr "type" "unknown,branch,jump,call,load,fpload,fpidxload,store,fpstore,fpidxstore, prefetch,prefetchx,condmove,mgtf,mftg,const,arith,logical, shift,slt,signext,clz,trap,imul,idiv,move, - fmove,fadd,fmul,fmadd,fdiv,frdiv,fabs,flogb,fneg,fcmp,fcopysign,fcvt, - fscaleb,fsqrt,frsqrt,accext,accmod,multi,atomic,syncloop,nop,ghost, + fmove,fadd,fmul,fmadd,fdiv,frdiv,frecipe,fabs,flogb,fneg,fcmp,fcopysign,fcvt, + fscaleb,fsqrt,frsqrt,frsqrte,accext,accmod,multi,atomic,syncloop,nop,ghost, simd_div,simd_fclass,simd_flog2,simd_fadd,simd_fcvt,simd_fmul,simd_fmadd, simd_fdiv,simd_bitins,simd_bitmov,simd_insert,simd_sld,simd_mul,simd_fcmp, simd_fexp2,simd_int_arith,simd_bit,simd_shift,simd_splat,simd_fill, @@ -911,6 +919,18 @@ (define_insn "*recip3" [(set_attr "type" "frdiv") (set_attr "mode" "")]) +;; Approximate Reciprocal Instructions. + +(define_insn "recipe2" + [(set (match_operand:ANYF 0 "register_operand" "=f") + (unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")] + UNSPEC_RECIPE))] + "TARGET_HARD_FLOAT" + "frecipe.\t%0,%1" + [(set_attr "type" "frecipe") + (set_attr "mode" "") + (set_attr "insn_count" "1")]) + ;; Integer division and modulus. (define_expand "3" [(set (match_operand:GPR 0 "register_operand") @@ -1136,6 +1156,17 @@ (define_insn "*rsqrtb" [(set_attr "type" "frsqrt") (set_attr "mode" "") (set_attr "insn_count" "1")]) + +;; Approximate Reciprocal Square Root Instructions. + +(define_insn "rsqrte" + [(set (match_operand:ANYF 0 "register_operand" "=f") + (unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")] + UNSPEC_RSQRTE))] + "TARGET_HARD_FLOAT" + "frsqrte.\t%0,%1" + [(set_attr "type" "frsqrte") + (set_attr "mode" "")]) ;; ;; .................... diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md index 5e8d8d74b43..391e84f8d1d 100644 --- a/gcc/config/loongarch/lsx.md +++ b/gcc/config/loongarch/lsx.md @@ -42,8 +42,10 @@ (define_c_enum "unspec" [ UNSPEC_LSX_VFCVTL UNSPEC_LSX_VFLOGB UNSPEC_LSX_VFRECIP + UNSPEC_LSX_VFRECIPE UNSPEC_LSX_VFRINT UNSPEC_LSX_VFRSQRT + UNSPEC_LSX_VFRSQRTE UNSPEC_LSX_VFCMP_SAF UNSPEC_LSX_VFCMP_SEQ UNSPEC_LSX_VFCMP_SLE @@ -1616,6 +1618,17 @@ (define_insn "lsx_vfrecip_" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) +;; Approximate Reciprocal Instructions. + +(define_insn "lsx_vfrecipe_" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] + UNSPEC_LSX_VFRECIPE))] + "ISA_HAS_LSX" + "vfrecipe.\t%w0,%w1" + [(set_attr "type" "simd_fdiv") + (set_attr "mode" "")]) + (define_insn "lsx_vfrint_" [(set (match_operand:FLSX 0 "register_operand" "=f") (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] @@ -1634,6 +1647,17 @@ (define_insn "lsx_vfrsqrt_" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) +;; Approximate Reciprocal Square Root Instructions. + +(define_insn "lsx_vfrsqrte_" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] + UNSPEC_LSX_VFRSQRTE))] + "ISA_HAS_LSX" + "vfrsqrte.\t%w0,%w1" + [(set_attr "type" "simd_fdiv") + (set_attr "mode" "")]) + (define_insn "lsx_vftint_s__" [(set (match_operand: 0 "register_operand" "=f") (unspec: [(match_operand:FLSX 1 "register_operand" "f")] diff --git a/gcc/config/loongarch/lsxintrin.h b/gcc/config/loongarch/lsxintrin.h index 29553c093fa..e1e0df2971c 100644 --- a/gcc/config/loongarch/lsxintrin.h +++ b/gcc/config/loongarch/lsxintrin.h @@ -2480,6 +2480,22 @@ __m128d __lsx_vfrecip_d (__m128d _1) return (__m128d)__builtin_lsx_vfrecip_d ((v2f64)_1); } +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfrecipe_s (__m128 _1) +{ + return (__m128)__builtin_lsx_vfrecipe_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfrecipe_d (__m128d _1) +{ + return (__m128d)__builtin_lsx_vfrecipe_d ((v2f64)_1); +} + /* Assembly instruction format: vd, vj. */ /* Data types in instruction templates: V4SF, V4SF. */ extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) @@ -2512,6 +2528,22 @@ __m128d __lsx_vfrsqrt_d (__m128d _1) return (__m128d)__builtin_lsx_vfrsqrt_d ((v2f64)_1); } +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V4SF, V4SF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128 __lsx_vfrsqrte_s (__m128 _1) +{ + return (__m128)__builtin_lsx_vfrsqrte_s ((v4f32)_1); +} + +/* Assembly instruction format: vd, vj. */ +/* Data types in instruction templates: V2DF, V2DF. */ +extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__m128d __lsx_vfrsqrte_d (__m128d _1) +{ + return (__m128d)__builtin_lsx_vfrsqrte_d ((v2f64)_1); +} + /* Assembly instruction format: vd, vj. */ /* Data types in instruction templates: V4SF, V4SF. */ extern __inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) From patchwork Tue Nov 28 03:29:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jiahao Xu X-Patchwork-Id: 1869090 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4SfSdF6WWRz1yRy for ; Tue, 28 Nov 2023 14:30:07 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id AA0453857349 for ; Tue, 28 Nov 2023 03:30:04 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail.loongson.cn (mail.loongson.cn [114.242.206.163]) by sourceware.org (Postfix) with ESMTP id 8F4D83858C36 for ; Tue, 28 Nov 2023 03:29:51 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 8F4D83858C36 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=loongson.cn Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=loongson.cn ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 8F4D83858C36 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=114.242.206.163 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701142193; cv=none; b=rjd5bMPpdZ7v+Ciq4VoMZqhDGGKvAJZm9vlSRFPowtwpCI+6UF1w9T/7ky3AhuBICZvkRN75d0Bwn8m8JnyyqiqW3MuCAvTj2x2s2/2+wHAoXEaNm6gqHB48pFaMMxtJ/GdEysM+8Q2h65MRqJvlVs5BuAZAAz5uUcA0pQt/mvs= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701142193; c=relaxed/simple; bh=q8tZBIoTxoP5juBfUkd3PEYsaw3VqC/hg+P3Ib+PkVM=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=sR3HR1MLU9WTH0yX6kWf8OKPB4EtJbgP+vDZCKbkSJjWpCu6JsPuWAXeEUYCNSY0Zv5eAYEP8dbTcA182OE4yl2F7CKokY8jNtnoJCErPpGhZMAQGUZ+cOgpSRfhdqjY0py0Jh5w5AB3K56GKF7COT2Ktzy3lbfVujVHnYBzhhg= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from loongson.cn (unknown [10.10.130.252]) by gateway (Coremail) with SMTP id _____8BxIvCtXmVlCE49AA--.55486S3; Tue, 28 Nov 2023 11:29:49 +0800 (CST) Received: from slurm-master.loongson.cn (unknown [10.10.130.252]) by localhost.localdomain (Coremail) with SMTP id AQAAf8Dxnd6lXmVlM2ZOAA--.43637S6; Tue, 28 Nov 2023 11:29:47 +0800 (CST) From: Jiahao Xu To: gcc-patches@gcc.gnu.org Cc: xry111@xry111.site, i@xen0n.name, chenglulu@loongson.cn, xuchenghua@loongson.cn, Jiahao Xu Subject: [PATCH 2/5] LoongArch: Use standard pattern name for xvfrsqrt/vfrsqrt instructions. Date: Tue, 28 Nov 2023 11:29:35 +0800 Message-Id: <20231128032938.17202-3-xujiahao@loongson.cn> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20231128032938.17202-1-xujiahao@loongson.cn> References: <20231128032938.17202-1-xujiahao@loongson.cn> MIME-Version: 1.0 X-CM-TRANSID: AQAAf8Dxnd6lXmVlM2ZOAA--.43637S6 X-CM-SenderInfo: 50xmxthkdrqz5rrqw2lrqou0/ X-Coremail-Antispam: 1Uk129KBj93XoWxZrW7Gr1xWryDGrWUtFyDXFc_yoW7Jw45pr ZrC3WvyrWrJr4Yg3Wktay5Xw1Yyr9rGF429FZ8ZrnFyF4qq3WkZr1FkFZaqF1qqw4rGr1I qa1rWayUZFWDC3gCm3ZEXasCq-sJn29KB7ZKAUJUUUU8529EdanIXcx71UUUUU7KY7ZEXa sCq-sGcSsGvfJ3Ic02F40EFcxC0VAKzVAqx4xG6I80ebIjqfuFe4nvWSU5nxnvy29KBjDU 0xBIdaVrnRJUUUkjb4IE77IF4wAFF20E14v26r1j6r4UM7CY07I20VC2zVCF04k26cxKx2 IYs7xG6rWj6s0DM7CIcVAFz4kK6r1Y6r17M28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48v e4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_JFI_Gr1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI 0_Jr0_Gr1l84ACjcxK6I8E87Iv67AKxVW8JVWxJwA2z4x0Y4vEx4A2jsIEc7CjxVAFwI0_ Gr0_Gr1UM2AIxVAIcxkEcVAq07x20xvEncxIr21l57IF6xkI12xvs2x26I8E6xACxx1l5I 8CrVACY4xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r126r1DMcIj6I8E87Iv67AK xVWUJVW8JwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41l42xK82IYc2Ij64 vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s026x8G jcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r126r1DMIIYrxkI7VAKI48JMIIF0xvE2I x0cI8IcVAFwI0_JFI_Gr1lIxAIcVC0I7IYx2IY6xkF7I0E14v26r1j6r4UMIIF0xvE42xK 8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVWUJVW8JwCI42IY6I8E87Iv6xkF7I 0E14v26r1j6r4UYxBIdaVFxhVjvjDU0xZFpf9x07j83kZUUUUU= X-Spam-Status: No, score=-13.1 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Rename lasx_xvfrsqrt*/lsx_vfrsqrt* to rsqrt2 to align with standard pattern name. gcc/ChangeLog: * config/loongarch/lasx.md (lasx_xvfrsqrt_): Renamed to .. (*rsqrt2): .. this. * config/loongarch/loongarch-builtins.cc (CODE_FOR_lsx_vfrsqrt_d): Redefine to standard pattern name. (CODE_FOR_lsx_vfrsqrt_s): Ditto. (CODE_FOR_lasx_xvfrsqrt_d): Ditto. (CODE_FOR_lasx_xvfrsqrt_s): Ditto. * config/loongarch/loongarch.md (*rsqrta): Remove. (*rsqrt2): New insn pattern. (*rsqrtb): Remove. * config/loongarch/lsx.md (lsx_vfrsqrt_): Renamed to .. (*rsqrt2): .. this. diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md index dd60d2bfed3..5f78cc45ccd 100644 --- a/gcc/config/loongarch/lasx.md +++ b/gcc/config/loongarch/lasx.md @@ -1710,10 +1710,10 @@ (define_insn "lasx_xvfrint_" [(set_attr "type" "simd_fcvt") (set_attr "mode" "")]) -(define_insn "lasx_xvfrsqrt_" +(define_insn "rsqrt2" [(set (match_operand:FLASX 0 "register_operand" "=f") - (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] - UNSPEC_LASX_XVFRSQRT))] + (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] + UNSPEC_LASX_XVFRSQRT))] "ISA_HAS_LASX" "xvfrsqrt.\t%u0,%u1" [(set_attr "type" "simd_fdiv") diff --git a/gcc/config/loongarch/loongarch-builtins.cc b/gcc/config/loongarch/loongarch-builtins.cc index 47f658d6ab5..43d853bc961 100644 --- a/gcc/config/loongarch/loongarch-builtins.cc +++ b/gcc/config/loongarch/loongarch-builtins.cc @@ -473,6 +473,8 @@ AVAIL_ALL (lasx, ISA_HAS_LASX) #define CODE_FOR_lsx_vssrlrn_bu_h CODE_FOR_lsx_vssrlrn_u_bu_h #define CODE_FOR_lsx_vssrlrn_hu_w CODE_FOR_lsx_vssrlrn_u_hu_w #define CODE_FOR_lsx_vssrlrn_wu_d CODE_FOR_lsx_vssrlrn_u_wu_d +#define CODE_FOR_lsx_vfrsqrt_d CODE_FOR_rsqrtv2df2 +#define CODE_FOR_lsx_vfrsqrt_s CODE_FOR_rsqrtv4sf2 /* LoongArch ASX define CODE_FOR_lasx_mxxx */ #define CODE_FOR_lasx_xvsadd_b CODE_FOR_ssaddv32qi3 @@ -743,6 +745,8 @@ AVAIL_ALL (lasx, ISA_HAS_LASX) #define CODE_FOR_lasx_xvsat_hu CODE_FOR_lasx_xvsat_u_hu #define CODE_FOR_lasx_xvsat_wu CODE_FOR_lasx_xvsat_u_wu #define CODE_FOR_lasx_xvsat_du CODE_FOR_lasx_xvsat_u_du +#define CODE_FOR_lasx_xvfrsqrt_d CODE_FOR_rsqrtv4df2 +#define CODE_FOR_lasx_xvfrsqrt_s CODE_FOR_rsqrtv8sf2 static const struct loongarch_builtin_description loongarch_builtins[] = { #define LARCH_MOVFCSR2GR 0 diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index 7b09926d1a7..0b6910d84ab 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -60,6 +60,7 @@ (define_c_enum "unspec" [ UNSPEC_TIE ;; RSQRT + UNSPEC_RSQRT UNSPEC_RSQRTE ;; RECIP @@ -1137,25 +1138,14 @@ (define_insn "sqrt2" (set_attr "mode" "") (set_attr "insn_count" "1")]) -(define_insn "*rsqrta" +(define_insn "*rsqrt2" [(set (match_operand:ANYF 0 "register_operand" "=f") - (div:ANYF (match_operand:ANYF 1 "const_1_operand" "") - (sqrt:ANYF (match_operand:ANYF 2 "register_operand" "f"))))] - "flag_unsafe_math_optimizations" - "frsqrt.\t%0,%2" - [(set_attr "type" "frsqrt") - (set_attr "mode" "") - (set_attr "insn_count" "1")]) - -(define_insn "*rsqrtb" - [(set (match_operand:ANYF 0 "register_operand" "=f") - (sqrt:ANYF (div:ANYF (match_operand:ANYF 1 "const_1_operand" "") - (match_operand:ANYF 2 "register_operand" "f"))))] - "flag_unsafe_math_optimizations" - "frsqrt.\t%0,%2" + (unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")] + UNSPEC_RSQRT))] + "TARGET_HARD_FLOAT" + "frsqrt.\t%0,%1" [(set_attr "type" "frsqrt") - (set_attr "mode" "") - (set_attr "insn_count" "1")]) + (set_attr "mode" "")]) ;; Approximate Reciprocal Square Root Instructions. diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md index 391e84f8d1d..130d77e164b 100644 --- a/gcc/config/loongarch/lsx.md +++ b/gcc/config/loongarch/lsx.md @@ -1638,10 +1638,10 @@ (define_insn "lsx_vfrint_" [(set_attr "type" "simd_fcvt") (set_attr "mode" "")]) -(define_insn "lsx_vfrsqrt_" +(define_insn "rsqrt2" [(set (match_operand:FLSX 0 "register_operand" "=f") - (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] - UNSPEC_LSX_VFRSQRT))] + (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] + UNSPEC_LSX_VFRSQRT))] "ISA_HAS_LSX" "vfrsqrt.\t%w0,%w1" [(set_attr "type" "simd_fdiv") From patchwork Tue Nov 28 03:29:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jiahao Xu X-Patchwork-Id: 1869094 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4SfSfM2Mq0z1yRy for ; Tue, 28 Nov 2023 14:31:07 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id E2E63385842D for ; Tue, 28 Nov 2023 03:31:04 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from eggs.gnu.org (eggs.gnu.org [IPv6:2001:470:142:3::10]) by sourceware.org (Postfix) with ESMTPS id 771F03858004 for ; Tue, 28 Nov 2023 03:30:00 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 771F03858004 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=loongson.cn Authentication-Results: sourceware.org; spf=fail smtp.mailfrom=loongson.cn ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 771F03858004 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2001:470:142:3::10 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701142202; cv=none; b=OOgGItnYJ2sgLCPspa+FPyVzSyyKixcRDo6dN40D/XLkh1sHc7zVlNoaakeRToPiqs7Rux8yz8safPXv6TW9F0o57CNiPkwKwOTemC1rq1fg2LpNWsMjpUjPv4zmFkqzS3juVELThNWffVzSgsMT9NgwBEVHCA2DDw4WtZrPiSM= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701142202; c=relaxed/simple; bh=k76wJZsyG6hkKmtFXGktTiXTpXhI1ReW7eyniI4KQpE=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=WKqDHPLEZ3pqcRVRvEHjzLpOHzYBW91o20j5KbA3MM7EjqLIGIo5LxhSwsZTPCd57Q/AAxtobnUoyip8HQhYHwblcclJgp+sTBl9smARp9OPEVWIlHz4YP9462U0VDY/Cpoy0kYn/Bp+7srLKu2T4G6x2YXoD454VwY7udAJTDU= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from mail.loongson.cn ([114.242.206.163]) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1r7onN-000151-Hm for gcc-patches@gcc.gnu.org; Mon, 27 Nov 2023 22:30:00 -0500 Received: from loongson.cn (unknown [10.10.130.252]) by gateway (Coremail) with SMTP id _____8DxBfGvXmVlC049AA--.56775S3; Tue, 28 Nov 2023 11:29:51 +0800 (CST) Received: from slurm-master.loongson.cn (unknown [10.10.130.252]) by localhost.localdomain (Coremail) with SMTP id AQAAf8Dxnd6lXmVlM2ZOAA--.43637S7; Tue, 28 Nov 2023 11:29:50 +0800 (CST) From: Jiahao Xu To: gcc-patches@gcc.gnu.org Cc: xry111@xry111.site, i@xen0n.name, chenglulu@loongson.cn, xuchenghua@loongson.cn, Jiahao Xu Subject: [PATCH 3/5] LoongArch: Redefine pattern for xvfrecip/vfrecip instructions. Date: Tue, 28 Nov 2023 11:29:36 +0800 Message-Id: <20231128032938.17202-4-xujiahao@loongson.cn> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20231128032938.17202-1-xujiahao@loongson.cn> References: <20231128032938.17202-1-xujiahao@loongson.cn> MIME-Version: 1.0 X-CM-TRANSID: AQAAf8Dxnd6lXmVlM2ZOAA--.43637S7 X-CM-SenderInfo: 50xmxthkdrqz5rrqw2lrqou0/ X-Coremail-Antispam: 1Uk129KBj93XoW3WF43Gr15Jr1UtFWUur1rGrX_yoW7Ww1Upr ZrC3ZrArWrJFsIgw1ktay5Xr15Kr9rKF429FW3Z39Iya1jqw1vvF1FkFZIqF12qw4rKr1I va1Sg3WYvFWDC3gCm3ZEXasCq-sJn29KB7ZKAUJUUUUU529EdanIXcx71UUUUU7KY7ZEXa sCq-sGcSsGvfJ3Ic02F40EFcxC0VAKzVAqx4xG6I80ebIjqfuFe4nvWSU5nxnvy29KBjDU 0xBIdaVrnRJUUUkjb4IE77IF4wAFF20E14v26r1j6r4UM7CY07I20VC2zVCF04k26cxKx2 IYs7xG6rWj6s0DM7CIcVAFz4kK6r1j6r18M28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48v e4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_JFI_Gr1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI 0_Jr0_Gr1l84ACjcxK6I8E87Iv67AKxVW8JVWxJwA2z4x0Y4vEx4A2jsIEc7CjxVAFwI0_ Gr0_Gr1UM2AIxVAIcxkEcVAq07x20xvEncxIr21l57IF6xkI12xvs2x26I8E6xACxx1l5I 8CrVACY4xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r1q6rW5McIj6I8E87Iv67AK xVW8JVWxJwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41l42xK82IYc2Ij64 vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s026x8G jcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r126r1DMIIYrxkI7VAKI48JMIIF0xvE2I x0cI8IcVAFwI0_JFI_Gr1lIxAIcVC0I7IYx2IY6xkF7I0E14v26r1j6r4UMIIF0xvE42xK 8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVWUJVW8JwCI42IY6I8E87Iv6xkF7I 0E14v26r1j6r4UYxBIdaVFxhVjvjDU0xZFpf9x07j8CztUUUUU= Received-SPF: pass client-ip=114.242.206.163; envelope-from=xujiahao@loongson.cn; helo=mail.loongson.cn X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Status: No, score=-13.6 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, SPF_FAIL, SPF_HELO_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Redefine pattern for [x]vfrecip instructions use rtx code instead of unspec, and enable [x]vfrecip instructions to be generated during auto-vectorization. gcc/ChangeLog: * config/loongarch/lasx.md (lasx_xvfrecip_): Renamed to .. (recip3): .. this. * config/loongarch/loongarch-builtins.cc (CODE_FOR_lsx_vfrecip_d): Redefine to new pattern name. (CODE_FOR_lsx_vfrecip_s): Ditto. (CODE_FOR_lasx_xvfrecip_d): Ditto. (CODE_FOR_lasx_xvfrecip_s): Ditto. (loongarch_expand_builtin_direct): For the vector recip instructions, construct a temporary parameter const1_vector. * config/loongarch/lsx.md (lsx_vfrecip_): Renamed to .. (recip3): .. this. * config/loongarch/predicates.md (const_vector_1_operand): New predicate. diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md index 5f78cc45ccd..2d7b8f02b4b 100644 --- a/gcc/config/loongarch/lasx.md +++ b/gcc/config/loongarch/lasx.md @@ -1681,12 +1681,12 @@ (define_insn "lasx_xvfmina_" [(set_attr "type" "simd_fminmax") (set_attr "mode" "")]) -(define_insn "lasx_xvfrecip_" +(define_insn "recip3" [(set (match_operand:FLASX 0 "register_operand" "=f") - (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] - UNSPEC_LASX_XVFRECIP))] + (div:FLASX (match_operand:FLASX 1 "const_vector_1_operand" "") + (match_operand:FLASX 2 "register_operand" "f")))] "ISA_HAS_LASX" - "xvfrecip.\t%u0,%u1" + "xvfrecip.\t%u0,%u2" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) diff --git a/gcc/config/loongarch/loongarch-builtins.cc b/gcc/config/loongarch/loongarch-builtins.cc index 43d853bc961..9d3644dbb9b 100644 --- a/gcc/config/loongarch/loongarch-builtins.cc +++ b/gcc/config/loongarch/loongarch-builtins.cc @@ -475,6 +475,8 @@ AVAIL_ALL (lasx, ISA_HAS_LASX) #define CODE_FOR_lsx_vssrlrn_wu_d CODE_FOR_lsx_vssrlrn_u_wu_d #define CODE_FOR_lsx_vfrsqrt_d CODE_FOR_rsqrtv2df2 #define CODE_FOR_lsx_vfrsqrt_s CODE_FOR_rsqrtv4sf2 +#define CODE_FOR_lsx_vfrecip_d CODE_FOR_recipv2df3 +#define CODE_FOR_lsx_vfrecip_s CODE_FOR_recipv4sf3 /* LoongArch ASX define CODE_FOR_lasx_mxxx */ #define CODE_FOR_lasx_xvsadd_b CODE_FOR_ssaddv32qi3 @@ -747,6 +749,8 @@ AVAIL_ALL (lasx, ISA_HAS_LASX) #define CODE_FOR_lasx_xvsat_du CODE_FOR_lasx_xvsat_u_du #define CODE_FOR_lasx_xvfrsqrt_d CODE_FOR_rsqrtv4df2 #define CODE_FOR_lasx_xvfrsqrt_s CODE_FOR_rsqrtv8sf2 +#define CODE_FOR_lasx_xvfrecip_d CODE_FOR_recipv4df3 +#define CODE_FOR_lasx_xvfrecip_s CODE_FOR_recipv8sf3 static const struct loongarch_builtin_description loongarch_builtins[] = { #define LARCH_MOVFCSR2GR 0 @@ -2978,6 +2982,22 @@ loongarch_expand_builtin_direct (enum insn_code icode, rtx target, tree exp, if (has_target_p) create_output_operand (&ops[opno++], target, TYPE_MODE (TREE_TYPE (exp))); + /* For the vector reciprocal instructions, we need to construct a temporary + parameter const1_vector. */ + switch (icode) + { + case CODE_FOR_recipv8sf3: + case CODE_FOR_recipv4df3: + case CODE_FOR_recipv4sf3: + case CODE_FOR_recipv2df3: + loongarch_prepare_builtin_arg (&ops[2], exp, 0); + create_input_operand (&ops[1], CONST1_RTX (ops[0].mode), ops[0].mode); + return loongarch_expand_builtin_insn (icode, 3, ops, has_target_p); + + default: + break; + } + /* Map the arguments to the other operands. */ gcc_assert (opno + call_expr_nargs (exp) == insn_data[icode].n_generator_args); diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md index 130d77e164b..20946326e37 100644 --- a/gcc/config/loongarch/lsx.md +++ b/gcc/config/loongarch/lsx.md @@ -1609,12 +1609,12 @@ (define_insn "lsx_vfmina_" [(set_attr "type" "simd_fminmax") (set_attr "mode" "")]) -(define_insn "lsx_vfrecip_" +(define_insn "recip3" [(set (match_operand:FLSX 0 "register_operand" "=f") - (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] - UNSPEC_LSX_VFRECIP))] + (div:FLSX (match_operand:FLSX 1 "const_vector_1_operand" "") + (match_operand:FLSX 2 "register_operand" "f")))] "ISA_HAS_LSX" - "vfrecip.\t%w0,%w1" + "vfrecip.\t%w0,%w2" [(set_attr "type" "simd_fdiv") (set_attr "mode" "")]) diff --git a/gcc/config/loongarch/predicates.md b/gcc/config/loongarch/predicates.md index d02e846cb12..f7796da10b2 100644 --- a/gcc/config/loongarch/predicates.md +++ b/gcc/config/loongarch/predicates.md @@ -227,6 +227,10 @@ (define_predicate "const_1_operand" (and (match_code "const_int,const_wide_int,const_double,const_vector") (match_test "op == CONST1_RTX (GET_MODE (op))"))) +(define_predicate "const_vector_1_operand" + (and (match_code "const_vector") + (match_test "op == CONST1_RTX (GET_MODE (op))"))) + (define_predicate "reg_or_1_operand" (ior (match_operand 0 "const_1_operand") (match_operand 0 "register_operand"))) From patchwork Tue Nov 28 03:29:37 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jiahao Xu X-Patchwork-Id: 1869092 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4SfSdn60D6z1yRy for ; Tue, 28 Nov 2023 14:30:37 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 190833857C56 for ; Tue, 28 Nov 2023 03:30:35 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail.loongson.cn (mail.loongson.cn [114.242.206.163]) by sourceware.org (Postfix) with ESMTP id 35F19385829D for ; Tue, 28 Nov 2023 03:29:56 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 35F19385829D Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=loongson.cn Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=loongson.cn ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 35F19385829D Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=114.242.206.163 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701142201; cv=none; b=dcTwSTMnAr+NIOBXfQCDItDtLj9r5Ri+BHxp6fWXaty8LlG0Or7ugJpcM014PxgcMoR2qJefjiZbXia+6nOS3OCZ0cpkcj2IjRAv+0508W1Iq1hQmzYaQ8Xz8srozQEqnKfL/jhxsMFeMdOGi0Mc7kGiBl760R7jgxaUxsb55hE= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701142201; c=relaxed/simple; bh=NdRC/7aPyWpOhi1zvHFvCN8RuMJgun8WqtC2hnvGCu8=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=cfnmTlfOlmeaK3rI1CMHQRnUR1Go1mRoRp34acTSOx16kMwPJJ2Qf5Du0COjN5ytPbHVCFRwEvlQQiZr5rOHmSZQmmbezI/XxRRHIsG34RrOCEfFVNa0NOfHOte33BvZkLw54t6YZwk1WNjMYyDjwqDaWUYMizTaNkcz9FtChAI= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from loongson.cn (unknown [10.10.130.252]) by gateway (Coremail) with SMTP id _____8CxtPCyXmVlD049AA--.57265S3; Tue, 28 Nov 2023 11:29:54 +0800 (CST) Received: from slurm-master.loongson.cn (unknown [10.10.130.252]) by localhost.localdomain (Coremail) with SMTP id AQAAf8Dxnd6lXmVlM2ZOAA--.43637S8; Tue, 28 Nov 2023 11:29:52 +0800 (CST) From: Jiahao Xu To: gcc-patches@gcc.gnu.org Cc: xry111@xry111.site, i@xen0n.name, chenglulu@loongson.cn, xuchenghua@loongson.cn, Jiahao Xu Subject: [PATCH 4/5] LoongArch: New options -mrecip and -mrecip= with ffast-math. Date: Tue, 28 Nov 2023 11:29:37 +0800 Message-Id: <20231128032938.17202-5-xujiahao@loongson.cn> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20231128032938.17202-1-xujiahao@loongson.cn> References: <20231128032938.17202-1-xujiahao@loongson.cn> MIME-Version: 1.0 X-CM-TRANSID: AQAAf8Dxnd6lXmVlM2ZOAA--.43637S8 X-CM-SenderInfo: 50xmxthkdrqz5rrqw2lrqou0/ X-Coremail-Antispam: 1Uk129KBj9fXoWfCr47Xr48tF15KrW5Wr45Jwc_yoW5Xw48Zo WrtF4DJ3W8GryF939rKrs3Zry8X3WUAr4xAay3ZwnYyFs7Jr98t3sF9a1Yv343Ar9rWry5 C3s7W3ZxZa4xJa1kl-sFpf9Il3svdjkaLaAFLSUrUUUUUb8apTn2vfkv8UJUUUU8wcxFpf 9Il3svdxBIdaVrn0xqx4xG64xvF2IEw4CE5I8CrVC2j2Jv73VFW2AGmfu7bjvjm3AaLaJ3 UjIYCTnIWjp_UUUYj7kC6x804xWl14x267AKxVWUJVW8JwAFc2x0x2IEx4CE42xK8VAvwI 8IcIk0rVWrJVCq3wAFIxvE14AKwVWUGVWUXwA2ocxC64kIII0Yj41l84x0c7CEw4AK67xG Y2AK021l84ACjcxK6xIIjxv20xvE14v26r4j6ryUM28EF7xvwVC0I7IYx2IY6xkF7I0E14 v26r4j6F4UM28EF7xvwVC2z280aVAFwI0_Gr0_Cr1l84ACjcxK6I8E87Iv6xkF7I0E14v2 6r4j6r4UJwAS0I0E0xvYzxvE52x082IY62kv0487Mc804VCY07AIYIkI8VC2zVCFFI0UMc 02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7IYx2IY67AKxVWUtVWrXwAv7VC2z280aVAF wI0_Gr0_Cr1lOx8S6xCaFVCjc4AY6r1j6r4UM4x0Y48IcxkI7VAKI48JMxAIw28IcxkI7V AKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2IqxVCj r7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVWUAVWUtwCIc40Y0x0EwIxGrwCI42IY6x IIjxv20xvE14v26r4j6ryUMIIF0xvE2Ix0cI8IcVCY1x0267AKxVW8JVWxJwCI42IY6xAI w20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Gr0_Cr1lIxAIcVC2z280aVCY1x 0267AKxVW8JVW8JrUvcSsGvfC2KfnxnUUI43ZEXa7IU8QJ57UUUUU== X-Spam-Status: No, score=-13.1 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org When -mrecip option is turned on, use approximate reciprocal instructions and approximate reciprocal square root instructions with additional Newton-Raphson steps to implement single precision floating-point division, square root and reciprocal square root operations for better throughput. gcc/ChangeLog: * config/loongarch/genopts/loongarch.opt.in (recip_mask): New variable. (-mrecip, -mrecip): New options. * config/loongarch/lasx.md (div3): New expander. (*div3): Rename. (sqrt2): New expander. (*sqrt2): Rename. (rsqrt2): New expander. * config/loongarch/loongarch-protos.h (loongarch_emit_swrsqrtsf): New prototype. (loongarch_emit_swdivsf): Ditto. * config/loongarch/loongarch.cc (loongarch_option_override_internal): Set recip_mask for -mrecip and -mrecip= options. (loongarch_emit_swrsqrtsf): New function. (loongarch_emit_swdivsf): Ditto. (use_rsqrt_p): Ditto. (loongarch_optab_supported_p): Ditto. (TARGET_OPTAB_SUPPORTED_P): New hook. * config/loongarch/loongarch.h (RECIP_MASK_NONE): New bitmasks. (RECIP_MASK_DIV): Ditto. (RECIP_MASK_SQRT): Ditto. (RECIP_MASK_RSQRT): Ditto. (RECIP_MASK_VEC_DIV): Ditto. (RECIP_MASK_VEC_SQRT): Ditto. (RECIP_MASK_VEC_RSQRT): Ditto. (RECIP_MASK_ALL): Ditto. (TARGET_RECIP_DIV): New tests. (TARGET_RECIP_SQRT): Ditto. (TARGET_RECIP_RSQRT): Ditto. (TARGET_RECIP_VEC_DIV): Ditto. (TARGET_RECIP_VEC_SQRT): Ditto. (TARGET_RECIP_VEC_RSQRT): Ditto. * config/loongarch/loongarch.md (sqrt2): New expander. (*sqrt2): Rename. (rsqrt2): New expander. * config/loongarch/loongarch.opt (recip_mask): New variable. (-mrecip, -mrecip): New options. * config/loongarch/lsx.md (div3): New expander. (*div3): Rename. (sqrt2): New expander. (*sqrt2): Rename. (rsqrt2): New expander. * config/loongarch/predicates.md (reg_or_vecotr_1_operand): New predicate. * doc/invoke.texi (LoongArch Options): Document new options. gcc/testsuite/ChangeLog: * gcc.target/loongarch/recip-divf.c: New test. * gcc.target/loongarch/recip-sqrtf.c: New test. * gcc.target/loongarch/vector/lasx/lasx-recip-divf.c: New test. * gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c: New test. * gcc.target/loongarch/vector/lsx/lsx-recip-divf.c: New test. * gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c: New test. diff --git a/gcc/config/loongarch/genopts/loongarch.opt.in b/gcc/config/loongarch/genopts/loongarch.opt.in index 8af6cc6f532..cc1a9daf7cf 100644 --- a/gcc/config/loongarch/genopts/loongarch.opt.in +++ b/gcc/config/loongarch/genopts/loongarch.opt.in @@ -23,6 +23,9 @@ config/loongarch/loongarch-opts.h HeaderInclude config/loongarch/loongarch-str.h +TargetVariable +unsigned int recip_mask = 0 + ; ISA related options ;; Base ISA Enum @@ -197,6 +200,14 @@ mexplicit-relocs Target Var(la_opt_explicit_relocs_backward) Init(M_OPT_UNSET) Use %reloc() assembly operators (for backward compatibility). +mrecip +Target RejectNegative Var(loongarch_recip) +Generate approximate reciprocal divide and square root for better throughput. + +mrecip= +Target RejectNegative Joined Var(loongarch_recip_name) +Control generation of reciprocal estimates. + ; The code model option names for -mcmodel. Enum Name(cmodel) Type(int) diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md index 2d7b8f02b4b..08c81ef53e4 100644 --- a/gcc/config/loongarch/lasx.md +++ b/gcc/config/loongarch/lasx.md @@ -1249,7 +1249,25 @@ (define_insn "mul3" [(set_attr "type" "simd_fmul") (set_attr "mode" "")]) -(define_insn "div3" +(define_expand "div3" + [(set (match_operand:FLASX 0 "register_operand") + (div:FLASX (match_operand:FLASX 1 "reg_or_vecotr_1_operand") + (match_operand:FLASX 2 "register_operand")))] + "ISA_HAS_LASX" +{ + if (mode == V8SFmode + && TARGET_RECIP_VEC_DIV + && optimize_insn_for_speed_p () + && flag_finite_math_only && !flag_trapping_math + && flag_unsafe_math_optimizations) + { + loongarch_emit_swdivsf (operands[0], operands[1], + operands[2], V8SFmode); + DONE; + } +}) + +(define_insn "*div3" [(set (match_operand:FLASX 0 "register_operand" "=f") (div:FLASX (match_operand:FLASX 1 "register_operand" "f") (match_operand:FLASX 2 "register_operand" "f")))] @@ -1278,7 +1296,23 @@ (define_insn "fnma4" [(set_attr "type" "simd_fmadd") (set_attr "mode" "")]) -(define_insn "sqrt2" +(define_expand "sqrt2" + [(set (match_operand:FLASX 0 "register_operand") + (sqrt:FLASX (match_operand:FLASX 1 "register_operand")))] + "ISA_HAS_LASX" +{ + if (mode == V8SFmode + && TARGET_RECIP_VEC_SQRT + && flag_unsafe_math_optimizations + && optimize_insn_for_speed_p () + && flag_finite_math_only && !flag_trapping_math) + { + loongarch_emit_swrsqrtsf (operands[0], operands[1], V8SFmode, 0); + DONE; + } +}) + +(define_insn "*sqrt2" [(set (match_operand:FLASX 0 "register_operand" "=f") (sqrt:FLASX (match_operand:FLASX 1 "register_operand" "f")))] "ISA_HAS_LASX" @@ -1710,7 +1744,20 @@ (define_insn "lasx_xvfrint_" [(set_attr "type" "simd_fcvt") (set_attr "mode" "")]) -(define_insn "rsqrt2" +(define_expand "rsqrt2" + [(set (match_operand:FLASX 0 "register_operand" "=f") + (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] + UNSPEC_LASX_XVFRSQRT))] + "ISA_HAS_LASX" + { + if (mode == V8SFmode && TARGET_RECIP_VEC_RSQRT) + { + loongarch_emit_swrsqrtsf (operands[0], operands[1], V8SFmode, 1); + DONE; + } +}) + +(define_insn "*rsqrt2" [(set (match_operand:FLASX 0 "register_operand" "=f") (unspec:FLASX [(match_operand:FLASX 1 "register_operand" "f")] UNSPEC_LASX_XVFRSQRT))] diff --git a/gcc/config/loongarch/loongarch-protos.h b/gcc/config/loongarch/loongarch-protos.h index cb8fc36b086..f2ff93b5e10 100644 --- a/gcc/config/loongarch/loongarch-protos.h +++ b/gcc/config/loongarch/loongarch-protos.h @@ -220,5 +220,7 @@ extern rtx loongarch_gen_const_int_vector_shuffle (machine_mode, int); extern tree loongarch_build_builtin_va_list (void); extern rtx loongarch_build_signbit_mask (machine_mode, bool, bool); +extern void loongarch_emit_swrsqrtsf (rtx, rtx, machine_mode, bool); +extern void loongarch_emit_swdivsf (rtx, rtx, rtx, machine_mode); extern bool loongarch_explicit_relocs_p (enum loongarch_symbol_type); #endif /* ! GCC_LOONGARCH_PROTOS_H */ diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index d3896d72bc2..afee09c3b61 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -7547,6 +7547,69 @@ loongarch_option_override_internal (struct gcc_options *opts, /* Function to allocate machine-dependent function status. */ init_machine_status = &loongarch_init_machine_status; + + /* -mrecip options. */ + static struct + { + const char *string; /* option name. */ + unsigned int mask; /* mask bits to set. */ + } + const recip_options[] = { + { "all", RECIP_MASK_ALL }, + { "none", RECIP_MASK_NONE }, + { "div", RECIP_MASK_DIV }, + { "sqrt", RECIP_MASK_SQRT }, + { "rsqrt", RECIP_MASK_RSQRT }, + { "vec-div", RECIP_MASK_VEC_DIV }, + { "vec-sqrt", RECIP_MASK_VEC_SQRT }, + { "vec-rsqrt", RECIP_MASK_VEC_RSQRT }, + }; + + if (loongarch_recip_name) + { + char *p = ASTRDUP (loongarch_recip_name); + char *q; + unsigned int mask, i; + bool invert; + + while ((q = strtok (p, ",")) != NULL) + { + p = NULL; + if (*q == '!') + { + invert = true; + q++; + } + else + invert = false; + + if (!strcmp (q, "default")) + mask = RECIP_MASK_ALL; + else + { + for (i = 0; i < ARRAY_SIZE (recip_options); i++) + if (!strcmp (q, recip_options[i].string)) + { + mask = recip_options[i].mask; + break; + } + + if (i == ARRAY_SIZE (recip_options)) + { + error ("unknown option for %<-mrecip=%s%>", q); + invert = false; + mask = RECIP_MASK_NONE; + } + } + + if (invert) + recip_mask &= ~mask; + else + recip_mask |= mask; + } + } + if (loongarch_recip) + recip_mask |= RECIP_MASK_ALL; } @@ -11443,6 +11506,156 @@ loongarch_build_signbit_mask (machine_mode mode, bool vect, bool invert) return force_reg (vec_mode, v); } +/* Use rsqrte instruction and Newton-Rhapson to compute the approximation of + a single precision floating point [reciprocal] square root. */ + +void loongarch_emit_swrsqrtsf (rtx res, rtx a, machine_mode mode, bool recip) +{ + rtx x0, e0, e1, e2, mhalf, monehalf; + REAL_VALUE_TYPE r; + int unspec; + + x0 = gen_reg_rtx (mode); + e0 = gen_reg_rtx (mode); + e1 = gen_reg_rtx (mode); + e2 = gen_reg_rtx (mode); + + real_arithmetic (&r, ABS_EXPR, &dconsthalf, NULL); + mhalf = const_double_from_real_value (r, SFmode); + + real_arithmetic (&r, PLUS_EXPR, &dconsthalf, &dconst1); + monehalf = const_double_from_real_value (r, SFmode); + unspec = UNSPEC_RSQRTE; + + if (VECTOR_MODE_P (mode)) + { + mhalf = loongarch_build_const_vector (mode, true, mhalf); + monehalf = loongarch_build_const_vector (mode, true, monehalf); + unspec = GET_MODE_SIZE (mode) == 32 ? UNSPEC_LASX_XVFRSQRTE + : UNSPEC_LSX_VFRSQRTE; + } + + /* rsqrt(a) = rsqrte(a) * (1.5 - 0.5 * a * rsqrte(a) * rsqrte(a)) + sqrt(a) = a * rsqrte(a) * (1.5 - 0.5 * a * rsqrte(a) * rsqrte(a)) */ + + a = force_reg (mode, a); + + /* x0 = rsqrt(a) estimate. */ + emit_insn (gen_rtx_SET (x0, gen_rtx_UNSPEC (mode, gen_rtvec (1, a), + unspec))); + + /* If (a == 0.0) Filter out infinity to prevent NaN for sqrt(0.0). */ + if (!recip) + { + rtx zero = force_reg (mode, CONST0_RTX (mode)); + + if (VECTOR_MODE_P (mode)) + { + machine_mode imode = related_int_vector_mode (mode).require (); + rtx mask = gen_reg_rtx (imode); + emit_insn (gen_rtx_SET (mask, gen_rtx_NE (imode, a, zero))); + emit_insn (gen_rtx_SET (x0, gen_rtx_AND (mode, x0, + gen_lowpart (mode, mask)))); + } + else + { + rtx target = emit_conditional_move (x0, { GT, a, zero, mode }, + x0, zero, mode, 0); + if (target != x0) + emit_move_insn (x0, target); + } + } + + /* e0 = x0 * a */ + emit_insn (gen_rtx_SET (e0, gen_rtx_MULT (mode, x0, a))); + /* e1 = e0 * x0 */ + emit_insn (gen_rtx_SET (e1, gen_rtx_MULT (mode, e0, x0))); + + /* e2 = 1.5 - e1 * 0.5 */ + mhalf = force_reg (mode, mhalf); + monehalf = force_reg (mode, monehalf); + emit_insn (gen_rtx_SET (e2, gen_rtx_FMA (mode, + gen_rtx_NEG (mode, e1), + mhalf, monehalf))); + + if (recip) + /* res = e2 * x0 */ + emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, x0, e2))); + else + /* res = e2 * e0 */ + emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, e2, e0))); +} + +/* Use recipe instruction and Newton-Rhapson to compute the approximation of + a single precision floating point divide. */ + +void loongarch_emit_swdivsf (rtx res, rtx a, rtx b, machine_mode mode) +{ + rtx x0, e0, mtwo; + REAL_VALUE_TYPE r; + x0 = gen_reg_rtx (mode); + e0 = gen_reg_rtx (mode); + int unspec = UNSPEC_RECIPE; + + real_arithmetic (&r, ABS_EXPR, &dconst2, NULL); + mtwo = const_double_from_real_value (r, SFmode); + + if (VECTOR_MODE_P (mode)) + { + mtwo = loongarch_build_const_vector (mode, true, mtwo); + unspec = GET_MODE_SIZE (mode) == 32 ? UNSPEC_LASX_XVFRECIPE + : UNSPEC_LSX_VFRECIPE; + } + + mtwo = force_reg (mode, mtwo); + + /* a / b = a * recipe(b) * (2.0 - b * recipe(b)) */ + + /* x0 = 1./b estimate. */ + emit_insn (gen_rtx_SET (x0, gen_rtx_UNSPEC (mode, gen_rtvec (1, b), + unspec))); + /* 2.0 - b * x0 */ + emit_insn (gen_rtx_SET (e0, gen_rtx_FMA (mode, + gen_rtx_NEG (mode, b), x0, mtwo))); + + /* x0 = a * x0 */ + if (a != CONST1_RTX (mode)) + emit_insn (gen_rtx_SET (x0, gen_rtx_MULT (mode, a, x0))); + + /* res = e0 * x0 */ + emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, e0, x0))); +} + +/* Return true if it is safe to use the rsqrt optabs to optimize + 1.0/sqrt. */ + +static bool +use_rsqrt_p (machine_mode mode) +{ + if (TARGET_RECIP_RSQRT && GET_MODE_INNER (mode) == SFmode) + return (flag_finite_math_only + && !flag_trapping_math + && flag_unsafe_math_optimizations); + else + return true; +} + +/* Implement the TARGET_OPTAB_SUPPORTED_P hook. */ + +static bool +loongarch_optab_supported_p (int op, machine_mode mode1, machine_mode, + optimization_type opt_type) +{ + switch (op) + { + case rsqrt_optab: + return opt_type == OPTIMIZE_FOR_SPEED && use_rsqrt_p (mode1); + + default: + return true; + } +} + static bool loongarch_builtin_support_vector_misalignment (machine_mode mode, const_tree type, @@ -11610,6 +11823,9 @@ loongarch_asm_code_end (void) #define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES \ loongarch_autovectorize_vector_modes +#undef TARGET_OPTAB_SUPPORTED_P +#define TARGET_OPTAB_SUPPORTED_P loongarch_optab_supported_p + #undef TARGET_INIT_BUILTINS #define TARGET_INIT_BUILTINS loongarch_init_builtins #undef TARGET_BUILTIN_DECL diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h index 115222e70fd..9910084a6a5 100644 --- a/gcc/config/loongarch/loongarch.h +++ b/gcc/config/loongarch/loongarch.h @@ -700,6 +700,24 @@ enum reg_class && (GET_MODE_CLASS (MODE) == MODE_VECTOR_INT \ || GET_MODE_CLASS (MODE) == MODE_VECTOR_FLOAT)) +#define RECIP_MASK_NONE 0x00 +#define RECIP_MASK_DIV 0x01 +#define RECIP_MASK_SQRT 0x02 +#define RECIP_MASK_RSQRT 0x04 +#define RECIP_MASK_VEC_DIV 0x08 +#define RECIP_MASK_VEC_SQRT 0x10 +#define RECIP_MASK_VEC_RSQRT 0x20 +#define RECIP_MASK_ALL (RECIP_MASK_DIV | RECIP_MASK_SQRT \ + | RECIP_MASK_RSQRT | RECIP_MASK_VEC_SQRT \ + | RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_RSQRT) + +#define TARGET_RECIP_DIV ((recip_mask & RECIP_MASK_DIV) != 0 || TARGET_uARCH_LA664) +#define TARGET_RECIP_SQRT ((recip_mask & RECIP_MASK_SQRT) != 0 || TARGET_uARCH_LA664) +#define TARGET_RECIP_RSQRT ((recip_mask & RECIP_MASK_RSQRT) != 0 || TARGET_uARCH_LA664) +#define TARGET_RECIP_VEC_DIV ((recip_mask & RECIP_MASK_VEC_DIV) != 0 || TARGET_uARCH_LA664) +#define TARGET_RECIP_VEC_SQRT ((recip_mask & RECIP_MASK_VEC_SQRT) != 0 || TARGET_uARCH_LA664) +#define TARGET_RECIP_VEC_RSQRT ((recip_mask & RECIP_MASK_VEC_RSQRT) != 0 || TARGET_uARCH_LA664) + /* 1 if N is a possible register number for function argument passing. We have no FP argument registers when soft-float. */ diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md index 0b6910d84ab..ce4d2d9ad06 100644 --- a/gcc/config/loongarch/loongarch.md +++ b/gcc/config/loongarch/loongarch.md @@ -896,9 +896,21 @@ (define_peephole ;; Float division and modulus. (define_expand "div3" [(set (match_operand:ANYF 0 "register_operand") - (div:ANYF (match_operand:ANYF 1 "reg_or_1_operand") - (match_operand:ANYF 2 "register_operand")))] - "") + (div:ANYF (match_operand:ANYF 1 "reg_or_1_operand") + (match_operand:ANYF 2 "register_operand")))] + "" +{ + if (mode == SFmode + && TARGET_RECIP_DIV + && optimize_insn_for_speed_p () + && flag_finite_math_only && !flag_trapping_math + && flag_unsafe_math_optimizations) + { + loongarch_emit_swdivsf (operands[0], operands[1], + operands[2], SFmode); + DONE; + } +}) (define_insn "*div3" [(set (match_operand:ANYF 0 "register_operand" "=f") @@ -1129,7 +1141,23 @@ (define_insn "*fnma4" ;; ;; .................... -(define_insn "sqrt2" +(define_expand "sqrt2" + [(set (match_operand:ANYF 0 "register_operand") + (sqrt:ANYF (match_operand:ANYF 1 "register_operand")))] + "" + { + if (mode == SFmode + && TARGET_RECIP_SQRT + && flag_unsafe_math_optimizations + && !optimize_insn_for_size_p () + && flag_finite_math_only && !flag_trapping_math) + { + loongarch_emit_swrsqrtsf (operands[0], operands[1], SFmode, 0); + DONE; + } + }) + +(define_insn "*sqrt2" [(set (match_operand:ANYF 0 "register_operand" "=f") (sqrt:ANYF (match_operand:ANYF 1 "register_operand" "f")))] "" @@ -1138,6 +1166,19 @@ (define_insn "sqrt2" (set_attr "mode" "") (set_attr "insn_count" "1")]) +(define_expand "rsqrt2" + [(set (match_operand:ANYF 0 "register_operand") + (unspec:ANYF [(match_operand:ANYF 1 "register_operand")] + UNSPEC_RSQRT))] + "TARGET_HARD_FLOAT" +{ + if (mode == SFmode && TARGET_RECIP_RSQRT) + { + loongarch_emit_swrsqrtsf (operands[0], operands[1], SFmode, 1); + DONE; + } +}) + (define_insn "*rsqrt2" [(set (match_operand:ANYF 0 "register_operand" "=f") (unspec:ANYF [(match_operand:ANYF 1 "register_operand" "f")] diff --git a/gcc/config/loongarch/loongarch.opt b/gcc/config/loongarch/loongarch.opt index 4d36e3ec4de..3c4fbda18ee 100644 --- a/gcc/config/loongarch/loongarch.opt +++ b/gcc/config/loongarch/loongarch.opt @@ -31,6 +31,9 @@ config/loongarch/loongarch-opts.h HeaderInclude config/loongarch/loongarch-str.h +TargetVariable +unsigned int recip_mask = 0 + ; ISA related options ;; Base ISA Enum @@ -205,6 +208,14 @@ mexplicit-relocs Target Var(la_opt_explicit_relocs_backward) Init(M_OPT_UNSET) Use %reloc() assembly operators (for backward compatibility). +mrecip +Target RejectNegative Var(loongarch_recip) +Generate approximate reciprocal divide and square root for better throughput. + +mrecip= +Target RejectNegative Joined Var(loongarch_recip_name) +Control generation of reciprocal estimates. + ; The code model option names for -mcmodel. Enum Name(cmodel) Type(int) diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md index 20946326e37..dc78837ecf5 100644 --- a/gcc/config/loongarch/lsx.md +++ b/gcc/config/loongarch/lsx.md @@ -1153,7 +1153,25 @@ (define_insn "mul3" [(set_attr "type" "simd_fmul") (set_attr "mode" "")]) -(define_insn "div3" +(define_expand "div3" + [(set (match_operand:FLSX 0 "register_operand") + (div:FLSX (match_operand:FLSX 1 "reg_or_vecotr_1_operand") + (match_operand:FLSX 2 "register_operand")))] + "ISA_HAS_LSX" +{ + if (mode == V4SFmode + && TARGET_RECIP_VEC_DIV + && optimize_insn_for_speed_p () + && flag_finite_math_only && !flag_trapping_math + && flag_unsafe_math_optimizations) + { + loongarch_emit_swdivsf (operands[0], operands[1], + operands[2], V4SFmode); + DONE; + } +}) + +(define_insn "*div3" [(set (match_operand:FLSX 0 "register_operand" "=f") (div:FLSX (match_operand:FLSX 1 "register_operand" "f") (match_operand:FLSX 2 "register_operand" "f")))] @@ -1182,7 +1200,23 @@ (define_insn "fnma4" [(set_attr "type" "simd_fmadd") (set_attr "mode" "")]) -(define_insn "sqrt2" +(define_expand "sqrt2" + [(set (match_operand:FLSX 0 "register_operand") + (sqrt:FLSX (match_operand:FLSX 1 "register_operand")))] + "ISA_HAS_LSX" +{ + if (mode == V4SFmode + && TARGET_RECIP_VEC_SQRT + && flag_unsafe_math_optimizations + && optimize_insn_for_speed_p () + && flag_finite_math_only && !flag_trapping_math) + { + loongarch_emit_swrsqrtsf (operands[0], operands[1], V4SFmode, 0); + DONE; + } +}) + +(define_insn "*sqrt2" [(set (match_operand:FLSX 0 "register_operand" "=f") (sqrt:FLSX (match_operand:FLSX 1 "register_operand" "f")))] "ISA_HAS_LSX" @@ -1638,7 +1672,20 @@ (define_insn "lsx_vfrint_" [(set_attr "type" "simd_fcvt") (set_attr "mode" "")]) -(define_insn "rsqrt2" +(define_expand "rsqrt2" + [(set (match_operand:FLSX 0 "register_operand" "=f") + (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] + UNSPEC_LSX_VFRSQRT))] + "ISA_HAS_LSX" +{ + if (mode == V4SFmode && TARGET_RECIP_VEC_RSQRT) + { + loongarch_emit_swrsqrtsf (operands[0], operands[1], V4SFmode, 1); + DONE; + } +}) + +(define_insn "*rsqrt2" [(set (match_operand:FLSX 0 "register_operand" "=f") (unspec:FLSX [(match_operand:FLSX 1 "register_operand" "f")] UNSPEC_LSX_VFRSQRT))] diff --git a/gcc/config/loongarch/predicates.md b/gcc/config/loongarch/predicates.md index f7796da10b2..9e9ce58cb53 100644 --- a/gcc/config/loongarch/predicates.md +++ b/gcc/config/loongarch/predicates.md @@ -235,6 +235,10 @@ (define_predicate "reg_or_1_operand" (ior (match_operand 0 "const_1_operand") (match_operand 0 "register_operand"))) +(define_predicate "reg_or_vecotr_1_operand" + (ior (match_operand 0 "const_vector_1_operand") + (match_operand 0 "register_operand"))) + ;; These are used in vec_merge, hence accept bitmask as const_int. (define_predicate "const_exp_2_operand" (and (match_code "const_int") diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 2e6bac37f3c..fd2a3e2a848 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -1209,6 +1209,7 @@ Objective-C and Objective-C++ Dialects}. -msoft-float -mhard-float -mdouble-float -munordered-float -mcmov -mror -mrori -msext -msfimm -mshftimm -mcmodel=@var{code-model}} +-mrecip -mrecip=@var{opt} @emph{PDP-11 Options} @gccoptlist{-mfpu -msoft-float -mac0 -mno-ac0 -m40 -m45 -m10 @@ -26523,6 +26524,57 @@ detecting corresponding assembler support: This option is mostly useful for debugging, or interoperation with assemblers different from the build-time one. +@opindex mrecip +@item -mrecip +This option enables use of the reciprocal estimate and reciprocal square +root estimate instructions with additional Newton-Raphson steps to increase +precision instead of doing a divide or square root and divide for +floating-point arguments. +These instructions are generated only when @option{-funsafe-math-optimizations} +is enabled together with @option{-ffinite-math-only} and +@option{-fno-trapping-math}. +Note that while the throughput of the sequence is higher than the throughput of +the non-reciprocal instruction, the precision of the sequence can be decreased +by up to 2 ulp (i.e. the inverse of 1.0 equals 0.99999994). + +@opindex mrecip=opt +@item -mrecip=@var{opt} +This option controls which reciprocal estimate instructions +may be used. @var{opt} is a comma-separated list of options, which may +be preceded by a @samp{!} to invert the option: + +@table @samp +@item all +Enable all estimate instructions. + +@item default +Enable the default instructions, equivalent to @option{-mrecip}. + +@item none +Disable all estimate instructions, equivalent to @option{-mno-recip}. + +@item div +Enable the approximation for scalar division. + +@item vec-div +Enable the approximation for vectorized division. + +@item sqrt +Enable the approximation for scalar square root. + +@item vec-sqrt +Enable the approximation for vectorized square root. + +@item rsqrt +Enable the approximation for scalar reciprocal square root. + +@item vec-rsqrt +Enable the approximation for vectorized reciprocal square root. +@end table + +So, for example, @option{-mrecip=all,!sqrt} enables +all of the reciprocal approximations, except for scalar square root. + @item loongarch-vect-unroll-limit The vectorizer will use available tuning information to determine whether it would be beneficial to unroll the main vectorized loop and by how much. This diff --git a/gcc/testsuite/gcc.target/loongarch/recip-divf.c b/gcc/testsuite/gcc.target/loongarch/recip-divf.c new file mode 100644 index 00000000000..82b3224a250 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/recip-divf.c @@ -0,0 +1,9 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -march=la664" } */ +/* { dg-final { scan-assembler "frecipe.s" } } */ + +float +foo(float a, float b) +{ + return a / b; +} diff --git a/gcc/testsuite/gcc.target/loongarch/recip-sqrtf.c b/gcc/testsuite/gcc.target/loongarch/recip-sqrtf.c new file mode 100644 index 00000000000..23ceb4cd261 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/recip-sqrtf.c @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -march=la664" } */ +/* { dg-final { scan-assembler-times "frsqrte.s" 3 } } */ + +extern float sqrtf (float); + +float +foo1 (float a, float b) +{ + return a/sqrtf(b); +} + +float +foo2 (float a, float b) +{ + return sqrtf(a/b); +} + +float +foo3 (float a) +{ + return sqrtf(a); +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-divf.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-divf.c new file mode 100644 index 00000000000..6ca72a1ce81 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-divf.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -mlasx -march=la664" } */ +/* { dg-final { scan-assembler "xvfrecipe.s" } } */ + +float a[8],b[8],c[8]; + +void +foo () +{ + for (int i = 0; i < 8; i++) + c[i] = a[i] / b[i]; +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c new file mode 100644 index 00000000000..5f1c4e3d164 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-recip-sqrtf.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -mlasx -march=la664" } */ +/* { dg-final { scan-assembler-times "xvfrsqrte.s" 3 } } */ + +float a[8], b[8], c[8]; + +extern float sqrtf (float); + +void +foo1 (void) +{ + for (int i = 0; i < 8; i++) + c[i] = a[i] / sqrtf (b[i]); +} + +void +foo2 (void) +{ + for (int i = 0; i < 8; i++) + c[i] = sqrtf (a[i] / b[i]); +} + +void +foo3 (void) +{ + for (int i = 0; i < 8; i++) + c[i] = sqrtf (a[i]); +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-divf.c b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-divf.c new file mode 100644 index 00000000000..015dafb50f2 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-divf.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -mlsx -march=la664" } */ +/* { dg-final { scan-assembler "vfrecipe.s" } } */ + +float a[4],b[4],c[4]; + +void +foo () +{ + for (int i = 0; i < 4; i++) + c[i] = a[i] / b[i]; +} diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c new file mode 100644 index 00000000000..5a1a14e291c --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/vector/lsx/lsx-recip-sqrtf.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ffast-math -mrecip -mlsx -march=la664" } */ +/* { dg-final { scan-assembler-times "vfrsqrte.s" 3 } } */ + +float a[4], b[4], c[4]; + +extern float sqrtf (float); + +void +foo1 (void) +{ + for (int i = 0; i < 4; i++) + c[i] = a[i] / sqrtf (b[i]); +} + +void +foo2 (void) +{ + for (int i = 0; i < 4; i++) + c[i] = sqrtf (a[i] / b[i]); +} + +void +foo3 (void) +{ + for (int i = 0; i < 4; i++) + c[i] = sqrtf (a[i]); +} From patchwork Tue Nov 28 03:29:38 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jiahao Xu X-Patchwork-Id: 1869093 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4SfSdw1jQ5z1yRy for ; Tue, 28 Nov 2023 14:30:44 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id AD8513858012 for ; Tue, 28 Nov 2023 03:30:41 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from eggs.gnu.org (eggs.gnu.org [IPv6:2001:470:142:3::10]) by sourceware.org (Postfix) with ESMTPS id D76263857C73 for ; Tue, 28 Nov 2023 03:30:00 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D76263857C73 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=loongson.cn Authentication-Results: sourceware.org; spf=fail smtp.mailfrom=loongson.cn ARC-Filter: OpenARC Filter v1.0.0 sourceware.org D76263857C73 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2001:470:142:3::10 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701142202; cv=none; b=kzIqhRahi2LNHjnwNdX3PdA05APm1dATQqyZrwBLkebIcQPPa6k3tzFVXunIoxzk/pZDYtQWYGMGedTdYU+zS6s07O4I5wVyPQMz10r5JAsDu5XxpBbOb/+0b2rI9jqkdcgvG/8xcpU1A5H9CBJpay5ToaGkcBJfAMFbHCXRldw= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701142202; c=relaxed/simple; bh=IsZ/ld3Q8i5k2axcgFS70aXCgN30rX9BCe8lzWnZGEc=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=jI4LcqC6AOvCJ1d3Rv42Gw+OvuVS3mn4U4I9YELgYu59NQ176+/m4QK6JFPU5AmSUe4YYDeUf73Y3feTpwbubAHQTe1G4bEeobgTkMit0s4U3a4BN/kk8L0H9yuliBxb/7yJLfC2jFJKeaojpOTTVcaY3GLQKiJ7qiD++LG+msc= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from mail.loongson.cn ([114.242.206.163]) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1r7onO-00015Q-2K for gcc-patches@gcc.gnu.org; Mon, 27 Nov 2023 22:30:00 -0500 Received: from loongson.cn (unknown [10.10.130.252]) by gateway (Coremail) with SMTP id _____8Dxl+izXmVlFU49AA--.20296S3; Tue, 28 Nov 2023 11:29:55 +0800 (CST) Received: from slurm-master.loongson.cn (unknown [10.10.130.252]) by localhost.localdomain (Coremail) with SMTP id AQAAf8Dxnd6lXmVlM2ZOAA--.43637S9; Tue, 28 Nov 2023 11:29:55 +0800 (CST) From: Jiahao Xu To: gcc-patches@gcc.gnu.org Cc: xry111@xry111.site, i@xen0n.name, chenglulu@loongson.cn, xuchenghua@loongson.cn, Jiahao Xu Subject: [PATCH 5/5] LoongArch: Vectorized loop unrolling is not performed on divf/sqrtf/rsqrtf with turns on -mrecip. Date: Tue, 28 Nov 2023 11:29:38 +0800 Message-Id: <20231128032938.17202-6-xujiahao@loongson.cn> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20231128032938.17202-1-xujiahao@loongson.cn> References: <20231128032938.17202-1-xujiahao@loongson.cn> MIME-Version: 1.0 X-CM-TRANSID: AQAAf8Dxnd6lXmVlM2ZOAA--.43637S9 X-CM-SenderInfo: 50xmxthkdrqz5rrqw2lrqou0/ X-Coremail-Antispam: 1Uk129KBj93XoW7KrWUAFW7trWfZryrCry7twc_yoW8tF1Upr ZIyr13tw48Jr47WrsrJ3yxWw1ayrZxGF42qa1fta4fCa17Kr1Fq3Wktr1qvFZxX3yrGryI vr1IqFs8Za45C3cCm3ZEXasCq-sJn29KB7ZKAUJUUUU8529EdanIXcx71UUUUU7KY7ZEXa sCq-sGcSsGvfJ3Ic02F40EFcxC0VAKzVAqx4xG6I80ebIjqfuFe4nvWSU5nxnvy29KBjDU 0xBIdaVrnRJUUUk0b4IE77IF4wAFF20E14v26r1j6r4UM7CY07I20VC2zVCF04k26cxKx2 IYs7xG6rWj6s0DM7CIcVAFz4kK6r1Y6r17M28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48v e4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_Gr0_Xr1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI 0_Gr0_Cr1l84ACjcxK6I8E87Iv67AKxVW8JVWxJwA2z4x0Y4vEx4A2jsIEc7CjxVAFwI0_ Gr0_Gr1UM2AIxVAIcxkEcVAq07x20xvEncxIr21l57IF6xkI12xvs2x26I8E6xACxx1l5I 8CrVACY4xI64kE6c02F40Ex7xfMcIj6xIIjxv20xvE14v26r1q6rW5McIj6I8E87Iv67AK xVW8JVWxJwAm72CE4IkC6x0Yz7v_Jr0_Gr1lF7xvr2IYc2Ij64vIr41l42xK82IYc2Ij64 vIr41l4I8I3I0E4IkC6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s026x8G jcxK67AKxVWUGVWUWwC2zVAF1VAY17CE14v26r126r1DMIIYrxkI7VAKI48JMIIF0xvE2I x0cI8IcVAFwI0_Gr0_Xr1lIxAIcVC0I7IYx2IY6xkF7I0E14v26r4j6F4UMIIF0xvE42xK 8VAvwI8IcIk0rVWUJVWUCwCI42IY6I8E87Iv67AKxVW8JVWxJwCI42IY6I8E87Iv6xkF7I 0E14v26r4j6r4UJbIYCTnIWIevJa73UjIFyTuYvjxUcCD7UUUUU Received-SPF: pass client-ip=114.242.206.163; envelope-from=xujiahao@loongson.cn; helo=mail.loongson.cn X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Status: No, score=-13.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, SPF_FAIL, SPF_HELO_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Using -mrecip generates a sequence of instructions to replace divf, sqrtf and rsqrtf. The number of generated instructions is close to or exceeds the maximum issue of the LoongArch, so vectorized loop unrolling is not performed on them. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_vector_costs::determine_suggested_unroll_factor): If m_has_recip is true, uf return 1. (loongarch_vector_costs::add_stmt_cost): Detect the use of approximate instruction sequence. diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index afee09c3b61..894ce0e1630 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -3974,7 +3974,9 @@ protected: /* Reduction factor for suggesting unroll factor. */ unsigned m_reduc_factor = 0; /* True if the loop contains an average operation. */ - bool m_has_avg =false; + bool m_has_avg = false; + /* True if the loop uses approximation instruction sequence. */ + bool m_has_recip = false; }; /* Implement TARGET_VECTORIZE_CREATE_COSTS. */ @@ -4021,7 +4023,7 @@ loongarch_vector_costs::determine_suggested_unroll_factor (loop_vec_info loop_vi { class loop *loop = LOOP_VINFO_LOOP (loop_vinfo); - if (m_has_avg) + if (m_has_avg || m_has_recip) return 1; /* Don't unroll if it's specified explicitly not to be unrolled. */ @@ -4081,6 +4083,36 @@ loongarch_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind, } } + combined_fn cfn; + if (kind == vector_stmt + && stmt_info + && stmt_info->stmt) + { + /* Detect the use of approximate instruction sequence. */ + if ((TARGET_RECIP_VEC_SQRT || TARGET_RECIP_VEC_RSQRT) + && (cfn = gimple_call_combined_fn (stmt_info->stmt)) != CFN_LAST) + switch (cfn) + { + case CFN_BUILT_IN_SQRTF: + m_has_recip = true; + default: + break; + } + else if (TARGET_RECIP_VEC_DIV + && gimple_code (stmt_info->stmt) == GIMPLE_ASSIGN) + { + machine_mode mode = TYPE_MODE (vectype); + switch (gimple_assign_rhs_code (stmt_info->stmt)) + { + case RDIV_EXPR: + if (GET_MODE_INNER (mode) == SFmode) + m_has_recip = true; + default: + break; + } + } + } + return retval; }