From patchwork Fri Jul 26 12:43:14 2024
X-Patchwork-Submitter: Xi Ruoyao
X-Patchwork-Id: 1965299
From: Xi Ruoyao
To: gcc-patches@gcc.gnu.org
Cc: chenglulu, i@xen0n.name, xuchenghua@loongson.cn, Xi Ruoyao
Subject: [PATCH] LoongArch: Expand some SImode operations through "si3_extend" instructions if TARGET_64BIT
Date: Fri, 26 Jul 2024 20:43:14 +0800
Message-ID: <20240726124330.2308173-1-xry111@xry111.site>

We already had "si3_extend" insns and we hoped the fwprop or combine
passes could use them to remove unnecessary sign extensions.  But this
does not always work: for cases like x << 1 | y, the compiler tends to
generate

    (sign_extend:DI
      (ior:SI (ashift:SI (reg:SI $r4)
                         (const_int 1))
              (reg:SI $r5)))

instead of

    (ior:DI (sign_extend:DI (ashift:SI (reg:SI $r4) (const_int 1)))
            (sign_extend:DI (reg:SI $r5)))

So we cannot match the ashlsi3_extend instruction here and we get:

    slli.w  $r4,$r4,1
    or      $r4,$r5,$r4
    slli.w  $r4,$r4,0    # <= redundant
    jr      $r1

To eliminate this redundant extension we need to turn SImode shift etc.
to DImode "si3_extend" operations earlier, when we expand the SImode
operation.  We are already doing this for addition; now do it for
shifts, rotates, subtraction, multiplication, division, and modulo as
well.

The bytepick.w definition for TARGET_64BIT needs to be adjusted so it
won't be undone by the shift expanding.

gcc/ChangeLog:

        * config/loongarch/loongarch.md (optab): Add (rotatert "rotr").
        (3, 3, sub3, rotr3, mul3): Add a "*" to the insn name so we can
        redefine the names with define_expand.
        (*si3_extend): Remove "*" so we can use them in expanders.
        (*subsi3_extended, *mulsi3_extended): Likewise, also remove the
        trailing "ed" for consistency.
        (*si3_extended): Add mode for sign_extend to prevent an ICE
        using it in expanders.
        (shift_w, arith_w): New define_code_iterator.
        (3): New define_expand.  Expand with si3_extend for SImode if
        TARGET_64BIT.
        (3): Likewise.
        (mul3): Expand to mulsi3_extended for SImode if TARGET_64BIT and
        ISA_HAS_DIV32.
        (3): Expand to si3_extended for SImode if TARGET_64BIT.
        (rotl3): Expand to rotrsi3_extend for SImode if TARGET_64BIT.
        (bytepick_w_): Add mode for lshiftrt and ashift.
        (bitsize, bytepick_imm, bytepick_w_ashift_amount): New
        define_mode_attr.
        (bytepick_w__extend): Adjust for the RTL change caused by 32-bit
        shift expanding.  Now bytepick_imm only covers 2 and 3, separate
        one remaining case to ...
        (bytepick_w_1_extend): ... here, new define_insn.

gcc/testsuite/ChangeLog:

        * gcc.target/loongarch/bitwise_extend.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/loongarch.md             | 131 +++++++++++++++---
 .../gcc.target/loongarch/bitwise_extend.c     |  45 ++++++
 2 files changed, 154 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/bitwise_extend.c

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index bc09712bce7..e1629c5a339 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -546,6 +546,7 @@ (define_code_attr u_bool [(sign_extend "false") (zero_extend "true")])
 (define_code_attr optab [(ashift "ashl")
                          (ashiftrt "ashr")
                          (lshiftrt "lshr")
+                         (rotatert "rotr")
                          (ior "ior")
                          (xor "xor")
                          (and "and")
@@ -624,6 +625,49 @@ (define_int_attr bytepick_imm [(8 "1")
                                  (48 "6")
                                  (56 "7")])

+;; Expand some 32-bit operations to si3_extend operations if TARGET_64BIT
+;; so the redundant sign extension can be removed if the output is used as
+;; an input of a bitwise operation.  Note plus, rotl, and div are handled
+;; separately.
+(define_code_iterator shift_w [any_shift rotatert])
+(define_code_iterator arith_w [minus mult])
+
+(define_expand "3"
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+        (shift_w:GPR (match_operand:GPR 1 "register_operand" "r")
+                     (match_operand:SI 2 "arith_operand" "rI")))]
+  ""
+{
+  if (TARGET_64BIT && mode == SImode)
+    {
+      rtx t = gen_reg_rtx (DImode);
+      emit_insn (gen_si3_extend (t, operands[1], operands[2]));
+      t = gen_lowpart (SImode, t);
+      SUBREG_PROMOTED_VAR_P (t) = 1;
+      SUBREG_PROMOTED_SET (t, SRP_SIGNED);
+      emit_move_insn (operands[0], t);
+      DONE;
+    }
+})
+
+(define_expand "3"
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+        (arith_w:GPR (match_operand:GPR 1 "register_operand" "r")
+                     (match_operand:GPR 2 "register_operand" "r")))]
+  ""
+{
+  if (TARGET_64BIT && mode == SImode)
+    {
+      rtx t = gen_reg_rtx (DImode);
+      emit_insn (gen_si3_extend (t, operands[1], operands[2]));
+      t = gen_lowpart (SImode, t);
+      SUBREG_PROMOTED_VAR_P (t) = 1;
+      SUBREG_PROMOTED_SET (t, SRP_SIGNED);
+      emit_move_insn (operands[0], t);
+      DONE;
+    }
+})
+
 ;;
 ;;  ....................
 ;;
@@ -781,7 +825,7 @@ (define_insn "sub3"
   [(set_attr "type" "fadd")
    (set_attr "mode" "")])

-(define_insn "sub3"
+(define_insn "*sub3"
   [(set (match_operand:GPR 0 "register_operand" "=r")
         (minus:GPR (match_operand:GPR 1 "register_operand" "r")
                    (match_operand:GPR 2 "register_operand" "r")))]
@@ -791,7 +835,7 @@ (define_insn "sub3"
    (set_attr "mode" "")])


-(define_insn "*subsi3_extended"
+(define_insn "subsi3_extend"
   [(set (match_operand:DI 0 "register_operand" "=r")
         (sign_extend:DI
             (minus:SI (match_operand:SI 1 "reg_or_0_operand" "rJ")
@@ -818,7 +862,7 @@ (define_insn "mul3"
   [(set_attr "type" "fmul")
    (set_attr "mode" "")])

-(define_insn "mul3"
+(define_insn "*mul3"
   [(set (match_operand:GPR 0 "register_operand" "=r")
         (mult:GPR (match_operand:GPR 1 "register_operand" "r")
                   (match_operand:GPR 2 "register_operand" "r")))]
@@ -827,7 +871,7 @@ (define_insn "mul3"
   [(set_attr "type" "imul")
    (set_attr "mode" "")])

-(define_insn "*mulsi3_extended"
+(define_insn "mulsi3_extend"
   [(set (match_operand:DI 0 "register_operand" "=r")
         (sign_extend:DI
             (mult:SI (match_operand:SI 1 "register_operand" "r")
@@ -1001,8 +1045,19 @@ (define_expand "3"
                      (match_operand:GPR 2 "register_operand")))]
   ""
 {
- if (GET_MODE (operands[0]) == SImode && TARGET_64BIT && !ISA_HAS_DIV32)
+ if (GET_MODE (operands[0]) == SImode && TARGET_64BIT)
   {
+    if (ISA_HAS_DIV32)
+      {
+        rtx t = gen_reg_rtx (DImode);
+        emit_insn (gen_si3_extended (t, operands[1], operands[2]));
+        t = gen_lowpart (SImode, t);
+        SUBREG_PROMOTED_VAR_P (t) = 1;
+        SUBREG_PROMOTED_SET (t, SRP_SIGNED);
+        emit_move_insn (operands[0], t);
+        DONE;
+      }
+
     rtx reg1 = gen_reg_rtx (DImode);
     rtx reg2 = gen_reg_rtx (DImode);
     rtx rd = gen_reg_rtx (DImode);
@@ -1038,7 +1093,7 @@ (define_insn "*3"

 (define_insn "si3_extended"
   [(set (match_operand:DI 0 "register_operand" "=r,&r,&r")
-        (sign_extend
+        (sign_extend:DI
           (any_div:SI (match_operand:SI 1 "register_operand" "r,r,0")
                       (match_operand:SI 2 "register_operand" "r,r,r"))))]
   "TARGET_64BIT && ISA_HAS_DIV32"
@@ -2985,7 +3040,7 @@ (define_expand "cpymemsi"
 ;;
 ;;  ....................

-(define_insn "3"
+(define_insn "*3"
   [(set (match_operand:GPR 0 "register_operand" "=r")
         (any_shift:GPR (match_operand:GPR 1 "register_operand" "r")
                        (match_operand:SI 2 "arith_operand" "rI")))]
@@ -3000,7 +3055,7 @@ (define_insn "3"
   [(set_attr "type" "shift")
    (set_attr "mode" "")])

-(define_insn "*si3_extend"
+(define_insn "si3_extend"
   [(set (match_operand:DI 0 "register_operand" "=r")
         (sign_extend:DI
            (any_shift:SI (match_operand:SI 1 "register_operand" "r")
@@ -3015,7 +3070,7 @@ (define_insn "*si3_extend"
   [(set_attr "type" "shift")
    (set_attr "mode" "SI")])

-(define_insn "rotr3"
+(define_insn "*rotr3"
   [(set (match_operand:GPR 0 "register_operand" "=r,r")
         (rotatert:GPR (match_operand:GPR 1 "register_operand" "r,r")
                       (match_operand:SI 2 "arith_operand" "r,I")))]
@@ -3044,6 +3099,19 @@ (define_expand "rotl3"
   ""
   {
     operands[3] = gen_reg_rtx (SImode);
+
+    if (TARGET_64BIT && mode == SImode)
+      {
+        rtx t = gen_reg_rtx (DImode);
+
+        emit_insn (gen_negsi2 (operands[3], operands[2]));
+        emit_insn (gen_rotrsi3_extend (t, operands[1], operands[3]));
+        t = gen_lowpart (SImode, t);
+        SUBREG_PROMOTED_VAR_P (t) = 1;
+        SUBREG_PROMOTED_SET (t, SRP_SIGNED);
+        emit_move_insn (operands[0], t);
+        DONE;
+      }
   });

 ;; The following templates were added to generate "bstrpick.d + alsl.d"
@@ -4113,26 +4181,45 @@ (define_expand "2"

 (define_insn "bytepick_w_"
   [(set (match_operand:SI 0 "register_operand" "=r")
-        (ior:SI (lshiftrt (match_operand:SI 1 "register_operand" "r")
-                          (const_int ))
-                (ashift (match_operand:SI 2 "register_operand" "r")
-                        (const_int bytepick_w_ashift_amount))))]
+        (ior:SI (lshiftrt:SI (match_operand:SI 1 "register_operand" "r")
+                             (const_int ))
+                (ashift:SI (match_operand:SI 2 "register_operand" "r")
+                           (const_int bytepick_w_ashift_amount))))]
   ""
   "bytepick.w\t%0,%1,%2,"
   [(set_attr "mode" "SI")])

+(define_mode_attr bitsize [(QI "8") (HI "16")])
+(define_mode_attr bytepick_imm [(QI "3") (HI "2")])
+(define_mode_attr bytepick_w_ashift_amount [(QI "24") (HI "16")])
+
 (define_insn "bytepick_w__extend"
   [(set (match_operand:DI 0 "register_operand" "=r")
-        (sign_extend:DI
-         (subreg:SI
-          (ior:DI (subreg:DI (lshiftrt
-                              (match_operand:SI 1 "register_operand" "r")
-                              (const_int )) 0)
-                  (subreg:DI (ashift
-                              (match_operand:SI 2 "register_operand" "r")
-                              (const_int bytepick_w_ashift_amount)) 0)) 0)))]
+        (ior:DI
+          (ashift:DI
+            (sign_extend:DI
+              (subreg:SHORT (match_operand:DI 1 "register_operand" "r") 0))
+            (const_int ))
+          (zero_extract:DI (match_operand:DI 2 "register_operand" "r")
+                           (const_int )
+                           (const_int ))))]
   "TARGET_64BIT"
-  "bytepick.w\t%0,%1,%2,"
+  "bytepick.w\t%0,%2,%1,"
+  [(set_attr "mode" "SI")])
+
+(define_insn "bytepick_w_1_extend"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+        (ior:DI
+          (ashift:DI
+            (sign_extract:DI (match_operand:DI 1 "register_operand" "r")
+                             (const_int 24)
+                             (const_int 0))
+            (const_int 8))
+          (zero_extract:DI (match_operand:DI 2 "register_operand" "r")
+                           (const_int 8)
+                           (const_int 24))))]
+  "TARGET_64BIT"
+  "bytepick.w\t%0,%2,%1,1"
   [(set_attr "mode" "SI")])

 (define_insn "bytepick_d_"
diff --git a/gcc/testsuite/gcc.target/loongarch/bitwise_extend.c b/gcc/testsuite/gcc.target/loongarch/bitwise_extend.c
new file mode 100644
index 00000000000..c2bc489a734
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/bitwise_extend.c
@@ -0,0 +1,45 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64 -mdiv32" } */
+/* { dg-final { scan-assembler-not "slli\\.w" } } */
+
+int
+f1 (int a, int b)
+{
+  return (a << b) | b;
+}
+
+int
+f2 (int a, int b)
+{
+  return (a - b) | b;
+}
+
+int
+f3 (int a, int b)
+{
+  return (a * b) | b;
+}
+
+int
+f4 (int a, int b)
+{
+  return (unsigned) a >> b | (unsigned) a << (32 - b) | b;
+}
+
+int
+f5 (int a, int b)
+{
+  return (unsigned) a << b | (unsigned) a >> (32 - b) | b;
+}
+
+int
+f6 (int a, int b)
+{
+  return (a % b) | b;
+}
+
+int
+f7 (int a, int b)
+{
+  return (a + b) | b;
+}
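
Note (not part of the patch): a minimal C sketch of the two arithmetic facts the
commit message and the rotl expander rely on.  First, sign-extending a 32-bit
OR gives the same 64-bit value as OR-ing the two sign-extended operands, which
is why the trailing slli.w ...,0 is redundant once each operand is already
sign-extended.  Second, a left rotate equals a right rotate by the negated
amount, which is what the gen_negsi2 + gen_rotrsi3_extend sequence uses.  All
helper names here (sext32, or_then_extend, extend_then_or, rotr32, rotl32,
count_mismatches) are hypothetical and exist only for illustration; they model
the values in plain C and say nothing about what GCC actually emits.

```c
#include <stdint.h>

/* Sign-extend the low 32 bits to 64 bits.  Written with XOR and subtract
   so it does not depend on implementation-defined narrowing.  */
static int64_t
sext32 (uint32_t x)
{
  return (int64_t) (x ^ UINT32_C (0x80000000)) - INT64_C (0x80000000);
}

/* x << 1 | y computed in 32 bits, then sign-extended: the shape that
   needed the extra extension.  */
static int64_t
or_then_extend (uint32_t x, uint32_t y)
{
  return sext32 ((x << 1) | y);
}

/* Each operand sign-extended first, then a plain 64-bit OR.  */
static int64_t
extend_then_or (uint32_t x, uint32_t y)
{
  return sext32 (x << 1) | sext32 (y);
}

/* 32-bit rotates; the amount is taken modulo 32.  */
static uint32_t
rotr32 (uint32_t x, uint32_t n)
{
  n &= 31;
  return n ? (x >> n) | (x << (32 - n)) : x;
}

static uint32_t
rotl32 (uint32_t x, uint32_t n)
{
  n &= 31;
  return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Check both identities over a few sample values; returns the number of
   mismatches found.  */
int
count_mismatches (void)
{
  static const uint32_t samples[] = {
    0u, 1u, 2u, 31u, 0x3fffffffu, 0x40000000u,
    0x7fffffffu, 0x80000000u, 0xdeadbeefu, 0xffffffffu
  };
  int n = (int) (sizeof samples / sizeof samples[0]);
  int bad = 0;

  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      {
        uint32_t a = samples[i], b = samples[j];

        if (or_then_extend (a, b) != extend_then_or (a, b))
          bad++;
        /* 0u - b is the 32-bit negation of the rotate amount.  */
        if (rotl32 (a, b) != rotr32 (a, 0u - b))
          bad++;
      }
  return bad;
}
```

Calling count_mismatches () from a small driver should find no mismatches for
these samples; the sign-extension identity holds for all inputs because the
upper 32 bits of each side are copies of bit 31, and OR distributes over that
replication.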