From: Li Wei
To: gcc-patches@gcc.gnu.org
Cc: xry111@xry111.site, i@xen0n.name, xuchenghua@loongson.cn, chenglulu@loongson.cn, Li Wei
Subject: [PATCH v1] LoongArch: Optimize implementation of single-precision floating-point approximate division.
Date: Wed, 24 Jan 2024 17:44:17 +0800
Message-Id: <20240124094417.26333-1-liwei@loongson.cn>

We found that in SPEC CPU 2017 521.wrf, some loop-invariant code generated by
the single-precision floating-point approximate division expansion was not
hoisted out of its loop.  This is because the expansion overwrites the
pseudo-register that holds an intermediate result (x0 = a * x0), which
prevents the loop2_invariant pass from recognizing those computations as
invariants.

To fix this, the intermediate result is now stored in a new pseudo-register
(e1) instead of overwriting x0.  This preserves the data dependencies, so the
computations that depend only on b (x0 and e0) can be recognized as loop
invariants in the loop2_invariant pass.

With this change, the dynamic instruction count of 521.wrf drops by 0.18%
(1716612948501 -> 1713471771364).

gcc/ChangeLog:

	* config/loongarch/loongarch.cc (loongarch_emit_swdivsf): Store
	a * x0 in a new pseudo-register instead of overwriting x0.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/invariant-recip.c: New test.
---
 gcc/config/loongarch/loongarch.cc                  | 19 +++++++----
 .../gcc.target/loongarch/invariant-recip.c         | 33 +++++++++++++++++++
 2 files changed, 46 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/invariant-recip.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 32a0b6f43e8..1b88147fd8c 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -10894,16 +10894,23 @@ void loongarch_emit_swdivsf (rtx res, rtx a, rtx b, machine_mode mode)
   /* x0 = 1./b estimate.  */
   emit_insn (gen_rtx_SET (x0, gen_rtx_UNSPEC (mode, gen_rtvec (1, b),
                                               unspec)));
-  /* 2.0 - b * x0  */
+  /* e0 = 2.0 - b * x0.  */
   emit_insn (gen_rtx_SET (e0, gen_rtx_FMA (mode,
                                            gen_rtx_NEG (mode, b), x0, mtwo)));
 
-  /* x0 = a * x0  */
   if (a != CONST1_RTX (mode))
-    emit_insn (gen_rtx_SET (x0, gen_rtx_MULT (mode, a, x0)));
-
-  /* res = e0 * x0  */
-  emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, e0, x0)));
+    {
+      rtx e1 = gen_reg_rtx (mode);
+      /* e1 = a * x0.  */
+      emit_insn (gen_rtx_SET (e1, gen_rtx_MULT (mode, a, x0)));
+      /* res = e0 * e1.  */
+      emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, e0, e1)));
+    }
+  else
+    {
+      /* res = e0 * x0.  */
+      emit_insn (gen_rtx_SET (res, gen_rtx_MULT (mode, e0, x0)));
+    }
 }
 
 static bool
diff --git a/gcc/testsuite/gcc.target/loongarch/invariant-recip.c b/gcc/testsuite/gcc.target/loongarch/invariant-recip.c
new file mode 100644
index 00000000000..2f64f6ed5e5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/invariant-recip.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=loongarch64 -mabi=lp64d -mrecip -mfrecipe -fdump-rtl-loop2_invariant " } */
+/* { dg-final { scan-rtl-dump "Decided to move dependent invariant" "loop2_invariant" } } */
+
+void
+nislfv_rain_plm (int im, int km, float dzl[im][km], float rql[im][km],
+                 float dt)
+{
+  int i, k;
+  float con1, decfl;
+  float dz[km], qn[km], wi[km + 1];
+
+  for (i = 0; i < im; i++)
+    {
+      for (k = 0; k < km; k++)
+        {
+          dz[k] = dzl[i][k];
+        }
+      con1 = 0.05;
+      for (k = km - 1; k >= 0; k--)
+        {
+          decfl = (wi[k + 1] - wi[k]) * dt / dz[k];
+          if (decfl > con1)
+            {
+              wi[k] = wi[k + 1] - con1 * dz[k] / dt;
+            }
+        }
+      for (k = 0; k < km; k++)
+        {
+          rql[i][k] = qn[k];
+        }
+    }
+}
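To make the reasoning in the commit message concrete at the source level, here is a minimal C sketch. It is not part of the patch and does not model GCC's RTL; all function names are hypothetical. recip_estimate() stands in for the hardware reciprocal estimate (the UNSPEC enabled by -mfrecipe) by truncating 1/b to 12 fraction bits. That precision is an arbitrary assumption, chosen only so that the estimate is visibly inexact. swdiv_sketch() follows the dataflow of the expansion after the patch. div_by_invariant() hoists by hand what loop2_invariant can hoist once x0 and e0 are each assigned exactly once and depend only on a loop-invariant divisor b. If x0 = (1 - ε)/b, then e0 = 1 + ε and e0 * (a * x0) = (1 - ε²) * a/b, so one refinement step roughly squares the relative error of the estimate.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the hardware reciprocal estimate: 1/b with the
   low 11 of its 23 fraction bits cleared (relative error below 2^-12).
   The real instruction's precision may differ.  */
float
recip_estimate (float b)
{
  float r = 1.0f / b;
  uint32_t bits;
  memcpy (&bits, &r, sizeof bits);
  bits &= ~(uint32_t) 0x7FF;
  memcpy (&r, &bits, sizeof r);
  return r;
}

/* 2.0 - b * x0 with a single rounding, like the FMA in the expansion: the
   float * float product is exact in double, and because b * x0 is close
   to 1 the subtraction from 2.0 is exact as well.  */
static float
two_minus_prod (float b, float x0)
{
  return (float) (2.0 - (double) b * (double) x0);
}

/* Same dataflow as the expansion after the patch.  */
float
swdiv_sketch (float a, float b)
{
  float x0 = recip_estimate (b);     /* x0 = 1./b estimate.  */
  float e0 = two_minus_prod (b, x0); /* e0 = 2.0 - b * x0.  */
  float e1 = a * x0;                 /* e1 = a * x0; x0 is not overwritten.  */
  return e0 * e1;                    /* res = e0 * e1.  */
}

/* Sketch of what loop2_invariant may do once x0 and e0 are single-assignment
   values that depend only on the loop-invariant divisor b: compute them
   once, before the loop.  */
void
div_by_invariant (const float *a, float *out, int n, float b)
{
  float x0 = recip_estimate (b);     /* Invariant.  */
  float e0 = two_minus_prod (b, x0); /* Dependent invariant (uses x0).  */
  for (int i = 0; i < n; i++)
    {
      float e1 = a[i] * x0;          /* Varies per iteration.  */
      out[i] = e0 * e1;
    }
}

/* |approx - exact| / |exact|; exact must be nonzero.  */
float
rel_err (float approx, float exact)
{
  assert (exact != 0.0f);
  float d = approx - exact;
  if (d < 0.0f)
    d = -d;
  return d / (exact < 0.0f ? -exact : exact);
}
```

Calling div_by_invariant() on a few inputs (for example b = 3.7f) and comparing each element against swdiv_sketch() and a plain a / b should give bit-identical results for the hoisted and unhoisted forms and a relative error below 1e-6 for the refined quotient, while a * recip_estimate (b) alone can be off by up to about 2^-12.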