From patchwork Mon Sep 2 20:01:56 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Raphael Moreira Zinsly
X-Patchwork-Id: 1979754
From: Raphael Moreira Zinsly
To: gcc-patches@gcc.gnu.org
Cc: jlaw@ventanamicro.com, vineetg@rivosinc.com, Raphael Moreira Zinsly
Subject: [PATCH 2/3] RISC-V: Additional large constant synthesis improvements
Date: Mon, 2 Sep 2024 17:01:56 -0300
Message-ID: <20240902200157.328705-2-rzinsly@ventanamicro.com>
X-Mailer: git-send-email 2.42.0
In-Reply-To: <20240902200157.328705-1-rzinsly@ventanamicro.com>
References: <20240902200157.328705-1-rzinsly@ventanamicro.com>

Improve handling of large constants in riscv_build_integer: generate
better code for constants where the high half can be constructed by
shifting or shNadd-ing the low half, or where the halves differ by less
than 2048.

gcc/ChangeLog:
        * config/riscv/riscv.cc (riscv_build_integer): Detect new cases
        of constants that can be improved.
        (riscv_move_integer): Add synthesis for concatenating constants
        without Zbkb.

gcc/testsuite/ChangeLog:
        * gcc.target/riscv/synthesis-12.c: New test.
        * gcc.target/riscv/synthesis-13.c: New test.
        * gcc.target/riscv/synthesis-14.c: New test.
---
 gcc/config/riscv/riscv.cc                     | 140 +++++++++++++++++-
 gcc/testsuite/gcc.target/riscv/synthesis-12.c |  26 ++++
 gcc/testsuite/gcc.target/riscv/synthesis-13.c |  26 ++++
 gcc/testsuite/gcc.target/riscv/synthesis-14.c |  28 ++++
 4 files changed, 214 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/synthesis-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/synthesis-13.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/synthesis-14.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index b963a57881e..64d5611cbd2 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1231,6 +1231,124 @@ riscv_build_integer (struct riscv_integer_op *codes, HOST_WIDE_INT value,
         }
 
     }
+  else if (cost > 4 && TARGET_64BIT && can_create_pseudo_p ()
+           && allow_new_pseudos)
+    {
+      struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS];
+      int alt_cost;
+
+      unsigned HOST_WIDE_INT loval = value & 0xffffffff;
+      unsigned HOST_WIDE_INT hival = (value & ~loval) >> 32;
+      bool bit31 = (hival & 0x80000000) != 0;
+      int trailing_shift = ctz_hwi (loval) - ctz_hwi (hival);
+      int leading_shift = clz_hwi (loval) - clz_hwi (hival);
+      int shiftval = 0;
+
+      /* Adjust the shift into the high half accordingly.  */
+      if ((trailing_shift > 0 && hival == (loval >> trailing_shift))
+          || (trailing_shift < 0 && hival == (loval << trailing_shift)))
+        shiftval = 32 - trailing_shift;
+      else if ((leading_shift < 0 && hival == (loval >> leading_shift))
+                || (leading_shift > 0 && hival == (loval << leading_shift)))
+        shiftval = 32 + leading_shift;
+
+      if (shiftval && !bit31)
+        alt_cost = 2 + riscv_build_integer_1 (alt_codes, sext_hwi (loval, 32),
+                                              mode);
+
+      /* For constants where the upper half is a shift of the lower half we
+         can do a shift followed by an or.  */
+      if (shiftval && alt_cost < cost && !bit31)
+        {
+          /* We need to save the first constant we build.  */
+          alt_codes[alt_cost - 3].save_temporary = true;
+
+          /* Now we want to shift the previously generated constant into the
+             high half.  */
+          alt_codes[alt_cost - 2].code = ASHIFT;
+          alt_codes[alt_cost - 2].value = shiftval;
+          alt_codes[alt_cost - 2].use_uw = false;
+          alt_codes[alt_cost - 2].save_temporary = false;
+
+          /* And the final step, IOR the two halves together.  Since this uses
+             the saved temporary, use CONCAT similar to what we do for Zbkb.  */
+          alt_codes[alt_cost - 1].code = CONCAT;
+          alt_codes[alt_cost - 1].value = 0;
+          alt_codes[alt_cost - 1].use_uw = false;
+          alt_codes[alt_cost - 1].save_temporary = false;
+
+          memcpy (codes, alt_codes, sizeof (alt_codes));
+          cost = alt_cost;
+        }
+
+      if (cost > 4 && !bit31 && TARGET_ZBA)
+        {
+          int value = 0;
+
+          /* Check for a shNadd.  */
+          if (hival == loval * 3)
+            value = 3;
+          else if (hival == loval * 5)
+            value = 5;
+          else if (hival == loval * 9)
+            value = 9;
+
+          if (value)
+            alt_cost = 2 + riscv_build_integer_1 (alt_codes,
+                                                  sext_hwi (loval, 32), mode);
+
+          /* For constants where the upper half is a shNadd of the lower half
+             we can do a similar transformation.  */
+          if (value && alt_cost < cost)
+            {
+              alt_codes[alt_cost - 3].save_temporary = true;
+              alt_codes[alt_cost - 2].code = FMA;
+              alt_codes[alt_cost - 2].value = value;
+              alt_codes[alt_cost - 2].use_uw = false;
+              alt_codes[alt_cost - 2].save_temporary = false;
+              alt_codes[alt_cost - 1].code = CONCAT;
+              alt_codes[alt_cost - 1].value = 0;
+              alt_codes[alt_cost - 1].use_uw = false;
+              alt_codes[alt_cost - 1].save_temporary = false;
+
+              memcpy (codes, alt_codes, sizeof (alt_codes));
+              cost = alt_cost;
+            }
+        }
+
+      if (cost > 4 && !bit31)
+        {
+          int value = hival - loval;
+
+          /* For constants where the halves differ by less than 2048 we can
+             generate the upper half by using an addi on the lower half then
+             using a shift 32 followed by an or.  */
+          if (abs (value) <= 2047)
+            {
+              alt_cost = 3 + riscv_build_integer_1 (alt_codes,
+                                                    sext_hwi (loval, 32), mode);
+              if (alt_cost < cost)
+                {
+                  alt_codes[alt_cost - 4].save_temporary = true;
+                  alt_codes[alt_cost - 3].code = PLUS;
+                  alt_codes[alt_cost - 3].value = value;
+                  alt_codes[alt_cost - 3].use_uw = false;
+                  alt_codes[alt_cost - 3].save_temporary = false;
+                  alt_codes[alt_cost - 2].code = ASHIFT;
+                  alt_codes[alt_cost - 2].value = 32;
+                  alt_codes[alt_cost - 2].use_uw = false;
+                  alt_codes[alt_cost - 2].save_temporary = false;
+                  alt_codes[alt_cost - 1].code = CONCAT;
+                  alt_codes[alt_cost - 1].value = 0;
+                  alt_codes[alt_cost - 1].use_uw = false;
+                  alt_codes[alt_cost - 1].save_temporary = false;
+
+                  memcpy (codes, alt_codes, sizeof (alt_codes));
+                  cost = alt_cost;
+                }
+            }
+        }
+    }
 
   return cost;
 }
@@ -2864,12 +2982,22 @@ riscv_move_integer (rtx temp, rtx dest, HOST_WIDE_INT value,
             }
           else if (codes[i].code == CONCAT || codes[i].code == VEC_MERGE)
             {
-              rtx t = can_create_pseudo_p () ? gen_reg_rtx (mode) : temp;
-              rtx t2 = codes[i].code == VEC_MERGE ? old_value : x;
-              gcc_assert (t2);
-              t2 = gen_lowpart (SImode, t2);
-              emit_insn (gen_riscv_xpack_di_si_2 (t, x, GEN_INT (32), t2));
-              x = t;
+              if (codes[i].code == CONCAT && !TARGET_ZBKB)
+                {
+                  /* The two values should have no bits in common, so we can
+                     use PLUS instead of IOR which has a higher chance of
+                     using a compressed instruction.  */
+                  x = gen_rtx_PLUS (mode, x, old_value);
+                }
+              else
+                {
+                  rtx t = can_create_pseudo_p () ? gen_reg_rtx (mode) : temp;
+                  rtx t2 = codes[i].code == VEC_MERGE ? old_value : x;
+                  gcc_assert (t2);
+                  t2 = gen_lowpart (SImode, t2);
+                  emit_insn (gen_riscv_xpack_di_si_2 (t, x, GEN_INT (32), t2));
+                  x = t;
+                }
             }
           else
             x = gen_rtx_fmt_ee (codes[i].code, mode,
diff --git a/gcc/testsuite/gcc.target/riscv/synthesis-12.c b/gcc/testsuite/gcc.target/riscv/synthesis-12.c
new file mode 100644
index 00000000000..bf2f89042a0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/synthesis-12.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* We aggressively skip as we really just need to test the basic synthesis
+   which shouldn't vary based on the optimization level.  -O1 seems to work
+   and eliminates the usual sources of extraneous dead code that would throw
+   off the counts.  */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O2" "-O3" "-Os" "-Oz" "-flto" } } */
+/* { dg-options "-march=rv64gc" } */
+
+/* Rather than test for a specific synthesis of all these constants or
+   having thousands of tests each testing one variant, we just test the
+   total number of instructions.
+
+   This isn't expected to change much and any change is worthy of a look.  */
+/* { dg-final { scan-assembler-times "\\t(add|addi|bseti|li|pack|ret|sh1add|sh2add|sh3add|slli|srli|xori|or)" 45 } } */
+
+
+unsigned long foo_0x7857f2de7857f2de(void) { return 0x7857f2de7857f2deUL; }
+unsigned long foo_0x7fffdffe3fffefff(void) { return 0x7fffdffe3fffefffUL; }
+unsigned long foo_0x1ffff7fe3fffeffc(void) { return 0x1ffff7fe3fffeffcUL; }
+unsigned long foo_0x0a3fdbf0028ff6fc(void) { return 0x0a3fdbf0028ff6fcUL; }
+unsigned long foo_0x014067e805019fa0(void) { return 0x014067e805019fa0UL; }
+unsigned long foo_0x09d87e90009d87e9(void) { return 0x09d87e90009d87e9UL; }
+unsigned long foo_0x2302320000118119(void) { return 0x2302320000118119UL; }
+unsigned long foo_0x000711eb00e23d60(void) { return 0x000711eb00e23d60UL; }
+unsigned long foo_0x5983800001660e00(void) { return 0x5983800001660e00UL; }
diff --git a/gcc/testsuite/gcc.target/riscv/synthesis-13.c b/gcc/testsuite/gcc.target/riscv/synthesis-13.c
new file mode 100644
index 00000000000..957410acda1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/synthesis-13.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* We aggressively skip as we really just need to test the basic synthesis
+   which shouldn't vary based on the optimization level.  -O1 seems to work
+   and eliminates the usual sources of extraneous dead code that would throw
+   off the counts.  */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O2" "-O3" "-Os" "-Oz" "-flto" } } */
+/* { dg-options "-march=rv64gc_zba" } */
+
+/* Rather than test for a specific synthesis of all these constants or
+   having thousands of tests each testing one variant, we just test the
+   total number of instructions.
+
+   This isn't expected to change much and any change is worthy of a look.  */
+/* { dg-final { scan-assembler-times "\\t(add|addi|bseti|li|pack|ret|sh1add|sh2add|sh3add|slli|srli|xori|or)" 45 } } */
+
+
+unsigned long foo_0x7907d89a2857f2de(void) { return 0x7907d89a2857f2deUL; }
+unsigned long foo_0x4fffaffb0fffefff(void) { return 0x4fffaffb0fffefffUL; }
+unsigned long foo_0x23ff6fdc03ffeffc(void) { return 0x23ff6fdc03ffeffcUL; }
+unsigned long foo_0x170faedc028ff6fc(void) { return 0x170faedc028ff6fcUL; }
+unsigned long foo_0x5704dee01d019fa0(void) { return 0x5704dee01d019fa0UL; }
+unsigned long foo_0x0589c731009d87e9(void) { return 0x0589c731009d87e9UL; }
+unsigned long foo_0x0057857d00118119(void) { return 0x0057857d00118119UL; }
+unsigned long foo_0x546b32e010e23d60(void) { return 0x546b32e010e23d60UL; }
+unsigned long foo_0x64322a0021660e00(void) { return 0x64322a0021660e00UL; }
diff --git a/gcc/testsuite/gcc.target/riscv/synthesis-14.c b/gcc/testsuite/gcc.target/riscv/synthesis-14.c
new file mode 100644
index 00000000000..bd4e4afa55a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/synthesis-14.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* We aggressively skip as we really just need to test the basic synthesis
+   which shouldn't vary based on the optimization level.  -O1 seems to work
+   and eliminates the usual sources of extraneous dead code that would throw
+   off the counts.  */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O2" "-O3" "-Os" "-Oz" "-flto" } } */
+/* { dg-options "-march=rv64gc" } */
+
+/* Rather than test for a specific synthesis of all these constants or
+   having thousands of tests each testing one variant, we just test the
+   total number of instructions.
+
+   This isn't expected to change much and any change is worthy of a look.  */
+/* { dg-final { scan-assembler-times "\\t(add|addi|bseti|li|pack|ret|sh1add|sh2add|sh3add|slli|srli|xori|or)" 65 } } */
+
+
+unsigned long foo_0x7857faae7857f2de(void) { return 0x7857faae7857f2deUL; }
+unsigned long foo_0x0ffff7fe0fffefff(void) { return 0x0ffff7fe0fffefffUL; }
+unsigned long foo_0x7857f2de7857faae(void) { return 0x7857f2de7857faaeUL; }
+unsigned long foo_0x7857f2af7857faae(void) { return 0x7857f2af7857faaeUL; }
+unsigned long foo_0x5fbfffff5fbffae5(void) { return 0x5fbfffff5fbffae5UL; }
+unsigned long foo_0x3d3079db3d3079ac(void) { return 0x3d3079db3d3079acUL; }
+unsigned long foo_0x046075fe046078a8(void) { return 0x046075fe046078a8UL; }
+unsigned long foo_0x2411811a24118119(void) { return 0x2411811a24118119UL; }
+unsigned long foo_0x70e23d6a70e23d6bUL(void) { return 0x70e23d6a70e23d6bUL; }
+unsigned long foo_0x0c01df8c0c01df7d(void) { return 0x0c01df8c0c01df7dUL; }
+unsigned long foo_0x7fff07d07fff0000(void) { return 0x7fff07d07fff0000UL; }
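
Illustration only, not part of the patch: the three relations between
the 32-bit halves that the riscv_build_integer hunk looks for can be
sketched in standalone C.  This is a minimal sketch under stated
assumptions, not the GCC implementation.  The names classify_halves,
enum half_relation and check_examples are hypothetical; the sketch
ignores instruction costs and the TARGET_64BIT / TARGET_ZBA gating, it
rejects zero halves, and it only accepts right shifts that drop no set
bits.  The sample constants come from synthesis-12.c, synthesis-13.c
and synthesis-14.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the relations; not GCC data structures.  */
enum half_relation { REL_NONE, REL_SHIFT, REL_SHNADD, REL_ADDI };

/* Classify how the high 32 bits of VALUE can be derived from the low
   32 bits.  Assumption: shifts are tried before shNadd and addi.  */
enum half_relation
classify_halves (uint64_t value)
{
  uint64_t lo = value & 0xffffffffu;
  uint64_t hi = value >> 32;

  /* Like the patch, give up when bit 31 of the high half is set.
     Assumption of this sketch: zero halves are rejected too.  */
  if ((hi & 0x80000000u) != 0 || lo == 0 || hi == 0)
    return REL_NONE;

  /* High half is the low half shifted by 1..31 bits.  A right shift
     only counts when it drops no set bits.  */
  for (unsigned k = 1; k < 32; k++)
    {
      uint64_t low_bits = (UINT64_C (1) << k) - 1;
      if (hi == lo << k)
        return REL_SHIFT;
      if ((lo & low_bits) == 0 && hi == lo >> k)
        return REL_SHIFT;
    }

  /* sh1add, sh2add and sh3add compute lo * 3, lo * 5 and lo * 9.  */
  if (hi == lo * 3 || hi == lo * 5 || hi == lo * 9)
    return REL_SHNADD;

  /* The halves differ by a value that fits an addi immediate.  */
  int64_t diff = (int64_t) hi - (int64_t) lo;
  if (diff >= -2047 && diff <= 2047)
    return REL_ADDI;

  return REL_NONE;
}

/* One sample constant per relation, taken from the new tests.  */
void
check_examples (void)
{
  assert (classify_halves (UINT64_C (0x7fffdffe3fffefff)) == REL_SHIFT);
  assert (classify_halves (UINT64_C (0x7907d89a2857f2de)) == REL_SHNADD);
  assert (classify_halves (UINT64_C (0x7857faae7857f2de)) == REL_ADDI);
}
```

Calling check_examples () from a small harness exercises one constant
per relation.  In each accepted case the high half is derived from the
low half and then moved into bits 32..63, so the two values share no
set bits; that is the property the riscv_move_integer hunk relies on
when it combines them with PLUS instead of IOR on targets without Zbkb.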