Message-ID: <79a5d4d5-9356-4e49-9f2c-20638f66d267@gmail.com>
Date: Thu, 30 May 2024 22:10:35 -0600
From: Jeff Law
To: gcc-patches@gcc.gnu.org
Subject: [to-be-committed] [RISC-V] Use Zbkb for general 64 bit constants when profitable

Basically this adds the ability to generate two independent constants
during synthesis, then bring them together with a pack instruction.
Thus we never need to go out to the constant pool when zbkb is enabled.

The worst sequence we ever generate is lui+addi+lui+addi+pack.

Obviously if either half can be synthesized with just a lui or just an
addi, then we'll DTRT automagically.  So for example:

unsigned long foo_0x1425000028000000 (void) { return 0x1425000028000000; }

The high and low halves are each just a lui.  So the final synthesis is:

>	li	a5,671088640	# 15	[c=4 l=4]  *movdi_64bit/1
>	li	a0,337969152	# 16	[c=4 l=4]  *movdi_64bit/1
>	pack	a0,a5,a0	# 17	[c=12 l=4]  riscv_xpack_di_si_2

(A small C model of pack's semantics follows below.)

On the implementation side, I think the bits I've put in here likely can
be used to handle the repeating constant case for !zbkb.  They likely
could also help capture cases where the upper half can be derived from
the lower half (say by turning a bit on or off, shifting, or something
similar).  The key in both of these cases is that we need a temporary
register holding an intermediate value.

Ventana's internal tester enables zbkb, but I don't think any of the
other testers currently exercise zbkb.  We'll probably want to change
that at some point, but I don't think it's super-critical yet.
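For reference, here is a minimal C model of the rv64 Zbkb pack semantics
used above.  It is illustrative only, not part of the patch; pack64, lo
and hi are made-up names for the sketch:

/* Models rv64 "pack rd,rs1,rs2":
   rd = (low 32 bits of rs2) << 32 | (low 32 bits of rs1).
   Illustrative only -- not part of the patch.  */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t
pack64 (uint64_t rs1, uint64_t rs2)
{
  return ((rs2 & 0xffffffffULL) << 32) | (rs1 & 0xffffffffULL);
}

int
main (void)
{
  uint64_t lo = 671088640;	/* li a5,671088640 (0x28000000) */
  uint64_t hi = 337969152;	/* li a0,337969152 (0x14250000) */
  /* Mirrors "pack a0,a5,a0": a5 supplies the low half, a0 the high.  */
  printf ("0x%016" PRIx64 "\n", pack64 (lo, hi));
  /* Prints 0x1425000028000000, matching the synthesized constant.  */
  return 0;
}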
While I can envision a few more cases where we could improve constant
synthesis, I have no immediate plans to work in this space.  But if
someone is interested, some thoughts are recorded here:

> https://wiki.riseproject.dev/display/HOME/CT_00_031+--+Additional+Constant+Synthesis+Improvements

Jeff

gcc/
	* config/riscv/riscv.cc (riscv_integer_op): Add new field.
	(riscv_build_integer_1): Initialize the new field.
	(riscv_build_integer): Recognize more cases where Zbkb's
	pack instruction is profitable.
	(riscv_move_integer): Loop over all the codes.  If requested,
	save the current constant into a temporary.  Generate pack
	for more cases using the saved constant.

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 91fefacee80..10af38a5a81 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -250,6 +250,7 @@ struct riscv_arg_info {
    and each VALUE[i] is a constant integer.  CODE[0] is undefined.  */
 struct riscv_integer_op {
   bool use_uw;
+  bool save_temporary;
   enum rtx_code code;
   unsigned HOST_WIDE_INT value;
 };
@@ -759,6 +760,7 @@ riscv_build_integer_1 (struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS],
       codes[0].code = UNKNOWN;
       codes[0].value = value;
       codes[0].use_uw = false;
+      codes[0].save_temporary = false;
       return 1;
     }
   if (TARGET_ZBS && SINGLE_BIT_MASK_OPERAND (value))
@@ -767,6 +769,7 @@ riscv_build_integer_1 (struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS],
       codes[0].code = UNKNOWN;
       codes[0].value = value;
       codes[0].use_uw = false;
+      codes[0].save_temporary = false;
 
       /* RISC-V sign-extends all 32bit values that live in a 32bit
	 register.  To avoid paradoxes, we thus need to use the
@@ -796,6 +799,7 @@ riscv_build_integer_1 (struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS],
	  alt_codes[alt_cost-1].code = PLUS;
	  alt_codes[alt_cost-1].value = low_part;
	  alt_codes[alt_cost-1].use_uw = false;
+	  alt_codes[alt_cost-1].save_temporary = false;
	  memcpy (codes, alt_codes, sizeof (alt_codes));
	  cost = alt_cost;
	}
@@ -810,6 +814,7 @@ riscv_build_integer_1 (struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS],
	  alt_codes[alt_cost-1].code = XOR;
	  alt_codes[alt_cost-1].value = low_part;
	  alt_codes[alt_cost-1].use_uw = false;
+	  alt_codes[alt_cost-1].save_temporary = false;
	  memcpy (codes, alt_codes, sizeof (alt_codes));
	  cost = alt_cost;
	}
@@ -852,6 +857,7 @@ riscv_build_integer_1 (struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS],
	      alt_codes[alt_cost-1].code = ASHIFT;
	      alt_codes[alt_cost-1].value = shift;
	      alt_codes[alt_cost-1].use_uw = use_uw;
+	      alt_codes[alt_cost-1].save_temporary = false;
	      memcpy (codes, alt_codes, sizeof (alt_codes));
	      cost = alt_cost;
	    }
@@ -873,9 +879,11 @@ riscv_build_integer_1 (struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS],
       codes[0].value = (((unsigned HOST_WIDE_INT) value >> trailing_ones)
			| (value << (64 - trailing_ones)));
       codes[0].use_uw = false;
+      codes[0].save_temporary = false;
       codes[1].code = ROTATERT;
       codes[1].value = 64 - trailing_ones;
       codes[1].use_uw = false;
+      codes[1].save_temporary = false;
       cost = 2;
     }
   /* Handle the case where the 11 bit range of zero bits wraps around.  */
@@ -888,9 +896,11 @@ riscv_build_integer_1 (struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS],
			| ((unsigned HOST_WIDE_INT) value
			   >> (32 + upper_trailing_ones)));
       codes[0].use_uw = false;
+      codes[0].save_temporary = false;
       codes[1].code = ROTATERT;
       codes[1].value = 32 - upper_trailing_ones;
       codes[1].use_uw = false;
+      codes[1].save_temporary = false;
       cost = 2;
     }
 
@@ -917,6 +927,7 @@ riscv_build_integer_1 (struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS],
	      alt_codes[alt_cost].code = AND;
	      alt_codes[alt_cost].value = ~(1UL << bit);
	      alt_codes[alt_cost].use_uw = false;
+	      alt_codes[alt_cost].save_temporary = false;
	      alt_cost++;
	      nval &= ~(1UL << bit);
	    }
@@ -938,6 +949,7 @@ riscv_build_integer_1 (struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS],
	  alt_codes[alt_cost - 1].code = FMA;
	  alt_codes[alt_cost - 1].value = 9;
	  alt_codes[alt_cost - 1].use_uw = false;
+	  alt_codes[alt_cost - 1].save_temporary = false;
	  memcpy (codes, alt_codes, sizeof (alt_codes));
	  cost = alt_cost;
	}
@@ -948,6 +960,7 @@ riscv_build_integer_1 (struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS],
	  alt_codes[alt_cost - 1].code = FMA;
	  alt_codes[alt_cost - 1].value = 5;
	  alt_codes[alt_cost - 1].use_uw = false;
+	  alt_codes[alt_cost - 1].save_temporary = false;
	  memcpy (codes, alt_codes, sizeof (alt_codes));
	  cost = alt_cost;
	}
@@ -958,6 +971,7 @@ riscv_build_integer_1 (struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS],
	  alt_codes[alt_cost - 1].code = FMA;
	  alt_codes[alt_cost - 1].value = 3;
	  alt_codes[alt_cost - 1].use_uw = false;
+	  alt_codes[alt_cost - 1].save_temporary = false;
	  memcpy (codes, alt_codes, sizeof (alt_codes));
	  cost = alt_cost;
	}
@@ -978,6 +992,7 @@ riscv_build_integer_1 (struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS],
	  alt_codes[alt_cost - 1].code = PLUS;
	  alt_codes[alt_cost - 1].value = adjustment;
	  alt_codes[alt_cost - 1].use_uw = false;
+	  alt_codes[alt_cost - 1].save_temporary = false;
	  memcpy (codes, alt_codes, sizeof (alt_codes));
	  cost = alt_cost;
	}
@@ -995,6 +1010,7 @@ riscv_build_integer_1 (struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS],
	  alt_codes[i].code = (i == 0 ? UNKNOWN : IOR);
	  alt_codes[i].value = value & 0x7ffff000;
	  alt_codes[i].use_uw = false;
+	  alt_codes[i].save_temporary = false;
	  value &= ~0x7ffff000;
	  i++;
	}
@@ -1005,6 +1021,7 @@ riscv_build_integer_1 (struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS],
	  alt_codes[i].code = (i == 0 ? UNKNOWN : PLUS);
	  alt_codes[i].value = value & 0x7ff;
	  alt_codes[i].use_uw = false;
+	  alt_codes[i].save_temporary = false;
	  value &= ~0x7ff;
	  i++;
	}
@@ -1016,6 +1033,7 @@ riscv_build_integer_1 (struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS],
	      alt_codes[i].code = (i == 0 ? UNKNOWN : IOR);
	      alt_codes[i].value = 1UL << bit;
	      alt_codes[i].use_uw = false;
+	      alt_codes[i].save_temporary = false;
	      value &= ~(1ULL << bit);
	      i++;
	    }
@@ -1057,6 +1075,7 @@ riscv_build_integer (struct riscv_integer_op *codes, HOST_WIDE_INT value,
	  alt_codes[alt_cost-1].code = LSHIFTRT;
	  alt_codes[alt_cost-1].value = shift;
	  alt_codes[alt_cost-1].use_uw = false;
+	  alt_codes[alt_cost-1].save_temporary = false;
	  memcpy (codes, alt_codes, sizeof (alt_codes));
	  cost = alt_cost;
	}
@@ -1069,6 +1088,7 @@ riscv_build_integer (struct riscv_integer_op *codes, HOST_WIDE_INT value,
	  alt_codes[alt_cost-1].code = LSHIFTRT;
	  alt_codes[alt_cost-1].value = shift;
	  alt_codes[alt_cost-1].use_uw = false;
+	  alt_codes[alt_cost-1].save_temporary = false;
	  memcpy (codes, alt_codes, sizeof (alt_codes));
	  cost = alt_cost;
	}
@@ -1093,6 +1113,7 @@ riscv_build_integer (struct riscv_integer_op *codes, HOST_WIDE_INT value,
	  alt_codes[alt_cost - 1].code = XOR;
	  alt_codes[alt_cost - 1].value = -1;
	  alt_codes[alt_cost - 1].use_uw = false;
+	  alt_codes[alt_cost - 1].save_temporary = false;
	  memcpy (codes, alt_codes, sizeof (alt_codes));
	  cost = alt_cost;
	}
@@ -1128,13 +1149,55 @@ riscv_build_integer (struct riscv_integer_op *codes, HOST_WIDE_INT value,
   if (cost > 3 && TARGET_64BIT && TARGET_ZBKB)
     {
       unsigned HOST_WIDE_INT loval = value & 0xffffffff;
-      unsigned HOST_WIDE_INT hival = value & ~loval;
-      if (hival >> 32 == loval)
+      unsigned HOST_WIDE_INT hival = (value & ~loval) >> 32;
+      if (hival == loval)
	{
	  cost = 1 + riscv_build_integer_1 (codes, sext_hwi (loval, 32), mode);
	  codes[cost - 1].code = CONCAT;
	  codes[cost - 1].value = 0;
	  codes[cost - 1].use_uw = false;
+	  codes[cost - 1].save_temporary = false;
+	}
+
+      /* An arbitrary 64 bit constant can be synthesized in 5 instructions
+	 using zbkb.  We may do better than that if the upper or lower half
+	 can be synthesized with a single LUI, ADDI or BSET.  Regardless the
+	 basic steps are the same.  */
+      if (cost > 3 && can_create_pseudo_p ())
+	{
+	  struct riscv_integer_op hi_codes[RISCV_MAX_INTEGER_OPS];
+	  struct riscv_integer_op lo_codes[RISCV_MAX_INTEGER_OPS];
+	  int hi_cost, lo_cost;
+
+	  /* Synthesize and get cost for each half.  */
+	  lo_cost
+	    = riscv_build_integer_1 (lo_codes, sext_hwi (loval, 32), mode);
+	  hi_cost
+	    = riscv_build_integer_1 (hi_codes, sext_hwi (hival, 32), mode);
+
+	  /* If profitable, finish synthesis using zbkb.  */
+	  if (cost > hi_cost + lo_cost + 1)
+	    {
+	      /* We need the low half independent of the high half.  So
+		 mark it as creating a temporary we'll use later.  */
+	      memcpy (codes, lo_codes,
+		      lo_cost * sizeof (struct riscv_integer_op));
+	      codes[lo_cost - 1].save_temporary = true;
+
+	      /* Now the high half synthesis.  */
+	      memcpy (codes + lo_cost, hi_codes,
+		      hi_cost * sizeof (struct riscv_integer_op));
+
+	      /* Adjust the cost.  */
+	      cost = hi_cost + lo_cost + 1;
+
+	      /* And finally (ab)use VEC_MERGE to indicate we want to
+		 merge the two parts together.  */
+	      codes[cost - 1].code = VEC_MERGE;
+	      codes[cost - 1].value = 0;
+	      codes[cost - 1].use_uw = false;
+	      codes[cost - 1].save_temporary = false;
+	    }
+	}
     }
@@ -2656,23 +2719,25 @@ riscv_move_integer (rtx temp, rtx dest, HOST_WIDE_INT value,
     x = riscv_split_integer (value, mode);
   else
     {
-      codes[0].value = trunc_int_for_mode (codes[0].value, mode);
-      /* Apply each binary operation to X.  */
-      x = GEN_INT (codes[0].value);
-
-      for (i = 1; i < num_ops; i++)
+      rtx old_value = NULL_RTX;
+      for (i = 0; i < num_ops; i++)
	{
-	  if (!can_create_pseudo_p ())
+	  if (i != 0 && !can_create_pseudo_p ())
	    x = riscv_emit_set (temp, x);
-	  else
+	  else if (i != 0)
	    x = force_reg (mode, x);
 
	  codes[i].value = trunc_int_for_mode (codes[i].value, mode);
-	  /* If the sequence requires using a "uw" form of an insn, we're
-	     going to have to construct the RTL ourselves and put it in
-	     a register to avoid force_reg/force_operand from mucking things
-	     up.  */
-	  if (codes[i].use_uw)
+	  if (codes[i].code == UNKNOWN)
	    {
+	      /* UNKNOWN means load the constant value into X.  */
+	      x = GEN_INT (codes[i].value);
+	    }
+	  else if (codes[i].use_uw)
+	    {
+	      /* If the sequence requires using a "uw" form of an insn, we're
+		 going to have to construct the RTL ourselves and put it in
+		 a register to avoid force_reg/force_operand from mucking
+		 things up.  */
	      gcc_assert (TARGET_64BIT || TARGET_ZBA);
	      rtx t = can_create_pseudo_p () ? gen_reg_rtx (mode) : temp;
@@ -2695,16 +2760,27 @@ riscv_move_integer (rtx temp, rtx dest, HOST_WIDE_INT value,
	      rtx t = can_create_pseudo_p () ? gen_reg_rtx (mode) : temp;
	      x = riscv_emit_set (t, x);
	    }
-	  else if (codes[i].code == CONCAT)
+	  else if (codes[i].code == CONCAT || codes[i].code == VEC_MERGE)
	    {
	      rtx t = can_create_pseudo_p () ? gen_reg_rtx (mode) : temp;
-	      rtx t2 = gen_lowpart (SImode, x);
+	      rtx t2 = codes[i].code == VEC_MERGE ? old_value : x;
+	      gcc_assert (t2);
+	      t2 = gen_lowpart (SImode, t2);
	      emit_insn (gen_riscv_xpack_di_si_2 (t, x, GEN_INT (32), t2));
	      x = t;
	    }
	  else
	    x = gen_rtx_fmt_ee (codes[i].code, mode, x,
				GEN_INT (codes[i].value));
+
+	  /* If this entry in the code table indicates we should save away
+	     the temporary holding the current value of X, then do so.  */
+	  if (codes[i].save_temporary)
+	    {
+	      gcc_assert (old_value == NULL_RTX);
+	      x = force_reg (mode, x);
+	      old_value = x;
+	    }
	}
     }
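As a rough sketch of the profitability test the patch adds: each half is
synthesized independently, plus one pack to join them.  The toy model
below is illustrative only -- half_cost and val are made-up names, and
it assumes a half costs one instruction when a bare ADDI or LUI
suffices and two otherwise, whereas the real riscv_build_integer_1
handles many more cases:

/* Toy model of the new cost decision: lo_cost + hi_cost + 1 (pack).
   Not GCC code; a deliberate simplification for illustration.  */
#include <stdint.h>
#include <stdio.h>

static int
half_cost (int32_t v)
{
  if (v >= -2048 && v < 2048)	/* Fits a single ADDI.  */
    return 1;
  if ((v & 0xfff) == 0)		/* Fits a single LUI.  */
    return 1;
  return 2;			/* Assume LUI + ADDI.  */
}

int
main (void)
{
  uint64_t val = 0x1425000028000000ULL;	/* Example from the mail.  */
  int lo_cost = half_cost ((int32_t) (val & 0xffffffff));
  int hi_cost = half_cost ((int32_t) (val >> 32));
  printf ("zbkb sequence length: %d\n", lo_cost + hi_cost + 1);
  return 0;
}

For the example constant both halves are a single lui, so this reports
3, matching the li+li+pack sequence above; the worst case is 5, i.e.
lui+addi+lui+addi+pack.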