From patchwork Wed May 29 03:06:09 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jeff Law
X-Patchwork-Id: 1940870
Date: Tue, 28 May 2024 21:06:09 -0600
From: Jeff Law
Subject: [to-be-committed] [RISC-V] Use pack to handle repeating constants
To: "gcc-patches@gcc.gnu.org"
List-Id: Gcc-patches mailing list

This patch utilizes zbkb to improve the code we generate for 64bit
constants when the high half is a duplicate of the low half.  Basically
we generate the low half and use a pack instruction with that same
register repeated.
ie

    pack dest,src,src

That gives us a maximum sequence of 3 instructions and sometimes it will
be just 2 instructions (say if the low 32 bits can be constructed with a
single addi or lui).

As with shadd, I'm abusing an RTL opcode.  This time it's CONCAT.  It's
reasonably close to what we're doing.  Obviously it's just how we
identify the desire to generate a pack in the array of opcodes.  We
don't actually emit a CONCAT.

Note that we don't care about the potential sign extension from bit 31.
pack will only look at bits 0..31 of each input (for rv64).  So we go
ahead and sign extend before synthesizing the low part as that allows us
to handle more cases trivially.

I had my testsuite generator chew on random cases of a repeating
constant without any surprises.  I don't see much point in including all
those in the testcase (after all there are 2**32 of them).  I've got a
set of 10 I'm including.  Nothing particularly interesting in them.

An enterprising developer who needs this improved without zbkb could
probably do so with a bit of work.  First, increase the cost by 1 unit.
Second, avoid cases where bit 31 is set and restrict it to cases when we
can still create pseudos.  On the codegen side, when encountering the
CONCAT, generate the appropriate shift of "X" into a temporary register,
then IOR the temporary with "X" into the new destination.

Anyway, I've tested this in my tester (though it doesn't turn on zbkb,
yet).  I'll let the CI system chew on it overnight, but like mine, I
don't think it lights up zbkb.  So it's unlikely to spit out anything
interesting.

Jeff

diff --git a/gcc/config/riscv/crypto.md b/gcc/config/riscv/crypto.md
index b632312ade2..b9cac78fce1 100644
--- a/gcc/config/riscv/crypto.md
+++ b/gcc/config/riscv/crypto.md
@@ -107,7 +107,7 @@ (define_insn "riscv_pack_"
 ;; This is slightly more complex than the other pack patterns
 ;; that fully expose the RTL as it needs to self-adjust to
 ;; rv32 and rv64.  But it's not that hard.
-(define_insn "*riscv_xpack__2"
+(define_insn "riscv_xpack___2"
   [(set (match_operand:X 0 "register_operand" "=r")
         (ior:X (ashift:X (match_operand:X 1 "register_operand" "r")
                          (match_operand 2 "immediate_operand" "n"))
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index a99211d56b1..91fefacee80 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1123,6 +1123,22 @@ riscv_build_integer (struct riscv_integer_op *codes, HOST_WIDE_INT value,
         }
     }
 
+  /* With pack we can generate a 64 bit constant with the same high
+     and low 32 bits trivially.  */
+  if (cost > 3 && TARGET_64BIT && TARGET_ZBKB)
+    {
+      unsigned HOST_WIDE_INT loval = value & 0xffffffff;
+      unsigned HOST_WIDE_INT hival = value & ~loval;
+      if (hival >> 32 == loval)
+        {
+          cost = 1 + riscv_build_integer_1 (codes, sext_hwi (loval, 32), mode);
+          codes[cost - 1].code = CONCAT;
+          codes[cost - 1].value = 0;
+          codes[cost - 1].use_uw = false;
+        }
+
+    }
+
   return cost;
 }
 
@@ -2679,6 +2695,13 @@ riscv_move_integer (rtx temp, rtx dest, HOST_WIDE_INT value,
               rtx t = can_create_pseudo_p () ? gen_reg_rtx (mode) : temp;
               x = riscv_emit_set (t, x);
             }
+          else if (codes[i].code == CONCAT)
+            {
+              rtx t = can_create_pseudo_p () ? gen_reg_rtx (mode) : temp;
+              rtx t2 = gen_lowpart (SImode, x);
+              emit_insn (gen_riscv_xpack_di_si_2 (t, x, GEN_INT (32), t2));
+              x = t;
+            }
           else
             x = gen_rtx_fmt_ee (codes[i].code, mode,
                                 x, GEN_INT (codes[i].value));
diff --git a/gcc/testsuite/gcc.target/riscv/synthesis-9.c b/gcc/testsuite/gcc.target/riscv/synthesis-9.c
new file mode 100644
index 00000000000..cc622188abc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/synthesis-9.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* We aggressively skip as we really just need to test the basic synthesis
+   which shouldn't vary based on the optimization level.  -O1 seems to work
+   and eliminates the usual sources of extraneous dead code that would throw
+   off the counts.  */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O2" "-O3" "-Os" "-Oz" "-flto" } } */
+/* { dg-options "-march=rv64gc_zba_zbb_zbkb_zbs" } */
+
+/* Rather than test for a specific synthesis of all these constants or
+   having thousands of tests each testing one variant, we just test the
+   total number of instructions.
+
+   This isn't expected to change much and any change is worthy of a look.  */
+/* { dg-final { scan-assembler-times "\\t(add|addi|bseti|li|pack|ret|sh1add|sh2add|sh3add|slli|srli|xori)" 40 } } */
+
+
+
+unsigned long foo_0xf857f2def857f2de(void) { return 0xf857f2def857f2deUL; }
+unsigned long foo_0x99660e6399660e63(void) { return 0x99660e6399660e63UL; }
+unsigned long foo_0x937f1b75937f1b75(void) { return 0x937f1b75937f1b75UL; }
+unsigned long foo_0xb5019fa0b5019fa0(void) { return 0xb5019fa0b5019fa0UL; }
+unsigned long foo_0xb828e6c1b828e6c1(void) { return 0xb828e6c1b828e6c1UL; }
+unsigned long foo_0x839d87e9839d87e9(void) { return 0x839d87e9839d87e9UL; }
+unsigned long foo_0xc29617c1c29617c1(void) { return 0xc29617c1c29617c1UL; }
+unsigned long foo_0xa4118119a4118119(void) { return 0xa4118119a4118119UL; }
+unsigned long foo_0x8c01df7d8c01df7d(void) { return 0x8c01df7d8c01df7dUL; }
+unsigned long foo_0xf0e23d6bf0e23d6b(void) { return 0xf0e23d6bf0e23d6bUL; }
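
For readers who want to convince themselves of the arithmetic outside the compiler, the check added to riscv_build_integer and the effect of the emitted pack can be modelled in plain C. This is only a sketch and not part of the patch: pack_rv64, has_repeating_halves, sext32, rebuild_with_pack and rebuild_with_shift_ior are hypothetical helper names, and the last one models the non-zbkb shift/IOR fallback suggested in the mail rather than anything the patch implements.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the rv64 Zbkb pack instruction: the low 32 bits of rs1 become
   the low half of the result and the low 32 bits of rs2 become the high
   half.  Bits 32..63 of both inputs are ignored.  */
uint64_t
pack_rv64 (uint64_t rs1, uint64_t rs2)
{
  return (rs2 << 32) | (rs1 & 0xffffffffu);
}

/* Same test as in the patch: true when the high half of VALUE is a
   duplicate of the low half.  LOVAL has zero upper bits, so ~LOVAL keeps
   the whole upper half and clears every set bit of the lower half.  */
bool
has_repeating_halves (uint64_t value)
{
  uint64_t loval = value & 0xffffffffu;
  uint64_t hival = value & ~loval;
  return (hival >> 32) == loval;
}

/* Sign extend the low 32 bits, standing in for sext_hwi (loval, 32).
   The narrowing conversion is implementation-defined in ISO C but wraps
   as expected on the usual two's complement targets.  */
int64_t
sext32 (uint64_t value)
{
  return (int64_t) (int32_t) (uint32_t) value;
}

/* Rebuild VALUE the way the new sequence does: synthesize the sign
   extended low half, then pack that register with itself.  */
uint64_t
rebuild_with_pack (uint64_t value)
{
  uint64_t low = (uint64_t) sext32 (value);
  return pack_rv64 (low, low);
}

/* Hypothetical fallback without zbkb: shift the low half into a
   temporary and IOR it with the original.  Only correct when bit 31 of
   LOW is clear, since a sign extended LOW would smear ones into the
   upper half.  */
uint64_t
rebuild_with_shift_ior (uint64_t low)
{
  uint64_t tmp = low << 32;
  return tmp | low;
}
```

Calling rebuild_with_pack on any of the ten testcase constants should give back the constant itself, even though bit 31 is set in all of them. That is the point made above about pack ignoring the sign extension, and it is also why the shift/IOR variant has to exclude those values.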