From patchwork Wed Jul 10 18:13:50 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jeff Law
X-Patchwork-Id: 1958975
Message-ID: <4b819e72-26bb-47dc-a405-c239ab07db71@ventanamicro.com>
Date: Wed, 10 Jul 2024 12:13:50 -0600
From: Jeff Law
Subject: [to-be-committed][RISC-V] Eliminate unnecessary sign extension after inlined str[n]cmp
To: "gcc-patches@gcc.gnu.org"
List-Id: Gcc-patches mailing list

This patch eliminates an unnecessary sign extension for scalar inlined
string comparisons on rv64.

Conceptually this is pretty simple.
First, prove that all the paths which "return" a value from the inlined
string comparison already have sign extended values.  FINAL_LABEL is the
point after the calculation of the return value.  So if we have a jump to
FINAL_LABEL, we must have a properly extended result value at that point.

Second, we're going to arrange in the .md part of the expander to use an
X mode temporary for the result.  After computing the result we will (if
necessary) extract the low part of the result using a SUBREG tagged with
the appropriate SUBREG_PROMOTED_* bits.

So with that background, we find a jump to FINAL_LABEL in
emit_strcmp_scalar_compare_byte.  Since we know the result is X mode, we
can just emit the subtraction of the two chars in X mode and we'll have a
properly sign extended result.

There are 4 jumps to FINAL_LABEL in riscv_expand_strcmp_scalar.

The first is just returning zero and needs trivial simplification to not
force the result into SImode.

The second is after calling strcmp in the library.  The ABI mandates that
value is sign extended, so there's nothing to do for that case.

The 3rd occurs after a call to
emit_strcmp_scalar_result_calculation_nonul.  If we dive into that
routine, it needs simplification similar to what we did in
emit_strcmp_scalar_compare_byte.

The 4th occurs after a call to emit_strcmp_scalar_result_calculation,
which again needs trivial adjustment like we've done in the other
routines.

Finally, at the end of expand_strcmp, just store the X mode result
sitting in SUB to RESULT.

The net of all that is we know every path has its result properly
extended to X mode.  Standard redundant extension removal will take care
of the rest.

We've been running this within Ventana for about 6 months, so naturally
it's been through various QA cycles, dhrystone, spec2017, etc.  It's also
been through a build/test cycle in my tester.  Waiting on results from
the pre-commit testing before moving forward.

jeff

ps. I suspect memcmp could probably benefit from the same treatment.
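
As a rough sanity check of the argument above, here is a small standalone C++ sketch (not GCC code) that models two of the result computations with 64-bit integers standing in for X mode on rv64. The helper names are hypothetical and only mirror the shape of the sub and sltu/neg/ior sequences in the patch. The point it tries to show is that the X mode value already equals the sign extension of its low 32 bits, so a later extension is redundant.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of do_sub3 in X mode on rv64: both inputs are
// zero-extended bytes, so the difference lies in [-255, 255].
inline int64_t byte_diff_xmode(uint8_t a, uint8_t b) {
    return static_cast<int64_t>(a) - static_cast<int64_t>(b);
}

// Hypothetical model of the no-NUL result calculation:
// sltu tmp, data1, data2; neg tmp, tmp; ori result, tmp, 1.
inline int64_t nonul_result_xmode(uint64_t data1, uint64_t data2) {
    int64_t tmp = (data1 < data2) ? 1 : 0;  // sltu
    tmp = -tmp;                              // neg: 0 or -1
    return tmp | 1;                          // ior with 1: 1 or -1
}

// True if the 64-bit value equals the sign extension of its low 32 bits,
// i.e. an explicit sign extension of the SImode low part would be a no-op.
inline bool is_sign_extended_si(int64_t x) {
    uint32_t low = static_cast<uint32_t>(static_cast<uint64_t>(x));
    return x == static_cast<int64_t>(static_cast<int32_t>(low));
}

// Exhaustively check the byte-difference path over every pair of bytes.
inline bool all_byte_diffs_sign_extended() {
    for (int a = 0; a < 256; ++a)
        for (int b = 0; b < 256; ++b)
            if (!is_sign_extended_si(byte_diff_xmode(
                    static_cast<uint8_t>(a), static_cast<uint8_t>(b))))
                return false;
    return true;
}
```

Calling all_byte_diffs_sign_extended() covers the compare-byte path for all 65536 input pairs, and nonul_result_xmode can only yield 1 or -1, both of which pass is_sign_extended_si.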

gcc/
	* config/riscv/riscv-string.cc (emit_strcmp_scalar_compare_byte): Set
	RESULT directly rather than using a new temporary.
	(emit_strcmp_scalar_result_calculation_nonul): Likewise.
	(emit_strcmp_scalar_result_calculation): Likewise.
	(riscv_expand_strcmp_scalar): Use CONST0_RTX rather than generating
	a new node.
	(expand_strcmp): Copy directly from SUB to RESULT.
	* config/riscv/riscv.md (cmpstrnsi, cmpstrsi): Pass an X mode
	temporary to the expansion routines.  If necessary extract low part
	of the word to store in final result location.

diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index d9380033175..29e38a1c8b7 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -140,9 +140,7 @@ static void
 emit_strcmp_scalar_compare_byte (rtx result, rtx data1, rtx data2,
                                  rtx final_label)
 {
-  rtx tmp = gen_reg_rtx (Xmode);
-  do_sub3 (tmp, data1, data2);
-  emit_insn (gen_movsi (result, gen_lowpart (SImode, tmp)));
+  do_sub3 (result, data1, data2);
   emit_jump_insn (gen_jump (final_label));
   emit_barrier (); /* No fall-through. */
 }
@@ -310,8 +308,7 @@ emit_strcmp_scalar_result_calculation_nonul (rtx result, rtx data1, rtx data2)
   rtx tmp = gen_reg_rtx (Xmode);
   emit_insn (gen_slt_3 (LTU, Xmode, Xmode, tmp, data1, data2));
   do_neg2 (tmp, tmp);
-  do_ior3 (tmp, tmp, const1_rtx);
-  emit_insn (gen_movsi (result, gen_lowpart (SImode, tmp)));
+  do_ior3 (result, tmp, const1_rtx);
 }
 
 /* strcmp-result calculation.
@@ -367,9 +364,7 @@ emit_strcmp_scalar_result_calculation (rtx result, rtx data1, rtx data2,
   unsigned int shiftr = (xlen - 1) * BITS_PER_UNIT;
   do_lshr3 (data1, data1, GEN_INT (shiftr));
   do_lshr3 (data2, data2, GEN_INT (shiftr));
-  rtx tmp = gen_reg_rtx (Xmode);
-  do_sub3 (tmp, data1, data2);
-  emit_insn (gen_movsi (result, gen_lowpart (SImode, tmp)));
+  do_sub3 (result, data1, data2);
 }
 
 /* Expand str(n)cmp using Zbb/TheadBb instructions.
@@ -444,7 +439,7 @@ riscv_expand_strcmp_scalar (rtx result, rtx src1, rtx src2,
   /* All compared and everything was equal. */
   if (ncompare)
     {
-      emit_insn (gen_rtx_SET (result, gen_rtx_CONST_INT (SImode, 0)));
+      emit_insn (gen_rtx_SET (result, CONST0_RTX (GET_MODE (result))));
       emit_jump_insn (gen_jump (final_label));
       emit_barrier (); /* No fall-through. */
     }
@@ -1544,7 +1537,7 @@ expand_strcmp (rtx result, rtx src1, rtx src2, rtx nbytes,
   if (with_length)
     emit_label (done);
 
-  emit_insn (gen_movsi (result, gen_lowpart (SImode, sub)));
+  emit_move_insn (result, sub);
   return true;
 }
 
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 6032a65fc68..9c6382a2b4e 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -4345,9 +4345,19 @@ (define_expand "cmpstrnsi"
   "riscv_inline_strncmp && !optimize_size
     && (TARGET_ZBB || TARGET_XTHEADBB || TARGET_VECTOR)"
 {
-  if (riscv_expand_strcmp (operands[0], operands[1], operands[2],
+  rtx temp = gen_reg_rtx (word_mode);
+  if (riscv_expand_strcmp (temp, operands[1], operands[2],
                            operands[3], operands[4]))
-    DONE;
+    {
+      if (TARGET_64BIT)
+        {
+          temp = gen_lowpart (SImode, temp);
+          SUBREG_PROMOTED_VAR_P (temp) = 1;
+          SUBREG_PROMOTED_SET (temp, SRP_SIGNED);
+        }
+      emit_move_insn (operands[0], temp);
+      DONE;
+    }
   else
     FAIL;
 })
@@ -4366,9 +4376,19 @@ (define_expand "cmpstrsi"
   "riscv_inline_strcmp && !optimize_size
     && (TARGET_ZBB || TARGET_XTHEADBB || TARGET_VECTOR)"
 {
-  if (riscv_expand_strcmp (operands[0], operands[1], operands[2],
+  rtx temp = gen_reg_rtx (word_mode);
+  if (riscv_expand_strcmp (temp, operands[1], operands[2],
                            NULL_RTX, operands[3]))
-    DONE;
+    {
+      if (TARGET_64BIT)
+        {
+          temp = gen_lowpart (SImode, temp);
+          SUBREG_PROMOTED_VAR_P (temp) = 1;
+          SUBREG_PROMOTED_SET (temp, SRP_SIGNED);
+        }
+      emit_move_insn (operands[0], temp);
+      DONE;
+    }
   else
     FAIL;
 })