From patchwork Wed Jul 31 16:40:58 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Sandiford X-Patchwork-Id: 1967202 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WYyXp3CCfz1yfG for ; Thu, 1 Aug 2024 02:41:30 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 9F66F3858C3A for ; Wed, 31 Jul 2024 16:41:28 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id AAE31385B503 for ; Wed, 31 Jul 2024 16:41:00 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org AAE31385B503 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org AAE31385B503 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1722444062; cv=none; b=h7RQiINc1heYpg/YaGa1waO43p4+xPEysYm2ZJuKz3PPOkHSbQTBQRwoBl7YPAI+xbBTZsBfDixXjVtk6sWZ/07NehacaZpa9hbYvZA1XOh7vt5uUq0CvNBFuQwU1U9wI9530W04HDDKyLWPP3LHUljt6uJmf/2M7f1V0gDuoL4= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1722444062; c=relaxed/simple; bh=J0r8iz/pjbNmugHkZZtOpG2JBagCn6NzkY60czU26Tc=; h=From:To:Subject:Date:Message-ID:MIME-Version; b=NDDwav2h8HyxekwuUGiPwkXUPxL0UaXKwNY0+sJlnEzq1Rb2df4kSSl17glVeglioHDgdUveHm8GJEgZ/+snqy5vmeSw6ntaIWs5JnpoCKo0bO/1VbsMnQsPSPhU7/iFdHAWi0RKZy/7klpsoH39dmUQf5il3AAGcKeQqzRzkDg= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id E93731007 for ; Wed, 31 Jul 2024 09:41:25 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id E53823F5A1 for ; Wed, 31 Jul 2024 09:40:59 -0700 (PDT) From: Richard Sandiford To: gcc-patches@gcc.gnu.org Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com Subject: [PATCH] Make may_trap_p_1 return false for constant pool references [PR116145] Date: Wed, 31 Jul 2024 17:40:58 +0100 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 X-Spam-Status: No, score=-19.1 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, KAM_SHORT, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org The testcase contains the constant: arr2 = svreinterpret_u8(svdup_u32(0x0a0d5c3f)); which was initially hoisted by hand, but which gimple optimisers later propagated to each use (as expected). The constant was then expanded as a load-and-duplicate from the constant pool. Normally that load should then be hoisted back out of the loop, but may_trap_or_fault_p stopped that from happening in this case. The code responsible was: if (/* MEM_NOTRAP_P only relates to the actual position of the memory reference; moving it out of context such as when moving code when optimizing, might cause its address to become invalid. */ code_changed || !MEM_NOTRAP_P (x)) { poly_int64 size = MEM_SIZE_KNOWN_P (x) ? MEM_SIZE (x) : -1; return rtx_addr_can_trap_p_1 (XEXP (x, 0), 0, size, GET_MODE (x), code_changed); } where code_changed is true. (Arguably it doesn't need to be true in this case, if we inserted invariants on the preheader edge, but it would still need to be true for conditionally executed loads.) Normally this wouldn't be a problem, since rtx_addr_can_trap_p_1 would recognise that the address refers to the constant pool. However, the SVE load-and-replicate instructions have a limited offset range, so it isn't possible for them to have a LO_SUM address. All we have is a plain pseudo base register. MEM_READONLY_P is defined as: /* 1 if RTX is a mem that is statically allocated in read-only memory. */ (RTL_FLAG_CHECK1 ("MEM_READONLY_P", (RTX), MEM)->unchanging) and so I think it should be safe to move memory references if both MEM_READONLY_P and MEM_NOTRAP_P are true. The testcase isn't a minimal reproducer, but I think it's good to have a realistic full routine in the testsuite. Bootstrapped & regression-tested on aarch64-linux-gnu. OK to install? Richard gcc/ PR rtl-optimization/116145 * rtlanal.cc (may_trap_p_1): Trust MEM_NOTRAP_P even for code movement if MEM_READONLY_P is also true. gcc/testsuite/ PR rtl-optimization/116145 * gcc.target/aarch64/sve/acle/general/pr116145.c: New test. --- gcc/rtlanal.cc | 14 ++++-- .../aarch64/sve/acle/general/pr116145.c | 46 +++++++++++++++++++ 2 files changed, 56 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr116145.c diff --git a/gcc/rtlanal.cc b/gcc/rtlanal.cc index 4158a531bdd..893a6afbbc5 100644 --- a/gcc/rtlanal.cc +++ b/gcc/rtlanal.cc @@ -3152,10 +3152,16 @@ may_trap_p_1 (const_rtx x, unsigned flags) && MEM_VOLATILE_P (x) && XEXP (x, 0) == stack_pointer_rtx) return true; - if (/* MEM_NOTRAP_P only relates to the actual position of the memory - reference; moving it out of context such as when moving code - when optimizing, might cause its address to become invalid. */ - code_changed + if (/* MEM_READONLY_P means that the memory is both statically + allocated and readonly, so MEM_NOTRAP_P should remain true + even if the memory reference is moved. This is certainly + true for the important case of force_const_mem. + + Otherwise, MEM_NOTRAP_P only relates to the actual position + of the memory reference; moving it out of context such as + when moving code when optimizing, might cause its address + to become invalid. */ + (code_changed && !MEM_READONLY_P (x)) || !MEM_NOTRAP_P (x)) { poly_int64 size = MEM_SIZE_KNOWN_P (x) ? MEM_SIZE (x) : -1; diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr116145.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr116145.c new file mode 100644 index 00000000000..a3d93d3e1c8 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr116145.c @@ -0,0 +1,46 @@ +// { dg-options "-O2" } + +#include +#include + +#pragma GCC target "+sve2" + +typedef unsigned char uchar; + +const uchar * +search_line_fast (const uchar *s, const uchar *end) +{ + size_t VL = svcntb(); + svuint8_t arr1, arr2; + svbool_t pc, pg = svptrue_b8(); + + // This should not be loaded inside the loop every time. + arr2 = svreinterpret_u8(svdup_u32(0x0a0d5c3f)); + + for (; s+VL <= end; s += VL) { + arr1 = svld1_u8(pg, s); + pc = svmatch_u8(pg, arr1, arr2); + + if (svptest_any(pg, pc)) { + pc = svbrkb_z(pg, pc); + return s+svcntp_b8(pg, pc); + } + } + + // Handle remainder. + if (s < end) { + pg = svwhilelt_b8((size_t)s, (size_t)end); + + arr1 = svld1_u8(pg, s); + pc = svmatch_u8(pg, arr1, arr2); + + if (svptest_any(pg, pc)) { + pc = svbrkb_z(pg, pc); + return s+svcntp_b8(pg, pc); + } + } + + return end; +} + +// { dg-final { scan-assembler {:\n\tld1b\t[^\n]*\n\tmatch\t[^\n]*\n\tb\.} } }