From patchwork Wed Jul 24 17:18:22 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jeff Law X-Patchwork-Id: 1964393 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=ventanamicro.com header.i=@ventanamicro.com header.a=rsa-sha256 header.s=google header.b=osGYRi/H; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4WTgjF4xBWz1yXx for ; Thu, 25 Jul 2024 03:18:57 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id F2C0E385828E for ; Wed, 24 Jul 2024 17:18:54 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-oi1-x235.google.com (mail-oi1-x235.google.com [IPv6:2607:f8b0:4864:20::235]) by sourceware.org (Postfix) with ESMTPS id D97C13858C78 for ; Wed, 24 Jul 2024 17:18:25 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D97C13858C78 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=ventanamicro.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=ventanamicro.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org D97C13858C78 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::235 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1721841512; cv=none; b=AYqdpnpmrZXCEY1ObHF4E+iGq3TMefHVuXSBybJ5bdz3I41wYasm3nKnGj1ZN1zIU/HsAHFoO+hO4bSsEuoxGntmxaKGkeZc9KNuqEjP90+ir2AE5Kwj6QmT4PE6+OX7F3ip9vf2xQwWWtHDb6hvPKI+qEgJOvJGSKmuWw/vtrs= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1721841512; c=relaxed/simple; bh=f4VQ6o+3FoUp8tSnuxRHJDf44rTK6gtZPD/MvC66Asw=; h=DKIM-Signature:Message-ID:Date:MIME-Version:From:Subject:To; b=qG1jg3UfwztgO+cezqSAk6HsazK7hrCsIvIry1RG1n0dp6VARNwPpnhzkhupvHXZfjnWHtQvLqUiowW7yqUwwRGn7Ya6qB5kE1JrsdfSFuggc7nLpcq2dUMvQ54U90DIpBggG1cnY7fmXu5Ck3bbD3ahAwdanqkyu6Ej4CywZGg= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-oi1-x235.google.com with SMTP id 5614622812f47-3daf0e73a59so31641b6e.1 for ; Wed, 24 Jul 2024 10:18:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ventanamicro.com; s=google; t=1721841504; x=1722446304; darn=gcc.gnu.org; h=to:subject:from:content-language:user-agent:mime-version:date :message-id:from:to:cc:subject:date:message-id:reply-to; bh=a3OjTdIMGY9nOSQFTYY1f3RwmPIP2d6RX43KTxJ2WJs=; b=osGYRi/HZ8Q9k4FWSCu8a9XJUKMNsDTBR2zz/4lWd+Td9+yO8+6eAevUQ1XK/Ldk98 FtZSsVEZYmGmvcpx2XnbMSfckUDGqfh1HaoRhw7+j6vVyuaLnRcgr89UzDc9NBPC+i36 i/SGx869Z48oVzppjHi0CiA33lp5XJT2JuXwwCCJDd5cpi9UpOhTi4JtD/XmSeHrIH/e GJP+FCillIRLoYakcNR+RMj3z7DCG7VCxXSy1amzp6AGEnliBmpHBoBWRPeXQd4nRp0v gKCA7utPQzc4+CtiKEBEN4DFX7RiTaCZA2HGgm5JMjO+yujiEhhHxMYX9KRNzeu1zndM 7b9w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1721841504; x=1722446304; h=to:subject:from:content-language:user-agent:mime-version:date :message-id:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=a3OjTdIMGY9nOSQFTYY1f3RwmPIP2d6RX43KTxJ2WJs=; b=cLXmNh8F4T8c435NRlWvluDEzHUVFdURxlwrLViSds8BaGAYCZ9d1ctTOnNTyO/Xfq vFA1ZG8HuF8Kj5egNhaMjtc7sDnIbHH8zMrKFOnbRk5NIT3i92lJF6pQXx5P3cT7j7TN m0lFTGPkyJb+/r402fuwRQjvXizhfGPAC3wXnOiJs13YGTze9Mi4X+vYDPHoLovsZwsy perGg7P2ykUciX52942BV/13ZtZoLm/iOahOC2To+4YfmelxOLllYuqs6s8LY04Hac1u AtSPC6ZRvJ6Q9OXiebRiunFEm891sayiXkUzADKTqEnmCpdoKjuIi5xoXnivm1SwAEN+ sc2w== X-Gm-Message-State: AOJu0YxDk606D+5SVDIskJlphH3yfDcSrc+0TjBbo9DrRGH2hXOVyuqp 26MHM59IuIFlrLk7W72lWYuqAPZVs4w2AdKNBoUD0W290MHwIBmPXh1OjlxFHSN2LngNhcynE05 KfFs= X-Google-Smtp-Source: AGHT+IHuhWTjyaFzxnJCl7uUt3x+nVD9wLOaNT5BBy60G3e5Buh69UnJV6cJ6xBC0zfidYwWbha2HA== X-Received: by 2002:a05:6808:cb:b0:3da:e06a:1c71 with SMTP id 5614622812f47-3db09e56e1emr1098411b6e.21.1721841504106; Wed, 24 Jul 2024 10:18:24 -0700 (PDT) Received: from [172.31.0.109] ([136.36.72.243]) by smtp.gmail.com with ESMTPSA id 5614622812f47-3dae09cefa0sm2478975b6e.35.2024.07.24.10.18.23 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 24 Jul 2024 10:18:23 -0700 (PDT) Message-ID: Date: Wed, 24 Jul 2024 11:18:22 -0600 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Beta Content-Language: en-US From: Jeff Law Subject: [committed][rtl-optimization/116037] Explicitly track if a destination was skipped in ext-dce To: "gcc-patches@gcc.gnu.org" X-Spam-Status: No, score=-11.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces~incoming=patchwork.ozlabs.org@gcc.gnu.org So this has been in the hopper since the first bugs were reported against ext-dce. It'd been holding off committing as I was finding other issues in terms of correctness of live computations. There's still problems in that space, but I think it's time to push this chunk forward. I'm marking it as 116037, but it may impact other bugs. This patch starts explicitly tracking if set processing skipped a destination, which can happen for wide modes (TI+), vectors, certain subregs, etc. This is computed during ext_dce_set_processing. During use processing we use that flag to determine reliably if we need to make the inputs fully live and to avoid even trying to eliminate an extension if we skipped output processing. While testing this I found that a recent change to fix cases where we had two subreg input operands mucked up the code to make things like a shift/rotate count fully live. So that goof has been fixed. Bootstrapped and regression tested on x86. Most, but not all, of these changes have also been tested on the crosses. Pushing to the trunk. I'm not including it in this patch but I'm poking at converting this code to use note_uses/note_stores to make it more maintainable. The SUBREG and STRICT_LOW_PART handling of note_stores is problematical, but I think it's solvable. I haven't tried a conversion to note_uses yet. Jeff commit 679086172b84be18c55fdbb9cda7e97806e7c083 Author: Jeff Law Date: Wed Jul 24 11:16:26 2024 -0600 [rtl-optimization/116037] Explicitly track if a destination was skipped in ext-dce So this has been in the hopper since the first bugs were reported against ext-dce. It'd been holding off committing as I was finding other issues in terms of correctness of live computations. There's still problems in that space, but I think it's time to push this chunk forward. I'm marking it as 116037, but it may impact other bugs. This patch starts explicitly tracking if set processing skipped a destination, which can happen for wide modes (TI+), vectors, certain subregs, etc. This is computed during ext_dce_set_processing. During use processing we use that flag to determine reliably if we need to make the inputs fully live and to avoid even trying to eliminate an extension if we skipped output processing. While testing this I found that a recent change to fix cases where we had two subreg input operands mucked up the code to make things like a shift/rotate count fully live. So that goof has been fixed. Bootstrapped and regression tested on x86. Most, but not all, of these changes have also been tested on the crosses. Pushing to the trunk. I'm not including it in this patch but I'm poking at converting this code to use note_uses/note_stores to make it more maintainable. The SUBREG and STRICT_LOW_PART handling of note_stores is problematical, but I think it's solvable. I haven't tried a conversion to note_uses yet. PR rtl-optimization/116037 gcc/ * ext-dce.cc (ext_dce_process_sets): Note if we ever skip a dest and return that info explicitly. (ext_dce_process_uses): If a set was skipped, then consider all bits in every input as live. Do not try to optimize away an extension if we skipped processing a destination in the same insn. Restore code to make shift/rotate count fully live. (ext_dce_process_bb): Handle API changes for ext_dce_process_sets. gcc/testsuite/ * gcc.dg/torture/pr116037.c: New test diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc index c56dfb505b8..c94d1fc3414 100644 --- a/gcc/ext-dce.cc +++ b/gcc/ext-dce.cc @@ -181,9 +181,11 @@ safe_for_live_propagation (rtx_code code) within an object) are set by INSN, the more aggressive the optimization phase during use handling will be. */ -static void +static bool ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap live_tmp) { + bool skipped_dest = false; + subrtx_iterator::array_type array; FOR_EACH_SUBRTX (iter, array, obj, NONCONST) { @@ -210,6 +212,7 @@ ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap live_tmp) /* Skip the subrtxs of this destination. There is little value in iterating into the subobjects, so just skip them for a bit of efficiency. */ + skipped_dest = true; iter.skip_subrtxes (); continue; } @@ -241,6 +244,7 @@ ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap live_tmp) /* Skip the subrtxs of the STRICT_LOW_PART. We can't process them because it'll set objects as no longer live when they are in fact still live. */ + skipped_dest = true; iter.skip_subrtxes (); continue; } @@ -291,6 +295,7 @@ ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap live_tmp) if (!is_a (GET_MODE (SUBREG_REG (x)), &outer_mode) || GET_MODE_BITSIZE (outer_mode) > 64) { + skipped_dest = true; iter.skip_subrtxes (); continue; } @@ -318,6 +323,7 @@ ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap live_tmp) remain the same. Thus we can not continue here, we must either figure out what part of the destination is modified or skip the sub-rtxs. */ + skipped_dest = true; iter.skip_subrtxes (); continue; } @@ -370,9 +376,11 @@ ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap live_tmp) else if (GET_CODE (x) == COND_EXEC) { /* This isn't ideal, but may not be so bad in practice. */ + skipped_dest = true; iter.skip_subrtxes (); } } + return skipped_dest; } /* INSN has a sign/zero extended source inside SET that we will @@ -566,7 +574,8 @@ carry_backpropagate (unsigned HOST_WIDE_INT mask, enum rtx_code code, rtx x) eliminated in CHANGED_PSEUDOS. */ static void -ext_dce_process_uses (rtx_insn *insn, rtx obj, bitmap live_tmp) +ext_dce_process_uses (rtx_insn *insn, rtx obj, + bitmap live_tmp, bool skipped_dest) { subrtx_var_iterator::array_type array_var; FOR_EACH_SUBRTX_VAR (iter, array_var, obj, NONCONST) @@ -640,6 +649,11 @@ ext_dce_process_uses (rtx_insn *insn, rtx obj, bitmap live_tmp) dst_mask |= mask_array[i]; dst_mask >>= bit; + /* If we ignored a destination during set processing, then + consider all the bits live. */ + if (skipped_dest) + dst_mask = -1; + /* ??? Could also handle ZERO_EXTRACT / SIGN_EXTRACT of the source specially to improve optimization. */ if (code == SIGN_EXTEND || code == ZERO_EXTEND) @@ -650,7 +664,7 @@ ext_dce_process_uses (rtx_insn *insn, rtx obj, bitmap live_tmp) /* DST_MASK could be zero if we had something in the SET that we couldn't handle. */ - if (modify && dst_mask && (dst_mask & ~src_mask) == 0) + if (modify && !skipped_dest && (dst_mask & ~src_mask) == 0) ext_dce_try_optimize_insn (insn, x); dst_mask &= src_mask; @@ -678,6 +692,10 @@ ext_dce_process_uses (rtx_insn *insn, rtx obj, bitmap live_tmp) unsigned HOST_WIDE_INT save_mask = dst_mask; for (;;) { + /* In general we want to restore DST_MASK before each loop + iteration. The exception is when the opcode implies that + the other operand is fully live. That's handled by + changing SAVE_MASK below. */ dst_mask = save_mask; /* Strip an outer paradoxical subreg. The bits outside the inner mode are don't cares. So we can just strip @@ -730,9 +748,13 @@ ext_dce_process_uses (rtx_insn *insn, rtx obj, bitmap live_tmp) /* We might have (ashift (const_int 1) (reg...)) By setting dst_mask we can continue iterating on the - the next operand and it will be considered fully live. */ + the next operand and it will be considered fully live. + + Note that since we restore DST_MASK from SAVE_MASK at the + top of the loop, we have to change SAVE_MASK to get the + semantics we want. */ if (binop_implies_op2_fully_live (GET_CODE (src))) - dst_mask = -1; + save_mask = -1; /* If this was anything but a binary operand, break the inner loop. This is conservatively correct as it will cause the @@ -802,14 +824,16 @@ ext_dce_process_bb (basic_block bb) bitmap live_tmp = BITMAP_ALLOC (NULL); /* First process any sets/clobbers in INSN. */ - ext_dce_process_sets (insn, PATTERN (insn), live_tmp); + bool skipped_dest = ext_dce_process_sets (insn, PATTERN (insn), live_tmp); /* CALL_INSNs need processing their fusage data. */ if (CALL_P (insn)) - ext_dce_process_sets (insn, CALL_INSN_FUNCTION_USAGE (insn), live_tmp); + skipped_dest |= ext_dce_process_sets (insn, + CALL_INSN_FUNCTION_USAGE (insn), + live_tmp); /* And now uses, optimizing away SIGN/ZERO extensions as we go. */ - ext_dce_process_uses (insn, PATTERN (insn), live_tmp); + ext_dce_process_uses (insn, PATTERN (insn), live_tmp, skipped_dest); /* A nonlocal goto implicitly uses the frame pointer. */ if (JUMP_P (insn) && find_reg_note (insn, REG_NON_LOCAL_GOTO, NULL_RTX)) @@ -832,7 +856,7 @@ ext_dce_process_bb (basic_block bb) if (global_regs[i]) bitmap_set_range (livenow, i * 4, 4); - ext_dce_process_uses (insn, CALL_INSN_FUNCTION_USAGE (insn), live_tmp); + ext_dce_process_uses (insn, CALL_INSN_FUNCTION_USAGE (insn), live_tmp, false); } BITMAP_FREE (live_tmp); diff --git a/gcc/testsuite/gcc.dg/torture/pr116037.c b/gcc/testsuite/gcc.dg/torture/pr116037.c new file mode 100644 index 00000000000..cb34ba4e5d4 --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/pr116037.c @@ -0,0 +1,36 @@ +/* { dg-do run } */ +/* { dg-require-effective-target int32 } */ +/* { dg-additional-options "-Wno-psabi" } */ + +typedef __attribute__((__vector_size__ (64))) unsigned char VC; +typedef __attribute__((__vector_size__ (64))) unsigned short VS; +typedef __attribute__((__vector_size__ (64))) unsigned int VI; +typedef __attribute__((__vector_size__ (64))) unsigned long long VL; +typedef __attribute__((__vector_size__ (64))) unsigned __int128 VV; + +VC vc; +VS vs; +VI vi; +VL vl; + +VV +foo (unsigned long long x, VV vv) +{ + x &= -((VC) vv)[0]; + vi *= (VI) (VS){ -vs[0], vc[0], vs[1], vi[7], vs[7], vl[7], x, vi[5] }; + return x + vv; +} + +int +main () +{ + VV v = foo (0x01aabbccdd, (VV) { -0xff }); + if (v[0] != 0x01aabbccdd - 0xff) + __builtin_abort (); + if (v[1] != 0x01aabbccdd) + __builtin_abort (); + if (v[2] != 0x01aabbccdd) + __builtin_abort (); + if (v[3] != 0x01aabbccdd) + __builtin_abort (); +}