From patchwork Tue Jan 30 07:33:10 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roger Sayle X-Patchwork-Id: 1892680 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=nextmovesoftware.com header.i=@nextmovesoftware.com header.a=rsa-sha256 header.s=default header.b=RBQFwrn4; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4TPH3307ZSz1yQ0 for ; Tue, 30 Jan 2024 18:33:35 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id EBFCC3858281 for ; Tue, 30 Jan 2024 07:33:32 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from server.nextmovesoftware.com (unknown [69.48.154.134]) by sourceware.org (Postfix) with ESMTPS id 6C4D83858CDA for ; Tue, 30 Jan 2024 07:33:14 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 6C4D83858CDA Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com Authentication-Results: sourceware.org; spf=fail smtp.mailfrom=nextmovesoftware.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 6C4D83858CDA Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=69.48.154.134 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1706599996; cv=none; b=cryZ2uHpfR/GHM+8urFWPkR7j9ItJsAx7FakmCHZgTSdkrTpTp0vEkVIKjUFWodnW34wAJJahBhvLNrNQ7gBCOEdToG1G5qNciAUgGdnb8ETnC4cAg2pafwyrOJ7TdZzf7qlRV17aAn5fn3HOQJBex+6mYGYWnPkswUnrUF7MCc= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1706599996; c=relaxed/simple; bh=RGQAqXviyiUUNeWuKG3/JipZwA7lyP07ERRWr2MqKRw=; h=DKIM-Signature:From:To:Subject:Date:Message-ID:MIME-Version; b=qG1lBEEMZh6y/Tr90HuvqIP4+wzB2/7zEoUW2vbB+WjQOb28Er2PYauiBgAt0LErMaPHjBnNrt3GgB/ej+/A73QNyr7f3s84pWGCRaFLn8IBQKHOgVL1jRJVKxI2B5ICLl7EbyWVWQLSJrBIp7P+tzZ5rITKm5me4GbpRM96yS8= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID: Date:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=CE36GmzsB9hHX8bt0XAucazQ7pr8ReS+h4Y8SUX0sBE=; b=RBQFwrn4G3jhJy8BJEh9rsojvx UkObberMRYvwJpBfarMC0bQQd10+EpnM/8QP/42Nahe8ujUUA3BQ07mq9PCH5N3G8HnksonfzfWzN eLvnXPc7GC/iS/YLESTK1g/KHZR8tgBu11g+UzAwYVarQ7Sb0lBMYmKJIkVpgmF0j8VGoF4QQPt24 AqvkZ52RdrHrH6ilz+ahLuJe7iMjCCXMXwLYdeXFVzMp1vEdbc4pEaaXL1oMkVJphqzjNv8+EGk0j 0uvNaB3lCfieAAQnyyag4hRc+Wd8oXztZSS8PCCwmyzjsJPhkJcYMqvWXavVoPrNsaoiaayDFG7hB hGxzyqOA==; Received: from host86-160-20-123.range86-160.btcentralplus.com ([86.160.20.123]:52996 helo=Dell) by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96.2) (envelope-from ) id 1rUicL-005gVE-2D; Tue, 30 Jan 2024 00:33:13 -0700 From: "Roger Sayle" To: Cc: "'Richard Biener'" Subject: [tree-ssa PATCH] PR target/113560: Enhance is_widening_mult_rhs_p. Date: Tue, 30 Jan 2024 07:33:10 -0000 Message-ID: <003f01da534e$94918450$bdb48cf0$@nextmovesoftware.com> MIME-Version: 1.0 X-Mailer: Microsoft Outlook 16.0 Thread-Index: AdpTTP3KkGi98p+aRrafgitXtKSoVw== Content-Language: en-gb X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com X-AntiAbuse: Original Domain - gcc.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - nextmovesoftware.com X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com X-Source: X-Source-Args: X-Source-Dir: X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, SPF_HELO_NONE, SPF_SOFTFAIL, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org This patch resolves PR113560, a code quality regression from GCC12 affecting x86_64, by enhancing the middle-end's tree-ssa-math-opts.cc to recognize more instances of widening multiplications. The widening multiplication perception code identifies cases like: _1 = (unsigned __int128) x; __res = _1 * 100; but in the reported test case, the original input looks like: _1 = (unsigned long long) x; _2 = (unsigned __int128) _1; __res = _2 * 100; which gets optimized by constant folding during tree-ssa to: _2 = x & 18446744073709551615; // x & 0xffffffffffffffff __res = _2 * 100; where the BIT_AND_EXPR hides (has consumed) the extension operation. This reveals the more general deficiency (missed optimization opportunity) in widening multiplication perception that additionally both __int128 foo(__int128 x, __int128 y) { return (x & 1000) * (y & 1000) } and unsigned __int128 bar(unsigned __int128 x, unsigned __int128) { return (x >> 80) * (y >> 80); } should be recognized as widening multiplications. Hence rather than test explicitly for BIT_AND_EXPR (as in the first version of this patch) the more general solution is to make use of range information, as provided by tree_non_zero_bits. As a demonstration of the observed improvements, function foo above currently with -O2 compiles on x86_64 to: foo: movq %rdi, %rsi movq %rdx, %r8 xorl %edi, %edi xorl %r9d, %r9d andl $1000, %esi andl $1000, %r8d movq %rdi, %rcx movq %r9, %rdx imulq %rsi, %rdx movq %rsi, %rax imulq %r8, %rcx addq %rdx, %rcx mulq %r8 addq %rdx, %rcx movq %rcx, %rdx ret with this patch, GCC recognizes the *w and instead generates: foo: movq %rdi, %rsi movq %rdx, %r8 andl $1000, %esi andl $1000, %r8d movq %rsi, %rax imulq %r8 ret which is perhaps easier to understand at the tree-level where __int128 foo (__int128 x, __int128 y) { __int128 _1; __int128 _2; __int128 _5; [local count: 1073741824]: _1 = x_3(D) & 1000; _2 = y_4(D) & 1000; _5 = _1 * _2; return _5; } gets transformed to: __int128 foo (__int128 x, __int128 y) { __int128 _1; __int128 _2; __int128 _5; signed long _7; signed long _8; [local count: 1073741824]: _1 = x_3(D) & 1000; _2 = y_4(D) & 1000; _7 = (signed long) _1; _8 = (signed long) _2; _5 = _7 w* _8; return _5; } This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2023-01-30 Roger Sayle gcc/ChangeLog PR target/113560 * tree-ssa-math-opts.cc (is_widening_mult_rhs_p): Use range information via tree_non_zero_bits to check if this operand is suitably extended for a widening (or highpart) multiplication. (convert_mult_to_widen): Insert explicit casts if the RHS or LHS isn't already of the claimed type. gcc/testsuite/ChangeLog PR target/113560 * g++.target/i386/pr113560.C: New test case. * gcc.target/i386/pr113560.c: Likewise. Thanks in advance, Roger diff --git a/gcc/testsuite/g++.target/i386/pr113560.C b/gcc/testsuite/g++.target/i386/pr113560.C new file mode 100644 index 0000000..179b68f --- /dev/null +++ b/gcc/testsuite/g++.target/i386/pr113560.C @@ -0,0 +1,19 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-Ofast -std=c++23 -march=znver4" } */ + +#include +auto f(char *buf, unsigned long long in) noexcept +{ + unsigned long long hi{}; + auto lo{_mulx_u64(in, 0x2af31dc462ull, &hi)}; + lo = _mulx_u64(lo, 100, &hi); + __builtin_memcpy(buf + 2, &hi, 2); + return buf + 10; +} + +/* { dg-final { scan-assembler-times "mulx" 1 } } */ +/* { dg-final { scan-assembler-times "mulq" 1 } } */ +/* { dg-final { scan-assembler-not "addq" } } */ +/* { dg-final { scan-assembler-not "adcq" } } */ +/* { dg-final { scan-assembler-not "salq" } } */ +/* { dg-final { scan-assembler-not "shldq" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr113560.c b/gcc/testsuite/gcc.target/i386/pr113560.c new file mode 100644 index 0000000..ac2e01a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr113560.c @@ -0,0 +1,17 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2" } */ + +unsigned __int128 foo(unsigned __int128 x, unsigned __int128 y) +{ + return (x & 1000) * (y & 1000); +} + +__int128 bar(__int128 x, __int128 y) +{ + return (x & 1000) * (y & 1000); +} + +/* { dg-final { scan-assembler-times "\tmulq" 1 } } */ +/* { dg-final { scan-assembler-times "\timulq" 1 } } */ +/* { dg-final { scan-assembler-not "addq" } } */ +/* { dg-final { scan-assembler-not "xorl" } } */ diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc index 2db26e4..010fec4 100644 --- a/gcc/tree-ssa-math-opts.cc +++ b/gcc/tree-ssa-math-opts.cc @@ -2555,9 +2555,43 @@ is_widening_mult_rhs_p (tree type, tree rhs, tree *type_out, stmt = SSA_NAME_DEF_STMT (rhs); if (is_gimple_assign (stmt)) { - if (! widening_mult_conversion_strippable_p (type, stmt)) - rhs1 = rhs; - else + /* Use tree_non_zero_bits to see if this operand is zero_extended + for unsigned widening multiplications or non-negative for + signed widening multiplications. */ + if (TREE_CODE (type) == INTEGER_TYPE + && (TYPE_PRECISION (type) & 1) == 0 + && int_mode_for_size (TYPE_PRECISION (type) / 2, 1).exists ()) + { + unsigned int prec = TYPE_PRECISION (type); + unsigned int hprec = prec / 2; + wide_int bits = wide_int::from (tree_nonzero_bits (rhs), + prec, + TYPE_SIGN (TREE_TYPE (rhs))); + if (TYPE_UNSIGNED (type) + && wi::bit_and (bits, wi::mask (hprec, true, prec)) == 0) + { + *type_out = build_nonstandard_integer_type (hprec, true); + /* X & MODE_MASK can be simplified to (T)X. */ + if (gimple_assign_rhs_code (stmt) == BIT_AND_EXPR + && TREE_CODE (gimple_assign_rhs2 (stmt)) == INTEGER_CST + && wi::to_wide (gimple_assign_rhs2 (stmt)) + == wi::mask (hprec, false, prec)) + *new_rhs_out = gimple_assign_rhs1 (stmt); + else + *new_rhs_out = rhs; + return true; + } + else if (!TYPE_UNSIGNED (type) + && wi::bit_and (bits, wi::mask (hprec - 1, true, prec)) + == 0) + { + *type_out = build_nonstandard_integer_type (hprec, false); + *new_rhs_out = rhs; + return true; + } + } + + if (widening_mult_conversion_strippable_p (type, stmt)) { rhs1 = gimple_assign_rhs1 (stmt); @@ -2568,6 +2602,8 @@ is_widening_mult_rhs_p (tree type, tree rhs, tree *type_out, return true; } } + else + rhs1 = rhs; } else rhs1 = rhs; @@ -2827,12 +2863,16 @@ convert_mult_to_widen (gimple *stmt, gimple_stmt_iterator *gsi) if (2 * actual_precision > TYPE_PRECISION (type)) return false; if (actual_precision != TYPE_PRECISION (type1) - || from_unsigned1 != TYPE_UNSIGNED (type1)) + || from_unsigned1 != TYPE_UNSIGNED (type1) + || (TREE_TYPE (rhs1) != type1 + && TREE_CODE (rhs1) != INTEGER_CST)) rhs1 = build_and_insert_cast (gsi, loc, build_nonstandard_integer_type (actual_precision, from_unsigned1), rhs1); if (actual_precision != TYPE_PRECISION (type2) - || from_unsigned2 != TYPE_UNSIGNED (type2)) + || from_unsigned2 != TYPE_UNSIGNED (type2) + || (TREE_TYPE (rhs2) != type2 + && TREE_CODE (rhs2) != INTEGER_CST)) rhs2 = build_and_insert_cast (gsi, loc, build_nonstandard_integer_type (actual_precision, from_unsigned2), rhs2);