From patchwork Thu Aug 15 09:34:48 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Roger Sayle
X-Patchwork-Id: 1972711
X-Original-To: gcc-patches@gcc.gnu.org
From: "Roger Sayle"
To:
Cc: "'Uros Bizjak'"
Subject: [x86_64 PATCH] Support wide immediate constants in STV.
Date: Thu, 15 Aug 2024 10:34:48 +0100
Message-ID: <012901daeef6$5ffea3a0$1ffbeae0$@nextmovesoftware.com>
List-Id: Gcc-patches mailing list

As requested, this patch is split out from my earlier submission.
This patch provides more accurate costs/gains for (wide) immediate
constants in STV, suitably adjusting the costs/gains when the highpart
and lowpart words are the same.  One minor complication is that the
middle-end assumes (when generating memset) that SSE constants will be
shared/amortized across multiple consecutive writes.  Hence, to avoid
testsuite regressions, we add a heuristic that considers an immediate
constant to be very cheap if that same immediate value occurs in the
previous instruction or in the following instruction.

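For illustration only, the gain adjustment described above can be
modelled outside of GCC roughly as follows.  This is a standalone
sketch, not code from the patch: WideConst, immed_const_gain,
costs_n_insns and costs_n_bytes are hypothetical stand-ins, and the
scale factors used for the cost helpers are assumptions made for the
example.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the compiler's cost macros; the scale
// factors are assumptions made for this sketch.
constexpr int costs_n_insns (int n) { return n * 4; }
constexpr int costs_n_bytes (int n) { return n * 2; }

// A 128-bit immediate modelled as two 64-bit words.
struct WideConst
{
  std::uint64_t lo;
  std::uint64_t hi;
};

// Gain of loading CST on the vector side rather than the scalar side.
// A negative value means the vector form is estimated to be worse.
int
immed_const_gain (const WideConst &cst, bool optimize_for_size)
{
  // Equal halves: the patch's comment compares a single movabsq
  // against movabsq+vmovq+vunpacklqdq, so the conversion loses.
  if (cst.lo == cst.hi)
    return optimize_for_size ? -costs_n_bytes (9) : -costs_n_insns (2);
  // Distinct halves: two movabsq are treated as roughly equal to one
  // vmovdqa, so there is neither gain nor loss.
  return 0;
}
```

Under these assumptions, immed_const_gain ({0x1234, 0x1234}, false)
yields -8, whereas a constant with distinct halves yields 0.
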
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2024-08-15  Roger Sayle

gcc/ChangeLog
	* config/i386/i386-features.cc (timode_immed_const_gain): New
	function to determine the gain/cost on a CONST_WIDE_INT.
	(local_duplicate_constant_p): Helper function to see if the
	same immediate constant appears in the previous or next insn.
	(timode_scalar_chain::compute_convert_gain): Fix whitespace.
	: Provide more accurate estimates using
	timode_immed_const_gain and local_duplicate_constant_p.
	: Handle CONST_SCALAR_INT_P (XEXP (src, 1)).


Thanks again,
Roger

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index c36d181..78564df 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -1503,6 +1503,53 @@ general_scalar_chain::convert_insn (rtx_insn *insn)
   df_insn_rescan (insn);
 }
 
+/* Helper function to compute gain for loading an immediate constant.
+   Typically, two movabsq for TImode vs. vmovdqa for V1TImode, but
+   with numerous special cases.  */
+
+static int
+timode_immed_const_gain (rtx cst)
+{
+  /* movabsq vs. movabsq+vmovq+vunpacklqdq.  */
+  if (CONST_WIDE_INT_P (cst)
+      && CONST_WIDE_INT_NUNITS (cst) == 2
+      && CONST_WIDE_INT_ELT (cst, 0) == CONST_WIDE_INT_ELT (cst, 1))
+    return optimize_insn_for_size_p () ? -COSTS_N_BYTES (9)
+				       : -COSTS_N_INSNS (2);
+  /* 2x movabsq ~ vmovdqa.  */
+  return 0;
+}
+
+/* Return true if the constant CST in mode MODE is found as an
+   immediate operand in the insn after INSN, or the insn before it.  */
+
+static bool
+local_duplicate_constant_p (rtx_insn *insn, machine_mode mode, rtx cst)
+{
+  rtx set;
+
+  rtx_insn *next = NEXT_INSN (insn);
+  if (next)
+    {
+      set = single_set (next);
+      if (set
+	  && GET_MODE (SET_DEST (set)) == mode
+	  && rtx_equal_p (SET_SRC (set), cst))
+	return true;
+    }
+
+  rtx_insn *prev = PREV_INSN (insn);
+  if (prev)
+    {
+      set = single_set (prev);
+      if (set
+	  && GET_MODE (SET_DEST (set)) == mode
+	  && rtx_equal_p (SET_SRC (set), cst))
+	return true;
+    }
+  return false;
+}
+
 /* Compute a gain for chain conversion.  */
 
 int
@@ -1549,7 +1596,17 @@ timode_scalar_chain::compute_convert_gain ()
 	case CONST_INT:
 	  if (MEM_P (dst)
 	      && standard_sse_constant_p (src, V1TImode))
-	    igain = optimize_insn_for_size_p() ? COSTS_N_BYTES (11) : 1;
+	    igain = optimize_insn_for_size_p () ? COSTS_N_BYTES (11) : 1;
+	  break;
+
+	case CONST_WIDE_INT:
+	  igain = local_duplicate_constant_p (insn, TImode, src)
+		  ? 0
+		  : timode_immed_const_gain (src);
+	  /* 2 x mov vs. vmovdqa.  */
+	  if (MEM_P (dst))
+	    igain += optimize_insn_for_size_p () ? COSTS_N_BYTES (3)
+						 : COSTS_N_INSNS (1);
 	  break;
 
 	case NOT:
@@ -1562,6 +1619,8 @@ timode_scalar_chain::compute_convert_gain ()
 	case IOR:
 	  if (!MEM_P (dst))
 	    igain = COSTS_N_INSNS (1);
+	  if (CONST_SCALAR_INT_P (XEXP (src, 1)))
+	    igain += timode_immed_const_gain (XEXP (src, 1));
 	  break;
 
 	case ASHIFT:
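
The intent of local_duplicate_constant_p, as stated in its comment, is
to look one instruction ahead and one instruction behind for the same
immediate.  The standalone sketch below illustrates that neighbour
check on a toy instruction list.  It is not GCC code: ToyInsn,
same_imm and neighbour_has_same_constant are hypothetical names
introduced only for this example, and a real insn stream is a linked
list of RTL insns rather than a vector.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy instruction: either a store of a 128-bit immediate
// (stores_imm is true) or something unrelated.
struct ToyInsn
{
  bool stores_imm;
  std::uint64_t lo;
  std::uint64_t hi;
};

// True if both instructions store the same 128-bit immediate.
static bool
same_imm (const ToyInsn &a, const ToyInsn &b)
{
  return a.stores_imm && b.stores_imm && a.lo == b.lo && a.hi == b.hi;
}

// True if the instruction at index I stores an immediate that also
// appears in the instruction directly after it or directly before it.
bool
neighbour_has_same_constant (const std::vector<ToyInsn> &insns,
			     std::size_t i)
{
  if (i >= insns.size ())
    return false;
  // Compare against the following instruction, if there is one.
  if (i + 1 < insns.size () && same_imm (insns[i], insns[i + 1]))
    return true;
  // Compare against the preceding instruction, if there is one.
  if (i > 0 && same_imm (insns[i], insns[i - 1]))
    return true;
  return false;
}
```

The point the sketch makes is that each comparison has to be made
against the neighbouring instruction and not against the instruction
being costed, otherwise the check is trivially true.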