From patchwork Thu Jul 30 11:23:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roger Sayle X-Patchwork-Id: 1338764 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=nextmovesoftware.com header.i=@nextmovesoftware.com header.a=rsa-sha256 header.s=default header.b=P/j1IwND; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4BHSjc4FYNz9sR4 for ; Thu, 30 Jul 2020 21:24:19 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 123C83857C41; Thu, 30 Jul 2020 11:24:16 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from server.nextmovesoftware.com (server.nextmovesoftware.com [162.254.253.69]) by sourceware.org (Postfix) with ESMTPS id DD4663858D35 for ; Thu, 30 Jul 2020 11:23:56 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org DD4663858D35 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=roger@nextmovesoftware.com DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID: Date:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=b32luq4L6SLfjHUoMw3jVVEzktHIv098Yx/m5HrLkJk=; b=P/j1IwNDFPtnIsst4APAEswoME fNjdTSmHCzi/H1FpHScNb8ofIU48th3I5KY7ekZxumcaZzhIgzplKyE0WlpnDyUuWO0RSFHclQKPY a58bxgqtgNX5iR6eovXGHMIcHTVL0ut5AAh9adu3Te5KTeLGoObcRS+8iSjKkg3ot9ZjOV+gHJawp YBGDTwX9ijWj4ai08LvpO67ak/OhbFMVr6wwOHPHdUqjzkI+ohXvdlluNyMpe5jdu6cdkoHgkW6oo G1Lw4x5JlqHBQXX9nu01yi9iTrKWX3tSLJGNOVUuv90ZhGBupMTy+MIvkhF5GRHkdbMGFlnLAVkbQ ZamGzNrg==; Received: from host86-137-89-56.range86-137.btcentralplus.com ([86.137.89.56]:51136 helo=Dell) by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from ) id 1k16fE-00089o-1C; Thu, 30 Jul 2020 07:23:56 -0400 From: "Roger Sayle" To: "'GCC Patches'" Subject: [PATCH] x86_64: Integer min/max improvements. Date: Thu, 30 Jul 2020 12:23:53 +0100 Message-ID: <021801d66663$e93294d0$bb97be70$@nextmovesoftware.com> MIME-Version: 1.0 X-Mailer: Microsoft Outlook 16.0 Thread-Index: AdZmYYLTfuAg373tT3amGZntY9Z4fQ== Content-Language: en-gb X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com X-AntiAbuse: Original Domain - gcc.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - nextmovesoftware.com X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com X-Source: X-Source-Args: X-Source-Dir: X-Spam-Status: No, score=-11.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: 'Uros Bizjak' Errors-To: gcc-patches-bounces@gcc.gnu.org Sender: "Gcc-patches" This patch tweaks the way that min and max are expanded, so that the semantics of these operations are visible to the early RTL optimization passes, until split into explicit comparison and conditional move instructions. The good news is that i386.md already contains all of the required logic (many thanks to Richard Biener and Uros Bizjak), but this is currently only to enable scalar-to-vector (STV) synthesis of min/max instructions. This change enables this functionality for all TARGET_CMOVE architectures for SImode, HImode and DImode. My first attempt to support "min(reg,reg)" as an instruction revealed why this functionality isn't already enabled: In PR rtl-optimization 91154 we end up generating a cmp instead of a test in gcc.target/i386/minmax-3.c which has poor performance on AMD Opteron. My solution to this is to actually support "min(reg,general_operand)" allowing us to inspect any immediate constant at the point this operation is split. This allows us to use "test" instructions for min/max against 0, 1 and -1. As an added benefit it allows us to CSE large immediate constants, reducing code size. Previously, "int foo(int x) { return max(x,12345); }" would generate: foo: cmpl $12345, %edi movl $12345, %eax cmovge %edi, %eax ret with this patch we instead generate: foo: movl $12345, %eax cmpl %eax, %edi cmovge %edi, %eax ret I've also included a peephole2 to avoid the "movl $0,%eax" instructions being produced by the register allocator. Materializing constants as late as possible reduces register pressure, but for const0_rtx on x86, it's preferable to use "xor" by moving this set from between a flags setting operation and its use. This should also help instruction macro fusion on some microarchitectures, where test/cmp and the following instruction can sometimes be combined. Previously, "int foo(int x) { return max(x,0); }" would generate: foo: testl %edi, %edi movl $0, %eax cmovns %edi, %eax ret with this patch we instead generate: foo: xorl %eax, %eax testl %edi, %edi cmovns %edi, %eax ret The benefits of having min/max explicit at the RTL level can be seen from compiling the following example with "-O2 -fno-tree-reassoc". #define max(a,b) (((a) > (b))? (a) : (b)) int foo(int x) { int y = max(x,5); return max(y,10); } where GCC currently produces: foo: cmpl $5, %edi movl $5, %eax movl $10, %edx cmovge %edi, %eax cmpl $10, %eax cmovl %edx, %eax ret and with this patch it instead now produces: foo: movl $10, %eax cmpl %eax, %edi cmovge %edi, %eax ret The original motivation was from looking at a performance critical function in a quantum mechanics code, that performed MIN_EXPR and MAX_EXPR of the same arguments (effectively a two-element sort), where GCC was performing the comparison twice. I'd hoped that it might be possible to fuse these together, perhaps in combine, but this patch is an intermediate step towards that goal. This patch has been tested on x86_64-pc-linux-gnu with a make bootstrap followed by make -k check with no new regressions. Ok for mainline? 2020-07-30 Roger Sayle * config/i386/i386.md (MAXMIN_IMODE): No longer needed. (3): Support SWI248 and general_operand for second operand, when TARGET_CMOVE. (3_1 splitter): Optimize comparisons against 0, 1 and -1 to use "test" instead of "cmp". (*di3_doubleword): Likewise, allow general_operand and enable on TARGET_CMOVE. (peephole2): Convert clearing a register after a flag setting instruction into an xor followed by the original flag setter. Many thanks in advance. Almost all of the credit goes to Uros and Richard for already implementing this feature, I've just copied the transformations from optab expansion, that allow it to be enabled without penalty (this late in the compilation). Roger --- Roger Sayle NextMove Software Cambridge, UK diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index b24a455..717d55f 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -18809,45 +18809,68 @@ ;; min/max patterns -(define_mode_iterator MAXMIN_IMODE - [(SI "TARGET_SSE4_1") (DI "TARGET_AVX512VL")]) (define_code_attr maxmin_rel [(smax "GE") (smin "LE") (umax "GEU") (umin "LEU")]) (define_expand "3" [(parallel - [(set (match_operand:MAXMIN_IMODE 0 "register_operand") - (maxmin:MAXMIN_IMODE - (match_operand:MAXMIN_IMODE 1 "register_operand") - (match_operand:MAXMIN_IMODE 2 "nonimmediate_operand"))) + [(set (match_operand:SWI248 0 "register_operand") + (maxmin:SWI248 + (match_operand:SWI248 1 "register_operand") + (match_operand:SWI248 2 "general_operand"))) (clobber (reg:CC FLAGS_REG))])] - "TARGET_STV") + "TARGET_CMOVE") (define_insn_and_split "*3_1" - [(set (match_operand:MAXMIN_IMODE 0 "register_operand") - (maxmin:MAXMIN_IMODE - (match_operand:MAXMIN_IMODE 1 "register_operand") - (match_operand:MAXMIN_IMODE 2 "nonimmediate_operand"))) + [(set (match_operand:SWI248 0 "register_operand") + (maxmin:SWI248 + (match_operand:SWI248 1 "register_operand") + (match_operand:SWI248 2 "general_operand"))) (clobber (reg:CC FLAGS_REG))] - "(TARGET_64BIT || mode != DImode) && TARGET_STV + "TARGET_CMOVE && ix86_pre_reload_split ()" "#" "&& 1" [(set (match_dup 0) - (if_then_else:MAXMIN_IMODE (match_dup 3) + (if_then_else:SWI248 (match_dup 3) (match_dup 1) (match_dup 2)))] { machine_mode mode = mode; + rtx cmp_op = operands[2]; - if (!register_operand (operands[2], mode)) - operands[2] = force_reg (mode, operands[2]); + if (!register_operand (cmp_op, mode)) + operands[2] = force_reg (mode, cmp_op); enum rtx_code code = ; - machine_mode cmpmode = SELECT_CC_MODE (code, operands[1], operands[2]); + + if (cmp_op == const1_rtx) + { + /* Convert smax (x, 1) into (x > 0 ? x : 1). + Convert umax (x, 1) into (x != 0 ? x : 1). + Convert ?min (x, 1) into (x <= 0 ? x : 1). */ + cmp_op = const0_rtx; + if (code == GE) + code = GT; + else if (code == GEU) + code = NE; + } + /* Convert smin (x, -1) into (x < 0 ? x : -1). */ + else if (cmp_op == constm1_rtx && code == LE) + { + cmp_op = const0_rtx; + code = LT; + } + /* Convert smax (x, -1) into (x >= 0 ? x : -1). */ + else if (cmp_op == constm1_rtx && code == GE) + cmp_op = const0_rtx; + else if (cmp_op != const0_rtx) + cmp_op = operands[2]; + + machine_mode cmpmode = SELECT_CC_MODE (code, operands[1], cmp_op); rtx flags = gen_rtx_REG (cmpmode, FLAGS_REG); - rtx tmp = gen_rtx_COMPARE (cmpmode, operands[1], operands[2]); + rtx tmp = gen_rtx_COMPARE (cmpmode, operands[1], cmp_op); emit_insn (gen_rtx_SET (flags, tmp)); operands[3] = gen_rtx_fmt_ee (code, VOIDmode, flags, const0_rtx); @@ -18856,9 +18879,9 @@ (define_insn_and_split "*di3_doubleword" [(set (match_operand:DI 0 "register_operand") (maxmin:DI (match_operand:DI 1 "register_operand") - (match_operand:DI 2 "nonimmediate_operand"))) + (match_operand:DI 2 "general_operand"))) (clobber (reg:CC FLAGS_REG))] - "!TARGET_64BIT && TARGET_STV && TARGET_AVX512VL + "!TARGET_64BIT && TARGET_CMOVE && ix86_pre_reload_split ()" "#" "&& 1" @@ -18910,6 +18933,20 @@ gcc_unreachable (); } }) + +;; Avoid clearing a register between a flags setting comparison and its use, +;; i.e. prefer "xorl %eax,%eax; test/cmp" over "test/cmp; movl $0, %eax". +(define_peephole2 + [(set (reg FLAGS_REG) (match_operand 0)) + (set (match_operand:SWI48 1 "register_operand") (const_int 0))] + "peep2_regno_dead_p (0, FLAGS_REG) + && !reg_overlap_mentioned_p (operands[1], operands[0])" + [(parallel [(set (match_dup 1) (const_int 0)) + (clobber (reg:CC FLAGS_REG))]) + (set (match_dup 2) (match_dup 0))] +{ + operands[2] = gen_rtx_REG (GET_MODE (operands[0]), FLAGS_REG); +}) ;; Misc patterns (?)