From patchwork Sun Jul 9 21:30:28 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Roger Sayle
X-Patchwork-Id: 1805453
X-Original-To: gcc-patches@gcc.gnu.org
From: "Roger Sayle"
Cc: "'Uros Bizjak'"
Subject: [X86 PATCH] Add new insvti_lowpart_1 and insvdi_lowpart_1 patterns.
Date: Sun, 9 Jul 2023 22:30:28 +0100
Message-ID: <037001d9b2ac$9627f3a0$c277dae0$@nextmovesoftware.com>
List-Id: Gcc-patches mailing list

This patch implements another of Uros' suggestions, to investigate an
insvti_lowpart_1 pattern to improve TImode parameter passing on x86_64.
In PR 88873, the RTL the middle-end expands for passing V2DF in TImode
is subtly different from what it does for V2DI in TImode, sufficiently
so that my explanations for why insvti_lowpart_1 isn't required don't
apply in this case.

This patch adds an insvti_lowpart_1 pattern, complementing the existing
insvti_highpart_1 pattern, and also a 32-bit variant, insvdi_lowpart_1.
Because the middle-end represents 128-bit constants using CONST_WIDE_INT
and 64-bit constants using CONST_INT, it's easiest to treat these as
different patterns, rather than attempt parameterization.

This patch also includes a peephole2 (actually a pair) to transform xchg
instructions into mov instructions, when one of the destinations is
unused.  This optimization is required to produce the optimal code
sequences below.

For the 64-bit case:

__int128 foo(__int128 x, unsigned long long y)
{
  __int128 m = ~((__int128)~0ull);
  __int128 t = x & m;
  __int128 r = t | y;
  return r;
}

Before:
        xchgq   %rdi, %rsi
        movq    %rdx, %rax
        xorl    %esi, %esi
        xorl    %edx, %edx
        orq     %rsi, %rax
        orq     %rdi, %rdx
        ret

After:
        movq    %rdx, %rax
        movq    %rsi, %rdx
        ret

For the 32-bit case:

long long bar(long long x, int y)
{
  long long mask = ~0ull << 32;
  long long t = x & mask;
  long long r = t | (unsigned int)y;
  return r;
}

Before:
        pushl   %ebx
        movl    12(%esp), %edx
        xorl    %ebx, %ebx
        xorl    %eax, %eax
        movl    16(%esp), %ecx
        orl     %ebx, %edx
        popl    %ebx
        orl     %ecx, %eax
        ret

After:
        movl    12(%esp), %eax
        movl    8(%esp), %edx
        ret

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-07-09  Roger Sayle

gcc/ChangeLog
        * config/i386/i386.md (peephole2): Transform xchg insn with a
        REG_UNUSED note to a (simple) move.
        (*insvti_lowpart_1): New define_insn_and_split.
        (*insvdi_lowpart_1): Likewise.

gcc/testsuite/ChangeLog
        * gcc.target/i386/insvdi_lowpart-1.c: New test case.
        * gcc.target/i386/insvti_lowpart-1.c: Likewise.


Cheers,
Roger

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e47ced1..ea04d0a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3243,6 +3243,30 @@
   [(parallel [(set (match_dup 1) (match_dup 2))
               (set (match_dup 2) (match_dup 1))])])
 
+;; Convert xchg with a REG_UNUSED note to a mov (variant #1).
+(define_peephole2
+  [(parallel [(set (match_operand:SWI 0 "general_reg_operand")
+                   (match_operand:SWI 1 "general_reg_operand"))
+              (set (match_dup 1) (match_dup 0))])]
+  "((REGNO (operands[0]) != AX_REG
+     && REGNO (operands[1]) != AX_REG)
+    || optimize_size < 2
+    || !optimize_insn_for_size_p ())
+   && peep2_reg_dead_p (1, operands[0])"
+  [(set (match_dup 1) (match_dup 0))])
+
+;; Convert xchg with a REG_UNUSED note to a mov (variant #2).
+(define_peephole2
+  [(parallel [(set (match_operand:SWI 0 "general_reg_operand")
+                   (match_operand:SWI 1 "general_reg_operand"))
+              (set (match_dup 1) (match_dup 0))])]
+  "((REGNO (operands[0]) != AX_REG
+     && REGNO (operands[1]) != AX_REG)
+    || optimize_size < 2
+    || !optimize_insn_for_size_p ())
+   && peep2_reg_dead_p (1, operands[1])"
+  [(set (match_dup 0) (match_dup 1))])
+
 ;; Convert moves to/from AX_REG into xchg with -Oz.
 (define_peephole2
   [(set (match_operand:SWI48 0 "general_reg_operand")
@@ -3573,6 +3597,48 @@
   split_double_concat (TImode, operands[0], operands[4], operands[2]);
   DONE;
 })
+
+(define_insn_and_split "*insvti_lowpart_1"
+  [(set (match_operand:TI 0 "nonimmediate_operand" "=ro,r,r,&r")
+        (any_or_plus:TI
+          (and:TI
+            (match_operand:TI 1 "nonimmediate_operand" "r,m,r,m")
+            (match_operand:TI 3 "const_scalar_int_operand" "n,n,n,n"))
+          (zero_extend:TI
+            (match_operand:DI 2 "nonimmediate_operand" "r,r,m,m"))))]
+  "TARGET_64BIT
+   && CONST_WIDE_INT_P (operands[3])
+   && CONST_WIDE_INT_NUNITS (operands[3]) == 2
+   && CONST_WIDE_INT_ELT (operands[3], 0) == 0
+   && CONST_WIDE_INT_ELT (operands[3], 1) == -1"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  operands[4] = gen_highpart (DImode, operands[1]);
+  split_double_concat (TImode, operands[0], operands[2], operands[4]);
+  DONE;
+})
+
+(define_insn_and_split "*insvdi_lowpart_1"
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=ro,r,r,&r")
+        (any_or_plus:DI
+          (and:DI
+            (match_operand:DI 1 "nonimmediate_operand" "r,m,r,m")
+            (match_operand:DI 3 "const_int_operand" "n,n,n,n"))
+          (zero_extend:DI
+            (match_operand:SI 2 "nonimmediate_operand" "r,r,m,m"))))]
+  "!TARGET_64BIT
+   && CONST_INT_P (operands[3])
+   && UINTVAL (operands[3]) == 0xffffffff00000000ll"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  operands[4] = gen_highpart (SImode, operands[1]);
+  split_double_concat (DImode, operands[0], operands[2], operands[4]);
+  DONE;
+})
 
 ;; Floating point push instructions.
 
diff --git a/gcc/testsuite/gcc.target/i386/insvdi_lowpart-1.c b/gcc/testsuite/gcc.target/i386/insvdi_lowpart-1.c
new file mode 100644
index 0000000..4d94fec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/insvdi_lowpart-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2" } */
+
+long long foo(long long x, int y)
+{
+  long long mask = ~0ull << 32;
+  long long t = x & mask;
+  long long r = t | (unsigned int)y;
+  return r;
+}
+
+/* { dg-final { scan-assembler-not "xorl" } } */
+/* { dg-final { scan-assembler-not "orq" } } */
diff --git a/gcc/testsuite/gcc.target/i386/insvti_lowpart-1.c b/gcc/testsuite/gcc.target/i386/insvti_lowpart-1.c
new file mode 100644
index 0000000..4e1fbbb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/insvti_lowpart-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2" } */
+
+__int128 foo(__int128 x, unsigned long long y)
+{
+  __int128 m = ~((__int128)~0ull);
+  __int128 t = x & m;
+  __int128 r = t | y;
+  return r;
+}
+
+/* { dg-final { scan-assembler-not "xorl" } } */
+/* { dg-final { scan-assembler-not "orq" } } */
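
For reference, a minimal sketch (not part of the patch) of the identity
the insvdi_lowpart_1 split relies on: masking off the low 32 bits of x
and OR-ing in a zero-extended y gives the same value as concatenating
the high half of x with y.  The helper names below are hypothetical and
exist only for illustration; the code uses plain uint64_t arithmetic
and does not model the RTL or split_double_concat itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the masked form that the new pattern matches,
   i.e. clear the low 32 bits of x, then OR in the zero-extended y.  */
uint64_t
insert_low32_masked (uint64_t x, uint32_t y)
{
  const uint64_t mask = 0xffffffff00000000ull;
  return (x & mask) | (uint64_t) y;
}

/* Hypothetical helper: the equivalent concatenation, taking the high
   half of x unchanged and using y as the new low half.  */
uint64_t
insert_low32_concat (uint64_t x, uint32_t y)
{
  uint32_t hi = (uint32_t) (x >> 32);
  return ((uint64_t) hi << 32) | (uint64_t) y;
}

/* Assert that both forms agree for one input pair.  */
void
check_low32_insert (uint64_t x, uint32_t y)
{
  assert (insert_low32_masked (x, y) == insert_low32_concat (x, y));
}
```

Calling check_low32_insert with a few values (all-ones x, zero x, an
arbitrary bit pattern) exercises the identity.  The 128-bit case is the
same argument with 64-bit halves.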