From patchwork Wed Aug 2 22:18:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roger Sayle X-Patchwork-Id: 1816188 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=nextmovesoftware.com header.i=@nextmovesoftware.com header.a=rsa-sha256 header.s=default header.b=lMynHj9J; dkim-atps=neutral Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4RGRGH648pz1yds for ; Thu, 3 Aug 2023 08:19:01 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 976C23858000 for ; Wed, 2 Aug 2023 22:18:59 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from server.nextmovesoftware.com (server.nextmovesoftware.com [162.254.253.69]) by sourceware.org (Postfix) with ESMTPS id AE74A3858D1E for ; Wed, 2 Aug 2023 22:18:47 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org AE74A3858D1E Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=nextmovesoftware.com DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID: Date:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=kMtBSdlaa3hVNcbETuFAlSJ6FMEutBD+7/PcovaM3P4=; b=lMynHj9JrKrtuWstBmsBiJQ98Z u7YS0ciI4NhwVagpI9NBULXQAVL+o/8mctHCPaxUuasgJBF8/UQS0OMNLhjzRZ9lVQuN1NB6W763j zqIagGPJKvUvSAi9gW2gehFv6vaU4rwx8rOsdB/LqE8CP+/hdcUpSiKmDQuLn9chE+wDuMc+Tt1e9 nhi3y/RVCRJa9BN4s5zkDr+mKrAStNy63c4VLRjnNGqoCAzYJ/rtdYFWaBuTxi/hs/NhiXIYcyP0w tzeWesco5t5xTx4jR2pEqwHGs4IFq3c01dN4yqcgzwANCtUt3PIEfO2uh2kBasjc1LaQa4sufcU9G nLNzvDMg==; Received: from host86-161-68-50.range86-161.btcentralplus.com ([86.161.68.50]:55597 helo=Dell) by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1qRKB4-0004kR-2A; Wed, 02 Aug 2023 18:18:46 -0400 From: "Roger Sayle" To: Cc: "'Uros Bizjak'" Subject: [x86 PATCH] PR target/110792: Early clobber issues with rot32di2_doubleword. Date: Wed, 2 Aug 2023 23:18:44 +0100 Message-ID: <000901d9c58f$4e236d00$ea6a4700$@nextmovesoftware.com> MIME-Version: 1.0 X-Mailer: Microsoft Outlook 16.0 Thread-Index: AdnFjgRypHZcCuddTYW2iPPxUL1YZA== Content-Language: en-gb X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com X-AntiAbuse: Original Domain - gcc.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - nextmovesoftware.com X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com X-Source: X-Source-Args: X-Source-Dir: X-Spam-Status: No, score=-10.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_BARRACUDACENTRAL, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" This patch is a conservative fix for PR target/110792, a wrong-code regression affecting doubleword rotations by BITS_PER_WORD, which effectively swaps the highpart and lowpart words, when the source to be rotated resides in memory. The issue is that if the register used to hold the lowpart of the destination is mentioned in the address of the memory operand, the current define_insn_and_split unintentionally clobbers it before reading the highpart. Hence, for the testcase, the incorrectly generated code looks like: salq $4, %rdi // calculate address movq WHIRL_S+8(%rdi), %rdi // accidentally clobber addr movq WHIRL_S(%rdi), %rbp // load (wrong) lowpart Traditionally, the textbook way to fix this would be to add an explicit early clobber to the instruction's constraints. (define_insn_and_split "32di2_doubleword" - [(set (match_operand:DI 0 "register_operand" "=r,r,r") + [(set (match_operand:DI 0 "register_operand" "=r,r,&r") (any_rotate:DI (match_operand:DI 1 "nonimmediate_operand" "0,r,o") (const_int 32)))] but unfortunately this currently generates significantly worse code, due to a strange choice of reloads (effectively memcpy), which ends up looking like: salq $4, %rdi // calculate address movdqa WHIRL_S(%rdi), %xmm0 // load the double word in SSE reg. movaps %xmm0, -16(%rsp) // store the SSE reg back to the stack movq -8(%rsp), %rdi // load highpart movq -16(%rsp), %rbp // load lowpart Note that reload's "&" doesn't distinguish between the memory being early clobbered, vs the registers used in an addressing mode being early clobbered. The fix proposed in this patch is to remove the third alternative, that allowed offsetable memory as an operand, forcing reload to place the operand into a register before the rotation. This results in: salq $4, %rdi movq WHIRL_S(%rdi), %rax movq WHIRL_S+8(%rdi), %rdi movq %rax, %rbp I believe there's a more advanced solution, by swapping the order of the loads (if first destination register is mentioned in the address), or inserting a lea insn (if both destination registers are mentioned in the address), but this fix is a minimal "safe" solution, that should hopefully be suitable for backporting. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2023-08-02 Roger Sayle gcc/ChangeLog PR target/110792 * config/i386/i386.md (ti3): For rotations by 64 bits place operand in a register before gen_64ti2_doubleword. (di3): Likewise, for rotations by 32 bits, place operand in a register before gen_32di2_doubleword. (32di2_doubleword): Constrain operand to be in register. (64ti2_doubleword): Likewise. gcc/testsuite/ChangeLog PR target/110792 * g++.target/i386/pr110792.C: New 32-bit C++ test case. * gcc.target/i386/pr110792.c: New 64-bit C test case. Thanks in advance, Roger diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 4db210c..849e1de 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -15340,7 +15340,10 @@ emit_insn (gen_ix86_ti3_doubleword (operands[0], operands[1], operands[2])); else if (CONST_INT_P (operands[2]) && INTVAL (operands[2]) == 64) - emit_insn (gen_64ti2_doubleword (operands[0], operands[1])); + { + operands[1] = force_reg (TImode, operands[1]); + emit_insn (gen_64ti2_doubleword (operands[0], operands[1])); + } else { rtx amount = force_reg (QImode, operands[2]); @@ -15375,7 +15378,10 @@ emit_insn (gen_ix86_di3_doubleword (operands[0], operands[1], operands[2])); else if (CONST_INT_P (operands[2]) && INTVAL (operands[2]) == 32) - emit_insn (gen_32di2_doubleword (operands[0], operands[1])); + { + operands[1] = force_reg (DImode, operands[1]); + emit_insn (gen_32di2_doubleword (operands[0], operands[1])); + } else FAIL; @@ -15543,8 +15549,8 @@ }) (define_insn_and_split "32di2_doubleword" - [(set (match_operand:DI 0 "register_operand" "=r,r,r") - (any_rotate:DI (match_operand:DI 1 "nonimmediate_operand" "0,r,o") + [(set (match_operand:DI 0 "register_operand" "=r,r") + (any_rotate:DI (match_operand:DI 1 "register_operand" "0,r") (const_int 32)))] "!TARGET_64BIT" "#" @@ -15561,8 +15567,8 @@ }) (define_insn_and_split "64ti2_doubleword" - [(set (match_operand:TI 0 "register_operand" "=r,r,r") - (any_rotate:TI (match_operand:TI 1 "nonimmediate_operand" "0,r,o") + [(set (match_operand:TI 0 "register_operand" "=r,r") + (any_rotate:TI (match_operand:TI 1 "register_operand" "0,r") (const_int 64)))] "TARGET_64BIT" "#" diff --git a/gcc/testsuite/g++.target/i386/pr110792.C b/gcc/testsuite/g++.target/i386/pr110792.C new file mode 100644 index 0000000..ce21a7a --- /dev/null +++ b/gcc/testsuite/g++.target/i386/pr110792.C @@ -0,0 +1,16 @@ +/* { dg-do compile { target ia32 } } */ +/* { dg-options "-O2" } */ + +template +inline T rotr(T input) +{ + return static_cast((input >> ROT) | (input << (8 * sizeof(T) - ROT))); +} + +unsigned long long WHIRL_S[256] = {0x18186018C07830D8}; +unsigned long long whirl(unsigned char x0) +{ + const unsigned long long s4 = WHIRL_S[x0&0xFF]; + return rotr<32>(s4); +} +/* { dg-final { scan-assembler-not "movl\tWHIRL_S\\+4\\(,%eax,8\\), %eax" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr110792.c b/gcc/testsuite/gcc.target/i386/pr110792.c new file mode 100644 index 0000000..b65125c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr110792.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2" } */ + +static inline unsigned __int128 rotr(unsigned __int128 input) +{ + return ((input >> 64) | (input << (64))); +} + +unsigned __int128 WHIRL_S[256] = {((__int128)0x18186018C07830D8) << 64 |0x18186018C07830D8}; +unsigned __int128 whirl(unsigned char x0) +{ + register int t __asm("rdi") = x0&0xFF; + const unsigned __int128 s4 = WHIRL_S[t]; + register unsigned __int128 tt __asm("rdi") = rotr(s4); + asm("":::"memory"); + return tt; +} +/* { dg-final { scan-assembler-not "movq\tWHIRL_S\\+8\\(%rdi\\), %rdi" } } */