From patchwork Fri Aug 12 21:24:39 2022
X-Patchwork-Submitter: Roger Sayle
X-Patchwork-Id: 1666004
From: "Roger Sayle"
To: "'Uros Bizjak'"
Subject: [x86 PATCH take #2] Move V1TI shift/rotate lowering from expand to
 pre-reload split.
Date: Fri, 12 Aug 2022 22:24:39 +0100
Message-ID: <008101d8ae91$ef2ca100$cd85e300$@nextmovesoftware.com>
Cc: gcc-patches@gcc.gnu.org

Hi Uros,

As requested, here's an updated version of my patch that introduces the
new const_0_to_255_not_mul_8_operand predicate you asked for.  I think
that in this instance, having mutually exclusive patterns that can appear
in any order, without imposing implicit ordering constraints, is slightly
preferable, especially as (thanks to STV) some related patterns may
appear in sse.md and others in i386.md (making ordering tricky).

This patch has been retested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?

2022-08-12  Roger Sayle
	    Uroš Bizjak

gcc/ChangeLog
	* config/i386/predicates.md (const_0_to_255_not_mul_8_operand):
	New predicate for values between 0/1 and 255, not multiples of 8.
	* config/i386/sse.md (ashlv1ti3): Delay lowering of logical left
	shifts by constant bit counts.
	(*ashlv1ti3_internal): New define_insn_and_split that lowers
	logical left shifts by constant bit counts that aren't multiples
	of 8 before reload.
	(lshrv1ti3): Delay lowering of logical right shifts by constant.
	(*lshrv1ti3_internal): New define_insn_and_split that lowers
	logical right shifts by constant bit counts that aren't
	multiples of 8 before reload.
	(ashrv1ti3): Delay lowering of arithmetic right shifts by
	constant bit counts.
	(*ashrv1ti3_internal): New define_insn_and_split that lowers
	arithmetic right shifts by constant bit counts before reload.
	(rotlv1ti3): Delay lowering of rotate left by constant.
	(*rotlv1ti3_internal): New define_insn_and_split that lowers
	rotate left by constant bit counts before reload.
	(rotrv1ti3): Delay lowering of rotate right by constant.
	(*rotrv1ti3_internal): New define_insn_and_split that lowers
	rotate right by constant bit counts before reload.

Thanks again,
Roger
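[For illustration, a hypothetical example, mine rather than from the patch
or its testsuite, of source code that reaches these V1TImode expanders.
Assumes x86_64 with SSE2, where GCC supports one-element __int128 vectors;
the typedef and function names are invented:]

/* V1TImode shifts that exercise the patterns in this patch.  Compile
   with e.g. -O2 on x86_64; each function operates on a one-element
   __int128 vector held in an SSE register.  */
typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));

uv1ti
shl5 (uv1ti x)
{
  return x << 5;	/* Count not a multiple of 8: split before reload.  */
}

uv1ti
shl16 (uv1ti x)
{
  return x << 16;	/* Whole-byte count: matched by sse2_ashlv1ti3.  */
}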
> -----Original Message-----
> From: Uros Bizjak
> Sent: 08 August 2022 08:48
> To: Roger Sayle
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [x86 PATCH] Move V1TI shift/rotate lowering from expand to
> pre-reload split.
>
> On Fri, Aug 5, 2022 at 8:36 PM Roger Sayle wrote:
> >
> > This patch moves the lowering of 128-bit V1TImode shifts and rotations
> > by constant bit counts, to sequences of SSE operations, from the RTL
> > expansion pass to the pre-reload split pass.  Postponing this
> > splitting of shifts and rotates enables (will enable) the TImode
> > equivalents of these operations/instructions to be considered as
> > candidates by the (TImode) STV pass.
> > Technically, this patch changes the existing expanders to continue to
> > lower shifts by variable amounts, while constant operands become RTL
> > instructions, specified by define_insn_and_split, that are triggered
> > by x86_pre_reload_split.  The one minor complication is that logical
> > shifts by multiples of eight don't get split, but are handled by
> > existing insn patterns, such as sse2_ashlv1ti3 and sse2_lshrv1ti3.
> > There should be no changes in generated code with this patch, which
> > just adjusts the pass in which the transformations get applied.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32},
> > with no new failures.  Ok for mainline?
> >
> >
> > 2022-08-05  Roger Sayle
> >
> > gcc/ChangeLog
> > 	* config/i386/sse.md (ashlv1ti3): Delay lowering of logical left
> > 	shifts by constant bit counts.
> > 	(*ashlv1ti3_internal): New define_insn_and_split that lowers
> > 	logical left shifts by constant bit counts that aren't multiples
> > 	of 8 before reload.
> > 	(lshrv1ti3): Delay lowering of logical right shifts by constant.
> > 	(*lshrv1ti3_internal): New define_insn_and_split that lowers
> > 	logical right shifts by constant bit counts that aren't multiples
> > 	of 8 before reload.
> > 	(ashrv1ti3): Delay lowering of arithmetic right shifts by
> > 	constant bit counts.
> > 	(*ashrv1ti3_internal): New define_insn_and_split that lowers
> > 	arithmetic right shifts by constant bit counts before reload.
> > 	(rotlv1ti3): Delay lowering of rotate left by constant.
> > 	(*rotlv1ti3_internal): New define_insn_and_split that lowers
> > 	rotate left by constant bit counts before reload.
> > 	(rotrv1ti3): Delay lowering of rotate right by constant.
> > 	(*rotrv1ti3_internal): New define_insn_and_split that lowers
> > 	rotate right by constant bit counts before reload.
>
> +(define_insn_and_split "*ashlv1ti3_internal"
> +  [(set (match_operand:V1TI 0 "register_operand")
> 	(ashift:V1TI
> 	 (match_operand:V1TI 1 "register_operand")
> -	 (match_operand:QI 2 "general_operand")))]
> -  "TARGET_SSE2 && TARGET_64BIT"
> +	 (match_operand:SI 2 "const_0_to_255_operand")))]
> +  "TARGET_SSE2
> +   && TARGET_64BIT
> +   && (INTVAL (operands[2]) & 7) != 0
>
> Please introduce a const_0_to_255_not_mul_8_operand predicate.
> Alternatively, and preferably, you can use pattern shadowing, where the
> preceding, more constrained pattern will match before the following, more
> broad pattern will.
>
> Uros.
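[To make the split condition concrete, here's a small standalone sketch,
mine rather than GCC code, of the bit-count test that both review options
encode, whether written inline as "(INTVAL (operands[2]) & 7) != 0" or
packaged in the new predicate below.  The function name is invented:]

#include <stdbool.h>
#include <stdio.h>

/* Mirrors the body of const_0_to_255_not_mul_8_operand: a shift count
   qualifies for the pre-reload split only when it is in range and is
   not a whole number of bytes; whole-byte counts stay with the
   existing sse2_ashlv1ti3/sse2_lshrv1ti3 byte-shift insns.  */
static bool
needs_pre_reload_split (unsigned int val)
{
  return val <= 255 && val % 8 != 0;
}

int
main (void)
{
  printf ("%d\n", needs_pre_reload_split (5));   /* 1: split before reload.  */
  printf ("%d\n", needs_pre_reload_split (16));  /* 0: existing byte-shift insn.  */
  return 0;
}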
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 064596d..4f16bb7 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -931,6 +931,14 @@
   return val <= 255*8 && val % 8 == 0;
 })
 
+;; Match 1 to 255 except multiples of 8
+(define_predicate "const_0_to_255_not_mul_8_operand"
+  (match_code "const_int")
+{
+  unsigned HOST_WIDE_INT val = INTVAL (op);
+  return val <= 255 && val % 8 != 0;
+})
+
 ;; Return true if OP is CONST_INT >= 1 and <= 31 (a valid operand
 ;; for shift & compare patterns, as shifting by 0 does not change flags).
 (define_predicate "const_1_to_31_operand"
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 14d12d1..468c5f1 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15995,10 +15995,29 @@
 (define_expand "ashlv1ti3"
   [(set (match_operand:V1TI 0 "register_operand")
+	(ashift:V1TI
+	 (match_operand:V1TI 1 "register_operand")
+	 (match_operand:QI 2 "general_operand")))]
+  "TARGET_SSE2 && TARGET_64BIT"
+{
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_shift (ASHIFT, operands);
+      DONE;
+    }
+})
+
+(define_insn_and_split "*ashlv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
 	(ashift:V1TI
 	 (match_operand:V1TI 1 "register_operand")
-	 (match_operand:QI 2 "general_operand")))]
-  "TARGET_SSE2 && TARGET_64BIT"
+	 (match_operand:SI 2 "const_0_to_255_not_mul_8_operand")))]
+  "TARGET_SSE2
+   && TARGET_64BIT
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
 {
   ix86_expand_v1ti_shift (ASHIFT, operands);
   DONE;
@@ -16011,6 +16030,25 @@
 	 (match_operand:QI 2 "general_operand")))]
   "TARGET_SSE2 && TARGET_64BIT"
 {
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_shift (LSHIFTRT, operands);
+      DONE;
+    }
+})
+
+(define_insn_and_split "*lshrv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
+	(lshiftrt:V1TI
+	 (match_operand:V1TI 1 "register_operand")
+	 (match_operand:SI 2 "const_0_to_255_not_mul_8_operand")))]
+  "TARGET_SSE2
+   && TARGET_64BIT
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
   ix86_expand_v1ti_shift (LSHIFTRT, operands);
   DONE;
 })
@@ -16022,6 +16060,26 @@
 	 (match_operand:QI 2 "general_operand")))]
   "TARGET_SSE2 && TARGET_64BIT"
 {
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_ashiftrt (operands);
+      DONE;
+    }
+})
+
+
+(define_insn_and_split "*ashrv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
+	(ashiftrt:V1TI
+	 (match_operand:V1TI 1 "register_operand")
+	 (match_operand:SI 2 "const_0_to_255_operand")))]
+  "TARGET_SSE2
+   && TARGET_64BIT
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
   ix86_expand_v1ti_ashiftrt (operands);
   DONE;
 })
@@ -16033,6 +16091,25 @@
 	 (match_operand:QI 2 "general_operand")))]
   "TARGET_SSE2 && TARGET_64BIT"
 {
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_rotate (ROTATE, operands);
+      DONE;
+    }
+})
+
+(define_insn_and_split "*rotlv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
+	(rotate:V1TI
+	 (match_operand:V1TI 1 "register_operand")
+	 (match_operand:SI 2 "const_0_to_255_operand")))]
+  "TARGET_SSE2
+   && TARGET_64BIT
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
   ix86_expand_v1ti_rotate (ROTATE, operands);
   DONE;
 })
@@ -16044,6 +16121,25 @@
 	 (match_operand:QI 2 "general_operand")))]
   "TARGET_SSE2 && TARGET_64BIT"
 {
+  if (!CONST_INT_P (operands[2]))
+    {
+      ix86_expand_v1ti_rotate (ROTATERT, operands);
+      DONE;
+    }
+})
+
+(define_insn_and_split "*rotrv1ti3_internal"
+  [(set (match_operand:V1TI 0 "register_operand")
+	(rotatert:V1TI
+	 (match_operand:V1TI 1 "register_operand")
+	 (match_operand:SI 2 "const_0_to_255_operand")))]
+  "TARGET_SSE2
+   && TARGET_64BIT
+   && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
   ix86_expand_v1ti_rotate (ROTATERT, operands);
   DONE;
 })
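
[For completeness, a hypothetical smoke test, mine rather than from the
patch or its testsuite, of a V1TImode rotate that the new
*rotlv1ti3_internal pattern lowers at the pre-reload split pass instead
of at expand.  Whether the shift-or idiom below is folded into a single
rotate depends on GCC's idiom recognition, so treat this as a sketch;
the typedef and function name are invented:]

/* 128-bit rotate left by 17, a count that is not a multiple of 8.
   Compile with e.g. "gcc -O2 -S" on x86_64 and inspect the assembly;
   per the patch description, the SSE sequence should be unchanged from
   what expand-time lowering produced.  */
typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));

uv1ti
rotl17 (uv1ti x)
{
  return (x << 17) | (x >> (128 - 17));
}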