From patchwork Tue Jun 13 16:03:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roger Sayle X-Patchwork-Id: 1794594 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=nextmovesoftware.com header.i=@nextmovesoftware.com header.a=rsa-sha256 header.s=default header.b=lqQdj2j6; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4QgYJs1nBvz20Wq for ; Wed, 14 Jun 2023 02:04:13 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id E73C538582B7 for ; Tue, 13 Jun 2023 16:04:10 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from server.nextmovesoftware.com (server.nextmovesoftware.com [162.254.253.69]) by sourceware.org (Postfix) with ESMTPS id 16D3B3858D38 for ; Tue, 13 Jun 2023 16:03:57 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 16D3B3858D38 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=nextmovesoftware.com DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID: Date:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=Y04yKv8JVfCCAwq/WxfwOlAFvUc44U+iog/Kl15GlLQ=; b=lqQdj2j6LzFZMR0oOMFQwUg6pO US1K0GzpZKcAMemZLMD8nV+uehQmInRMuRZCTIYVMASutvv0nozeJciMvyd18FLUYwepLUgjLV4CV 9iVe5j+y2hZJ9GAA9+s1/FQ1F3w+RadEr8ZFeiiIBem7X5v0E3iP2C0FCY/9GscW7pG6G6Tia95rb X3iHO8sbY/2jC4okfLbXAUkqeLu9DBCl/2nNgzoR/ZwMvcThnHYT95MkFLT2owukbN2iEcEf+W03J HF/0EnzWkqT3x0xBIshnuGLqkHzDIuLar9/yjZT+1orPbBa4/d0hmbYsFzl58UDS6sBu9XlzMbt5h FXR5uyGQ==; Received: from [185.62.158.67] (port=53993 helo=Dell) by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1q96Ut-0007s6-2q; Tue, 13 Jun 2023 12:03:56 -0400 From: "Roger Sayle" To: Cc: "'Uros Bizjak'" Subject: [x86 PATCH] Convert ptestz of pandn into ptestc. Date: Tue, 13 Jun 2023 17:03:52 +0100 Message-ID: <00ce01d99e10$a7e04b20$f7a0e160$@nextmovesoftware.com> MIME-Version: 1.0 X-Mailer: Microsoft Outlook 16.0 Thread-Index: AdmeDvbwBJDSGQpLQO6xKBTuvF0/yQ== Content-Language: en-gb X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com X-AntiAbuse: Original Domain - gcc.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - nextmovesoftware.com X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com X-Source: X-Source-Args: X-Source-Dir: X-Spam-Status: No, score=-12.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" This patch is the next instalment in a set of backend patches around improvements to ptest/vptest. A previous patch optimized the sequence t=pand(x,y); ptestz(t,t) into the equivalent ptestz(x,y), using the property that ZF is set to (X&Y) == 0. This patch performs a similar transformation, converting t=pandn(x,y); ptestz(t,t) into the (almost) equivalent ptestc(y,x), using the property that the CF flags is set to (~X&Y) == 0. The tricky bit is that this sets the CF flag instead of the ZF flag, so we can only perform this transformation when we can also convert the flags' consumer, as well as the producer. For the test case: int foo (__m128i x, __m128i y) { __m128i a = x & ~y; return __builtin_ia32_ptestz128 (a, a); } With -O2 -msse4.1 we previously generated: foo: pandn %xmm0, %xmm1 xorl %eax, %eax ptest %xmm1, %xmm1 sete %al ret with this patch we now generate: foo: xorl %eax, %eax ptest %xmm0, %xmm1 setc %al ret At the same time, this patch also provides alternative fixes for PR target/109973 and PR target/110118, by recognizing that ptestc(x,x) always sets the carry flag (X&~X is always zero). This is achieved both by recognizing the special case in ix86_expand_sse_ptest and with a splitter to convert an eligible ptest into an stc. The next piece is, of course, STV of "if (x & ~y)..." This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2023-06-13 Roger Sayle gcc/ChangeLog * config/i386/i386-expand.cc (ix86_expand_sse_ptest): Recognize expansion of ptestc with equal operands as returning const1_rtx. * config/i386/i386.cc (ix86_rtx_costs): Provide accurate cost estimates of UNSPEC_PTEST, where the ptest performs the PAND or PAND of its operands. * config/i386/sse.md (define_split): Transform CCCmode UNSPEC_PTEST of reg_equal_p operands into an x86_stc instruction. (define_split): Split pandn/ptestz/setne into ptestc/setnc. (define_split): Split pandn/ptestz/sete into ptestc/setc. (define_split): Split pandn/ptestz/je into ptestc/jc. (define_split): Split pandn/ptestz/jne into ptestc/jnc. gcc/testsuite/ChangeLog * gcc.target/i386/avx-vptest-4.c: New test case. * gcc.target/i386/avx-vptest-5.c: Likewise. * gcc.target/i386/avx-vptest-6.c: Likewise. * gcc.target/i386/pr109973-1.c: Update test case. * gcc.target/i386/pr109973-2.c: Likewise. * gcc.target/i386/sse4_1-ptest-4.c: New test case. * gcc.target/i386/sse4_1-ptest-5.c: Likewise. * gcc.target/i386/sse4_1-ptest-6.c: Likewise. Thanks in advance, Roger diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc index def060a..1d11af2 100644 --- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -10222,6 +10222,13 @@ ix86_expand_sse_ptest (const struct builtin_description *d, tree exp, machine_mode mode1 = insn_data[d->icode].operand[1].mode; enum rtx_code comparison = d->comparison; + /* ptest reg, reg sets the carry flag. */ + if (comparison == LTU + && (d->code == IX86_BUILTIN_PTESTC + || d->code == IX86_BUILTIN_PTESTC256) + && rtx_equal_p (op0, op1)) + return const1_rtx; + if (VECTOR_MODE_P (mode0)) op0 = safe_vector_operand (op0, mode0); if (VECTOR_MODE_P (mode1)) diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index 3a1444d..3e99e23 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -21423,16 +21423,23 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno, else if (XINT (x, 1) == UNSPEC_PTEST) { *total = cost->sse_op; - if (XVECLEN (x, 0) == 2 - && GET_CODE (XVECEXP (x, 0, 0)) == AND) + rtx test_op0 = XVECEXP (x, 0, 0); + if (!rtx_equal_p (test_op0, XVECEXP (x, 0, 1))) + return false; + if (GET_CODE (test_op0) == AND) { - rtx andop = XVECEXP (x, 0, 0); - *total += rtx_cost (XEXP (andop, 0), GET_MODE (andop), - AND, opno, speed) - + rtx_cost (XEXP (andop, 1), GET_MODE (andop), - AND, opno, speed); - return true; + rtx and_op0 = XEXP (test_op0, 0); + if (GET_CODE (and_op0) == NOT) + and_op0 = XEXP (and_op0, 0); + *total += rtx_cost (and_op0, GET_MODE (and_op0), + AND, 0, speed) + + rtx_cost (XEXP (test_op0, 1), GET_MODE (and_op0), + AND, 1, speed); } + else + *total = rtx_cost (test_op0, GET_MODE (test_op0), + UNSPEC, 0, speed); + return true; } return false; diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 9bec09d..282bcbe 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -23147,6 +23147,92 @@ [(set (reg:CCZ FLAGS_REG) (unspec:CCZ [(match_dup 0) (match_dup 1)] UNSPEC_PTEST))]) +;; ptest reg,reg sets the carry flag. +(define_split + [(set (reg:CCC FLAGS_REG) + (unspec:CCC [(match_operand:V_AVX 0 "register_operand") + (match_operand:V_AVX 1 "register_operand")] + UNSPEC_PTEST))] + "TARGET_SSE4_1 + && rtx_equal_p (operands[0], operands[1])" + [(set (reg:CCC FLAGS_REG) + (unspec:CCC [(const_int 0)] UNSPEC_STC))]) + +;; pandn/ptestz/setne -> ptestc/setnc +(define_split + [(set (match_operand:QI 0 "register_operand") + (ne:QI + (unspec:CCZ [ + (and:V_AVX (not:V_AVX (match_operand:V_AVX 1 "register_operand")) + (match_operand:V_AVX 2 "register_operand")) + (and:V_AVX (not:V_AVX (match_dup 1)) (match_dup 2))] + UNSPEC_PTEST) + (const_int 0)))] + "TARGET_SSE4_1" + [(set (reg:CCC FLAGS_REG) + (unspec:CCC [(match_dup 1) (match_dup 2)] UNSPEC_PTEST)) + (set (strict_low_part (subreg:QI (match_dup 0) 0)) + (geu:QI (reg:CCC FLAGS_REG) (const_int 0)))]) + +;; Changing the CCmode of FLAGS_REG requires updating both def and use. +;; pandn/ptestz/sete -> ptestc/setc +(define_split + [(set (strict_low_part (subreg:QI (match_operand:SI 0 "register_operand") 0)) + (eq:QI + (unspec:CCZ [ + (and:V_AVX (not:V_AVX (match_operand:V_AVX 1 "register_operand")) + (match_operand:V_AVX 2 "register_operand")) + (and:V_AVX (not:V_AVX (match_dup 1)) (match_dup 2))] + UNSPEC_PTEST) + (const_int 0)))] + "TARGET_SSE4_1" + [(set (reg:CCC FLAGS_REG) + (unspec:CCC [(match_dup 1) (match_dup 2)] UNSPEC_PTEST)) + (set (strict_low_part (subreg:QI (match_dup 0) 0)) + (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))]) + +;; pandn/ptestz/je -> ptestc/jc +(define_split + [(set (pc) + (if_then_else + (ne + (unspec:CCZ [ + (and:V_AVX + (not:V_AVX (match_operand:V_AVX 1 "register_operand")) + (match_operand:V_AVX 2 "register_operand")) + (and:V_AVX (not:V_AVX (match_dup 1)) (match_dup 2))] + UNSPEC_PTEST) + (const_int 0)) + (match_operand 0) + (pc)))] + "TARGET_SSE4_1" + [(set (reg:CCC FLAGS_REG) + (unspec:CCC [(match_dup 1) (match_dup 2)] UNSPEC_PTEST)) + (set (pc) (if_then_else (geu (reg:CCC FLAGS_REG) (const_int 0)) + (match_dup 0) + (pc)))]) + +;; pandn/ptestz/jne -> ptestc/jnc +(define_split + [(set (pc) + (if_then_else + (eq + (unspec:CCZ [ + (and:V_AVX + (not:V_AVX (match_operand:V_AVX 1 "register_operand")) + (match_operand:V_AVX 2 "register_operand")) + (and:V_AVX (not:V_AVX (match_dup 1)) (match_dup 2))] + UNSPEC_PTEST) + (const_int 0)) + (match_operand 0) + (pc)))] + "TARGET_SSE4_1" + [(set (reg:CCC FLAGS_REG) + (unspec:CCC [(match_dup 1) (match_dup 2)] UNSPEC_PTEST)) + (set (pc) (if_then_else (ltu (reg:CCC FLAGS_REG) (const_int 0)) + (match_dup 0) + (pc)))]) + (define_expand "nearbyint2" [(set (match_operand:VFH 0 "register_operand") (unspec:VFH diff --git a/gcc/testsuite/gcc.target/i386/avx-vptest-4.c b/gcc/testsuite/gcc.target/i386/avx-vptest-4.c new file mode 100644 index 0000000..4f16cc8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-vptest-4.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx" } */ + +typedef long long __m256i __attribute__ ((__vector_size__ (32))); + +int foo (__m256i x, __m256i y) +{ + __m256i a = x & ~y; + return __builtin_ia32_ptestz256 (a, a); +} + +int bar (__m256i x, __m256i y) +{ + __m256i a = ~x & y; + return __builtin_ia32_ptestz256 (a, a); +} + +/* { dg-final { scan-assembler "vptest" } } */ +/* { dg-final { scan-assembler "setc" } } */ +/* { dg-final { scan-assembler-not "vpandn" } } */ +/* { dg-final { scan-assembler-not "sete" } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx-vptest-5.c b/gcc/testsuite/gcc.target/i386/avx-vptest-5.c new file mode 100644 index 0000000..21b1872 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-vptest-5.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx" } */ + +typedef long long __m256i __attribute__ ((__vector_size__ (32))); + +int foo (__m256i x, __m256i y) +{ + __m256i a = x & ~y; + return !__builtin_ia32_ptestz256 (a, a); +} + +int bar (__m256i x, __m256i y) +{ + __m256i a = ~x & y; + return !__builtin_ia32_ptestz256 (a, a); +} + +/* { dg-final { scan-assembler "vptest" } } */ +/* { dg-final { scan-assembler "setnc" } } */ +/* { dg-final { scan-assembler-not "vpandn" } } */ +/* { dg-final { scan-assembler-not "setne" } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx-vptest-6.c b/gcc/testsuite/gcc.target/i386/avx-vptest-6.c new file mode 100644 index 0000000..c99e65f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-vptest-6.c @@ -0,0 +1,40 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx" } */ + +typedef long long __m256i __attribute__ ((__vector_size__ (32))); + +extern void ext (void); + +void foo (__m256i x, __m256i y) +{ + __m256i a = x & ~y; + if (__builtin_ia32_ptestz256 (a, a)) + ext(); +} + +void bar (__m256i x, __m256i y) +{ + __m256i a = ~x & y; + if (__builtin_ia32_ptestz256 (a, a)) + ext(); +} + +void foo2 (__m256i x, __m256i y) +{ + __m256i a = x & ~y; + if (__builtin_ia32_ptestz256 (a, a)) + ext(); +} + +void bar2 (__m256i x, __m256i y) +{ + __m256i a = ~x & y; + if (__builtin_ia32_ptestz256 (a, a)) + ext(); +} + +/* { dg-final { scan-assembler "ptest" } } */ +/* { dg-final { scan-assembler "jn?c" } } */ +/* { dg-final { scan-assembler-not "pandn" } } */ +/* { dg-final { scan-assembler-not "jne" } } */ +/* { dg-final { scan-assembler-not "je" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr109973-1.c b/gcc/testsuite/gcc.target/i386/pr109973-1.c index a1b6136b..1d812dd 100644 --- a/gcc/testsuite/gcc.target/i386/pr109973-1.c +++ b/gcc/testsuite/gcc.target/i386/pr109973-1.c @@ -10,4 +10,4 @@ foo (__m256i x, __m256i y) return __builtin_ia32_ptestc256 (a, a); } -/* { dg-final { scan-assembler "vpand" } } */ +/* { dg-final { scan-assembler "movl\[ \\t]*\\\$1, %eax" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr109973-2.c b/gcc/testsuite/gcc.target/i386/pr109973-2.c index 167f6ee..1068c3e 100644 --- a/gcc/testsuite/gcc.target/i386/pr109973-2.c +++ b/gcc/testsuite/gcc.target/i386/pr109973-2.c @@ -10,4 +10,4 @@ foo (__m128i x, __m128i y) return __builtin_ia32_ptestc128 (a, a); } -/* { dg-final { scan-assembler "pand" } } */ +/* { dg-final { scan-assembler "movl\[ \\t]*\\\$1, %eax" } } */ diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-ptest-4.c b/gcc/testsuite/gcc.target/i386/sse4_1-ptest-4.c new file mode 100644 index 0000000..999cff2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/sse4_1-ptest-4.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -msse4.1" } */ + +typedef long long __m128i __attribute__ ((__vector_size__ (16))); + +int foo (__m128i x, __m128i y) +{ + __m128i a = x & ~y; + return __builtin_ia32_ptestz128 (a, a); +} + +int bar (__m128i x, __m128i y) +{ + __m128i a = ~x & y; + return __builtin_ia32_ptestz128 (a, a); +} + +/* { dg-final { scan-assembler "ptest" } } */ +/* { dg-final { scan-assembler "setc" } } */ +/* { dg-final { scan-assembler-not "pandn" } } */ +/* { dg-final { scan-assembler-not "sete" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-ptest-5.c b/gcc/testsuite/gcc.target/i386/sse4_1-ptest-5.c new file mode 100644 index 0000000..c3a23da --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/sse4_1-ptest-5.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -msse4.1" } */ + +typedef long long __m128i __attribute__ ((__vector_size__ (16))); + +int foo (__m128i x, __m128i y) +{ + __m128i a = x & ~y; + return !__builtin_ia32_ptestz128 (a, a); +} + +int bar (__m128i x, __m128i y) +{ + __m128i a = ~x & y; + return !__builtin_ia32_ptestz128 (a, a); +} + +/* { dg-final { scan-assembler "ptest" } } */ +/* { dg-final { scan-assembler "setnc" } } */ +/* { dg-final { scan-assembler-not "pandn" } } */ +/* { dg-final { scan-assembler-not "setne" } } */ + diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-ptest-6.c b/gcc/testsuite/gcc.target/i386/sse4_1-ptest-6.c new file mode 100644 index 0000000..d49c6bc --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/sse4_1-ptest-6.c @@ -0,0 +1,40 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -msse4.1" } */ + +typedef long long __m128i __attribute__ ((__vector_size__ (16))); + +extern void ext (void); + +void foo (__m128i x, __m128i y) +{ + __m128i a = x & ~y; + if (__builtin_ia32_ptestz128 (a, a)) + ext(); +} + +void bar (__m128i x, __m128i y) +{ + __m128i a = ~x & y; + if (__builtin_ia32_ptestz128 (a, a)) + ext(); +} + +void foo2 (__m128i x, __m128i y) +{ + __m128i a = x & ~y; + if (__builtin_ia32_ptestz128 (a, a)) + ext(); +} + +void bar2 (__m128i x, __m128i y) +{ + __m128i a = ~x & y; + if (__builtin_ia32_ptestz128 (a, a)) + ext(); +} + +/* { dg-final { scan-assembler "ptest" } } */ +/* { dg-final { scan-assembler "jn?c" } } */ +/* { dg-final { scan-assembler-not "pandn" } } */ +/* { dg-final { scan-assembler-not "jne" } } */ +/* { dg-final { scan-assembler-not "je" } } */