From patchwork Wed Apr 13 09:03:18 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roger Sayle X-Patchwork-Id: 1616646 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=nextmovesoftware.com header.i=@nextmovesoftware.com header.a=rsa-sha256 header.s=default header.b=ZpQePj6l; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Received: from sourceware.org (ip-8-43-85-97.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4Kdc8F307vz9s3q for ; Wed, 13 Apr 2022 19:03:39 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 1DD023857347 for ; Wed, 13 Apr 2022 09:03:38 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from server.nextmovesoftware.com (server.nextmovesoftware.com [162.254.253.69]) by sourceware.org (Postfix) with ESMTPS id AFFF3385803E for ; Wed, 13 Apr 2022 09:03:21 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org AFFF3385803E Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=nextmovesoftware.com DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID: Date:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=fYUR9o2ltbftgB2WrTXMNaQsDnPWDgeJH9eeBWJgJB8=; b=ZpQePj6l9WPhVOCTCXqdbP/MpB yJ2PDBIK1dSGpy0cT0LYEgMjIIPnmKyCyOJdGEGLyOq1u5g3f0vfbFVYtyY0TjGOAJHvrbGbYO23x RiwUrIlexzDK1qi0+xv06F3kLxAFCeI62lNzjMXM+UDx63Ua5JZgHEC8QCAlhdAgd0oR5eFl040Uw LvWZEWcxXhfN3SA3cKZHZ7FJcpFnP350VOigHL72bMiNaUn8CMoslZmDLJpjUmrZam5c7pMW28WHh zOHcBkdBQzJ3twoskWmL+vXYhcbbW/FK1gqIVMt1RA3eF42B7vLLpnEmm/bb0am0AEQYni5Hpiewf 523nZdiw==; Received: from host109-151-117-89.range109-151.btcentralplus.com ([109.151.117.89]:50333 helo=Dell) by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1neYuH-0006gq-6k; Wed, 13 Apr 2022 05:03:21 -0400 From: "Roger Sayle" To: Subject: [x86 PATCH] PR target/70321: Split double word equality/inequality after STV. Date: Wed, 13 Apr 2022 10:03:18 +0100 Message-ID: <001f01d84f15$52729c60$f757d520$@nextmovesoftware.com> MIME-Version: 1.0 X-Mailer: Microsoft Outlook 16.0 Thread-Index: AdhPE/GpAOut7SuJQLWl0maAM/GtXg== Content-Language: en-gb X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com X-AntiAbuse: Original Domain - gcc.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - nextmovesoftware.com X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com X-Source: X-Source-Args: X-Source-Dir: X-Spam-Status: No, score=-11.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, LOTS_OF_MONEY, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" This patch resolves the last piece of PR target/70321 a code quality (P2 regression) affecting mainline. Currently, for HJ's testcase: void foo (long long ixi) { if (ixi != 14348907) __builtin_abort (); } GCC with -m32 -O2 generates four instructions for the comparison: movl 16(%esp), %eax movl 20(%esp), %edx xorl $14348907, %eax orl %eax, %edx but with this patch it now requires only three, making better use of x86's addressing modes: movl 16(%esp), %eax xorl $14348907, %eax orl 20(%esp), %eax The solution is to expand "doubleword" equality/inequality expressions using flag setting COMPARE instructions for the early RTL passes, and then split them during split1, after STV and before reload. Hence on x86_64, we now see/allow things like: (insn 11 8 12 2 (set (reg:CCZ 17 flags) (compare:CCZ (reg/v:TI 84 [ x ]) (reg:TI 96))) "cmpti.c":2:43 30 {*cmpti_doubleword} This allows the STV pass to decide whether it's preferrable to perform this comparison using vector operations, i.e. a pxor/ptest sequence, or as scalar integer operations, i.e. a xor/xor/or sequence. Alas this required tweaking of the STV pass to recognize the "new" form of these comparisons and split out the pxor operation itself. To confirm this still works as expected I've added a new STV test case: long long a[1024]; long long b[1024]; int foo() { for (int i=0; i<1024; i++) { long long t = (a[i]<<8) | (b[i]<<24); if (t == 0) return 1; } return 0; } where with -m32 -O2 -msse4.1 the above comparison with zero should look like: punpcklqdq %xmm0, %xmm0 ptest %xmm0, %xmm0 Although this patch includes one or two minor tweaks to provide all the necessary infrastructure to support conversion of TImode comparisons to V1TImode (and SImode comparisons to V4SImode), STV doesn't yet implement these transformations, but this is something that can be considered after stage 4. Indeed the new convert_compare functionality is split out into a method to simplify its potential reuse by the timode_scalar_chain class. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32}, with no new failures for -m64, and two (missed-optimization) FAILs on -m32 (both around pandn). The first gcc.target/i386/pr65105-5.c is resolved by the patch I posted yesterday, the other i386/pr78794.c is easily fixed by tweaking STV's "gain" estimates, but it's preferrable to generate better code for both the scalar and vector implementations before tweaking these parameters (patches to follow shortly). Ok for mainline? 2022-04-13 Roger Sayle gcc/ChangeLog PR target/70321 * config/i386/i386-expand.cc (ix86_expand_branch): Don't decompose DI mode equality/inequality using XOR here. Instead generate a COMPARE for doubleword modes (DImode on !TARGET_64BIT or TImode). * config/i386/i386-features.cc (gen_gpr_to_xmm_move_src): Use gen_rtx_SUBREG when NUNITS is 1, i.e. for TImode to V1TImode. (general_scalar_chain::convert_compare): New function to convert scalar equality/inequality comparison into vector operations. (general_scalar_chain::convert_insn) [COMPARE]: Refactor. Call new convert_compare helper method. (convertible_comparion_p): Update to match doubleword COMPARE of two register, memory or integer constant operands. * config/i386/i386-features.h (general_scalar_chain::convert_compare): Prototype/declare member function here. * config/i386/i386.md (cstore4): Change mode to SDWIM, but only allow new doubleword modes for EQ and NE operators. (*cmp_doubleword): New define_insn_and_split, to split a doubleword comparison into a pair of XORs followed by an IOR to set the (zero) flags register, optimizing the XORs if possible. * config/i386/sse.md (V_AVX): Include V1TI and V2TI in mode iterator; V_AVX is (currently) only used by ptest. (sse4_1 mode attribute): Update to support V1TI and V2TI. gcc/testsuite/ChangeLog PR target/70321 * gcc.target/i386/pr70321.c: New test case. * gcc.target/i386/sse4_1-stv-1.c: New test case. Thanks in advance, Roger diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc index 794315ee..572ee70 100644 --- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -2309,21 +2309,15 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx op1, rtx label) case E_DImode: if (TARGET_64BIT) goto simple; - /* For 32-bit target DI comparison may be performed on - SSE registers. To allow this we should avoid split - to SI mode which is achieved by doing xor in DI mode - and then comparing with zero (which is recognized by - STV pass). We don't compare using xor when optimizing - for size. */ - if (!optimize_insn_for_size_p () - && TARGET_STV - && (code == EQ || code == NE)) - { - op0 = force_reg (mode, gen_rtx_XOR (mode, op0, op1)); - op1 = const0_rtx; - } /* FALLTHRU */ case E_TImode: + /* DI and TI mode equality/inequality comparisons may be performed + on SSE registers. Avoid splitting them, except when optimizing + for size. */ + if ((code == EQ || code == NE) + && !optimize_insn_for_size_p ()) + goto simple; + /* Expand DImode branch into multiple compare+branch. */ { rtx lo[2], hi[2]; @@ -2342,34 +2336,7 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx op1, rtx label) submode = mode == DImode ? SImode : DImode; - /* When comparing for equality, we can use (hi0^hi1)|(lo0^lo1) to - avoid two branches. This costs one extra insn, so disable when - optimizing for size. */ - - if ((code == EQ || code == NE) - && (!optimize_insn_for_size_p () - || hi[1] == const0_rtx || lo[1] == const0_rtx)) - { - rtx xor0, xor1; - - xor1 = hi[0]; - if (hi[1] != const0_rtx) - xor1 = expand_binop (submode, xor_optab, xor1, hi[1], - NULL_RTX, 0, OPTAB_WIDEN); - - xor0 = lo[0]; - if (lo[1] != const0_rtx) - xor0 = expand_binop (submode, xor_optab, xor0, lo[1], - NULL_RTX, 0, OPTAB_WIDEN); - - tmp = expand_binop (submode, ior_optab, xor1, xor0, - NULL_RTX, 0, OPTAB_WIDEN); - - ix86_expand_branch (code, tmp, const0_rtx, label); - return; - } - - /* Otherwise, if we are doing less-than or greater-or-equal-than, + /* If we are doing less-than or greater-or-equal-than, op1 is a constant and the low word is zero, then we can just examine the high word. Similarly for low word -1 and less-or-equal-than or greater-than. */ diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc index 6fe41c3..8908e42 100644 --- a/gcc/config/i386/i386-features.cc +++ b/gcc/config/i386/i386-features.cc @@ -711,8 +711,7 @@ gen_gpr_to_xmm_move_src (enum machine_mode vmode, rtx gpr) switch (GET_MODE_NUNITS (vmode)) { case 1: - /* We are not using this case currently. */ - gcc_unreachable (); + return gen_rtx_SUBREG (vmode, gpr, 0); case 2: return gen_rtx_VEC_CONCAT (vmode, gpr, CONST0_RTX (GET_MODE_INNER (vmode))); @@ -932,6 +931,48 @@ general_scalar_chain::convert_op (rtx *op, rtx_insn *insn) } } +/* Convert COMPARE to vector mode. */ + +rtx +general_scalar_chain::convert_compare (rtx op1, rtx op2, rtx_insn *insn) +{ + rtx tmp = gen_reg_rtx (vmode); + rtx src; + convert_op (&op1, insn); + /* Comparison against anything other than zero, requires an XOR. */ + if (op2 != const0_rtx) + { + convert_op (&op2, insn); + /* If both operands are MEMs, explicitly load the OP1 into TMP. */ + if (MEM_P (op1) && MEM_P (op2)) + { + emit_insn_before (gen_rtx_SET (tmp, op1), insn); + src = tmp; + } + else + src = op1; + src = gen_rtx_XOR (vmode, src, op2); + } + else + src = op1; + emit_insn_before (gen_rtx_SET (tmp, src), insn); + + if (vmode == V2DImode) + emit_insn_before (gen_vec_interleave_lowv2di (copy_rtx_if_shared (tmp), + copy_rtx_if_shared (tmp), + copy_rtx_if_shared (tmp)), + insn); + else if (vmode == V4SImode) + emit_insn_before (gen_sse2_pshufd (copy_rtx_if_shared (tmp), + copy_rtx_if_shared (tmp), + const0_rtx), + insn); + + return gen_rtx_UNSPEC (CCmode, gen_rtvec (2, copy_rtx_if_shared (tmp), + copy_rtx_if_shared (tmp)), + UNSPEC_PTEST); +} + /* Convert INSN to vector mode. */ void @@ -1090,19 +1131,8 @@ general_scalar_chain::convert_insn (rtx_insn *insn) break; case COMPARE: - src = SUBREG_REG (XEXP (XEXP (src, 0), 0)); - - gcc_assert (REG_P (src) && GET_MODE (src) == DImode); - subreg = gen_rtx_SUBREG (V2DImode, src, 0); - emit_insn_before (gen_vec_interleave_lowv2di - (copy_rtx_if_shared (subreg), - copy_rtx_if_shared (subreg), - copy_rtx_if_shared (subreg)), - insn); dst = gen_rtx_REG (CCmode, FLAGS_REG); - src = gen_rtx_UNSPEC (CCmode, gen_rtvec (2, copy_rtx_if_shared (subreg), - copy_rtx_if_shared (subreg)), - UNSPEC_PTEST); + src = convert_compare (XEXP (src, 0), XEXP (src, 1), insn); break; case CONST_INT: @@ -1339,20 +1369,14 @@ pseudo_reg_set (rtx_insn *insn) return set; } -/* Check if comparison INSN may be transformed - into vector comparison. Currently we transform - zero checks only which look like: - - (set (reg:CCZ 17 flags) - (compare:CCZ (ior:SI (subreg:SI (reg:DI x) 4) - (subreg:SI (reg:DI x) 0)) - (const_int 0 [0]))) */ +/* Check if comparison INSN may be transformed into vector comparison. + Currently we transform equality/inequality checks which look like: + (set (reg:CCZ 17 flags) (compare:CCZ (reg:TI x) (reg:TI y))) */ static bool convertible_comparison_p (rtx_insn *insn, enum machine_mode mode) { - /* ??? Currently convertible for double-word DImode chain only. */ - if (TARGET_64BIT || mode != DImode) + if (mode != (TARGET_64BIT ? TImode : DImode)) return false; if (!TARGET_SSE4_1) @@ -1375,31 +1399,14 @@ convertible_comparison_p (rtx_insn *insn, enum machine_mode mode) rtx op1 = XEXP (src, 0); rtx op2 = XEXP (src, 1); - if (op2 != CONST0_RTX (GET_MODE (op2))) + if (!CONST_INT_P (op1) + && ((!REG_P (op1) && !MEM_P (op1)) + || GET_MODE (op1) != mode)) return false; - if (GET_CODE (op1) != IOR) - return false; - - op2 = XEXP (op1, 1); - op1 = XEXP (op1, 0); - - if (!SUBREG_P (op1) - || !SUBREG_P (op2) - || GET_MODE (op1) != SImode - || GET_MODE (op2) != SImode - || ((SUBREG_BYTE (op1) != 0 - || SUBREG_BYTE (op2) != GET_MODE_SIZE (SImode)) - && (SUBREG_BYTE (op2) != 0 - || SUBREG_BYTE (op1) != GET_MODE_SIZE (SImode)))) - return false; - - op1 = SUBREG_REG (op1); - op2 = SUBREG_REG (op2); - - if (op1 != op2 - || !REG_P (op1) - || GET_MODE (op1) != DImode) + if (!CONST_INT_P (op2) + && ((!REG_P (op2) && !MEM_P (op2)) + || GET_MODE (op2) != mode)) return false; return true; diff --git a/gcc/config/i386/i386-features.h b/gcc/config/i386/i386-features.h index 5c30760..891cb46 100644 --- a/gcc/config/i386/i386-features.h +++ b/gcc/config/i386/i386-features.h @@ -181,6 +181,7 @@ class general_scalar_chain : public scalar_chain void convert_reg (rtx_insn *insn, rtx dst, rtx src); void make_vector_copies (rtx_insn *, rtx); void convert_registers (); + rtx convert_compare (rtx op1, rtx op2, rtx_insn *insn); int vector_const_cost (rtx exp); }; diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index c74edd1..aadc2dd 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -1340,14 +1340,20 @@ (define_expand "cstore4" [(set (reg:CC FLAGS_REG) - (compare:CC (match_operand:SWIM 2 "nonimmediate_operand") - (match_operand:SWIM 3 ""))) + (compare:CC (match_operand:SDWIM 2 "nonimmediate_operand") + (match_operand:SDWIM 3 ""))) (set (match_operand:QI 0 "register_operand") (match_operator 1 "ordered_comparison_operator" [(reg:CC FLAGS_REG) (const_int 0)]))] "" { - if (MEM_P (operands[2]) && MEM_P (operands[3])) + if (mode == (TARGET_64BIT ? TImode : DImode)) + { + if (GET_CODE (operands[1]) != EQ + && GET_CODE (operands[1]) != NE) + FAIL; + } + else if (MEM_P (operands[2]) && MEM_P (operands[3])) operands[2] = force_reg (mode, operands[2]); ix86_expand_setcc (operands[0], GET_CODE (operands[1]), operands[2], operands[3]); @@ -1483,6 +1489,52 @@ [(set_attr "type" "icmp") (set_attr "mode" "QI")]) +(define_insn_and_split "*cmp_doubleword" + [(set (reg:CCZ FLAGS_REG) + (compare:CCZ (match_operand: 0 "nonimmediate_operand") + (match_operand: 1 "x86_64_general_operand")))] + "ix86_pre_reload_split ()" + "#" + "&& 1" + [(parallel [(set (reg:CCZ FLAGS_REG) + (compare:CCZ (ior:DWIH (match_dup 4) (match_dup 5)) + (const_int 0))) + (set (match_dup 4) (ior:DWIH (match_dup 4) (match_dup 5)))])] +{ + split_double_mode (mode, &operands[0], 2, &operands[0], &operands[2]); + /* Placing the SUBREG pieces in pseudos helps reload. */ + for (int i = 0; i < 4; i++) + if (SUBREG_P (operands[i])) + operands[i] = force_reg (mode, operands[i]); + + operands[4] = gen_reg_rtx (mode); + if (operands[1] == const0_rtx) + emit_move_insn (operands[4], operands[0]); + else if (operands[0] == const0_rtx) + emit_move_insn (operands[4], operands[1]); + else if (operands[1] == constm1_rtx) + emit_insn (gen_one_cmpl2 (operands[4], operands[0])); + else if (operands[0] == constm1_rtx) + emit_insn (gen_one_cmpl2 (operands[4], operands[1])); + else + emit_insn (gen_xor3 (operands[4], operands[0], operands[1])); + + if (operands[3] == const0_rtx) + operands[5] = operands[2]; + else if (operands[2] == const0_rtx) + operands[5] = operands[3]; + else + { + operands[5] = gen_reg_rtx (mode); + if (operands[3] == constm1_rtx) + emit_insn (gen_one_cmpl2 (operands[5], operands[2])); + else if (operands[2] == constm1_rtx) + emit_insn (gen_one_cmpl2 (operands[5], operands[3])); + else + emit_insn (gen_xor3 (operands[5], operands[2], operands[3])); + } +}) + ;; These implement float point compares. ;; %%% See if we can get away with VOIDmode operands on the actual insns, ;; which would allow mix and match FP modes on the compares. Which is what diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index a852c16..e2e38ef 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -469,9 +469,9 @@ ;; All DImode vector integer modes (define_mode_iterator V_AVX - [V16QI V8HI V4SI V2DI V4SF V2DF + [V16QI V8HI V4SI V2DI V1TI V4SF V2DF (V32QI "TARGET_AVX") (V16HI "TARGET_AVX") - (V8SI "TARGET_AVX") (V4DI "TARGET_AVX") + (V8SI "TARGET_AVX") (V4DI "TARGET_AVX") (V2TI "TARGET_AVX") (V8SF "TARGET_AVX") (V4DF"TARGET_AVX")]) (define_mode_iterator VI48_AVX @@ -893,6 +893,7 @@ [(V4SF "sse4_1") (V2DF "sse4_1") (V8SF "avx") (V4DF "avx") (V8DF "avx512f") + (V2TI "avx") (V1TI "sse4_1") (V4DI "avx") (V2DI "sse4_1") (V8SI "avx") (V4SI "sse4_1") (V16QI "sse4_1") (V32QI "avx") diff --git a/gcc/testsuite/gcc.target/i386/pr70321.c b/gcc/testsuite/gcc.target/i386/pr70321.c new file mode 100644 index 0000000..eaba728 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr70321.c @@ -0,0 +1,10 @@ +/* { dg-do compile { target ia32 } } */ +/* { dg-options "-O2" } */ + +void foo (long long ixi) +{ + if (ixi != 14348907) + __builtin_abort (); +} + +/* { dg-final { scan-assembler-times "mov" 1 } } */ diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-stv-1.c b/gcc/testsuite/gcc.target/i386/sse4_1-stv-1.c new file mode 100644 index 0000000..9486d0c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/sse4_1-stv-1.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target ia32 } } */ +/* { dg-options "-O2 -msse4.1" } */ +long long a[1024]; +long long b[1024]; + +int foo() +{ + for (int i=0; i<1024; i++) + { + long long t = (a[i]<<8) | (b[i]<<24); + if (t == 0) + return 1; + } + return 0; +} + +/* { dg-final { scan-assembler "ptest" } } */ +/* { dg-final { scan-assembler-not "pxor" } } */