diff mbox series

[x86] PR target/115351: RTX costs for *concatditi3 and *insvti_highpart.

Message ID 005301dab8bc$1e287020$5a795060$@nextmovesoftware.com
State New
Headers show
Series [x86] PR target/115351: RTX costs for *concatditi3 and *insvti_highpart. | expand

Commit Message

Roger Sayle June 7, 2024, 9:21 a.m. UTC
This patch addresses PR target/115351, which is a code quality regression
on x86 when passing floating point complex numbers.  The ABI considers
these arguments to have TImode, requiring interunit moves to place the
FP values (which are actually passed in SSE registers) into the upper
and lower parts of a TImode pseudo, and then similar moves back again
before they can be used.

The cause of the regression is that changes in how TImode initialization
is represented in RTL now prevents the RTL optimizers from eliminating
these redundant moves.  The specific cause is that the *concatditi3
pattern, (zext(hi)<<64)|zext(lo), has an inappropriately high (default)
rtx_cost, preventing fwprop1 from propagating it.  This pattern just
sets the hipart and lopart of a double-word register, typically two
instructions (less if reload can allocate things appropriately) but
the current ix86_rtx_costs actually returns INSN_COSTS(13), i.e. 52.

propagating insn 5 into insn 6, replacing:
(set (reg:TI 110)
    (ior:TI (and:TI (reg:TI 110)
            (const_wide_int 0x0ffffffffffffffff))
        (ashift:TI (zero_extend:TI (subreg:DI (reg:DF 112 [ zD.2796+8 ]) 0))
            (const_int 64 [0x40]))))
successfully matched this instruction to *concatditi3_3:
(set (reg:TI 110)
    (ior:TI (ashift:TI (zero_extend:TI (subreg:DI (reg:DF 112 [ zD.2796+8 ])
0))
            (const_int 64 [0x40]))
        (zero_extend:TI (subreg:DI (reg:DF 111 [ zD.2796 ]) 0))))
change not profitable (cost 50 -> cost 52)

This issue is resolved by having ix86_rtx_costs return more reasonable
values for these (place-holder) patterns.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2024-06-07  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        PR target/115351
        * config/i386/i386.cc (ix86_rtx_costs): Provide estimates for the
        *concatditi3 and *insvti_highpart patterns, about two insns.

gcc/testsuite/ChangeLog
        PR target/115351
        * g++.target/i386/pr115351.C: New test case.


Thanks in advance (and sorry for any inconvenience),
Roger
--

Comments

Uros Bizjak June 7, 2024, 11:09 a.m. UTC | #1
On Fri, Jun 7, 2024 at 11:21 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch addresses PR target/115351, which is a code quality regression
> on x86 when passing floating point complex numbers.  The ABI considers
> these arguments to have TImode, requiring interunit moves to place the
> FP values (which are actually passed in SSE registers) into the upper
> and lower parts of a TImode pseudo, and then similar moves back again
> before they can be used.
>
> The cause of the regression is that changes in how TImode initialization
> is represented in RTL now prevents the RTL optimizers from eliminating
> these redundant moves.  The specific cause is that the *concatditi3
> pattern, (zext(hi)<<64)|zext(lo), has an inappropriately high (default)
> rtx_cost, preventing fwprop1 from propagating it.  This pattern just
> sets the hipart and lopart of a double-word register, typically two
> instructions (less if reload can allocate things appropriately) but
> the current ix86_rtx_costs actually returns INSN_COSTS(13), i.e. 52.
>
> propagating insn 5 into insn 6, replacing:
> (set (reg:TI 110)
>     (ior:TI (and:TI (reg:TI 110)
>             (const_wide_int 0x0ffffffffffffffff))
>         (ashift:TI (zero_extend:TI (subreg:DI (reg:DF 112 [ zD.2796+8 ]) 0))
>             (const_int 64 [0x40]))))
> successfully matched this instruction to *concatditi3_3:
> (set (reg:TI 110)
>     (ior:TI (ashift:TI (zero_extend:TI (subreg:DI (reg:DF 112 [ zD.2796+8 ])
> 0))
>             (const_int 64 [0x40]))
>         (zero_extend:TI (subreg:DI (reg:DF 111 [ zD.2796 ]) 0))))
> change not profitable (cost 50 -> cost 52)
>
> This issue is resolved by having ix86_rtx_costs return more reasonable
> values for these (place-holder) patterns.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2024-06-07  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR target/115351
>         * config/i386/i386.cc (ix86_rtx_costs): Provide estimates for the
>         *concatditi3 and *insvti_highpart patterns, about two insns.
>
> gcc/testsuite/ChangeLog
>         PR target/115351
>         * g++.target/i386/pr115351.C: New test case.

LGTM.

Thanks,
Uros.

>
>
> Thanks in advance (and sorry for any inconvenience),
> Roger
> --
>
diff mbox series

Patch

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 69cd4ae..9d08dc7 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -21868,6 +21868,49 @@  ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
 	    }
 	  *total = ix86_vec_cost (mode, cost->sse_op);
 	}
+      else if (TARGET_64BIT
+	       && mode == TImode
+	       && GET_CODE (XEXP (x, 0)) == ASHIFT
+	       && GET_CODE (XEXP (XEXP (x, 0), 0)) == ZERO_EXTEND
+	       && GET_MODE (XEXP (XEXP (XEXP (x, 0), 0), 0)) == DImode
+	       && CONST_INT_P (XEXP (XEXP (x, 0), 1))
+	       && INTVAL (XEXP (XEXP (x, 0), 1)) == 64
+	       && GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
+	       && GET_MODE (XEXP (XEXP (x, 1), 0)) == DImode)
+	{
+	  /* *concatditi3 is cheap.  */
+	  rtx op0 = XEXP (XEXP (XEXP (x, 0), 0), 0);
+	  rtx op1 = XEXP (XEXP (x, 1), 0);
+	  *total = (SUBREG_P (op0) && GET_MODE (SUBREG_REG (op0)) == DFmode)
+		   ? COSTS_N_INSNS (1)    /* movq.  */
+		   : set_src_cost (op0, DImode, speed);
+	  *total += (SUBREG_P (op1) && GET_MODE (SUBREG_REG (op1)) == DFmode)
+		    ? COSTS_N_INSNS (1)    /* movq.  */
+		    : set_src_cost (op1, DImode, speed);
+	  return true;
+	}
+      else if (TARGET_64BIT
+	       && mode == TImode
+	       && GET_CODE (XEXP (x, 0)) == AND
+	       && REG_P (XEXP (XEXP (x, 0), 0))
+	       && CONST_WIDE_INT_P (XEXP (XEXP (x, 0), 1))
+	       && CONST_WIDE_INT_NUNITS (XEXP (XEXP (x, 0), 1)) == 2
+	       && CONST_WIDE_INT_ELT (XEXP (XEXP (x, 0), 1), 0) == -1
+	       && CONST_WIDE_INT_ELT (XEXP (XEXP (x, 0), 1), 1) == 0
+	       && GET_CODE (XEXP (x, 1)) == ASHIFT
+	       && GET_CODE (XEXP (XEXP (x, 1), 0)) == ZERO_EXTEND
+	       && GET_MODE (XEXP (XEXP (XEXP (x, 1), 0), 0)) == DImode
+	       && CONST_INT_P (XEXP (XEXP (x, 1), 1))
+	       && INTVAL (XEXP (XEXP (x, 1), 1)) == 64)
+	{
+	  /* *insvti_highpart is cheap.  */
+	  rtx op = XEXP (XEXP (XEXP (x, 1), 0), 0);
+	  *total = COSTS_N_INSNS (1) + 1;
+	  *total += (SUBREG_P (op) && GET_MODE (SUBREG_REG (op)) == DFmode)
+		    ? COSTS_N_INSNS (1)    /* movq.  */
+		    : set_src_cost (op, DImode, speed);
+	  return true;
+	}
       else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
 	*total = cost->add * 2;
       else
diff --git a/gcc/testsuite/g++.target/i386/pr115351.C b/gcc/testsuite/g++.target/i386/pr115351.C
new file mode 100644
index 0000000..24132f3
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr115351.C
@@ -0,0 +1,19 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3 -std=c++11" } */
+
+struct complex
+{
+    double real;
+    double imag;
+};
+
+complex blub(complex z)
+{
+  return {
+	   z.real * z.real - z.imag * z.imag,
+	   2 * z.real * z.imag
+	  };
+}
+
+/* { dg-final { scan-assembler-not "movq" } } */
+/* { dg-final { scan-assembler-not "xchg" } } */