Message ID | 005301dab8bc$1e287020$5a795060$@nextmovesoftware.com |
---|---|
State | New |
Headers | show |
Series | [x86] PR target/115351: RTX costs for *concatditi3 and *insvti_highpart. | expand |
On Fri, Jun 7, 2024 at 11:21 AM Roger Sayle <roger@nextmovesoftware.com> wrote: > > > This patch addresses PR target/115351, which is a code quality regression > on x86 when passing floating point complex numbers. The ABI considers > these arguments to have TImode, requiring interunit moves to place the > FP values (which are actually passed in SSE registers) into the upper > and lower parts of a TImode pseudo, and then similar moves back again > before they can be used. > > The cause of the regression is that changes in how TImode initialization > is represented in RTL now prevents the RTL optimizers from eliminating > these redundant moves. The specific cause is that the *concatditi3 > pattern, (zext(hi)<<64)|zext(lo), has an inappropriately high (default) > rtx_cost, preventing fwprop1 from propagating it. This pattern just > sets the hipart and lopart of a double-word register, typically two > instructions (less if reload can allocate things appropriately) but > the current ix86_rtx_costs actually returns INSN_COSTS(13), i.e. 52. > > propagating insn 5 into insn 6, replacing: > (set (reg:TI 110) > (ior:TI (and:TI (reg:TI 110) > (const_wide_int 0x0ffffffffffffffff)) > (ashift:TI (zero_extend:TI (subreg:DI (reg:DF 112 [ zD.2796+8 ]) 0)) > (const_int 64 [0x40])))) > successfully matched this instruction to *concatditi3_3: > (set (reg:TI 110) > (ior:TI (ashift:TI (zero_extend:TI (subreg:DI (reg:DF 112 [ zD.2796+8 ]) > 0)) > (const_int 64 [0x40])) > (zero_extend:TI (subreg:DI (reg:DF 111 [ zD.2796 ]) 0)))) > change not profitable (cost 50 -> cost 52) > > This issue is resolved by having ix86_rtx_costs return more reasonable > values for these (place-holder) patterns. > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32} > with no new failures. Ok for mainline? > > > 2024-06-07 Roger Sayle <roger@nextmovesoftware.com> > > gcc/ChangeLog > PR target/115351 > * config/i386/i386.cc (ix86_rtx_costs): Provide estimates for the > *concatditi3 and *insvti_highpart patterns, about two insns. > > gcc/testsuite/ChangeLog > PR target/115351 > * g++.target/i386/pr115351.C: New test case. LGTM. Thanks, Uros. > > > Thanks in advance (and sorry for any inconvenience), > Roger > -- >
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index 69cd4ae..9d08dc7 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -21868,6 +21868,49 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno, } *total = ix86_vec_cost (mode, cost->sse_op); } + else if (TARGET_64BIT + && mode == TImode + && GET_CODE (XEXP (x, 0)) == ASHIFT + && GET_CODE (XEXP (XEXP (x, 0), 0)) == ZERO_EXTEND + && GET_MODE (XEXP (XEXP (XEXP (x, 0), 0), 0)) == DImode + && CONST_INT_P (XEXP (XEXP (x, 0), 1)) + && INTVAL (XEXP (XEXP (x, 0), 1)) == 64 + && GET_CODE (XEXP (x, 1)) == ZERO_EXTEND + && GET_MODE (XEXP (XEXP (x, 1), 0)) == DImode) + { + /* *concatditi3 is cheap. */ + rtx op0 = XEXP (XEXP (XEXP (x, 0), 0), 0); + rtx op1 = XEXP (XEXP (x, 1), 0); + *total = (SUBREG_P (op0) && GET_MODE (SUBREG_REG (op0)) == DFmode) + ? COSTS_N_INSNS (1) /* movq. */ + : set_src_cost (op0, DImode, speed); + *total += (SUBREG_P (op1) && GET_MODE (SUBREG_REG (op1)) == DFmode) + ? COSTS_N_INSNS (1) /* movq. */ + : set_src_cost (op1, DImode, speed); + return true; + } + else if (TARGET_64BIT + && mode == TImode + && GET_CODE (XEXP (x, 0)) == AND + && REG_P (XEXP (XEXP (x, 0), 0)) + && CONST_WIDE_INT_P (XEXP (XEXP (x, 0), 1)) + && CONST_WIDE_INT_NUNITS (XEXP (XEXP (x, 0), 1)) == 2 + && CONST_WIDE_INT_ELT (XEXP (XEXP (x, 0), 1), 0) == -1 + && CONST_WIDE_INT_ELT (XEXP (XEXP (x, 0), 1), 1) == 0 + && GET_CODE (XEXP (x, 1)) == ASHIFT + && GET_CODE (XEXP (XEXP (x, 1), 0)) == ZERO_EXTEND + && GET_MODE (XEXP (XEXP (XEXP (x, 1), 0), 0)) == DImode + && CONST_INT_P (XEXP (XEXP (x, 1), 1)) + && INTVAL (XEXP (XEXP (x, 1), 1)) == 64) + { + /* *insvti_highpart is cheap. */ + rtx op = XEXP (XEXP (XEXP (x, 1), 0), 0); + *total = COSTS_N_INSNS (1) + 1; + *total += (SUBREG_P (op) && GET_MODE (SUBREG_REG (op)) == DFmode) + ? COSTS_N_INSNS (1) /* movq. */ + : set_src_cost (op, DImode, speed); + return true; + } else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD) *total = cost->add * 2; else diff --git a/gcc/testsuite/g++.target/i386/pr115351.C b/gcc/testsuite/g++.target/i386/pr115351.C new file mode 100644 index 0000000..24132f3 --- /dev/null +++ b/gcc/testsuite/g++.target/i386/pr115351.C @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -std=c++11" } */ + +struct complex +{ + double real; + double imag; +}; + +complex blub(complex z) +{ + return { + z.real * z.real - z.imag * z.imag, + 2 * z.real * z.imag + }; +} + +/* { dg-final { scan-assembler-not "movq" } } */ +/* { dg-final { scan-assembler-not "xchg" } } */