Message ID | 01d101dace43$91fae2c0$b5f0a840$@nextmovesoftware.com |
---|---|
State | New |
Headers | show |
Series | [x86,SSE] PR target/115751: Avoid force_reg in ix86_expand_ternlog. | expand |
On Fri, Jul 5, 2024 at 2:54 AM Roger Sayle <roger@nextmovesoftware.com> wrote: > > > This patch fixes a problem with splitting of complex AVX512 ternlog > instructions on x86_64. A recent change allows the ternlog pattern > to have multiple mem-like operands prior to reload, by emitting any > "reloads" as necessary during split1, before register allocation. > The issue is that this code calls force_reg to place the mem-like > operand into a register, but unfortunately the vec_duplicate (broadcast) > form of operands supported by ternlog isn't considered a "general_operand", > i.e. supported by all instructions. This mismatch triggers an ICE in > the middle-end's force_reg, even though the x86 supports loading these > vec_duplicate operands into a vector register in a single (move) > instruction. > > This patch resolves this problem by replacing force_reg with calls > to gen_reg_rtx and emit_move (as the i386 backend, unlike the middle-end, > knows these will be recognized by recog). > > I'll admit that I've been unable to reproduce this error without a > testcase, but my assumption when developing the previous patch was > that was safe to call force_reg on a vec_duplicate, which this PR > shows to be wrong (currently). I'll let smarter minds pronounce on > whether changing i386.md's definition of general_operand may be an > alternate solution, but such a change can be independent of this fix. > [I've a related patch to expand the CONST_VECTORs allowed in > ix86_legitimate_constant_p before reload, but keeping everything > happy is tricky]. > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32} > with no new failures. Ok for mainline? Ok. > > > 2024-07-04 Roger Sayle <roger@nextmovesoftware.com> > > gcc/ChangeLog > PR target/115751 > * config/i386/i386-expand.c (ix86_expand_ternlog): Avoid use of > force_reg to "reload" non-register operands, as these may contain > vec_duplicate (broadcast) operands that aren't supported by > force_reg. Use (safer) gen_reg_rtx and emit_move instead. > > > Thanks in advance (sorry for the inconvenience), > Roger > -- >
On Fri, Jul 5, 2024 at 8:06 AM Hongtao Liu <crazylht@gmail.com> wrote: > > On Fri, Jul 5, 2024 at 2:54 AM Roger Sayle <roger@nextmovesoftware.com> wrote: > > > > > > This patch fixes a problem with splitting of complex AVX512 ternlog > > instructions on x86_64. A recent change allows the ternlog pattern > > to have multiple mem-like operands prior to reload, by emitting any > > "reloads" as necessary during split1, before register allocation. > > The issue is that this code calls force_reg to place the mem-like > > operand into a register, but unfortunately the vec_duplicate (broadcast) > > form of operands supported by ternlog isn't considered a "general_operand", > > i.e. supported by all instructions. This mismatch triggers an ICE in > > the middle-end's force_reg, even though the x86 supports loading these > > vec_duplicate operands into a vector register in a single (move) > > instruction. > > > > This patch resolves this problem by replacing force_reg with calls > > to gen_reg_rtx and emit_move (as the i386 backend, unlike the middle-end, > > knows these will be recognized by recog). > > > > I'll admit that I've been unable to reproduce this error without a > > testcase, but my assumption when developing the previous patch was > > that was safe to call force_reg on a vec_duplicate, which this PR > > shows to be wrong (currently). I'll let smarter minds pronounce on > > whether changing i386.md's definition of general_operand may be an > > alternate solution, but such a change can be independent of this fix. > > [I've a related patch to expand the CONST_VECTORs allowed in > > ix86_legitimate_constant_p before reload, but keeping everything > > happy is tricky]. > > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > > and make -k check, both with and without --target_board=unix{-m32} > > with no new failures. Ok for mainline? > Ok. There're also some other places that use force_reg or gen_lowpart in switch (idx & 0xff), I think they're also buggy when op is vec_duplicate. > > > > > > 2024-07-04 Roger Sayle <roger@nextmovesoftware.com> > > > > gcc/ChangeLog > > PR target/115751 > > * config/i386/i386-expand.c (ix86_expand_ternlog): Avoid use of > > force_reg to "reload" non-register operands, as these may contain > > vec_duplicate (broadcast) operands that aren't supported by > > force_reg. Use (safer) gen_reg_rtx and emit_move instead. > > > > > > Thanks in advance (sorry for the inconvenience), > > Roger > > -- > > > > > -- > BR, > Hongtao
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc index dd2c3a8..178dd0b 100644 --- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -26049,14 +26049,25 @@ ix86_expand_ternlog (machine_mode mode, rtx op0, rtx op1, rtx op2, int idx, break; } - tmp0 = register_operand (op0, mode) ? op0 : force_reg (mode, op0); + if (!register_operand (op0, mode)) + { + /* We can't use force_reg (mode, op0). */ + tmp0 = gen_reg_rtx (GET_MODE (op0)); + emit_move_insn (tmp0,op0); + } + else + tmp0 = op0; if (GET_MODE (tmp0) != mode) tmp0 = gen_lowpart (mode, tmp0); if (!op1 || rtx_equal_p (op0, op1)) tmp1 = copy_rtx (tmp0); else if (!register_operand (op1, mode)) - tmp1 = force_reg (mode, op1); + { + /* We can't use force_reg (mode, op1). */ + tmp1 = gen_reg_rtx (GET_MODE (op1)); + emit_move_insn (tmp1, op1); + } else tmp1 = op1; if (GET_MODE (tmp1) != mode)