diff mbox series

[x86,SSE] PR target/115751: Avoid force_reg in ix86_expand_ternlog.

Message ID 01d101dace43$91fae2c0$b5f0a840$@nextmovesoftware.com
State New
Headers show
Series [x86,SSE] PR target/115751: Avoid force_reg in ix86_expand_ternlog. | expand

Commit Message

Roger Sayle July 4, 2024, 6:54 p.m. UTC
This patch fixes a problem with splitting of complex AVX512 ternlog
instructions on x86_64.  A recent change allows the ternlog pattern
to have multiple mem-like operands prior to reload, by emitting any
"reloads" as necessary during split1, before register allocation.
The issue is that this code calls force_reg to place the mem-like
operand into a register, but unfortunately the vec_duplicate (broadcast)
form of operands supported by ternlog isn't considered a "general_operand",
i.e. supported by all instructions.  This mismatch triggers an ICE in
the middle-end's force_reg, even though the x86 supports loading these
vec_duplicate operands into a vector register in a single (move)
instruction.

This patch resolves this problem by replacing force_reg with calls
to gen_reg_rtx and emit_move (as the i386 backend, unlike the middle-end,
knows these will be recognized by recog).

I'll admit that I've been unable to reproduce this error without a
testcase, but my assumption when developing the previous patch was
that was safe to call force_reg on a vec_duplicate, which this PR
shows to be wrong (currently).  I'll let smarter minds pronounce on
whether changing i386.md's definition of general_operand may be an
alternate solution, but such a change can be independent of this fix.
[I've a related patch to expand the CONST_VECTORs allowed in
ix86_legitimate_constant_p before reload, but keeping everything
happy is tricky].

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2024-07-04  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        PR target/115751
        * config/i386/i386-expand.c (ix86_expand_ternlog): Avoid use of
        force_reg to "reload" non-register operands, as these may contain
        vec_duplicate (broadcast) operands that aren't supported by
        force_reg.  Use (safer) gen_reg_rtx and emit_move instead.


Thanks in advance (sorry for the inconvenience),
Roger
--

Comments

Hongtao Liu July 5, 2024, 12:06 a.m. UTC | #1
On Fri, Jul 5, 2024 at 2:54 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch fixes a problem with splitting of complex AVX512 ternlog
> instructions on x86_64.  A recent change allows the ternlog pattern
> to have multiple mem-like operands prior to reload, by emitting any
> "reloads" as necessary during split1, before register allocation.
> The issue is that this code calls force_reg to place the mem-like
> operand into a register, but unfortunately the vec_duplicate (broadcast)
> form of operands supported by ternlog isn't considered a "general_operand",
> i.e. supported by all instructions.  This mismatch triggers an ICE in
> the middle-end's force_reg, even though the x86 supports loading these
> vec_duplicate operands into a vector register in a single (move)
> instruction.
>
> This patch resolves this problem by replacing force_reg with calls
> to gen_reg_rtx and emit_move (as the i386 backend, unlike the middle-end,
> knows these will be recognized by recog).
>
> I'll admit that I've been unable to reproduce this error without a
> testcase, but my assumption when developing the previous patch was
> that was safe to call force_reg on a vec_duplicate, which this PR
> shows to be wrong (currently).  I'll let smarter minds pronounce on
> whether changing i386.md's definition of general_operand may be an
> alternate solution, but such a change can be independent of this fix.
> [I've a related patch to expand the CONST_VECTORs allowed in
> ix86_legitimate_constant_p before reload, but keeping everything
> happy is tricky].
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
Ok.
>
>
> 2024-07-04  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR target/115751
>         * config/i386/i386-expand.c (ix86_expand_ternlog): Avoid use of
>         force_reg to "reload" non-register operands, as these may contain
>         vec_duplicate (broadcast) operands that aren't supported by
>         force_reg.  Use (safer) gen_reg_rtx and emit_move instead.
>
>
> Thanks in advance (sorry for the inconvenience),
> Roger
> --
>
Hongtao Liu July 5, 2024, 12:34 a.m. UTC | #2
On Fri, Jul 5, 2024 at 8:06 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Fri, Jul 5, 2024 at 2:54 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
> >
> >
> > This patch fixes a problem with splitting of complex AVX512 ternlog
> > instructions on x86_64.  A recent change allows the ternlog pattern
> > to have multiple mem-like operands prior to reload, by emitting any
> > "reloads" as necessary during split1, before register allocation.
> > The issue is that this code calls force_reg to place the mem-like
> > operand into a register, but unfortunately the vec_duplicate (broadcast)
> > form of operands supported by ternlog isn't considered a "general_operand",
> > i.e. supported by all instructions.  This mismatch triggers an ICE in
> > the middle-end's force_reg, even though the x86 supports loading these
> > vec_duplicate operands into a vector register in a single (move)
> > instruction.
> >
> > This patch resolves this problem by replacing force_reg with calls
> > to gen_reg_rtx and emit_move (as the i386 backend, unlike the middle-end,
> > knows these will be recognized by recog).
> >
> > I'll admit that I've been unable to reproduce this error without a
> > testcase, but my assumption when developing the previous patch was
> > that was safe to call force_reg on a vec_duplicate, which this PR
> > shows to be wrong (currently).  I'll let smarter minds pronounce on
> > whether changing i386.md's definition of general_operand may be an
> > alternate solution, but such a change can be independent of this fix.
> > [I've a related patch to expand the CONST_VECTORs allowed in
> > ix86_legitimate_constant_p before reload, but keeping everything
> > happy is tricky].
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> Ok.
There're also some other places that use force_reg or gen_lowpart in
switch (idx & 0xff), I think they're also buggy when op is
vec_duplicate.

> >
> >
> > 2024-07-04  Roger Sayle  <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         PR target/115751
> >         * config/i386/i386-expand.c (ix86_expand_ternlog): Avoid use of
> >         force_reg to "reload" non-register operands, as these may contain
> >         vec_duplicate (broadcast) operands that aren't supported by
> >         force_reg.  Use (safer) gen_reg_rtx and emit_move instead.
> >
> >
> > Thanks in advance (sorry for the inconvenience),
> > Roger
> > --
> >
>
>
> --
> BR,
> Hongtao
diff mbox series

Patch

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index dd2c3a8..178dd0b 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -26049,14 +26049,25 @@  ix86_expand_ternlog (machine_mode mode, rtx op0, rtx op1, rtx op2, int idx,
       break;
     }
 
-  tmp0 = register_operand (op0, mode) ? op0 : force_reg (mode, op0);
+  if (!register_operand (op0, mode))
+    {
+      /* We can't use force_reg (mode, op0).  */
+      tmp0 = gen_reg_rtx (GET_MODE (op0));
+      emit_move_insn (tmp0,op0);
+    }
+  else
+    tmp0 = op0;
   if (GET_MODE (tmp0) != mode)
     tmp0 = gen_lowpart (mode, tmp0);
 
   if (!op1 || rtx_equal_p (op0, op1))
     tmp1 = copy_rtx (tmp0);
   else if (!register_operand (op1, mode))
-    tmp1 = force_reg (mode, op1);
+    {
+      /* We can't use force_reg (mode, op1).  */
+      tmp1 = gen_reg_rtx (GET_MODE (op1));
+      emit_move_insn (tmp1, op1);
+    }
   else
     tmp1 = op1;
   if (GET_MODE (tmp1) != mode)