
[1/2,Middle-end] Canonicalize (vec_merge (fma op2 op1 op3) op1 mask) to (vec_merge (fma op1 op2 op3) op1 mask).

Message ID 20241015032955.3677006-2-hongtao.liu@intel.com
State New
Series Canonicalize (vec_merge (fma op2 op1 op3) op1 mask) to (vec_merge (fma op1 op2 op3) op1 mask)

Commit Message

Liu, Hongtao Oct. 15, 2024, 3:29 a.m. UTC
For x86 masked fma, there are two RTL representations:
1) (vec_merge (fma op2 op1 op3) op1 mask)
2) (vec_merge (fma op1 op2 op3) op1 mask).

 5894(define_insn "<avx512>_fmadd_<mode>_mask<round_name>"
 5895  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v")
 5896        (vec_merge:VFH_AVX512VL
 5897          (fma:VFH_AVX512VL
 5898            (match_operand:VFH_AVX512VL 1 "nonimmediate_operand" "0,0")
 5899            (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v")
 5900            (match_operand:VFH_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>"))
 5901          (match_dup 1)
 5902          (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk,Yk")))]
 5903  "TARGET_AVX512F && <round_mode_condition>"
 5904  "@
 5905   vfmadd132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
 5906   vfmadd213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
 5907  [(set_attr "type" "ssemuladd")
 5908   (set_attr "prefix" "evex")
 5909   (set_attr "mode" "<MODE>")])

Here op1 has constraint "0", and the second op1 is (match_dup 1).
We once tried to replace it with (match_operand:M 5
"nonimmediate_operand" "0") to enable more flexibility for pattern
matching and recog, but it triggered an ICE in reload (reload can handle
at most one operand with a "0" constraint).

So we need to either add two patterns in the backend or just do the
canonicalization in the middle-end.

gcc/ChangeLog:

	* combine.cc (maybe_swap_commutative_operands):
	Canonicalize (vec_merge (fma op2 op1 op3) op1 mask)
	to (vec_merge (fma op1 op2 op3) op1 mask).
---
 gcc/combine.cc | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

Comments

Richard Biener Oct. 15, 2024, 12:02 p.m. UTC | #1
On Tue, Oct 15, 2024 at 5:30 AM liuhongt <hongtao.liu@intel.com> wrote:
>
> For x86 masked fma, there are two RTL representations:
> 1) (vec_merge (fma op2 op1 op3) op1 mask)
> 2) (vec_merge (fma op1 op2 op3) op1 mask).
>
>  5894(define_insn "<avx512>_fmadd_<mode>_mask<round_name>"
>  5895  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v")
>  5896        (vec_merge:VFH_AVX512VL
>  5897          (fma:VFH_AVX512VL
>  5898            (match_operand:VFH_AVX512VL 1 "nonimmediate_operand" "0,0")
>  5899            (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v")
>  5900            (match_operand:VFH_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>"))
>  5901          (match_dup 1)
>  5902          (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk,Yk")))]
>  5903  "TARGET_AVX512F && <round_mode_condition>"
>  5904  "@
>  5905   vfmadd132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
>  5906   vfmadd213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
>  5907  [(set_attr "type" "ssemuladd")
>  5908   (set_attr "prefix" "evex")
>  5909   (set_attr "mode" "<MODE>")])
>
> Here op1 has constraint "0", and the second op1 is (match_dup 1).
> We once tried to replace it with (match_operand:M 5
> "nonimmediate_operand" "0") to enable more flexibility for pattern
> matching and recog, but it triggered an ICE in reload (reload can handle
> at most one operand with a "0" constraint).
>
> So we need to either add two patterns in the backend or just do the
> canonicalization in the middle-end.

Nice spot to handle this.  OK with the minor nit below fixed
and in case there are no further comments from the CCed folks.

I'll note there's (vec_select (vec_concat ...)) as an alternate
way to perform a (vec_merge ...), but I don't feel strongly
about supporting that alternative without evidence it's used.
aarch64 seems to use an UNSPEC for masking, but it
seems to have at least two patterns to merge with
either the first or the third input while failing to handle the
other (second) operand of a multiplication (*cond_fma<mode>_2 and _4);
as both are "register_operand" I don't see how canonicalization
works there?  Of course we can't do anything for UNSPECs.
RISC-V has managed to obfuscate its machine description, so I
have no idea ;)

Richard.

> gcc/ChangeLog:
>
>         * combine.cc (maybe_swap_commutative_operands):
>         Canonicalize (vec_merge (fma op2 op1 op3) op1 mask)
>         to (vec_merge (fma op1 op2 op3) op1 mask).
> ---
>  gcc/combine.cc | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
>
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index fef06a6cdc0..aa40fdcc50d 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -5656,6 +5656,31 @@ maybe_swap_commutative_operands (rtx x)
>        SUBST (XEXP (x, 1), temp);
>      }
>
> +  /* Canonicalize (vec_merge (fma op2 op1 op3) op1 mask) to
> +     (vec_merge (fma op1 op2 op3) op1 mask).  */
> +  if (GET_CODE (x) == VEC_MERGE
> +      && GET_CODE (XEXP (x, 0)) == FMA)
> +    {
> +      rtx fma_op1 = XEXP (XEXP (x, 0), 0);
> +      rtx fma_op2 = XEXP (XEXP (x, 0), 1);
> +      rtx masked_op = XEXP (x, 1);
> +      if (rtx_equal_p (masked_op, fma_op2))
> +       {
> +         if (GET_CODE (fma_op1) == NEG)
> +           {

please add a comment like

  /* Keep the negate canonicalized to the first operand.  */

> +             fma_op1 = XEXP (fma_op1, 0);
> +             SUBST (XEXP (XEXP (XEXP (x, 0), 0), 0), fma_op2);
> +             SUBST (XEXP (XEXP (x, 0), 1), fma_op1);
> +           }
> +         else
> +           {
> +             SUBST (XEXP (XEXP (x, 0), 0), fma_op2);
> +             SUBST (XEXP (XEXP (x, 0), 1), fma_op1);
> +           }
> +

stray vertical space

> +       }
> +    }
> +
>    unsigned n_elts = 0;
>    if (GET_CODE (x) == VEC_MERGE
>        && CONST_INT_P (XEXP (x, 2))
> --
> 2.31.1
>
Richard Sandiford Oct. 15, 2024, 2:32 p.m. UTC | #2
Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, Oct 15, 2024 at 5:30 AM liuhongt <hongtao.liu@intel.com> wrote:
>>
>> For x86 masked fma, there are two RTL representations:
>> 1) (vec_merge (fma op2 op1 op3) op1 mask)
>> 2) (vec_merge (fma op1 op2 op3) op1 mask).
>>
>>  5894(define_insn "<avx512>_fmadd_<mode>_mask<round_name>"
>>  5895  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v")
>>  5896        (vec_merge:VFH_AVX512VL
>>  5897          (fma:VFH_AVX512VL
>>  5898            (match_operand:VFH_AVX512VL 1 "nonimmediate_operand" "0,0")
>>  5899            (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v")
>>  5900            (match_operand:VFH_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>"))
>>  5901          (match_dup 1)
>>  5902          (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk,Yk")))]
>>  5903  "TARGET_AVX512F && <round_mode_condition>"
>>  5904  "@
>>  5905   vfmadd132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
>>  5906   vfmadd213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
>>  5907  [(set_attr "type" "ssemuladd")
>>  5908   (set_attr "prefix" "evex")
>>  5909   (set_attr "mode" "<MODE>")])
>>
>> Here op1 has constraint "0", and the second op1 is (match_dup 1).
>> We once tried to replace it with (match_operand:M 5
>> "nonimmediate_operand" "0") to enable more flexibility for pattern
>> matching and recog, but it triggered an ICE in reload (reload can handle
>> at most one operand with a "0" constraint).
>>
>> So we need to either add two patterns in the backend or just do the
>> canonicalization in the middle-end.
>
> Nice spot to handle this.  OK with the minor nit below fixed
> and in case there are no further comments from the CCed folks.
>
> I'll note there's (vec_select (vec_concat ...)) as an alternate
> way to perform a (vec_merge ...), but I don't feel strongly
> about supporting that alternative without evidence it's used.
> aarch64 seems to use an UNSPEC for masking, but it
> seems to have at least two patterns to merge with
> either the first or the third input while failing to handle the
> other (second) operand of a multiplication (*cond_fma<mode>_2 and _4);
> as both are "register_operand" I don't see how canonicalization
> works there?

We try to handle that in the expander instead:

    /* Swap the multiplication operands if the fallback value is the
       second of the two.  */
    if (rtx_equal_p (operands[3], operands[5]))
      std::swap (operands[2], operands[3]);

I suppose that doesn't help if the equivalence is only discovered
by later RTL optimisations.  Hopefully that should be rare though.

> Of course we can't do anything for UNSPECs.

Yeah.  If we were going to represent the SVE operations in generic RTL,
then I think the natural choice would be if_then_else.  vec_merge wouldn't
really work, since the predicates are vector-like rather than integers,
and since the operands can be variable length.

But there were two reasons for not doing that:

(1) It doesn't accurately describe the behaviour of FP operations.
    E.g. an if_then_else or vec_merge of an fma does a full-vector fma
    and then discards some of the results.  Taken at face value, that
    should raise the same exceptions as a plain fma.  The SVE instructions
    instead only raise exceptions for active lanes.

(2) For SVE, the predicate is often mandatory.  I was worried that if we
    exposed it as generic RTL, simplify-rtx might apply some tricks that
    get rid of the condition/predicate, and so get trapped in dead-end
    recog attempts that made less progress than the UNSPEC versions.

Thanks,
Richard

Patch

diff --git a/gcc/combine.cc b/gcc/combine.cc
index fef06a6cdc0..aa40fdcc50d 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -5656,6 +5656,31 @@  maybe_swap_commutative_operands (rtx x)
       SUBST (XEXP (x, 1), temp);
     }
 
+  /* Canonicalize (vec_merge (fma op2 op1 op3) op1 mask) to
+     (vec_merge (fma op1 op2 op3) op1 mask).  */
+  if (GET_CODE (x) == VEC_MERGE
+      && GET_CODE (XEXP (x, 0)) == FMA)
+    {
+      rtx fma_op1 = XEXP (XEXP (x, 0), 0);
+      rtx fma_op2 = XEXP (XEXP (x, 0), 1);
+      rtx masked_op = XEXP (x, 1);
+      if (rtx_equal_p (masked_op, fma_op2))
+	{
+	  if (GET_CODE (fma_op1) == NEG)
+	    {
+	      fma_op1 = XEXP (fma_op1, 0);
+	      SUBST (XEXP (XEXP (XEXP (x, 0), 0), 0), fma_op2);
+	      SUBST (XEXP (XEXP (x, 0), 1), fma_op1);
+	    }
+	  else
+	    {
+	      SUBST (XEXP (XEXP (x, 0), 0), fma_op2);
+	      SUBST (XEXP (XEXP (x, 0), 1), fma_op1);
+	    }
+
+	}
+    }
+
   unsigned n_elts = 0;
   if (GET_CODE (x) == VEC_MERGE
       && CONST_INT_P (XEXP (x, 2))