| Message ID | 20241015032955.3677006-2-hongtao.liu@intel.com |
|---|---|
| State | New |
| Series | Canonicalize (vec_merge (fma op2 op1 op3) op1 mask) to (vec_merge (fma op1 op2 op3) op1 mask) |
On Tue, Oct 15, 2024 at 5:30 AM liuhongt <hongtao.liu@intel.com> wrote:
>
> For x86 masked fma, there're 2 rtl representations
> 1) (vec_merge (fma op2 op1 op3) op1 mask)
> 2) (vec_merge (fma op1 op2 op3) op1 mask).
>
> 5894(define_insn "<avx512>_fmadd_<mode>_mask<round_name>"
> 5895  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v")
> 5896        (vec_merge:VFH_AVX512VL
> 5897          (fma:VFH_AVX512VL
> 5898            (match_operand:VFH_AVX512VL 1 "nonimmediate_operand" "0,0")
> 5899            (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v")
> 5900            (match_operand:VFH_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>"))
> 5901          (match_dup 1)
> 5902          (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk,Yk")))]
> 5903  "TARGET_AVX512F && <round_mode_condition>"
> 5904  "@
> 5905   vfmadd132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
> 5906   vfmadd213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
> 5907  [(set_attr "type" "ssemuladd")
> 5908   (set_attr "prefix" "evex")
> 5909   (set_attr "mode" "<MODE>")])
>
> Here op1 has constraint "0", and the second op1 is (match_dup 1).
> We once tried to replace it with (match_operand:M 5
> "nonimmediate_operand" "0") to enable more flexibility for pattern
> match and recog, but it triggered an ICE in reload (reload can handle
> at most one operand with a "0" constraint).
>
> So we need to either add 2 patterns in the backend or just do the
> canonicalization in the middle-end.

Nice spot to handle this.  OK with the minor nit below fixed
and in case there are no further comments from CCed folks.

I'll note there's (vec_select (vec_concat (....))) as an alternate
way to perform a (vec_merge ...) but I don't feel strongly
about supporting that alternative without evidence it's used.

aarch64 seems to use an UNSPEC for masking but it seems to have at
least two patterns to merge with either the first or the third input
but failing to handle the other (second) operand of a multiplication
(*cond_fma<mode>_2 and _4); as both are "register_operand" I don't
see how canonicalization works there?  Of course we can't do
anything for UNSPECs.

RISC-V has managed to obfuscate its machine description so I have
no idea ;)

Richard.

> gcc/ChangeLog:
>
>         * combine.cc (maybe_swap_commutative_operands):
>         Canonicalize (vec_merge (fma op2 op1 op3) op1 mask)
>         to (vec_merge (fma op1 op2 op3) op1 mask).
> ---
>  gcc/combine.cc | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
>
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index fef06a6cdc0..aa40fdcc50d 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -5656,6 +5656,31 @@ maybe_swap_commutative_operands (rtx x)
>        SUBST (XEXP (x, 1), temp);
>      }
>
> +  /* Canonicalize (vec_merge (fma op2 op1 op3) op1 mask) to
> +     (vec_merge (fma op1 op2 op3) op1 mask).  */
> +  if (GET_CODE (x) == VEC_MERGE
> +      && GET_CODE (XEXP (x, 0)) == FMA)
> +    {
> +      rtx fma_op1 = XEXP (XEXP (x, 0), 0);
> +      rtx fma_op2 = XEXP (XEXP (x, 0), 1);
> +      rtx masked_op = XEXP (x, 1);
> +      if (rtx_equal_p (masked_op, fma_op2))
> +        {
> +          if (GET_CODE (fma_op1) == NEG)
> +            {

please add a comment like

  /* Keep the negate canonicalized to the first operand.  */

> +              fma_op1 = XEXP (fma_op1, 0);
> +              SUBST (XEXP (XEXP (XEXP (x, 0), 0), 0), fma_op2);
> +              SUBST (XEXP (XEXP (x, 0), 1), fma_op1);
> +            }
> +          else
> +            {
> +              SUBST (XEXP (XEXP (x, 0), 0), fma_op2);
> +              SUBST (XEXP (XEXP (x, 0), 1), fma_op1);
> +            }
> +

stray vertical space

> +        }
> +    }
> +
>    unsigned n_elts = 0;
>    if (GET_CODE (x) == VEC_MERGE
>        && CONST_INT_P (XEXP (x, 2))
> --
> 2.31.1
>
Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, Oct 15, 2024 at 5:30 AM liuhongt <hongtao.liu@intel.com> wrote:
>>
>> For x86 masked fma, there're 2 rtl representations
>> 1) (vec_merge (fma op2 op1 op3) op1 mask)
>> 2) (vec_merge (fma op1 op2 op3) op1 mask).
>>
>> 5894(define_insn "<avx512>_fmadd_<mode>_mask<round_name>"
>> 5895  [(set (match_operand:VFH_AVX512VL 0 "register_operand" "=v,v")
>> 5896        (vec_merge:VFH_AVX512VL
>> 5897          (fma:VFH_AVX512VL
>> 5898            (match_operand:VFH_AVX512VL 1 "nonimmediate_operand" "0,0")
>> 5899            (match_operand:VFH_AVX512VL 2 "<round_nimm_predicate>" "<round_constraint>,v")
>> 5900            (match_operand:VFH_AVX512VL 3 "<round_nimm_predicate>" "v,<round_constraint>"))
>> 5901          (match_dup 1)
>> 5902          (match_operand:<avx512fmaskmode> 4 "register_operand" "Yk,Yk")))]
>> 5903  "TARGET_AVX512F && <round_mode_condition>"
>> 5904  "@
>> 5905   vfmadd132<ssemodesuffix>\t{<round_op5>%2, %3, %0%{%4%}|%0%{%4%}, %3, %2<round_op5>}
>> 5906   vfmadd213<ssemodesuffix>\t{<round_op5>%3, %2, %0%{%4%}|%0%{%4%}, %2, %3<round_op5>}"
>> 5907  [(set_attr "type" "ssemuladd")
>> 5908   (set_attr "prefix" "evex")
>> 5909   (set_attr "mode" "<MODE>")])
>>
>> Here op1 has constraint "0", and the second op1 is (match_dup 1).
>> We once tried to replace it with (match_operand:M 5
>> "nonimmediate_operand" "0") to enable more flexibility for pattern
>> match and recog, but it triggered an ICE in reload (reload can handle
>> at most one operand with a "0" constraint).
>>
>> So we need to either add 2 patterns in the backend or just do the
>> canonicalization in the middle-end.
>
> Nice spot to handle this.  OK with the minor nit below fixed
> and in case there are no further comments from CCed folks.
>
> I'll note there's (vec_select (vec_concat (....))) as an alternate
> way to perform a (vec_merge ...) but I don't feel strongly
> about supporting that alternative without evidence it's used.

> aarch64 seems to use an UNSPEC for masking but it
> seems to have at least two patterns to merge with
> either the first or the third input but failing to handle the
> other (second) operand of a multiplication (*cond_fma<mode>_2 and _4);
> as both are "register_operand" I don't see how canonicalization
> works there?

We try to handle that in the expander instead:

  /* Swap the multiplication operands if the fallback value is the
     second of the two.  */
  if (rtx_equal_p (operands[3], operands[5]))
    std::swap (operands[2], operands[3]);

I suppose that doesn't help if the equivalence is only discovered
by later RTL optimisations.  Hopefully that should be rare though.

> Of course we can't do anything for UNSPECs.

Yeah.  If we were going to represent the SVE operations in generic
RTL, then I think the natural choice would be if_then_else.
vec_merge wouldn't really work, since the predicates are vector-like
rather than integers, and since the operands can be variable length.

But there were two reasons for not doing that:

(1) It doesn't accurately describe the behaviour of FP operations.
    E.g. an if_then_else or vec_merge of an fma does a full-vector
    fma and then discards some of the results.  Taken at face value,
    that should raise the same exceptions as a plain fma.  The SVE
    instructions instead only raise exceptions for active lanes.

(2) For SVE, the predicate is often mandatory.  I was worried that
    if we exposed it as generic RTL, simplify-rtx might apply some
    tricks that get rid of the condition/predicate, and so get
    trapped in dead-end recog attempts that made less progress than
    the UNSPEC versions.

Thanks,
Richard
diff --git a/gcc/combine.cc b/gcc/combine.cc
index fef06a6cdc0..aa40fdcc50d 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -5656,6 +5656,31 @@ maybe_swap_commutative_operands (rtx x)
       SUBST (XEXP (x, 1), temp);
     }
 
+  /* Canonicalize (vec_merge (fma op2 op1 op3) op1 mask) to
+     (vec_merge (fma op1 op2 op3) op1 mask).  */
+  if (GET_CODE (x) == VEC_MERGE
+      && GET_CODE (XEXP (x, 0)) == FMA)
+    {
+      rtx fma_op1 = XEXP (XEXP (x, 0), 0);
+      rtx fma_op2 = XEXP (XEXP (x, 0), 1);
+      rtx masked_op = XEXP (x, 1);
+      if (rtx_equal_p (masked_op, fma_op2))
+        {
+          if (GET_CODE (fma_op1) == NEG)
+            {
+              fma_op1 = XEXP (fma_op1, 0);
+              SUBST (XEXP (XEXP (XEXP (x, 0), 0), 0), fma_op2);
+              SUBST (XEXP (XEXP (x, 0), 1), fma_op1);
+            }
+          else
+            {
+              SUBST (XEXP (XEXP (x, 0), 0), fma_op2);
+              SUBST (XEXP (XEXP (x, 0), 1), fma_op1);
+            }
+
+        }
+    }
+
   unsigned n_elts = 0;
   if (GET_CODE (x) == VEC_MERGE
       && CONST_INT_P (XEXP (x, 2))