Message ID | 7ea24dfc-34dd-4931-8614-6fd0ef86972a@gmail.com |
---|---|
State | New |
Headers | show |
Series | [RFC] tree-if-conv: Handle nonzero masked elements [PR115336]. | expand |
On Fri, 5 Jul 2024, Robin Dapp wrote: > Hi, > > in PR115336 we have the following > > vect_patt_391 = .MASK_LEN_GATHER_LOAD (_470, vect__59, 1, { 0, ... }, { 0, ... }, _482, 0); > vect_iftmp.44 = vect_patt_391 | { 1, ... }; > .MASK_LEN_STORE (vectp_f.45, 8B, { -1, ... }, _482, 0, vect_iftmp.44); > > which assumes that a maskload sets the masked-out elements to zero. To me this looks like mis-applying of match.pd:6083? Applying pattern match.pd:6083, gimple-match-1.cc:45749 gimple_simplified to iftmp.0_62 = iftmp.0_61 | _219; new phi replacement stmt iftmp.0_62 = iftmp.0_61 | _219; so originally it wasn't iftmp.0_61 = .MASK_LOAD (_260, 8B, _259); iftmp.0_62 = iftmp.0_61 | _219; but sth like iftmp.0_62 = <mask> ? _219 : iftmp.0_61; and iftmp.0_61 is zero-one-valued because it is _Bool. Shouldn't if-conversion always turn the COND_EXPR into a .COND_IOR here? > RVV does not guarantee this and, unfortunately, leaves it to the > hardware implementation to do basically anything it wants (even keep > the previous value). In the PR this leads to us reusing a previous > vector register and stale, nonzero values causing an invalid result. > > Based on a Richard Sandiford's feedback I started with the attached > patch - it's more an RFC in its current shape and obviously not > tested exhaustively. > > The idea is: > - In ifcvt emit a VCOND_MASK (mask, load_result, preferred_else_value) > after a MASK_LOAD if the target requires it. > - Elide the VCOND_MASK when there is a COND_OP with a replacing else > value later. > > Is this, generally, reasonable or is there a better way? > > My initial idea was to add an else value to MASK_LOAD. Richard's > concern was, though, that this might make nonzero else values > appear inexpensive when they are actually not. > > Even though I, mechanically, added match.pd patterns to catch > the most common cases (already at a point where a separate function > maybe in gimple-match-exports? would make more sense), there is > still significant code-quality fallout. > The regressing cases are often of a form where the VCOND_MASK is > not just a conversion away but rather hidden behind some unmasked > operation. I'm not sure if we could ever recognize everything that > way without descending very deep. > > Regards > Robin > > gcc/ChangeLog: > > PR middle-end/115336 > > * config/riscv/riscv.cc (riscv_preferred_else_value): Add > MASK_LOAD. > * config/riscv/riscv.h (TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO): > Set to true. > * defaults.h (TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO): New. > * doc/tm.texi: Document. > * doc/tm.texi.in: Document. > * match.pd: Add patterns to allow replacing the else value of a > VEC_COND. > * tree-if-conv.cc (predicate_load_or_store): Emit a VEC_COND > after a MASK_LOAD if the target does not guarantee zeroing. > (predicate_statements): Add temporary lhs argument. > * tree-ssa-math-opts.cc (convert_mult_to_fma_1): Re-fold fold > result. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/rvv/autovec/pr115336.c: New test. > --- > gcc/config/riscv/riscv.cc | 2 + > gcc/config/riscv/riscv.h | 2 + > gcc/defaults.h | 4 + > gcc/doc/tm.texi | 5 + > gcc/doc/tm.texi.in | 5 + > gcc/match.pd | 125 +++++++++++++++++- > .../gcc.target/riscv/rvv/autovec/pr115336.c | 20 +++ > gcc/tree-if-conv.cc | 57 +++++++- > gcc/tree-ssa-math-opts.cc | 8 ++ > 9 files changed, 217 insertions(+), 11 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c > > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc > index c17141d909a..e10bc6824b9 100644 > --- a/gcc/config/riscv/riscv.cc > +++ b/gcc/config/riscv/riscv.cc > @@ -11481,6 +11481,8 @@ riscv_preferred_else_value (unsigned ifn, tree vectype, unsigned int nops, > { > if (riscv_v_ext_mode_p (TYPE_MODE (vectype))) > return get_or_create_ssa_default_def (cfun, create_tmp_var (vectype)); > + else if (ifn == IFN_MASK_LOAD) > + return get_or_create_ssa_default_def (cfun, create_tmp_var (vectype)); > > return default_preferred_else_value (ifn, vectype, nops, ops); > } > diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h > index 57910eecd3e..dc15eb5e60f 100644 > --- a/gcc/config/riscv/riscv.h > +++ b/gcc/config/riscv/riscv.h > @@ -1264,4 +1264,6 @@ extern void riscv_remove_unneeded_save_restore_calls (void); > /* Check TLS Descriptors mechanism is selected. */ > #define TARGET_TLSDESC (riscv_tls_dialect == TLS_DESCRIPTORS) > > +#define TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO 1 > + > #endif /* ! GCC_RISCV_H */ > diff --git a/gcc/defaults.h b/gcc/defaults.h > index 92f3e07f742..6ffbdaea229 100644 > --- a/gcc/defaults.h > +++ b/gcc/defaults.h > @@ -1457,6 +1457,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see > #define DWARF_VERSION_DEFAULT 5 > #endif > > +#ifndef TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO > +#define TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO 0 > +#endif > + > #ifndef USED_FOR_TARGET > /* Done this way to keep gengtype happy. */ > #if BITS_PER_UNIT == 8 > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi > index be5543b72f8..2fdebe7fd21 100644 > --- a/gcc/doc/tm.texi > +++ b/gcc/doc/tm.texi > @@ -12683,6 +12683,11 @@ maintainer is familiar with. > > @end defmac > > +@defmac TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO > +Bla > + > +@end defmac > + > @deftypefn {Target Hook} bool TARGET_HAVE_SPECULATION_SAFE_VALUE (bool @var{active}) > This hook is used to determine the level of target support for > @code{__builtin_speculation_safe_value}. If called with an argument > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in > index 87a7f895174..276c38325dc 100644 > --- a/gcc/doc/tm.texi.in > +++ b/gcc/doc/tm.texi.in > @@ -8097,6 +8097,11 @@ maintainer is familiar with. > > @end defmac > > +@defmac TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO > +Bla > + > +@end defmac > + > @hook TARGET_HAVE_SPECULATION_SAFE_VALUE > > @hook TARGET_SPECULATION_SAFE_VALUE > diff --git a/gcc/match.pd b/gcc/match.pd > index 3d0689c9312..2fdc14ef56f 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -9744,7 +9744,22 @@ and, > #endif > > /* Detect cases in which a VEC_COND_EXPR effectively replaces the > - "else" value of an IFN_COND_*. */ > + "else" value of an IFN_COND_* as well as when the IFN_COND_* > + replaces the else of the VEC_COND_EXPR. */ > +(for cond_op (COND_UNARY) > + (simplify > + (cond_op @0 (view_convert? (vec_cond @0 @1 @2)) @3) > + (with { tree op_type = TREE_TYPE (@3); } > + (if (element_precision (type) == element_precision (op_type)) > + (cond_op @0 (view_convert:op_type @1) @3))))) > + > +(for cond_len_op (COND_LEN_UNARY) > + (simplify > + (cond_len_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4 @5) > + (with { tree op_type = TREE_TYPE (@3); } > + (if (element_precision (type) == element_precision (op_type)) > + (cond_len_op @0 (view_convert:op_type @1) @3 @4 @5))))) > + > (for cond_op (COND_BINARY) > (simplify > (vec_cond @0 (view_convert? (cond_op @0 @1 @2 @3)) @4) > @@ -9756,7 +9771,27 @@ and, > (with { tree op_type = TREE_TYPE (@5); } > (if (inverse_conditions_p (@0, @2) > && element_precision (type) == element_precision (op_type)) > - (view_convert (cond_op @2 @3 @4 (view_convert:op_type @1))))))) > + (view_convert (cond_op @2 @3 @4 (view_convert:op_type @1)))))) > + (simplify > + (cond_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4) > + (with { tree op_type = TREE_TYPE (@3); } > + (if (element_precision (type) == element_precision (op_type)) > + (cond_op @0 (view_convert:op_type @1) @3 @4)))) > + (simplify > + (cond_op @0 @1 (view_convert? (vec_cond @0 @2 @3)) @4) > + (with { tree op_type = TREE_TYPE (@1); } > + (if (element_precision (type) == element_precision (op_type)) > + (cond_op @0 @1 (view_convert:op_type @2) @4)))) > + (simplify > + (cond_op @0 (convert? (vec_cond @0 @1 @2)) @3 @4) > + (with { tree op_type = TREE_TYPE (@3); } > + (if (element_precision (type) == element_precision (op_type)) > + (cond_op @0 (convert:op_type @1) @3 @4)))) > + (simplify > + (cond_op @0 @1 (convert? (vec_cond @0 @2 @3)) @4) > + (with { tree op_type = TREE_TYPE (@1); } > + (if (element_precision (type) == element_precision (op_type)) > + (cond_op @0 @1 (convert:op_type @2) @4))))) > > /* Same for ternary operations. */ > (for cond_op (COND_TERNARY) > @@ -9770,7 +9805,37 @@ and, > (with { tree op_type = TREE_TYPE (@6); } > (if (inverse_conditions_p (@0, @2) > && element_precision (type) == element_precision (op_type)) > - (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1))))))) > + (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))) > + (simplify > + (cond_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4 @5) > + (with { tree op_type = TREE_TYPE (@3); } > + (if (element_precision (type) == element_precision (op_type)) > + (cond_op @0 (view_convert:op_type @1) @3 @4 @5)))) > + (simplify > + (cond_op @0 @1 (view_convert? (vec_cond @0 @2 @3)) @4 @5) > + (with { tree op_type = TREE_TYPE (@1); } > + (if (element_precision (type) == element_precision (op_type)) > + (cond_op @0 @1 (view_convert:op_type @2) @4 @5)))) > + (simplify > + (cond_op @0 @1 @2 (view_convert? (vec_cond @0 @3 @4)) @5) > + (with { tree op_type = TREE_TYPE (@1); } > + (if (element_precision (type) == element_precision (op_type)) > + (cond_op @0 @1 @2 (view_convert:op_type @3) @5)))) > + (simplify > + (cond_op @0 (convert? (vec_cond @0 @1 @2)) @3 @4 @5) > + (with { tree op_type = TREE_TYPE (@3); } > + (if (element_precision (type) == element_precision (op_type)) > + (cond_op @0 (convert:op_type @1) @3 @4 @5)))) > + (simplify > + (cond_op @0 @1 (convert? (vec_cond @0 @2 @3)) @4 @5) > + (with { tree op_type = TREE_TYPE (@1); } > + (if (element_precision (type) == element_precision (op_type)) > + (cond_op @0 @1 (convert:op_type @2) @4 @5)))) > + (simplify > + (cond_op @0 @1 @2 (convert? (vec_cond @0 @3 @4)) @5) > + (with { tree op_type = TREE_TYPE (@1); } > + (if (element_precision (type) == element_precision (op_type)) > + (cond_op @0 @1 @2 (convert:op_type @3) @5))))) > > /* Detect cases in which a VEC_COND_EXPR effectively replaces the > "else" value of an IFN_COND_LEN_*. */ > @@ -9785,7 +9850,27 @@ and, > (with { tree op_type = TREE_TYPE (@5); } > (if (inverse_conditions_p (@0, @2) > && element_precision (type) == element_precision (op_type)) > - (view_convert (cond_len_op @2 @3 @4 (view_convert:op_type @1) @6 @7)))))) > + (view_convert (cond_len_op @2 @3 @4 (view_convert:op_type @1) @6 @7))))) > + (simplify > + (cond_len_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4 @5 @6) > + (with { tree op_type = TREE_TYPE (@3); } > + (if (element_precision (type) == element_precision (op_type)) > + (cond_len_op @0 (view_convert:op_type @1) @3 @4 @5 @6)))) > + (simplify > + (cond_len_op @0 @1 (view_convert? (vec_cond @0 @2 @3)) @4 @5 @6) > + (with { tree op_type = TREE_TYPE (@1); } > + (if (element_precision (type) == element_precision (op_type)) > + (cond_len_op @0 @1 (view_convert:op_type @2) @4 @5 @6)))) > + (simplify > + (cond_len_op @0 (convert? (vec_cond @0 @1 @2)) @3 @4 @5 @6) > + (with { tree op_type = TREE_TYPE (@3); } > + (if (element_precision (type) == element_precision (op_type)) > + (cond_len_op @0 (convert:op_type @1) @3 @4 @5 @6)))) > + (simplify > + (cond_len_op @0 @1 (convert? (vec_cond @0 @2 @3)) @4 @5 @6) > + (with { tree op_type = TREE_TYPE (@1); } > + (if (element_precision (type) == element_precision (op_type)) > + (cond_len_op @0 @1 (convert:op_type @2) @4 @5 @6))))) > > /* Same for ternary operations. */ > (for cond_len_op (COND_LEN_TERNARY) > @@ -9799,7 +9884,37 @@ and, > (with { tree op_type = TREE_TYPE (@6); } > (if (inverse_conditions_p (@0, @2) > && element_precision (type) == element_precision (op_type)) > - (view_convert (cond_len_op @2 @3 @4 @5 (view_convert:op_type @1) @7 @8)))))) > + (view_convert (cond_len_op @2 @3 @4 @5 (view_convert:op_type @1) @7 @8))))) > + (simplify > + (cond_len_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4 @5 @6 @7) > + (with { tree op_type = TREE_TYPE (@3); } > + (if (element_precision (type) == element_precision (op_type)) > + (cond_len_op @0 (view_convert:op_type @1) @3 @4 @5 @6 @7)))) > + (simplify > + (cond_len_op @0 @1 (view_convert? (vec_cond @0 @2 @3)) @4 @5 @6 @7) > + (with { tree op_type = TREE_TYPE (@1); } > + (if (element_precision (type) == element_precision (op_type)) > + (cond_len_op @0 @1 (view_convert:op_type @2) @4 @5 @6 @7)))) > + (simplify > + (cond_len_op @0 @1 @2 (view_convert? (vec_cond @0 @3 @4)) @5 @6 @7) > + (with { tree op_type = TREE_TYPE (@1); } > + (if (element_precision (type) == element_precision (op_type)) > + (cond_len_op @0 @1 @2 (view_convert:op_type @3) @5 @6 @7)))) > + (simplify > + (cond_len_op @0 (convert? (vec_cond @0 @1 @2)) @3 @4 @5 @6 @7) > + (with { tree op_type = TREE_TYPE (@3); } > + (if (element_precision (type) == element_precision (op_type)) > + (cond_len_op @0 (convert:op_type @1) @3 @4 @5 @6 @7)))) > + (simplify > + (cond_len_op @0 @1 (convert? (vec_cond @0 @2 @3)) @4 @5 @6 @7) > + (with { tree op_type = TREE_TYPE (@1); } > + (if (element_precision (type) == element_precision (op_type)) > + (cond_len_op @0 @1 (convert:op_type @2) @4 @5 @6 @7)))) > + (simplify > + (cond_len_op @0 @1 @2 (convert? (vec_cond @0 @3 @4)) @5 @6 @7) > + (with { tree op_type = TREE_TYPE (@1); } > + (if (element_precision (type) == element_precision (op_type)) > + (cond_len_op @0 @1 @2 (convert:op_type @3) @5 @6 @7))))) > > /* Detect simplication for a conditional reduction where > > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c > new file mode 100644 > index 00000000000..29e55705a7a > --- /dev/null > +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c > @@ -0,0 +1,20 @@ > +/* { dg-do compile } */ > +/* { dg-options { -O3 -march=rv64gcv_zvl256b -mabi=lp64d } } */ > + > +short d[19]; > +_Bool e[100][19][19]; > +_Bool f[10000]; > + > +int main() > +{ > + for (long g = 0; g < 19; ++g) > + d[g] = 3; > + _Bool(*h)[19][19] = e; > + for (short g = 0; g < 9; g++) > + for (int i = 4; i < 16; i += 3) > + f[i * 9 + g] = d[i] ? d[i] : h[g][i][2]; > + for (long i = 120; i < 122; ++i) > + __builtin_printf("%d\n", f[i]); > +} > + > +/* { dg-final { scan-assembler-times {vmv.v.i\s*v[0-9]+,0} 3 } } */ > diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc > index 57992b6deca..c0c7013c817 100644 > --- a/gcc/tree-if-conv.cc > +++ b/gcc/tree-if-conv.cc > @@ -124,6 +124,7 @@ along with GCC; see the file COPYING3. If not see > #include "tree-vectorizer.h" > #include "tree-eh.h" > #include "cgraph.h" > +#include "tree-dfa.h" > > /* For lang_hooks.types.type_for_mode. */ > #include "langhooks.h" > @@ -2453,11 +2454,14 @@ mask_exists (int size, const vec<int> &vec) > that does so. */ > > static gimple * > -predicate_load_or_store (gimple_stmt_iterator *gsi, gassign *stmt, tree mask) > +predicate_load_or_store (gimple_stmt_iterator *gsi, gassign *stmt, tree mask, > + tree override_lhs = NULL_TREE) > { > gcall *new_stmt; > > - tree lhs = gimple_assign_lhs (stmt); > + tree lhs = override_lhs; > + if (!lhs) > + lhs = gimple_assign_lhs (stmt); > tree rhs = gimple_assign_rhs1 (stmt); > tree ref = TREE_CODE (lhs) == SSA_NAME ? rhs : lhs; > mark_addressable (ref); > @@ -2789,11 +2793,52 @@ predicate_statements (loop_p loop) > vect_masks.safe_push (mask); > } > if (gimple_assign_single_p (stmt)) > - new_stmt = predicate_load_or_store (&gsi, stmt, mask); > - else > - new_stmt = predicate_rhs_code (stmt, mask, cond, &ssa_names); > + { > + bool target_has_else > + = TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO; > + tree lhs = gimple_get_lhs (stmt); > + > + bool is_load = TREE_CODE (lhs) == SSA_NAME; > + > + gimple_seq stmts2 = NULL; > + tree tmplhs = is_load && target_has_else > + ? make_temp_ssa_name (TREE_TYPE (lhs), NULL, > + "_ifc_") : lhs; > + gimple *call_stmt > + = predicate_load_or_store (&gsi, stmt, mask, tmplhs); > + > + gimple_seq_add_stmt (&stmts2, call_stmt); > + > + if (lhs != tmplhs) > + ssa_names.add (tmplhs); > > - gsi_replace (&gsi, new_stmt, true); > + if (is_load && target_has_else) > + { > + unsigned nops = gimple_call_num_args (call_stmt); > + tree *ops = XALLOCAVEC (tree, nops); > + > + for (unsigned i = 0; i < nops; i++) > + ops[i] = gimple_call_arg (call_stmt, i); > + > + tree els_operand > + = targetm.preferred_else_value (IFN_MASK_LOAD, > + TREE_TYPE (lhs), > + nops, ops); > + > + tree rhs > + = fold_build_cond_expr (TREE_TYPE (tmplhs), > + mask, tmplhs, els_operand); > + gassign *cond_stmt > + = gimple_build_assign (gimple_get_lhs (stmt), rhs); > + gimple_seq_add_stmt (&stmts2, cond_stmt); > + } > + gsi_replace_with_seq (&gsi, stmts2, true); > + } > + else > + { > + new_stmt = predicate_rhs_code (stmt, mask, cond, &ssa_names); > + gsi_replace (&gsi, new_stmt, true); > + } > } > else if (((lhs = gimple_assign_lhs (stmt)), true) > && (INTEGRAL_TYPE_P (TREE_TYPE (lhs)) > diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc > index 57085488722..65e6f91fe8b 100644 > --- a/gcc/tree-ssa-math-opts.cc > +++ b/gcc/tree-ssa-math-opts.cc > @@ -3193,6 +3193,14 @@ convert_mult_to_fma_1 (tree mul_result, tree op1, tree op2) > /* Follow all SSA edges so that we generate FMS, FNMA and FNMS > regardless of where the negation occurs. */ > gimple *orig_stmt = gsi_stmt (gsi); > + if (fold_stmt (&gsi, follow_all_ssa_edges)) > + { > + if (maybe_clean_or_replace_eh_stmt (orig_stmt, gsi_stmt (gsi))) > + gcc_unreachable (); > + update_stmt (gsi_stmt (gsi)); > + } > + /* Fold the result again. */ > + orig_stmt = gsi_stmt (gsi); > if (fold_stmt (&gsi, follow_all_ssa_edges)) > { > if (maybe_clean_or_replace_eh_stmt (orig_stmt, gsi_stmt (gsi))) >
On Fri, 5 Jul 2024, Richard Biener wrote: > On Fri, 5 Jul 2024, Robin Dapp wrote: > > > Hi, > > > > in PR115336 we have the following > > > > vect_patt_391 = .MASK_LEN_GATHER_LOAD (_470, vect__59, 1, { 0, ... }, { 0, ... }, _482, 0); > > vect_iftmp.44 = vect_patt_391 | { 1, ... }; > > .MASK_LEN_STORE (vectp_f.45, 8B, { -1, ... }, _482, 0, vect_iftmp.44); > > > > which assumes that a maskload sets the masked-out elements to zero. > > To me this looks like mis-applying of match.pd:6083? > > Applying pattern match.pd:6083, gimple-match-1.cc:45749 > gimple_simplified to iftmp.0_62 = iftmp.0_61 | _219; > new phi replacement stmt > iftmp.0_62 = iftmp.0_61 | _219; > > so originally it wasn't > > iftmp.0_61 = .MASK_LOAD (_260, 8B, _259); > iftmp.0_62 = iftmp.0_61 | _219; > > but sth like > > iftmp.0_62 = <mask> ? _219 : iftmp.0_61; > > and iftmp.0_61 is zero-one-valued because it is _Bool. Shouldn't > if-conversion always turn the COND_EXPR into a .COND_IOR here? > > > RVV does not guarantee this and, unfortunately, leaves it to the > > hardware implementation to do basically anything it wants (even keep > > the previous value). In the PR this leads to us reusing a previous > > vector register and stale, nonzero values causing an invalid result. > > > > Based on a Richard Sandiford's feedback I started with the attached > > patch - it's more an RFC in its current shape and obviously not > > tested exhaustively. > > > > The idea is: > > - In ifcvt emit a VCOND_MASK (mask, load_result, preferred_else_value) > > after a MASK_LOAD if the target requires it. > > - Elide the VCOND_MASK when there is a COND_OP with a replacing else > > value later. > > > > Is this, generally, reasonable or is there a better way? > > > > My initial idea was to add an else value to MASK_LOAD. Richard's > > concern was, though, that this might make nonzero else values > > appear inexpensive when they are actually not. In general I'd agree that we want an else value for .MASK_LOAD and also possibly for .MASK_STORE (but then we need to represent an else value that says do-not-change/merge). > > Even though I, mechanically, added match.pd patterns to catch > > the most common cases (already at a point where a separate function > > maybe in gimple-match-exports? would make more sense), there is > > still significant code-quality fallout. > > The regressing cases are often of a form where the VCOND_MASK is > > not just a conversion away but rather hidden behind some unmasked > > operation. I'm not sure if we could ever recognize everything that > > way without descending very deep. I think the vectorizer would need to track masks in a better way, much like we have the SLP permute optimization phase we'd have a mask optimization phase. Richard. > > Regards > > Robin > > > > gcc/ChangeLog: > > > > PR middle-end/115336 > > > > * config/riscv/riscv.cc (riscv_preferred_else_value): Add > > MASK_LOAD. > > * config/riscv/riscv.h (TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO): > > Set to true. > > * defaults.h (TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO): New. > > * doc/tm.texi: Document. > > * doc/tm.texi.in: Document. > > * match.pd: Add patterns to allow replacing the else value of a > > VEC_COND. > > * tree-if-conv.cc (predicate_load_or_store): Emit a VEC_COND > > after a MASK_LOAD if the target does not guarantee zeroing. > > (predicate_statements): Add temporary lhs argument. > > * tree-ssa-math-opts.cc (convert_mult_to_fma_1): Re-fold fold > > result. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/riscv/rvv/autovec/pr115336.c: New test. > > --- > > gcc/config/riscv/riscv.cc | 2 + > > gcc/config/riscv/riscv.h | 2 + > > gcc/defaults.h | 4 + > > gcc/doc/tm.texi | 5 + > > gcc/doc/tm.texi.in | 5 + > > gcc/match.pd | 125 +++++++++++++++++- > > .../gcc.target/riscv/rvv/autovec/pr115336.c | 20 +++ > > gcc/tree-if-conv.cc | 57 +++++++- > > gcc/tree-ssa-math-opts.cc | 8 ++ > > 9 files changed, 217 insertions(+), 11 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c > > > > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc > > index c17141d909a..e10bc6824b9 100644 > > --- a/gcc/config/riscv/riscv.cc > > +++ b/gcc/config/riscv/riscv.cc > > @@ -11481,6 +11481,8 @@ riscv_preferred_else_value (unsigned ifn, tree vectype, unsigned int nops, > > { > > if (riscv_v_ext_mode_p (TYPE_MODE (vectype))) > > return get_or_create_ssa_default_def (cfun, create_tmp_var (vectype)); > > + else if (ifn == IFN_MASK_LOAD) > > + return get_or_create_ssa_default_def (cfun, create_tmp_var (vectype)); > > > > return default_preferred_else_value (ifn, vectype, nops, ops); > > } > > diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h > > index 57910eecd3e..dc15eb5e60f 100644 > > --- a/gcc/config/riscv/riscv.h > > +++ b/gcc/config/riscv/riscv.h > > @@ -1264,4 +1264,6 @@ extern void riscv_remove_unneeded_save_restore_calls (void); > > /* Check TLS Descriptors mechanism is selected. */ > > #define TARGET_TLSDESC (riscv_tls_dialect == TLS_DESCRIPTORS) > > > > +#define TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO 1 > > + > > #endif /* ! GCC_RISCV_H */ > > diff --git a/gcc/defaults.h b/gcc/defaults.h > > index 92f3e07f742..6ffbdaea229 100644 > > --- a/gcc/defaults.h > > +++ b/gcc/defaults.h > > @@ -1457,6 +1457,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see > > #define DWARF_VERSION_DEFAULT 5 > > #endif > > > > +#ifndef TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO > > +#define TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO 0 > > +#endif > > + > > #ifndef USED_FOR_TARGET > > /* Done this way to keep gengtype happy. */ > > #if BITS_PER_UNIT == 8 > > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi > > index be5543b72f8..2fdebe7fd21 100644 > > --- a/gcc/doc/tm.texi > > +++ b/gcc/doc/tm.texi > > @@ -12683,6 +12683,11 @@ maintainer is familiar with. > > > > @end defmac > > > > +@defmac TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO > > +Bla > > + > > +@end defmac > > + > > @deftypefn {Target Hook} bool TARGET_HAVE_SPECULATION_SAFE_VALUE (bool @var{active}) > > This hook is used to determine the level of target support for > > @code{__builtin_speculation_safe_value}. If called with an argument > > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in > > index 87a7f895174..276c38325dc 100644 > > --- a/gcc/doc/tm.texi.in > > +++ b/gcc/doc/tm.texi.in > > @@ -8097,6 +8097,11 @@ maintainer is familiar with. > > > > @end defmac > > > > +@defmac TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO > > +Bla > > + > > +@end defmac > > + > > @hook TARGET_HAVE_SPECULATION_SAFE_VALUE > > > > @hook TARGET_SPECULATION_SAFE_VALUE > > diff --git a/gcc/match.pd b/gcc/match.pd > > index 3d0689c9312..2fdc14ef56f 100644 > > --- a/gcc/match.pd > > +++ b/gcc/match.pd > > @@ -9744,7 +9744,22 @@ and, > > #endif > > > > /* Detect cases in which a VEC_COND_EXPR effectively replaces the > > - "else" value of an IFN_COND_*. */ > > + "else" value of an IFN_COND_* as well as when the IFN_COND_* > > + replaces the else of the VEC_COND_EXPR. */ > > +(for cond_op (COND_UNARY) > > + (simplify > > + (cond_op @0 (view_convert? (vec_cond @0 @1 @2)) @3) > > + (with { tree op_type = TREE_TYPE (@3); } > > + (if (element_precision (type) == element_precision (op_type)) > > + (cond_op @0 (view_convert:op_type @1) @3))))) > > + > > +(for cond_len_op (COND_LEN_UNARY) > > + (simplify > > + (cond_len_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4 @5) > > + (with { tree op_type = TREE_TYPE (@3); } > > + (if (element_precision (type) == element_precision (op_type)) > > + (cond_len_op @0 (view_convert:op_type @1) @3 @4 @5))))) > > + > > (for cond_op (COND_BINARY) > > (simplify > > (vec_cond @0 (view_convert? (cond_op @0 @1 @2 @3)) @4) > > @@ -9756,7 +9771,27 @@ and, > > (with { tree op_type = TREE_TYPE (@5); } > > (if (inverse_conditions_p (@0, @2) > > && element_precision (type) == element_precision (op_type)) > > - (view_convert (cond_op @2 @3 @4 (view_convert:op_type @1))))))) > > + (view_convert (cond_op @2 @3 @4 (view_convert:op_type @1)))))) > > + (simplify > > + (cond_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4) > > + (with { tree op_type = TREE_TYPE (@3); } > > + (if (element_precision (type) == element_precision (op_type)) > > + (cond_op @0 (view_convert:op_type @1) @3 @4)))) > > + (simplify > > + (cond_op @0 @1 (view_convert? (vec_cond @0 @2 @3)) @4) > > + (with { tree op_type = TREE_TYPE (@1); } > > + (if (element_precision (type) == element_precision (op_type)) > > + (cond_op @0 @1 (view_convert:op_type @2) @4)))) > > + (simplify > > + (cond_op @0 (convert? (vec_cond @0 @1 @2)) @3 @4) > > + (with { tree op_type = TREE_TYPE (@3); } > > + (if (element_precision (type) == element_precision (op_type)) > > + (cond_op @0 (convert:op_type @1) @3 @4)))) > > + (simplify > > + (cond_op @0 @1 (convert? (vec_cond @0 @2 @3)) @4) > > + (with { tree op_type = TREE_TYPE (@1); } > > + (if (element_precision (type) == element_precision (op_type)) > > + (cond_op @0 @1 (convert:op_type @2) @4))))) > > > > /* Same for ternary operations. */ > > (for cond_op (COND_TERNARY) > > @@ -9770,7 +9805,37 @@ and, > > (with { tree op_type = TREE_TYPE (@6); } > > (if (inverse_conditions_p (@0, @2) > > && element_precision (type) == element_precision (op_type)) > > - (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1))))))) > > + (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))) > > + (simplify > > + (cond_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4 @5) > > + (with { tree op_type = TREE_TYPE (@3); } > > + (if (element_precision (type) == element_precision (op_type)) > > + (cond_op @0 (view_convert:op_type @1) @3 @4 @5)))) > > + (simplify > > + (cond_op @0 @1 (view_convert? (vec_cond @0 @2 @3)) @4 @5) > > + (with { tree op_type = TREE_TYPE (@1); } > > + (if (element_precision (type) == element_precision (op_type)) > > + (cond_op @0 @1 (view_convert:op_type @2) @4 @5)))) > > + (simplify > > + (cond_op @0 @1 @2 (view_convert? (vec_cond @0 @3 @4)) @5) > > + (with { tree op_type = TREE_TYPE (@1); } > > + (if (element_precision (type) == element_precision (op_type)) > > + (cond_op @0 @1 @2 (view_convert:op_type @3) @5)))) > > + (simplify > > + (cond_op @0 (convert? (vec_cond @0 @1 @2)) @3 @4 @5) > > + (with { tree op_type = TREE_TYPE (@3); } > > + (if (element_precision (type) == element_precision (op_type)) > > + (cond_op @0 (convert:op_type @1) @3 @4 @5)))) > > + (simplify > > + (cond_op @0 @1 (convert? (vec_cond @0 @2 @3)) @4 @5) > > + (with { tree op_type = TREE_TYPE (@1); } > > + (if (element_precision (type) == element_precision (op_type)) > > + (cond_op @0 @1 (convert:op_type @2) @4 @5)))) > > + (simplify > > + (cond_op @0 @1 @2 (convert? (vec_cond @0 @3 @4)) @5) > > + (with { tree op_type = TREE_TYPE (@1); } > > + (if (element_precision (type) == element_precision (op_type)) > > + (cond_op @0 @1 @2 (convert:op_type @3) @5))))) > > > > /* Detect cases in which a VEC_COND_EXPR effectively replaces the > > "else" value of an IFN_COND_LEN_*. */ > > @@ -9785,7 +9850,27 @@ and, > > (with { tree op_type = TREE_TYPE (@5); } > > (if (inverse_conditions_p (@0, @2) > > && element_precision (type) == element_precision (op_type)) > > - (view_convert (cond_len_op @2 @3 @4 (view_convert:op_type @1) @6 @7)))))) > > + (view_convert (cond_len_op @2 @3 @4 (view_convert:op_type @1) @6 @7))))) > > + (simplify > > + (cond_len_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4 @5 @6) > > + (with { tree op_type = TREE_TYPE (@3); } > > + (if (element_precision (type) == element_precision (op_type)) > > + (cond_len_op @0 (view_convert:op_type @1) @3 @4 @5 @6)))) > > + (simplify > > + (cond_len_op @0 @1 (view_convert? (vec_cond @0 @2 @3)) @4 @5 @6) > > + (with { tree op_type = TREE_TYPE (@1); } > > + (if (element_precision (type) == element_precision (op_type)) > > + (cond_len_op @0 @1 (view_convert:op_type @2) @4 @5 @6)))) > > + (simplify > > + (cond_len_op @0 (convert? (vec_cond @0 @1 @2)) @3 @4 @5 @6) > > + (with { tree op_type = TREE_TYPE (@3); } > > + (if (element_precision (type) == element_precision (op_type)) > > + (cond_len_op @0 (convert:op_type @1) @3 @4 @5 @6)))) > > + (simplify > > + (cond_len_op @0 @1 (convert? (vec_cond @0 @2 @3)) @4 @5 @6) > > + (with { tree op_type = TREE_TYPE (@1); } > > + (if (element_precision (type) == element_precision (op_type)) > > + (cond_len_op @0 @1 (convert:op_type @2) @4 @5 @6))))) > > > > /* Same for ternary operations. */ > > (for cond_len_op (COND_LEN_TERNARY) > > @@ -9799,7 +9884,37 @@ and, > > (with { tree op_type = TREE_TYPE (@6); } > > (if (inverse_conditions_p (@0, @2) > > && element_precision (type) == element_precision (op_type)) > > - (view_convert (cond_len_op @2 @3 @4 @5 (view_convert:op_type @1) @7 @8)))))) > > + (view_convert (cond_len_op @2 @3 @4 @5 (view_convert:op_type @1) @7 @8))))) > > + (simplify > > + (cond_len_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4 @5 @6 @7) > > + (with { tree op_type = TREE_TYPE (@3); } > > + (if (element_precision (type) == element_precision (op_type)) > > + (cond_len_op @0 (view_convert:op_type @1) @3 @4 @5 @6 @7)))) > > + (simplify > > + (cond_len_op @0 @1 (view_convert? (vec_cond @0 @2 @3)) @4 @5 @6 @7) > > + (with { tree op_type = TREE_TYPE (@1); } > > + (if (element_precision (type) == element_precision (op_type)) > > + (cond_len_op @0 @1 (view_convert:op_type @2) @4 @5 @6 @7)))) > > + (simplify > > + (cond_len_op @0 @1 @2 (view_convert? (vec_cond @0 @3 @4)) @5 @6 @7) > > + (with { tree op_type = TREE_TYPE (@1); } > > + (if (element_precision (type) == element_precision (op_type)) > > + (cond_len_op @0 @1 @2 (view_convert:op_type @3) @5 @6 @7)))) > > + (simplify > > + (cond_len_op @0 (convert? (vec_cond @0 @1 @2)) @3 @4 @5 @6 @7) > > + (with { tree op_type = TREE_TYPE (@3); } > > + (if (element_precision (type) == element_precision (op_type)) > > + (cond_len_op @0 (convert:op_type @1) @3 @4 @5 @6 @7)))) > > + (simplify > > + (cond_len_op @0 @1 (convert? (vec_cond @0 @2 @3)) @4 @5 @6 @7) > > + (with { tree op_type = TREE_TYPE (@1); } > > + (if (element_precision (type) == element_precision (op_type)) > > + (cond_len_op @0 @1 (convert:op_type @2) @4 @5 @6 @7)))) > > + (simplify > > + (cond_len_op @0 @1 @2 (convert? (vec_cond @0 @3 @4)) @5 @6 @7) > > + (with { tree op_type = TREE_TYPE (@1); } > > + (if (element_precision (type) == element_precision (op_type)) > > + (cond_len_op @0 @1 @2 (convert:op_type @3) @5 @6 @7))))) > > > > /* Detect simplication for a conditional reduction where > > > > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c > > new file mode 100644 > > index 00000000000..29e55705a7a > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c > > @@ -0,0 +1,20 @@ > > +/* { dg-do compile } */ > > +/* { dg-options { -O3 -march=rv64gcv_zvl256b -mabi=lp64d } } */ > > + > > +short d[19]; > > +_Bool e[100][19][19]; > > +_Bool f[10000]; > > + > > +int main() > > +{ > > + for (long g = 0; g < 19; ++g) > > + d[g] = 3; > > + _Bool(*h)[19][19] = e; > > + for (short g = 0; g < 9; g++) > > + for (int i = 4; i < 16; i += 3) > > + f[i * 9 + g] = d[i] ? d[i] : h[g][i][2]; > > + for (long i = 120; i < 122; ++i) > > + __builtin_printf("%d\n", f[i]); > > +} > > + > > +/* { dg-final { scan-assembler-times {vmv.v.i\s*v[0-9]+,0} 3 } } */ > > diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc > > index 57992b6deca..c0c7013c817 100644 > > --- a/gcc/tree-if-conv.cc > > +++ b/gcc/tree-if-conv.cc > > @@ -124,6 +124,7 @@ along with GCC; see the file COPYING3. If not see > > #include "tree-vectorizer.h" > > #include "tree-eh.h" > > #include "cgraph.h" > > +#include "tree-dfa.h" > > > > /* For lang_hooks.types.type_for_mode. */ > > #include "langhooks.h" > > @@ -2453,11 +2454,14 @@ mask_exists (int size, const vec<int> &vec) > > that does so. */ > > > > static gimple * > > -predicate_load_or_store (gimple_stmt_iterator *gsi, gassign *stmt, tree mask) > > +predicate_load_or_store (gimple_stmt_iterator *gsi, gassign *stmt, tree mask, > > + tree override_lhs = NULL_TREE) > > { > > gcall *new_stmt; > > > > - tree lhs = gimple_assign_lhs (stmt); > > + tree lhs = override_lhs; > > + if (!lhs) > > + lhs = gimple_assign_lhs (stmt); > > tree rhs = gimple_assign_rhs1 (stmt); > > tree ref = TREE_CODE (lhs) == SSA_NAME ? rhs : lhs; > > mark_addressable (ref); > > @@ -2789,11 +2793,52 @@ predicate_statements (loop_p loop) > > vect_masks.safe_push (mask); > > } > > if (gimple_assign_single_p (stmt)) > > - new_stmt = predicate_load_or_store (&gsi, stmt, mask); > > - else > > - new_stmt = predicate_rhs_code (stmt, mask, cond, &ssa_names); > > + { > > + bool target_has_else > > + = TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO; > > + tree lhs = gimple_get_lhs (stmt); > > + > > + bool is_load = TREE_CODE (lhs) == SSA_NAME; > > + > > + gimple_seq stmts2 = NULL; > > + tree tmplhs = is_load && target_has_else > > + ? make_temp_ssa_name (TREE_TYPE (lhs), NULL, > > + "_ifc_") : lhs; > > + gimple *call_stmt > > + = predicate_load_or_store (&gsi, stmt, mask, tmplhs); > > + > > + gimple_seq_add_stmt (&stmts2, call_stmt); > > + > > + if (lhs != tmplhs) > > + ssa_names.add (tmplhs); > > > > - gsi_replace (&gsi, new_stmt, true); > > + if (is_load && target_has_else) > > + { > > + unsigned nops = gimple_call_num_args (call_stmt); > > + tree *ops = XALLOCAVEC (tree, nops); > > + > > + for (unsigned i = 0; i < nops; i++) > > + ops[i] = gimple_call_arg (call_stmt, i); > > + > > + tree els_operand > > + = targetm.preferred_else_value (IFN_MASK_LOAD, > > + TREE_TYPE (lhs), > > + nops, ops); > > + > > + tree rhs > > + = fold_build_cond_expr (TREE_TYPE (tmplhs), > > + mask, tmplhs, els_operand); > > + gassign *cond_stmt > > + = gimple_build_assign (gimple_get_lhs (stmt), rhs); > > + gimple_seq_add_stmt (&stmts2, cond_stmt); > > + } > > + gsi_replace_with_seq (&gsi, stmts2, true); > > + } > > + else > > + { > > + new_stmt = predicate_rhs_code (stmt, mask, cond, &ssa_names); > > + gsi_replace (&gsi, new_stmt, true); > > + } > > } > > else if (((lhs = gimple_assign_lhs (stmt)), true) > > && (INTEGRAL_TYPE_P (TREE_TYPE (lhs)) > > diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc > > index 57085488722..65e6f91fe8b 100644 > > --- a/gcc/tree-ssa-math-opts.cc > > +++ b/gcc/tree-ssa-math-opts.cc > > @@ -3193,6 +3193,14 @@ convert_mult_to_fma_1 (tree mul_result, tree op1, tree op2) > > /* Follow all SSA edges so that we generate FMS, FNMA and FNMS > > regardless of where the negation occurs. */ > > gimple *orig_stmt = gsi_stmt (gsi); > > + if (fold_stmt (&gsi, follow_all_ssa_edges)) > > + { > > + if (maybe_clean_or_replace_eh_stmt (orig_stmt, gsi_stmt (gsi))) > > + gcc_unreachable (); > > + update_stmt (gsi_stmt (gsi)); > > + } > > + /* Fold the result again. */ > > + orig_stmt = gsi_stmt (gsi); > > if (fold_stmt (&gsi, follow_all_ssa_edges)) > > { > > if (maybe_clean_or_replace_eh_stmt (orig_stmt, gsi_stmt (gsi))) > > > >
Robin Dapp <rdapp.gcc@gmail.com> writes: > Hi, > > in PR115336 we have the following > > vect_patt_391 = .MASK_LEN_GATHER_LOAD (_470, vect__59, 1, { 0, ... }, { 0, ... }, _482, 0); > vect_iftmp.44 = vect_patt_391 | { 1, ... }; > .MASK_LEN_STORE (vectp_f.45, 8B, { -1, ... }, _482, 0, vect_iftmp.44); > > which assumes that a maskload sets the masked-out elements to zero. > RVV does not guarantee this and, unfortunately, leaves it to the > hardware implementation to do basically anything it wants (even keep > the previous value). In the PR this leads to us reusing a previous > vector register and stale, nonzero values causing an invalid result. > > Based on a Richard Sandiford's feedback I started with the attached > patch - it's more an RFC in its current shape and obviously not > tested exhaustively. > > The idea is: > - In ifcvt emit a VCOND_MASK (mask, load_result, preferred_else_value) > after a MASK_LOAD if the target requires it. > - Elide the VCOND_MASK when there is a COND_OP with a replacing else > value later. > > Is this, generally, reasonable or is there a better way? > > My initial idea was to add an else value to MASK_LOAD. Richard's > concern was, though, that this might make nonzero else values > appear inexpensive when they are actually not. Yeah, but like I said off-list, I still think an else operand is the way to go. It would be quite a bit of work to retrofit at this point, but it feels like the right long-term solution. FTR, my concern & suggestion was: I suppose the difficulty is that we might make: MASK_LOAD (mask, ptr, some-arbitrary-else-value) seem as cheap as: MASK_LOAD (mask, ptr, { 0, 0,. ... 0}) which definitely isn't the case for SVE (and I'm guessing also for AVX). It would be better to keep the: COND_EXPR (mask, ..., some-arbitrary-else-value) separate and try to optimise it with surrounding code. Perhaps we could use the optab's insn predicates to test whether a given else value is acceptable? We already do something similar for the gather load arguments. Since then, there's been discussion about taking the same predicated-based approach for vec_cmp. That is, we'd add the else value to the maskload optab and MASK_LOAD internal function, but use predicates to test whether a target supports a particular else value. Thanks, Richard
Richard Biener <rguenther@suse.de> writes: > On Fri, 5 Jul 2024, Richard Biener wrote: > >> On Fri, 5 Jul 2024, Robin Dapp wrote: >> >> > Hi, >> > >> > in PR115336 we have the following >> > >> > vect_patt_391 = .MASK_LEN_GATHER_LOAD (_470, vect__59, 1, { 0, ... }, { 0, ... }, _482, 0); >> > vect_iftmp.44 = vect_patt_391 | { 1, ... }; >> > .MASK_LEN_STORE (vectp_f.45, 8B, { -1, ... }, _482, 0, vect_iftmp.44); >> > >> > which assumes that a maskload sets the masked-out elements to zero. >> >> To me this looks like mis-applying of match.pd:6083? >> >> Applying pattern match.pd:6083, gimple-match-1.cc:45749 >> gimple_simplified to iftmp.0_62 = iftmp.0_61 | _219; >> new phi replacement stmt >> iftmp.0_62 = iftmp.0_61 | _219; >> >> so originally it wasn't >> >> iftmp.0_61 = .MASK_LOAD (_260, 8B, _259); >> iftmp.0_62 = iftmp.0_61 | _219; >> >> but sth like >> >> iftmp.0_62 = <mask> ? _219 : iftmp.0_61; >> >> and iftmp.0_61 is zero-one-valued because it is _Bool. Shouldn't >> if-conversion always turn the COND_EXPR into a .COND_IOR here? >> >> > RVV does not guarantee this and, unfortunately, leaves it to the >> > hardware implementation to do basically anything it wants (even keep >> > the previous value). In the PR this leads to us reusing a previous >> > vector register and stale, nonzero values causing an invalid result. >> > >> > Based on a Richard Sandiford's feedback I started with the attached >> > patch - it's more an RFC in its current shape and obviously not >> > tested exhaustively. >> > >> > The idea is: >> > - In ifcvt emit a VCOND_MASK (mask, load_result, preferred_else_value) >> > after a MASK_LOAD if the target requires it. >> > - Elide the VCOND_MASK when there is a COND_OP with a replacing else >> > value later. >> > >> > Is this, generally, reasonable or is there a better way? >> > >> > My initial idea was to add an else value to MASK_LOAD. Richard's >> > concern was, though, that this might make nonzero else values >> > appear inexpensive when they are actually not. > > In general I'd agree that we want an else value for .MASK_LOAD > and also possibly for .MASK_STORE (but then we need to represent an > else value that says do-not-change/merge). Not sure about MASK_STORE. AFAICT, an else value for MASK_STORE would be directly equivalent to: tmp = vec_cond_expr <mask, store_value, else_value>; *ptr = tmp; and IMO it'd be better to keep representing it like that instead. That is, MASK_STORE is changing something that already has a definition (the target memory), and so there's no ambiguity in having a masked operation that only changes part of the target. But for MASK_LOAD and other masked vector-producing functions, we're creating a new full vector based on a partial vector operation. There's no existing target to modify, and so we need to get the definition of the inactive elements from somewhere else. Thanks, Richard
> Am 05.07.2024 um 16:00 schrieb Richard Sandiford <richard.sandiford@arm.com>: > > Richard Biener <rguenther@suse.de> writes: >>> On Fri, 5 Jul 2024, Richard Biener wrote: >>> >>>> On Fri, 5 Jul 2024, Robin Dapp wrote: >>> >>>> Hi, >>>> >>>> in PR115336 we have the following >>>> >>>> vect_patt_391 = .MASK_LEN_GATHER_LOAD (_470, vect__59, 1, { 0, ... }, { 0, ... }, _482, 0); >>>> vect_iftmp.44 = vect_patt_391 | { 1, ... }; >>>> .MASK_LEN_STORE (vectp_f.45, 8B, { -1, ... }, _482, 0, vect_iftmp.44); >>>> >>>> which assumes that a maskload sets the masked-out elements to zero. >>> >>> To me this looks like mis-applying of match.pd:6083? >>> >>> Applying pattern match.pd:6083, gimple-match-1.cc:45749 >>> gimple_simplified to iftmp.0_62 = iftmp.0_61 | _219; >>> new phi replacement stmt >>> iftmp.0_62 = iftmp.0_61 | _219; >>> >>> so originally it wasn't >>> >>> iftmp.0_61 = .MASK_LOAD (_260, 8B, _259); >>> iftmp.0_62 = iftmp.0_61 | _219; >>> >>> but sth like >>> >>> iftmp.0_62 = <mask> ? _219 : iftmp.0_61; >>> >>> and iftmp.0_61 is zero-one-valued because it is _Bool. Shouldn't >>> if-conversion always turn the COND_EXPR into a .COND_IOR here? >>> >>>> RVV does not guarantee this and, unfortunately, leaves it to the >>>> hardware implementation to do basically anything it wants (even keep >>>> the previous value). In the PR this leads to us reusing a previous >>>> vector register and stale, nonzero values causing an invalid result. >>>> >>>> Based on a Richard Sandiford's feedback I started with the attached >>>> patch - it's more an RFC in its current shape and obviously not >>>> tested exhaustively. >>>> >>>> The idea is: >>>> - In ifcvt emit a VCOND_MASK (mask, load_result, preferred_else_value) >>>> after a MASK_LOAD if the target requires it. >>>> - Elide the VCOND_MASK when there is a COND_OP with a replacing else >>>> value later. >>>> >>>> Is this, generally, reasonable or is there a better way? >>>> >>>> My initial idea was to add an else value to MASK_LOAD. Richard's >>>> concern was, though, that this might make nonzero else values >>>> appear inexpensive when they are actually not. >> >> In general I'd agree that we want an else value for .MASK_LOAD >> and also possibly for .MASK_STORE (but then we need to represent an >> else value that says do-not-change/merge). > > Not sure about MASK_STORE. AFAICT, an else value for MASK_STORE would > be directly equivalent to: > > tmp = vec_cond_expr <mask, store_value, else_value>; > *ptr = tmp; > > and IMO it'd be better to keep representing it like that instead. > > That is, MASK_STORE is changing something that already has a definition > (the target memory), and so there's no ambiguity in having a masked > operation that only changes part of the target. > > But for MASK_LOAD and other masked vector-producing functions, we're > creating a new full vector based on a partial vector operation. > There's no existing target to modify, and so we need to get the > definition of the inactive elements from somewhere else. Agreed. I also think the predicate idea for the else value in MASK_LOAD is good. Richard > Thanks, > Richard
> FTR, my concern & suggestion was: > > I suppose the difficulty is that we might make: > > MASK_LOAD (mask, ptr, some-arbitrary-else-value) > > seem as cheap as: > > MASK_LOAD (mask, ptr, { 0, 0,. ... 0}) > > which definitely isn't the case for SVE (and I'm guessing also > for AVX). It would be better to keep the: > > COND_EXPR (mask, ..., some-arbitrary-else-value) > > separate and try to optimise it with surrounding code. Ah, I rather understood that as an "either or" rather than an "and". > That is, we'd add the else value to the maskload optab and MASK_LOAD > internal function, but use predicates to test whether a target supports > a particular else value. So e.g. for SVE we'd have res = MASK_LOAD (mask, ptr, {0, 0, ..., 0}) COND_EXPR (mask, res, {0, 0, ...}) which would always be folded/optimized away, while for RVV we'd have res = MASK_LOAD (mask, ptr, whatever_else_value) COND_EXPR (mask, res, {0, 0, ...}) that could/should result in a separate instruction depending on the rest of the code? Regards Robin
> To me this looks like mis-applying of match.pd:6083? > > Applying pattern match.pd:6083, gimple-match-1.cc:45749 > gimple_simplified to iftmp.0_62 = iftmp.0_61 | _219; > new phi replacement stmt > iftmp.0_62 = iftmp.0_61 | _219; > > so originally it wasn't > > iftmp.0_61 = .MASK_LOAD (_260, 8B, _259); > iftmp.0_62 = iftmp.0_61 | _219; > > but sth like > > iftmp.0_62 = <mask> ? _219 : iftmp.0_61; > > and iftmp.0_61 is zero-one-valued because it is _Bool. Shouldn't > if-conversion always turn the COND_EXPR into a .COND_IOR here? Yes, just that the mask is inverted in the original COND_EXPR. Hmm, with COND_IOR we'd explicitly set the else value to the target's preferred else value then I presume. That's probably more elegant than an extra VEC_COND but I could imagine seeing the same problems of redundant VEC_CONDs not being folded. Regards Robin
Robin Dapp <rdapp.gcc@gmail.com> writes: >> FTR, my concern & suggestion was: >> >> I suppose the difficulty is that we might make: >> >> MASK_LOAD (mask, ptr, some-arbitrary-else-value) >> >> seem as cheap as: >> >> MASK_LOAD (mask, ptr, { 0, 0,. ... 0}) >> >> which definitely isn't the case for SVE (and I'm guessing also >> for AVX). It would be better to keep the: >> >> COND_EXPR (mask, ..., some-arbitrary-else-value) >> >> separate and try to optimise it with surrounding code. > > Ah, I rather understood that as an "either or" rather than an "and". Yeah, could have been clearer, sorry. >> That is, we'd add the else value to the maskload optab and MASK_LOAD >> internal function, but use predicates to test whether a target supports >> a particular else value. > > So e.g. for SVE we'd have > > res = MASK_LOAD (mask, ptr, {0, 0, ..., 0}) > COND_EXPR (mask, res, {0, 0, ...}) > > which would always be folded/optimized away, while for RVV > we'd have > > res = MASK_LOAD (mask, ptr, whatever_else_value) > COND_EXPR (mask, res, {0, 0, ...}) > > that could/should result in a separate instruction depending on > the rest of the code? Yeah, I think so. I guess for RVV there's a choice between: (1) making the insn predicate accept all else values and making the insn emit an explicit blend between the loaded result and the else value (2) making the insn predicate only accept “undefined” (SCRATCH in rtl terms) (2) sounds more in keeping with Juzhe's fix for PR110751. Thanks, Richard
> Yeah, I think so. I guess for RVV there's a choice between: > > (1) making the insn predicate accept all else values and making > the insn emit an explicit blend between the loaded result > and the else value > > (2) making the insn predicate only accept “undefined” (SCRATCH in > rtl terms) > > (2) sounds more in keeping with Juzhe's fix for PR110751. I think (2) is the reasonable choice for the future, however: I misinterpreted the RVV spec. The implementation has the choice between filling with ones or leaving unchanged (merging). So it's not really "anything" as I claimed before. This implies that we could defer to target_preferred_else_value and eventually, in the target, leave it to the uarch what to return (either "1" or "undefined"). On the general strategy: Should we (for SVE, AVX512 etc.) have a match pattern that folds res = MASK_LOAD (mask, ptr, {0, 0, ..., 0}) COND_EXPR (mask, res, {0, 0, ...}) into just MASK_LOAD then? MASK_LOAD (..., {0, 0, ..., 0}) would not be emitted for RVV and we'd need to fold the COND_EXPR differently, maybe with a combination of match patterns similar to the ones in my RFC as well as a separate "pass" for mask tracking as Richi suggested? Regards Robin p.s. We have the choice of different policies for length masking and element masking. I don't think the problem arose yet due to the way we handle length control in the vectorizer but I could imagine, at least in principle, the need for two separate else values for MASK_LEN.
> Am 07.07.2024 um 11:26 schrieb Robin Dapp <rdapp.gcc@gmail.com>: > > >> >> Yeah, I think so. I guess for RVV there's a choice between: >> >> (1) making the insn predicate accept all else values and making >> the insn emit an explicit blend between the loaded result >> and the else value >> >> (2) making the insn predicate only accept “undefined” (SCRATCH in >> rtl terms) >> >> (2) sounds more in keeping with Juzhe's fix for PR110751. > > > I think (2) is the reasonable choice for the future, however: > > I misinterpreted the RVV spec. The implementation has the choice > between filling with ones or leaving unchanged (merging). So it's > not really "anything" as I claimed before. > This implies that we could defer to target_preferred_else_value and > eventually, in the target, leave it to the uarch what to return > (either "1" or "undefined"). > > On the general strategy: > Should we (for SVE, AVX512 etc.) have a match pattern that folds > > res = MASK_LOAD (mask, ptr, {0, 0, ..., 0}) > COND_EXPR (mask, res, {0, 0, ...}) > > into just MASK_LOAD then? MASK_LOAD (..., {0, 0, ..., 0}) would not > be emitted for RVV and we'd need to fold the COND_EXPR differently, > maybe with a combination of match patterns similar to the ones in > my RFC as well as a separate "pass" for mask tracking as Richi > suggested? I think it’s less than ideal to only do this via match but I think we cannot do without the match patterns if you think of unrolling and simplification exposing the opportunities. Richard > Regards > Robin > > p.s. We have the choice of different policies for length masking and > element masking. I don't think the problem arose yet due to the way > we handle length control in the vectorizer but I could imagine, at > least in principle, the need for two separate else values for MASK_LEN.
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index c17141d909a..e10bc6824b9 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -11481,6 +11481,8 @@ riscv_preferred_else_value (unsigned ifn, tree vectype, unsigned int nops, { if (riscv_v_ext_mode_p (TYPE_MODE (vectype))) return get_or_create_ssa_default_def (cfun, create_tmp_var (vectype)); + else if (ifn == IFN_MASK_LOAD) + return get_or_create_ssa_default_def (cfun, create_tmp_var (vectype)); return default_preferred_else_value (ifn, vectype, nops, ops); } diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h index 57910eecd3e..dc15eb5e60f 100644 --- a/gcc/config/riscv/riscv.h +++ b/gcc/config/riscv/riscv.h @@ -1264,4 +1264,6 @@ extern void riscv_remove_unneeded_save_restore_calls (void); /* Check TLS Descriptors mechanism is selected. */ #define TARGET_TLSDESC (riscv_tls_dialect == TLS_DESCRIPTORS) +#define TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO 1 + #endif /* ! GCC_RISCV_H */ diff --git a/gcc/defaults.h b/gcc/defaults.h index 92f3e07f742..6ffbdaea229 100644 --- a/gcc/defaults.h +++ b/gcc/defaults.h @@ -1457,6 +1457,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see #define DWARF_VERSION_DEFAULT 5 #endif +#ifndef TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO +#define TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO 0 +#endif + #ifndef USED_FOR_TARGET /* Done this way to keep gengtype happy. */ #if BITS_PER_UNIT == 8 diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index be5543b72f8..2fdebe7fd21 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -12683,6 +12683,11 @@ maintainer is familiar with. @end defmac +@defmac TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO +Bla + +@end defmac + @deftypefn {Target Hook} bool TARGET_HAVE_SPECULATION_SAFE_VALUE (bool @var{active}) This hook is used to determine the level of target support for @code{__builtin_speculation_safe_value}. If called with an argument diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 87a7f895174..276c38325dc 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -8097,6 +8097,11 @@ maintainer is familiar with. @end defmac +@defmac TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO +Bla + +@end defmac + @hook TARGET_HAVE_SPECULATION_SAFE_VALUE @hook TARGET_SPECULATION_SAFE_VALUE diff --git a/gcc/match.pd b/gcc/match.pd index 3d0689c9312..2fdc14ef56f 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -9744,7 +9744,22 @@ and, #endif /* Detect cases in which a VEC_COND_EXPR effectively replaces the - "else" value of an IFN_COND_*. */ + "else" value of an IFN_COND_* as well as when the IFN_COND_* + replaces the else of the VEC_COND_EXPR. */ +(for cond_op (COND_UNARY) + (simplify + (cond_op @0 (view_convert? (vec_cond @0 @1 @2)) @3) + (with { tree op_type = TREE_TYPE (@3); } + (if (element_precision (type) == element_precision (op_type)) + (cond_op @0 (view_convert:op_type @1) @3))))) + +(for cond_len_op (COND_LEN_UNARY) + (simplify + (cond_len_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4 @5) + (with { tree op_type = TREE_TYPE (@3); } + (if (element_precision (type) == element_precision (op_type)) + (cond_len_op @0 (view_convert:op_type @1) @3 @4 @5))))) + (for cond_op (COND_BINARY) (simplify (vec_cond @0 (view_convert? (cond_op @0 @1 @2 @3)) @4) @@ -9756,7 +9771,27 @@ and, (with { tree op_type = TREE_TYPE (@5); } (if (inverse_conditions_p (@0, @2) && element_precision (type) == element_precision (op_type)) - (view_convert (cond_op @2 @3 @4 (view_convert:op_type @1))))))) + (view_convert (cond_op @2 @3 @4 (view_convert:op_type @1)))))) + (simplify + (cond_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4) + (with { tree op_type = TREE_TYPE (@3); } + (if (element_precision (type) == element_precision (op_type)) + (cond_op @0 (view_convert:op_type @1) @3 @4)))) + (simplify + (cond_op @0 @1 (view_convert? (vec_cond @0 @2 @3)) @4) + (with { tree op_type = TREE_TYPE (@1); } + (if (element_precision (type) == element_precision (op_type)) + (cond_op @0 @1 (view_convert:op_type @2) @4)))) + (simplify + (cond_op @0 (convert? (vec_cond @0 @1 @2)) @3 @4) + (with { tree op_type = TREE_TYPE (@3); } + (if (element_precision (type) == element_precision (op_type)) + (cond_op @0 (convert:op_type @1) @3 @4)))) + (simplify + (cond_op @0 @1 (convert? (vec_cond @0 @2 @3)) @4) + (with { tree op_type = TREE_TYPE (@1); } + (if (element_precision (type) == element_precision (op_type)) + (cond_op @0 @1 (convert:op_type @2) @4))))) /* Same for ternary operations. */ (for cond_op (COND_TERNARY) @@ -9770,7 +9805,37 @@ and, (with { tree op_type = TREE_TYPE (@6); } (if (inverse_conditions_p (@0, @2) && element_precision (type) == element_precision (op_type)) - (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1))))))) + (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))) + (simplify + (cond_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4 @5) + (with { tree op_type = TREE_TYPE (@3); } + (if (element_precision (type) == element_precision (op_type)) + (cond_op @0 (view_convert:op_type @1) @3 @4 @5)))) + (simplify + (cond_op @0 @1 (view_convert? (vec_cond @0 @2 @3)) @4 @5) + (with { tree op_type = TREE_TYPE (@1); } + (if (element_precision (type) == element_precision (op_type)) + (cond_op @0 @1 (view_convert:op_type @2) @4 @5)))) + (simplify + (cond_op @0 @1 @2 (view_convert? (vec_cond @0 @3 @4)) @5) + (with { tree op_type = TREE_TYPE (@1); } + (if (element_precision (type) == element_precision (op_type)) + (cond_op @0 @1 @2 (view_convert:op_type @3) @5)))) + (simplify + (cond_op @0 (convert? (vec_cond @0 @1 @2)) @3 @4 @5) + (with { tree op_type = TREE_TYPE (@3); } + (if (element_precision (type) == element_precision (op_type)) + (cond_op @0 (convert:op_type @1) @3 @4 @5)))) + (simplify + (cond_op @0 @1 (convert? (vec_cond @0 @2 @3)) @4 @5) + (with { tree op_type = TREE_TYPE (@1); } + (if (element_precision (type) == element_precision (op_type)) + (cond_op @0 @1 (convert:op_type @2) @4 @5)))) + (simplify + (cond_op @0 @1 @2 (convert? (vec_cond @0 @3 @4)) @5) + (with { tree op_type = TREE_TYPE (@1); } + (if (element_precision (type) == element_precision (op_type)) + (cond_op @0 @1 @2 (convert:op_type @3) @5))))) /* Detect cases in which a VEC_COND_EXPR effectively replaces the "else" value of an IFN_COND_LEN_*. */ @@ -9785,7 +9850,27 @@ and, (with { tree op_type = TREE_TYPE (@5); } (if (inverse_conditions_p (@0, @2) && element_precision (type) == element_precision (op_type)) - (view_convert (cond_len_op @2 @3 @4 (view_convert:op_type @1) @6 @7)))))) + (view_convert (cond_len_op @2 @3 @4 (view_convert:op_type @1) @6 @7))))) + (simplify + (cond_len_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4 @5 @6) + (with { tree op_type = TREE_TYPE (@3); } + (if (element_precision (type) == element_precision (op_type)) + (cond_len_op @0 (view_convert:op_type @1) @3 @4 @5 @6)))) + (simplify + (cond_len_op @0 @1 (view_convert? (vec_cond @0 @2 @3)) @4 @5 @6) + (with { tree op_type = TREE_TYPE (@1); } + (if (element_precision (type) == element_precision (op_type)) + (cond_len_op @0 @1 (view_convert:op_type @2) @4 @5 @6)))) + (simplify + (cond_len_op @0 (convert? (vec_cond @0 @1 @2)) @3 @4 @5 @6) + (with { tree op_type = TREE_TYPE (@3); } + (if (element_precision (type) == element_precision (op_type)) + (cond_len_op @0 (convert:op_type @1) @3 @4 @5 @6)))) + (simplify + (cond_len_op @0 @1 (convert? (vec_cond @0 @2 @3)) @4 @5 @6) + (with { tree op_type = TREE_TYPE (@1); } + (if (element_precision (type) == element_precision (op_type)) + (cond_len_op @0 @1 (convert:op_type @2) @4 @5 @6))))) /* Same for ternary operations. */ (for cond_len_op (COND_LEN_TERNARY) @@ -9799,7 +9884,37 @@ and, (with { tree op_type = TREE_TYPE (@6); } (if (inverse_conditions_p (@0, @2) && element_precision (type) == element_precision (op_type)) - (view_convert (cond_len_op @2 @3 @4 @5 (view_convert:op_type @1) @7 @8)))))) + (view_convert (cond_len_op @2 @3 @4 @5 (view_convert:op_type @1) @7 @8))))) + (simplify + (cond_len_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4 @5 @6 @7) + (with { tree op_type = TREE_TYPE (@3); } + (if (element_precision (type) == element_precision (op_type)) + (cond_len_op @0 (view_convert:op_type @1) @3 @4 @5 @6 @7)))) + (simplify + (cond_len_op @0 @1 (view_convert? (vec_cond @0 @2 @3)) @4 @5 @6 @7) + (with { tree op_type = TREE_TYPE (@1); } + (if (element_precision (type) == element_precision (op_type)) + (cond_len_op @0 @1 (view_convert:op_type @2) @4 @5 @6 @7)))) + (simplify + (cond_len_op @0 @1 @2 (view_convert? (vec_cond @0 @3 @4)) @5 @6 @7) + (with { tree op_type = TREE_TYPE (@1); } + (if (element_precision (type) == element_precision (op_type)) + (cond_len_op @0 @1 @2 (view_convert:op_type @3) @5 @6 @7)))) + (simplify + (cond_len_op @0 (convert? (vec_cond @0 @1 @2)) @3 @4 @5 @6 @7) + (with { tree op_type = TREE_TYPE (@3); } + (if (element_precision (type) == element_precision (op_type)) + (cond_len_op @0 (convert:op_type @1) @3 @4 @5 @6 @7)))) + (simplify + (cond_len_op @0 @1 (convert? (vec_cond @0 @2 @3)) @4 @5 @6 @7) + (with { tree op_type = TREE_TYPE (@1); } + (if (element_precision (type) == element_precision (op_type)) + (cond_len_op @0 @1 (convert:op_type @2) @4 @5 @6 @7)))) + (simplify + (cond_len_op @0 @1 @2 (convert? (vec_cond @0 @3 @4)) @5 @6 @7) + (with { tree op_type = TREE_TYPE (@1); } + (if (element_precision (type) == element_precision (op_type)) + (cond_len_op @0 @1 @2 (convert:op_type @3) @5 @6 @7))))) /* Detect simplication for a conditional reduction where diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c new file mode 100644 index 00000000000..29e55705a7a --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options { -O3 -march=rv64gcv_zvl256b -mabi=lp64d } } */ + +short d[19]; +_Bool e[100][19][19]; +_Bool f[10000]; + +int main() +{ + for (long g = 0; g < 19; ++g) + d[g] = 3; + _Bool(*h)[19][19] = e; + for (short g = 0; g < 9; g++) + for (int i = 4; i < 16; i += 3) + f[i * 9 + g] = d[i] ? d[i] : h[g][i][2]; + for (long i = 120; i < 122; ++i) + __builtin_printf("%d\n", f[i]); +} + +/* { dg-final { scan-assembler-times {vmv.v.i\s*v[0-9]+,0} 3 } } */ diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc index 57992b6deca..c0c7013c817 100644 --- a/gcc/tree-if-conv.cc +++ b/gcc/tree-if-conv.cc @@ -124,6 +124,7 @@ along with GCC; see the file COPYING3. If not see #include "tree-vectorizer.h" #include "tree-eh.h" #include "cgraph.h" +#include "tree-dfa.h" /* For lang_hooks.types.type_for_mode. */ #include "langhooks.h" @@ -2453,11 +2454,14 @@ mask_exists (int size, const vec<int> &vec) that does so. */ static gimple * -predicate_load_or_store (gimple_stmt_iterator *gsi, gassign *stmt, tree mask) +predicate_load_or_store (gimple_stmt_iterator *gsi, gassign *stmt, tree mask, + tree override_lhs = NULL_TREE) { gcall *new_stmt; - tree lhs = gimple_assign_lhs (stmt); + tree lhs = override_lhs; + if (!lhs) + lhs = gimple_assign_lhs (stmt); tree rhs = gimple_assign_rhs1 (stmt); tree ref = TREE_CODE (lhs) == SSA_NAME ? rhs : lhs; mark_addressable (ref); @@ -2789,11 +2793,52 @@ predicate_statements (loop_p loop) vect_masks.safe_push (mask); } if (gimple_assign_single_p (stmt)) - new_stmt = predicate_load_or_store (&gsi, stmt, mask); - else - new_stmt = predicate_rhs_code (stmt, mask, cond, &ssa_names); + { + bool target_has_else + = TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO; + tree lhs = gimple_get_lhs (stmt); + + bool is_load = TREE_CODE (lhs) == SSA_NAME; + + gimple_seq stmts2 = NULL; + tree tmplhs = is_load && target_has_else + ? make_temp_ssa_name (TREE_TYPE (lhs), NULL, + "_ifc_") : lhs; + gimple *call_stmt + = predicate_load_or_store (&gsi, stmt, mask, tmplhs); + + gimple_seq_add_stmt (&stmts2, call_stmt); + + if (lhs != tmplhs) + ssa_names.add (tmplhs); - gsi_replace (&gsi, new_stmt, true); + if (is_load && target_has_else) + { + unsigned nops = gimple_call_num_args (call_stmt); + tree *ops = XALLOCAVEC (tree, nops); + + for (unsigned i = 0; i < nops; i++) + ops[i] = gimple_call_arg (call_stmt, i); + + tree els_operand + = targetm.preferred_else_value (IFN_MASK_LOAD, + TREE_TYPE (lhs), + nops, ops); + + tree rhs + = fold_build_cond_expr (TREE_TYPE (tmplhs), + mask, tmplhs, els_operand); + gassign *cond_stmt + = gimple_build_assign (gimple_get_lhs (stmt), rhs); + gimple_seq_add_stmt (&stmts2, cond_stmt); + } + gsi_replace_with_seq (&gsi, stmts2, true); + } + else + { + new_stmt = predicate_rhs_code (stmt, mask, cond, &ssa_names); + gsi_replace (&gsi, new_stmt, true); + } } else if (((lhs = gimple_assign_lhs (stmt)), true) && (INTEGRAL_TYPE_P (TREE_TYPE (lhs)) diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc index 57085488722..65e6f91fe8b 100644 --- a/gcc/tree-ssa-math-opts.cc +++ b/gcc/tree-ssa-math-opts.cc @@ -3193,6 +3193,14 @@ convert_mult_to_fma_1 (tree mul_result, tree op1, tree op2) /* Follow all SSA edges so that we generate FMS, FNMA and FNMS regardless of where the negation occurs. */ gimple *orig_stmt = gsi_stmt (gsi); + if (fold_stmt (&gsi, follow_all_ssa_edges)) + { + if (maybe_clean_or_replace_eh_stmt (orig_stmt, gsi_stmt (gsi))) + gcc_unreachable (); + update_stmt (gsi_stmt (gsi)); + } + /* Fold the result again. */ + orig_stmt = gsi_stmt (gsi); if (fold_stmt (&gsi, follow_all_ssa_edges)) { if (maybe_clean_or_replace_eh_stmt (orig_stmt, gsi_stmt (gsi)))