[RFC] tree-if-conv: Handle nonzero masked elements [PR115336].

Message ID 7ea24dfc-34dd-4931-8614-6fd0ef86972a@gmail.com
State New
Series [RFC] tree-if-conv: Handle nonzero masked elements [PR115336].

Commit Message

Robin Dapp July 5, 2024, 11:45 a.m. UTC
Hi,

in PR115336 we have the following

  vect_patt_391 = .MASK_LEN_GATHER_LOAD (_470, vect__59, 1, { 0, ... }, { 0, ... }, _482, 0);
  vect_iftmp.44 = vect_patt_391 | { 1, ... };
  .MASK_LEN_STORE (vectp_f.45, 8B, { -1, ... }, _482, 0, vect_iftmp.44);

which assumes that a maskload sets the masked-out elements to zero.
RVV does not guarantee this and, unfortunately, leaves it to the
hardware implementation to do basically anything it wants (even keep
the previous value).  In the PR this leads to us reusing a previous
vector register and stale, nonzero values causing an invalid result.

Based on Richard Sandiford's feedback I started with the attached
patch - it's more an RFC in its current shape and obviously not
tested exhaustively.

The idea is:
 - In ifcvt emit a VCOND_MASK (mask, load_result, preferred_else_value)
   after a MASK_LOAD if the target requires it.
 - Elide the VCOND_MASK when there is a COND_OP with a replacing else
   value later.
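
As a sketch (names and the exact else operand are illustrative), the
if-converted gimple for a masked load would then be:

  _ifc_1 = .MASK_LOAD (ptr_2, 8B, mask_3);
  iftmp_4 = VEC_COND_EXPR <mask_3, _ifc_1, els_5>;

where els_5 is the target's preferred else value and the VEC_COND_EXPR
can be elided again when a later COND_OP supplies the else value itself.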

Is this, generally, reasonable or is there a better way?

My initial idea was to add an else value to MASK_LOAD.  Richard's
concern was, though, that this might make nonzero else values
appear inexpensive when they are actually not.

Even though I mechanically added match.pd patterns to catch
the most common cases (already at a point where a separate function,
maybe in gimple-match-exports, would make more sense), there is
still significant code-quality fallout.
The regressing cases are often of a form where the VCOND_MASK is
not just a conversion away but rather hidden behind some unmasked
operation.  I'm not sure if we could ever recognize everything that
way without descending very deep.

Regards
 Robin

gcc/ChangeLog:

	PR middle-end/115336

	* config/riscv/riscv.cc (riscv_preferred_else_value): Add
	MASK_LOAD.
	* config/riscv/riscv.h (TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO):
	Set to true.
	* defaults.h (TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO): New.
	* doc/tm.texi: Document.
	* doc/tm.texi.in: Document.
	* match.pd: Add patterns to allow replacing the else value of a
	VEC_COND.
	* tree-if-conv.cc (predicate_load_or_store): Emit a VEC_COND
	after a MASK_LOAD if the target does not guarantee zeroing.
	(predicate_statements): Add temporary lhs argument.
	* tree-ssa-math-opts.cc (convert_mult_to_fma_1): Re-fold fold
	result.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/pr115336.c: New test.
---
 gcc/config/riscv/riscv.cc                     |   2 +
 gcc/config/riscv/riscv.h                      |   2 +
 gcc/defaults.h                                |   4 +
 gcc/doc/tm.texi                               |   5 +
 gcc/doc/tm.texi.in                            |   5 +
 gcc/match.pd                                  | 125 +++++++++++++++++-
 .../gcc.target/riscv/rvv/autovec/pr115336.c   |  20 +++
 gcc/tree-if-conv.cc                           |  57 +++++++-
 gcc/tree-ssa-math-opts.cc                     |   8 ++
 9 files changed, 217 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c

Comments

Richard Biener July 5, 2024, 12:16 p.m. UTC | #1
On Fri, 5 Jul 2024, Robin Dapp wrote:

> Hi,
> 
> in PR115336 we have the following
> 
>   vect_patt_391 = .MASK_LEN_GATHER_LOAD (_470, vect__59, 1, { 0, ... }, { 0, ... }, _482, 0);
>   vect_iftmp.44 = vect_patt_391 | { 1, ... };
>   .MASK_LEN_STORE (vectp_f.45, 8B, { -1, ... }, _482, 0, vect_iftmp.44);
> 
> which assumes that a maskload sets the masked-out elements to zero.

To me this looks like a mis-application of match.pd:6083?

Applying pattern match.pd:6083, gimple-match-1.cc:45749
gimple_simplified to iftmp.0_62 = iftmp.0_61 | _219;
new phi replacement stmt
iftmp.0_62 = iftmp.0_61 | _219;

so originally it wasn't

  iftmp.0_61 = .MASK_LOAD (_260, 8B, _259);
  iftmp.0_62 = iftmp.0_61 | _219;

but sth like

  iftmp.0_62 = <mask> ? _219 : iftmp.0_61;

and iftmp.0_61 is zero-one-valued because it is _Bool.  Shouldn't
if-conversion always turn the COND_EXPR into a .COND_IOR here?
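
Sketching it with the operand order of the other COND_* internal
functions (mask, the operands, else value):

  iftmp.0_62 = .COND_IOR (<mask>, _219, { 0, ... }, iftmp.0_61);

so the masked-out lanes explicitly get iftmp.0_61 instead of relying
on the load zeroing them.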


> RVV does not guarantee this and, unfortunately, leaves it to the
> hardware implementation to do basically anything it wants (even keep
> the previous value).  In the PR this leads to us reusing a previous
> vector register and stale, nonzero values causing an invalid result.
> 
> Based on Richard Sandiford's feedback I started with the attached
> patch - it's more an RFC in its current shape and obviously not
> tested exhaustively.
> 
> The idea is:
>  - In ifcvt emit a VCOND_MASK (mask, load_result, preferred_else_value)
>    after a MASK_LOAD if the target requires it.
>  - Elide the VCOND_MASK when there is a COND_OP with a replacing else
>    value later.
> 
> Is this, generally, reasonable or is there a better way?
> 
> My initial idea was to add an else value to MASK_LOAD.  Richard's
> concern was, though, that this might make nonzero else values
> appear inexpensive when they are actually not.
> 
> Even though I mechanically added match.pd patterns to catch
> the most common cases (already at a point where a separate function,
> maybe in gimple-match-exports, would make more sense), there is
> still significant code-quality fallout.
> The regressing cases are often of a form where the VCOND_MASK is
> not just a conversion away but rather hidden behind some unmasked
> operation.  I'm not sure if we could ever recognize everything that
> way without descending very deep.
> 
> Regards
>  Robin
> 
> gcc/ChangeLog:
> 
> 	PR middle-end/115336
> 
> 	* config/riscv/riscv.cc (riscv_preferred_else_value): Add
> 	MASK_LOAD.
> 	* config/riscv/riscv.h (TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO):
> 	Set to true.
> 	* defaults.h (TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO): New.
> 	* doc/tm.texi: Document.
> 	* doc/tm.texi.in: Document.
> 	* match.pd: Add patterns to allow replacing the else value of a
> 	VEC_COND.
> 	* tree-if-conv.cc (predicate_load_or_store): Emit a VEC_COND
> 	after a MASK_LOAD if the target does not guarantee zeroing.
> 	(predicate_statements): Add temporary lhs argument.
> 	* tree-ssa-math-opts.cc (convert_mult_to_fma_1): Re-fold fold
> 	result.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/riscv/rvv/autovec/pr115336.c: New test.
> ---
>  gcc/config/riscv/riscv.cc                     |   2 +
>  gcc/config/riscv/riscv.h                      |   2 +
>  gcc/defaults.h                                |   4 +
>  gcc/doc/tm.texi                               |   5 +
>  gcc/doc/tm.texi.in                            |   5 +
>  gcc/match.pd                                  | 125 +++++++++++++++++-
>  .../gcc.target/riscv/rvv/autovec/pr115336.c   |  20 +++
>  gcc/tree-if-conv.cc                           |  57 +++++++-
>  gcc/tree-ssa-math-opts.cc                     |   8 ++
>  9 files changed, 217 insertions(+), 11 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c
> 
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index c17141d909a..e10bc6824b9 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -11481,6 +11481,8 @@ riscv_preferred_else_value (unsigned ifn, tree vectype, unsigned int nops,
>  {
>    if (riscv_v_ext_mode_p (TYPE_MODE (vectype)))
>      return get_or_create_ssa_default_def (cfun, create_tmp_var (vectype));
> +  else if (ifn == IFN_MASK_LOAD)
> +    return get_or_create_ssa_default_def (cfun, create_tmp_var (vectype));
>  
>    return default_preferred_else_value (ifn, vectype, nops, ops);
>  }
> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index 57910eecd3e..dc15eb5e60f 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -1264,4 +1264,6 @@ extern void riscv_remove_unneeded_save_restore_calls (void);
>  /* Check TLS Descriptors mechanism is selected.  */
>  #define TARGET_TLSDESC (riscv_tls_dialect == TLS_DESCRIPTORS)
>  
> +#define TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO 1
> +
>  #endif /* ! GCC_RISCV_H */
> diff --git a/gcc/defaults.h b/gcc/defaults.h
> index 92f3e07f742..6ffbdaea229 100644
> --- a/gcc/defaults.h
> +++ b/gcc/defaults.h
> @@ -1457,6 +1457,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>  #define DWARF_VERSION_DEFAULT 5
>  #endif
>  
> +#ifndef TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO
> +#define TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO 0
> +#endif
> +
>  #ifndef USED_FOR_TARGET
>  /* Done this way to keep gengtype happy.  */
>  #if BITS_PER_UNIT == 8
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index be5543b72f8..2fdebe7fd21 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -12683,6 +12683,11 @@ maintainer is familiar with.
>  
>  @end defmac
>  
> +@defmac TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO
> +Bla
> +
> +@end defmac
> +
>  @deftypefn {Target Hook} bool TARGET_HAVE_SPECULATION_SAFE_VALUE (bool @var{active})
>  This hook is used to determine the level of target support for
>   @code{__builtin_speculation_safe_value}.  If called with an argument
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index 87a7f895174..276c38325dc 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -8097,6 +8097,11 @@ maintainer is familiar with.
>  
>  @end defmac
>  
> +@defmac TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO
> +Bla
> +
> +@end defmac
> +
>  @hook TARGET_HAVE_SPECULATION_SAFE_VALUE
>  
>  @hook TARGET_SPECULATION_SAFE_VALUE
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 3d0689c9312..2fdc14ef56f 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -9744,7 +9744,22 @@ and,
>  #endif
>  
>  /* Detect cases in which a VEC_COND_EXPR effectively replaces the
> -   "else" value of an IFN_COND_*.  */
> +   "else" value of an IFN_COND_* as well as when the IFN_COND_*
> +   replaces the else of the VEC_COND_EXPR.  */
> +(for cond_op (COND_UNARY)
> + (simplify
> +  (cond_op @0 (view_convert? (vec_cond @0 @1 @2)) @3)
> +  (with { tree op_type = TREE_TYPE (@3); }
> +   (if (element_precision (type) == element_precision (op_type))
> +    (cond_op @0 (view_convert:op_type @1) @3)))))
> +
> +(for cond_len_op (COND_LEN_UNARY)
> + (simplify
> +  (cond_len_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4 @5)
> +  (with { tree op_type = TREE_TYPE (@3); }
> +   (if (element_precision (type) == element_precision (op_type))
> +    (cond_len_op @0 (view_convert:op_type @1) @3 @4 @5)))))
> +
>  (for cond_op (COND_BINARY)
>   (simplify
>    (vec_cond @0 (view_convert? (cond_op @0 @1 @2 @3)) @4)
> @@ -9756,7 +9771,27 @@ and,
>    (with { tree op_type = TREE_TYPE (@5); }
>     (if (inverse_conditions_p (@0, @2)
>          && element_precision (type) == element_precision (op_type))
> -    (view_convert (cond_op @2 @3 @4 (view_convert:op_type @1)))))))
> +    (view_convert (cond_op @2 @3 @4 (view_convert:op_type @1))))))
> + (simplify
> +  (cond_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4)
> +  (with { tree op_type = TREE_TYPE (@3); }
> +   (if (element_precision (type) == element_precision (op_type))
> +    (cond_op @0 (view_convert:op_type @1) @3 @4))))
> + (simplify
> +  (cond_op @0 @1 (view_convert? (vec_cond @0 @2 @3)) @4)
> +  (with { tree op_type = TREE_TYPE (@1); }
> +   (if (element_precision (type) == element_precision (op_type))
> +    (cond_op @0 @1 (view_convert:op_type @2) @4))))
> + (simplify
> +  (cond_op @0 (convert? (vec_cond @0 @1 @2)) @3 @4)
> +  (with { tree op_type = TREE_TYPE (@3); }
> +   (if (element_precision (type) == element_precision (op_type))
> +    (cond_op @0 (convert:op_type @1) @3 @4))))
> + (simplify
> +  (cond_op @0 @1 (convert? (vec_cond @0 @2 @3)) @4)
> +  (with { tree op_type = TREE_TYPE (@1); }
> +   (if (element_precision (type) == element_precision (op_type))
> +    (cond_op @0 @1 (convert:op_type @2) @4)))))
>  
>  /* Same for ternary operations.  */
>  (for cond_op (COND_TERNARY)
> @@ -9770,7 +9805,37 @@ and,
>    (with { tree op_type = TREE_TYPE (@6); }
>     (if (inverse_conditions_p (@0, @2)
>          && element_precision (type) == element_precision (op_type))
> -    (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
> +    (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1))))))
> + (simplify
> +  (cond_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4 @5)
> +  (with { tree op_type = TREE_TYPE (@3); }
> +   (if (element_precision (type) == element_precision (op_type))
> +    (cond_op @0 (view_convert:op_type @1) @3 @4 @5))))
> + (simplify
> +  (cond_op @0 @1 (view_convert? (vec_cond @0 @2 @3)) @4 @5)
> +  (with { tree op_type = TREE_TYPE (@1); }
> +   (if (element_precision (type) == element_precision (op_type))
> +    (cond_op @0 @1 (view_convert:op_type @2) @4 @5))))
> + (simplify
> +  (cond_op @0 @1 @2 (view_convert? (vec_cond @0 @3 @4)) @5)
> +  (with { tree op_type = TREE_TYPE (@1); }
> +   (if (element_precision (type) == element_precision (op_type))
> +    (cond_op @0 @1 @2 (view_convert:op_type @3) @5))))
> + (simplify
> +  (cond_op @0 (convert? (vec_cond @0 @1 @2)) @3 @4 @5)
> +  (with { tree op_type = TREE_TYPE (@3); }
> +   (if (element_precision (type) == element_precision (op_type))
> +    (cond_op @0 (convert:op_type @1) @3 @4 @5))))
> + (simplify
> +  (cond_op @0 @1 (convert? (vec_cond @0 @2 @3)) @4 @5)
> +  (with { tree op_type = TREE_TYPE (@1); }
> +   (if (element_precision (type) == element_precision (op_type))
> +    (cond_op @0 @1 (convert:op_type @2) @4 @5))))
> + (simplify
> +  (cond_op @0 @1 @2 (convert? (vec_cond @0 @3 @4)) @5)
> +  (with { tree op_type = TREE_TYPE (@1); }
> +   (if (element_precision (type) == element_precision (op_type))
> +    (cond_op @0 @1 @2 (convert:op_type @3) @5)))))
>  
>  /* Detect cases in which a VEC_COND_EXPR effectively replaces the
>     "else" value of an IFN_COND_LEN_*.  */
> @@ -9785,7 +9850,27 @@ and,
>    (with { tree op_type = TREE_TYPE (@5); }
>     (if (inverse_conditions_p (@0, @2)
>          && element_precision (type) == element_precision (op_type))
> -    (view_convert (cond_len_op @2 @3 @4 (view_convert:op_type @1) @6 @7))))))
> +    (view_convert (cond_len_op @2 @3 @4 (view_convert:op_type @1) @6 @7)))))
> + (simplify
> +  (cond_len_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4 @5 @6)
> +  (with { tree op_type = TREE_TYPE (@3); }
> +   (if (element_precision (type) == element_precision (op_type))
> +    (cond_len_op @0 (view_convert:op_type @1) @3 @4 @5 @6))))
> + (simplify
> +  (cond_len_op @0 @1 (view_convert? (vec_cond @0 @2 @3)) @4 @5 @6)
> +  (with { tree op_type = TREE_TYPE (@1); }
> +   (if (element_precision (type) == element_precision (op_type))
> +    (cond_len_op @0 @1 (view_convert:op_type @2) @4 @5 @6))))
> + (simplify
> +  (cond_len_op @0 (convert? (vec_cond @0 @1 @2)) @3 @4 @5 @6)
> +  (with { tree op_type = TREE_TYPE (@3); }
> +   (if (element_precision (type) == element_precision (op_type))
> +    (cond_len_op @0 (convert:op_type @1) @3 @4 @5 @6))))
> + (simplify
> +  (cond_len_op @0 @1 (convert? (vec_cond @0 @2 @3)) @4 @5 @6)
> +  (with { tree op_type = TREE_TYPE (@1); }
> +   (if (element_precision (type) == element_precision (op_type))
> +    (cond_len_op @0 @1 (convert:op_type @2) @4 @5 @6)))))
>  
>  /* Same for ternary operations.  */
>  (for cond_len_op (COND_LEN_TERNARY)
> @@ -9799,7 +9884,37 @@ and,
>    (with { tree op_type = TREE_TYPE (@6); }
>     (if (inverse_conditions_p (@0, @2)
>          && element_precision (type) == element_precision (op_type))
> -    (view_convert (cond_len_op @2 @3 @4 @5 (view_convert:op_type @1) @7 @8))))))
> +    (view_convert (cond_len_op @2 @3 @4 @5 (view_convert:op_type @1) @7 @8)))))
> + (simplify
> +  (cond_len_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4 @5 @6 @7)
> +  (with { tree op_type = TREE_TYPE (@3); }
> +   (if (element_precision (type) == element_precision (op_type))
> +    (cond_len_op @0 (view_convert:op_type @1) @3 @4 @5 @6 @7))))
> + (simplify
> +  (cond_len_op @0 @1 (view_convert? (vec_cond @0 @2 @3)) @4 @5 @6 @7)
> +  (with { tree op_type = TREE_TYPE (@1); }
> +   (if (element_precision (type) == element_precision (op_type))
> +    (cond_len_op @0 @1 (view_convert:op_type @2) @4 @5 @6 @7))))
> + (simplify
> +  (cond_len_op @0 @1 @2 (view_convert? (vec_cond @0 @3 @4)) @5 @6 @7)
> +  (with { tree op_type = TREE_TYPE (@1); }
> +   (if (element_precision (type) == element_precision (op_type))
> +    (cond_len_op @0 @1 @2 (view_convert:op_type @3) @5 @6 @7))))
> + (simplify
> +  (cond_len_op @0 (convert? (vec_cond @0 @1 @2)) @3 @4 @5 @6 @7)
> +  (with { tree op_type = TREE_TYPE (@3); }
> +   (if (element_precision (type) == element_precision (op_type))
> +    (cond_len_op @0 (convert:op_type @1) @3 @4 @5 @6 @7))))
> + (simplify
> +  (cond_len_op @0 @1 (convert? (vec_cond @0 @2 @3)) @4 @5 @6 @7)
> +  (with { tree op_type = TREE_TYPE (@1); }
> +   (if (element_precision (type) == element_precision (op_type))
> +    (cond_len_op @0 @1 (convert:op_type @2) @4 @5 @6 @7))))
> + (simplify
> +  (cond_len_op @0 @1 @2 (convert? (vec_cond @0 @3 @4)) @5 @6 @7)
> +  (with { tree op_type = TREE_TYPE (@1); }
> +   (if (element_precision (type) == element_precision (op_type))
> +    (cond_len_op @0 @1 @2 (convert:op_type @3) @5 @6 @7)))))
>  
>  /* Detect simplication for a conditional reduction where
>  
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c
> new file mode 100644
> index 00000000000..29e55705a7a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options { -O3 -march=rv64gcv_zvl256b -mabi=lp64d  } } */
> +
> +short d[19];
> +_Bool e[100][19][19];
> +_Bool f[10000];
> +
> +int main()
> +{
> +  for (long g = 0; g < 19; ++g)
> +    d[g] = 3;
> +  _Bool(*h)[19][19] = e;
> +  for (short g = 0; g < 9; g++)
> +    for (int i = 4; i < 16; i += 3)
> +      f[i * 9 + g] = d[i] ? d[i] : h[g][i][2];
> +  for (long i = 120; i < 122; ++i)
> +    __builtin_printf("%d\n", f[i]);
> +}
> +
> +/* { dg-final { scan-assembler-times {vmv.v.i\s*v[0-9]+,0} 3 } } */
> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> index 57992b6deca..c0c7013c817 100644
> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -124,6 +124,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-vectorizer.h"
>  #include "tree-eh.h"
>  #include "cgraph.h"
> +#include "tree-dfa.h"
>  
>  /* For lang_hooks.types.type_for_mode.  */
>  #include "langhooks.h"
> @@ -2453,11 +2454,14 @@ mask_exists (int size, const vec<int> &vec)
>     that does so.  */
>  
>  static gimple *
> -predicate_load_or_store (gimple_stmt_iterator *gsi, gassign *stmt, tree mask)
> +predicate_load_or_store (gimple_stmt_iterator *gsi, gassign *stmt, tree mask,
> +			 tree override_lhs = NULL_TREE)
>  {
>    gcall *new_stmt;
>  
> -  tree lhs = gimple_assign_lhs (stmt);
> +  tree lhs = override_lhs;
> +  if (!lhs)
> +    lhs = gimple_assign_lhs (stmt);
>    tree rhs = gimple_assign_rhs1 (stmt);
>    tree ref = TREE_CODE (lhs) == SSA_NAME ? rhs : lhs;
>    mark_addressable (ref);
> @@ -2789,11 +2793,52 @@ predicate_statements (loop_p loop)
>  		  vect_masks.safe_push (mask);
>  		}
>  	      if (gimple_assign_single_p (stmt))
> -		new_stmt = predicate_load_or_store (&gsi, stmt, mask);
> -	      else
> -		new_stmt = predicate_rhs_code (stmt, mask, cond, &ssa_names);
> +		{
> +		  bool target_has_else
> +		    = TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO;
> +		  tree lhs = gimple_get_lhs (stmt);
> +
> +		  bool is_load = TREE_CODE (lhs) == SSA_NAME;
> +
> +		  gimple_seq stmts2 = NULL;
> +		  tree tmplhs = is_load && target_has_else
> +		    ? make_temp_ssa_name (TREE_TYPE (lhs), NULL,
> +					  "_ifc_") : lhs;
> +		  gimple *call_stmt
> +		    = predicate_load_or_store (&gsi, stmt, mask, tmplhs);
> +
> +		  gimple_seq_add_stmt (&stmts2, call_stmt);
> +
> +		  if (lhs != tmplhs)
> +		    ssa_names.add (tmplhs);
>  
> -	      gsi_replace (&gsi, new_stmt, true);
> +		  if (is_load && target_has_else)
> +		    {
> +		      unsigned nops = gimple_call_num_args (call_stmt);
> +		      tree *ops = XALLOCAVEC (tree, nops);
> +
> +		      for (unsigned i = 0; i < nops; i++)
> +			ops[i] = gimple_call_arg (call_stmt, i);
> +
> +		      tree els_operand
> +			= targetm.preferred_else_value (IFN_MASK_LOAD,
> +							TREE_TYPE (lhs),
> +							nops, ops);
> +
> +		      tree rhs
> +			= fold_build_cond_expr (TREE_TYPE (tmplhs),
> +						mask, tmplhs, els_operand);
> +		      gassign *cond_stmt
> +			= gimple_build_assign (gimple_get_lhs (stmt), rhs);
> +		      gimple_seq_add_stmt (&stmts2, cond_stmt);
> +		    }
> +		  gsi_replace_with_seq (&gsi, stmts2, true);
> +		}
> +	      else
> +		{
> +		  new_stmt = predicate_rhs_code (stmt, mask, cond, &ssa_names);
> +		  gsi_replace (&gsi, new_stmt, true);
> +		}
>  	    }
>  	  else if (((lhs = gimple_assign_lhs (stmt)), true)
>  		   && (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index 57085488722..65e6f91fe8b 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -3193,6 +3193,14 @@ convert_mult_to_fma_1 (tree mul_result, tree op1, tree op2)
>        /* Follow all SSA edges so that we generate FMS, FNMA and FNMS
>  	 regardless of where the negation occurs.  */
>        gimple *orig_stmt = gsi_stmt (gsi);
> +      if (fold_stmt (&gsi, follow_all_ssa_edges))
> +	{
> +	  if (maybe_clean_or_replace_eh_stmt (orig_stmt, gsi_stmt (gsi)))
> +	    gcc_unreachable ();
> +	  update_stmt (gsi_stmt (gsi));
> +	}
> +      /* Fold the result again.  */
> +      orig_stmt = gsi_stmt (gsi);
>        if (fold_stmt (&gsi, follow_all_ssa_edges))
>  	{
>  	  if (maybe_clean_or_replace_eh_stmt (orig_stmt, gsi_stmt (gsi)))
>
Richard Biener July 5, 2024, 12:30 p.m. UTC | #2
On Fri, 5 Jul 2024, Richard Biener wrote:

> On Fri, 5 Jul 2024, Robin Dapp wrote:
> 
> > Hi,
> > 
> > in PR115336 we have the following
> > 
> >   vect_patt_391 = .MASK_LEN_GATHER_LOAD (_470, vect__59, 1, { 0, ... }, { 0, ... }, _482, 0);
> >   vect_iftmp.44 = vect_patt_391 | { 1, ... };
> >   .MASK_LEN_STORE (vectp_f.45, 8B, { -1, ... }, _482, 0, vect_iftmp.44);
> > 
> > which assumes that a maskload sets the masked-out elements to zero.
> 
> To me this looks like a mis-application of match.pd:6083?
> 
> Applying pattern match.pd:6083, gimple-match-1.cc:45749
> gimple_simplified to iftmp.0_62 = iftmp.0_61 | _219;
> new phi replacement stmt
> iftmp.0_62 = iftmp.0_61 | _219;
> 
> so originally it wasn't
> 
>   iftmp.0_61 = .MASK_LOAD (_260, 8B, _259);
>   iftmp.0_62 = iftmp.0_61 | _219;
> 
> but sth like
> 
>   iftmp.0_62 = <mask> ? _219 : iftmp.0_61;
> 
> and iftmp.0_61 is zero-one-valued because it is _Bool.  Shouldn't
> if-conversion always turn the COND_EXPR into a .COND_IOR here?
> 
> > RVV does not guarantee this and, unfortunately, leaves it to the
> > hardware implementation to do basically anything it wants (even keep
> > the previous value).  In the PR this leads to us reusing a previous
> > vector register and stale, nonzero values causing an invalid result.
> > 
> > Based on Richard Sandiford's feedback I started with the attached
> > patch - it's more an RFC in its current shape and obviously not
> > tested exhaustively.
> > 
> > The idea is:
> >  - In ifcvt emit a VCOND_MASK (mask, load_result, preferred_else_value)
> >    after a MASK_LOAD if the target requires it.
> >  - Elide the VCOND_MASK when there is a COND_OP with a replacing else
> >    value later.
> > 
> > Is this, generally, reasonable or is there a better way?
> > 
> > My initial idea was to add an else value to MASK_LOAD.  Richard's
> > concern was, though, that this might make nonzero else values
> > appear inexpensive when they are actually not.

In general I'd agree that we want an else value for .MASK_LOAD
and also possibly for .MASK_STORE (but then we need to represent an
else value that says do-not-change/merge).

> > Even though I mechanically added match.pd patterns to catch
> > the most common cases (already at a point where a separate function,
> > maybe in gimple-match-exports, would make more sense), there is
> > still significant code-quality fallout.
> > The regressing cases are often of a form where the VCOND_MASK is
> > not just a conversion away but rather hidden behind some unmasked
> > operation.  I'm not sure if we could ever recognize everything that
> > way without descending very deep.

I think the vectorizer would need to track masks in a better way,
much like we have the SLP permute optimization phase we'd have a
mask optimization phase.

Richard.

> > Regards
> >  Robin
> > 
> > gcc/ChangeLog:
> > 
> > 	PR middle-end/115336
> > 
> > 	* config/riscv/riscv.cc (riscv_preferred_else_value): Add
> > 	MASK_LOAD.
> > 	* config/riscv/riscv.h (TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO):
> > 	Set to true.
> > 	* defaults.h (TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO): New.
> > 	* doc/tm.texi: Document.
> > 	* doc/tm.texi.in: Document.
> > 	* match.pd: Add patterns to allow replacing the else value of a
> > 	VEC_COND.
> > 	* tree-if-conv.cc (predicate_load_or_store): Emit a VEC_COND
> > 	after a MASK_LOAD if the target does not guarantee zeroing.
> > 	(predicate_statements): Add temporary lhs argument.
> > 	* tree-ssa-math-opts.cc (convert_mult_to_fma_1): Re-fold fold
> > 	result.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > 	* gcc.target/riscv/rvv/autovec/pr115336.c: New test.
> > ---
> >  gcc/config/riscv/riscv.cc                     |   2 +
> >  gcc/config/riscv/riscv.h                      |   2 +
> >  gcc/defaults.h                                |   4 +
> >  gcc/doc/tm.texi                               |   5 +
> >  gcc/doc/tm.texi.in                            |   5 +
> >  gcc/match.pd                                  | 125 +++++++++++++++++-
> >  .../gcc.target/riscv/rvv/autovec/pr115336.c   |  20 +++
> >  gcc/tree-if-conv.cc                           |  57 +++++++-
> >  gcc/tree-ssa-math-opts.cc                     |   8 ++
> >  9 files changed, 217 insertions(+), 11 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c
> > 
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index c17141d909a..e10bc6824b9 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -11481,6 +11481,8 @@ riscv_preferred_else_value (unsigned ifn, tree vectype, unsigned int nops,
> >  {
> >    if (riscv_v_ext_mode_p (TYPE_MODE (vectype)))
> >      return get_or_create_ssa_default_def (cfun, create_tmp_var (vectype));
> > +  else if (ifn == IFN_MASK_LOAD)
> > +    return get_or_create_ssa_default_def (cfun, create_tmp_var (vectype));
> >  
> >    return default_preferred_else_value (ifn, vectype, nops, ops);
> >  }
> > diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> > index 57910eecd3e..dc15eb5e60f 100644
> > --- a/gcc/config/riscv/riscv.h
> > +++ b/gcc/config/riscv/riscv.h
> > @@ -1264,4 +1264,6 @@ extern void riscv_remove_unneeded_save_restore_calls (void);
> >  /* Check TLS Descriptors mechanism is selected.  */
> >  #define TARGET_TLSDESC (riscv_tls_dialect == TLS_DESCRIPTORS)
> >  
> > +#define TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO 1
> > +
> >  #endif /* ! GCC_RISCV_H */
> > diff --git a/gcc/defaults.h b/gcc/defaults.h
> > index 92f3e07f742..6ffbdaea229 100644
> > --- a/gcc/defaults.h
> > +++ b/gcc/defaults.h
> > @@ -1457,6 +1457,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> >  #define DWARF_VERSION_DEFAULT 5
> >  #endif
> >  
> > +#ifndef TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO
> > +#define TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO 0
> > +#endif
> > +
> >  #ifndef USED_FOR_TARGET
> >  /* Done this way to keep gengtype happy.  */
> >  #if BITS_PER_UNIT == 8
> > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> > index be5543b72f8..2fdebe7fd21 100644
> > --- a/gcc/doc/tm.texi
> > +++ b/gcc/doc/tm.texi
> > @@ -12683,6 +12683,11 @@ maintainer is familiar with.
> >  
> >  @end defmac
> >  
> > +@defmac TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO
> > +Bla
> > +
> > +@end defmac
> > +
> >  @deftypefn {Target Hook} bool TARGET_HAVE_SPECULATION_SAFE_VALUE (bool @var{active})
> >  This hook is used to determine the level of target support for
> >   @code{__builtin_speculation_safe_value}.  If called with an argument
> > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> > index 87a7f895174..276c38325dc 100644
> > --- a/gcc/doc/tm.texi.in
> > +++ b/gcc/doc/tm.texi.in
> > @@ -8097,6 +8097,11 @@ maintainer is familiar with.
> >  
> >  @end defmac
> >  
> > +@defmac TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO
> > +Bla
> > +
> > +@end defmac
> > +
> >  @hook TARGET_HAVE_SPECULATION_SAFE_VALUE
> >  
> >  @hook TARGET_SPECULATION_SAFE_VALUE
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 3d0689c9312..2fdc14ef56f 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -9744,7 +9744,22 @@ and,
> >  #endif
> >  
> >  /* Detect cases in which a VEC_COND_EXPR effectively replaces the
> > -   "else" value of an IFN_COND_*.  */
> > +   "else" value of an IFN_COND_* as well as when the IFN_COND_*
> > +   replaces the else of the VEC_COND_EXPR.  */
> > +(for cond_op (COND_UNARY)
> > + (simplify
> > +  (cond_op @0 (view_convert? (vec_cond @0 @1 @2)) @3)
> > +  (with { tree op_type = TREE_TYPE (@3); }
> > +   (if (element_precision (type) == element_precision (op_type))
> > +    (cond_op @0 (view_convert:op_type @1) @3)))))
> > +
> > +(for cond_len_op (COND_LEN_UNARY)
> > + (simplify
> > +  (cond_len_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4 @5)
> > +  (with { tree op_type = TREE_TYPE (@3); }
> > +   (if (element_precision (type) == element_precision (op_type))
> > +    (cond_len_op @0 (view_convert:op_type @1) @3 @4 @5)))))
> > +
> >  (for cond_op (COND_BINARY)
> >   (simplify
> >    (vec_cond @0 (view_convert? (cond_op @0 @1 @2 @3)) @4)
> > @@ -9756,7 +9771,27 @@ and,
> >    (with { tree op_type = TREE_TYPE (@5); }
> >     (if (inverse_conditions_p (@0, @2)
> >          && element_precision (type) == element_precision (op_type))
> > -    (view_convert (cond_op @2 @3 @4 (view_convert:op_type @1)))))))
> > +    (view_convert (cond_op @2 @3 @4 (view_convert:op_type @1))))))
> > + (simplify
> > +  (cond_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4)
> > +  (with { tree op_type = TREE_TYPE (@3); }
> > +   (if (element_precision (type) == element_precision (op_type))
> > +    (cond_op @0 (view_convert:op_type @1) @3 @4))))
> > + (simplify
> > +  (cond_op @0 @1 (view_convert? (vec_cond @0 @2 @3)) @4)
> > +  (with { tree op_type = TREE_TYPE (@1); }
> > +   (if (element_precision (type) == element_precision (op_type))
> > +    (cond_op @0 @1 (view_convert:op_type @2) @4))))
> > + (simplify
> > +  (cond_op @0 (convert? (vec_cond @0 @1 @2)) @3 @4)
> > +  (with { tree op_type = TREE_TYPE (@3); }
> > +   (if (element_precision (type) == element_precision (op_type))
> > +    (cond_op @0 (convert:op_type @1) @3 @4))))
> > + (simplify
> > +  (cond_op @0 @1 (convert? (vec_cond @0 @2 @3)) @4)
> > +  (with { tree op_type = TREE_TYPE (@1); }
> > +   (if (element_precision (type) == element_precision (op_type))
> > +    (cond_op @0 @1 (convert:op_type @2) @4)))))
> >  
> >  /* Same for ternary operations.  */
> >  (for cond_op (COND_TERNARY)
> > @@ -9770,7 +9805,37 @@ and,
> >    (with { tree op_type = TREE_TYPE (@6); }
> >     (if (inverse_conditions_p (@0, @2)
> >          && element_precision (type) == element_precision (op_type))
> > -    (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
> > +    (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1))))))
> > + (simplify
> > +  (cond_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4 @5)
> > +  (with { tree op_type = TREE_TYPE (@3); }
> > +   (if (element_precision (type) == element_precision (op_type))
> > +    (cond_op @0 (view_convert:op_type @1) @3 @4 @5))))
> > + (simplify
> > +  (cond_op @0 @1 (view_convert? (vec_cond @0 @2 @3)) @4 @5)
> > +  (with { tree op_type = TREE_TYPE (@1); }
> > +   (if (element_precision (type) == element_precision (op_type))
> > +    (cond_op @0 @1 (view_convert:op_type @2) @4 @5))))
> > + (simplify
> > +  (cond_op @0 @1 @2 (view_convert? (vec_cond @0 @3 @4)) @5)
> > +  (with { tree op_type = TREE_TYPE (@1); }
> > +   (if (element_precision (type) == element_precision (op_type))
> > +    (cond_op @0 @1 @2 (view_convert:op_type @3) @5))))
> > + (simplify
> > +  (cond_op @0 (convert? (vec_cond @0 @1 @2)) @3 @4 @5)
> > +  (with { tree op_type = TREE_TYPE (@3); }
> > +   (if (element_precision (type) == element_precision (op_type))
> > +    (cond_op @0 (convert:op_type @1) @3 @4 @5))))
> > + (simplify
> > +  (cond_op @0 @1 (convert? (vec_cond @0 @2 @3)) @4 @5)
> > +  (with { tree op_type = TREE_TYPE (@1); }
> > +   (if (element_precision (type) == element_precision (op_type))
> > +    (cond_op @0 @1 (convert:op_type @2) @4 @5))))
> > + (simplify
> > +  (cond_op @0 @1 @2 (convert? (vec_cond @0 @3 @4)) @5)
> > +  (with { tree op_type = TREE_TYPE (@1); }
> > +   (if (element_precision (type) == element_precision (op_type))
> > +    (cond_op @0 @1 @2 (convert:op_type @3) @5)))))
> >  
> >  /* Detect cases in which a VEC_COND_EXPR effectively replaces the
> >     "else" value of an IFN_COND_LEN_*.  */
> > @@ -9785,7 +9850,27 @@ and,
> >    (with { tree op_type = TREE_TYPE (@5); }
> >     (if (inverse_conditions_p (@0, @2)
> >          && element_precision (type) == element_precision (op_type))
> > -    (view_convert (cond_len_op @2 @3 @4 (view_convert:op_type @1) @6 @7))))))
> > +    (view_convert (cond_len_op @2 @3 @4 (view_convert:op_type @1) @6 @7)))))
> > + (simplify
> > +  (cond_len_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4 @5 @6)
> > +  (with { tree op_type = TREE_TYPE (@3); }
> > +   (if (element_precision (type) == element_precision (op_type))
> > +    (cond_len_op @0 (view_convert:op_type @1) @3 @4 @5 @6))))
> > + (simplify
> > +  (cond_len_op @0 @1 (view_convert? (vec_cond @0 @2 @3)) @4 @5 @6)
> > +  (with { tree op_type = TREE_TYPE (@1); }
> > +   (if (element_precision (type) == element_precision (op_type))
> > +    (cond_len_op @0 @1 (view_convert:op_type @2) @4 @5 @6))))
> > + (simplify
> > +  (cond_len_op @0 (convert? (vec_cond @0 @1 @2)) @3 @4 @5 @6)
> > +  (with { tree op_type = TREE_TYPE (@3); }
> > +   (if (element_precision (type) == element_precision (op_type))
> > +    (cond_len_op @0 (convert:op_type @1) @3 @4 @5 @6))))
> > + (simplify
> > +  (cond_len_op @0 @1 (convert? (vec_cond @0 @2 @3)) @4 @5 @6)
> > +  (with { tree op_type = TREE_TYPE (@1); }
> > +   (if (element_precision (type) == element_precision (op_type))
> > +    (cond_len_op @0 @1 (convert:op_type @2) @4 @5 @6)))))
> >  
> >  /* Same for ternary operations.  */
> >  (for cond_len_op (COND_LEN_TERNARY)
> > @@ -9799,7 +9884,37 @@ and,
> >    (with { tree op_type = TREE_TYPE (@6); }
> >     (if (inverse_conditions_p (@0, @2)
> >          && element_precision (type) == element_precision (op_type))
> > -    (view_convert (cond_len_op @2 @3 @4 @5 (view_convert:op_type @1) @7 @8))))))
> > +    (view_convert (cond_len_op @2 @3 @4 @5 (view_convert:op_type @1) @7 @8)))))
> > + (simplify
> > +  (cond_len_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4 @5 @6 @7)
> > +  (with { tree op_type = TREE_TYPE (@3); }
> > +   (if (element_precision (type) == element_precision (op_type))
> > +    (cond_len_op @0 (view_convert:op_type @1) @3 @4 @5 @6 @7))))
> > + (simplify
> > +  (cond_len_op @0 @1 (view_convert? (vec_cond @0 @2 @3)) @4 @5 @6 @7)
> > +  (with { tree op_type = TREE_TYPE (@1); }
> > +   (if (element_precision (type) == element_precision (op_type))
> > +    (cond_len_op @0 @1 (view_convert:op_type @2) @4 @5 @6 @7))))
> > + (simplify
> > +  (cond_len_op @0 @1 @2 (view_convert? (vec_cond @0 @3 @4)) @5 @6 @7)
> > +  (with { tree op_type = TREE_TYPE (@1); }
> > +   (if (element_precision (type) == element_precision (op_type))
> > +    (cond_len_op @0 @1 @2 (view_convert:op_type @3) @5 @6 @7))))
> > + (simplify
> > +  (cond_len_op @0 (convert? (vec_cond @0 @1 @2)) @3 @4 @5 @6 @7)
> > +  (with { tree op_type = TREE_TYPE (@3); }
> > +   (if (element_precision (type) == element_precision (op_type))
> > +    (cond_len_op @0 (convert:op_type @1) @3 @4 @5 @6 @7))))
> > + (simplify
> > +  (cond_len_op @0 @1 (convert? (vec_cond @0 @2 @3)) @4 @5 @6 @7)
> > +  (with { tree op_type = TREE_TYPE (@1); }
> > +   (if (element_precision (type) == element_precision (op_type))
> > +    (cond_len_op @0 @1 (convert:op_type @2) @4 @5 @6 @7))))
> > + (simplify
> > +  (cond_len_op @0 @1 @2 (convert? (vec_cond @0 @3 @4)) @5 @6 @7)
> > +  (with { tree op_type = TREE_TYPE (@1); }
> > +   (if (element_precision (type) == element_precision (op_type))
> > +    (cond_len_op @0 @1 @2 (convert:op_type @3) @5 @6 @7)))))
> >  
> >  /* Detect simplication for a conditional reduction where
> >  
> > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c
> > new file mode 100644
> > index 00000000000..29e55705a7a
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c
> > @@ -0,0 +1,20 @@
> > +/* { dg-do compile } */
> > +/* { dg-options { -O3 -march=rv64gcv_zvl256b -mabi=lp64d  } } */
> > +
> > +short d[19];
> > +_Bool e[100][19][19];
> > +_Bool f[10000];
> > +
> > +int main()
> > +{
> > +  for (long g = 0; g < 19; ++g)
> > +    d[g] = 3;
> > +  _Bool(*h)[19][19] = e;
> > +  for (short g = 0; g < 9; g++)
> > +    for (int i = 4; i < 16; i += 3)
> > +      f[i * 9 + g] = d[i] ? d[i] : h[g][i][2];
> > +  for (long i = 120; i < 122; ++i)
> > +    __builtin_printf("%d\n", f[i]);
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {vmv.v.i\s*v[0-9]+,0} 3 } } */
> > diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> > index 57992b6deca..c0c7013c817 100644
> > --- a/gcc/tree-if-conv.cc
> > +++ b/gcc/tree-if-conv.cc
> > @@ -124,6 +124,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "tree-vectorizer.h"
> >  #include "tree-eh.h"
> >  #include "cgraph.h"
> > +#include "tree-dfa.h"
> >  
> >  /* For lang_hooks.types.type_for_mode.  */
> >  #include "langhooks.h"
> > @@ -2453,11 +2454,14 @@ mask_exists (int size, const vec<int> &vec)
> >     that does so.  */
> >  
> >  static gimple *
> > -predicate_load_or_store (gimple_stmt_iterator *gsi, gassign *stmt, tree mask)
> > +predicate_load_or_store (gimple_stmt_iterator *gsi, gassign *stmt, tree mask,
> > +			 tree override_lhs = NULL_TREE)
> >  {
> >    gcall *new_stmt;
> >  
> > -  tree lhs = gimple_assign_lhs (stmt);
> > +  tree lhs = override_lhs;
> > +  if (!lhs)
> > +    lhs = gimple_assign_lhs (stmt);
> >    tree rhs = gimple_assign_rhs1 (stmt);
> >    tree ref = TREE_CODE (lhs) == SSA_NAME ? rhs : lhs;
> >    mark_addressable (ref);
> > @@ -2789,11 +2793,52 @@ predicate_statements (loop_p loop)
> >  		  vect_masks.safe_push (mask);
> >  		}
> >  	      if (gimple_assign_single_p (stmt))
> > -		new_stmt = predicate_load_or_store (&gsi, stmt, mask);
> > -	      else
> > -		new_stmt = predicate_rhs_code (stmt, mask, cond, &ssa_names);
> > +		{
> > +		  bool target_has_else
> > +		    = TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO;
> > +		  tree lhs = gimple_get_lhs (stmt);
> > +
> > +		  bool is_load = TREE_CODE (lhs) == SSA_NAME;
> > +
> > +		  gimple_seq stmts2 = NULL;
> > +		  tree tmplhs = is_load && target_has_else
> > +		    ? make_temp_ssa_name (TREE_TYPE (lhs), NULL,
> > +					  "_ifc_") : lhs;
> > +		  gimple *call_stmt
> > +		    = predicate_load_or_store (&gsi, stmt, mask, tmplhs);
> > +
> > +		  gimple_seq_add_stmt (&stmts2, call_stmt);
> > +
> > +		  if (lhs != tmplhs)
> > +		    ssa_names.add (tmplhs);
> >  
> > -	      gsi_replace (&gsi, new_stmt, true);
> > +		  if (is_load && target_has_else)
> > +		    {
> > +		      unsigned nops = gimple_call_num_args (call_stmt);
> > +		      tree *ops = XALLOCAVEC (tree, nops);
> > +
> > +		      for (unsigned i = 0; i < nops; i++)
> > +			ops[i] = gimple_call_arg (call_stmt, i);
> > +
> > +		      tree els_operand
> > +			= targetm.preferred_else_value (IFN_MASK_LOAD,
> > +							TREE_TYPE (lhs),
> > +							nops, ops);
> > +
> > +		      tree rhs
> > +			= fold_build_cond_expr (TREE_TYPE (tmplhs),
> > +						mask, tmplhs, els_operand);
> > +		      gassign *cond_stmt
> > +			= gimple_build_assign (gimple_get_lhs (stmt), rhs);
> > +		      gimple_seq_add_stmt (&stmts2, cond_stmt);
> > +		    }
> > +		  gsi_replace_with_seq (&gsi, stmts2, true);
> > +		}
> > +	      else
> > +		{
> > +		  new_stmt = predicate_rhs_code (stmt, mask, cond, &ssa_names);
> > +		  gsi_replace (&gsi, new_stmt, true);
> > +		}
> >  	    }
> >  	  else if (((lhs = gimple_assign_lhs (stmt)), true)
> >  		   && (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
> > diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> > index 57085488722..65e6f91fe8b 100644
> > --- a/gcc/tree-ssa-math-opts.cc
> > +++ b/gcc/tree-ssa-math-opts.cc
> > @@ -3193,6 +3193,14 @@ convert_mult_to_fma_1 (tree mul_result, tree op1, tree op2)
> >        /* Follow all SSA edges so that we generate FMS, FNMA and FNMS
> >  	 regardless of where the negation occurs.  */
> >        gimple *orig_stmt = gsi_stmt (gsi);
> > +      if (fold_stmt (&gsi, follow_all_ssa_edges))
> > +	{
> > +	  if (maybe_clean_or_replace_eh_stmt (orig_stmt, gsi_stmt (gsi)))
> > +	    gcc_unreachable ();
> > +	  update_stmt (gsi_stmt (gsi));
> > +	}
> > +      /* Fold the result again.  */
> > +      orig_stmt = gsi_stmt (gsi);
> >        if (fold_stmt (&gsi, follow_all_ssa_edges))
> >  	{
> >  	  if (maybe_clean_or_replace_eh_stmt (orig_stmt, gsi_stmt (gsi)))
> > 
> 
>
Richard Sandiford July 5, 2024, 1:51 p.m. UTC | #3
Robin Dapp <rdapp.gcc@gmail.com> writes:
> Hi,
>
> in PR115336 we have the following
>
>   vect_patt_391 = .MASK_LEN_GATHER_LOAD (_470, vect__59, 1, { 0, ... }, { 0, ... }, _482, 0);
>   vect_iftmp.44 = vect_patt_391 | { 1, ... };
>   .MASK_LEN_STORE (vectp_f.45, 8B, { -1, ... }, _482, 0, vect_iftmp.44);
>
> which assumes that a maskload sets the masked-out elements to zero.
> RVV does not guarantee this and, unfortunately, leaves it to the
> hardware implementation to do basically anything it wants (even keep
> the previous value).  In the PR this leads to us reusing a previous
> vector register and stale, nonzero values causing an invalid result.
>
> Based on Richard Sandiford's feedback I started with the attached
> patch - it's more an RFC in its current shape and obviously not
> tested exhaustively.
>
> The idea is:
>  - In ifcvt emit a VCOND_MASK (mask, load_result, preferred_else_value)
>    after a MASK_LOAD if the target requires it.
>  - Elide the VCOND_MASK when there is a COND_OP with a replacing else
>    value later.
>
> Is this, generally, reasonable or is there a better way?
>
> My initial idea was to add an else value to MASK_LOAD.  Richard's
> concern was, though, that this might make nonzero else values
> appear inexpensive when they are actually not.

Yeah, but like I said off-list, I still think an else operand is the way
to go.  It would be quite a bit of work to retrofit at this point, but
it feels like the right long-term solution.

FTR, my concern & suggestion was:

  I suppose the difficulty is that we might make:

    MASK_LOAD (mask, ptr, some-arbitrary-else-value)

  seem as cheap as:

    MASK_LOAD (mask, ptr, { 0, 0,. ... 0})

  which definitely isn't the case for SVE (and I'm guessing also
  for AVX).  It would be better to keep the:

    COND_EXPR (mask, ..., some-arbitrary-else-value)

  separate and try to optimise it with surrounding code.

  Perhaps we could use the optab's insn predicates to test whether
  a given else value is acceptable?  We already do something similar
  for the gather load arguments.

Since then, there's been discussion about taking the same
predicated-based approach for vec_cmp.

That is, we'd add the else value to the maskload optab and MASK_LOAD
internal function, but use predicates to test whether a target supports
a particular else value.
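
A minimal sketch of such a check on the expand side, assuming the
maskload pattern gained an else operand at index els_opno (both the
index and the fallback are illustrative):

  /* If the target's insn predicate rejects this else value, fall
     back to the else value the target prefers and keep the blend
     separate.  */
  if (!insn_operand_matches (icode, els_opno, els_rtx))
    els = targetm.preferred_else_value (IFN_MASK_LOAD, vectype,
					nops, ops);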

Thanks,
Richard
Richard Sandiford July 5, 2024, 1:59 p.m. UTC | #4
Richard Biener <rguenther@suse.de> writes:
> On Fri, 5 Jul 2024, Richard Biener wrote:
>
>> On Fri, 5 Jul 2024, Robin Dapp wrote:
>> 
>> > Hi,
>> > 
>> > in PR115336 we have the following
>> > 
>> >   vect_patt_391 = .MASK_LEN_GATHER_LOAD (_470, vect__59, 1, { 0, ... }, { 0, ... }, _482, 0);
>> >   vect_iftmp.44 = vect_patt_391 | { 1, ... };
>> >   .MASK_LEN_STORE (vectp_f.45, 8B, { -1, ... }, _482, 0, vect_iftmp.44);
>> > 
>> > which assumes that a maskload sets the masked-out elements to zero.
>> 
>> To me this looks like a mis-application of match.pd:6083?
>> 
>> Applying pattern match.pd:6083, gimple-match-1.cc:45749
>> gimple_simplified to iftmp.0_62 = iftmp.0_61 | _219;
>> new phi replacement stmt
>> iftmp.0_62 = iftmp.0_61 | _219;
>> 
>> so originally it wasn't
>> 
>>   iftmp.0_61 = .MASK_LOAD (_260, 8B, _259);
>>   iftmp.0_62 = iftmp.0_61 | _219;
>> 
>> but sth like
>> 
>>   iftmp.0_62 = <mask> ? _219 : iftmp.0_61;
>> 
>> and iftmp.0_61 is zero-one-valued because it is _Bool.  Shouldn't
>> if-conversion always turn the COND_EXPR into a .COND_IOR here?
>> 
>> > RVV does not guarantee this and, unfortunately, leaves it to the
>> > hardware implementation to do basically anything it wants (even keep
>> > the previous value).  In the PR this leads to us reusing a previous
>> > vector register and stale, nonzero values causing an invalid result.
>> > 
>> > Based on Richard Sandiford's feedback I started with the attached
>> > patch - it's more an RFC in its current shape and obviously not
>> > tested exhaustively.
>> > 
>> > The idea is:
>> >  - In ifcvt emit a VCOND_MASK (mask, load_result, preferred_else_value)
>> >    after a MASK_LOAD if the target requires it.
>> >  - Elide the VCOND_MASK when there is a COND_OP with a replacing else
>> >    value later.
>> > 
>> > Is this, generally, reasonable or is there a better way?
>> > 
>> > My initial idea was to add an else value to MASK_LOAD.  Richard's
>> > concern was, though, that this might make nonzero else values
>> > appear inexpensive when they are actually not.
>
> In general I'd agree that we want an else value for .MASK_LOAD
> and also possibly for .MASK_STORE (but then we need to represent an
> else value that says do-not-change/merge).

Not sure about MASK_STORE.  AFAICT, an else value for MASK_STORE would
be directly equivalent to:

  tmp = vec_cond_expr <mask, store_value, else_value>;
  *ptr = tmp;

and IMO it'd be better to keep representing it like that instead.

That is, MASK_STORE is changing something that already has a definition
(the target memory), and so there's no ambiguity in having a masked
operation that only changes part of the target.

But for MASK_LOAD and other masked vector-producing functions, we're
creating a new full vector based on a partial vector operation.
There's no existing target to modify, and so we need to get the
definition of the inactive elements from somewhere else.

Thanks,
Richard
Richard Biener July 5, 2024, 2:32 p.m. UTC | #5
> Am 05.07.2024 um 16:00 schrieb Richard Sandiford <richard.sandiford@arm.com>:
> 
> Richard Biener <rguenther@suse.de> writes:
>>> On Fri, 5 Jul 2024, Richard Biener wrote:
>>> 
>>>> On Fri, 5 Jul 2024, Robin Dapp wrote:
>>> 
>>>> Hi,
>>>> 
>>>> in PR115336 we have the following
>>>> 
>>>>  vect_patt_391 = .MASK_LEN_GATHER_LOAD (_470, vect__59, 1, { 0, ... }, { 0, ... }, _482, 0);
>>>>  vect_iftmp.44 = vect_patt_391 | { 1, ... };
>>>>  .MASK_LEN_STORE (vectp_f.45, 8B, { -1, ... }, _482, 0, vect_iftmp.44);
>>>> 
>>>> which assumes that a maskload sets the masked-out elements to zero.
>>> 
>>> To me this looks like a mis-application of match.pd:6083?
>>> 
>>> Applying pattern match.pd:6083, gimple-match-1.cc:45749
>>> gimple_simplified to iftmp.0_62 = iftmp.0_61 | _219;
>>> new phi replacement stmt
>>> iftmp.0_62 = iftmp.0_61 | _219;
>>> 
>>> so originally it wasn't
>>> 
>>>  iftmp.0_61 = .MASK_LOAD (_260, 8B, _259);
>>>  iftmp.0_62 = iftmp.0_61 | _219;
>>> 
>>> but sth like
>>> 
>>>  iftmp.0_62 = <mask> ? _219 : iftmp.0_61;
>>> 
>>> and iftmp.0_61 is zero-one-valued because it is _Bool.  Shouldn't
>>> if-conversion always turn the COND_EXPR into a .COND_IOR here?
>>> 
>>>> RVV does not guarantee this and, unfortunately, leaves it to the
>>>> hardware implementation to do basically anything it wants (even keep
>>>> the previous value).  In the PR this leads to us reusing a previous
>>>> vector register and stale, nonzero values causing an invalid result.
>>>> 
>>>> Based on Richard Sandiford's feedback I started with the attached
>>>> patch - it's more an RFC in its current shape and obviously not
>>>> tested exhaustively.
>>>> 
>>>> The idea is:
>>>> - In ifcvt emit a VCOND_MASK (mask, load_result, preferred_else_value)
>>>>   after a MASK_LOAD if the target requires it.
>>>> - Elide the VCOND_MASK when there is a COND_OP with a replacing else
>>>>   value later.
>>>> 
>>>> Is this, generally, reasonable or is there a better way?
>>>> 
>>>> My initial idea was to add an else value to MASK_LOAD.  Richard's
>>>> concern was, though, that this might make nonzero else values
>>>> appear inexpensive when they are actually not.
>> 
>> In general I'd agree that we want an else value for .MASK_LOAD
>> and also possibly for .MASK_STORE (but then we need to represent an
>> else value that says do-not-change/merge).
> 
> Not sure about MASK_STORE.  AFAICT, an else value for MASK_STORE would
> be directly equivalent to:
> 
>  tmp = vec_cond_expr <mask, store_value, else_value>;
>  *ptr = tmp;
> 
> and IMO it'd be better to keep representing it like that instead.
> 
> That is, MASK_STORE is changing something that already has a definition
> (the target memory), and so there's no ambiguity in having a masked
> operation that only changes part of the target.
> 
> But for MASK_LOAD and other masked vector-producing functions, we're
> creating a new full vector based on a partial vector operation.
> There's no existing target to modify, and so we need to get the
> definition of the inactive elements from somewhere else.

Agreed.  I also think the predicate idea for the else value in MASK_LOAD is good.

Richard 

> Thanks,
> Richard
Robin Dapp July 5, 2024, 3:02 p.m. UTC | #6
> FTR, my concern & suggestion was:
> 
>   I suppose the difficulty is that we might make:
> 
>     MASK_LOAD (mask, ptr, some-arbitrary-else-value)
> 
>   seem as cheap as:
> 
>     MASK_LOAD (mask, ptr, { 0, 0,. ... 0})
> 
>   which definitely isn't the case for SVE (and I'm guessing also
>   for AVX).  It would be better to keep the:
> 
>     COND_EXPR (mask, ..., some-arbitrary-else-value)
> 
>   separate and try to optimise it with surrounding code.

Ah, I rather understood that as an "either or" rather than an "and".

> That is, we'd add the else value to the maskload optab and MASK_LOAD
> internal function, but use predicates to test whether a target supports
> a particular else value.

So e.g. for SVE we'd have

  res = MASK_LOAD (mask, ptr, {0, 0, ..., 0})
  COND_EXPR (mask, res, {0, 0, ...})

which would always be folded/optimized away, while for RVV
we'd have

  res = MASK_LOAD (mask, ptr, whatever_else_value)
  COND_EXPR (mask, res, {0, 0, ...})

that could/should result in a separate instruction depending on
the rest of the code?

Regards
 Robin
Robin Dapp July 5, 2024, 3:03 p.m. UTC | #7
> To me this looks like a mis-application of match.pd:6083?
> 
> Applying pattern match.pd:6083, gimple-match-1.cc:45749
> gimple_simplified to iftmp.0_62 = iftmp.0_61 | _219;
> new phi replacement stmt
> iftmp.0_62 = iftmp.0_61 | _219;
> 
> so originally it wasn't
> 
>   iftmp.0_61 = .MASK_LOAD (_260, 8B, _259);
>   iftmp.0_62 = iftmp.0_61 | _219;
> 
> but sth like
> 
>   iftmp.0_62 = <mask> ? _219 : iftmp.0_61;
> 
> and iftmp.0_61 is zero-one-valued because it is _Bool.  Shouldn't
> if-conversion always turn the COND_EXPR into a .COND_IOR here?

Yes, just that the mask is inverted in the original COND_EXPR.
Hmm, with COND_IOR we'd explicitly set the else value to the
target's preferred else value then I presume.  That's probably
more elegant than an extra VEC_COND but I could imagine seeing
the same problems of redundant VEC_CONDs not being folded.

Regards
 Robin
Richard Sandiford July 5, 2024, 4:30 p.m. UTC | #8
Robin Dapp <rdapp.gcc@gmail.com> writes:
>> FTR, my concern & suggestion was:
>> 
>>   I suppose the difficulty is that we might make:
>> 
>>     MASK_LOAD (mask, ptr, some-arbitrary-else-value)
>> 
>>   seem as cheap as:
>> 
>>     MASK_LOAD (mask, ptr, { 0, 0,. ... 0})
>> 
>>   which definitely isn't the case for SVE (and I'm guessing also
>>   for AVX).  It would be better to keep the:
>> 
>>     COND_EXPR (mask, ..., some-arbitrary-else-value)
>> 
>>   separate and try to optimise it with surrounding code.
>
> Ah, I rather understood that as an "either or" rather than an "and".

Yeah, could have been clearer, sorry.

>> That is, we'd add the else value to the maskload optab and MASK_LOAD
>> internal function, but use predicates to test whether a target supports
>> a particular else value.
>
> So e.g. for SVE we'd have
>
>   res = MASK_LOAD (mask, ptr, {0, 0, ..., 0})
>   COND_EXPR (mask, res, {0, 0, ...})
>
> which would always be folded/optimized away, while for RVV
> we'd have
>
>   res = MASK_LOAD (mask, ptr, whatever_else_value)
>   COND_EXPR (mask, res, {0, 0, ...})
>
> that could/should result in a separate instruction depending on
> the rest of the code?

Yeah, I think so.  I guess for RVV there's a choice between:

(1) making the insn predicate accept all else values and making
    the insn emit an explicit blend between the loaded result
    and the else value

(2) making the insn predicate only accept “undefined” (SCRATCH in
    rtl terms)

(2) sounds more in keeping with Juzhe's fix for PR110751.
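
For (2), a minimal sketch of the corresponding expander check
(hypothetical operand index; the insn predicate could equally
encode this):

  /* Only accept an "undefined" else value; any other else value
     must stay a separate VEC_COND_EXPR/blend.  */
  if (GET_CODE (operands[3]) != SCRATCH)
    FAIL;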

Thanks,
Richard
Robin Dapp July 7, 2024, 9:26 a.m. UTC | #9
> Yeah, I think so.  I guess for RVV there's a choice between:
> 
> (1) making the insn predicate accept all else values and making
>     the insn emit an explicit blend between the loaded result
>     and the else value
> 
> (2) making the insn predicate only accept “undefined” (SCRATCH in
>     rtl terms)
> 
> (2) sounds more in keeping with Juzhe's fix for PR110751.


I think (2) is the reasonable choice for the future; however:

I misinterpreted the RVV spec.  The implementation has the choice
between filling with ones or leaving unchanged (merging).  So it's
not really "anything" as I claimed before.
This implies that we could defer to target_preferred_else_value and
eventually, in the target, leave it to the uarch what to return
(either "1" or "undefined").

On the general strategy:
Should we (for SVE, AVX512 etc.) have a match pattern that folds
 
  res = MASK_LOAD (mask, ptr, {0, 0, ..., 0})
  COND_EXPR (mask, res, {0, 0, ...})

into just MASK_LOAD then?  MASK_LOAD (..., {0, 0, ..., 0}) would not
be emitted for RVV and we'd need to fold the COND_EXPR differently,
maybe with a combination of match patterns similar to the ones in
my RFC as well as a separate "pass" for mask tracking as Richi
suggested?

Regards
 Robin

p.s. We have the choice of different policies for length masking and
element masking.  I don't think the problem arose yet due to the way
we handle length control in the vectorizer but I could imagine, at
least in principle, the need for two separate else values for MASK_LEN.
Richard Biener July 7, 2024, 11:28 a.m. UTC | #10
> Am 07.07.2024 um 11:26 schrieb Robin Dapp <rdapp.gcc@gmail.com>:
> 
> 
>> 
>> Yeah, I think so.  I guess for RVV there's a choice between:
>> 
>> (1) making the insn predicate accept all else values and making
>>    the insn emit an explicit blend between the loaded result
>>    and the else value
>> 
>> (2) making the insn predicate only accept “undefined” (SCRATCH in
>>    rtl terms)
>> 
>> (2) sounds more in keeping with Juzhe's fix for PR110751.
> 
> 
> I think (2) is the reasonable choice for the future, however:
> 
> I misinterpreted the RVV spec.  The implementation has the choice
> between filling with ones or leaving unchanged (merging).  So it's
> not really "anything" as I claimed before.
> This implies that we could defer to target_preferred_else_value and
> eventually, in the target, leave it to the uarch what to return
> (either "1" or "undefined").
> 
> On the general strategy:
> Should we (for SVE, AVX512 etc.) have a match pattern that folds
> 
>  res = MASK_LOAD (mask, ptr, {0, 0, ..., 0})
>  COND_EXPR (mask, res, {0, 0, ...})
> 
> into just MASK_LOAD then?  MASK_LOAD (..., {0, 0, ..., 0}) would not
> be emitted for RVV and we'd need to fold the COND_EXPR differently,
> maybe with a combination of match patterns similar to the ones in
> my RFC as well as a separate "pass" for mask tracking as Richi
> suggested?

I think it’s less than ideal to only do this via match, but I don’t think we can do without the match patterns, since unrolling and simplification can expose the opportunities.

Richard 

> Regards
> Robin
> 
> p.s. We have the choice of different policies for length masking and
> element masking.  I don't think the problem has arisen yet, given the
> way we handle length control in the vectorizer, but I could imagine,
> at least in principle, the need for two separate else values for
> MASK_LEN.

Patch

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index c17141d909a..e10bc6824b9 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -11481,6 +11481,8 @@  riscv_preferred_else_value (unsigned ifn, tree vectype, unsigned int nops,
 {
   if (riscv_v_ext_mode_p (TYPE_MODE (vectype)))
     return get_or_create_ssa_default_def (cfun, create_tmp_var (vectype));
+  else if (ifn == IFN_MASK_LOAD)
+    return get_or_create_ssa_default_def (cfun, create_tmp_var (vectype));
 
   return default_preferred_else_value (ifn, vectype, nops, ops);
 }
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 57910eecd3e..dc15eb5e60f 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -1264,4 +1264,6 @@  extern void riscv_remove_unneeded_save_restore_calls (void);
 /* Check TLS Descriptors mechanism is selected.  */
 #define TARGET_TLSDESC (riscv_tls_dialect == TLS_DESCRIPTORS)
 
+#define TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO 1
+
 #endif /* ! GCC_RISCV_H */
diff --git a/gcc/defaults.h b/gcc/defaults.h
index 92f3e07f742..6ffbdaea229 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -1457,6 +1457,10 @@  see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define DWARF_VERSION_DEFAULT 5
 #endif
 
+#ifndef TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO
+#define TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO 0
+#endif
+
 #ifndef USED_FOR_TARGET
 /* Done this way to keep gengtype happy.  */
 #if BITS_PER_UNIT == 8
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index be5543b72f8..2fdebe7fd21 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -12683,6 +12683,11 @@  maintainer is familiar with.
 
 @end defmac
 
+@defmac TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO
+Define this macro to @code{1} if the target's masked load instructions do not guarantee that masked-out elements are zero and may leave them with arbitrary values, for example stale register contents or all ones.  The default is @code{0}.
+
+@end defmac
+
 @deftypefn {Target Hook} bool TARGET_HAVE_SPECULATION_SAFE_VALUE (bool @var{active})
 This hook is used to determine the level of target support for
  @code{__builtin_speculation_safe_value}.  If called with an argument
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 87a7f895174..276c38325dc 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -8097,6 +8097,11 @@  maintainer is familiar with.
 
 @end defmac
 
+@defmac TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO
+Define this macro to @code{1} if the target's masked load instructions do not guarantee that masked-out elements are zero and may leave them with arbitrary values, for example stale register contents or all ones.  The default is @code{0}.
+
+@end defmac
+
 @hook TARGET_HAVE_SPECULATION_SAFE_VALUE
 
 @hook TARGET_SPECULATION_SAFE_VALUE
diff --git a/gcc/match.pd b/gcc/match.pd
index 3d0689c9312..2fdc14ef56f 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -9744,7 +9744,22 @@  and,
 #endif
 
 /* Detect cases in which a VEC_COND_EXPR effectively replaces the
-   "else" value of an IFN_COND_*.  */
+   "else" value of an IFN_COND_* as well as when the IFN_COND_*
+   replaces the else of the VEC_COND_EXPR.  */
+(for cond_op (COND_UNARY)
+ (simplify
+  (cond_op @0 (view_convert? (vec_cond @0 @1 @2)) @3)
+  (with { tree op_type = TREE_TYPE (@3); }
+   (if (element_precision (type) == element_precision (op_type))
+    (cond_op @0 (view_convert:op_type @1) @3)))))
+
+(for cond_len_op (COND_LEN_UNARY)
+ (simplify
+  (cond_len_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4 @5)
+  (with { tree op_type = TREE_TYPE (@3); }
+   (if (element_precision (type) == element_precision (op_type))
+    (cond_len_op @0 (view_convert:op_type @1) @3 @4 @5)))))
+
 (for cond_op (COND_BINARY)
  (simplify
   (vec_cond @0 (view_convert? (cond_op @0 @1 @2 @3)) @4)
@@ -9756,7 +9771,27 @@  and,
   (with { tree op_type = TREE_TYPE (@5); }
    (if (inverse_conditions_p (@0, @2)
         && element_precision (type) == element_precision (op_type))
-    (view_convert (cond_op @2 @3 @4 (view_convert:op_type @1)))))))
+    (view_convert (cond_op @2 @3 @4 (view_convert:op_type @1))))))
+ (simplify
+  (cond_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4)
+  (with { tree op_type = TREE_TYPE (@3); }
+   (if (element_precision (type) == element_precision (op_type))
+    (cond_op @0 (view_convert:op_type @1) @3 @4))))
+ (simplify
+  (cond_op @0 @1 (view_convert? (vec_cond @0 @2 @3)) @4)
+  (with { tree op_type = TREE_TYPE (@1); }
+   (if (element_precision (type) == element_precision (op_type))
+    (cond_op @0 @1 (view_convert:op_type @2) @4))))
+ (simplify
+  (cond_op @0 (convert? (vec_cond @0 @1 @2)) @3 @4)
+  (with { tree op_type = TREE_TYPE (@3); }
+   (if (element_precision (type) == element_precision (op_type))
+    (cond_op @0 (convert:op_type @1) @3 @4))))
+ (simplify
+  (cond_op @0 @1 (convert? (vec_cond @0 @2 @3)) @4)
+  (with { tree op_type = TREE_TYPE (@1); }
+   (if (element_precision (type) == element_precision (op_type))
+    (cond_op @0 @1 (convert:op_type @2) @4)))))
 
 /* Same for ternary operations.  */
 (for cond_op (COND_TERNARY)
@@ -9770,7 +9805,37 @@  and,
   (with { tree op_type = TREE_TYPE (@6); }
    (if (inverse_conditions_p (@0, @2)
         && element_precision (type) == element_precision (op_type))
-    (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))))))
+    (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1))))))
+ (simplify
+  (cond_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4 @5)
+  (with { tree op_type = TREE_TYPE (@3); }
+   (if (element_precision (type) == element_precision (op_type))
+    (cond_op @0 (view_convert:op_type @1) @3 @4 @5))))
+ (simplify
+  (cond_op @0 @1 (view_convert? (vec_cond @0 @2 @3)) @4 @5)
+  (with { tree op_type = TREE_TYPE (@1); }
+   (if (element_precision (type) == element_precision (op_type))
+    (cond_op @0 @1 (view_convert:op_type @2) @4 @5))))
+ (simplify
+  (cond_op @0 @1 @2 (view_convert? (vec_cond @0 @3 @4)) @5)
+  (with { tree op_type = TREE_TYPE (@1); }
+   (if (element_precision (type) == element_precision (op_type))
+    (cond_op @0 @1 @2 (view_convert:op_type @3) @5))))
+ (simplify
+  (cond_op @0 (convert? (vec_cond @0 @1 @2)) @3 @4 @5)
+  (with { tree op_type = TREE_TYPE (@3); }
+   (if (element_precision (type) == element_precision (op_type))
+    (cond_op @0 (convert:op_type @1) @3 @4 @5))))
+ (simplify
+  (cond_op @0 @1 (convert? (vec_cond @0 @2 @3)) @4 @5)
+  (with { tree op_type = TREE_TYPE (@1); }
+   (if (element_precision (type) == element_precision (op_type))
+    (cond_op @0 @1 (convert:op_type @2) @4 @5))))
+ (simplify
+  (cond_op @0 @1 @2 (convert? (vec_cond @0 @3 @4)) @5)
+  (with { tree op_type = TREE_TYPE (@1); }
+   (if (element_precision (type) == element_precision (op_type))
+    (cond_op @0 @1 @2 (convert:op_type @3) @5)))))
 
 /* Detect cases in which a VEC_COND_EXPR effectively replaces the
    "else" value of an IFN_COND_LEN_*.  */
@@ -9785,7 +9850,27 @@  and,
   (with { tree op_type = TREE_TYPE (@5); }
    (if (inverse_conditions_p (@0, @2)
         && element_precision (type) == element_precision (op_type))
-    (view_convert (cond_len_op @2 @3 @4 (view_convert:op_type @1) @6 @7))))))
+    (view_convert (cond_len_op @2 @3 @4 (view_convert:op_type @1) @6 @7)))))
+ (simplify
+  (cond_len_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4 @5 @6)
+  (with { tree op_type = TREE_TYPE (@3); }
+   (if (element_precision (type) == element_precision (op_type))
+    (cond_len_op @0 (view_convert:op_type @1) @3 @4 @5 @6))))
+ (simplify
+  (cond_len_op @0 @1 (view_convert? (vec_cond @0 @2 @3)) @4 @5 @6)
+  (with { tree op_type = TREE_TYPE (@1); }
+   (if (element_precision (type) == element_precision (op_type))
+    (cond_len_op @0 @1 (view_convert:op_type @2) @4 @5 @6))))
+ (simplify
+  (cond_len_op @0 (convert? (vec_cond @0 @1 @2)) @3 @4 @5 @6)
+  (with { tree op_type = TREE_TYPE (@3); }
+   (if (element_precision (type) == element_precision (op_type))
+    (cond_len_op @0 (convert:op_type @1) @3 @4 @5 @6))))
+ (simplify
+  (cond_len_op @0 @1 (convert? (vec_cond @0 @2 @3)) @4 @5 @6)
+  (with { tree op_type = TREE_TYPE (@1); }
+   (if (element_precision (type) == element_precision (op_type))
+    (cond_len_op @0 @1 (convert:op_type @2) @4 @5 @6)))))
 
 /* Same for ternary operations.  */
 (for cond_len_op (COND_LEN_TERNARY)
@@ -9799,7 +9884,37 @@  and,
   (with { tree op_type = TREE_TYPE (@6); }
    (if (inverse_conditions_p (@0, @2)
         && element_precision (type) == element_precision (op_type))
-    (view_convert (cond_len_op @2 @3 @4 @5 (view_convert:op_type @1) @7 @8))))))
+    (view_convert (cond_len_op @2 @3 @4 @5 (view_convert:op_type @1) @7 @8)))))
+ (simplify
+  (cond_len_op @0 (view_convert? (vec_cond @0 @1 @2)) @3 @4 @5 @6 @7)
+  (with { tree op_type = TREE_TYPE (@3); }
+   (if (element_precision (type) == element_precision (op_type))
+    (cond_len_op @0 (view_convert:op_type @1) @3 @4 @5 @6 @7))))
+ (simplify
+  (cond_len_op @0 @1 (view_convert? (vec_cond @0 @2 @3)) @4 @5 @6 @7)
+  (with { tree op_type = TREE_TYPE (@1); }
+   (if (element_precision (type) == element_precision (op_type))
+    (cond_len_op @0 @1 (view_convert:op_type @2) @4 @5 @6 @7))))
+ (simplify
+  (cond_len_op @0 @1 @2 (view_convert? (vec_cond @0 @3 @4)) @5 @6 @7)
+  (with { tree op_type = TREE_TYPE (@1); }
+   (if (element_precision (type) == element_precision (op_type))
+    (cond_len_op @0 @1 @2 (view_convert:op_type @3) @5 @6 @7))))
+ (simplify
+  (cond_len_op @0 (convert? (vec_cond @0 @1 @2)) @3 @4 @5 @6 @7)
+  (with { tree op_type = TREE_TYPE (@3); }
+   (if (element_precision (type) == element_precision (op_type))
+    (cond_len_op @0 (convert:op_type @1) @3 @4 @5 @6 @7))))
+ (simplify
+  (cond_len_op @0 @1 (convert? (vec_cond @0 @2 @3)) @4 @5 @6 @7)
+  (with { tree op_type = TREE_TYPE (@1); }
+   (if (element_precision (type) == element_precision (op_type))
+    (cond_len_op @0 @1 (convert:op_type @2) @4 @5 @6 @7))))
+ (simplify
+  (cond_len_op @0 @1 @2 (convert? (vec_cond @0 @3 @4)) @5 @6 @7)
+  (with { tree op_type = TREE_TYPE (@1); }
+   (if (element_precision (type) == element_precision (op_type))
+    (cond_len_op @0 @1 @2 (convert:op_type @3) @5 @6 @7)))))
 
 /* Detect simplication for a conditional reduction where
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c
new file mode 100644
index 00000000000..29e55705a7a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr115336.c
@@ -0,0 +1,20 @@ 
+/* { dg-do compile } */
+/* { dg-options { -O3 -march=rv64gcv_zvl256b -mabi=lp64d  } } */
+
+short d[19];
+_Bool e[100][19][19];
+_Bool f[10000];
+
+int main()
+{
+  for (long g = 0; g < 19; ++g)
+    d[g] = 3;
+  _Bool(*h)[19][19] = e;
+  for (short g = 0; g < 9; g++)
+    for (int i = 4; i < 16; i += 3)
+      f[i * 9 + g] = d[i] ? d[i] : h[g][i][2];
+  for (long i = 120; i < 122; ++i)
+    __builtin_printf("%d\n", f[i]);
+}
+
+/* { dg-final { scan-assembler-times {vmv.v.i\s*v[0-9]+,0} 3 } } */
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 57992b6deca..c0c7013c817 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -124,6 +124,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-vectorizer.h"
 #include "tree-eh.h"
 #include "cgraph.h"
+#include "tree-dfa.h"
 
 /* For lang_hooks.types.type_for_mode.  */
 #include "langhooks.h"
@@ -2453,11 +2454,14 @@  mask_exists (int size, const vec<int> &vec)
    that does so.  */
 
 static gimple *
-predicate_load_or_store (gimple_stmt_iterator *gsi, gassign *stmt, tree mask)
+predicate_load_or_store (gimple_stmt_iterator *gsi, gassign *stmt, tree mask,
+			 tree override_lhs = NULL_TREE)
 {
   gcall *new_stmt;
 
-  tree lhs = gimple_assign_lhs (stmt);
+  tree lhs = override_lhs;
+  if (!lhs)
+    lhs = gimple_assign_lhs (stmt);
   tree rhs = gimple_assign_rhs1 (stmt);
   tree ref = TREE_CODE (lhs) == SSA_NAME ? rhs : lhs;
   mark_addressable (ref);
@@ -2789,11 +2793,52 @@  predicate_statements (loop_p loop)
 		  vect_masks.safe_push (mask);
 		}
 	      if (gimple_assign_single_p (stmt))
-		new_stmt = predicate_load_or_store (&gsi, stmt, mask);
-	      else
-		new_stmt = predicate_rhs_code (stmt, mask, cond, &ssa_names);
+		{
+		  bool target_has_else
+		    = TARGET_MASK_LOAD_MASKED_MAYBE_NONZERO;
+		  tree lhs = gimple_get_lhs (stmt);
+
+		  bool is_load = TREE_CODE (lhs) == SSA_NAME;
+
+		  gimple_seq stmts2 = NULL;
+		  tree tmplhs = is_load && target_has_else
+		    ? make_temp_ssa_name (TREE_TYPE (lhs), NULL,
+					  "_ifc_") : lhs;
+		  gimple *call_stmt
+		    = predicate_load_or_store (&gsi, stmt, mask, tmplhs);
+
+		  gimple_seq_add_stmt (&stmts2, call_stmt);
+
+		  if (lhs != tmplhs)
+		    ssa_names.add (tmplhs);
 
-	      gsi_replace (&gsi, new_stmt, true);
+		  if (is_load && target_has_else)
+		    {
+		      unsigned nops = gimple_call_num_args (call_stmt);
+		      tree *ops = XALLOCAVEC (tree, nops);
+
+		      for (unsigned i = 0; i < nops; i++)
+			ops[i] = gimple_call_arg (call_stmt, i);
+
+		      tree els_operand
+			= targetm.preferred_else_value (IFN_MASK_LOAD,
+							TREE_TYPE (lhs),
+							nops, ops);
+
+		      tree rhs
+			= fold_build_cond_expr (TREE_TYPE (tmplhs),
+						mask, tmplhs, els_operand);
+		      gassign *cond_stmt
+			= gimple_build_assign (gimple_get_lhs (stmt), rhs);
+		      gimple_seq_add_stmt (&stmts2, cond_stmt);
+		    }
+		  gsi_replace_with_seq (&gsi, stmts2, true);
+		}
+	      else
+		{
+		  new_stmt = predicate_rhs_code (stmt, mask, cond, &ssa_names);
+		  gsi_replace (&gsi, new_stmt, true);
+		}
 	    }
 	  else if (((lhs = gimple_assign_lhs (stmt)), true)
 		   && (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
index 57085488722..65e6f91fe8b 100644
--- a/gcc/tree-ssa-math-opts.cc
+++ b/gcc/tree-ssa-math-opts.cc
@@ -3193,6 +3193,14 @@  convert_mult_to_fma_1 (tree mul_result, tree op1, tree op2)
       /* Follow all SSA edges so that we generate FMS, FNMA and FNMS
 	 regardless of where the negation occurs.  */
       gimple *orig_stmt = gsi_stmt (gsi);
+      if (fold_stmt (&gsi, follow_all_ssa_edges))
+	{
+	  if (maybe_clean_or_replace_eh_stmt (orig_stmt, gsi_stmt (gsi)))
+	    gcc_unreachable ();
+	  update_stmt (gsi_stmt (gsi));
+	}
+      /* Fold the result again.  */
+      orig_stmt = gsi_stmt (gsi);
       if (fold_stmt (&gsi, follow_all_ssa_edges))
 	{
 	  if (maybe_clean_or_replace_eh_stmt (orig_stmt, gsi_stmt (gsi)))