
SVE intrinsics: Fold svmul and svdiv by -1 to svneg for unsigned types

Message ID 8B42CDB5-96A6-43BD-ACF7-C4D04F5C98E5@nvidia.com
State New
Series SVE intrinsics: Fold svmul and svdiv by -1 to svneg for unsigned types

Commit Message

Jennifer Schmitz Nov. 11, 2024, 3:11 p.m. UTC
As follow-up to
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html,
this patch implements folding of svmul and svdiv by -1 to svneg for
unsigned SVE vector types. The key idea is to reuse the existing code that
does this fold for signed types and feed it as a callback to a helper function
that adds the necessary type conversions.
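
For the multiplication case, the key property can be checked with a small
scalar example (illustrative only, not part of the patch):

#include <assert.h>
#include <stdint.h>

int main (void)
{
  uint64_t x = 0x123456789abcdef0ULL;
  /* Multiplying by (unsigned)-1 wraps to the same result as negation
     modulo 2^64, so the signed svneg fold carries over bit-for-bit.  */
  assert (x * (uint64_t) -1 == -x);
  return 0;
}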

For example, for the test case
svuint64_t foo (svuint64_t x, svbool_t pg)
{
  return svmul_n_u64_x (pg, x, -1);
}

the following gimple sequence is emitted (-O2 -mcpu=grace):
svuint64_t foo (svuint64_t x, svbool_t pg)
{
  svuint64_t D.12921;
  svint64_t D.12920;
  svuint64_t D.12919;

  D.12920 = VIEW_CONVERT_EXPR<svint64_t>(x);
  D.12921 = svneg_s64_x (pg, D.12920);
  D.12919 = VIEW_CONVERT_EXPR<svuint64_t>(D.12921);
  goto <D.12922>;
  <D.12922>:
  return D.12919;
}
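
The same transform, written out with ACLE intrinsics for illustration (this
is not part of the patch; the svreinterpret calls play the role of the
VIEW_CONVERT_EXPRs above, and the program needs an SVE-enabled -march to
build):

#include <arm_sve.h>
#include <assert.h>

svuint64_t mul_m1 (svbool_t pg, svuint64_t x)
{
  return svmul_n_u64_x (pg, x, -1);
}

svuint64_t via_neg (svbool_t pg, svuint64_t x)
{
  /* Reinterpret to signed, negate, reinterpret back.  */
  svint64_t s = svreinterpret_s64_u64 (x);
  return svreinterpret_u64_s64 (svneg_s64_x (pg, s));
}

int main (void)
{
  svbool_t pg = svptrue_b64 ();
  svuint64_t x = svindex_u64 (1, 3);
  /* Both versions must agree on all lanes.  */
  svbool_t diff = svcmpne_u64 (pg, mul_m1 (pg, x), via_neg (pg, x));
  assert (!svptest_any (pg, diff));
  return 0;
}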

In general, the new helper gimple_folder::convert_and_fold
- takes a target type and a function pointer,
- converts all non-boolean vector types to the target type,
- replaces the converted arguments in the function call,
- calls the callback function,
- adds the necessary view converts to the gimple sequence,
- and returns the new call.

Because all arguments are converted to the same target type, the helper
function is only suitable for folding calls whose arguments are all of
the same type. If necessary, this could be extended to convert the
arguments to different types individually.

The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>

gcc/ChangeLog:

	* config/aarch64/aarch64-sve-builtins-base.cc
	(svmul_impl::fold): Wrap code for folding to svneg in lambda
	function and pass to gimple_folder::convert_and_fold to enable
	the transform for unsigned types.
	(svdiv_impl::fold): Likewise.
	* config/aarch64/aarch64-sve-builtins.cc
	(gimple_folder::convert_and_fold): New function that converts
	operands to target type before calling callback function, adding the
	necessary conversion statements.
	* config/aarch64/aarch64-sve-builtins.h
	(gimple_folder::convert_and_fold): Declare function.
	(signed_type_suffix_index): Return type_suffix_index of signed
	vector type for given width.
	(function_instance::signed_type): Return signed vector type for
	given width.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/acle/asm/div_u32.c: Adjust expected
	outcome.
	* gcc.target/aarch64/sve/acle/asm/div_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mul_u64.c: New test and adjust
	expected outcome.
---
 .../aarch64/aarch64-sve-builtins-base.cc      | 99 ++++++++++++-------
 gcc/config/aarch64/aarch64-sve-builtins.cc    | 40 ++++++++
 gcc/config/aarch64/aarch64-sve-builtins.h     | 30 ++++++
 .../gcc.target/aarch64/sve/acle/asm/div_u32.c |  9 ++
 .../gcc.target/aarch64/sve/acle/asm/div_u64.c |  9 ++
 .../gcc.target/aarch64/sve/acle/asm/mul_u16.c |  5 +-
 .../gcc.target/aarch64/sve/acle/asm/mul_u32.c |  5 +-
 .../gcc.target/aarch64/sve/acle/asm/mul_u64.c | 26 ++++-
 .../gcc.target/aarch64/sve/acle/asm/mul_u8.c  |  7 +-
 9 files changed, 180 insertions(+), 50 deletions(-)

Comments

Richard Sandiford Nov. 13, 2024, 11:54 a.m. UTC | #1
Jennifer Schmitz <jschmitz@nvidia.com> writes:
> As follow-up to
> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html,
> this patch implements folding of svmul and svdiv by -1 to svneg for
> unsigned SVE vector types. The key idea is to reuse the existing code that
> does this fold for signed types and feed it as callback to a helper function
> that adds the necessary type conversions.

I only meant that we should do this for multiplication (since the sign
isn't relevant for N-bit x N-bit -> N-bit multiplication).  It wouldn't
be right for unsigned division, since unsigned division by the maximum
value is instead equivalent to x == MAX ? 1 : 0.
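
A scalar illustration of the difference (added for clarity; not from the
original thread):

#include <assert.h>
#include <stdint.h>

int main (void)
{
  uint64_t vals[] = { 0, 1, 42, UINT64_MAX - 1, UINT64_MAX };
  for (int i = 0; i < 5; ++i)
    {
      uint64_t x = vals[i];
      /* Unsigned multiplication by -1 really is negation...  */
      assert (x * (uint64_t) -1 == -x);
      /* ...but unsigned division by -1 (i.e. by MAX) is a comparison.  */
      assert (x / (uint64_t) -1 == (x == UINT64_MAX ? 1 : 0));
    }
  return 0;
}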

Some comments on the multiplication bit below:

>
> For example, for the test case
> svuint64_t foo (svuint64_t x, svbool_t pg)
> {
>   return svmul_n_u64_x (pg, x, -1);
> }
>
> the following gimple sequence is emitted (-O2 -mcpu=grace):
> svuint64_t foo (svuint64_t x, svbool_t pg)
> {
>   svuint64_t D.12921;
>   svint64_t D.12920;
>   svuint64_t D.12919;
>
>   D.12920 = VIEW_CONVERT_EXPR<svint64_t>(x);
>   D.12921 = svneg_s64_x (pg, D.12920);
>   D.12919 = VIEW_CONVERT_EXPR<svuint64_t>(D.12921);
>   goto <D.12922>;
>   <D.12922>:
>   return D.12919;
> }
>
> In general, the new helper gimple_folder::convert_and_fold
> - takes a target type and a function pointer,
> - converts all non-boolean vector types to the target type,
> - replaces the converted arguments in the function call,
> - calls the callback function,
> - adds the necessary view converts to the gimple sequence,
> - and returns the new call.
>
> Because all arguments are converted to the same target types, the helper
> function is only suitable for folding calls whose arguments are all of
> the same type. If necessary, this could be extended to convert the
> arguments to different types differentially.
>
> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
> OK for mainline?
>
> Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64-sve-builtins-base.cc
> 	(svmul_impl::fold): Wrap code for folding to svneg in lambda
> 	function and pass to gimple_folder::convert_and_fold to enable
> 	the transform for unsigned types.
> 	(svdiv_impl::fold): Likewise.
> 	* config/aarch64/aarch64-sve-builtins.cc
> 	(gimple_folder::convert_and_fold): New function that converts
> 	operands to target type before calling callback function, adding the
> 	necessary conversion statements.
> 	* config/aarch64/aarch64-sve-builtins.h
> 	(gimple_folder::convert_and_fold): Declare function.
> 	(signed_type_suffix_index): Return type_suffix_index of signed
> 	vector type for given width.
> 	(function_instance::signed_type): Return signed vector type for
> 	given width.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/sve/acle/asm/div_u32.c: Adjust expected
> 	outcome.
> 	* gcc.target/aarch64/sve/acle/asm/div_u64.c: Likewise.
> 	* gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
> 	* gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
> 	* gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
> 	* gcc.target/aarch64/sve/acle/asm/mul_u64.c: New test and adjust
> 	expected outcome.
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc      | 99 ++++++++++++-------
>  gcc/config/aarch64/aarch64-sve-builtins.cc    | 40 ++++++++
>  gcc/config/aarch64/aarch64-sve-builtins.h     | 30 ++++++
>  .../gcc.target/aarch64/sve/acle/asm/div_u32.c |  9 ++
>  .../gcc.target/aarch64/sve/acle/asm/div_u64.c |  9 ++
>  .../gcc.target/aarch64/sve/acle/asm/mul_u16.c |  5 +-
>  .../gcc.target/aarch64/sve/acle/asm/mul_u32.c |  5 +-
>  .../gcc.target/aarch64/sve/acle/asm/mul_u64.c | 26 ++++-
>  .../gcc.target/aarch64/sve/acle/asm/mul_u8.c  |  7 +-
>  9 files changed, 180 insertions(+), 50 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 1c9f515a52c..6df14a8f4c4 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> [...]
> @@ -2082,33 +2091,49 @@ public:
>        return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
>  
>      /* If one of the operands is all integer -1, fold to svneg.  */
> -    tree pg = gimple_call_arg (f.call, 0);
> -    tree negated_op = NULL;
> -    if (integer_minus_onep (op2))
> -      negated_op = op1;
> -    else if (integer_minus_onep (op1))
> -      negated_op = op2;
> -    if (!f.type_suffix (0).unsigned_p && negated_op)
> +     if (integer_minus_onep (op1) || integer_minus_onep (op2))

Formatting nit, sorry, but: indentation looks off.

>        {
> -	function_instance instance ("svneg", functions::svneg,
> -				    shapes::unary, MODE_none,
> -				    f.type_suffix_ids, GROUP_none, f.pred);
> -	gcall *call = f.redirect_call (instance);
> -	unsigned offset_index = 0;
> -	if (f.pred == PRED_m)
> +	auto mul_by_m1 = [](gimple_folder &f) -> gcall *
>  	  {
> -	    offset_index = 1;
> -	    gimple_call_set_arg (call, 0, op1);
> -	  }
> -	else
> -	  gimple_set_num_ops (call, 5);
> -	gimple_call_set_arg (call, offset_index, pg);
> -	gimple_call_set_arg (call, offset_index + 1, negated_op);
> -	return call;

Rather than having convert_and_fold modify the call in-place (which
would create an incorrectly typed call if we leave the function itself
unchanged), could we make it pass back a "tree" that contains the
(possibly converted) lhs and a "vec<tree> &" that contains the
(possibly converted) arguments?

> +	    tree pg = gimple_call_arg (f.call, 0);
> +	    tree op1 = gimple_call_arg (f.call, 1);
> +	    tree op2 = gimple_call_arg (f.call, 2);
> +	    tree negated_op = op1;
> +	    bool negate_op1 = true;
> +	    if (integer_minus_onep (op1))
> +	      {
> +		negated_op = op2;
> +		negate_op1 = false;
> +	      }
> +	    type_suffix_pair signed_tsp =
> +	      {signed_type_suffix_index (f.type_suffix (0).element_bits),
> +		f.type_suffix_ids[1]};
> +	    function_instance instance ("svneg", functions::svneg,
> +					shapes::unary, MODE_none,
> +					signed_tsp, GROUP_none, f.pred);
> +	    gcall *call = f.redirect_call (instance);
> +	    unsigned offset = 0;
> +	    if (f.pred == PRED_m)
> +	      {
> +		offset = 1;
> +		tree ty = f.signed_type (f.type_suffix (0).element_bits);
> +		tree inactive = negate_op1 ? gimple_call_arg (f.call, 1)
> +		  : build_minus_one_cst (ty);
> +		gimple_call_set_arg (call, 0, inactive);

Isn't the old approach of using op1 as the inactive argument still
correct?

Otherwise it looks good, thanks.

Richard

> +	      }
> +	    else
> +	      gimple_set_num_ops (call, 5);
> +	    gimple_call_set_arg (call, offset, pg);
> +	    gimple_call_set_arg (call, offset + 1, negated_op);
> +	    return call;
> +	  };
> +	tree ty = f.signed_type (f.type_suffix (0).element_bits);
> +	return f.convert_and_fold (ty, mul_by_m1);
>        }
>  
>      /* If one of the operands is a uniform power of 2, fold to a left shift
>         by immediate.  */
> +    tree pg = gimple_call_arg (f.call, 0);
>      tree op1_cst = uniform_integer_cst_p (op1);
>      tree op2_cst = uniform_integer_cst_p (op2);
>      tree shift_op1, shift_op2 = NULL;
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
> index 44b7f6edae5..6523a782dac 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
> @@ -3576,6 +3576,46 @@ gimple_folder::redirect_pred_x ()
>    return redirect_call (instance);
>  }
>  
> +/* Convert all non-boolean vector-type operands to TYPE, fold the call
> +   using callback F, and convert the result back to the original type.
> +   Add the necessary conversion statements. Return the updated call.  */
> +gimple *
> +gimple_folder::convert_and_fold (tree type, gcall *(*fp) (gimple_folder &))
> +{
> +  gcc_assert (VECTOR_TYPE_P (type)
> +	      && TYPE_MODE (type) != VNx16BImode);
> +  tree old_type = TREE_TYPE (lhs);
> +  if (useless_type_conversion_p (type, old_type))
> +    return fp (*this);
> +
> +  unsigned int num_args = gimple_call_num_args (call);
> +  gimple_seq stmts = NULL;
> +  tree op, op_type, t1, t2, t3;
> +  gimple *g;
> +  gcall *new_call;
> +  for (unsigned int i = 0; i < num_args; ++i)
> +    {
> +      op = gimple_call_arg (call, i);
> +      op_type = TREE_TYPE (op);
> +      if (VECTOR_TYPE_P (op_type)
> +	  && TREE_CODE (op) != VECTOR_CST
> +	  && TYPE_MODE (op_type) != VNx16BImode)
> +	{
> +	  t1 = gimple_build (&stmts, VIEW_CONVERT_EXPR, type, op);
> +	  gimple_call_set_arg (call, i, t1);
> +	}
> +    }
> +
> +  new_call = fp (*this);
> +  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> +  t2 = create_tmp_var (gimple_call_return_type (new_call));
> +  gimple_call_set_lhs (new_call, t2);
> +  t3 = build1 (VIEW_CONVERT_EXPR, old_type, t2);
> +  g = gimple_build_assign (lhs, VIEW_CONVERT_EXPR, t3);
> +  gsi_insert_after (gsi, g, GSI_SAME_STMT);
> +  return new_call;
> +}
> +
>  /* Fold the call to constant VAL.  */
>  gimple *
>  gimple_folder::fold_to_cstu (poly_uint64 val)
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
> index d5cc6e0a40d..b48fed0df6b 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.h
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.h
> @@ -406,6 +406,7 @@ public:
>    tree scalar_type (unsigned int) const;
>    tree vector_type (unsigned int) const;
>    tree tuple_type (unsigned int) const;
> +  tree signed_type (unsigned int) const;
>    unsigned int elements_per_vq (unsigned int) const;
>    machine_mode vector_mode (unsigned int) const;
>    machine_mode tuple_mode (unsigned int) const;
> @@ -630,6 +631,7 @@ public:
>  
>    gcall *redirect_call (const function_instance &);
>    gimple *redirect_pred_x ();
> +  gimple *convert_and_fold (tree, gcall *(*) (gimple_folder &));
>  
>    gimple *fold_to_cstu (poly_uint64);
>    gimple *fold_to_pfalse ();
> @@ -860,6 +862,20 @@ find_type_suffix (type_class_index tclass, unsigned int element_bits)
>    gcc_unreachable ();
>  }
>  
> +/* Return the type suffix of the signed type of width ELEMENT_BITS.  */
> +inline type_suffix_index
> +signed_type_suffix_index (unsigned int element_bits)
> +{
> +  switch (element_bits)
> +  {
> +  case 8: return TYPE_SUFFIX_s8;
> +  case 16: return TYPE_SUFFIX_s16;
> +  case 32: return TYPE_SUFFIX_s32;
> +  case 64: return TYPE_SUFFIX_s64;
> +  }
> +  gcc_unreachable ();
> +}
> +
>  /* Return the single field in tuple type TYPE.  */
>  inline tree
>  tuple_type_field (tree type)
> @@ -1025,6 +1041,20 @@ function_instance::tuple_type (unsigned int i) const
>    return acle_vector_types[num_vectors - 1][type_suffix (i).vector_type];
>  }
>  
> +/* Return the signed vector type of width ELEMENT_BITS.  */
> +inline tree
> +function_instance::signed_type (unsigned int element_bits) const
> +{
> +  switch (element_bits)
> +  {
> +  case 8: return acle_vector_types[0][VECTOR_TYPE_svint8_t];
> +  case 16: return acle_vector_types[0][VECTOR_TYPE_svint16_t];
> +  case 32: return acle_vector_types[0][VECTOR_TYPE_svint32_t];
> +  case 64: return acle_vector_types[0][VECTOR_TYPE_svint64_t];
> +  }
> +  gcc_unreachable ();
> +}
> +
>  /* Return the number of elements of type suffix I that fit within a
>     128-bit block.  */
>  inline unsigned int
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u32.c
> index 1e8d6104845..fcadc05d75b 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u32.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u32.c
> @@ -72,6 +72,15 @@ TEST_UNIFORM_Z (div_1_u32_m_untied, svuint32_t,
>  		z0 = svdiv_n_u32_m (p0, z1, 1),
>  		z0 = svdiv_m (p0, z1, 1))
>  
> +/*
> +** div_m1_u32_m_tied1:
> +**	neg	z0\.s, p0/m, z0\.s
> +**	ret
> +*/
> +TEST_UNIFORM_Z (div_m1_u32_m_tied1, svuint32_t,
> +		z0 = svdiv_n_u32_m (p0, z0, -1),
> +		z0 = svdiv_m (p0, z0, -1))
> +
>  /*
>  ** div_2_u32_m_tied1:
>  **	lsr	z0\.s, p0/m, z0\.s, #1
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u64.c
> index ab049affb63..b95d5af13d8 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u64.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u64.c
> @@ -72,6 +72,15 @@ TEST_UNIFORM_Z (div_1_u64_m_untied, svuint64_t,
>  		z0 = svdiv_n_u64_m (p0, z1, 1),
>  		z0 = svdiv_m (p0, z1, 1))
>  
> +/*
> +** div_m1_u64_m_tied1:
> +**	neg	z0\.d, p0/m, z0\.d
> +**	ret
> +*/
> +TEST_UNIFORM_Z (div_m1_u64_m_tied1, svuint64_t,
> +		z0 = svdiv_n_u64_m (p0, z0, -1),
> +		z0 = svdiv_m (p0, z0, -1))
> +
>  /*
>  ** div_2_u64_m_tied1:
>  **	lsr	z0\.d, p0/m, z0\.d, #1
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
> index bdf6fcb98d6..e228dc5995d 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
> @@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u16_m_untied, svuint16_t,
>  
>  /*
>  ** mul_m1_u16_m:
> -**	mov	(z[0-9]+)\.b, #-1
> -**	mul	z0\.h, p0/m, z0\.h, \1\.h
> +**	neg	z0\.h, p0/m, z0\.h
>  **	ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u16_m, svuint16_t,
> @@ -569,7 +568,7 @@ TEST_UNIFORM_Z (mul_255_u16_x, svuint16_t,
>  
>  /*
>  ** mul_m1_u16_x:
> -**	mul	z0\.h, z0\.h, #-1
> +**	neg	z0\.h, p0/m, z0\.h
>  **	ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u16_x, svuint16_t,
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
> index a61e85fa12d..e8f52c9d785 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
> @@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u32_m_untied, svuint32_t,
>  
>  /*
>  ** mul_m1_u32_m:
> -**	mov	(z[0-9]+)\.b, #-1
> -**	mul	z0\.s, p0/m, z0\.s, \1\.s
> +**	neg	z0\.s, p0/m, z0\.s
>  **	ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u32_m, svuint32_t,
> @@ -569,7 +568,7 @@ TEST_UNIFORM_Z (mul_255_u32_x, svuint32_t,
>  
>  /*
>  ** mul_m1_u32_x:
> -**	mul	z0\.s, z0\.s, #-1
> +**	neg	z0\.s, p0/m, z0\.s
>  **	ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u32_x, svuint32_t,
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
> index eee1f8a0c99..2ccdc3642c5 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
> @@ -183,14 +183,25 @@ TEST_UNIFORM_Z (mul_3_u64_m_untied, svuint64_t,
>  
>  /*
>  ** mul_m1_u64_m:
> -**	mov	(z[0-9]+)\.b, #-1
> -**	mul	z0\.d, p0/m, z0\.d, \1\.d
> +**	neg	z0\.d, p0/m, z0\.d
>  **	ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u64_m, svuint64_t,
>  		z0 = svmul_n_u64_m (p0, z0, -1),
>  		z0 = svmul_m (p0, z0, -1))
>  
> +/*
> +** mul_m1r_u64_m:
> +**	mov	(z[0-9]+)\.b, #-1
> +**	mov	(z[0-9]+\.d), z0\.d
> +**	movprfx	z0, \1
> +**	neg	z0\.d, p0/m, \2
> +**	ret
> +*/
> +TEST_UNIFORM_Z (mul_m1r_u64_m, svuint64_t,
> +		z0 = svmul_u64_m (p0, svdup_u64 (-1), z0),
> +		z0 = svmul_m (p0, svdup_u64 (-1), z0))
> +
>  /*
>  ** mul_u64_z_tied1:
>  **	movprfx	z0\.d, p0/z, z0\.d
> @@ -597,13 +608,22 @@ TEST_UNIFORM_Z (mul_255_u64_x, svuint64_t,
>  
>  /*
>  ** mul_m1_u64_x:
> -**	mul	z0\.d, z0\.d, #-1
> +**	neg	z0\.d, p0/m, z0\.d
>  **	ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u64_x, svuint64_t,
>  		z0 = svmul_n_u64_x (p0, z0, -1),
>  		z0 = svmul_x (p0, z0, -1))
>  
> +/*
> +** mul_m1r_u64_x:
> +**	neg	z0\.d, p0/m, z0\.d
> +**	ret
> +*/
> +TEST_UNIFORM_Z (mul_m1r_u64_x, svuint64_t,
> +		z0 = svmul_u64_x (p0, svdup_u64 (-1), z0),
> +		z0 = svmul_x (p0, svdup_u64 (-1), z0))
> +
>  /*
>  ** mul_m127_u64_x:
>  **	mul	z0\.d, z0\.d, #-127
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
> index 06ee1b3e7c8..8e53a4821f0 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
> @@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u8_m_untied, svuint8_t,
>  
>  /*
>  ** mul_m1_u8_m:
> -**	mov	(z[0-9]+)\.b, #-1
> -**	mul	z0\.b, p0/m, z0\.b, \1\.b
> +**	neg	z0\.b, p0/m, z0\.b
>  **	ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u8_m, svuint8_t,
> @@ -559,7 +558,7 @@ TEST_UNIFORM_Z (mul_128_u8_x, svuint8_t,
>  
>  /*
>  ** mul_255_u8_x:
> -**	mul	z0\.b, z0\.b, #-1
> +**	neg	z0\.b, p0/m, z0\.b
>  **	ret
>  */
>  TEST_UNIFORM_Z (mul_255_u8_x, svuint8_t,
> @@ -568,7 +567,7 @@ TEST_UNIFORM_Z (mul_255_u8_x, svuint8_t,
>  
>  /*
>  ** mul_m1_u8_x:
> -**	mul	z0\.b, z0\.b, #-1
> +**	neg	z0\.b, p0/m, z0\.b
>  **	ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u8_x, svuint8_t,
Jennifer Schmitz Nov. 20, 2024, 9:06 a.m. UTC | #2
> On 13 Nov 2024, at 12:54, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Jennifer Schmitz <jschmitz@nvidia.com> writes:
>> As follow-up to
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html,
>> this patch implements folding of svmul and svdiv by -1 to svneg for
>> unsigned SVE vector types. The key idea is to reuse the existing code that
>> does this fold for signed types and feed it as callback to a helper function
>> that adds the necessary type conversions.
> 
> I only meant that we should do this for multiplication (since the sign
> isn't relevant for N-bit x N-bit -> N-bit multiplication).  It wouldn't
> be right for unsigned division, since unsigned division by the maximum
> value is instead equivalent to x == MAX ? 1 : 0.
> 
> Some comments on the multiplication bit below:
> 
>> 
>> [...]
>> 
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> index 1c9f515a52c..6df14a8f4c4 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> [...]
>> @@ -2082,33 +2091,49 @@ public:
>>       return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
>> 
>>     /* If one of the operands is all integer -1, fold to svneg.  */
>> -    tree pg = gimple_call_arg (f.call, 0);
>> -    tree negated_op = NULL;
>> -    if (integer_minus_onep (op2))
>> -      negated_op = op1;
>> -    else if (integer_minus_onep (op1))
>> -      negated_op = op2;
>> -    if (!f.type_suffix (0).unsigned_p && negated_op)
>> +     if (integer_minus_onep (op1) || integer_minus_onep (op2))
> 
> Formatting nit, sorry, but: indentation looks off.
> 
>>       {
>> -     function_instance instance ("svneg", functions::svneg,
>> -                                 shapes::unary, MODE_none,
>> -                                 f.type_suffix_ids, GROUP_none, f.pred);
>> -     gcall *call = f.redirect_call (instance);
>> -     unsigned offset_index = 0;
>> -     if (f.pred == PRED_m)
>> +     auto mul_by_m1 = [](gimple_folder &f) -> gcall *
>>        {
>> -         offset_index = 1;
>> -         gimple_call_set_arg (call, 0, op1);
>> -       }
>> -     else
>> -       gimple_set_num_ops (call, 5);
>> -     gimple_call_set_arg (call, offset_index, pg);
>> -     gimple_call_set_arg (call, offset_index + 1, negated_op);
>> -     return call;
> 
> Rather than having convert_and_fold modify the call in-place (which
> would create an incorrectly typed call if we leave the function itself
> unchanged), could we make it pass back a "tree" that contains the
> (possibly converted) lhs and a "vec<tree> &" that contains the
> (possibly converted) arguments?
Dear Richard,
I agree that it's preferable to not have convert_and_fold modify the call in-place and wanted to ask for clarification about which sort of design you had in mind:
Option 1:
tree gimple_folder::convert_and_fold (tree type, vec<tree> &args_conv), such that the function takes a target type TYPE to which all non-boolean vector operands are converted and a reference to a vec<tree> that it fills with the converted argument trees. It returns a tree with the (possibly converted) lhs. It also adds the view convert statements, but makes no changes to the call itself.
The caller of the function uses the converted arguments and lhs to assemble the new gcall.

Option 2:
gcall *gimple_folder::convert_and_fold (tree type, gcall *(*fp) (tree, vec<tree>, gimple_folder &)), where the function converts the lhs and arguments to TYPE and assigns them to a newly created tree lhs_conv and vec<tree> args_conv that are passed to the function pointer FP. The callback assembles the new call and returns it to convert_and_fold, which adds the necessary conversion statements before returning the new call to the caller. So, in this case it would be the callback modifying the call.

Or yet another option? What are your thoughts on this?
Thanks,
Jennifer
> [...]
Richard Sandiford Nov. 20, 2024, 12:43 p.m. UTC | #3
Jennifer Schmitz <jschmitz@nvidia.com> writes:
>> On 13 Nov 2024, at 12:54, Richard Sandiford <richard.sandiford@arm.com> wrote:
>> 
>> Jennifer Schmitz <jschmitz@nvidia.com> writes:
>>> As follow-up to
>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html,
>>> this patch implements folding of svmul and svdiv by -1 to svneg for
>>> unsigned SVE vector types. The key idea is to reuse the existing code that
>>> does this fold for signed types and feed it as callback to a helper function
>>> that adds the necessary type conversions.
>> 
>> I only meant that we should do this for multiplication (since the sign
>> isn't relevant for N-bit x N-bit -> N-bit multiplication).  It wouldn't
>> be right for unsigned division, since unsigned division by the maximum
>> value is instead equivalent to x == MAX ? 1 : 0.
>> 
>> Some comments on the multiplication bit below:
>> 
>>> [...]
>> 
>> Rather than having convert_and_fold modify the call in-place (which
>> would create an incorrectly typed call if we leave the function itself
>> unchanged), could we make it pass back a "tree" that contains the
>> (possibly converted) lhs and a "vec<tree> &" that contains the
>> (possibly converted) arguments?
> Dear Richard,
> I agree that it's preferable to not have convert_and_fold modify the call in-place and wanted to ask for clarification about which sort of design you had in mind:
> Option 1:
> tree gimple_folder::convert_and_fold (tree type, vec<tree> &args_conv), such that the function takes a target type TYPE to which all non-boolean vector operands are converted and a reference for a vec<tree> that it fills with converted argument trees. It returns a tree with the (possibly converted) lhs. It also adds the convert_view statements, but makes no changes to the call itself.
> The caller of the function uses the converted arguments and lhs to assemble the new gcall.
>
> Option 2:
> gcall *gimple_folder::convert_and_fold (tree type, gcall *(*fp) (tree, vec<tree>, gimple_folder &), where the function converts the lhs and arguments to TYPE and assigns them to the newly created tree lhs_conv and vec<tree> args_conv that are passed to the function pointer FP. The callback assembles the new call and returns it to convert_and_fold, which adds the necessary conversion statements before returning the new call to the caller. So, in this case it would be the callback modifying the call.

Yeah, I meant option 2, but with vec<tree> & rather than plain vec<tree>.
I suppose the callback should return a gimple * rather than a gcall *
though, in case the callback wants to create a gassign instead.

(Thanks for checking btw)

Richard
Jennifer Schmitz Nov. 26, 2024, 8:18 a.m. UTC | #4
> On 20 Nov 2024, at 13:43, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Jennifer Schmitz <jschmitz@nvidia.com> writes:
>>> On 13 Nov 2024, at 12:54, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>> 
>>> Jennifer Schmitz <jschmitz@nvidia.com> writes:
>>>> As follow-up to
>>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html,
>>>> this patch implements folding of svmul and svdiv by -1 to svneg for
>>>> unsigned SVE vector types. The key idea is to reuse the existing code that
>>>> does this fold for signed types and feed it as callback to a helper function
>>>> that adds the necessary type conversions.
>>> 
>>> I only meant that we should do this for multiplication (since the sign
>>> isn't relevant for N-bit x N-bit -> N-bit multiplication).  It wouldn't
>>> be right for unsigned division, since unsigned division by the maximum
>>> value is instead equivalent to x == MAX ? 1 : 0.
>>> 
>>> Some comments on the multiplication bit below:
>>> 
>>>> [...]
>>> 
>>> Rather than having convert_and_fold modify the call in-place (which
>>> would create an incorrectly typed call if we leave the function itself
>>> unchanged), could we make it pass back a "tree" that contains the
>>> (possibly converted) lhs and a "vec<tree> &" that contains the
>>> (possibly converted) arguments?
>> Dear Richard,
>> I agree that it's preferable to not have convert_and_fold modify the call in-place and wanted to ask for clarification about which sort of design you had in mind:
>> Option 1:
>> tree gimple_folder::convert_and_fold (tree type, vec<tree> &args_conv), such that the function takes a target type TYPE to which all non-boolean vector operands are converted and a reference for a vec<tree> that it fills with converted argument trees. It returns a tree with the (possibly converted) lhs. It also adds the convert_view statements, but makes no changes to the call itself.
>> The caller of the function uses the converted arguments and lhs to assemble the new gcall.
>> 
>> Option 2:
>> gcall *gimple_folder::convert_and_fold (tree type, gcall *(*fp) (tree, vec<tree>, gimple_folder &), where the function converts the lhs and arguments to TYPE and assigns them to the newly created tree lhs_conv and vec<tree> args_conv that are passed to the function pointer FP. The callback assembles the new call and returns it to convert_and_fold, which adds the necessary conversion statements before returning the new call to the caller. So, in this case it would be the callback modifying the call.
> 
> Yeah, I meant option 2, but with vec<tree> & rather than plain vec<tree>.
> I suppose the callback should return a gimple * rather than a gcall *
> though, in case the callback wants to create a gassign instead.
> 
> (Thanks for checking btw)
Thanks for clarifying. Below is the updated patch. I made the following changes:
- removed folding of svdiv and reverted the corresponding test cases
- passed lhs_conv and args_conv to the callback instead of having convert_and_fold change the call in place
- re-validated, no regression.
Thanks,
Jennifer


As follow-up to
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html,
this patch implements folding of svmul by -1 to svneg for
unsigned SVE vector types. The key idea is to reuse the existing code that
does this fold for signed types and feed it as callback to a helper function
that adds the necessary type conversions.

For example, for the test case
svuint64_t foo (svuint64_t x, svbool_t pg)
{
  return svmul_n_u64_x (pg, x, -1);
}

the following gimple sequence is emitted (-O2 -mcpu=grace):
svuint64_t foo (svuint64_t x, svbool_t pg)
{
  svint64_t D.12921;
  svint64_t D.12920;
  svuint64_t D.12919;

  D.12920 = VIEW_CONVERT_EXPR<svint64_t>(x);
  D.12921 = svneg_s64_x (pg, D.12920);
  D.12919 = VIEW_CONVERT_EXPR<svuint64_t>(D.12921);
  goto <D.12922>;
  <D.12922>:
  return D.12919;
}
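
For this example, the expected code generation at -O2 is then a single
predicated negate, matching the adjusted mul_u64.c expectation below:

	neg	z0.d, p0/m, z0.d
	ret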

In general, the new helper gimple_folder::convert_and_fold
- takes a target type and a function pointer,
- converts the lhs and all non-boolean vector types to the target type,
- passes the converted lhs and arguments to the callback,
- receives the new gimple statement from the callback function,
- adds the necessary view converts to the gimple sequence,
- and returns the resulting statement.

Because all arguments are converted to the same target type, the helper
function is only suitable for folding calls whose vector arguments all
have the same type. If necessary, this could be extended to convert
different arguments to different types.
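
Independently of the type conversions, the fold also has to respect the
merging semantics of the _m forms: inactive lanes take the value of the
first data operand. The two (illustrative) calls below are therefore not
interchangeable; the second one must keep -1 rather than x in the
inactive lanes, which is what the PRED_m branch of the lambda below
handles (see also the new mul_m1r_u64_m test):

#include <arm_sve.h>

svuint64_t
m1_second_op (svbool_t pg, svuint64_t x)
{
  return svmul_n_u64_m (pg, x, -1);            /* inactive lanes: x  */
}

svuint64_t
m1_first_op (svbool_t pg, svuint64_t x)
{
  return svmul_u64_m (pg, svdup_u64 (-1), x);  /* inactive lanes: -1 */
}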

The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>

gcc/ChangeLog:

	* config/aarch64/aarch64-sve-builtins-base.cc
	(svmul_impl::fold): Wrap code for folding to svneg in lambda
	function and pass to gimple_folder::convert_and_fold to enable
	the transform for unsigned types.
	* config/aarch64/aarch64-sve-builtins.cc
	(gimple_folder::convert_and_fold): New function that converts
	operands to target type before calling callback function, adding the
	necessary conversion statements.
	* config/aarch64/aarch64-sve-builtins.h
	(gimple_folder::convert_and_fold): Declare function.
	(signed_type_suffix_index): Return type_suffix_index of signed
	vector type for given width.
	(function_instance::signed_type): Return signed vector type for
	given width.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/acle/asm/mul_u8.c: Adjust expected outcome.
	* gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mul_u64.c: New test and adjust
	expected outcome.
---
 .../aarch64/aarch64-sve-builtins-base.cc      | 70 +++++++++++++------
 gcc/config/aarch64/aarch64-sve-builtins.cc    | 43 ++++++++++++
 gcc/config/aarch64/aarch64-sve-builtins.h     | 31 ++++++++
 .../gcc.target/aarch64/sve/acle/asm/mul_u16.c |  5 +-
 .../gcc.target/aarch64/sve/acle/asm/mul_u32.c |  5 +-
 .../gcc.target/aarch64/sve/acle/asm/mul_u64.c | 26 ++++++-
 .../gcc.target/aarch64/sve/acle/asm/mul_u8.c  |  7 +-
 7 files changed, 153 insertions(+), 34 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 87e9909b55a..52401a8c57a 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -2092,33 +2092,61 @@ public:
       return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
 
     /* If one of the operands is all integer -1, fold to svneg.  */
-    tree pg = gimple_call_arg (f.call, 0);
-    tree negated_op = NULL;
-    if (integer_minus_onep (op2))
-      negated_op = op1;
-    else if (integer_minus_onep (op1))
-      negated_op = op2;
-    if (!f.type_suffix (0).unsigned_p && negated_op)
+    if (integer_minus_onep (op1) || integer_minus_onep (op2))
       {
-	function_instance instance ("svneg", functions::svneg,
-				    shapes::unary, MODE_none,
-				    f.type_suffix_ids, GROUP_none, f.pred);
-	gcall *call = f.redirect_call (instance);
-	unsigned offset_index = 0;
-	if (f.pred == PRED_m)
+	auto mul_by_m1 = [](gimple_folder &f, tree lhs_conv,
+			    vec<tree> &args_conv) -> gimple *
 	  {
-	    offset_index = 1;
-	    gimple_call_set_arg (call, 0, op1);
-	  }
-	else
-	  gimple_set_num_ops (call, 5);
-	gimple_call_set_arg (call, offset_index, pg);
-	gimple_call_set_arg (call, offset_index + 1, negated_op);
-	return call;
+	    gcc_assert (lhs_conv && args_conv.length () == 3);
+	    tree pg = args_conv[0];
+	    tree op1 = args_conv[1];
+	    tree op2 = args_conv[2];
+	    tree negated_op = op1;
+	    bool negate_op1 = true;
+	    if (integer_minus_onep (op1))
+	      {
+		negated_op = op2;
+		negate_op1 = false;
+	      }
+	    type_suffix_pair signed_tsp =
+	      {signed_type_suffix_index (f.type_suffix (0).element_bits),
+		f.type_suffix_ids[1]};
+	    function_instance instance ("svneg", functions::svneg,
+					shapes::unary, MODE_none,
+					signed_tsp, GROUP_none, f.pred);
+	    gcall *call = f.redirect_call (instance);
+	    gimple_call_set_lhs (call, lhs_conv);
+	    unsigned offset = 0;
+	    tree fntype, op1_type = TREE_TYPE (op1);
+	    if (f.pred == PRED_m)
+	      {
+		offset = 1;
+		tree arg_types[3] = {op1_type, TREE_TYPE (pg), op1_type};
+		fntype = build_function_type_array (TREE_TYPE (lhs_conv),
+						    3, arg_types);
+		tree ty = f.signed_type (f.type_suffix (0).element_bits);
+		tree inactive = negate_op1 ? op1 : build_minus_one_cst (ty);
+		gimple_call_set_arg (call, 0, inactive);
+	      }
+	    else
+	      {
+		gimple_set_num_ops (call, 5);
+		tree arg_types[2] = {TREE_TYPE (pg), op1_type};
+		fntype = build_function_type_array (TREE_TYPE (lhs_conv),
+						    2, arg_types);
+	      }
+	    gimple_call_set_fntype (call, fntype);
+	    gimple_call_set_arg (call, offset, pg);
+	    gimple_call_set_arg (call, offset + 1, negated_op);
+	    return call;
+	  };
+	tree ty = f.signed_type (f.type_suffix (0).element_bits);
+	return f.convert_and_fold (ty, mul_by_m1);
       }
 
     /* If one of the operands is a uniform power of 2, fold to a left shift
        by immediate.  */
+    tree pg = gimple_call_arg (f.call, 0);
     tree op1_cst = uniform_integer_cst_p (op1);
     tree op2_cst = uniform_integer_cst_p (op2);
     tree shift_op1, shift_op2 = NULL;
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 0fec1cd439e..01b0da22c9b 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -3646,6 +3646,49 @@ gimple_folder::redirect_pred_x ()
   return redirect_call (instance);
 }
 
+/* Convert the lhs and all non-boolean vector-type operands to TYPE.
+   Pass the converted variables to the callback FP, and finally convert the
+   result back to the original type. Add the necessary conversion statements.
+   Return the new call.  */
+gimple *
+gimple_folder::convert_and_fold (tree type,
+				 gimple *(*fp) (gimple_folder &,
+						tree, vec<tree> &))
+{
+  gcc_assert (VECTOR_TYPE_P (type)
+	      && TYPE_MODE (type) != VNx16BImode);
+  tree old_ty = TREE_TYPE (lhs);
+  gimple_seq stmts = NULL;
+  tree lhs_conv, op, op_ty, t;
+  gimple *g, *new_stmt;
+  bool convert_lhs_p = !useless_type_conversion_p (type, old_ty);
+  lhs_conv = convert_lhs_p ? create_tmp_var (type) : lhs;
+  unsigned int num_args = gimple_call_num_args (call);
+  vec<tree> args_conv = vNULL;
+  args_conv.safe_grow (num_args);
+  for (unsigned int i = 0; i < num_args; ++i)
+    {
+      op = gimple_call_arg (call, i);
+      op_ty = TREE_TYPE (op);
+      args_conv[i] =
+	(VECTOR_TYPE_P (op_ty)
+	 && TREE_CODE (op) != VECTOR_CST
+	 && TYPE_MODE (op_ty) != VNx16BImode
+	 && !useless_type_conversion_p (op_ty, type))
+	? gimple_build (&stmts, VIEW_CONVERT_EXPR, type, op) : op;
+    }
+
+  new_stmt = fp (*this, lhs_conv, args_conv);
+  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+  if (convert_lhs_p)
+    {
+      t = build1 (VIEW_CONVERT_EXPR, old_ty, lhs_conv);
+      g = gimple_build_assign (lhs, VIEW_CONVERT_EXPR, t);
+      gsi_insert_after (gsi, g, GSI_SAME_STMT);
+    }
+  return new_stmt;
+}
+
 /* Fold the call to constant VAL.  */
 gimple *
 gimple_folder::fold_to_cstu (poly_uint64 val)
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
index 4094f8207f9..3e919e52e2c 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins.h
@@ -406,6 +406,7 @@ public:
   tree scalar_type (unsigned int) const;
   tree vector_type (unsigned int) const;
   tree tuple_type (unsigned int) const;
+  tree signed_type (unsigned int) const;
   unsigned int elements_per_vq (unsigned int) const;
   machine_mode vector_mode (unsigned int) const;
   machine_mode tuple_mode (unsigned int) const;
@@ -632,6 +633,8 @@ public:
 
   gcall *redirect_call (const function_instance &);
   gimple *redirect_pred_x ();
+  gimple *convert_and_fold (tree, gimple *(*) (gimple_folder &,
+					       tree, vec<tree> &));
 
   gimple *fold_to_cstu (poly_uint64);
   gimple *fold_to_pfalse ();
@@ -864,6 +867,20 @@ find_type_suffix (type_class_index tclass, unsigned int element_bits)
   gcc_unreachable ();
 }
 
+/* Return the type suffix of the signed type of width ELEMENT_BITS.  */
+inline type_suffix_index
+signed_type_suffix_index (unsigned int element_bits)
+{
+  switch (element_bits)
+  {
+  case 8: return TYPE_SUFFIX_s8;
+  case 16: return TYPE_SUFFIX_s16;
+  case 32: return TYPE_SUFFIX_s32;
+  case 64: return TYPE_SUFFIX_s64;
+  }
+  gcc_unreachable ();
+}
+
 /* Return the single field in tuple type TYPE.  */
 inline tree
 tuple_type_field (tree type)
@@ -1029,6 +1046,20 @@ function_instance::tuple_type (unsigned int i) const
   return acle_vector_types[num_vectors - 1][type_suffix (i).vector_type];
 }
 
+/* Return the signed vector type of width ELEMENT_BITS.  */
+inline tree
+function_instance::signed_type (unsigned int element_bits) const
+{
+  switch (element_bits)
+  {
+  case 8: return acle_vector_types[0][VECTOR_TYPE_svint8_t];
+  case 16: return acle_vector_types[0][VECTOR_TYPE_svint16_t];
+  case 32: return acle_vector_types[0][VECTOR_TYPE_svint32_t];
+  case 64: return acle_vector_types[0][VECTOR_TYPE_svint64_t];
+  }
+  gcc_unreachable ();
+}
+
 /* Return the number of elements of type suffix I that fit within a
    128-bit block.  */
 inline unsigned int
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
index bdf6fcb98d6..e228dc5995d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
@@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u16_m_untied, svuint16_t,
 
 /*
 ** mul_m1_u16_m:
-**	mov	(z[0-9]+)\.b, #-1
-**	mul	z0\.h, p0/m, z0\.h, \1\.h
+**	neg	z0\.h, p0/m, z0\.h
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u16_m, svuint16_t,
@@ -569,7 +568,7 @@ TEST_UNIFORM_Z (mul_255_u16_x, svuint16_t,
 
 /*
 ** mul_m1_u16_x:
-**	mul	z0\.h, z0\.h, #-1
+**	neg	z0\.h, p0/m, z0\.h
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u16_x, svuint16_t,
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
index a61e85fa12d..e8f52c9d785 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
@@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u32_m_untied, svuint32_t,
 
 /*
 ** mul_m1_u32_m:
-**	mov	(z[0-9]+)\.b, #-1
-**	mul	z0\.s, p0/m, z0\.s, \1\.s
+**	neg	z0\.s, p0/m, z0\.s
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u32_m, svuint32_t,
@@ -569,7 +568,7 @@ TEST_UNIFORM_Z (mul_255_u32_x, svuint32_t,
 
 /*
 ** mul_m1_u32_x:
-**	mul	z0\.s, z0\.s, #-1
+**	neg	z0\.s, p0/m, z0\.s
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u32_x, svuint32_t,
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
index eee1f8a0c99..2ccdc3642c5 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
@@ -183,14 +183,25 @@ TEST_UNIFORM_Z (mul_3_u64_m_untied, svuint64_t,
 
 /*
 ** mul_m1_u64_m:
-**	mov	(z[0-9]+)\.b, #-1
-**	mul	z0\.d, p0/m, z0\.d, \1\.d
+**	neg	z0\.d, p0/m, z0\.d
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u64_m, svuint64_t,
 		z0 = svmul_n_u64_m (p0, z0, -1),
 		z0 = svmul_m (p0, z0, -1))
 
+/*
+** mul_m1r_u64_m:
+**	mov	(z[0-9]+)\.b, #-1
+**	mov	(z[0-9]+\.d), z0\.d
+**	movprfx	z0, \1
+**	neg	z0\.d, p0/m, \2
+**	ret
+*/
+TEST_UNIFORM_Z (mul_m1r_u64_m, svuint64_t,
+		z0 = svmul_u64_m (p0, svdup_u64 (-1), z0),
+		z0 = svmul_m (p0, svdup_u64 (-1), z0))
+
 /*
 ** mul_u64_z_tied1:
 **	movprfx	z0\.d, p0/z, z0\.d
@@ -597,13 +608,22 @@ TEST_UNIFORM_Z (mul_255_u64_x, svuint64_t,
 
 /*
 ** mul_m1_u64_x:
-**	mul	z0\.d, z0\.d, #-1
+**	neg	z0\.d, p0/m, z0\.d
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u64_x, svuint64_t,
 		z0 = svmul_n_u64_x (p0, z0, -1),
 		z0 = svmul_x (p0, z0, -1))
 
+/*
+** mul_m1r_u64_x:
+**	neg	z0\.d, p0/m, z0\.d
+**	ret
+*/
+TEST_UNIFORM_Z (mul_m1r_u64_x, svuint64_t,
+		z0 = svmul_u64_x (p0, svdup_u64 (-1), z0),
+		z0 = svmul_x (p0, svdup_u64 (-1), z0))
+
 /*
 ** mul_m127_u64_x:
 **	mul	z0\.d, z0\.d, #-127
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
index 06ee1b3e7c8..8e53a4821f0 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
@@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u8_m_untied, svuint8_t,
 
 /*
 ** mul_m1_u8_m:
-**	mov	(z[0-9]+)\.b, #-1
-**	mul	z0\.b, p0/m, z0\.b, \1\.b
+**	neg	z0\.b, p0/m, z0\.b
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u8_m, svuint8_t,
@@ -559,7 +558,7 @@ TEST_UNIFORM_Z (mul_128_u8_x, svuint8_t,
 
 /*
 ** mul_255_u8_x:
-**	mul	z0\.b, z0\.b, #-1
+**	neg	z0\.b, p0/m, z0\.b
 **	ret
 */
 TEST_UNIFORM_Z (mul_255_u8_x, svuint8_t,
@@ -568,7 +567,7 @@ TEST_UNIFORM_Z (mul_255_u8_x, svuint8_t,
 
 /*
 ** mul_m1_u8_x:
-**	mul	z0\.b, z0\.b, #-1
+**	neg	z0\.b, p0/m, z0\.b
 **	ret
 */
TEST_UNIFORM_Z (mul_m1_u8_x, svuint8_t,
Jennifer Schmitz Dec. 3, 2024, 8:14 a.m. UTC | #5
Jennifer Schmitz Dec. 3, 2024, 8:14 a.m. UTC | #5
Ping.
Thanks,
Jennifer

> On 26 Nov 2024, at 09:18, Jennifer Schmitz <jschmitz@nvidia.com> wrote:
> 
> 
> 
>> On 20 Nov 2024, at 13:43, Richard Sandiford <richard.sandiford@arm.com> wrote:
>> 
>> 
>> 
>> Jennifer Schmitz <jschmitz@nvidia.com> writes:
>>>> On 13 Nov 2024, at 12:54, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>> 
>>>> 
>>>> 
>>>> Jennifer Schmitz <jschmitz@nvidia.com> writes:
>>>>> As follow-up to
>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html,
>>>>> this patch implements folding of svmul and svdiv by -1 to svneg for
>>>>> unsigned SVE vector types. The key idea is to reuse the existing code that
>>>>> does this fold for signed types and feed it as callback to a helper function
>>>>> that adds the necessary type conversions.
>>>> 
>>>> I only meant that we should do this for multiplication (since the sign
>>>> isn't relevant for N-bit x N-bit -> N-bit multiplication).  It wouldn't
>>>> be right for unsigned division, since unsigned division by the maximum
>>>> value is instead equivalent to x == MAX ? MAX : 0.
>>>> 
>>>> Some comments on the multiplication bit below:
>>>> 
>>>>> 
>>>>> For example, for the test case
>>>>> svuint64_t foo (svuint64_t x, svbool_t pg)
>>>>> {
>>>>> return svmul_n_u64_x (pg, x, -1);
>>>>> }
>>>>> 
>>>>> the following gimple sequence is emitted (-O2 -mcpu=grace):
>>>>> svuint64_t foo (svuint64_t x, svbool_t pg)
>>>>> {
>>>>> svuint64_t D.12921;
>>>>> svint64_t D.12920;
>>>>> svuint64_t D.12919;
>>>>> 
>>>>> D.12920 = VIEW_CONVERT_EXPR<svint64_t>(x);
>>>>> D.12921 = svneg_s64_x (pg, D.12920);
>>>>> D.12919 = VIEW_CONVERT_EXPR<svuint64_t>(D.12921);
>>>>> goto <D.12922>;
>>>>> <D.12922>:
>>>>> return D.12919;
>>>>> }
>>>>> 
>>>>> In general, the new helper gimple_folder::convert_and_fold
>>>>> - takes a target type and a function pointer,
>>>>> - converts all non-boolean vector types to the target type,
>>>>> - replaces the converted arguments in the function call,
>>>>> - calls the callback function,
>>>>> - adds the necessary view converts to the gimple sequence,
>>>>> - and returns the new call.
>>>>> 
>>>>> Because all arguments are converted to the same target types, the helper
>>>>> function is only suitable for folding calls whose arguments are all of
>>>>> the same type. If necessary, this could be extended to convert the
>>>>> arguments to different types differentially.
>>>>> 
>>>>> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
>>>>> OK for mainline?
>>>>> 
>>>>> Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
>>>>> 
>>>>> gcc/ChangeLog:
>>>>> 
>>>>>    * config/aarch64/aarch64-sve-builtins-base.cc
>>>>>    (svmul_impl::fold): Wrap code for folding to svneg in lambda
>>>>>    function and pass to gimple_folder::convert_and_fold to enable
>>>>>    the transform for unsigned types.
>>>>>    (svdiv_impl::fold): Likewise.
>>>>>    * config/aarch64/aarch64-sve-builtins.cc
>>>>>    (gimple_folder::convert_and_fold): New function that converts
>>>>>    operands to target type before calling callback function, adding the
>>>>>    necessary conversion statements.
>>>>>    * config/aarch64/aarch64-sve-builtins.h
>>>>>    (gimple_folder::convert_and_fold): Declare function.
>>>>>    (signed_type_suffix_index): Return type_suffix_index of signed
>>>>>    vector type for given width.
>>>>>    (function_instance::signed_type): Return signed vector type for
>>>>>    given width.
>>>>> 
>>>>> gcc/testsuite/ChangeLog:
>>>>> 
>>>>>    * gcc.target/aarch64/sve/acle/asm/div_u32.c: Adjust expected
>>>>>    outcome.
>>>>>    * gcc.target/aarch64/sve/acle/asm/div_u64.c: Likewise.
>>>>>    * gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
>>>>>    * gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
>>>>>    * gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
>>>>>    * gcc.target/aarch64/sve/acle/asm/mul_u64.c: New test and adjust
>>>>>    expected outcome.
>>>>> ---
>>>>> .../aarch64/aarch64-sve-builtins-base.cc      | 99 ++++++++++++-------
>>>>> gcc/config/aarch64/aarch64-sve-builtins.cc    | 40 ++++++++
>>>>> gcc/config/aarch64/aarch64-sve-builtins.h     | 30 ++++++
>>>>> .../gcc.target/aarch64/sve/acle/asm/div_u32.c |  9 ++
>>>>> .../gcc.target/aarch64/sve/acle/asm/div_u64.c |  9 ++
>>>>> .../gcc.target/aarch64/sve/acle/asm/mul_u16.c |  5 +-
>>>>> .../gcc.target/aarch64/sve/acle/asm/mul_u32.c |  5 +-
>>>>> .../gcc.target/aarch64/sve/acle/asm/mul_u64.c | 26 ++++-
>>>>> .../gcc.target/aarch64/sve/acle/asm/mul_u8.c  |  7 +-
>>>>> 9 files changed, 180 insertions(+), 50 deletions(-)
>>>>> 
>>>>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>>>> index 1c9f515a52c..6df14a8f4c4 100644
>>>>> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>>>> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>>>> [...]
>>>>> @@ -2082,33 +2091,49 @@ public:
>>>>>     return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
>>>>> 
>>>>>   /* If one of the operands is all integer -1, fold to svneg.  */
>>>>> -    tree pg = gimple_call_arg (f.call, 0);
>>>>> -    tree negated_op = NULL;
>>>>> -    if (integer_minus_onep (op2))
>>>>> -      negated_op = op1;
>>>>> -    else if (integer_minus_onep (op1))
>>>>> -      negated_op = op2;
>>>>> -    if (!f.type_suffix (0).unsigned_p && negated_op)
>>>>> +     if (integer_minus_onep (op1) || integer_minus_onep (op2))
>>>> 
>>>> Formatting nit, sorry, but: indentation looks off.
>>>> 
>>>>>     {
>>>>> -     function_instance instance ("svneg", functions::svneg,
>>>>> -                                 shapes::unary, MODE_none,
>>>>> -                                 f.type_suffix_ids, GROUP_none, f.pred);
>>>>> -     gcall *call = f.redirect_call (instance);
>>>>> -     unsigned offset_index = 0;
>>>>> -     if (f.pred == PRED_m)
>>>>> +     auto mul_by_m1 = [](gimple_folder &f) -> gcall *
>>>>>      {
>>>>> -         offset_index = 1;
>>>>> -         gimple_call_set_arg (call, 0, op1);
>>>>> -       }
>>>>> -     else
>>>>> -       gimple_set_num_ops (call, 5);
>>>>> -     gimple_call_set_arg (call, offset_index, pg);
>>>>> -     gimple_call_set_arg (call, offset_index + 1, negated_op);
>>>>> -     return call;
>>>> 
>>>> Rather than having convert_and_fold modify the call in-place (which
>>>> would create an incorrectly typed call if we leave the function itself
>>>> unchanged), could we make it pass back a "tree" that contains the
>>>> (possibly converted) lhs and a "vec<tree> &" that contains the
>>>> (possibly converted) arguments?
>>> Dear Richard,
>>> I agree that it's preferable to not have convert_and_fold modify the call in-place and wanted to ask for clarification about which sort of design you had in mind:
>>> Option 1:
>>> tree gimple_folder::convert_and_fold (tree type, vec<tree> &args_conv), such that the function takes a target type TYPE to which all non-boolean vector operands are converted and a reference for a vec<tree> that it fills with converted argument trees. It returns a tree with the (possibly converted) lhs. It also adds the convert_view statements, but makes no changes to the call itself.
>>> The caller of the function uses the converted arguments and lhs to assemble the new gcall.
>>> 
>>> Option 2:
>>> gcall *gimple_folder::convert_and_fold (tree type, gcall *(*fp) (tree, vec<tree>, gimple_folder &), where the function converts the lhs and arguments to TYPE and assigns them to the newly created tree lhs_conv and vec<tree> args_conv that are passed to the function pointer FP. The callback assembles the new call and returns it to convert_and_fold, which adds the necessary conversion statements before returning the new call to the caller. So, in this case it would be the callback modifying the call.
>> 
>> Yeah, I meant option 2, but with vec<tree> & rather than plain vec<tree>.
>> I suppose the callback should return a gimple * rather than a gcall *
>> though, in case the callback wants to create a gassign instead.
>> 
>> (Thanks for checking btw)
> Thanks for clarifying. Below you find the updated patch. I made the following changes:
> - removed folding of svdiv and reverted the test cases
> - pass lhs_conv and args_conv to callback instead of having convert_and_fold change the call inplace
> - re-validated, no regression.
> Thanks,
> Jennifer
> 
> 
> As follow-up to
> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html,
> this patch implements folding of svmul by -1 to svneg for
> unsigned SVE vector types. The key idea is to reuse the existing code that
> does this fold for signed types and feed it as callback to a helper function
> that adds the necessary type conversions.
> 
> For example, for the test case
> svuint64_t foo (svuint64_t x, svbool_t pg)
> {
>  return svmul_n_u64_x (pg, x, -1);
> }
> 
> the following gimple sequence is emitted (-O2 -mcpu=grace):
> svuint64_t foo (svuint64_t x, svbool_t pg)
> {
>  svint64_t D.12921;
>  svint64_t D.12920;
>  svuint64_t D.12919;
> 
>  D.12920 = VIEW_CONVERT_EXPR<svint64_t>(x);
>  D.12921 = svneg_s64_x (pg, D.12920);
>  D.12919 = VIEW_CONVERT_EXPR<svuint64_t>(D.12921);
>  goto <D.12922>;
>  <D.12922>:
>  return D.12919;
> }
> 
> In general, the new helper gimple_folder::convert_and_fold
> - takes a target type and a function pointer,
> - converts the lhs and all non-boolean vector types to the target type,
> - passes the converted lhs and arguments to the callback,
> - receives the new gimple statement from the callback function,
> - adds the necessary view converts to the gimple sequence,
> - and returns the new call.
> 
> Because all arguments are converted to the same target types, the helper
> function is only suitable for folding calls whose arguments are all of
> the same type. If necessary, this could be extended to convert the
> arguments to different types differentially.
> 
> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
> OK for mainline?
> 
> Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
> 
> gcc/ChangeLog:
> 
> * config/aarch64/aarch64-sve-builtins-base.cc
> (svmul_impl::fold): Wrap code for folding to svneg in lambda
> function and pass to gimple_folder::convert_and_fold to enable
> the transform for unsigned types.
> * config/aarch64/aarch64-sve-builtins.cc
> (gimple_folder::convert_and_fold): New function that converts
> operands to target type before calling callback function, adding the
> necessary conversion statements.
> * config/aarch64/aarch64-sve-builtins.h
> (gimple_folder::convert_and_fold): Declare function.
> (signed_type_suffix_index): Return type_suffix_index of signed
> vector type for given width.
> (function_instance::signed_type): Return signed vector type for
> given width.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/aarch64/sve/acle/asm/mul_u8.c: Adjust expected outcome.
> * gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
> * gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
> * gcc.target/aarch64/sve/acle/asm/mul_u64.c: New test and adjust
> expected outcome.
> ---
> .../aarch64/aarch64-sve-builtins-base.cc      | 70 +++++++++++++------
> gcc/config/aarch64/aarch64-sve-builtins.cc    | 43 ++++++++++++
> gcc/config/aarch64/aarch64-sve-builtins.h     | 31 ++++++++
> .../gcc.target/aarch64/sve/acle/asm/mul_u16.c |  5 +-
> .../gcc.target/aarch64/sve/acle/asm/mul_u32.c |  5 +-
> .../gcc.target/aarch64/sve/acle/asm/mul_u64.c | 26 ++++++-
> .../gcc.target/aarch64/sve/acle/asm/mul_u8.c  |  7 +-
> 7 files changed, 153 insertions(+), 34 deletions(-)
> 
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 87e9909b55a..52401a8c57a 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -2092,33 +2092,61 @@ public:
>       return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
> 
>     /* If one of the operands is all integer -1, fold to svneg.  */
> -    tree pg = gimple_call_arg (f.call, 0);
> -    tree negated_op = NULL;
> -    if (integer_minus_onep (op2))
> -      negated_op = op1;
> -    else if (integer_minus_onep (op1))
> -      negated_op = op2;
> -    if (!f.type_suffix (0).unsigned_p && negated_op)
> +    if (integer_minus_onep (op1) || integer_minus_onep (op2))
>       {
> - function_instance instance ("svneg", functions::svneg,
> -     shapes::unary, MODE_none,
> -     f.type_suffix_ids, GROUP_none, f.pred);
> - gcall *call = f.redirect_call (instance);
> - unsigned offset_index = 0;
> - if (f.pred == PRED_m)
> + auto mul_by_m1 = [](gimple_folder &f, tree lhs_conv,
> +     vec<tree> &args_conv) -> gimple *
>   {
> -     offset_index = 1;
> -     gimple_call_set_arg (call, 0, op1);
> -   }
> - else
> -   gimple_set_num_ops (call, 5);
> - gimple_call_set_arg (call, offset_index, pg);
> - gimple_call_set_arg (call, offset_index + 1, negated_op);
> - return call;
> +     gcc_assert (lhs_conv && args_conv.length () == 3);
> +     tree pg = args_conv[0];
> +     tree op1 = args_conv[1];
> +     tree op2 = args_conv[2];
> +     tree negated_op = op1;
> +     bool negate_op1 = true;
> +     if (integer_minus_onep (op1))
> +       {
> + negated_op = op2;
> + negate_op1 = false;
> +       }
> +     type_suffix_pair signed_tsp =
> +       {signed_type_suffix_index (f.type_suffix (0).element_bits),
> + f.type_suffix_ids[1]};
> +     function_instance instance ("svneg", functions::svneg,
> + shapes::unary, MODE_none,
> + signed_tsp, GROUP_none, f.pred);
> +     gcall *call = f.redirect_call (instance);
> +     gimple_call_set_lhs (call, lhs_conv);
> +     unsigned offset = 0;
> +     tree fntype, op1_type = TREE_TYPE (op1);
> +     if (f.pred == PRED_m)
> +       {
> + offset = 1;
> + tree arg_types[3] = {op1_type, TREE_TYPE (pg), op1_type};
> + fntype = build_function_type_array (TREE_TYPE (lhs_conv),
> +     3, arg_types);
> + tree ty = f.signed_type (f.type_suffix (0).element_bits);
> + tree inactive = negate_op1 ? op1 : build_minus_one_cst (ty);
> + gimple_call_set_arg (call, 0, inactive);
> +       }
> +     else
> +       {
> + gimple_set_num_ops (call, 5);
> + tree arg_types[2] = {TREE_TYPE (pg), op1_type};
> + fntype = build_function_type_array (TREE_TYPE (lhs_conv),
> +     2, arg_types);
> +       }
> +     gimple_call_set_fntype (call, fntype);
> +     gimple_call_set_arg (call, offset, pg);
> +     gimple_call_set_arg (call, offset + 1, negated_op);
> +     return call;
> +   };
> + tree ty = f.signed_type (f.type_suffix (0).element_bits);
> + return f.convert_and_fold (ty, mul_by_m1);
>       }
> 
>     /* If one of the operands is a uniform power of 2, fold to a left shift
>        by immediate.  */
> +    tree pg = gimple_call_arg (f.call, 0);
>     tree op1_cst = uniform_integer_cst_p (op1);
>     tree op2_cst = uniform_integer_cst_p (op2);
>     tree shift_op1, shift_op2 = NULL;
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
> index 0fec1cd439e..01b0da22c9b 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
> @@ -3646,6 +3646,49 @@ gimple_folder::redirect_pred_x ()
>   return redirect_call (instance);
> }
> 
> +/* Convert the lhs and all non-boolean vector-type operands to TYPE.
> +   Pass the converted variables to the callback FP, and finally convert the
> +   result back to the original type. Add the necessary conversion statements.
> +   Return the new call.  */
> +gimple *
> +gimple_folder::convert_and_fold (tree type,
> +  gimple *(*fp) (gimple_folder &,
> + tree, vec<tree> &))
> +{
> +  gcc_assert (VECTOR_TYPE_P (type)
> +       && TYPE_MODE (type) != VNx16BImode);
> +  tree old_ty = TREE_TYPE (lhs);
> +  gimple_seq stmts = NULL;
> +  tree lhs_conv, op, op_ty, t;
> +  gimple *g, *new_stmt;
> +  bool convert_lhs_p = !useless_type_conversion_p (type, old_ty);
> +  lhs_conv = convert_lhs_p ? create_tmp_var (type) : lhs;
> +  unsigned int num_args = gimple_call_num_args (call);
> +  vec<tree> args_conv = vNULL;
> +  args_conv.safe_grow (num_args);
> +  for (unsigned int i = 0; i < num_args; ++i)
> +    {
> +      op = gimple_call_arg (call, i);
> +      op_ty = TREE_TYPE (op);
> +      args_conv[i] =
> + (VECTOR_TYPE_P (op_ty)
> +  && TREE_CODE (op) != VECTOR_CST
> +  && TYPE_MODE (op_ty) != VNx16BImode
> +  && !useless_type_conversion_p (op_ty, type))
> + ? gimple_build (&stmts, VIEW_CONVERT_EXPR, type, op) : op;
> +    }
> +
> +  new_stmt = fp (*this, lhs_conv, args_conv);
> +  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> +  if (convert_lhs_p)
> +    {
> +      t = build1 (VIEW_CONVERT_EXPR, old_ty, lhs_conv);
> +      g = gimple_build_assign (lhs, VIEW_CONVERT_EXPR, t);
> +      gsi_insert_after (gsi, g, GSI_SAME_STMT);
> +    }
> +  return new_stmt;
> +}
> +
> /* Fold the call to constant VAL.  */
> gimple *
> gimple_folder::fold_to_cstu (poly_uint64 val)
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
> index 4094f8207f9..3e919e52e2c 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.h
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.h
> @@ -406,6 +406,7 @@ public:
>   tree scalar_type (unsigned int) const;
>   tree vector_type (unsigned int) const;
>   tree tuple_type (unsigned int) const;
> +  tree signed_type (unsigned int) const;
>   unsigned int elements_per_vq (unsigned int) const;
>   machine_mode vector_mode (unsigned int) const;
>   machine_mode tuple_mode (unsigned int) const;
> @@ -632,6 +633,8 @@ public:
> 
>   gcall *redirect_call (const function_instance &);
>   gimple *redirect_pred_x ();
> +  gimple *convert_and_fold (tree, gimple *(*) (gimple_folder &,
> +        tree, vec<tree> &));
> 
>   gimple *fold_to_cstu (poly_uint64);
>   gimple *fold_to_pfalse ();
> @@ -864,6 +867,20 @@ find_type_suffix (type_class_index tclass, unsigned int element_bits)
>   gcc_unreachable ();
> }
> 
> +/* Return the type suffix of the signed type of width ELEMENT_BITS.  */
> +inline type_suffix_index
> +signed_type_suffix_index (unsigned int element_bits)
> +{
> +  switch (element_bits)
> +  {
> +  case 8: return TYPE_SUFFIX_s8;
> +  case 16: return TYPE_SUFFIX_s16;
> +  case 32: return TYPE_SUFFIX_s32;
> +  case 64: return TYPE_SUFFIX_s64;
> +  }
> +  gcc_unreachable ();
> +}
> +
> /* Return the single field in tuple type TYPE.  */
> inline tree
> tuple_type_field (tree type)
> @@ -1029,6 +1046,20 @@ function_instance::tuple_type (unsigned int i) const
>   return acle_vector_types[num_vectors - 1][type_suffix (i).vector_type];
> }
> 
> +/* Return the signed vector type of width ELEMENT_BITS.  */
> +inline tree
> +function_instance::signed_type (unsigned int element_bits) const
> +{
> +  switch (element_bits)
> +  {
> +  case 8: return acle_vector_types[0][VECTOR_TYPE_svint8_t];
> +  case 16: return acle_vector_types[0][VECTOR_TYPE_svint16_t];
> +  case 32: return acle_vector_types[0][VECTOR_TYPE_svint32_t];
> +  case 64: return acle_vector_types[0][VECTOR_TYPE_svint64_t];
> +  }
> +  gcc_unreachable ();
> +}
> +
> /* Return the number of elements of type suffix I that fit within a
>    128-bit block.  */
> inline unsigned int
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
> index bdf6fcb98d6..e228dc5995d 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
> @@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u16_m_untied, svuint16_t,
> 
> /*
> ** mul_m1_u16_m:
> -** mov (z[0-9]+)\.b, #-1
> -** mul z0\.h, p0/m, z0\.h, \1\.h
> +** neg z0\.h, p0/m, z0\.h
> ** ret
> */
> TEST_UNIFORM_Z (mul_m1_u16_m, svuint16_t,
> @@ -569,7 +568,7 @@ TEST_UNIFORM_Z (mul_255_u16_x, svuint16_t,
> 
> /*
> ** mul_m1_u16_x:
> -** mul z0\.h, z0\.h, #-1
> +** neg z0\.h, p0/m, z0\.h
> ** ret
> */
> TEST_UNIFORM_Z (mul_m1_u16_x, svuint16_t,
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
> index a61e85fa12d..e8f52c9d785 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
> @@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u32_m_untied, svuint32_t,
> 
> /*
> ** mul_m1_u32_m:
> -** mov (z[0-9]+)\.b, #-1
> -** mul z0\.s, p0/m, z0\.s, \1\.s
> +** neg z0\.s, p0/m, z0\.s
> ** ret
> */
> TEST_UNIFORM_Z (mul_m1_u32_m, svuint32_t,
> @@ -569,7 +568,7 @@ TEST_UNIFORM_Z (mul_255_u32_x, svuint32_t,
> 
> /*
> ** mul_m1_u32_x:
> -** mul z0\.s, z0\.s, #-1
> +** neg z0\.s, p0/m, z0\.s
> ** ret
> */
> TEST_UNIFORM_Z (mul_m1_u32_x, svuint32_t,
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
> index eee1f8a0c99..2ccdc3642c5 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
> @@ -183,14 +183,25 @@ TEST_UNIFORM_Z (mul_3_u64_m_untied, svuint64_t,
> 
> /*
> ** mul_m1_u64_m:
> -** mov (z[0-9]+)\.b, #-1
> -** mul z0\.d, p0/m, z0\.d, \1\.d
> +** neg z0\.d, p0/m, z0\.d
> ** ret
> */
> TEST_UNIFORM_Z (mul_m1_u64_m, svuint64_t,
> z0 = svmul_n_u64_m (p0, z0, -1),
> z0 = svmul_m (p0, z0, -1))
> 
> +/*
> +** mul_m1r_u64_m:
> +** mov (z[0-9]+)\.b, #-1
> +** mov (z[0-9]+\.d), z0\.d
> +** movprfx z0, \1
> +** neg z0\.d, p0/m, \2
> +** ret
> +*/
> +TEST_UNIFORM_Z (mul_m1r_u64_m, svuint64_t,
> + z0 = svmul_u64_m (p0, svdup_u64 (-1), z0),
> + z0 = svmul_m (p0, svdup_u64 (-1), z0))
> +
> /*
> ** mul_u64_z_tied1:
> ** movprfx z0\.d, p0/z, z0\.d
> @@ -597,13 +608,22 @@ TEST_UNIFORM_Z (mul_255_u64_x, svuint64_t,
> 
> /*
> ** mul_m1_u64_x:
> -** mul z0\.d, z0\.d, #-1
> +** neg z0\.d, p0/m, z0\.d
> ** ret
> */
> TEST_UNIFORM_Z (mul_m1_u64_x, svuint64_t,
> z0 = svmul_n_u64_x (p0, z0, -1),
> z0 = svmul_x (p0, z0, -1))
> 
> +/*
> +** mul_m1r_u64_x:
> +** neg z0\.d, p0/m, z0\.d
> +** ret
> +*/
> +TEST_UNIFORM_Z (mul_m1r_u64_x, svuint64_t,
> + z0 = svmul_u64_x (p0, svdup_u64 (-1), z0),
> + z0 = svmul_x (p0, svdup_u64 (-1), z0))
> +
> /*
> ** mul_m127_u64_x:
> ** mul z0\.d, z0\.d, #-127
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
> index 06ee1b3e7c8..8e53a4821f0 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
> @@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u8_m_untied, svuint8_t,
> 
> /*
> ** mul_m1_u8_m:
> -** mov (z[0-9]+)\.b, #-1
> -** mul z0\.b, p0/m, z0\.b, \1\.b
> +** neg z0\.b, p0/m, z0\.b
> ** ret
> */
> TEST_UNIFORM_Z (mul_m1_u8_m, svuint8_t,
> @@ -559,7 +558,7 @@ TEST_UNIFORM_Z (mul_128_u8_x, svuint8_t,
> 
> /*
> ** mul_255_u8_x:
> -** mul z0\.b, z0\.b, #-1
> +** neg z0\.b, p0/m, z0\.b
> ** ret
> */
> TEST_UNIFORM_Z (mul_255_u8_x, svuint8_t,
> @@ -568,7 +567,7 @@ TEST_UNIFORM_Z (mul_255_u8_x, svuint8_t,
> 
> /*
> ** mul_m1_u8_x:
> -** mul z0\.b, z0\.b, #-1
> +** neg z0\.b, p0/m, z0\.b
> ** ret
> */
> TEST_UNIFORM_Z (mul_m1_u8_x, svuint8_t,
> -- 
> 2.44.0
> 
>> 
>> Richard
Jennifer Schmitz Dec. 10, 2024, 7:36 a.m. UTC | #6
Ping.
Thanks,
Jennifer

> On 3 Dec 2024, at 09:14, Jennifer Schmitz <jschmitz@nvidia.com> wrote:
> 
> Ping.
> Thanks,
> Jennifer
> 
>> On 26 Nov 2024, at 09:18, Jennifer Schmitz <jschmitz@nvidia.com> wrote:
>> 
>> 
>> 
>>> On 20 Nov 2024, at 13:43, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>> 
>>> 
>>> 
>>> Jennifer Schmitz <jschmitz@nvidia.com> writes:
>>>>> On 13 Nov 2024, at 12:54, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>> Jennifer Schmitz <jschmitz@nvidia.com> writes:
>>>>>> As follow-up to
>>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html,
>>>>>> this patch implements folding of svmul and svdiv by -1 to svneg for
>>>>>> unsigned SVE vector types. The key idea is to reuse the existing code that
>>>>>> does this fold for signed types and feed it as callback to a helper function
>>>>>> that adds the necessary type conversions.
>>>>> 
>>>>> I only meant that we should do this for multiplication (since the sign
>>>>> isn't relevant for N-bit x N-bit -> N-bit multiplication).  It wouldn't
>>>>> be right for unsigned division, since unsigned division by the maximum
>>>>> value is instead equivalent to x == MAX ? MAX : 0.
>>>>> 
>>>>> Some comments on the multiplication bit below:
>>>>> 
>>>>>> 
>>>>>> For example, for the test case
>>>>>> svuint64_t foo (svuint64_t x, svbool_t pg)
>>>>>> {
>>>>>> return svmul_n_u64_x (pg, x, -1);
>>>>>> }
>>>>>> 
>>>>>> the following gimple sequence is emitted (-O2 -mcpu=grace):
>>>>>> svuint64_t foo (svuint64_t x, svbool_t pg)
>>>>>> {
>>>>>> svuint64_t D.12921;
>>>>>> svint64_t D.12920;
>>>>>> svuint64_t D.12919;
>>>>>> 
>>>>>> D.12920 = VIEW_CONVERT_EXPR<svint64_t>(x);
>>>>>> D.12921 = svneg_s64_x (pg, D.12920);
>>>>>> D.12919 = VIEW_CONVERT_EXPR<svuint64_t>(D.12921);
>>>>>> goto <D.12922>;
>>>>>> <D.12922>:
>>>>>> return D.12919;
>>>>>> }
>>>>>> 
>>>>>> In general, the new helper gimple_folder::convert_and_fold
>>>>>> - takes a target type and a function pointer,
>>>>>> - converts all non-boolean vector types to the target type,
>>>>>> - replaces the converted arguments in the function call,
>>>>>> - calls the callback function,
>>>>>> - adds the necessary view converts to the gimple sequence,
>>>>>> - and returns the new call.
>>>>>> 
>>>>>> Because all arguments are converted to the same target types, the helper
>>>>>> function is only suitable for folding calls whose arguments are all of
>>>>>> the same type. If necessary, this could be extended to convert the
>>>>>> arguments to different types differentially.
>>>>>> 
>>>>>> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
>>>>>> OK for mainline?
>>>>>> 
>>>>>> Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
>>>>>> 
>>>>>> gcc/ChangeLog:
>>>>>> 
>>>>>>   * config/aarch64/aarch64-sve-builtins-base.cc
>>>>>>   (svmul_impl::fold): Wrap code for folding to svneg in lambda
>>>>>>   function and pass to gimple_folder::convert_and_fold to enable
>>>>>>   the transform for unsigned types.
>>>>>>   (svdiv_impl::fold): Likewise.
>>>>>>   * config/aarch64/aarch64-sve-builtins.cc
>>>>>>   (gimple_folder::convert_and_fold): New function that converts
>>>>>>   operands to target type before calling callback function, adding the
>>>>>>   necessary conversion statements.
>>>>>>   * config/aarch64/aarch64-sve-builtins.h
>>>>>>   (gimple_folder::convert_and_fold): Declare function.
>>>>>>   (signed_type_suffix_index): Return type_suffix_index of signed
>>>>>>   vector type for given width.
>>>>>>   (function_instance::signed_type): Return signed vector type for
>>>>>>   given width.
>>>>>> 
>>>>>> gcc/testsuite/ChangeLog:
>>>>>> 
>>>>>>   * gcc.target/aarch64/sve/acle/asm/div_u32.c: Adjust expected
>>>>>>   outcome.
>>>>>>   * gcc.target/aarch64/sve/acle/asm/div_u64.c: Likewise.
>>>>>>   * gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
>>>>>>   * gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
>>>>>>   * gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
>>>>>>   * gcc.target/aarch64/sve/acle/asm/mul_u64.c: New test and adjust
>>>>>>   expected outcome.
>>>>>> ---
>>>>>> .../aarch64/aarch64-sve-builtins-base.cc      | 99 ++++++++++++-------
>>>>>> gcc/config/aarch64/aarch64-sve-builtins.cc    | 40 ++++++++
>>>>>> gcc/config/aarch64/aarch64-sve-builtins.h     | 30 ++++++
>>>>>> .../gcc.target/aarch64/sve/acle/asm/div_u32.c |  9 ++
>>>>>> .../gcc.target/aarch64/sve/acle/asm/div_u64.c |  9 ++
>>>>>> .../gcc.target/aarch64/sve/acle/asm/mul_u16.c |  5 +-
>>>>>> .../gcc.target/aarch64/sve/acle/asm/mul_u32.c |  5 +-
>>>>>> .../gcc.target/aarch64/sve/acle/asm/mul_u64.c | 26 ++++-
>>>>>> .../gcc.target/aarch64/sve/acle/asm/mul_u8.c  |  7 +-
>>>>>> 9 files changed, 180 insertions(+), 50 deletions(-)
>>>>>> 
>>>>>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>>>>> index 1c9f515a52c..6df14a8f4c4 100644
>>>>>> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>>>>> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>>>>>> [...]
>>>>>> @@ -2082,33 +2091,49 @@ public:
>>>>>>    return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
>>>>>> 
>>>>>>  /* If one of the operands is all integer -1, fold to svneg.  */
>>>>>> -    tree pg = gimple_call_arg (f.call, 0);
>>>>>> -    tree negated_op = NULL;
>>>>>> -    if (integer_minus_onep (op2))
>>>>>> -      negated_op = op1;
>>>>>> -    else if (integer_minus_onep (op1))
>>>>>> -      negated_op = op2;
>>>>>> -    if (!f.type_suffix (0).unsigned_p && negated_op)
>>>>>> +     if (integer_minus_onep (op1) || integer_minus_onep (op2))
>>>>> 
>>>>> Formatting nit, sorry, but: indentation looks off.
>>>>> 
>>>>>>    {
>>>>>> -     function_instance instance ("svneg", functions::svneg,
>>>>>> -                                 shapes::unary, MODE_none,
>>>>>> -                                 f.type_suffix_ids, GROUP_none, f.pred);
>>>>>> -     gcall *call = f.redirect_call (instance);
>>>>>> -     unsigned offset_index = 0;
>>>>>> -     if (f.pred == PRED_m)
>>>>>> +     auto mul_by_m1 = [](gimple_folder &f) -> gcall *
>>>>>>     {
>>>>>> -         offset_index = 1;
>>>>>> -         gimple_call_set_arg (call, 0, op1);
>>>>>> -       }
>>>>>> -     else
>>>>>> -       gimple_set_num_ops (call, 5);
>>>>>> -     gimple_call_set_arg (call, offset_index, pg);
>>>>>> -     gimple_call_set_arg (call, offset_index + 1, negated_op);
>>>>>> -     return call;
>>>>> 
>>>>> Rather than having convert_and_fold modify the call in-place (which
>>>>> would create an incorrectly typed call if we leave the function itself
>>>>> unchanged), could we make it pass back a "tree" that contains the
>>>>> (possibly converted) lhs and a "vec<tree> &" that contains the
>>>>> (possibly converted) arguments?
>>>> Dear Richard,
>>>> I agree that it's preferable to not have convert_and_fold modify the call in-place and wanted to ask for clarification about which sort of design you had in mind:
>>>> Option 1:
>>>> tree gimple_folder::convert_and_fold (tree type, vec<tree> &args_conv), such that the function takes a target type TYPE to which all non-boolean vector operands are converted and a reference for a vec<tree> that it fills with converted argument trees. It returns a tree with the (possibly converted) lhs. It also adds the convert_view statements, but makes no changes to the call itself.
>>>> The caller of the function uses the converted arguments and lhs to assemble the new gcall.
>>>> 
>>>> Option 2:
>>>> gcall *gimple_folder::convert_and_fold (tree type, gcall *(*fp) (tree, vec<tree>, gimple_folder &), where the function converts the lhs and arguments to TYPE and assigns them to the newly created tree lhs_conv and vec<tree> args_conv that are passed to the function pointer FP. The callback assembles the new call and returns it to convert_and_fold, which adds the necessary conversion statements before returning the new call to the caller. So, in this case it would be the callback modifying the call.
>>> 
>>> Yeah, I meant option 2, but with vec<tree> & rather than plain vec<tree>.
>>> I suppose the callback should return a gimple * rather than a gcall *
>>> though, in case the callback wants to create a gassign instead.
>>> 
>>> (Thanks for checking btw)
>> Thanks for clarifying. Below you find the updated patch. I made the following changes:
>> - removed folding of svdiv and reverted the test cases
>> - pass lhs_conv and args_conv to callback instead of having convert_and_fold change the call inplace
>> - re-validated, no regression.
>> Thanks,
>> Jennifer
>> 
>> 
>> As follow-up to
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html,
>> this patch implements folding of svmul by -1 to svneg for
>> unsigned SVE vector types. The key idea is to reuse the existing code that
>> does this fold for signed types and feed it as callback to a helper function
>> that adds the necessary type conversions.
>> 
>> For example, for the test case
>> svuint64_t foo (svuint64_t x, svbool_t pg)
>> {
>> return svmul_n_u64_x (pg, x, -1);
>> }
>> 
>> the following gimple sequence is emitted (-O2 -mcpu=grace):
>> svuint64_t foo (svuint64_t x, svbool_t pg)
>> {
>> svint64_t D.12921;
>> svint64_t D.12920;
>> svuint64_t D.12919;
>> 
>> D.12920 = VIEW_CONVERT_EXPR<svint64_t>(x);
>> D.12921 = svneg_s64_x (pg, D.12920);
>> D.12919 = VIEW_CONVERT_EXPR<svuint64_t>(D.12921);
>> goto <D.12922>;
>> <D.12922>:
>> return D.12919;
>> }
>> 
>> In general, the new helper gimple_folder::convert_and_fold
>> - takes a target type and a function pointer,
>> - converts the lhs and all non-boolean vector types to the target type,
>> - passes the converted lhs and arguments to the callback,
>> - receives the new gimple statement from the callback function,
>> - adds the necessary view converts to the gimple sequence,
>> - and returns the new call.
>> 
>> Because all arguments are converted to the same target types, the helper
>> function is only suitable for folding calls whose arguments are all of
>> the same type. If necessary, this could be extended to convert the
>> arguments to different types differentially.
>> 
>> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
>> OK for mainline?
>> 
>> Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
>> 
>> gcc/ChangeLog:
>> 
>> * config/aarch64/aarch64-sve-builtins-base.cc
>> (svmul_impl::fold): Wrap code for folding to svneg in lambda
>> function and pass to gimple_folder::convert_and_fold to enable
>> the transform for unsigned types.
>> * config/aarch64/aarch64-sve-builtins.cc
>> (gimple_folder::convert_and_fold): New function that converts
>> operands to target type before calling callback function, adding the
>> necessary conversion statements.
>> * config/aarch64/aarch64-sve-builtins.h
>> (gimple_folder::convert_and_fold): Declare function.
>> (signed_type_suffix_index): Return type_suffix_index of signed
>> vector type for given width.
>> (function_instance::signed_type): Return signed vector type for
>> given width.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> * gcc.target/aarch64/sve/acle/asm/mul_u8.c: Adjust expected outcome.
>> * gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
>> * gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
>> * gcc.target/aarch64/sve/acle/asm/mul_u64.c: New test and adjust
>> expected outcome.
>> ---
>> .../aarch64/aarch64-sve-builtins-base.cc      | 70 +++++++++++++------
>> gcc/config/aarch64/aarch64-sve-builtins.cc    | 43 ++++++++++++
>> gcc/config/aarch64/aarch64-sve-builtins.h     | 31 ++++++++
>> .../gcc.target/aarch64/sve/acle/asm/mul_u16.c |  5 +-
>> .../gcc.target/aarch64/sve/acle/asm/mul_u32.c |  5 +-
>> .../gcc.target/aarch64/sve/acle/asm/mul_u64.c | 26 ++++++-
>> .../gcc.target/aarch64/sve/acle/asm/mul_u8.c  |  7 +-
>> 7 files changed, 153 insertions(+), 34 deletions(-)
>> 
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> index 87e9909b55a..52401a8c57a 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> @@ -2092,33 +2092,61 @@ public:
>>      return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
>> 
>>    /* If one of the operands is all integer -1, fold to svneg.  */
>> -    tree pg = gimple_call_arg (f.call, 0);
>> -    tree negated_op = NULL;
>> -    if (integer_minus_onep (op2))
>> -      negated_op = op1;
>> -    else if (integer_minus_onep (op1))
>> -      negated_op = op2;
>> -    if (!f.type_suffix (0).unsigned_p && negated_op)
>> +    if (integer_minus_onep (op1) || integer_minus_onep (op2))
>>      {
>> - function_instance instance ("svneg", functions::svneg,
>> -     shapes::unary, MODE_none,
>> -     f.type_suffix_ids, GROUP_none, f.pred);
>> - gcall *call = f.redirect_call (instance);
>> - unsigned offset_index = 0;
>> - if (f.pred == PRED_m)
>> + auto mul_by_m1 = [](gimple_folder &f, tree lhs_conv,
>> +     vec<tree> &args_conv) -> gimple *
>>  {
>> -     offset_index = 1;
>> -     gimple_call_set_arg (call, 0, op1);
>> -   }
>> - else
>> -   gimple_set_num_ops (call, 5);
>> - gimple_call_set_arg (call, offset_index, pg);
>> - gimple_call_set_arg (call, offset_index + 1, negated_op);
>> - return call;
>> +     gcc_assert (lhs_conv && args_conv.length () == 3);
>> +     tree pg = args_conv[0];
>> +     tree op1 = args_conv[1];
>> +     tree op2 = args_conv[2];
>> +     tree negated_op = op1;
>> +     bool negate_op1 = true;
>> +     if (integer_minus_onep (op1))
>> +       {
>> + negated_op = op2;
>> + negate_op1 = false;
>> +       }
>> +     type_suffix_pair signed_tsp =
>> +       {signed_type_suffix_index (f.type_suffix (0).element_bits),
>> + f.type_suffix_ids[1]};
>> +     function_instance instance ("svneg", functions::svneg,
>> + shapes::unary, MODE_none,
>> + signed_tsp, GROUP_none, f.pred);
>> +     gcall *call = f.redirect_call (instance);
>> +     gimple_call_set_lhs (call, lhs_conv);
>> +     unsigned offset = 0;
>> +     tree fntype, op1_type = TREE_TYPE (op1);
>> +     if (f.pred == PRED_m)
>> +       {
>> + offset = 1;
>> + tree arg_types[3] = {op1_type, TREE_TYPE (pg), op1_type};
>> + fntype = build_function_type_array (TREE_TYPE (lhs_conv),
>> +     3, arg_types);
>> + tree ty = f.signed_type (f.type_suffix (0).element_bits);
>> + tree inactive = negate_op1 ? op1 : build_minus_one_cst (ty);
>> + gimple_call_set_arg (call, 0, inactive);
>> +       }
>> +     else
>> +       {
>> + gimple_set_num_ops (call, 5);
>> + tree arg_types[2] = {TREE_TYPE (pg), op1_type};
>> + fntype = build_function_type_array (TREE_TYPE (lhs_conv),
>> +     2, arg_types);
>> +       }
>> +     gimple_call_set_fntype (call, fntype);
>> +     gimple_call_set_arg (call, offset, pg);
>> +     gimple_call_set_arg (call, offset + 1, negated_op);
>> +     return call;
>> +   };
>> + tree ty = f.signed_type (f.type_suffix (0).element_bits);
>> + return f.convert_and_fold (ty, mul_by_m1);
>>      }
>> 
>>    /* If one of the operands is a uniform power of 2, fold to a left shift
>>       by immediate.  */
>> +    tree pg = gimple_call_arg (f.call, 0);
>>    tree op1_cst = uniform_integer_cst_p (op1);
>>    tree op2_cst = uniform_integer_cst_p (op2);
>>    tree shift_op1, shift_op2 = NULL;
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
>> index 0fec1cd439e..01b0da22c9b 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
>> @@ -3646,6 +3646,49 @@ gimple_folder::redirect_pred_x ()
>>  return redirect_call (instance);
>> }
>> 
>> +/* Convert the lhs and all non-boolean vector-type operands to TYPE.
>> +   Pass the converted variables to the callback FP, and finally convert the
>> +   result back to the original type. Add the necessary conversion statements.
>> +   Return the new call.  */
>> +gimple *
>> +gimple_folder::convert_and_fold (tree type,
>> +  gimple *(*fp) (gimple_folder &,
>> + tree, vec<tree> &))
>> +{
>> +  gcc_assert (VECTOR_TYPE_P (type)
>> +       && TYPE_MODE (type) != VNx16BImode);
>> +  tree old_ty = TREE_TYPE (lhs);
>> +  gimple_seq stmts = NULL;
>> +  tree lhs_conv, op, op_ty, t;
>> +  gimple *g, *new_stmt;
>> +  bool convert_lhs_p = !useless_type_conversion_p (type, old_ty);
>> +  lhs_conv = convert_lhs_p ? create_tmp_var (type) : lhs;
>> +  unsigned int num_args = gimple_call_num_args (call);
>> +  vec<tree> args_conv = vNULL;
>> +  args_conv.safe_grow (num_args);
>> +  for (unsigned int i = 0; i < num_args; ++i)
>> +    {
>> +      op = gimple_call_arg (call, i);
>> +      op_ty = TREE_TYPE (op);
>> +      args_conv[i] =
>> + (VECTOR_TYPE_P (op_ty)
>> +  && TREE_CODE (op) != VECTOR_CST
>> +  && TYPE_MODE (op_ty) != VNx16BImode
>> +  && !useless_type_conversion_p (op_ty, type))
>> + ? gimple_build (&stmts, VIEW_CONVERT_EXPR, type, op) : op;
>> +    }
>> +
>> +  new_stmt = fp (*this, lhs_conv, args_conv);
>> +  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>> +  if (convert_lhs_p)
>> +    {
>> +      t = build1 (VIEW_CONVERT_EXPR, old_ty, lhs_conv);
>> +      g = gimple_build_assign (lhs, VIEW_CONVERT_EXPR, t);
>> +      gsi_insert_after (gsi, g, GSI_SAME_STMT);
>> +    }
>> +  return new_stmt;
>> +}
>> +
>> /* Fold the call to constant VAL.  */
>> gimple *
>> gimple_folder::fold_to_cstu (poly_uint64 val)
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
>> index 4094f8207f9..3e919e52e2c 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins.h
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins.h
>> @@ -406,6 +406,7 @@ public:
>>  tree scalar_type (unsigned int) const;
>>  tree vector_type (unsigned int) const;
>>  tree tuple_type (unsigned int) const;
>> +  tree signed_type (unsigned int) const;
>>  unsigned int elements_per_vq (unsigned int) const;
>>  machine_mode vector_mode (unsigned int) const;
>>  machine_mode tuple_mode (unsigned int) const;
>> @@ -632,6 +633,8 @@ public:
>> 
>>  gcall *redirect_call (const function_instance &);
>>  gimple *redirect_pred_x ();
>> +  gimple *convert_and_fold (tree, gimple *(*) (gimple_folder &,
>> +        tree, vec<tree> &));
>> 
>>  gimple *fold_to_cstu (poly_uint64);
>>  gimple *fold_to_pfalse ();
>> @@ -864,6 +867,20 @@ find_type_suffix (type_class_index tclass, unsigned int element_bits)
>>  gcc_unreachable ();
>> }
>> 
>> +/* Return the type suffix of the signed type of width ELEMENT_BITS.  */
>> +inline type_suffix_index
>> +signed_type_suffix_index (unsigned int element_bits)
>> +{
>> +  switch (element_bits)
>> +  {
>> +  case 8: return TYPE_SUFFIX_s8;
>> +  case 16: return TYPE_SUFFIX_s16;
>> +  case 32: return TYPE_SUFFIX_s32;
>> +  case 64: return TYPE_SUFFIX_s64;
>> +  }
>> +  gcc_unreachable ();
>> +}
>> +
>> /* Return the single field in tuple type TYPE.  */
>> inline tree
>> tuple_type_field (tree type)
>> @@ -1029,6 +1046,20 @@ function_instance::tuple_type (unsigned int i) const
>>  return acle_vector_types[num_vectors - 1][type_suffix (i).vector_type];
>> }
>> 
>> +/* Return the signed vector type of width ELEMENT_BITS.  */
>> +inline tree
>> +function_instance::signed_type (unsigned int element_bits) const
>> +{
>> +  switch (element_bits)
>> +  {
>> +  case 8: return acle_vector_types[0][VECTOR_TYPE_svint8_t];
>> +  case 16: return acle_vector_types[0][VECTOR_TYPE_svint16_t];
>> +  case 32: return acle_vector_types[0][VECTOR_TYPE_svint32_t];
>> +  case 64: return acle_vector_types[0][VECTOR_TYPE_svint64_t];
>> +  }
>> +  gcc_unreachable ();
>> +}
>> +
>> /* Return the number of elements of type suffix I that fit within a
>>   128-bit block.  */
>> inline unsigned int
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
>> index bdf6fcb98d6..e228dc5995d 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
>> @@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u16_m_untied, svuint16_t,
>> 
>> /*
>> ** mul_m1_u16_m:
>> -** mov (z[0-9]+)\.b, #-1
>> -** mul z0\.h, p0/m, z0\.h, \1\.h
>> +** neg z0\.h, p0/m, z0\.h
>> ** ret
>> */
>> TEST_UNIFORM_Z (mul_m1_u16_m, svuint16_t,
>> @@ -569,7 +568,7 @@ TEST_UNIFORM_Z (mul_255_u16_x, svuint16_t,
>> 
>> /*
>> ** mul_m1_u16_x:
>> -** mul z0\.h, z0\.h, #-1
>> +** neg z0\.h, p0/m, z0\.h
>> ** ret
>> */
>> TEST_UNIFORM_Z (mul_m1_u16_x, svuint16_t,
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
>> index a61e85fa12d..e8f52c9d785 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
>> @@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u32_m_untied, svuint32_t,
>> 
>> /*
>> ** mul_m1_u32_m:
>> -** mov (z[0-9]+)\.b, #-1
>> -** mul z0\.s, p0/m, z0\.s, \1\.s
>> +** neg z0\.s, p0/m, z0\.s
>> ** ret
>> */
>> TEST_UNIFORM_Z (mul_m1_u32_m, svuint32_t,
>> @@ -569,7 +568,7 @@ TEST_UNIFORM_Z (mul_255_u32_x, svuint32_t,
>> 
>> /*
>> ** mul_m1_u32_x:
>> -** mul z0\.s, z0\.s, #-1
>> +** neg z0\.s, p0/m, z0\.s
>> ** ret
>> */
>> TEST_UNIFORM_Z (mul_m1_u32_x, svuint32_t,
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
>> index eee1f8a0c99..2ccdc3642c5 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
>> @@ -183,14 +183,25 @@ TEST_UNIFORM_Z (mul_3_u64_m_untied, svuint64_t,
>> 
>> /*
>> ** mul_m1_u64_m:
>> -** mov (z[0-9]+)\.b, #-1
>> -** mul z0\.d, p0/m, z0\.d, \1\.d
>> +** neg z0\.d, p0/m, z0\.d
>> ** ret
>> */
>> TEST_UNIFORM_Z (mul_m1_u64_m, svuint64_t,
>> z0 = svmul_n_u64_m (p0, z0, -1),
>> z0 = svmul_m (p0, z0, -1))
>> 
>> +/*
>> +** mul_m1r_u64_m:
>> +** mov (z[0-9]+)\.b, #-1
>> +** mov (z[0-9]+\.d), z0\.d
>> +** movprfx z0, \1
>> +** neg z0\.d, p0/m, \2
>> +** ret
>> +*/
>> +TEST_UNIFORM_Z (mul_m1r_u64_m, svuint64_t,
>> + z0 = svmul_u64_m (p0, svdup_u64 (-1), z0),
>> + z0 = svmul_m (p0, svdup_u64 (-1), z0))
>> +
>> /*
>> ** mul_u64_z_tied1:
>> ** movprfx z0\.d, p0/z, z0\.d
>> @@ -597,13 +608,22 @@ TEST_UNIFORM_Z (mul_255_u64_x, svuint64_t,
>> 
>> /*
>> ** mul_m1_u64_x:
>> -** mul z0\.d, z0\.d, #-1
>> +** neg z0\.d, p0/m, z0\.d
>> ** ret
>> */
>> TEST_UNIFORM_Z (mul_m1_u64_x, svuint64_t,
>> z0 = svmul_n_u64_x (p0, z0, -1),
>> z0 = svmul_x (p0, z0, -1))
>> 
>> +/*
>> +** mul_m1r_u64_x:
>> +** neg z0\.d, p0/m, z0\.d
>> +** ret
>> +*/
>> +TEST_UNIFORM_Z (mul_m1r_u64_x, svuint64_t,
>> + z0 = svmul_u64_x (p0, svdup_u64 (-1), z0),
>> + z0 = svmul_x (p0, svdup_u64 (-1), z0))
>> +
>> /*
>> ** mul_m127_u64_x:
>> ** mul z0\.d, z0\.d, #-127
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
>> index 06ee1b3e7c8..8e53a4821f0 100644
>> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
>> @@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u8_m_untied, svuint8_t,
>> 
>> /*
>> ** mul_m1_u8_m:
>> -** mov (z[0-9]+)\.b, #-1
>> -** mul z0\.b, p0/m, z0\.b, \1\.b
>> +** neg z0\.b, p0/m, z0\.b
>> ** ret
>> */
>> TEST_UNIFORM_Z (mul_m1_u8_m, svuint8_t,
>> @@ -559,7 +558,7 @@ TEST_UNIFORM_Z (mul_128_u8_x, svuint8_t,
>> 
>> /*
>> ** mul_255_u8_x:
>> -** mul z0\.b, z0\.b, #-1
>> +** neg z0\.b, p0/m, z0\.b
>> ** ret
>> */
>> TEST_UNIFORM_Z (mul_255_u8_x, svuint8_t,
>> @@ -568,7 +567,7 @@ TEST_UNIFORM_Z (mul_255_u8_x, svuint8_t,
>> 
>> /*
>> ** mul_m1_u8_x:
>> -** mul z0\.b, z0\.b, #-1
>> +** neg z0\.b, p0/m, z0\.b
>> ** ret
>> */
>> TEST_UNIFORM_Z (mul_m1_u8_x, svuint8_t,
>> -- 
>> 2.44.0
>> 
>>> 
>>> Richard
> 
>
Richard Sandiford Dec. 11, 2024, 12:01 p.m. UTC | #7
Jennifer Schmitz <jschmitz@nvidia.com> writes:
> As follow-up to
> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html,
> this patch implements folding of svmul by -1 to svneg for
> unsigned SVE vector types. The key idea is to reuse the existing code that
> does this fold for signed types and feed it as callback to a helper function
> that adds the necessary type conversions.
>
> For example, for the test case
> svuint64_t foo (svuint64_t x, svbool_t pg)
> {
>   return svmul_n_u64_x (pg, x, -1);
> }
>
> the following gimple sequence is emitted (-O2 -mcpu=grace):
> svuint64_t foo (svuint64_t x, svbool_t pg)
> {
>   svint64_t D.12921;
>   svint64_t D.12920;
>   svuint64_t D.12919;
>
>   D.12920 = VIEW_CONVERT_EXPR<svint64_t>(x);
>   D.12921 = svneg_s64_x (pg, D.12920);
>   D.12919 = VIEW_CONVERT_EXPR<svuint64_t>(D.12921);
>   goto <D.12922>;
>   <D.12922>:
>   return D.12919;
> }
>
> In general, the new helper gimple_folder::convert_and_fold
> - takes a target type and a function pointer,
> - converts the lhs and all non-boolean vector types to the target type,
> - passes the converted lhs and arguments to the callback,
> - receives the new gimple statement from the callback function,
> - adds the necessary view converts to the gimple sequence,
> - and returns the new call.
>
> Because all arguments are converted to the same target types, the helper
> function is only suitable for folding calls whose arguments are all of
> the same type. If necessary, this could be extended to convert the
> arguments to different types differentially.
>
> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
> OK for mainline?
>
> Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64-sve-builtins-base.cc
> 	(svmul_impl::fold): Wrap code for folding to svneg in lambda
> 	function and pass to gimple_folder::convert_and_fold to enable
> 	the transform for unsigned types.
> 	* config/aarch64/aarch64-sve-builtins.cc
> 	(gimple_folder::convert_and_fold): New function that converts
> 	operands to target type before calling callback function, adding the
> 	necessary conversion statements.
> 	* config/aarch64/aarch64-sve-builtins.h
> 	(gimple_folder::convert_and_fold): Declare function.
> 	(signed_type_suffix_index): Return type_suffix_index of signed
> 	vector type for given width.
> 	(function_instance::signed_type): Return signed vector type for
> 	given width.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/sve/acle/asm/mul_u8.c: Adjust expected outcome.
> 	* gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
> 	* gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
> 	* gcc.target/aarch64/sve/acle/asm/mul_u64.c: New test and adjust
> 	expected outcome.
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc      | 70 +++++++++++++------
>  gcc/config/aarch64/aarch64-sve-builtins.cc    | 43 ++++++++++++
>  gcc/config/aarch64/aarch64-sve-builtins.h     | 31 ++++++++
>  .../gcc.target/aarch64/sve/acle/asm/mul_u16.c |  5 +-
>  .../gcc.target/aarch64/sve/acle/asm/mul_u32.c |  5 +-
>  .../gcc.target/aarch64/sve/acle/asm/mul_u64.c | 26 ++++++-
>  .../gcc.target/aarch64/sve/acle/asm/mul_u8.c  |  7 +-
>  7 files changed, 153 insertions(+), 34 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 87e9909b55a..52401a8c57a 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -2092,33 +2092,61 @@ public:
>        return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
>  
>      /* If one of the operands is all integer -1, fold to svneg.  */
> -    tree pg = gimple_call_arg (f.call, 0);
> -    tree negated_op = NULL;
> -    if (integer_minus_onep (op2))
> -      negated_op = op1;
> -    else if (integer_minus_onep (op1))
> -      negated_op = op2;
> -    if (!f.type_suffix (0).unsigned_p && negated_op)
> +    if (integer_minus_onep (op1) || integer_minus_onep (op2))
>        {
> -	function_instance instance ("svneg", functions::svneg,
> -				    shapes::unary, MODE_none,
> -				    f.type_suffix_ids, GROUP_none, f.pred);
> -	gcall *call = f.redirect_call (instance);
> -	unsigned offset_index = 0;
> -	if (f.pred == PRED_m)
> +	auto mul_by_m1 = [](gimple_folder &f, tree lhs_conv,
> +			    vec<tree> &args_conv) -> gimple *
>  	  {
> -	    offset_index = 1;
> -	    gimple_call_set_arg (call, 0, op1);
> -	  }
> -	else
> -	  gimple_set_num_ops (call, 5);
> -	gimple_call_set_arg (call, offset_index, pg);
> -	gimple_call_set_arg (call, offset_index + 1, negated_op);
> -	return call;
> +	    gcc_assert (lhs_conv && args_conv.length () == 3);
> +	    tree pg = args_conv[0];
> +	    tree op1 = args_conv[1];
> +	    tree op2 = args_conv[2];
> +	    tree negated_op = op1;
> +	    bool negate_op1 = true;
> +	    if (integer_minus_onep (op1))
> +	      {
> +		negated_op = op2;
> +		negate_op1 = false;
> +	      }
> +	    type_suffix_pair signed_tsp =
> +	      {signed_type_suffix_index (f.type_suffix (0).element_bits),
> +		f.type_suffix_ids[1]};
> +	    function_instance instance ("svneg", functions::svneg,
> +					shapes::unary, MODE_none,
> +					signed_tsp, GROUP_none, f.pred);
> +	    gcall *call = f.redirect_call (instance);
> +	    gimple_call_set_lhs (call, lhs_conv);
> +	    unsigned offset = 0;
> +	    tree fntype, op1_type = TREE_TYPE (op1);
> +	    if (f.pred == PRED_m)
> +	      {
> +		offset = 1;
> +		tree arg_types[3] = {op1_type, TREE_TYPE (pg), op1_type};
> +		fntype = build_function_type_array (TREE_TYPE (lhs_conv),
> +						    3, arg_types);
> +		tree ty = f.signed_type (f.type_suffix (0).element_bits);
> +		tree inactive = negate_op1 ? op1 : build_minus_one_cst (ty);

I think my question from the previous review still stands:

  Isn't the old approach of using op1 as the inactive argument still
  correct?

> +		gimple_call_set_arg (call, 0, inactive);
> +	      }
> +	    else
> +	      {
> +		gimple_set_num_ops (call, 5);
> +		tree arg_types[2] = {TREE_TYPE (pg), op1_type};
> +		fntype = build_function_type_array (TREE_TYPE (lhs_conv),
> +						    2, arg_types);
> +	      }
> +	    gimple_call_set_fntype (call, fntype);

I think the fntype should be set by redirect_call, which can get the
type from TREE_TYPE (rfn->decl).  That's better because it includes any
relevant attributes.
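I.e. something like this in redirect_call itself, so that callers don't have
to rebuild the function type by hand:

  gimple_call_set_fndecl (call, rfn->decl);
  gimple_call_set_fntype (call, TREE_TYPE (rfn->decl));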

> +	    gimple_call_set_arg (call, offset, pg);
> +	    gimple_call_set_arg (call, offset + 1, negated_op);
> +	    return call;
> +	  };
> +	tree ty = f.signed_type (f.type_suffix (0).element_bits);
> +	return f.convert_and_fold (ty, mul_by_m1);
>        }
>  
>      /* If one of the operands is a uniform power of 2, fold to a left shift
>         by immediate.  */
> +    tree pg = gimple_call_arg (f.call, 0);
>      tree op1_cst = uniform_integer_cst_p (op1);
>      tree op2_cst = uniform_integer_cst_p (op2);
>      tree shift_op1, shift_op2 = NULL;
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
> index 0fec1cd439e..01b0da22c9b 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
> @@ -3646,6 +3646,49 @@ gimple_folder::redirect_pred_x ()
>    return redirect_call (instance);
>  }
>  
> +/* Convert the lhs and all non-boolean vector-type operands to TYPE.
> +   Pass the converted variables to the callback FP, and finally convert the
> +   result back to the original type. Add the necessary conversion statements.
> +   Return the new call.  */
> +gimple *
> +gimple_folder::convert_and_fold (tree type,
> +				 gimple *(*fp) (gimple_folder &,
> +						tree, vec<tree> &))
> +{
> +  gcc_assert (VECTOR_TYPE_P (type)
> +	      && TYPE_MODE (type) != VNx16BImode);
> +  tree old_ty = TREE_TYPE (lhs);
> +  gimple_seq stmts = NULL;
> +  tree lhs_conv, op, op_ty, t;
> +  gimple *g, *new_stmt;
> +  bool convert_lhs_p = !useless_type_conversion_p (type, old_ty);
> +  lhs_conv = convert_lhs_p ? create_tmp_var (type) : lhs;
> +  unsigned int num_args = gimple_call_num_args (call);
> +  vec<tree> args_conv = vNULL;

It'd be slightly more efficient to use something like:

  auto_vec<tree, 16> args_conv;

here.  This also removes the need to free the vector after use
(looks like the current version would leak).

> +  args_conv.safe_grow (num_args);
> +  for (unsigned int i = 0; i < num_args; ++i)
> +    {
> +      op = gimple_call_arg (call, i);
> +      op_ty = TREE_TYPE (op);
> +      args_conv[i] =
> +	(VECTOR_TYPE_P (op_ty)
> +	 && TREE_CODE (op) != VECTOR_CST
> +	 && TYPE_MODE (op_ty) != VNx16BImode
> +	 && !useless_type_conversion_p (op_ty, type))

I think we should convert VECTOR_CSTs too.  gimple_build should
fold it for us (but let me know if it doesn't).

Thanks, and sorry for the slow review.

Richard

> +	? gimple_build (&stmts, VIEW_CONVERT_EXPR, type, op) : op;
> +    }
> +
> +  new_stmt = fp (*this, lhs_conv, args_conv);
> +  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> +  if (convert_lhs_p)
> +    {
> +      t = build1 (VIEW_CONVERT_EXPR, old_ty, lhs_conv);
> +      g = gimple_build_assign (lhs, VIEW_CONVERT_EXPR, t);
> +      gsi_insert_after (gsi, g, GSI_SAME_STMT);
> +    }
> +  return new_stmt;
> +}
> +
>  /* Fold the call to constant VAL.  */
>  gimple *
>  gimple_folder::fold_to_cstu (poly_uint64 val)
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
> index 4094f8207f9..3e919e52e2c 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.h
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.h
> @@ -406,6 +406,7 @@ public:
>    tree scalar_type (unsigned int) const;
>    tree vector_type (unsigned int) const;
>    tree tuple_type (unsigned int) const;
> +  tree signed_type (unsigned int) const;
>    unsigned int elements_per_vq (unsigned int) const;
>    machine_mode vector_mode (unsigned int) const;
>    machine_mode tuple_mode (unsigned int) const;
> @@ -632,6 +633,8 @@ public:
>  
>    gcall *redirect_call (const function_instance &);
>    gimple *redirect_pred_x ();
> +  gimple *convert_and_fold (tree, gimple *(*) (gimple_folder &,
> +					       tree, vec<tree> &));
>  
>    gimple *fold_to_cstu (poly_uint64);
>    gimple *fold_to_pfalse ();
> @@ -864,6 +867,20 @@ find_type_suffix (type_class_index tclass, unsigned int element_bits)
>    gcc_unreachable ();
>  }
>  
> +/* Return the type suffix of the signed type of width ELEMENT_BITS.  */
> +inline type_suffix_index
> +signed_type_suffix_index (unsigned int element_bits)
> +{
> +  switch (element_bits)
> +  {
> +  case 8: return TYPE_SUFFIX_s8;
> +  case 16: return TYPE_SUFFIX_s16;
> +  case 32: return TYPE_SUFFIX_s32;
> +  case 64: return TYPE_SUFFIX_s64;
> +  }
> +  gcc_unreachable ();
> +}
> +
>  /* Return the single field in tuple type TYPE.  */
>  inline tree
>  tuple_type_field (tree type)
> @@ -1029,6 +1046,20 @@ function_instance::tuple_type (unsigned int i) const
>    return acle_vector_types[num_vectors - 1][type_suffix (i).vector_type];
>  }
>  
> +/* Return the signed vector type of width ELEMENT_BITS.  */
> +inline tree
> +function_instance::signed_type (unsigned int element_bits) const
> +{
> +  switch (element_bits)
> +  {
> +  case 8: return acle_vector_types[0][VECTOR_TYPE_svint8_t];
> +  case 16: return acle_vector_types[0][VECTOR_TYPE_svint16_t];
> +  case 32: return acle_vector_types[0][VECTOR_TYPE_svint32_t];
> +  case 64: return acle_vector_types[0][VECTOR_TYPE_svint64_t];
> +  }
> +  gcc_unreachable ();
> +}
> +
>  /* Return the number of elements of type suffix I that fit within a
>     128-bit block.  */
>  inline unsigned int
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
> index bdf6fcb98d6..e228dc5995d 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
> @@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u16_m_untied, svuint16_t,
>  
>  /*
>  ** mul_m1_u16_m:
> -**	mov	(z[0-9]+)\.b, #-1
> -**	mul	z0\.h, p0/m, z0\.h, \1\.h
> +**	neg	z0\.h, p0/m, z0\.h
>  **	ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u16_m, svuint16_t,
> @@ -569,7 +568,7 @@ TEST_UNIFORM_Z (mul_255_u16_x, svuint16_t,
>  
>  /*
>  ** mul_m1_u16_x:
> -**	mul	z0\.h, z0\.h, #-1
> +**	neg	z0\.h, p0/m, z0\.h
>  **	ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u16_x, svuint16_t,
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
> index a61e85fa12d..e8f52c9d785 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
> @@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u32_m_untied, svuint32_t,
>  
>  /*
>  ** mul_m1_u32_m:
> -**	mov	(z[0-9]+)\.b, #-1
> -**	mul	z0\.s, p0/m, z0\.s, \1\.s
> +**	neg	z0\.s, p0/m, z0\.s
>  **	ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u32_m, svuint32_t,
> @@ -569,7 +568,7 @@ TEST_UNIFORM_Z (mul_255_u32_x, svuint32_t,
>  
>  /*
>  ** mul_m1_u32_x:
> -**	mul	z0\.s, z0\.s, #-1
> +**	neg	z0\.s, p0/m, z0\.s
>  **	ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u32_x, svuint32_t,
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
> index eee1f8a0c99..2ccdc3642c5 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
> @@ -183,14 +183,25 @@ TEST_UNIFORM_Z (mul_3_u64_m_untied, svuint64_t,
>  
>  /*
>  ** mul_m1_u64_m:
> -**	mov	(z[0-9]+)\.b, #-1
> -**	mul	z0\.d, p0/m, z0\.d, \1\.d
> +**	neg	z0\.d, p0/m, z0\.d
>  **	ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u64_m, svuint64_t,
>  		z0 = svmul_n_u64_m (p0, z0, -1),
>  		z0 = svmul_m (p0, z0, -1))
>  
> +/*
> +** mul_m1r_u64_m:
> +**	mov	(z[0-9]+)\.b, #-1
> +**	mov	(z[0-9]+\.d), z0\.d
> +**	movprfx	z0, \1
> +**	neg	z0\.d, p0/m, \2
> +**	ret
> +*/
> +TEST_UNIFORM_Z (mul_m1r_u64_m, svuint64_t,
> +		z0 = svmul_u64_m (p0, svdup_u64 (-1), z0),
> +		z0 = svmul_m (p0, svdup_u64 (-1), z0))
> +
>  /*
>  ** mul_u64_z_tied1:
>  **	movprfx	z0\.d, p0/z, z0\.d
> @@ -597,13 +608,22 @@ TEST_UNIFORM_Z (mul_255_u64_x, svuint64_t,
>  
>  /*
>  ** mul_m1_u64_x:
> -**	mul	z0\.d, z0\.d, #-1
> +**	neg	z0\.d, p0/m, z0\.d
>  **	ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u64_x, svuint64_t,
>  		z0 = svmul_n_u64_x (p0, z0, -1),
>  		z0 = svmul_x (p0, z0, -1))
>  
> +/*
> +** mul_m1r_u64_x:
> +**	neg	z0\.d, p0/m, z0\.d
> +**	ret
> +*/
> +TEST_UNIFORM_Z (mul_m1r_u64_x, svuint64_t,
> +		z0 = svmul_u64_x (p0, svdup_u64 (-1), z0),
> +		z0 = svmul_x (p0, svdup_u64 (-1), z0))
> +
>  /*
>  ** mul_m127_u64_x:
>  **	mul	z0\.d, z0\.d, #-127
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
> index 06ee1b3e7c8..8e53a4821f0 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
> @@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u8_m_untied, svuint8_t,
>  
>  /*
>  ** mul_m1_u8_m:
> -**	mov	(z[0-9]+)\.b, #-1
> -**	mul	z0\.b, p0/m, z0\.b, \1\.b
> +**	neg	z0\.b, p0/m, z0\.b
>  **	ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u8_m, svuint8_t,
> @@ -559,7 +558,7 @@ TEST_UNIFORM_Z (mul_128_u8_x, svuint8_t,
>  
>  /*
>  ** mul_255_u8_x:
> -**	mul	z0\.b, z0\.b, #-1
> +**	neg	z0\.b, p0/m, z0\.b
>  **	ret
>  */
>  TEST_UNIFORM_Z (mul_255_u8_x, svuint8_t,
> @@ -568,7 +567,7 @@ TEST_UNIFORM_Z (mul_255_u8_x, svuint8_t,
>  
>  /*
>  ** mul_m1_u8_x:
> -**	mul	z0\.b, z0\.b, #-1
> +**	neg	z0\.b, p0/m, z0\.b
>  **	ret
>  */
>  TEST_UNIFORM_Z (mul_m1_u8_x, svuint8_t,
Jennifer Schmitz Dec. 13, 2024, 10:47 a.m. UTC | #8
> On 11 Dec 2024, at 13:01, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Jennifer Schmitz <jschmitz@nvidia.com> writes:
>> As follow-up to
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html,
>> this patch implements folding of svmul by -1 to svneg for
>> unsigned SVE vector types. The key idea is to reuse the existing code that
>> does this fold for signed types and feed it as callback to a helper function
>> that adds the necessary type conversions.
>> 
>> For example, for the test case
>> svuint64_t foo (svuint64_t x, svbool_t pg)
>> {
>>  return svmul_n_u64_x (pg, x, -1);
>> }
>> 
>> the following gimple sequence is emitted (-O2 -mcpu=grace):
>> svuint64_t foo (svuint64_t x, svbool_t pg)
>> {
>>  svint64_t D.12921;
>>  svint64_t D.12920;
>>  svuint64_t D.12919;
>> 
>>  D.12920 = VIEW_CONVERT_EXPR<svint64_t>(x);
>>  D.12921 = svneg_s64_x (pg, D.12920);
>>  D.12919 = VIEW_CONVERT_EXPR<svuint64_t>(D.12921);
>>  goto <D.12922>;
>>  <D.12922>:
>>  return D.12919;
>> }
>> 
>> In general, the new helper gimple_folder::convert_and_fold
>> - takes a target type and a function pointer,
>> - converts the lhs and all non-boolean vector types to the target type,
>> - passes the converted lhs and arguments to the callback,
>> - receives the new gimple statement from the callback function,
>> - adds the necessary view converts to the gimple sequence,
>> - and returns the new call.
>> 
>> Because all arguments are converted to the same target types, the helper
>> function is only suitable for folding calls whose arguments are all of
>> the same type. If necessary, this could be extended to convert the
>> arguments to different types differentially.
>> 
>> The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
>> OK for mainline?
>> 
>> Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
>> 
>> gcc/ChangeLog:
>> 
>>      * config/aarch64/aarch64-sve-builtins-base.cc
>>      (svmul_impl::fold): Wrap code for folding to svneg in lambda
>>      function and pass to gimple_folder::convert_and_fold to enable
>>      the transform for unsigned types.
>>      * config/aarch64/aarch64-sve-builtins.cc
>>      (gimple_folder::convert_and_fold): New function that converts
>>      operands to target type before calling callback function, adding the
>>      necessary conversion statements.
>>      * config/aarch64/aarch64-sve-builtins.h
>>      (gimple_folder::convert_and_fold): Declare function.
>>      (signed_type_suffix_index): Return type_suffix_index of signed
>>      vector type for given width.
>>      (function_instance::signed_type): Return signed vector type for
>>      given width.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>      * gcc.target/aarch64/sve/acle/asm/mul_u8.c: Adjust expected outcome.
>>      * gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
>>      * gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
>>      * gcc.target/aarch64/sve/acle/asm/mul_u64.c: New test and adjust
>>      expected outcome.
>> ---
>> .../aarch64/aarch64-sve-builtins-base.cc      | 70 +++++++++++++------
>> gcc/config/aarch64/aarch64-sve-builtins.cc    | 43 ++++++++++++
>> gcc/config/aarch64/aarch64-sve-builtins.h     | 31 ++++++++
>> .../gcc.target/aarch64/sve/acle/asm/mul_u16.c |  5 +-
>> .../gcc.target/aarch64/sve/acle/asm/mul_u32.c |  5 +-
>> .../gcc.target/aarch64/sve/acle/asm/mul_u64.c | 26 ++++++-
>> .../gcc.target/aarch64/sve/acle/asm/mul_u8.c  |  7 +-
>> 7 files changed, 153 insertions(+), 34 deletions(-)
>> 
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> index 87e9909b55a..52401a8c57a 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
>> @@ -2092,33 +2092,61 @@ public:
>>       return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
>> 
>>     /* If one of the operands is all integer -1, fold to svneg.  */
>> -    tree pg = gimple_call_arg (f.call, 0);
>> -    tree negated_op = NULL;
>> -    if (integer_minus_onep (op2))
>> -      negated_op = op1;
>> -    else if (integer_minus_onep (op1))
>> -      negated_op = op2;
>> -    if (!f.type_suffix (0).unsigned_p && negated_op)
>> +    if (integer_minus_onep (op1) || integer_minus_onep (op2))
>>       {
>> -     function_instance instance ("svneg", functions::svneg,
>> -                                 shapes::unary, MODE_none,
>> -                                 f.type_suffix_ids, GROUP_none, f.pred);
>> -     gcall *call = f.redirect_call (instance);
>> -     unsigned offset_index = 0;
>> -     if (f.pred == PRED_m)
>> +     auto mul_by_m1 = [](gimple_folder &f, tree lhs_conv,
>> +                         vec<tree> &args_conv) -> gimple *
>>        {
>> -         offset_index = 1;
>> -         gimple_call_set_arg (call, 0, op1);
>> -       }
>> -     else
>> -       gimple_set_num_ops (call, 5);
>> -     gimple_call_set_arg (call, offset_index, pg);
>> -     gimple_call_set_arg (call, offset_index + 1, negated_op);
>> -     return call;
>> +         gcc_assert (lhs_conv && args_conv.length () == 3);
>> +         tree pg = args_conv[0];
>> +         tree op1 = args_conv[1];
>> +         tree op2 = args_conv[2];
>> +         tree negated_op = op1;
>> +         bool negate_op1 = true;
>> +         if (integer_minus_onep (op1))
>> +           {
>> +             negated_op = op2;
>> +             negate_op1 = false;
>> +           }
>> +         type_suffix_pair signed_tsp =
>> +           {signed_type_suffix_index (f.type_suffix (0).element_bits),
>> +             f.type_suffix_ids[1]};
>> +         function_instance instance ("svneg", functions::svneg,
>> +                                     shapes::unary, MODE_none,
>> +                                     signed_tsp, GROUP_none, f.pred);
>> +         gcall *call = f.redirect_call (instance);
>> +         gimple_call_set_lhs (call, lhs_conv);
>> +         unsigned offset = 0;
>> +         tree fntype, op1_type = TREE_TYPE (op1);
>> +         if (f.pred == PRED_m)
>> +           {
>> +             offset = 1;
>> +             tree arg_types[3] = {op1_type, TREE_TYPE (pg), op1_type};
>> +             fntype = build_function_type_array (TREE_TYPE (lhs_conv),
>> +                                                 3, arg_types);
>> +             tree ty = f.signed_type (f.type_suffix (0).element_bits);
>> +             tree inactive = negate_op1 ? op1 : build_minus_one_cst (ty);
> 
> I think my question from the previous review still stands:
> 
>  Isn't the old approach of using op1 as the inactive argument still
>  correct?
You’re right, that’s still correct. I adjusted it.
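
In the reversed _m case op1 is itself the all-ones vector, so using it as
the inactive operand still gives -1 in the inactive lanes. Roughly, the case
in question (example only, mirroring the new mul_m1r_u64_m test):

#include <arm_sve.h>

/* Example only: active lanes of the result hold -x, inactive lanes hold -1,
   i.e. the value of the first multiplicand.  */
svuint64_t foo_rev (svuint64_t x, svbool_t pg)
{
  return svmul_u64_m (pg, svdup_u64 (-1), x);
}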
> 
>> +             gimple_call_set_arg (call, 0, inactive);
>> +           }
>> +         else
>> +           {
>> +             gimple_set_num_ops (call, 5);
>> +             tree arg_types[2] = {TREE_TYPE (pg), op1_type};
>> +             fntype = build_function_type_array (TREE_TYPE (lhs_conv),
>> +                                                 2, arg_types);
>> +           }
>> +         gimple_call_set_fntype (call, fntype);
> 
> I think the fntype should be set by redirect_call, which can get the
> type from TREE_TYPE (rfn->decl).  That's better because it includes any
> relevant attributes.
Done.
> 
>> +         gimple_call_set_arg (call, offset, pg);
>> +         gimple_call_set_arg (call, offset + 1, negated_op);
>> +         return call;
>> +       };
>> +     tree ty = f.signed_type (f.type_suffix (0).element_bits);
>> +     return f.convert_and_fold (ty, mul_by_m1);
>>       }
>> 
>>     /* If one of the operands is a uniform power of 2, fold to a left shift
>>        by immediate.  */
>> +    tree pg = gimple_call_arg (f.call, 0);
>>     tree op1_cst = uniform_integer_cst_p (op1);
>>     tree op2_cst = uniform_integer_cst_p (op2);
>>     tree shift_op1, shift_op2 = NULL;
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
>> index 0fec1cd439e..01b0da22c9b 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
>> @@ -3646,6 +3646,49 @@ gimple_folder::redirect_pred_x ()
>>   return redirect_call (instance);
>> }
>> 
>> +/* Convert the lhs and all non-boolean vector-type operands to TYPE.
>> +   Pass the converted variables to the callback FP, and finally convert the
>> +   result back to the original type. Add the necessary conversion statements.
>> +   Return the new call.  */
>> +gimple *
>> +gimple_folder::convert_and_fold (tree type,
>> +                              gimple *(*fp) (gimple_folder &,
>> +                                             tree, vec<tree> &))
>> +{
>> +  gcc_assert (VECTOR_TYPE_P (type)
>> +           && TYPE_MODE (type) != VNx16BImode);
>> +  tree old_ty = TREE_TYPE (lhs);
>> +  gimple_seq stmts = NULL;
>> +  tree lhs_conv, op, op_ty, t;
>> +  gimple *g, *new_stmt;
>> +  bool convert_lhs_p = !useless_type_conversion_p (type, old_ty);
>> +  lhs_conv = convert_lhs_p ? create_tmp_var (type) : lhs;
>> +  unsigned int num_args = gimple_call_num_args (call);
>> +  vec<tree> args_conv = vNULL;
> 
> It'd be slightly more efficient to use something like:
> 
>  auto_vec<tree, 16> args_conv;
> 
> here.  This also removes the need to free the vector after use
> (looks like the current version would leak).
Done.
> 
>> +  args_conv.safe_grow (num_args);
>> +  for (unsigned int i = 0; i < num_args; ++i)
>> +    {
>> +      op = gimple_call_arg (call, i);
>> +      op_ty = TREE_TYPE (op);
>> +      args_conv[i] =
>> +     (VECTOR_TYPE_P (op_ty)
>> +      && TREE_CODE (op) != VECTOR_CST
>> +      && TYPE_MODE (op_ty) != VNx16BImode
>> +      && !useless_type_conversion_p (op_ty, type))
> 
> I think we should convert VECTOR_CSTs too.  gimple_build should
> fold it for us (but let me know if it doesn't).
Yes, that can be removed.
> 
> Thanks, and sorry for the slow review.
> 
> Richard
Thanks for the review. I posted the updated patch below (including the FPM_unused argument for creating a new function instance, which was added in the meantime). It was re-tested on aarch64, no regression.
Thanks,
Jennifer

As follow-up to
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html,
this patch implements folding of svmul by -1 to svneg for
unsigned SVE vector types. The key idea is to reuse the existing code that
does this fold for signed types and feed it as callback to a helper function
that adds the necessary type conversions.
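
The fold is safe for unsigned types because an unsigned multiplication by -1
(i.e. by 2^N - 1) and an unsigned negation compute the same value modulo 2^N.
A stand-alone scalar sketch of that identity (for illustration only, not part
of the patch):

#include <assert.h>
#include <stdint.h>

int
main (void)
{
  uint64_t xs[] = { 0, 1, 2, 0x8000000000000000ull, UINT64_MAX };
  for (unsigned int i = 0; i < sizeof (xs) / sizeof (xs[0]); ++i)
    /* x * (2^64 - 1) wraps to the same 64-bit value as 0 - x.  */
    assert (xs[i] * (uint64_t) -1 == 0 - xs[i]);
  return 0;
}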

For example, for the test case
svuint64_t foo (svuint64_t x, svbool_t pg)
{
  return svmul_n_u64_x (pg, x, -1);
}

the following gimple sequence is emitted (-O2 -mcpu=grace):
svuint64_t foo (svuint64_t x, svbool_t pg)
{
  svint64_t D.12921;
  svint64_t D.12920;
  svuint64_t D.12919;

  D.12920 = VIEW_CONVERT_EXPR<svint64_t>(x);
  D.12921 = svneg_s64_x (pg, D.12920);
  D.12919 = VIEW_CONVERT_EXPR<svuint64_t>(D.12921);
  goto <D.12922>;
  <D.12922>:
  return D.12919;
}

In general, the new helper gimple_folder::convert_and_fold
- takes a target type and a function pointer,
- converts the lhs and all non-boolean vector types to the target type,
- passes the converted lhs and arguments to the callback,
- receives the new gimple statement from the callback function,
- adds the necessary view converts to the gimple sequence,
- and returns the new call.

Because all arguments are converted to the same target type, the helper
function is only suitable for folding calls whose arguments are all of
the same type. If necessary, this could be extended to convert individual
arguments to different types.

The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>

gcc/ChangeLog:

	* config/aarch64/aarch64-sve-builtins-base.cc
	(svmul_impl::fold): Wrap code for folding to svneg in lambda
	function and pass to gimple_folder::convert_and_fold to enable
	the transform for unsigned types.
	* config/aarch64/aarch64-sve-builtins.cc
	(gimple_folder::convert_and_fold): New function that converts
	operands to target type before calling callback function, adding the
	necessary conversion statements.
	(gimple_folder::redirect_call): Set fntype of redirected call.
	* config/aarch64/aarch64-sve-builtins.h
	(gimple_folder::convert_and_fold): Declare function.
	(signed_type_suffix_index): Return type_suffix_index of signed
	vector type for given width.
	(function_instance::signed_type): Return signed vector type for
	given width.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/acle/asm/mul_u8.c: Adjust expected outcome.
	* gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mul_u64.c: New test and adjust
	expected outcome.
---
 .../aarch64/aarch64-sve-builtins-base.cc      | 54 +++++++++++--------
 gcc/config/aarch64/aarch64-sve-builtins.cc    | 43 +++++++++++++++
 gcc/config/aarch64/aarch64-sve-builtins.h     | 31 +++++++++++
 .../gcc.target/aarch64/sve/acle/asm/mul_u16.c |  5 +-
 .../gcc.target/aarch64/sve/acle/asm/mul_u32.c |  5 +-
 .../gcc.target/aarch64/sve/acle/asm/mul_u64.c | 26 +++++++--
 .../gcc.target/aarch64/sve/acle/asm/mul_u8.c  |  7 ++-
 7 files changed, 137 insertions(+), 34 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 927c5bbae21..b0d04e3ffc6 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -2305,33 +2305,45 @@ public:
       return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
 
     /* If one of the operands is all integer -1, fold to svneg.  */
-    tree pg = gimple_call_arg (f.call, 0);
-    tree negated_op = NULL;
-    if (integer_minus_onep (op2))
-      negated_op = op1;
-    else if (integer_minus_onep (op1))
-      negated_op = op2;
-    if (!f.type_suffix (0).unsigned_p && negated_op)
+    if (integer_minus_onep (op1) || integer_minus_onep (op2))
       {
-	function_instance instance ("svneg", functions::svneg, shapes::unary,
-				    MODE_none, f.type_suffix_ids, GROUP_none,
-				    f.pred, FPM_unused);
-	gcall *call = f.redirect_call (instance);
-	unsigned offset_index = 0;
-	if (f.pred == PRED_m)
+	auto mul_by_m1 = [](gimple_folder &f, tree lhs_conv,
+			    vec<tree> &args_conv) -> gimple *
 	  {
-	    offset_index = 1;
-	    gimple_call_set_arg (call, 0, op1);
-	  }
-	else
-	  gimple_set_num_ops (call, 5);
-	gimple_call_set_arg (call, offset_index, pg);
-	gimple_call_set_arg (call, offset_index + 1, negated_op);
-	return call;
+	    gcc_assert (lhs_conv && args_conv.length () == 3);
+	    tree pg = args_conv[0];
+	    tree op1 = args_conv[1];
+	    tree op2 = args_conv[2];
+	    tree negated_op = op1;
+	    if (integer_minus_onep (op1))
+	      negated_op = op2;
+	    type_suffix_pair signed_tsp =
+	      {signed_type_suffix_index (f.type_suffix (0).element_bits),
+		f.type_suffix_ids[1]};
+	    function_instance instance ("svneg", functions::svneg,
+					shapes::unary, MODE_none, signed_tsp,
+					GROUP_none, f.pred, FPM_unused);
+	    gcall *call = f.redirect_call (instance);
+	    gimple_call_set_lhs (call, lhs_conv);
+	    unsigned offset = 0;
+	    if (f.pred == PRED_m)
+	      {
+		offset = 1;
+		gimple_call_set_arg (call, 0, op1);
+	      }
+	    else
+	      gimple_set_num_ops (call, 5);
+	    gimple_call_set_arg (call, offset, pg);
+	    gimple_call_set_arg (call, offset + 1, negated_op);
+	    return call;
+	  };
+	tree ty = f.signed_type (f.type_suffix (0).element_bits);
+	return f.convert_and_fold (ty, mul_by_m1);
       }
 
     /* If one of the operands is a uniform power of 2, fold to a left shift
        by immediate.  */
+    tree pg = gimple_call_arg (f.call, 0);
     tree op1_cst = uniform_integer_cst_p (op1);
     tree op2_cst = uniform_integer_cst_p (op2);
     tree shift_op1, shift_op2 = NULL;
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 5acc56f99c6..5fbfa13a777 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -3598,6 +3598,7 @@ gimple_folder::redirect_call (const function_instance &instance)
     return NULL;
 
   gimple_call_set_fndecl (call, rfn->decl);
+  gimple_call_set_fntype (call, TREE_TYPE (rfn->decl));
   return call;
 }
 
@@ -3672,6 +3673,48 @@ gimple_folder::fold_pfalse ()
   return nullptr;
 }
 
+/* Convert the lhs and all non-boolean vector-type operands to TYPE.
+   Pass the converted variables to the callback FP, and finally convert the
+   result back to the original type. Add the necessary conversion statements.
+   Return the new call.  */
+gimple *
+gimple_folder::convert_and_fold (tree type,
+				 gimple *(*fp) (gimple_folder &,
+						tree, vec<tree> &))
+{
+  gcc_assert (VECTOR_TYPE_P (type)
+	      && TYPE_MODE (type) != VNx16BImode);
+  tree old_ty = TREE_TYPE (lhs);
+  gimple_seq stmts = NULL;
+  tree lhs_conv, op, op_ty, t;
+  gimple *g, *new_stmt;
+  bool convert_lhs_p = !useless_type_conversion_p (type, old_ty);
+  lhs_conv = convert_lhs_p ? create_tmp_var (type) : lhs;
+  unsigned int num_args = gimple_call_num_args (call);
+  auto_vec<tree, 16> args_conv;
+  args_conv.safe_grow (num_args);
+  for (unsigned int i = 0; i < num_args; ++i)
+    {
+      op = gimple_call_arg (call, i);
+      op_ty = TREE_TYPE (op);
+      args_conv[i] =
+	(VECTOR_TYPE_P (op_ty)
+	 && TYPE_MODE (op_ty) != VNx16BImode
+	 && !useless_type_conversion_p (op_ty, type))
+	? gimple_build (&stmts, VIEW_CONVERT_EXPR, type, op) : op;
+    }
+
+  new_stmt = fp (*this, lhs_conv, args_conv);
+  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+  if (convert_lhs_p)
+    {
+      t = build1 (VIEW_CONVERT_EXPR, old_ty, lhs_conv);
+      g = gimple_build_assign (lhs, VIEW_CONVERT_EXPR, t);
+      gsi_insert_after (gsi, g, GSI_SAME_STMT);
+    }
+  return new_stmt;
+}
+
 /* Fold the call to constant VAL.  */
 gimple *
 gimple_folder::fold_to_cstu (poly_uint64 val)
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
index 6f22868f9b3..02ae098ed32 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins.h
@@ -421,6 +421,7 @@ public:
   tree scalar_type (unsigned int) const;
   tree vector_type (unsigned int) const;
   tree tuple_type (unsigned int) const;
+  tree signed_type (unsigned int) const;
   unsigned int elements_per_vq (unsigned int) const;
   machine_mode vector_mode (unsigned int) const;
   machine_mode tuple_mode (unsigned int) const;
@@ -649,6 +650,8 @@ public:
   gcall *redirect_call (const function_instance &);
   gimple *redirect_pred_x ();
   gimple *fold_pfalse ();
+  gimple *convert_and_fold (tree, gimple *(*) (gimple_folder &,
+					       tree, vec<tree> &));
 
   gimple *fold_to_cstu (poly_uint64);
   gimple *fold_to_pfalse ();
@@ -884,6 +887,20 @@ find_type_suffix (type_class_index tclass, unsigned int element_bits)
   gcc_unreachable ();
 }
 
+/* Return the type suffix of the signed type of width ELEMENT_BITS.  */
+inline type_suffix_index
+signed_type_suffix_index (unsigned int element_bits)
+{
+  switch (element_bits)
+  {
+  case 8: return TYPE_SUFFIX_s8;
+  case 16: return TYPE_SUFFIX_s16;
+  case 32: return TYPE_SUFFIX_s32;
+  case 64: return TYPE_SUFFIX_s64;
+  }
+  gcc_unreachable ();
+}
+
 /* Return the single field in tuple type TYPE.  */
 inline tree
 tuple_type_field (tree type)
@@ -1049,6 +1066,20 @@ function_instance::tuple_type (unsigned int i) const
   return acle_vector_types[num_vectors - 1][type_suffix (i).vector_type];
 }
 
+/* Return the signed vector type of width ELEMENT_BITS.  */
+inline tree
+function_instance::signed_type (unsigned int element_bits) const
+{
+  switch (element_bits)
+  {
+  case 8: return acle_vector_types[0][VECTOR_TYPE_svint8_t];
+  case 16: return acle_vector_types[0][VECTOR_TYPE_svint16_t];
+  case 32: return acle_vector_types[0][VECTOR_TYPE_svint32_t];
+  case 64: return acle_vector_types[0][VECTOR_TYPE_svint64_t];
+  }
+  gcc_unreachable ();
+}
+
 /* Return the number of elements of type suffix I that fit within a
    128-bit block.  */
 inline unsigned int
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
index bdf6fcb98d6..e228dc5995d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
@@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u16_m_untied, svuint16_t,
 
 /*
 ** mul_m1_u16_m:
-**	mov	(z[0-9]+)\.b, #-1
-**	mul	z0\.h, p0/m, z0\.h, \1\.h
+**	neg	z0\.h, p0/m, z0\.h
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u16_m, svuint16_t,
@@ -569,7 +568,7 @@ TEST_UNIFORM_Z (mul_255_u16_x, svuint16_t,
 
 /*
 ** mul_m1_u16_x:
-**	mul	z0\.h, z0\.h, #-1
+**	neg	z0\.h, p0/m, z0\.h
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u16_x, svuint16_t,
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
index a61e85fa12d..e8f52c9d785 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
@@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u32_m_untied, svuint32_t,
 
 /*
 ** mul_m1_u32_m:
-**	mov	(z[0-9]+)\.b, #-1
-**	mul	z0\.s, p0/m, z0\.s, \1\.s
+**	neg	z0\.s, p0/m, z0\.s
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u32_m, svuint32_t,
@@ -569,7 +568,7 @@ TEST_UNIFORM_Z (mul_255_u32_x, svuint32_t,
 
 /*
 ** mul_m1_u32_x:
-**	mul	z0\.s, z0\.s, #-1
+**	neg	z0\.s, p0/m, z0\.s
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u32_x, svuint32_t,
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
index eee1f8a0c99..2ccdc3642c5 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
@@ -183,14 +183,25 @@ TEST_UNIFORM_Z (mul_3_u64_m_untied, svuint64_t,
 
 /*
 ** mul_m1_u64_m:
-**	mov	(z[0-9]+)\.b, #-1
-**	mul	z0\.d, p0/m, z0\.d, \1\.d
+**	neg	z0\.d, p0/m, z0\.d
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u64_m, svuint64_t,
 		z0 = svmul_n_u64_m (p0, z0, -1),
 		z0 = svmul_m (p0, z0, -1))
 
+/*
+** mul_m1r_u64_m:
+**	mov	(z[0-9]+)\.b, #-1
+**	mov	(z[0-9]+\.d), z0\.d
+**	movprfx	z0, \1
+**	neg	z0\.d, p0/m, \2
+**	ret
+*/
+TEST_UNIFORM_Z (mul_m1r_u64_m, svuint64_t,
+		z0 = svmul_u64_m (p0, svdup_u64 (-1), z0),
+		z0 = svmul_m (p0, svdup_u64 (-1), z0))
+
 /*
 ** mul_u64_z_tied1:
 **	movprfx	z0\.d, p0/z, z0\.d
@@ -597,13 +608,22 @@ TEST_UNIFORM_Z (mul_255_u64_x, svuint64_t,
 
 /*
 ** mul_m1_u64_x:
-**	mul	z0\.d, z0\.d, #-1
+**	neg	z0\.d, p0/m, z0\.d
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u64_x, svuint64_t,
 		z0 = svmul_n_u64_x (p0, z0, -1),
 		z0 = svmul_x (p0, z0, -1))
 
+/*
+** mul_m1r_u64_x:
+**	neg	z0\.d, p0/m, z0\.d
+**	ret
+*/
+TEST_UNIFORM_Z (mul_m1r_u64_x, svuint64_t,
+		z0 = svmul_u64_x (p0, svdup_u64 (-1), z0),
+		z0 = svmul_x (p0, svdup_u64 (-1), z0))
+
 /*
 ** mul_m127_u64_x:
 **	mul	z0\.d, z0\.d, #-127
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
index 06ee1b3e7c8..8e53a4821f0 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
@@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u8_m_untied, svuint8_t,
 
 /*
 ** mul_m1_u8_m:
-**	mov	(z[0-9]+)\.b, #-1
-**	mul	z0\.b, p0/m, z0\.b, \1\.b
+**	neg	z0\.b, p0/m, z0\.b
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u8_m, svuint8_t,
@@ -559,7 +558,7 @@ TEST_UNIFORM_Z (mul_128_u8_x, svuint8_t,
 
 /*
 ** mul_255_u8_x:
-**	mul	z0\.b, z0\.b, #-1
+**	neg	z0\.b, p0/m, z0\.b
 **	ret
 */
 TEST_UNIFORM_Z (mul_255_u8_x, svuint8_t,
@@ -568,7 +567,7 @@ TEST_UNIFORM_Z (mul_255_u8_x, svuint8_t,
 
 /*
 ** mul_m1_u8_x:
-**	mul	z0\.b, z0\.b, #-1
+**	neg	z0\.b, p0/m, z0\.b
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u8_x, svuint8_t,
Richard Sandiford Dec. 19, 2024, 11:24 a.m. UTC | #9
Jennifer Schmitz <jschmitz@nvidia.com> writes:
> @@ -3672,6 +3673,48 @@ gimple_folder::fold_pfalse ()
>    return nullptr;
>  }
>  
> +/* Convert the lhs and all non-boolean vector-type operands to TYPE.
> +   Pass the converted variables to the callback FP, and finally convert the
> +   result back to the original type. Add the necessary conversion statements.
> +   Return the new call.  */
> +gimple *
> +gimple_folder::convert_and_fold (tree type,
> +				 gimple *(*fp) (gimple_folder &,
> +						tree, vec<tree> &))
> +{
> +  gcc_assert (VECTOR_TYPE_P (type)
> +	      && TYPE_MODE (type) != VNx16BImode);
> +  tree old_ty = TREE_TYPE (lhs);
> +  gimple_seq stmts = NULL;
> +  tree lhs_conv, op, op_ty, t;
> +  gimple *g, *new_stmt;

Sorry for the last-minute minor request, but: it would be nice to declare
these at the point of initialisation, for consistency with the rest of the
function.

> +  bool convert_lhs_p = !useless_type_conversion_p (type, old_ty);
> +  lhs_conv = convert_lhs_p ? create_tmp_var (type) : lhs;
> +  unsigned int num_args = gimple_call_num_args (call);
> +  auto_vec<tree, 16> args_conv;
> +  args_conv.safe_grow (num_args);
> +  for (unsigned int i = 0; i < num_args; ++i)
> +    {
> +      op = gimple_call_arg (call, i);
> +      op_ty = TREE_TYPE (op);
> +      args_conv[i] =
> +	(VECTOR_TYPE_P (op_ty)
> +	 && TYPE_MODE (op_ty) != VNx16BImode
> +	 && !useless_type_conversion_p (op_ty, type))
> +	? gimple_build (&stmts, VIEW_CONVERT_EXPR, type, op) : op;
> +    }
> +
> +  new_stmt = fp (*this, lhs_conv, args_conv);
> +  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> +  if (convert_lhs_p)
> +    {
> +      t = build1 (VIEW_CONVERT_EXPR, old_ty, lhs_conv);
> +      g = gimple_build_assign (lhs, VIEW_CONVERT_EXPR, t);
> +      gsi_insert_after (gsi, g, GSI_SAME_STMT);
> +    }
> +  return new_stmt;
> +}
> +
>  /* Fold the call to constant VAL.  */
>  gimple *
>  gimple_folder::fold_to_cstu (poly_uint64 val)
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
> index 6f22868f9b3..02ae098ed32 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.h
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.h
> @@ -421,6 +421,7 @@ public:
>    tree scalar_type (unsigned int) const;
>    tree vector_type (unsigned int) const;
>    tree tuple_type (unsigned int) const;
> +  tree signed_type (unsigned int) const;
>    unsigned int elements_per_vq (unsigned int) const;
>    machine_mode vector_mode (unsigned int) const;
>    machine_mode tuple_mode (unsigned int) const;
> @@ -649,6 +650,8 @@ public:
>    gcall *redirect_call (const function_instance &);
>    gimple *redirect_pred_x ();
>    gimple *fold_pfalse ();
> +  gimple *convert_and_fold (tree, gimple *(*) (gimple_folder &,
> +					       tree, vec<tree> &));
>  
>    gimple *fold_to_cstu (poly_uint64);
>    gimple *fold_to_pfalse ();
> @@ -884,6 +887,20 @@ find_type_suffix (type_class_index tclass, unsigned int element_bits)
>    gcc_unreachable ();
>  }
>  
> +/* Return the type suffix of the signed type of width ELEMENT_BITS.  */
> +inline type_suffix_index
> +signed_type_suffix_index (unsigned int element_bits)
> +{
> +  switch (element_bits)
> +  {
> +  case 8: return TYPE_SUFFIX_s8;
> +  case 16: return TYPE_SUFFIX_s16;
> +  case 32: return TYPE_SUFFIX_s32;
> +  case 64: return TYPE_SUFFIX_s64;
> +  }
> +  gcc_unreachable ();
> +}
> +

We could drop this and instead replace calls with:

  find_type_suffix (TYPE_signed, element_bits)


>  /* Return the single field in tuple type TYPE.  */
>  inline tree
>  tuple_type_field (tree type)
> @@ -1049,6 +1066,20 @@ function_instance::tuple_type (unsigned int i) const
>    return acle_vector_types[num_vectors - 1][type_suffix (i).vector_type];
>  }
>  
> +/* Return the signed vector type of width ELEMENT_BITS.  */
> +inline tree
> +function_instance::signed_type (unsigned int element_bits) const
> +{
> +  switch (element_bits)
> +  {
> +  case 8: return acle_vector_types[0][VECTOR_TYPE_svint8_t];
> +  case 16: return acle_vector_types[0][VECTOR_TYPE_svint16_t];
> +  case 32: return acle_vector_types[0][VECTOR_TYPE_svint32_t];
> +  case 64: return acle_vector_types[0][VECTOR_TYPE_svint64_t];
> +  }
> +  gcc_unreachable ();
> +}
> +

And for this, I think we should instead make:

/* Return the vector type associated with TYPE.  */
static tree
get_vector_type (sve_type type)
{
  auto vector_type = type_suffixes[type.type].vector_type;
  return acle_vector_types[type.num_vectors - 1][vector_type];
}

public, or perhaps just define it inline in aarch64-sve-builtins.h.

OK with those changes, thanks.

Richard
Jennifer Schmitz Dec. 19, 2024, 3:04 p.m. UTC | #10
> On 19 Dec 2024, at 12:24, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Jennifer Schmitz <jschmitz@nvidia.com> writes:
>> @@ -3672,6 +3673,48 @@ gimple_folder::fold_pfalse ()
>>   return nullptr;
>> }
>> 
>> +/* Convert the lhs and all non-boolean vector-type operands to TYPE.
>> +   Pass the converted variables to the callback FP, and finally convert the
>> +   result back to the original type. Add the necessary conversion statements.
>> +   Return the new call.  */
>> +gimple *
>> +gimple_folder::convert_and_fold (tree type,
>> +                              gimple *(*fp) (gimple_folder &,
>> +                                             tree, vec<tree> &))
>> +{
>> +  gcc_assert (VECTOR_TYPE_P (type)
>> +           && TYPE_MODE (type) != VNx16BImode);
>> +  tree old_ty = TREE_TYPE (lhs);
>> +  gimple_seq stmts = NULL;
>> +  tree lhs_conv, op, op_ty, t;
>> +  gimple *g, *new_stmt;
> 
> Sorry for the last-minute minor request, but: it would be nice to declare
> these at the point of initialisation, for consistency with the rest of the
> function.
Done.
> 
>> +  bool convert_lhs_p = !useless_type_conversion_p (type, old_ty);
>> +  lhs_conv = convert_lhs_p ? create_tmp_var (type) : lhs;
>> +  unsigned int num_args = gimple_call_num_args (call);
>> +  auto_vec<tree, 16> args_conv;
>> +  args_conv.safe_grow (num_args);
>> +  for (unsigned int i = 0; i < num_args; ++i)
>> +    {
>> +      op = gimple_call_arg (call, i);
>> +      op_ty = TREE_TYPE (op);
>> +      args_conv[i] =
>> +     (VECTOR_TYPE_P (op_ty)
>> +      && TYPE_MODE (op_ty) != VNx16BImode
>> +      && !useless_type_conversion_p (op_ty, type))
>> +     ? gimple_build (&stmts, VIEW_CONVERT_EXPR, type, op) : op;
>> +    }
>> +
>> +  new_stmt = fp (*this, lhs_conv, args_conv);
>> +  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>> +  if (convert_lhs_p)
>> +    {
>> +      t = build1 (VIEW_CONVERT_EXPR, old_ty, lhs_conv);
>> +      g = gimple_build_assign (lhs, VIEW_CONVERT_EXPR, t);
>> +      gsi_insert_after (gsi, g, GSI_SAME_STMT);
>> +    }
>> +  return new_stmt;
>> +}
>> +
>> /* Fold the call to constant VAL.  */
>> gimple *
>> gimple_folder::fold_to_cstu (poly_uint64 val)
>> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
>> index 6f22868f9b3..02ae098ed32 100644
>> --- a/gcc/config/aarch64/aarch64-sve-builtins.h
>> +++ b/gcc/config/aarch64/aarch64-sve-builtins.h
>> @@ -421,6 +421,7 @@ public:
>>   tree scalar_type (unsigned int) const;
>>   tree vector_type (unsigned int) const;
>>   tree tuple_type (unsigned int) const;
>> +  tree signed_type (unsigned int) const;
>>   unsigned int elements_per_vq (unsigned int) const;
>>   machine_mode vector_mode (unsigned int) const;
>>   machine_mode tuple_mode (unsigned int) const;
>> @@ -649,6 +650,8 @@ public:
>>   gcall *redirect_call (const function_instance &);
>>   gimple *redirect_pred_x ();
>>   gimple *fold_pfalse ();
>> +  gimple *convert_and_fold (tree, gimple *(*) (gimple_folder &,
>> +                                            tree, vec<tree> &));
>> 
>>   gimple *fold_to_cstu (poly_uint64);
>>   gimple *fold_to_pfalse ();
>> @@ -884,6 +887,20 @@ find_type_suffix (type_class_index tclass, unsigned int element_bits)
>>   gcc_unreachable ();
>> }
>> 
>> +/* Return the type suffix of the signed type of width ELEMENT_BITS.  */
>> +inline type_suffix_index
>> +signed_type_suffix_index (unsigned int element_bits)
>> +{
>> +  switch (element_bits)
>> +  {
>> +  case 8: return TYPE_SUFFIX_s8;
>> +  case 16: return TYPE_SUFFIX_s16;
>> +  case 32: return TYPE_SUFFIX_s32;
>> +  case 64: return TYPE_SUFFIX_s64;
>> +  }
>> +  gcc_unreachable ();
>> +}
>> +
> 
> We could drop this and instead replace calls with:
> 
>  find_type_suffix (TYPE_signed, element_bits)
Done.
> 
> 
>> /* Return the single field in tuple type TYPE.  */
>> inline tree
>> tuple_type_field (tree type)
>> @@ -1049,6 +1066,20 @@ function_instance::tuple_type (unsigned int i) const
>>   return acle_vector_types[num_vectors - 1][type_suffix (i).vector_type];
>> }
>> 
>> +/* Return the signed vector type of width ELEMENT_BITS.  */
>> +inline tree
>> +function_instance::signed_type (unsigned int element_bits) const
>> +{
>> +  switch (element_bits)
>> +  {
>> +  case 8: return acle_vector_types[0][VECTOR_TYPE_svint8_t];
>> +  case 16: return acle_vector_types[0][VECTOR_TYPE_svint16_t];
>> +  case 32: return acle_vector_types[0][VECTOR_TYPE_svint32_t];
>> +  case 64: return acle_vector_types[0][VECTOR_TYPE_svint64_t];
>> +  }
>> +  gcc_unreachable ();
>> +}
>> +
> 
> And for this, I think we should instead make:
> 
> /* Return the vector type associated with TYPE.  */
> static tree
> get_vector_type (sve_type type)
> {
>  auto vector_type = type_suffixes[type.type].vector_type;
>  return acle_vector_types[type.num_vectors - 1][vector_type];
> }
> 
> public, or perhaps just define it inline in aarch64-sve-builtins.h.
Done. Thanks for pointing out the existing functions that can be used here. As I was writing these two helper functions, it felt like there should already be something related; I just did not find it. This is definitely better!

Below is the updated patch, though I will wait until January to push it.
Thanks,
Jennifer

As follow-up to
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html,
this patch implements folding of svmul by -1 to svneg for
unsigned SVE vector types. The key idea is to reuse the existing code that
does this fold for signed types and feed it as callback to a helper function
that adds the necessary type conversions.

For example, for the test case
svuint64_t foo (svuint64_t x, svbool_t pg)
{
  return svmul_n_u64_x (pg, x, -1);
}

the following gimple sequence is emitted (-O2 -mcpu=grace):
svuint64_t foo (svuint64_t x, svbool_t pg)
{
  svint64_t D.12921;
  svint64_t D.12920;
  svuint64_t D.12919;

  D.12920 = VIEW_CONVERT_EXPR<svint64_t>(x);
  D.12921 = svneg_s64_x (pg, D.12920);
  D.12919 = VIEW_CONVERT_EXPR<svuint64_t>(D.12921);
  goto <D.12922>;
  <D.12922>:
  return D.12919;
}

In general, the new helper gimple_folder::convert_and_fold
- takes a target type and a function pointer,
- converts the lhs and all non-boolean vector types to the target type,
- passes the converted lhs and arguments to the callback,
- receives the new gimple statement from the callback function,
- adds the necessary view converts to the gimple sequence,
- and returns the new call.

Because all arguments are converted to the same target type, the helper
function is only suitable for folding calls whose arguments are all of
the same type. If necessary, this could be extended to convert individual
arguments to different types.

The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>

gcc/ChangeLog:

	* config/aarch64/aarch64-sve-builtins-base.cc
	(svmul_impl::fold): Wrap code for folding to svneg in lambda
	function and pass to gimple_folder::convert_and_fold to enable
	the transform for unsigned types.
	* config/aarch64/aarch64-sve-builtins.cc
	(gimple_folder::convert_and_fold): New function that converts
	operands to target type before calling callback function, adding the
	necessary conversion statements.
	(gimple_folder::redirect_call): Set fntype of redirected call.
	(get_vector_type): Move from here to aarch64-sve-builtins.h.
	* config/aarch64/aarch64-sve-builtins.h
	(gimple_folder::convert_and_fold): Declare function.
	(get_vector_type): Move here as inline function.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/acle/asm/mul_u8.c: Adjust expected outcome.
	* gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/mul_u64.c: New test and adjust
	expected outcome.
---
 .../aarch64/aarch64-sve-builtins-base.cc      | 56 ++++++++++++-------
 gcc/config/aarch64/aarch64-sve-builtins.cc    | 49 +++++++++++++---
 gcc/config/aarch64/aarch64-sve-builtins.h     | 10 ++++
 .../gcc.target/aarch64/sve/acle/asm/mul_u16.c |  5 +-
 .../gcc.target/aarch64/sve/acle/asm/mul_u32.c |  5 +-
 .../gcc.target/aarch64/sve/acle/asm/mul_u64.c | 26 ++++++++-
 .../gcc.target/aarch64/sve/acle/asm/mul_u8.c  |  7 +--
 7 files changed, 116 insertions(+), 42 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 927c5bbae21..f955b89ccf0 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -2305,33 +2305,47 @@ public:
       return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
 
     /* If one of the operands is all integer -1, fold to svneg.  */
-    tree pg = gimple_call_arg (f.call, 0);
-    tree negated_op = NULL;
-    if (integer_minus_onep (op2))
-      negated_op = op1;
-    else if (integer_minus_onep (op1))
-      negated_op = op2;
-    if (!f.type_suffix (0).unsigned_p && negated_op)
+    if (integer_minus_onep (op1) || integer_minus_onep (op2))
       {
-	function_instance instance ("svneg", functions::svneg, shapes::unary,
-				    MODE_none, f.type_suffix_ids, GROUP_none,
-				    f.pred, FPM_unused);
-	gcall *call = f.redirect_call (instance);
-	unsigned offset_index = 0;
-	if (f.pred == PRED_m)
+	auto mul_by_m1 = [](gimple_folder &f, tree lhs_conv,
+			    vec<tree> &args_conv) -> gimple *
 	  {
-	    offset_index = 1;
-	    gimple_call_set_arg (call, 0, op1);
-	  }
-	else
-	  gimple_set_num_ops (call, 5);
-	gimple_call_set_arg (call, offset_index, pg);
-	gimple_call_set_arg (call, offset_index + 1, negated_op);
-	return call;
+	    gcc_assert (lhs_conv && args_conv.length () == 3);
+	    tree pg = args_conv[0];
+	    tree op1 = args_conv[1];
+	    tree op2 = args_conv[2];
+	    tree negated_op = op1;
+	    if (integer_minus_onep (op1))
+	      negated_op = op2;
+	    type_suffix_pair signed_tsp =
+	      {find_type_suffix (TYPE_signed, f.type_suffix (0).element_bits),
+		f.type_suffix_ids[1]};
+	    function_instance instance ("svneg", functions::svneg,
+					shapes::unary, MODE_none, signed_tsp,
+					GROUP_none, f.pred, FPM_unused);
+	    gcall *call = f.redirect_call (instance);
+	    gimple_call_set_lhs (call, lhs_conv);
+	    unsigned offset = 0;
+	    if (f.pred == PRED_m)
+	      {
+		offset = 1;
+		gimple_call_set_arg (call, 0, op1);
+	      }
+	    else
+	      gimple_set_num_ops (call, 5);
+	    gimple_call_set_arg (call, offset, pg);
+	    gimple_call_set_arg (call, offset + 1, negated_op);
+	    return call;
+	  };
+	tree ty =
+	  get_vector_type (find_type_suffix (TYPE_signed,
+					     f.type_suffix (0).element_bits));
+	return f.convert_and_fold (ty, mul_by_m1);
       }
 
     /* If one of the operands is a uniform power of 2, fold to a left shift
        by immediate.  */
+    tree pg = gimple_call_arg (f.call, 0);
     tree op1_cst = uniform_integer_cst_p (op1);
     tree op2_cst = uniform_integer_cst_p (op2);
     tree shift_op1, shift_op2 = NULL;
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 5acc56f99c6..886ba3890d8 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -1127,14 +1127,6 @@ num_vectors_to_group (unsigned int nvectors)
   gcc_unreachable ();
 }
 
-/* Return the vector type associated with TYPE.  */
-static tree
-get_vector_type (sve_type type)
-{
-  auto vector_type = type_suffixes[type.type].vector_type;
-  return acle_vector_types[type.num_vectors - 1][vector_type];
-}
-
 /* If FNDECL is an SVE builtin, return its function instance, otherwise
    return null.  */
 const function_instance *
@@ -3598,6 +3590,7 @@ gimple_folder::redirect_call (const function_instance &instance)
     return NULL;
 
   gimple_call_set_fndecl (call, rfn->decl);
+  gimple_call_set_fntype (call, TREE_TYPE (rfn->decl));
   return call;
 }
 
@@ -3672,6 +3665,46 @@ gimple_folder::fold_pfalse ()
   return nullptr;
 }
 
+/* Convert the lhs and all non-boolean vector-type operands to TYPE.
+   Pass the converted variables to the callback FP, and finally convert the
+   result back to the original type. Add the necessary conversion statements.
+   Return the new call.  */
+gimple *
+gimple_folder::convert_and_fold (tree type,
+				 gimple *(*fp) (gimple_folder &,
+						tree, vec<tree> &))
+{
+  gcc_assert (VECTOR_TYPE_P (type)
+	      && TYPE_MODE (type) != VNx16BImode);
+  tree old_ty = TREE_TYPE (lhs);
+  gimple_seq stmts = NULL;
+  bool convert_lhs_p = !useless_type_conversion_p (type, old_ty);
+  tree lhs_conv = convert_lhs_p ? create_tmp_var (type) : lhs;
+  unsigned int num_args = gimple_call_num_args (call);
+  auto_vec<tree, 16> args_conv;
+  args_conv.safe_grow (num_args);
+  for (unsigned int i = 0; i < num_args; ++i)
+    {
+      tree op = gimple_call_arg (call, i);
+      tree op_ty = TREE_TYPE (op);
+      args_conv[i] =
+	(VECTOR_TYPE_P (op_ty)
+	 && TYPE_MODE (op_ty) != VNx16BImode
+	 && !useless_type_conversion_p (op_ty, type))
+	? gimple_build (&stmts, VIEW_CONVERT_EXPR, type, op) : op;
+    }
+
+  gimple *new_stmt = fp (*this, lhs_conv, args_conv);
+  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+  if (convert_lhs_p)
+    {
+      tree t = build1 (VIEW_CONVERT_EXPR, old_ty, lhs_conv);
+      gimple *g = gimple_build_assign (lhs, VIEW_CONVERT_EXPR, t);
+      gsi_insert_after (gsi, g, GSI_SAME_STMT);
+    }
+  return new_stmt;
+}
+
 /* Fold the call to constant VAL.  */
 gimple *
 gimple_folder::fold_to_cstu (poly_uint64 val)
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
index 6f22868f9b3..e411a085c9b 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins.h
@@ -649,6 +649,8 @@ public:
   gcall *redirect_call (const function_instance &);
   gimple *redirect_pred_x ();
   gimple *fold_pfalse ();
+  gimple *convert_and_fold (tree, gimple *(*) (gimple_folder &,
+					       tree, vec<tree> &));
 
   gimple *fold_to_cstu (poly_uint64);
   gimple *fold_to_pfalse ();
@@ -894,6 +896,14 @@ tuple_type_field (tree type)
   gcc_unreachable ();
 }
 
+/* Return the vector type associated with TYPE.  */
+inline tree
+get_vector_type (sve_type type)
+{
+  auto vector_type = type_suffixes[type.type].vector_type;
+  return acle_vector_types[type.num_vectors - 1][vector_type];
+}
+
 inline function_instance::
 function_instance (const char *base_name_in, const function_base *base_in,
 		   const function_shape *shape_in,
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
index bdf6fcb98d6..e228dc5995d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
@@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u16_m_untied, svuint16_t,
 
 /*
 ** mul_m1_u16_m:
-**	mov	(z[0-9]+)\.b, #-1
-**	mul	z0\.h, p0/m, z0\.h, \1\.h
+**	neg	z0\.h, p0/m, z0\.h
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u16_m, svuint16_t,
@@ -569,7 +568,7 @@ TEST_UNIFORM_Z (mul_255_u16_x, svuint16_t,
 
 /*
 ** mul_m1_u16_x:
-**	mul	z0\.h, z0\.h, #-1
+**	neg	z0\.h, p0/m, z0\.h
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u16_x, svuint16_t,
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
index a61e85fa12d..e8f52c9d785 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
@@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u32_m_untied, svuint32_t,
 
 /*
 ** mul_m1_u32_m:
-**	mov	(z[0-9]+)\.b, #-1
-**	mul	z0\.s, p0/m, z0\.s, \1\.s
+**	neg	z0\.s, p0/m, z0\.s
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u32_m, svuint32_t,
@@ -569,7 +568,7 @@ TEST_UNIFORM_Z (mul_255_u32_x, svuint32_t,
 
 /*
 ** mul_m1_u32_x:
-**	mul	z0\.s, z0\.s, #-1
+**	neg	z0\.s, p0/m, z0\.s
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u32_x, svuint32_t,
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
index eee1f8a0c99..2ccdc3642c5 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
@@ -183,14 +183,25 @@ TEST_UNIFORM_Z (mul_3_u64_m_untied, svuint64_t,
 
 /*
 ** mul_m1_u64_m:
-**	mov	(z[0-9]+)\.b, #-1
-**	mul	z0\.d, p0/m, z0\.d, \1\.d
+**	neg	z0\.d, p0/m, z0\.d
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u64_m, svuint64_t,
 		z0 = svmul_n_u64_m (p0, z0, -1),
 		z0 = svmul_m (p0, z0, -1))
 
+/*
+** mul_m1r_u64_m:
+**	mov	(z[0-9]+)\.b, #-1
+**	mov	(z[0-9]+\.d), z0\.d
+**	movprfx	z0, \1
+**	neg	z0\.d, p0/m, \2
+**	ret
+*/
+TEST_UNIFORM_Z (mul_m1r_u64_m, svuint64_t,
+		z0 = svmul_u64_m (p0, svdup_u64 (-1), z0),
+		z0 = svmul_m (p0, svdup_u64 (-1), z0))
+
 /*
 ** mul_u64_z_tied1:
 **	movprfx	z0\.d, p0/z, z0\.d
@@ -597,13 +608,22 @@ TEST_UNIFORM_Z (mul_255_u64_x, svuint64_t,
 
 /*
 ** mul_m1_u64_x:
-**	mul	z0\.d, z0\.d, #-1
+**	neg	z0\.d, p0/m, z0\.d
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u64_x, svuint64_t,
 		z0 = svmul_n_u64_x (p0, z0, -1),
 		z0 = svmul_x (p0, z0, -1))
 
+/*
+** mul_m1r_u64_x:
+**	neg	z0\.d, p0/m, z0\.d
+**	ret
+*/
+TEST_UNIFORM_Z (mul_m1r_u64_x, svuint64_t,
+		z0 = svmul_u64_x (p0, svdup_u64 (-1), z0),
+		z0 = svmul_x (p0, svdup_u64 (-1), z0))
+
 /*
 ** mul_m127_u64_x:
 **	mul	z0\.d, z0\.d, #-127
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
index 06ee1b3e7c8..8e53a4821f0 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
@@ -174,8 +174,7 @@ TEST_UNIFORM_Z (mul_3_u8_m_untied, svuint8_t,
 
 /*
 ** mul_m1_u8_m:
-**	mov	(z[0-9]+)\.b, #-1
-**	mul	z0\.b, p0/m, z0\.b, \1\.b
+**	neg	z0\.b, p0/m, z0\.b
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u8_m, svuint8_t,
@@ -559,7 +558,7 @@ TEST_UNIFORM_Z (mul_128_u8_x, svuint8_t,
 
 /*
 ** mul_255_u8_x:
-**	mul	z0\.b, z0\.b, #-1
+**	neg	z0\.b, p0/m, z0\.b
 **	ret
 */
 TEST_UNIFORM_Z (mul_255_u8_x, svuint8_t,
@@ -568,7 +567,7 @@ TEST_UNIFORM_Z (mul_255_u8_x, svuint8_t,
 
 /*
 ** mul_m1_u8_x:
-**	mul	z0\.b, z0\.b, #-1
+**	neg	z0\.b, p0/m, z0\.b
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u8_x, svuint8_t,

Patch

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 1c9f515a52c..6df14a8f4c4 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -769,24 +769,33 @@  public:
       return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
 
     /* If the divisor is all integer -1, fold to svneg.  */
-    tree pg = gimple_call_arg (f.call, 0);
-    if (!f.type_suffix (0).unsigned_p && integer_minus_onep (op2))
+    if (integer_minus_onep (op2))
       {
-	function_instance instance ("svneg", functions::svneg,
-				    shapes::unary, MODE_none,
-				    f.type_suffix_ids, GROUP_none, f.pred);
-	gcall *call = f.redirect_call (instance);
-	unsigned offset_index = 0;
-	if (f.pred == PRED_m)
+	auto div_by_m1 = [](gimple_folder &f) -> gcall *
 	  {
-	    offset_index = 1;
-	    gimple_call_set_arg (call, 0, op1);
-	  }
-	else
-	  gimple_set_num_ops (call, 5);
-	gimple_call_set_arg (call, offset_index, pg);
-	gimple_call_set_arg (call, offset_index + 1, op1);
-	return call;
+	    tree pg = gimple_call_arg (f.call, 0);
+	    tree op1 = gimple_call_arg (f.call, 1);
+	    type_suffix_pair signed_tsp =
+	      {signed_type_suffix_index (f.type_suffix (0).element_bits),
+		f.type_suffix_ids[1]};
+	    function_instance instance ("svneg", functions::svneg,
+					shapes::unary, MODE_none,
+					signed_tsp, GROUP_none, f.pred);
+	    gcall *call = f.redirect_call (instance);
+	    unsigned offset = 0;
+	    if (f.pred == PRED_m)
+	      {
+		offset = 1;
+		gimple_call_set_arg (call, 0, op1);
+	      }
+	    else
+	      gimple_set_num_ops (call, 5);
+	    gimple_call_set_arg (call, offset, pg);
+	    gimple_call_set_arg (call, offset + 1, op1);
+	    return call;
+	  };
+	tree ty = f.signed_type (f.type_suffix (0).element_bits);
+	return f.convert_and_fold (ty, div_by_m1);
       }
 
     /* If the divisor is a uniform power of 2, fold to a shift
@@ -2082,33 +2091,49 @@  public:
       return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs)));
 
     /* If one of the operands is all integer -1, fold to svneg.  */
-    tree pg = gimple_call_arg (f.call, 0);
-    tree negated_op = NULL;
-    if (integer_minus_onep (op2))
-      negated_op = op1;
-    else if (integer_minus_onep (op1))
-      negated_op = op2;
-    if (!f.type_suffix (0).unsigned_p && negated_op)
+     if (integer_minus_onep (op1) || integer_minus_onep (op2))
       {
-	function_instance instance ("svneg", functions::svneg,
-				    shapes::unary, MODE_none,
-				    f.type_suffix_ids, GROUP_none, f.pred);
-	gcall *call = f.redirect_call (instance);
-	unsigned offset_index = 0;
-	if (f.pred == PRED_m)
+	auto mul_by_m1 = [](gimple_folder &f) -> gcall *
 	  {
-	    offset_index = 1;
-	    gimple_call_set_arg (call, 0, op1);
-	  }
-	else
-	  gimple_set_num_ops (call, 5);
-	gimple_call_set_arg (call, offset_index, pg);
-	gimple_call_set_arg (call, offset_index + 1, negated_op);
-	return call;
+	    tree pg = gimple_call_arg (f.call, 0);
+	    tree op1 = gimple_call_arg (f.call, 1);
+	    tree op2 = gimple_call_arg (f.call, 2);
+	    tree negated_op = op1;
+	    bool negate_op1 = true;
+	    if (integer_minus_onep (op1))
+	      {
+		negated_op = op2;
+		negate_op1 = false;
+	      }
+	    type_suffix_pair signed_tsp =
+	      {signed_type_suffix_index (f.type_suffix (0).element_bits),
+		f.type_suffix_ids[1]};
+	    function_instance instance ("svneg", functions::svneg,
+					shapes::unary, MODE_none,
+					signed_tsp, GROUP_none, f.pred);
+	    gcall *call = f.redirect_call (instance);
+	    unsigned offset = 0;
+	    if (f.pred == PRED_m)
+	      {
+		offset = 1;
+		tree ty = f.signed_type (f.type_suffix (0).element_bits);
+		tree inactive = negate_op1 ? gimple_call_arg (f.call, 1)
+		  : build_minus_one_cst (ty);
+		gimple_call_set_arg (call, 0, inactive);
+	      }
+	    else
+	      gimple_set_num_ops (call, 5);
+	    gimple_call_set_arg (call, offset, pg);
+	    gimple_call_set_arg (call, offset + 1, negated_op);
+	    return call;
+	  };
+	tree ty = f.signed_type (f.type_suffix (0).element_bits);
+	return f.convert_and_fold (ty, mul_by_m1);
       }
 
     /* If one of the operands is a uniform power of 2, fold to a left shift
        by immediate.  */
+    tree pg = gimple_call_arg (f.call, 0);
     tree op1_cst = uniform_integer_cst_p (op1);
     tree op2_cst = uniform_integer_cst_p (op2);
     tree shift_op1, shift_op2 = NULL;
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 44b7f6edae5..6523a782dac 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -3576,6 +3576,46 @@  gimple_folder::redirect_pred_x ()
   return redirect_call (instance);
 }
 
+/* Convert all non-boolean vector-type operands to TYPE, fold the call
+   using callback F, and convert the result back to the original type.
+   Add the necessary conversion statements. Return the updated call.  */
+gimple *
+gimple_folder::convert_and_fold (tree type, gcall *(*fp) (gimple_folder &))
+{
+  gcc_assert (VECTOR_TYPE_P (type)
+	      && TYPE_MODE (type) != VNx16BImode);
+  tree old_type = TREE_TYPE (lhs);
+  if (useless_type_conversion_p (type, old_type))
+    return fp (*this);
+
+  unsigned int num_args = gimple_call_num_args (call);
+  gimple_seq stmts = NULL;
+  tree op, op_type, t1, t2, t3;
+  gimple *g;
+  gcall *new_call;
+  for (unsigned int i = 0; i < num_args; ++i)
+    {
+      op = gimple_call_arg (call, i);
+      op_type = TREE_TYPE (op);
+      if (VECTOR_TYPE_P (op_type)
+	  && TREE_CODE (op) != VECTOR_CST
+	  && TYPE_MODE (op_type) != VNx16BImode)
+	{
+	  t1 = gimple_build (&stmts, VIEW_CONVERT_EXPR, type, op);
+	  gimple_call_set_arg (call, i, t1);
+	}
+    }
+
+  new_call = fp (*this);
+  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+  t2 = create_tmp_var (gimple_call_return_type (new_call));
+  gimple_call_set_lhs (new_call, t2);
+  t3 = build1 (VIEW_CONVERT_EXPR, old_type, t2);
+  g = gimple_build_assign (lhs, VIEW_CONVERT_EXPR, t3);
+  gsi_insert_after (gsi, g, GSI_SAME_STMT);
+  return new_call;
+}
+
 /* Fold the call to constant VAL.  */
 gimple *
 gimple_folder::fold_to_cstu (poly_uint64 val)
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h
index d5cc6e0a40d..b48fed0df6b 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins.h
@@ -406,6 +406,7 @@  public:
   tree scalar_type (unsigned int) const;
   tree vector_type (unsigned int) const;
   tree tuple_type (unsigned int) const;
+  tree signed_type (unsigned int) const;
   unsigned int elements_per_vq (unsigned int) const;
   machine_mode vector_mode (unsigned int) const;
   machine_mode tuple_mode (unsigned int) const;
@@ -630,6 +631,7 @@  public:
 
   gcall *redirect_call (const function_instance &);
   gimple *redirect_pred_x ();
+  gimple *convert_and_fold (tree, gcall *(*) (gimple_folder &));
 
   gimple *fold_to_cstu (poly_uint64);
   gimple *fold_to_pfalse ();
@@ -860,6 +862,20 @@  find_type_suffix (type_class_index tclass, unsigned int element_bits)
   gcc_unreachable ();
 }
 
+/* Return the type suffix of the signed type of width ELEMENT_BITS.  */
+inline type_suffix_index
+signed_type_suffix_index (unsigned int element_bits)
+{
+  switch (element_bits)
+  {
+  case 8: return TYPE_SUFFIX_s8;
+  case 16: return TYPE_SUFFIX_s16;
+  case 32: return TYPE_SUFFIX_s32;
+  case 64: return TYPE_SUFFIX_s64;
+  }
+  gcc_unreachable ();
+}
+
 /* Return the single field in tuple type TYPE.  */
 inline tree
 tuple_type_field (tree type)
@@ -1025,6 +1041,20 @@  function_instance::tuple_type (unsigned int i) const
   return acle_vector_types[num_vectors - 1][type_suffix (i).vector_type];
 }
 
+/* Return the signed vector type of width ELEMENT_BITS.  */
+inline tree
+function_instance::signed_type (unsigned int element_bits) const
+{
+  switch (element_bits)
+  {
+  case 8: return acle_vector_types[0][VECTOR_TYPE_svint8_t];
+  case 16: return acle_vector_types[0][VECTOR_TYPE_svint16_t];
+  case 32: return acle_vector_types[0][VECTOR_TYPE_svint32_t];
+  case 64: return acle_vector_types[0][VECTOR_TYPE_svint64_t];
+  }
+  gcc_unreachable ();
+}
+
 /* Return the number of elements of type suffix I that fit within a
    128-bit block.  */
 inline unsigned int
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u32.c
index 1e8d6104845..fcadc05d75b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u32.c
@@ -72,6 +72,15 @@  TEST_UNIFORM_Z (div_1_u32_m_untied, svuint32_t,
 		z0 = svdiv_n_u32_m (p0, z1, 1),
 		z0 = svdiv_m (p0, z1, 1))
 
+/*
+** div_m1_u32_m_tied1:
+**	neg	z0\.s, p0/m, z0\.s
+**	ret
+*/
+TEST_UNIFORM_Z (div_m1_u32_m_tied1, svuint32_t,
+		z0 = svdiv_n_u32_m (p0, z0, -1),
+		z0 = svdiv_m (p0, z0, -1))
+
 /*
 ** div_2_u32_m_tied1:
 **	lsr	z0\.s, p0/m, z0\.s, #1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u64.c
index ab049affb63..b95d5af13d8 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/div_u64.c
@@ -72,6 +72,15 @@  TEST_UNIFORM_Z (div_1_u64_m_untied, svuint64_t,
 		z0 = svdiv_n_u64_m (p0, z1, 1),
 		z0 = svdiv_m (p0, z1, 1))
 
+/*
+** div_m1_u64_m_tied1:
+**	neg	z0\.d, p0/m, z0\.d
+**	ret
+*/
+TEST_UNIFORM_Z (div_m1_u64_m_tied1, svuint64_t,
+		z0 = svdiv_n_u64_m (p0, z0, -1),
+		z0 = svdiv_m (p0, z0, -1))
+
 /*
 ** div_2_u64_m_tied1:
 **	lsr	z0\.d, p0/m, z0\.d, #1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
index bdf6fcb98d6..e228dc5995d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u16.c
@@ -174,8 +174,7 @@  TEST_UNIFORM_Z (mul_3_u16_m_untied, svuint16_t,
 
 /*
 ** mul_m1_u16_m:
-**	mov	(z[0-9]+)\.b, #-1
-**	mul	z0\.h, p0/m, z0\.h, \1\.h
+**	neg	z0\.h, p0/m, z0\.h
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u16_m, svuint16_t,
@@ -569,7 +568,7 @@  TEST_UNIFORM_Z (mul_255_u16_x, svuint16_t,
 
 /*
 ** mul_m1_u16_x:
-**	mul	z0\.h, z0\.h, #-1
+**	neg	z0\.h, p0/m, z0\.h
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u16_x, svuint16_t,
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
index a61e85fa12d..e8f52c9d785 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u32.c
@@ -174,8 +174,7 @@  TEST_UNIFORM_Z (mul_3_u32_m_untied, svuint32_t,
 
 /*
 ** mul_m1_u32_m:
-**	mov	(z[0-9]+)\.b, #-1
-**	mul	z0\.s, p0/m, z0\.s, \1\.s
+**	neg	z0\.s, p0/m, z0\.s
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u32_m, svuint32_t,
@@ -569,7 +568,7 @@  TEST_UNIFORM_Z (mul_255_u32_x, svuint32_t,
 
 /*
 ** mul_m1_u32_x:
-**	mul	z0\.s, z0\.s, #-1
+**	neg	z0\.s, p0/m, z0\.s
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u32_x, svuint32_t,
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
index eee1f8a0c99..2ccdc3642c5 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u64.c
@@ -183,14 +183,25 @@  TEST_UNIFORM_Z (mul_3_u64_m_untied, svuint64_t,
 
 /*
 ** mul_m1_u64_m:
-**	mov	(z[0-9]+)\.b, #-1
-**	mul	z0\.d, p0/m, z0\.d, \1\.d
+**	neg	z0\.d, p0/m, z0\.d
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u64_m, svuint64_t,
 		z0 = svmul_n_u64_m (p0, z0, -1),
 		z0 = svmul_m (p0, z0, -1))
 
+/*
+** mul_m1r_u64_m:
+**	mov	(z[0-9]+)\.b, #-1
+**	mov	(z[0-9]+\.d), z0\.d
+**	movprfx	z0, \1
+**	neg	z0\.d, p0/m, \2
+**	ret
+*/
+TEST_UNIFORM_Z (mul_m1r_u64_m, svuint64_t,
+		z0 = svmul_u64_m (p0, svdup_u64 (-1), z0),
+		z0 = svmul_m (p0, svdup_u64 (-1), z0))
+
 /*
 ** mul_u64_z_tied1:
 **	movprfx	z0\.d, p0/z, z0\.d
@@ -597,13 +608,22 @@  TEST_UNIFORM_Z (mul_255_u64_x, svuint64_t,
 
 /*
 ** mul_m1_u64_x:
-**	mul	z0\.d, z0\.d, #-1
+**	neg	z0\.d, p0/m, z0\.d
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u64_x, svuint64_t,
 		z0 = svmul_n_u64_x (p0, z0, -1),
 		z0 = svmul_x (p0, z0, -1))
 
+/*
+** mul_m1r_u64_x:
+**	neg	z0\.d, p0/m, z0\.d
+**	ret
+*/
+TEST_UNIFORM_Z (mul_m1r_u64_x, svuint64_t,
+		z0 = svmul_u64_x (p0, svdup_u64 (-1), z0),
+		z0 = svmul_x (p0, svdup_u64 (-1), z0))
+
 /*
 ** mul_m127_u64_x:
 **	mul	z0\.d, z0\.d, #-127
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
index 06ee1b3e7c8..8e53a4821f0 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/mul_u8.c
@@ -174,8 +174,7 @@  TEST_UNIFORM_Z (mul_3_u8_m_untied, svuint8_t,
 
 /*
 ** mul_m1_u8_m:
-**	mov	(z[0-9]+)\.b, #-1
-**	mul	z0\.b, p0/m, z0\.b, \1\.b
+**	neg	z0\.b, p0/m, z0\.b
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u8_m, svuint8_t,
@@ -559,7 +558,7 @@  TEST_UNIFORM_Z (mul_128_u8_x, svuint8_t,
 
 /*
 ** mul_255_u8_x:
-**	mul	z0\.b, z0\.b, #-1
+**	neg	z0\.b, p0/m, z0\.b
 **	ret
 */
 TEST_UNIFORM_Z (mul_255_u8_x, svuint8_t,
@@ -568,7 +567,7 @@  TEST_UNIFORM_Z (mul_255_u8_x, svuint8_t,
 
 /*
 ** mul_m1_u8_x:
-**	mul	z0\.b, z0\.b, #-1
+**	neg	z0\.b, p0/m, z0\.b
 **	ret
 */
 TEST_UNIFORM_Z (mul_m1_u8_x, svuint8_t,