[v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call

Message ID 20240710092753.3810342-1-pan2.li@intel.com
State New

Commit Message

Li, Pan2 July 10, 2024, 9:27 a.m. UTC
From: Pan Li <pan2.li@intel.com>

The .SAT_ADD has 2 operands, and one of them may be an INTEGER_CST.
For example, _1 = .SAT_ADD (_2, 9) comes from the sample code below.

Form 3:
  #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM)                          \
  T __attribute__((noinline))                                          \
  vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \
  {                                                                    \
    unsigned i;                                                        \
    T ret;                                                             \
    for (i = 0; i < limit; i++)                                        \
      {                                                                \
        out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \
      }                                                                \
  }

DEF_VEC_SAT_U_ADD_IMM_FMT_3(uint64_t, 9)

It fails to vectorize because vectorizable_call checks that the
operand types are compatible, but the imm is treated as unsigned
SImode from the perspective of the tree.  Aka:

uint64_t _1;
uint64_t _2;

_1 = .SAT_ADD (_2, 9);

The _1 and _2 are unsigned DImode, which differs from the imm 9 in unsigned
SImode, and thus vectorizable_call fails.  This patch would like to promote
the imm operand to the type of the other operand if and only if there is no
precision/data loss.  Aka convert the imm 9 to DImode for the above example.
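
In gimple terms, the intended effect is (a sketch; the constant keeps its
value, only its tree type changes):

  _1 = .SAT_ADD (_2, 9);  /* before: _2 is uint64_t (DImode), 9 is SImode */
  _1 = .SAT_ADD (_2, 9);  /* after:  _2 is uint64_t (DImode), 9 is DImode */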

The following test suites passed for this patch:
1. The rv64gcv fully regression tests.
2. The rv64gcv build with glibc.
3. The x86 bootstrap tests.
4. The x86 fully regression tests.

gcc/ChangeLog:

	* tree-vect-patterns.cc (vect_recog_promote_cst_to_unsigned): Add
	new function to promote the imm tree to the target type.
	(vect_recog_sat_add_pattern): Perform the type promotion before
	generating the .SAT_ADD call.

Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/tree-vect-patterns.cc | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Comments

Richard Biener July 10, 2024, 11:35 a.m. UTC | #1
On Wed, Jul 10, 2024 at 11:28 AM <pan2.li@intel.com> wrote:
>
> [...]
>
> It fails to vectorize because vectorizable_call checks that the
> operand types are compatible, but the imm is treated as unsigned
> SImode from the perspective of the tree.

I think that's a bug.  Do you say __builtin_add_overflow fails to promote
(constant) arguments?

Li, Pan2 July 10, 2024, 11:04 p.m. UTC | #2
> I think that's a bug.  Do you say __builtin_add_overflow fails to promote
> (constant) arguments?

Thanks Richard. Not very sure which part results in the type mismatch; will take a look and keep you posted.

Pan

Li, Pan2 July 16, 2024, 1:21 p.m. UTC | #3
> I think that's a bug.  Do you say __builtin_add_overflow fails to promote
> (constant) arguments?

I double checked the 022t.ssa pass for the __builtin_add_overflow operand tree types. It looks like
the 2 operands of .ADD_OVERFLOW have different tree types when one of them is constant:
one is unsigned DI, and the other is int.

(gdb) call debug_gimple_stmt(stmt)
_14 = .ADD_OVERFLOW (_4, 129);
(gdb) call debug_tree (gimple_call_arg(stmt, 0))
 <ssa_name 0x7ffff6a0ddc8
    type <integer_type 0x7ffff6a437e0 long unsigned int sizes-gimplified public unsigned DI
        size <integer_cst 0x7ffff6a391b0 constant 64>
        unit-size <integer_cst 0x7ffff6a391c8 constant 8>
        align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff6a437e0 precision:64 min <integer_cst 0x7ffff6a39480 0> max <integer_cst 0x7ffff6a03660 18446744073709551615>
        pointer_to_this <pointer_type 0x7ffff6a522a0>>
    visited
    def_stmt _4 = *_3;
    version:4>
(gdb) call debug_tree (gimple_call_arg(stmt, 1))
 <integer_cst 0x7ffff6beac78 type <integer_type 0x7ffff6a435e8 int> constant 129>
(gdb)

Then, going into the vect pass, we can also see that the ops of .ADD_OVERFLOW have different tree types.
In my understanding, we should have unsigned DI here for the constant operand:

(gdb) layout src
(gdb) list
506                                                                         if (gimple_call_num_args (_c4) == 2)
507                                                                           {
508                                                                             tree _q40 = gimple_call_arg (_c4, 0);
509                                                                             _q40 = do_valueize (valueize, _q40);
510                                                                             tree _q41 = gimple_call_arg (_c4, 1);
511                                                                             _q41 = do_valueize (valueize, _q41);
512                                                                             if (integer_zerop (_q21))
513                                                                               {
514                                                                                 if (integer_minus_onep (_p1))
515                                                                                   {
(gdb) call debug_tree (_q40)
 <ssa_name 0x7ffff6a0ddc8
    type <integer_type 0x7ffff6a437e0 long unsigned int sizes-gimplified public unsigned DI
        size <integer_cst 0x7ffff6a391b0 constant 64>
        unit-size <integer_cst 0x7ffff6a391c8 constant 8>
        align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff6a437e0 precision:64 min <integer_cst 0x7ffff6a39480 0> max <integer_cst 0x7ffff6a03660 18446744073709551615>
        pointer_to_this <pointer_type 0x7ffff6a522a0>>
    visited
    def_stmt _4 = *_3;
    version:4>
(gdb) call debug_tree (_q41)
 <integer_cst 0x7ffff6beac78 type <integer_type 0x7ffff6a435e8 int> constant 129>

Pan

Richard Biener July 16, 2024, 3:29 p.m. UTC | #4
On Tue, Jul 16, 2024 at 3:22 PM Li, Pan2 <pan2.li@intel.com> wrote:
>
> > I think that's a bug.  Do you say __builtin_add_overflow fails to promote
> > (constant) arguments?
>
> I double checked the 022t.ssa pass for the __builtin_add_overflow operand tree types.
> It looks like the 2 operands of .ADD_OVERFLOW have different tree types when one of
> them is constant: one is unsigned DI, and the other is int.

I think that's a bug (and a downside of internal functions, as they have
no prototype the type verifier could work with).

That you see them in 022t.ssa means that either the frontend mishandles
the builtin call parsing, or fold_builtin_arith_overflow, which is
responsible for the rewriting to an internal function, is wrong.

I've CCed Jakub who added those.

I think we could add verification for internal functions in the set of
commutative_binary_fn_p, commutative_ternary_fn_p, associative_binary_fn_p
and possibly others where we can constrain argument and result types.
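
A rough sketch of what such verification could look like (hypothetical
placement and wording, e.g. in the gimple call verifier; IFNs that
legitimately take mixed operand types would need to be exempted):

  /* For commutative binary internal functions we could require
     matching operand types.  */
  if (gimple_call_internal_p (stmt)
      && commutative_binary_fn_p (gimple_call_internal_fn (stmt))
      && !types_compatible_p (TREE_TYPE (gimple_call_arg (stmt, 0)),
			      TREE_TYPE (gimple_call_arg (stmt, 1))))
    {
      error ("mismatching operand types in internal function call");
      return true;
    }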

Richard.

Li, Pan2 Aug. 17, 2024, 5:03 a.m. UTC | #5
Thanks Richard for the confirmation.  Sorry, I almost forgot this thread.

Hi Jakub,

Please feel free to let me know if there is anything I can do to fix this issue. Thanks a lot.

Pan


Jakub Jelinek Aug. 17, 2024, 9:21 p.m. UTC | #6
On Sat, Aug 17, 2024 at 05:03:14AM +0000, Li, Pan2 wrote:
> Thanks Richard for the confirmation.  Sorry, I almost forgot this thread.
> 
> Please feel free to let me know if there is anything I can do to fix this issue. Thanks a lot.

There is no bug.  The operands of .{ADD,SUB,MUL}_OVERFLOW don't have to
have the same type, as described in the
__builtin_{add,sub,mul}_overflow{,_p} documentation, each argument can have
different type and result yet another one, the behavior is then (as if) to
perform the operation in infinite precision and if that result fits into
the result type, there is no overflow, otherwise there is.
So, there is no need to promote anything, promoted constants would have the
same value as the non-promoted ones and the value is all that matters for
constants.
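
A minimal example of those semantics (hypothetical values, mixed operand
types; not taken from the patch):

  unsigned char r;
  /* 200u is unsigned int, 100 is int, the result type is unsigned char.
     The sum 300 is computed as if in infinite precision; it does not fit
     in unsigned char, so the builtin returns true (overflow) and stores
     the wrapped value 44.  */
  int ovf = __builtin_add_overflow (200u, 100, &r);  /* ovf == 1, r == 44 */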

	Jakub
Li, Pan2 Aug. 18, 2024, 2:05 a.m. UTC | #7
Thanks Jakub for explaining.

Hi Richard,

Does it mean we need to do some promotion, similar to this patch, to make vectorizable_call happy
when there is a constant operand?  I am not sure if there is a better approach for this case.

Pan 

-----Original Message-----
From: Jakub Jelinek <jakub@redhat.com> 
Sent: Sunday, August 18, 2024 5:21 AM
To: Li, Pan2 <pan2.li@intel.com>
Cc: Richard Biener <richard.guenther@gmail.com>; gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; tamar.christina@arm.com; jeffreyalaw@gmail.com; rdapp.gcc@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
Subject: Re: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call

On Sat, Aug 17, 2024 at 05:03:14AM +0000, Li, Pan2 wrote:
> Thanks Richard for confirmation. Sorry almost forget this thread.
> 
> Please feel free to let me know if there is anything I can do to fix this issue. Thanks a lot.

There is no bug.  The operands of .{ADD,SUB,MUL}_OVERFLOW don't have to
have the same type, as described in the
__builtin_{add,sub,mul}_overflow{,_p} documentation, each argument can have
different type and result yet another one, the behavior is then (as if) to
perform the operation in infinite precision and if that result fits into
the result type, there is no overflow, otherwise there is.
So, there is no need to promote anything, promoted constants would have the
same value as the non-promoted ones and the value is all that matters for
constants.

	Jakub
Tamar Christina Aug. 19, 2024, 1:55 p.m. UTC | #8
Hi Pan,

> 
> Thanks Jakub for explaining.
> 
> Hi Richard,
> 
> Does it mean we need to do some promotion, similar to this patch, to make
> vectorizable_call happy when there is a constant operand?  I am not sure if
> there is a better approach for this case.

I'll leave it up to Richi, but if I understand correctly, it looks like this is only a problem for constants because
unsigned_integer_sat_add in the constant case doesn't enforce that the types be equal, most likely specifically
so we can do the type coercions later?

But the constant is always operand2 in that case, and you check that the result
type matches the first operand.  So only operand2 can be in a different type.

So would this not be the simplest fix:

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 87b3dc413b8..fcbc83a49f0 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -4558,6 +4558,9 @@ vect_recog_sat_add_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,

   if (gimple_unsigned_integer_sat_add (lhs, ops, NULL))
     {
+      if (TREE_CODE (ops[1]) == INTEGER_CST)
+       ops[1] = fold_convert (TREE_TYPE (ops[0]), ops[1]);
+
       gimple *stmt = vect_recog_build_binary_gimple_stmt (vinfo, stmt_vinfo,
                                                          IFN_SAT_ADD, type_out,
                                                          lhs, ops[0], ops[1]);

ps. You have the same problem at every usage site of unsigned_integer_sat_add, don't you?
So doesn't the scalar variant also introduce the mismatch?  The one in tree-ssa-math-opts.cc?

Thanks,
Tamar

Jakub Jelinek Aug. 19, 2024, 7:25 p.m. UTC | #9
On Mon, Aug 19, 2024 at 01:55:38PM +0000, Tamar Christina wrote:
> So would this not be the simplest fix:
> 
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 87b3dc413b8..fcbc83a49f0 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -4558,6 +4558,9 @@ vect_recog_sat_add_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
> 
>    if (gimple_unsigned_integer_sat_add (lhs, ops, NULL))

But then you call gimple_unsigned_integer_sat_add with mismatching types,
not sure if that is ok.

>      {
> +      if (TREE_CODE (ops[1]) == INTEGER_CST)
> +       ops[1] = fold_convert (TREE_TYPE (ops[0]), ops[1]);
> +

This would only be ok if the conversion doesn't change the value
of the constant.
.ADD_OVERFLOW etc. could have e.g. int and unsigned arguments; you don't
want to change the latter to the former if the value has the most
significant bit set.
Similarly, .ADD_OVERFLOW could have e.g. unsigned and unsigned __int128
arguments; you don't want to truncate the constant.
So, you could e.g. do the fold_convert and then verify if
wi::to_widest on the old and new tree are equal, or you could check for
TREE_OVERFLOW if fold_convert honors that.
As I said, for INTEGER_CST operands of .ADD/SUB/MUL_OVERFLOW, the infinite
precision value (aka wi::to_widest) is all that matters.
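
A minimal sketch of that check (a hypothetical helper, not the committed fix):

  /* Convert *OP to TYPE only when the infinite-precision value is
     preserved; return whether the conversion was done.  */
  static bool
  convert_sat_add_cst (tree *op, tree type)
  {
    if (TREE_CODE (*op) != INTEGER_CST)
      return false;
    tree cst = fold_convert (type, *op);
    if (TREE_OVERFLOW (cst)
	|| wi::to_widest (cst) != wi::to_widest (*op))
      return false;
    *op = cst;
    return true;
  }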

	Jakub
Tamar Christina Aug. 19, 2024, 7:40 p.m. UTC | #10
> On Mon, Aug 19, 2024 at 01:55:38PM +0000, Tamar Christina wrote:
> > So would this not be the simplest fix:
> >
> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > index 87b3dc413b8..fcbc83a49f0 100644
> > --- a/gcc/tree-vect-patterns.cc
> > +++ b/gcc/tree-vect-patterns.cc
> > @@ -4558,6 +4558,9 @@ vect_recog_sat_add_pattern (vec_info *vinfo,
> stmt_vec_info stmt_vinfo,
> >
> >    if (gimple_unsigned_integer_sat_add (lhs, ops, NULL))
> 
> But then you call gimple_unsigned_integer_sat_add with mismatching types,
> not sure if that is ok.
> 

gimple_unsigned_integer_sat_add is a match.pd predicate. It matches the expression
rooted in lhs and returns the results in ops.  So not sure what you mean here.

> >      {
> > +      if (TREE_CODE (ops[1]) == INTEGER_CST)
> > +       ops[1] = fold_convert (TREE_TYPE (ops[0]), ops[1]);
> > +
> 
> This would be only ok if the conversion doesn't change the value
> of the constant.
> .ADD_OVERFLOW etc. could have e.g. int and unsigned arguments, you don't
> want to change the latter to the former if the value has the most
> significant bit set.
> Similarly, .ADD_OVERFLOW could have e.g. unsigned and unsigned __int128
> arguments, you don't want to truncate the constant.

Yes, if the conversion truncates or changes the sign of the expression, this wouldn't
work.  But then you also cannot match at all.  So the match should be rejected,
since the values need to fit in the same type as the argument and be the same sign.

So the original vect_recog_promote_cst_to_unsigned is also wrong since it doesn't
stop the match if it doesn't fit.

Imho, the pattern here cannot check this, and it should be part of the match condition.
If the constant cannot fit into the same type as the operand, or has a different sign,
the matching should fail.

It's just that in match.pd you can't modify the arguments returned from a predicate;
but since the predicate is intended to be used to rewrite to the IFN, I still think the
above solution is right, and the range check should be done within the predicate.

Tamar.

Li, Pan2 Aug. 20, 2024, 12:57 a.m. UTC | #11
Thanks Jakub and Tamar for comments and suggestions.

The match.pd pattern below doesn't check the INTEGER_CST type for .SAT_ADD:

(match (unsigned_integer_sat_add @0 @1)
(cond^ (ne (imagpart (IFN_ADD_OVERFLOW@2 @0 INTEGER_CST@1)) integer_zerop)
  integer_minus_onep (realpart @2))
  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
      && types_match (type, @0))))

Thus .ADD_OVERFLOW with different operand types can hit the pattern.  vectorizable_call strictly
checks that the operands are exactly the same, while the scalar path has no similar check.  That
is why I only found this issue in the vector part.

It looks like we need to add a type check for the INTEGER_CST in the match.pd predicate, and then add an explicit cast
to the IMM in the source code to match the pattern.  For example, as below; not very sure whether it is reasonable or not.

#define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM)                          \
T __attribute__((noinline))                                          \
vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \
{                                                                    \
  unsigned i;                                                        \
  T ret;                                                             \
  for (i = 0; i < limit; i++)                                        \
    {                                                                \
      out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \   // need add (T)IMM, aka out[i] = __builtin_add_overflow (in[i], (T)IMM, &ret) ? -1 : ret; to hit the pattern.
    }                                                                \
}

Pan

Tamar Christina Aug. 20, 2024, 7:55 a.m. UTC | #12
Hi Pan,

> [...]
>
> Thus .ADD_OVERFLOW with different operand types can hit the pattern.
> vectorizable_call strictly checks that the operands are exactly the same,
> while the scalar path has no similar check.  That is why I only found this
> issue in the vector part.

Yeah, and my question was more why we are getting away with it for the scalar.
So I implemented the optabs so I can take a look.

It looks like the scalar version doesn't match because the split-paths pass rewrites the IL
when the argument is a constant.  Passing -fno-split-paths gets it to generate
the instruction, where we see that the IFN will then also contain mixed types.

#include <stdint.h>
#include <stdio.h>

  #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM)                          \
  T __attribute__((noinline))                                          \
  vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \
  {                                                                    \
    unsigned i;                                                        \
    T ret;                                                             \
    for (i = 0; i < limit; i++)                                        \
      {                                                                \
        out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \
      }                                                                \
  }

#define CST -9LL
DEF_VEC_SAT_U_ADD_IMM_FMT_3(uint8_t, CST)

int
main ()
{
  uint8_t a = 1;
  uint8_t r = 0;
  vec_sat_u_add_immCST_uint8_t_fmt_3 (&r, &a, 1);
  printf ("r=%u\n", r);
}

Generates:

        movi    v31.8b, 0xfffffffffffffff7
        uxtw    x2, w2
        mov     x3, 0
        .p2align 5,,15
.L3:
        ldr     b30, [x1, x3]
        uqadd   b30, b30, b31
        str     b30, [x0, x3]
        add     x3, x3, 1
        cmp     x2, x3
        bne     .L3

which is incorrect: it's expected to saturate, but instead it is doing x + 0xF7.

This is because of what Richi said before, there's nothing else in GIMPLE that tries to validate the operands,
and expand will simply force the operand to the register of the size it requested and doesn't care about the outcome.

For constants that are out of range, we're getting lucky in that existing math rules will remove the operation and replace
it with -1, because the operations are known to overflow in this case, so at compile time the check goes away.  That's why
the problem doesn't show up with an out-of-range constant: there's no saturation check anymore.  But this is pure luck.

Secondly, the reason I said that

+static void
+vect_recog_promote_cst_to_unsigned (tree *op, tree type)
+{
+  if (TREE_CODE (*op) != INTEGER_CST || !TYPE_UNSIGNED (type))
+    return;
+
+  unsigned precision = TYPE_PRECISION (type);
+  wide_int type_max = wi::mask (precision, false, precision);
+  wide_int op_cst_val = wi::to_wide (*op, precision);
+
+  if (wi::leu_p (op_cst_val, type_max))
+    *op = wide_int_to_tree (type, op_cst_val);
+}
+

is wrong is that patterns are one-way.  If the operand is invalid, *op isn't rewritten, so it stays the same as it is today;
that is, it'll get rejected in vectorizable_call.  But because you've now matched here, another pattern can't match
anymore, and more importantly, it prevents us from trying any alternative way to vectorize this (if there was one).

That's why the pattern matcher shouldn't knowingly accept something we know can't get vectorized.  You shouldn't
build the pattern at all.

And the reason I suggested doing this check in the match.pd is because of an inconsistency between the variable and immediate
variant if it's not done there.

You have

(match (unsigned_integer_sat_add @0 @1)
 (cond^ (ne (imagpart (IFN_ADD_OVERFLOW@2 @0 INTEGER_CST@1)) integer_zerop)
  integer_minus_onep (realpart @2))
  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
      && types_match (type, @0))))

And

(match (unsigned_integer_sat_add @0 @1)
 (cond^ (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1)) integer_zerop)
  integer_minus_onep (usadd_left_part_2 @0 @1)))

Where 

(match (usadd_left_part_2 @0 @1)
 (realpart (IFN_ADD_OVERFLOW:c @0 @1))
 (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
      && types_match (type, @0, @1))))

so for the non-constant version we require all the operands to be unsigned and of the same type.
For the constant version we don't.

This seems a bit inconsistent to me.  Compare also the non-overflow case:

(match (unsigned_integer_sat_add @0 @1)
 (plus (min @0 INTEGER_CST@2) INTEGER_CST@1)
 (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
      && types_match (type, @0, @1))

There is a check on the sign of the constant.  So it seems to me that the constant case
of IFN_ADD_OVERFLOW is the only one lacking additional validation on the constant.
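
One possible shape for that check (a sketch on top of the existing predicate,
assuming int_fits_type_p is acceptable in the condition):

(match (unsigned_integer_sat_add @0 @1)
 (cond^ (ne (imagpart (IFN_ADD_OVERFLOW@2 @0 INTEGER_CST@1)) integer_zerop)
  integer_minus_onep (realpart @2))
  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
      && types_match (type, @0)
      /* Reject constants whose value does not fit in TYPE.  */
      && int_fits_type_p (@1, type))))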

Thanks,
Tamar

> 
> It looks like we need to add the type check for INT_CST in match.pd predicate and
> then add explicit cast
> to IMM from the source code to match the pattern. For example as below, not very
> sure it is reasonable or not.
> 
> #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM)                          \
> T __attribute__((noinline))                                          \
> vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \
> {                                                                    \
>   unsigned i;                                                        \
>   T ret;                                                             \
>   for (i = 0; i < limit; i++)                                        \
>     {                                                                \
>       out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \   // need add
> (T)IMM, aka out[i] = __builtin_add_overflow (in[i], (T)IMM, &ret) ? -1 : ret; to hit
> the pattern.
>     }                                                                \
> }
> 
> Pan
> 
> -----Original Message-----
> From: Tamar Christina <Tamar.Christina@arm.com>
> Sent: Tuesday, August 20, 2024 3:41 AM
> To: Jakub Jelinek <jakub@redhat.com>
> Cc: Li, Pan2 <pan2.li@intel.com>; Richard Biener <richard.guenther@gmail.com>;
> gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com;
> jeffreyalaw@gmail.com; rdapp.gcc@gmail.com; Liu, Hongtao
> <hongtao.liu@intel.com>
> Subject: RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for
> vectorizable_call
> 
> > -----Original Message-----
> > From: Jakub Jelinek <jakub@redhat.com>
> > Sent: Monday, August 19, 2024 8:25 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: Li, Pan2 <pan2.li@intel.com>; Richard Biener <richard.guenther@gmail.com>;
> > gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com;
> > jeffreyalaw@gmail.com; rdapp.gcc@gmail.com; Liu, Hongtao
> > <hongtao.liu@intel.com>
> > Subject: Re: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for
> > vectorizable_call
> >
> > On Mon, Aug 19, 2024 at 01:55:38PM +0000, Tamar Christina wrote:
> > > So would this not be the simplest fix:
> > >
> > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > > index 87b3dc413b8..fcbc83a49f0 100644
> > > --- a/gcc/tree-vect-patterns.cc
> > > +++ b/gcc/tree-vect-patterns.cc
> > > @@ -4558,6 +4558,9 @@ vect_recog_sat_add_pattern (vec_info *vinfo,
> > stmt_vec_info stmt_vinfo,
> > >
> > >    if (gimple_unsigned_integer_sat_add (lhs, ops, NULL))
> >
> > But then you call gimple_unsigned_integer_sat_add with mismatching types,
> > not sure if that is ok.
> >
> 
> gimple_unsigned_integer_sat_add is a match.pd predicate. It matches the
> expression
> rooted in lhs and returns the results in ops.  So not sure what you mean here.
> 
> > >      {
> > > +      if (TREE_CODE (ops[1]) == INTEGER_CST)
> > > +       ops[1] = fold_convert (TREE_TYPE (ops[0]), ops[1]);
> > > +
> >
> > This would be only ok if the conversion doesn't change the value
> > of the constant.
> > .ADD_OVERFLOW etc. could have e.g. int and unsigned arguments, you don't
> > want to change the latter to the former if the value has the most
> > significant bit set.
> > Similarly, .ADD_OVERFLOW could have e.g. unsigned and unsigned __int128
> > arguments, you don't want to truncate the constant.
> 
> Yes, if the conversion truncates or changes the sign of the expression this wouldn't
> work.  But then you also can't match at all.  So the match should be rejected,
> since the values need to fit in the same type as the argument and be the same sign.
> 
> So the original vect_recog_promote_cst_to_unsigned is also wrong since it doesn't
> stop the match if it doesn't fit.
> 
> Imho, the pattern here cannot check this and it should be part of the match
> condition.
> If the constant cannot fit into the same type as the operand or has a different sign,
> the matching should fail.
> 
> It's just that in match.pd you can't modify the arguments returned from a
> predicate,
> but since the predicate is intended to be used to rewrite to the IFN, I still think the
> above solution is right, and the range check should be done within the predicate.
> 
> Tamar.
> 
> > So, you could e.g. do the fold_convert and then verify if
> > wi::to_widest on the old and new tree are equal, or you could check for
> > TREE_OVERFLOW if fold_convert honors that.
> > As I said, for INTEGER_CST operands of .ADD/SUB/MUL_OVERFLOW, the infinite
> > precision value (aka wi::to_widest) is all that matters.
> >
> > 	Jakub
Li, Pan2 Aug. 20, 2024, 8:39 a.m. UTC | #13
Thanks Tamar for comments and explanations.

> But because you've now matched here, another pattern can't match
> anymore, and more importantly, it prevents us from trying any alternative way to vectorize this (if there was one).

> That's why the pattern matcher shouldn't knowingly accept something we know can't get vectorized.  You shouldn't
> build the pattern at all.

> And the reason I suggested doing this check in the match.pd is because of an inconsistency between the variable and immediate
> variants if it's not done there.

Got the point here. I will double check all SAT_*-related matching patterns for the INT_CST type check.

Pan

-----Original Message-----
From: Tamar Christina <Tamar.Christina@arm.com> 
Sent: Tuesday, August 20, 2024 3:56 PM
To: Li, Pan2 <pan2.li@intel.com>; Jakub Jelinek <jakub@redhat.com>
Cc: Richard Biener <richard.guenther@gmail.com>; gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; jeffreyalaw@gmail.com; rdapp.gcc@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
Subject: RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for vectorizable_call

Hi Pan,

> -----Original Message-----
> From: Li, Pan2 <pan2.li@intel.com>
> Sent: Tuesday, August 20, 2024 1:58 AM
> To: Tamar Christina <Tamar.Christina@arm.com>; Jakub Jelinek
> <jakub@redhat.com>
> Cc: Richard Biener <richard.guenther@gmail.com>; gcc-patches@gcc.gnu.org;
> juzhe.zhong@rivai.ai; kito.cheng@gmail.com; jeffreyalaw@gmail.com;
> rdapp.gcc@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
> Subject: RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for
> vectorizable_call
> 
> Thanks Jakub and Tamar for comments and suggestions.
> 
> The match.pd list as below doesn't check the INT_CST type for .SAT_ADD.
> 
> (match (unsigned_integer_sat_add @0 @1)
> (cond^ (ne (imagpart (IFN_ADD_OVERFLOW@2 @0 INTEGER_CST@1))
> integer_zerop)
>   integer_minus_onep (realpart @2))
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>       && types_match (type, @0))))
> 
> Thus different operand types of .ADD_OVERFLOW could hit the pattern.  The
> vectorizable_call strictly
> checks that the operands have the same type, while the scalar path doesn't have a similar
> check.  That
> is why I only found this issue on the vector side.

Yeah, and my question was more about why we are getting away with it for the scalar case.
I implemented the optabs, so I was able to take a look.

It looks like the scalar version doesn't match because split-path rewrites the IL
when the argument is a constant.  Passing -fno-split-paths gets it to generate
the instruction where we see that the IFN will then also contain mixed types.
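
Something like this to reproduce (assuming an aarch64 compiler; the file name
and exact flags are illustrative):

  gcc -O3 -fno-split-paths -S sat_add.c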

#include <stdint.h>
#include <stdio.h>

  #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM)                          \
  T __attribute__((noinline))                                          \
  vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \
  {                                                                    \
    unsigned i;                                                        \
    T ret;                                                             \
    for (i = 0; i < limit; i++)                                        \
      {                                                                \
        out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \
      }                                                                \
  }

#define CST -9LL
DEF_VEC_SAT_U_ADD_IMM_FMT_3(uint8_t, CST)

int
main ()
{
  uint8_t a = 1;
  uint8_t r = 0;
  vec_sat_u_add_immCST_uint8_t_fmt_3 (&r, &a, 1);
  printf ("r=%u\n", r);
}

Generates:

        movi    v31.8b, 0xfffffffffffffff7
        uxtw    x2, w2
        mov     x3, 0
        .p2align 5,,15
.L3:
        ldr     b30, [x1, x3]
        uqadd   b30, b30, b31
        str     b30, [x0, x3]
        add     x3, x3, 1
        cmp     x2, x3
        bne     .L3

which is incorrect; it's expected to saturate, but instead it is doing x + 0xF7.

This is because of what Richi said before: there's nothing else in GIMPLE that tries to validate the operands,
and expand will simply force the operand into a register of the size it requested, without caring about the outcome.

For constants that are out of range, we're getting lucky in that existing math rules will remove the operation and replace
it with -1.  Because the compiler knows the operation would always overflow in this case, the check goes away at compile time.  That's why
the problem doesn't show up with an out-of-range constant: there's no saturation check anymore.  But this is pure luck.
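
To illustrate with a made-up example (mine, not from the original report):

#include <stdint.h>

uint8_t
add_sat_oor (uint8_t x)
{
  uint8_t ret;
  /* 503 > UINT8_MAX, so the infinite-precision sum x + 503 can never
     fit in uint8_t: the overflow check folds to true and the whole
     expression folds to (uint8_t) -1, i.e. 255.  No saturation check
     survives to be mis-expanded.  */
  return __builtin_add_overflow (x, 503, &ret) ? -1 : ret;
}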

Secondly, the reason I said that

+static void
+vect_recog_promote_cst_to_unsigned (tree *op, tree type)
+{
+  if (TREE_CODE (*op) != INTEGER_CST || !TYPE_UNSIGNED (type))
+    return;
+
+  unsigned precision = TYPE_PRECISION (type);
+  wide_int type_max = wi::mask (precision, false, precision);
+  wide_int op_cst_val = wi::to_wide (*op, precision);
+
+  if (wi::leu_p (op_cst_val, type_max))
+    *op = wide_int_to_tree (type, op_cst_val);
+}
+

is wrong is that patterns are one way.  If the operand is invalid, *op isn't rewritten, so it stays the same as it is today.
That is, it'll get rejected in vectorizable_call.  But because you've now matched here, another pattern can't match
anymore, and more importantly, it prevents us from trying any alternative way to vectorize this (if there was one).

That's why the pattern matcher shouldn't knowingly accept something we know can't get vectorized.  You shouldn't
build the pattern at all.

And the reason I suggested doing this check in the match.pd is because of an inconsistency between the variable and immediate
variants if it's not done there.

You have

(match (unsigned_integer_sat_add @0 @1)
 (cond^ (ne (imagpart (IFN_ADD_OVERFLOW@2 @0 INTEGER_CST@1)) integer_zerop)
  integer_minus_onep (realpart @2))
  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
      && types_match (type, @0))))

And

(match (unsigned_integer_sat_add @0 @1)
 (cond^ (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1)) integer_zerop)
  integer_minus_onep (usadd_left_part_2 @0 @1)))

Where 

(match (usadd_left_part_2 @0 @1)
 (realpart (IFN_ADD_OVERFLOW:c @0 @1))
 (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
      && types_match (type, @0, @1))))

so for the non-constant version we require all the operands to be unsigned and of the same type.
For the constant version we don't.

This seems a bit inconsistent to me, especially when compared with the non-overflow case:

(match (unsigned_integer_sat_add @0 @1)
 (plus (min @0 INTEGER_CST@2) INTEGER_CST@1)
 (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
      && types_match (type, @0, @1))

There is a check on the sign of the constant.  So it seems to me that it's the constant case
for IFN_ADD_OVERFLOW that's the only one lacking additional validation on the constant.

Thanks,
Tamar

> 
> It looks like we need to add the type check for INT_CST in the match.pd predicate and
> then add an explicit cast
> to IMM in the source code to match the pattern. For example as below; I'm not very
> sure whether it is reasonable or not.
> 
> #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM)                          \
> T __attribute__((noinline))                                          \
> vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \
> {                                                                    \
>   unsigned i;                                                        \
>   T ret;                                                             \
>   for (i = 0; i < limit; i++)                                        \
>     {                                                                \
>       out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \   // need to add
> (T)IMM, aka out[i] = __builtin_add_overflow (in[i], (T)IMM, &ret) ? -1 : ret; to hit
> the pattern.
>     }                                                                \
> }
> 
> Pan
> 
> -----Original Message-----
> From: Tamar Christina <Tamar.Christina@arm.com>
> Sent: Tuesday, August 20, 2024 3:41 AM
> To: Jakub Jelinek <jakub@redhat.com>
> Cc: Li, Pan2 <pan2.li@intel.com>; Richard Biener <richard.guenther@gmail.com>;
> gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com;
> jeffreyalaw@gmail.com; rdapp.gcc@gmail.com; Liu, Hongtao
> <hongtao.liu@intel.com>
> Subject: RE: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for
> vectorizable_call
> 
> > -----Original Message-----
> > From: Jakub Jelinek <jakub@redhat.com>
> > Sent: Monday, August 19, 2024 8:25 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: Li, Pan2 <pan2.li@intel.com>; Richard Biener <richard.guenther@gmail.com>;
> > gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com;
> > jeffreyalaw@gmail.com; rdapp.gcc@gmail.com; Liu, Hongtao
> > <hongtao.liu@intel.com>
> > Subject: Re: [PATCH v1] Vect: Promote unsigned .SAT_ADD constant operand for
> > vectorizable_call
> >
> > On Mon, Aug 19, 2024 at 01:55:38PM +0000, Tamar Christina wrote:
> > > So would this not be the simplest fix:
> > >
> > > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > > index 87b3dc413b8..fcbc83a49f0 100644
> > > --- a/gcc/tree-vect-patterns.cc
> > > +++ b/gcc/tree-vect-patterns.cc
> > > @@ -4558,6 +4558,9 @@ vect_recog_sat_add_pattern (vec_info *vinfo,
> > stmt_vec_info stmt_vinfo,
> > >
> > >    if (gimple_unsigned_integer_sat_add (lhs, ops, NULL))
> >
> > But then you call gimple_unsigned_integer_sat_add with mismatching types,
> > not sure if that is ok.
> >
> 
> gimple_unsigned_integer_sat_add is a match.pd predicate. It matches the
> expression
> rooted in lhs and returns the results in ops.  So not sure what you mean here.
> 
> > >      {
> > > +      if (TREE_CODE (ops[1]) == INTEGER_CST)
> > > +       ops[1] = fold_convert (TREE_TYPE (ops[0]), ops[1]);
> > > +
> >
> > This would be only ok if the conversion doesn't change the value
> > of the constant.
> > .ADD_OVERFLOW etc. could have e.g. int and unsigned arguments, you don't
> > want to change the latter to the former if the value has the most
> > significant bit set.
> > Similarly, .ADD_OVERFLOW could have e.g. unsigned and unsigned __int128
> > arguments, you don't want to truncate the constant.
> 
> Yes, if the conversion truncates or changes the sign of the expression this wouldn't
> work.  But then you also can't match at all.  So the match should be rejected,
> since the values need to fit in the same type as the argument and be the same sign.
> 
> So the original vect_recog_promote_cst_to_unsigned is also wrong since it doesn't
> stop the match if it doesn't fit.
> 
> Imho, the pattern here cannot check this and it should be part of the match
> condition.
> If the constant cannot fit into the same type as the operand or has a different sign,
> the matching should fail.
> 
> It's just that in match.pd you can't modify the arguments returned from a
> predicate,
> but since the predicate is intended to be used to rewrite to the IFN, I still think the
> above solution is right, and the range check should be done within the predicate.
> 
> Tamar.
> 
> > So, you could e.g. do the fold_convert and then verify if
> > wi::to_widest on the old and new tree are equal, or you could check for
> > TREE_OVERFLOW if fold_convert honors that.
> > As I said, for INTEGER_CST operands of .ADD/SUB/MUL_OVERFLOW, the infinite
> > precision value (aka wi::to_widest) is all that matters.
> >
> > 	Jakub
diff mbox series

Patch

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 86e893a1c43..e1013222b12 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -4527,6 +4527,20 @@  vect_recog_build_binary_gimple_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
   return NULL;
 }
 
+static void
+vect_recog_promote_cst_to_unsigned (tree *op, tree type)
+{
+  if (TREE_CODE (*op) != INTEGER_CST || !TYPE_UNSIGNED (type))
+    return;
+
+  unsigned precision = TYPE_PRECISION (type);
+  wide_int type_max = wi::mask (precision, false, precision);
+  wide_int op_cst_val = wi::to_wide (*op, precision);
+
+  if (wi::leu_p (op_cst_val, type_max))
+    *op = wide_int_to_tree (type, op_cst_val);
+}
+
 /*
  * Try to detect saturation add pattern (SAT_ADD), aka below gimple:
  *   _7 = _4 + _6;
@@ -4553,6 +4567,9 @@  vect_recog_sat_add_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
 
   if (gimple_unsigned_integer_sat_add (lhs, ops, NULL))
     {
+      vect_recog_promote_cst_to_unsigned (&ops[0], TREE_TYPE (ops[1]));
+      vect_recog_promote_cst_to_unsigned (&ops[1], TREE_TYPE (ops[0]));
+
       gimple *stmt = vect_recog_build_binary_gimple_stmt (vinfo, stmt_vinfo,
 							  IFN_SAT_ADD, type_out,
 							  lhs, ops[0], ops[1]);