Message ID | 20240617014108.2831124-1-pan2.li@intel.com |
---|---|
State | New |
Series | [v1] Match: Support forms 7 and 8 for the unsigned .SAT_ADD |
On Mon, Jun 17, 2024 at 3:41 AM <pan2.li@intel.com> wrote:
>
> From: Pan Li <pan2.li@intel.com>
>
> When investigate the vectorization of .SAT_ADD, we notice there
> are additional 2 forms, aka form 7 and 8 for .SAT_ADD.
>
> Form 7:
>   #define DEF_SAT_U_ADD_FMT_7(T)          \
>   T __attribute__((noinline))             \
>   sat_u_add_##T##_fmt_7 (T x, T y)        \
>   {                                       \
>     return x > (T)(x + y) ? -1 : (x + y); \
>   }
>
> Form 8:
>   #define DEF_SAT_U_ADD_FMT_8(T)           \
>   T __attribute__((noinline))              \
>   sat_u_add_##T##_fmt_8 (T x, T y)         \
>   {                                        \
>     return x <= (T)(x + y) ? (x + y) : -1; \
>   }
>
> Thus, add above 2 forms to the match gimple_unsigned_integer_sat_add,
> and then the vectorizer can try to recog the pattern like form 7 and
> form 8.
>
> The below test suites are passed for this patch:
> 1. The rv64gcv fully regression test with newlib.
> 2. The rv64gcv build with glibc.
> 3. The x86 bootstrap test.
> 4. The x86 fully regression test.

OK.

Note that fold-const.cc has canonicalization for the minus one to be
put last:

      /* If the second operand is simpler than the third, swap them
         since that produces better jump optimization results.  */
      if (truth_value_p (TREE_CODE (arg0))
          && tree_swap_operands_p (op1, op2))
        {
          location_t loc0 = expr_location_or (arg0, loc);
          /* See if this can be inverted.  If it can't, possibly because
             it was a floating-point inequality comparison, don't do
             anything.  */
          tem = fold_invert_truthvalue (loc0, arg0);
          if (tem)
            return fold_build3_loc (loc, code, type, tem, op2, op1);

we might want to consider such transform in match.pd, in this case this
would allow to elide one of the patterns.

Richard.

> gcc/ChangeLog:
>
>         * match.pd: Add form 7 and 8 for the unsigned .SAT_ADD match.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/match.pd | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 99968d316ed..aae6d30a5e4 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3144,6 +3144,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (cond^ (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1)) integer_zerop)
>    integer_minus_onep (usadd_left_part_2 @0 @1)))
>
> +/* Unsigned saturation add, case 7 (branch with le):
> +   SAT_ADD = x <= (X + Y) ? (X + Y) : -1.  */
> +(match (unsigned_integer_sat_add @0 @1)
> + (cond^ (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep))
> +
> +/* Unsigned saturation add, case 8 (branch with gt):
> +   SAT_ADD = x > (X + Y) ? -1 : (X + Y).  */
> +(match (unsigned_integer_sat_add @0 @1)
> + (cond^ (gt @0 (usadd_left_part_1@2 @0 @1)) integer_minus_onep @2))
> +
>  /* Unsigned saturation sub, case 1 (branch with gt):
>     SAT_U_SUB = X > Y ? X - Y : 0  */
>  (match (unsigned_integer_sat_sub @0 @1)
> --
> 2.34.1
>
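The remark about canonicalization rests on the two new forms being the same expression with the comparison inverted and the arms swapped. A minimal sketch in plain C, assuming an 8-bit type so the check can be exhaustive; the function names and the count_mismatches helper are hypothetical and not part of the patch:

```c
#include <stdint.h>

/* Saturated value in the second operand: x > sum ? -1 : sum.  */
static uint8_t sat_add_gt(uint8_t x, uint8_t y)
{
    uint8_t sum = (uint8_t)(x + y);   /* wraps modulo 256 on overflow */
    return x > sum ? (uint8_t)-1 : sum;
}

/* Comparison inverted and arms swapped, so the -1 comes last.  */
static uint8_t sat_add_le(uint8_t x, uint8_t y)
{
    uint8_t sum = (uint8_t)(x + y);
    return x <= sum ? sum : (uint8_t)-1;
}

/* Hypothetical helper: count the (x, y) pairs where the two
   spellings disagree, over every pair of 8-bit inputs.  */
static unsigned count_mismatches(void)
{
    unsigned bad = 0;
    for (unsigned x = 0; x <= UINT8_MAX; x++)
        for (unsigned y = 0; y <= UINT8_MAX; y++)
            if (sat_add_gt((uint8_t)x, (uint8_t)y)
                != sat_add_le((uint8_t)x, (uint8_t)y))
                bad++;
    return bad;
}
```

Since the two spellings agree on every input, a single canonical arm order would in principle let one pattern cover both; whether match.pd can apply that transform here is the open question raised in the review.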
Thanks Richard for comments.

> we might want to consider such transform in match.pd, in this case this
> would allow to elide one of the patterns.

That makes much more sense to me, it is not good idea to have many
patterns for SAT_ADD, will commit this first and have a try in another
PATCH for this.

Pan

-----Original Message-----
From: Richard Biener <richard.guenther@gmail.com>
Sent: Tuesday, June 18, 2024 7:03 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; jeffreyalaw@gmail.com; rdapp.gcc@gmail.com
Subject: Re: [PATCH v1] Match: Support forms 7 and 8 for the unsigned .SAT_ADD
diff --git a/gcc/match.pd b/gcc/match.pd
index 99968d316ed..aae6d30a5e4 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3144,6 +3144,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (cond^ (ne (imagpart (IFN_ADD_OVERFLOW:c @0 @1)) integer_zerop)
   integer_minus_onep (usadd_left_part_2 @0 @1)))
 
+/* Unsigned saturation add, case 7 (branch with le):
+   SAT_ADD = x <= (X + Y) ? (X + Y) : -1.  */
+(match (unsigned_integer_sat_add @0 @1)
+ (cond^ (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep))
+
+/* Unsigned saturation add, case 8 (branch with gt):
+   SAT_ADD = x > (X + Y) ? -1 : (X + Y).  */
+(match (unsigned_integer_sat_add @0 @1)
+ (cond^ (gt @0 (usadd_left_part_1@2 @0 @1)) integer_minus_onep @2))
+
 /* Unsigned saturation sub, case 1 (branch with gt):
    SAT_U_SUB = X > Y ? X - Y : 0  */
 (match (unsigned_integer_sat_sub @0 @1)
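
For illustration only, the kind of loop the "branch with le" pattern is aimed at can be sketched as below. This is an assumption-laden example, not a test from the patch: the function names are hypothetical, the element type is fixed to uint16_t, and whether a given GCC build and target actually turns the loop into .SAT_ADD is not claimed here. The reference version uses the GCC/Clang builtin __builtin_add_overflow to state the intended saturating semantics independently.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference semantics: clamp to the maximum value when the
   unsigned addition overflows (GCC/Clang builtin).  */
static uint16_t sat_add_ref(uint16_t x, uint16_t y)
{
    uint16_t sum;
    return __builtin_add_overflow(x, y, &sum) ? UINT16_MAX : sum;
}

/* Element-wise loop written in the "x <= (x + y) ? (x + y) : -1"
   shape that the new le pattern describes.  */
static void vec_sat_u_add_u16(uint16_t *out, const uint16_t *a,
                              const uint16_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint16_t sum = (uint16_t)(a[i] + b[i]);  /* wraps on overflow */
        out[i] = a[i] <= sum ? sum : (uint16_t)-1;
    }
}
```

Calling vec_sat_u_add_u16 on a few arrays and comparing each element with sat_add_ref is a quick way to confirm the branch form computes a saturating add before inspecting what the vectorizer does with it.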