| Message ID | 20240520110048.2827900-1-pan2.li@intel.com |
|---|---|
| State | New |
| Series | [v1,1/2] Match: Support branch form for unsigned SAT_ADD |
Hi Pan,

> -----Original Message-----
> From: pan2.li@intel.com <pan2.li@intel.com>
> Sent: Monday, May 20, 2024 12:01 PM
> To: gcc-patches@gcc.gnu.org
> Cc: juzhe.zhong@rivai.ai; kito.cheng@gmail.com; Tamar Christina
> <Tamar.Christina@arm.com>; richard.guenther@gmail.com; Pan Li
> <pan2.li@intel.com>
> Subject: [PATCH v1 1/2] Match: Support branch form for unsigned SAT_ADD
>
> From: Pan Li <pan2.li@intel.com>
>
> This patch adds support for the branch form of unsigned SAT_ADD,
> for example:
>
> uint64_t
> sat_add (uint64_t x, uint64_t y)
> {
>   return (uint64_t) (x + y) >= x ? (x + y) : -1;
> }
>
> Unlike the branchless version, we leverage the simplify to convert the
> branch version of SAT_ADD into branchless form if and only if the backend
> supports IFN_SAT_ADD.  Thus, the backend can choose between the branch
> and branchless implementations of .SAT_ADD; for example, some targets
> handle branchy code more optimally.
>
> When the target implements IFN_SAT_ADD for unsigned types, before this
> patch:
>
> uint64_t sat_add_u_1_uint64_t (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   uint64_t _2;
>   __complex__ long unsigned int _6;
>   long unsigned int _7;
>
>   ;; basic block 2, loop depth 0
>   ;;  pred: ENTRY
>   _6 = .ADD_OVERFLOW (x_3(D), y_4(D));
>   _1 = REALPART_EXPR <_6>;
>   _7 = IMAGPART_EXPR <_6>;
>   if (_7 == 0)
>     goto <bb 4>; [65.00%]
>   else
>     goto <bb 3>; [35.00%]
>   ;;  succ: 4
>   ;;        3
>
>   ;; basic block 3, loop depth 0
>   ;;  pred: 2
>   ;;  succ: 4
>
>   ;; basic block 4, loop depth 0
>   ;;  pred: 3
>   ;;        2
>   # _2 = PHI <18446744073709551615(3), _1(2)>
>   return _2;
>   ;;  succ: EXIT
>
> }
>
> After this patch:
>
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _9;
>
>   ;; basic block 2, loop depth 0
>   ;;  pred: ENTRY
>   _9 = .SAT_ADD (x_3(D), y_4(D)); [tail call]
>   return _9;
>   ;;  succ: EXIT
> }
>
> The following test suites pass for this patch:
> * The x86 bootstrap test.
> * The x86 full regression test.
> * The riscv full regression test.
>
> gcc/ChangeLog:
>
> 	* match.pd: Add new simplify to convert branch SAT_ADD into
> 	branchless, if and only if the backend implements the IFN.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/match.pd | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0f9c34fa897..0547b57b3a3 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3094,6 +3094,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (match (unsigned_integer_sat_add @0 @1)
>   (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
>
> +#if GIMPLE
> +
> +/* Simplify the branch version of SAT_ADD into branchless if and only if
> +   the backend has supported the IFN_SAT_ADD.  Thus, the backend has the
> +   ability to choose branch or branchless implementation of .SAT_ADD.  */
> +
> +(simplify
> + (cond (ge (plus:c@2 @0 @1) @0) @2 integer_minus_onep)
> + (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH))
> +  (bit_ior @2 (negate (convert (lt @2 @0))))))
> +
> +(simplify
> + (cond (le @0 (plus:c@2 @0 @1)) @2 integer_minus_onep)
> + (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH))
> +  (bit_ior @2 (negate (convert (lt @2 @0))))))
> +
> +#endif

Thanks, this looks good to me!

I'll leave it up to Richard to approve.

Richard: the reason for the direct_internal_fn_supported_p check is that some
targets said they currently handle the branch version better due to the lack
of some types.
At the time I reasoned it was just a target expansion bug, but I didn't hear
anything back.

To be honest, it feels to me like we should do this unconditionally and just
have the targets that get a faster branch version handle it during expand,
since the patch series now provides a canonicalized version.

This also means we can better support targets that have the vector optab but
not the scalar one, as the above check would fail for those targets.

What do you think?

Thanks,
Tamar

> +
>  /* x > y && x != XXX_MIN --> x > y
>     x > y && x == XXX_MIN --> false . */
>  (for eqne (eq ne)
> --
> 2.34.1
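For reference, here is a minimal C sketch of the rewrite these patterns perform; it is an illustration only, not code from the patch, and the function names are hypothetical. The branchless form ORs the wrapped sum with an all-ones mask derived from the overflow test.

```c
#include <stdint.h>

/* Branch form: the shape the new match.pd patterns recognize.  */
uint64_t
sat_add_branch (uint64_t x, uint64_t y)
{
  return (x + y) >= x ? (x + y) : UINT64_MAX;
}

/* Branchless form the simplify rewrites to: if x + y wrapped around,
   (sum < x) is 1 and its negation is all-ones (the saturated value);
   otherwise the OR with zero leaves the sum unchanged.  */
uint64_t
sat_add_branchless (uint64_t x, uint64_t y)
{
  uint64_t sum = x + y;
  return sum | -(uint64_t) (sum < x);
}
```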
On Mon, May 20, 2024 at 1:50 PM Tamar Christina <Tamar.Christina@arm.com> wrote:
>
> Hi Pan,
>
> > [...]
> >
> > +#if GIMPLE
> > +
> > +/* Simplify the branch version of SAT_ADD into branchless if and only if
> > +   the backend has supported the IFN_SAT_ADD.  Thus, the backend has the
> > +   ability to choose branch or branchless implementation of .SAT_ADD.  */

This comment, or part of the description above, should say that this
simplifies

  (x + y) >= x ? (x + y) : -1

as

  (x + y) | (-(typeof(x))((x + y) < x))

> > +(simplify
> > + (cond (ge (plus:c@2 @0 @1) @0) @2 integer_minus_onep)
> > + (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH))
> > +  (bit_ior @2 (negate (convert (lt @2 @0))))))
> > +
> > +(simplify
> > + (cond (le @0 (plus:c@2 @0 @1)) @2 integer_minus_onep)
> > + (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH))
> > +  (bit_ior @2 (negate (convert (lt @2 @0))))))

and this should probably be (gt @2 @0)?  This misses INTEGER_TYPE_P
constraints, and it's supposed to be only for TYPE_UNSIGNED?

> > +
> > +#endif
>
> Thanks, this looks good to me!
>
> I'll leave it up to Richard to approve.
>
> Richard: the reason for the direct_internal_fn_supported_p check is that
> some targets said they currently handle the branch version better due to
> the lack of some types.  At the time I reasoned it was just a target
> expansion bug, but I didn't hear anything back.
>
> To be honest, it feels to me like we should do this unconditionally and
> just have the targets that get a faster branch version handle it during
> expand, since the patch series now provides a canonicalized version.

I'm not sure this is a good canonical form.

  __imag .ADD_OVERFLOW (x, y) ? __real .ADD_OVERFLOW (x, y) : -1

would be better IMO.  It can be made branchless by using a COND_EXPR.

> This also means we can better support targets that have the vector optab
> but not the scalar one, as the above check would fail for those targets.
>
> What do you think?
>
> Thanks,
> Tamar
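For illustration only (my sketch, not code from the thread): at the C level, the .ADD_OVERFLOW form corresponds to GCC's __builtin_add_overflow, whose boolean result plays the role of the __imag (overflow) part and whose stored sum is the __real part. The sketch below selects the saturated value on overflow, matching the GIMPLE dump earlier in the thread; the function name is hypothetical.

```c
#include <stdint.h>

/* C-level analogue of selecting on __imag .ADD_OVERFLOW (x, y):
   __builtin_add_overflow returns the overflow flag (the __imag part)
   and stores the wrapped sum (the __real part) into *sum.  */
uint64_t
sat_add_overflow (uint64_t x, uint64_t y)
{
  uint64_t sum;
  return __builtin_add_overflow (x, y, &sum) ? UINT64_MAX : sum;
}
```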
```diff
diff --git a/gcc/match.pd b/gcc/match.pd
index 0f9c34fa897..0547b57b3a3 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3094,6 +3094,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (match (unsigned_integer_sat_add @0 @1)
  (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
 
+#if GIMPLE
+
+/* Simplify the branch version of SAT_ADD into branchless if and only if
+   the backend has supported the IFN_SAT_ADD.  Thus, the backend has the
+   ability to choose branch or branchless implementation of .SAT_ADD.  */
+
+(simplify
+ (cond (ge (plus:c@2 @0 @1) @0) @2 integer_minus_onep)
+ (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH))
+  (bit_ior @2 (negate (convert (lt @2 @0))))))
+
+(simplify
+ (cond (le @0 (plus:c@2 @0 @1)) @2 integer_minus_onep)
+ (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH))
+  (bit_ior @2 (negate (convert (lt @2 @0))))))
+
+#endif
+
 /* x > y && x != XXX_MIN --> x > y
    x > y && x == XXX_MIN --> false . */
 (for eqne (eq ne)
```
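As a usage note (hypothetical examples, not from the patch's testsuite): the two simplify patterns cover the two equivalent source spellings of the overflow check, since `(x + y) >= x` and `x <= (x + y)` are the same condition written from either side.

```c
#include <stdint.h>

/* Matched by the first pattern: (cond (ge (plus:c@2 @0 @1) @0) ...).  */
uint64_t
sat_add_ge (uint64_t x, uint64_t y)
{
  return (x + y) >= x ? (x + y) : -1;
}

/* Matched by the second pattern: (cond (le @0 (plus:c@2 @0 @1)) ...).  */
uint64_t
sat_add_le (uint64_t x, uint64_t y)
{
  return x <= (x + y) ? (x + y) : -1;
}
```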