Message ID: 20240521105312.4112496-1-pan2.li@intel.com
State:      New
Series:     [v1,1/2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD
On Tue, May 21, 2024, 3:55 AM <pan2.li@intel.com> wrote:
> From: Pan Li <pan2.li@intel.com>
>
> This patch would like to support the __builtin_add_overflow branch form
> for unsigned SAT_ADD.  For example:
>
> uint64_t
> sat_add (uint64_t x, uint64_t y)
> {
>   uint64_t ret;
>   return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
> }
>
> Unlike the branchless version, we leverage a simplify to convert the
> branch version of SAT_ADD into the branchless one, if and only if the
> backend supports IFN_SAT_ADD.  Thus, the backend keeps the ability to
> choose a branch or branchless implementation of .SAT_ADD; for example,
> some targets can handle branching code more optimally.
>
> When the target implements IFN_SAT_ADD for unsigned types, before this
> patch:
>
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   long unsigned int _2;
>   uint64_t _3;
>   __complex__ long unsigned int _6;
>
> ;; basic block 2, loop depth 0
> ;;   pred: ENTRY
>   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
>   _2 = IMAGPART_EXPR <_6>;
>   if (_2 != 0)
>     goto <bb 4>; [35.00%]
>   else
>     goto <bb 3>; [65.00%]
> ;;   succ: 4
> ;;         3
>
> ;; basic block 3, loop depth 0
> ;;   pred: 2
>   _1 = REALPART_EXPR <_6>;
> ;;   succ: 4
>
> ;; basic block 4, loop depth 0
> ;;   pred: 3
> ;;         2
>   # _3 = PHI <_1(3), 18446744073709551615(2)>
>   return _3;
> ;;   succ: EXIT
> }
>
> After this patch:
>
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _12;
>
> ;; basic block 2, loop depth 0
> ;;   pred: ENTRY
>   _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
>   return _12;
> ;;   succ: EXIT
> }
>
> The following test suites passed for this patch:
> * The x86 bootstrap test.
> * The x86 full regression test.
> * The riscv full regression test.
>
> gcc/ChangeLog:
>
>         * match.pd: Add new simplify to convert branch SAT_ADD into
>         branchless, if and only if the backend implements the IFN.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/match.pd | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 0f9c34fa897..8b9ded98323 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3094,6 +3094,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (match (unsigned_integer_sat_add @0 @1)
>   (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
>
> +#if GIMPLE
> +
> +(simplify
> + (cond (ne (imagpart (IFN_ADD_OVERFLOW:c@2 @0 @1)) integer_zerop)
> +  integer_minus_onep (realpart @2))
> + (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH))
> +  (bit_ior (plus@3 @0 @1) (negate (convert (lt @3 @0))))))

I think you need to make sure type and @0's type match.

Also, I don't think you need :c here since you don't match @0 or @1 more
than once.

Thanks,
Andrew

> +
> +#endif
> +
>  /* x > y && x != XXX_MIN --> x > y
>     x > y && x == XXX_MIN --> false . */
> (for eqne (eq ne)
> --
> 2.34.1
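For reference, the replacement expression (bit_ior (plus@3 @0 @1)
(negate (convert (lt @3 @0)))) is the classic branchless unsigned
saturating add: the wrapped sum drops below the first operand exactly
when the addition overflows.  A minimal C sketch of the equivalence (an
illustration only, not part of the patch):

#include <assert.h>
#include <stdint.h>

/* Branch form, as written in the patch's example.  */
static uint64_t
sat_add_branch (uint64_t x, uint64_t y)
{
  uint64_t ret;
  return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
}

/* Branchless form the simplify rewrites to:
   (x + y) | -(uint64_t)((x + y) < x).
   On overflow the compare yields 1, whose negation is all-ones,
   i.e. the saturated value; otherwise it ORs in zero.  */
static uint64_t
sat_add_branchless (uint64_t x, uint64_t y)
{
  uint64_t sum = x + y;
  return sum | -(uint64_t) (sum < x);
}

int
main (void)
{
  assert (sat_add_branch (1, 2) == 3);
  assert (sat_add_branchless (1, 2) == 3);
  assert (sat_add_branch (UINT64_MAX, 2) == UINT64_MAX);
  assert (sat_add_branchless (UINT64_MAX, 2) == UINT64_MAX);
  return 0;
}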
Thanks Andrew for the comments.

> I think you need to make sure type and @0's type match.

Oh, yes, we need that, will update in v2.

> Also, I don't think you need :c here since you don't match @0 or @1
> more than once.

You mean the :c from (IFN_ADD_OVERFLOW:c@2 @0 @1), right?
My initial idea was to catch both (IFN_ADD_OVERFLOW @0 @1) and
(IFN_ADD_OVERFLOW @1 @0).  It is unnecessary if IFN_ADD_OVERFLOW takes
care of this already.

Pan

From: Andrew Pinski <pinskia@gmail.com>
Sent: Tuesday, May 21, 2024 7:40 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>; 钟居哲 <juzhe.zhong@rivai.ai>; Kito Cheng <kito.cheng@gmail.com>; Tamar Christina <tamar.christina@arm.com>; Richard Guenther <richard.guenther@gmail.com>
Subject: Re: [PATCH v1 1/2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD

[snip: quoted patch and review comments, identical to the message above]
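To make the type concern concrete: a hypothetical variant (our example,
not from the thread) where the cond's result type no longer matches
@0's type, which the check requested for v2 has to reject:

#include <stdint.h>

/* The overflow is computed in uint64_t, but the selected value is
   narrowed to uint32_t, so `type` (uint32_t) differs from @0's type
   (uint64_t).  Blindly rewriting to the 64-bit branchless form would
   produce a different result here.  */
uint32_t
sat_add_narrow (uint64_t x, uint64_t y)
{
  uint64_t ret;
  return __builtin_add_overflow (x, y, &ret) ? (uint32_t) -1
                                             : (uint32_t) ret;
}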
On Tue, May 21, 2024 at 5:28 AM Li, Pan2 <pan2.li@intel.com> wrote:
>
> Thanks Andrew for the comments.
>
> > I think you need to make sure type and @0's type match.
>
> Oh, yes, we need that, will update in v2.
>
> > Also, I don't think you need :c here since you don't match @0 or @1
> > more than once.
>
> You mean the :c from (IFN_ADD_OVERFLOW:c@2 @0 @1), right?
> My initial idea was to catch both (IFN_ADD_OVERFLOW @0 @1) and
> (IFN_ADD_OVERFLOW @1 @0).  It is unnecessary if IFN_ADD_OVERFLOW takes
> care of this already.

Since in this case there is a canonical form/order here (at least there
should be).

> +  (cond (ne (imagpart (IFN_ADD_OVERFLOW:c@2 @0 @1)) integer_zerop)
> +   integer_minus_onep (realpart @2))

Since you match @2 for the realpart rather than `(IFN_ADD_OVERFLOW @0 @1)`
directly, the :c is not needed and genmatch will just generate extra
matching code that can never be reached.

Thanks,
Andrew

> Pan
>
> [snip: remainder of the quoted thread, identical to the messages above]
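A quick way to convince oneself that the :c arm is dead (our suggestion,
not from the thread): since @0 and @1 are unconstrained wildcards, the
unswapped arm already accepts either operand order, which can be checked
empirically on a target implementing IFN_SAT_ADD:

/* Compile with e.g. gcc -O2 -fdump-tree-optimized swap.c and compare
   the dumps: both functions should collapse to the same .SAT_ADD call
   even without :c in the pattern.  */
#include <stdint.h>

uint64_t
sat_add_xy (uint64_t x, uint64_t y)
{
  uint64_t ret;
  return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
}

uint64_t
sat_add_yx (uint64_t x, uint64_t y)
{
  uint64_t ret;
  return __builtin_add_overflow (y, x, &ret) ? -1 : ret;
}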
> Since you match @2 for the realpart rather than `(IFN_ADD_OVERFLOW @0 @1)`
> directly, the :c is not needed and genmatch will just generate extra
> matching code that can never be reached.

Got it, thanks for the explanation.  I may need to check the generated
matching code for a better understanding of this.

Pan

-----Original Message-----
From: Andrew Pinski <pinskia@gmail.com>
Sent: Tuesday, May 21, 2024 8:34 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>; 钟居哲 <juzhe.zhong@rivai.ai>; Kito Cheng <kito.cheng@gmail.com>; Tamar Christina <tamar.christina@arm.com>; Richard Guenther <richard.guenther@gmail.com>
Subject: Re: [PATCH v1 1/2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD

[snip: quoted thread, identical to the messages above]
diff --git a/gcc/match.pd b/gcc/match.pd
index 0f9c34fa897..8b9ded98323 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3094,6 +3094,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (match (unsigned_integer_sat_add @0 @1)
  (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
 
+#if GIMPLE
+
+(simplify
+ (cond (ne (imagpart (IFN_ADD_OVERFLOW:c@2 @0 @1)) integer_zerop)
+  integer_minus_onep (realpart @2))
+ (if (direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH))
+  (bit_ior (plus@3 @0 @1) (negate (convert (lt @3 @0))))))
+
+#endif
+
 /* x > y && x != XXX_MIN --> x > y
    x > y && x == XXX_MIN --> false . */
 (for eqne (eq ne)
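A sketch of the kind of DejaGnu testcase that could accompany this
pattern (hypothetical; the actual tests presumably live in patch 2/2 of
the series), for a target that implements IFN_SAT_ADD:

/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-optimized" } */

#include <stdint.h>

uint64_t
sat_add (uint64_t x, uint64_t y)
{
  uint64_t ret;
  return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
}

/* The branch form should be collapsed into a single .SAT_ADD call.  */
/* { dg-final { scan-tree-dump-times "\\.SAT_ADD" 1 "optimized" } } */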