Message ID | 20240522011725.424952-1-pan2.li@intel.com |
---|---|
State | New |
Headers | show |
Series | [v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD | expand |
On Wed, May 22, 2024 at 3:17 AM <pan2.li@intel.com> wrote: > > From: Pan Li <pan2.li@intel.com> > > This patch would like to support the __builtin_add_overflow branch form for > unsigned SAT_ADD. For example as below: > > uint64_t > sat_add (uint64_t x, uint64_t y) > { > uint64_t ret; > return __builtin_add_overflow (x, y, &ret) ? -1 : ret; > } > > Different to the branchless version, we leverage the simplify to > convert the branch version of SAT_ADD into branchless if and only > if the backend has supported the IFN_SAT_ADD. Thus, the backend has > the ability to choose branch or branchless implementation of .SAT_ADD. > For example, some target can take care of branches code more optimally. > > When the target implement the IFN_SAT_ADD for unsigned and before this > patch: > > uint64_t sat_add (uint64_t x, uint64_t y) > { > long unsigned int _1; > long unsigned int _2; > uint64_t _3; > __complex__ long unsigned int _6; > > ;; basic block 2, loop depth 0 > ;; pred: ENTRY > _6 = .ADD_OVERFLOW (x_4(D), y_5(D)); > _2 = IMAGPART_EXPR <_6>; > if (_2 != 0) > goto <bb 4>; [35.00%] > else > goto <bb 3>; [65.00%] > ;; succ: 4 > ;; 3 > > ;; basic block 3, loop depth 0 > ;; pred: 2 > _1 = REALPART_EXPR <_6>; > ;; succ: 4 > > ;; basic block 4, loop depth 0 > ;; pred: 3 > ;; 2 > # _3 = PHI <_1(3), 18446744073709551615(2)> > return _3; > ;; succ: EXIT > } > > After this patch: > uint64_t sat_add (uint64_t x, uint64_t y) > { > long unsigned int _12; > > ;; basic block 2, loop depth 0 > ;; pred: ENTRY > _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call] > return _12; > ;; succ: EXIT > } > > The below test suites are passed for this patch: > * The x86 bootstrap test. > * The x86 fully regression test. > * The riscv fully regression test. I'm not convinced we should match this during early if-conversion, should we? The middle-end doesn't really know .SAT_ADD but some handling of .ADD_OVERFLOW is present. But please add a comment before the new pattern, esp. since it's non-obvious that this is an improvent. I suspect you rely on this form being recognized as .SAT_ADD later but what prevents us from breaking this? Why not convert it to .SAT_ADD immediately? If this is because the ISEL pass (or the widen-mult pass) cannot handle PHIs then I would suggest to split out enough parts of tree-ssa-phiopt.cc to be able to query match.pd for COND_EXPRs. > gcc/ChangeLog: > > * match.pd: Add new simplify to convert branch SAT_ADD into > branchless, if and only if backend implement the IFN. > > Signed-off-by: Pan Li <pan2.li@intel.com> > --- > gcc/match.pd | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/gcc/match.pd b/gcc/match.pd > index cff67c84498..2dc77a46e67 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -3080,6 +3080,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (match (unsigned_integer_sat_add @0 @1) > (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1))) > > +#if GIMPLE > + > +(simplify > + (cond (ne (imagpart (IFN_ADD_OVERFLOW@2 @0 @1)) integer_zerop) > + integer_minus_onep (realpart @2)) > + (if (ternary_integer_types_match_p (type, @0, @1) && TYPE_UNSIGNED (type) > + && direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH)) > + (bit_ior (plus@3 @0 @1) (negate (convert (lt @3 @0)))))) > + > +#endif > + > /* x > y && x != XXX_MIN --> x > y > x > y && x == XXX_MIN --> false . */ > (for eqne (eq ne) > -- > 2.34.1 >
Thanks Richard for reviewing. > I'm not convinced we should match this during early if-conversion, should we? > The middle-end doesn't really know .SAT_ADD but some handling of > .ADD_OVERFLOW is present. I tried to do the branch (aka cond) match in widen-mult pass similar as previous branchless form. Unfortunately, the branch will be converted to PHI when widen-mult, thus try to bypass the PHI handling and convert the branch form to the branchless form in v2. > But please add a comment before the new pattern, esp. since it's > non-obvious that this is an improvent. Sure thing. > I suspect you rely on this form being recognized as .SAT_ADD later but > what prevents us from breaking this? Why not convert it to .SAT_ADD > immediately? If this is because the ISEL pass (or the widen-mult pass) > cannot handle PHIs then I would suggest to split out enough parts of > tree-ssa-phiopt.cc to be able to query match.pd for COND_EXPRs. Yes, this is sort of redundant, we can also convert it to .SAT_ADD immediately in match.pd before widen-mult. Sorry I may get confused here, for branch form like below, what transform should we perform in phiopt? The gimple_simplify_phiopt mostly leverage the simplify in match.pd but we may hit the simplify in the other early pass. Or we can leverage branch version of unsigned_integer_sat_add gimple match in phiopt and generate the gimple call .SAT_ADD In phiopt (mostly like what we do in widen-mult). Not sure if my understanding is correct or not, thanks again for help. #define SAT_ADD_U_1(T) \ T sat_add_u_1_##T(T x, T y) \ { \ return (T)(x + y) >= x ? (x + y) : -1; \ } SAT_ADD_U_1(uint8_t); Pan -----Original Message----- From: Richard Biener <richard.guenther@gmail.com> Sent: Wednesday, May 22, 2024 9:14 PM To: Li, Pan2 <pan2.li@intel.com> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; tamar.christina@arm.com; pinskia@gmail.com Subject: Re: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD On Wed, May 22, 2024 at 3:17 AM <pan2.li@intel.com> wrote: > > From: Pan Li <pan2.li@intel.com> > > This patch would like to support the __builtin_add_overflow branch form for > unsigned SAT_ADD. For example as below: > > uint64_t > sat_add (uint64_t x, uint64_t y) > { > uint64_t ret; > return __builtin_add_overflow (x, y, &ret) ? -1 : ret; > } > > Different to the branchless version, we leverage the simplify to > convert the branch version of SAT_ADD into branchless if and only > if the backend has supported the IFN_SAT_ADD. Thus, the backend has > the ability to choose branch or branchless implementation of .SAT_ADD. > For example, some target can take care of branches code more optimally. > > When the target implement the IFN_SAT_ADD for unsigned and before this > patch: > > uint64_t sat_add (uint64_t x, uint64_t y) > { > long unsigned int _1; > long unsigned int _2; > uint64_t _3; > __complex__ long unsigned int _6; > > ;; basic block 2, loop depth 0 > ;; pred: ENTRY > _6 = .ADD_OVERFLOW (x_4(D), y_5(D)); > _2 = IMAGPART_EXPR <_6>; > if (_2 != 0) > goto <bb 4>; [35.00%] > else > goto <bb 3>; [65.00%] > ;; succ: 4 > ;; 3 > > ;; basic block 3, loop depth 0 > ;; pred: 2 > _1 = REALPART_EXPR <_6>; > ;; succ: 4 > > ;; basic block 4, loop depth 0 > ;; pred: 3 > ;; 2 > # _3 = PHI <_1(3), 18446744073709551615(2)> > return _3; > ;; succ: EXIT > } > > After this patch: > uint64_t sat_add (uint64_t x, uint64_t y) > { > long unsigned int _12; > > ;; basic block 2, loop depth 0 > ;; pred: ENTRY > _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call] > return _12; > ;; succ: EXIT > } > > The below test suites are passed for this patch: > * The x86 bootstrap test. > * The x86 fully regression test. > * The riscv fully regression test. I'm not convinced we should match this during early if-conversion, should we? The middle-end doesn't really know .SAT_ADD but some handling of .ADD_OVERFLOW is present. But please add a comment before the new pattern, esp. since it's non-obvious that this is an improvent. I suspect you rely on this form being recognized as .SAT_ADD later but what prevents us from breaking this? Why not convert it to .SAT_ADD immediately? If this is because the ISEL pass (or the widen-mult pass) cannot handle PHIs then I would suggest to split out enough parts of tree-ssa-phiopt.cc to be able to query match.pd for COND_EXPRs. > gcc/ChangeLog: > > * match.pd: Add new simplify to convert branch SAT_ADD into > branchless, if and only if backend implement the IFN. > > Signed-off-by: Pan Li <pan2.li@intel.com> > --- > gcc/match.pd | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/gcc/match.pd b/gcc/match.pd > index cff67c84498..2dc77a46e67 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -3080,6 +3080,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (match (unsigned_integer_sat_add @0 @1) > (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1))) > > +#if GIMPLE > + > +(simplify > + (cond (ne (imagpart (IFN_ADD_OVERFLOW@2 @0 @1)) integer_zerop) > + integer_minus_onep (realpart @2)) > + (if (ternary_integer_types_match_p (type, @0, @1) && TYPE_UNSIGNED (type) > + && direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH)) > + (bit_ior (plus@3 @0 @1) (negate (convert (lt @3 @0)))))) > + > +#endif > + > /* x > y && x != XXX_MIN --> x > y > x > y && x == XXX_MIN --> false . */ > (for eqne (eq ne) > -- > 2.34.1 >
I have a try to convert the PHI from Part-A to Part-B, aka PHI to _2 = phi_cond ? _1 : 255. And then we can do the matching on COND_EXPR in the underlying widen-mul pass. Unfortunately, meet some ICE when verify_gimple_phi in sccopy1 pass => sat_add.c:66:1: internal compiler error: tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in useless_type_conversion_p, at gimple-expr.cc:86 will go on to see if this works or not. Part-A: uint8_t sat_add_u_1_uint8_t (uint8_t x, uint8_t y) { unsigned char _1; uint8_t _2; <bb 2> : _1 = x_3(D) + y_4(D); if (_1 >= x_3(D)) goto <bb 3>; [INV] else goto <bb 4>; [INV] <bb 3> : <bb 4> : # _2 = PHI <255(2), _1(3)> return _2; } Part-B: uint8_t sat_add_u_1_uint8_t (uint8_t x, uint8_t y) { unsigned char _1; _Bool phi_cond_6; <bb 2> : _1 = x_3(D) + y_4(D); phi_cond_6 = _1 >= x_3(D); _2 = phi_cond_6 ? _1 : 255; return _2; } -----Original Message----- From: Li, Pan2 Sent: Thursday, May 23, 2024 12:17 PM To: Richard Biener <richard.guenther@gmail.com> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; tamar.christina@arm.com; pinskia@gmail.com Subject: RE: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD Thanks Richard for reviewing. > I'm not convinced we should match this during early if-conversion, should we? > The middle-end doesn't really know .SAT_ADD but some handling of > .ADD_OVERFLOW is present. I tried to do the branch (aka cond) match in widen-mult pass similar as previous branchless form. Unfortunately, the branch will be converted to PHI when widen-mult, thus try to bypass the PHI handling and convert the branch form to the branchless form in v2. > But please add a comment before the new pattern, esp. since it's > non-obvious that this is an improvent. Sure thing. > I suspect you rely on this form being recognized as .SAT_ADD later but > what prevents us from breaking this? Why not convert it to .SAT_ADD > immediately? If this is because the ISEL pass (or the widen-mult pass) > cannot handle PHIs then I would suggest to split out enough parts of > tree-ssa-phiopt.cc to be able to query match.pd for COND_EXPRs. Yes, this is sort of redundant, we can also convert it to .SAT_ADD immediately in match.pd before widen-mult. Sorry I may get confused here, for branch form like below, what transform should we perform in phiopt? The gimple_simplify_phiopt mostly leverage the simplify in match.pd but we may hit the simplify in the other early pass. Or we can leverage branch version of unsigned_integer_sat_add gimple match in phiopt and generate the gimple call .SAT_ADD In phiopt (mostly like what we do in widen-mult). Not sure if my understanding is correct or not, thanks again for help. #define SAT_ADD_U_1(T) \ T sat_add_u_1_##T(T x, T y) \ { \ return (T)(x + y) >= x ? (x + y) : -1; \ } SAT_ADD_U_1(uint8_t); Pan -----Original Message----- From: Richard Biener <richard.guenther@gmail.com> Sent: Wednesday, May 22, 2024 9:14 PM To: Li, Pan2 <pan2.li@intel.com> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; tamar.christina@arm.com; pinskia@gmail.com Subject: Re: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD On Wed, May 22, 2024 at 3:17 AM <pan2.li@intel.com> wrote: > > From: Pan Li <pan2.li@intel.com> > > This patch would like to support the __builtin_add_overflow branch form for > unsigned SAT_ADD. For example as below: > > uint64_t > sat_add (uint64_t x, uint64_t y) > { > uint64_t ret; > return __builtin_add_overflow (x, y, &ret) ? -1 : ret; > } > > Different to the branchless version, we leverage the simplify to > convert the branch version of SAT_ADD into branchless if and only > if the backend has supported the IFN_SAT_ADD. Thus, the backend has > the ability to choose branch or branchless implementation of .SAT_ADD. > For example, some target can take care of branches code more optimally. > > When the target implement the IFN_SAT_ADD for unsigned and before this > patch: > > uint64_t sat_add (uint64_t x, uint64_t y) > { > long unsigned int _1; > long unsigned int _2; > uint64_t _3; > __complex__ long unsigned int _6; > > ;; basic block 2, loop depth 0 > ;; pred: ENTRY > _6 = .ADD_OVERFLOW (x_4(D), y_5(D)); > _2 = IMAGPART_EXPR <_6>; > if (_2 != 0) > goto <bb 4>; [35.00%] > else > goto <bb 3>; [65.00%] > ;; succ: 4 > ;; 3 > > ;; basic block 3, loop depth 0 > ;; pred: 2 > _1 = REALPART_EXPR <_6>; > ;; succ: 4 > > ;; basic block 4, loop depth 0 > ;; pred: 3 > ;; 2 > # _3 = PHI <_1(3), 18446744073709551615(2)> > return _3; > ;; succ: EXIT > } > > After this patch: > uint64_t sat_add (uint64_t x, uint64_t y) > { > long unsigned int _12; > > ;; basic block 2, loop depth 0 > ;; pred: ENTRY > _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call] > return _12; > ;; succ: EXIT > } > > The below test suites are passed for this patch: > * The x86 bootstrap test. > * The x86 fully regression test. > * The riscv fully regression test. I'm not convinced we should match this during early if-conversion, should we? The middle-end doesn't really know .SAT_ADD but some handling of .ADD_OVERFLOW is present. But please add a comment before the new pattern, esp. since it's non-obvious that this is an improvent. I suspect you rely on this form being recognized as .SAT_ADD later but what prevents us from breaking this? Why not convert it to .SAT_ADD immediately? If this is because the ISEL pass (or the widen-mult pass) cannot handle PHIs then I would suggest to split out enough parts of tree-ssa-phiopt.cc to be able to query match.pd for COND_EXPRs. > gcc/ChangeLog: > > * match.pd: Add new simplify to convert branch SAT_ADD into > branchless, if and only if backend implement the IFN. > > Signed-off-by: Pan Li <pan2.li@intel.com> > --- > gcc/match.pd | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/gcc/match.pd b/gcc/match.pd > index cff67c84498..2dc77a46e67 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -3080,6 +3080,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (match (unsigned_integer_sat_add @0 @1) > (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1))) > > +#if GIMPLE > + > +(simplify > + (cond (ne (imagpart (IFN_ADD_OVERFLOW@2 @0 @1)) integer_zerop) > + integer_minus_onep (realpart @2)) > + (if (ternary_integer_types_match_p (type, @0, @1) && TYPE_UNSIGNED (type) > + && direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH)) > + (bit_ior (plus@3 @0 @1) (negate (convert (lt @3 @0)))))) > + > +#endif > + > /* x > y && x != XXX_MIN --> x > y > x > y && x == XXX_MIN --> false . */ > (for eqne (eq ne) > -- > 2.34.1 >
On Thu, May 23, 2024 at 1:08 PM Li, Pan2 <pan2.li@intel.com> wrote: > > I have a try to convert the PHI from Part-A to Part-B, aka PHI to _2 = phi_cond ? _1 : 255. > And then we can do the matching on COND_EXPR in the underlying widen-mul pass. > > Unfortunately, meet some ICE when verify_gimple_phi in sccopy1 pass => > sat_add.c:66:1: internal compiler error: tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in useless_type_conversion_p, at gimple-expr.cc:86 Likely you have released _2, more comments below on your previous mail. > will go on to see if this works or not. > > Part-A: > uint8_t sat_add_u_1_uint8_t (uint8_t x, uint8_t y) > { > unsigned char _1; > uint8_t _2; > > <bb 2> : > _1 = x_3(D) + y_4(D); > if (_1 >= x_3(D)) > goto <bb 3>; [INV] > else > goto <bb 4>; [INV] > > <bb 3> : > > <bb 4> : > # _2 = PHI <255(2), _1(3)> > return _2; > > } > > Part-B: > uint8_t sat_add_u_1_uint8_t (uint8_t x, uint8_t y) > { > unsigned char _1; > _Bool phi_cond_6; > > <bb 2> : > _1 = x_3(D) + y_4(D); > phi_cond_6 = _1 >= x_3(D); > _2 = phi_cond_6 ? _1 : 255; > return _2; > > } > > -----Original Message----- > From: Li, Pan2 > Sent: Thursday, May 23, 2024 12:17 PM > To: Richard Biener <richard.guenther@gmail.com> > Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; tamar.christina@arm.com; pinskia@gmail.com > Subject: RE: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD > > Thanks Richard for reviewing. > > > I'm not convinced we should match this during early if-conversion, should we? > > The middle-end doesn't really know .SAT_ADD but some handling of > > .ADD_OVERFLOW is present. > > I tried to do the branch (aka cond) match in widen-mult pass similar as previous branchless form. > Unfortunately, the branch will be converted to PHI when widen-mult, thus try to bypass the PHI handling > and convert the branch form to the branchless form in v2. > > > But please add a comment before the new pattern, esp. since it's > > non-obvious that this is an improvent. > > Sure thing. > > > I suspect you rely on this form being recognized as .SAT_ADD later but > > what prevents us from breaking this? Why not convert it to .SAT_ADD > > immediately? If this is because the ISEL pass (or the widen-mult pass) > > cannot handle PHIs then I would suggest to split out enough parts of > > tree-ssa-phiopt.cc to be able to query match.pd for COND_EXPRs. > > Yes, this is sort of redundant, we can also convert it to .SAT_ADD immediately in match.pd before widen-mult. > > Sorry I may get confused here, for branch form like below, what transform should we perform in phiopt? > The gimple_simplify_phiopt mostly leverage the simplify in match.pd but we may hit the simplify in the > other early pass. > > Or we can leverage branch version of unsigned_integer_sat_add gimple match in phiopt and generate the gimple call .SAT_ADD > In phiopt (mostly like what we do in widen-mult). > Not sure if my understanding is correct or not, thanks again for help. The trick for widen-mult (or ISEL) would be to try to match the PHI nodes in a similar way as to gimple_simplify_phiopt calls op.resimplify. The difficulty resides in that the (match ...) generated code gets the entry to the stmt root. Either we'd teach genmatch to recognize a PHI def as a COND or we make (match ..) (additionally?) generate entry points taking a gimple_match_op so the toplevel COND trick works. Note it's already a bit awkward because we build a GENERIC form of the condition and that's now invalid in the IL for a GIMPLE COND_EXPR but still present because of that phiopt trick. There isn't a SSA def for the condition in the IL (it's only part of a GIMPLE_COND and that one doesn't define "CC flags"). That means possibly special-casing (match (..) (cond (cmp ...) ..)) in genmatch to handle PHI defs might be the easiest "trick" here. Not sure what you did for the IL you quoted above. Richard. > #define SAT_ADD_U_1(T) \ > T sat_add_u_1_##T(T x, T y) \ > { \ > return (T)(x + y) >= x ? (x + y) : -1; \ > } > > SAT_ADD_U_1(uint8_t); > > Pan > > -----Original Message----- > From: Richard Biener <richard.guenther@gmail.com> > Sent: Wednesday, May 22, 2024 9:14 PM > To: Li, Pan2 <pan2.li@intel.com> > Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; tamar.christina@arm.com; pinskia@gmail.com > Subject: Re: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD > > On Wed, May 22, 2024 at 3:17 AM <pan2.li@intel.com> wrote: > > > > From: Pan Li <pan2.li@intel.com> > > > > This patch would like to support the __builtin_add_overflow branch form for > > unsigned SAT_ADD. For example as below: > > > > uint64_t > > sat_add (uint64_t x, uint64_t y) > > { > > uint64_t ret; > > return __builtin_add_overflow (x, y, &ret) ? -1 : ret; > > } > > > > Different to the branchless version, we leverage the simplify to > > convert the branch version of SAT_ADD into branchless if and only > > if the backend has supported the IFN_SAT_ADD. Thus, the backend has > > the ability to choose branch or branchless implementation of .SAT_ADD. > > For example, some target can take care of branches code more optimally. > > > > When the target implement the IFN_SAT_ADD for unsigned and before this > > patch: > > > > uint64_t sat_add (uint64_t x, uint64_t y) > > { > > long unsigned int _1; > > long unsigned int _2; > > uint64_t _3; > > __complex__ long unsigned int _6; > > > > ;; basic block 2, loop depth 0 > > ;; pred: ENTRY > > _6 = .ADD_OVERFLOW (x_4(D), y_5(D)); > > _2 = IMAGPART_EXPR <_6>; > > if (_2 != 0) > > goto <bb 4>; [35.00%] > > else > > goto <bb 3>; [65.00%] > > ;; succ: 4 > > ;; 3 > > > > ;; basic block 3, loop depth 0 > > ;; pred: 2 > > _1 = REALPART_EXPR <_6>; > > ;; succ: 4 > > > > ;; basic block 4, loop depth 0 > > ;; pred: 3 > > ;; 2 > > # _3 = PHI <_1(3), 18446744073709551615(2)> > > return _3; > > ;; succ: EXIT > > } > > > > After this patch: > > uint64_t sat_add (uint64_t x, uint64_t y) > > { > > long unsigned int _12; > > > > ;; basic block 2, loop depth 0 > > ;; pred: ENTRY > > _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call] > > return _12; > > ;; succ: EXIT > > } > > > > The below test suites are passed for this patch: > > * The x86 bootstrap test. > > * The x86 fully regression test. > > * The riscv fully regression test. > > I'm not convinced we should match this during early if-conversion, should we? > The middle-end doesn't really know .SAT_ADD but some handling of > .ADD_OVERFLOW is present. > > But please add a comment before the new pattern, esp. since it's > non-obvious that this is an improvent. > > I suspect you rely on this form being recognized as .SAT_ADD later but > what prevents us from breaking this? Why not convert it to .SAT_ADD > immediately? If this is because the ISEL pass (or the widen-mult pass) > cannot handle PHIs then I would suggest to split out enough parts of > tree-ssa-phiopt.cc to be able to query match.pd for COND_EXPRs. > > > gcc/ChangeLog: > > > > * match.pd: Add new simplify to convert branch SAT_ADD into > > branchless, if and only if backend implement the IFN. > > > > Signed-off-by: Pan Li <pan2.li@intel.com> > > --- > > gcc/match.pd | 11 +++++++++++ > > 1 file changed, 11 insertions(+) > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > index cff67c84498..2dc77a46e67 100644 > > --- a/gcc/match.pd > > +++ b/gcc/match.pd > > @@ -3080,6 +3080,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > (match (unsigned_integer_sat_add @0 @1) > > (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1))) > > > > +#if GIMPLE > > + > > +(simplify > > + (cond (ne (imagpart (IFN_ADD_OVERFLOW@2 @0 @1)) integer_zerop) > > + integer_minus_onep (realpart @2)) > > + (if (ternary_integer_types_match_p (type, @0, @1) && TYPE_UNSIGNED (type) > > + && direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH)) > > + (bit_ior (plus@3 @0 @1) (negate (convert (lt @3 @0)))))) > > + > > +#endif > > + > > /* x > y && x != XXX_MIN --> x > y > > x > y && x == XXX_MIN --> false . */ > > (for eqne (eq ne) > > -- > > 2.34.1 > >
On 5/23/24 6:14 AM, Richard Biener wrote: > On Thu, May 23, 2024 at 1:08 PM Li, Pan2 <pan2.li@intel.com> wrote: >> >> I have a try to convert the PHI from Part-A to Part-B, aka PHI to _2 = phi_cond ? _1 : 255. >> And then we can do the matching on COND_EXPR in the underlying widen-mul pass. >> >> Unfortunately, meet some ICE when verify_gimple_phi in sccopy1 pass => >> sat_add.c:66:1: internal compiler error: tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in useless_type_conversion_p, at gimple-expr.cc:86 > > Likely you have released _2, more comments below on your previous mail. You can be sure by calling debug_tree () on the SSA_NAME node in question. If it reports "in-free-list", then that's definitive that the SSA_NAME was released back to the SSA_NAME manager. If that SSA_NAME is still in the IL, then that's very bad. jeff
Thanks Jeff and Richard for suggestion and reviewing. Have another try in phiopt to do the convert from PHI to stmt = cond ? a : b. It can perform the convert from PHI to stmt = cond ? a : b successfully, and then the widen-mul is able to do the recog to .SAT_ADD. For now, to limit the risck, the above convert from PHI to stmt = cond ? a : b only be performed when matched, as well as the backend support the usadd standard name. Unfortunately, I am stuck in the case that when the lhs is not matched, we need to clean up something like created stmt in previous, or we will have ICE for missing definition. sat_add.c: In function ‘sat_add_u_3_uint8_t’: sat_add.c:69:1: error: missing definition 69 | SAT_ADD_U_3(uint8_t); | ^~~~~~~~~~~ for SSA_NAME: _6 in statement: # VUSE <.MEM_14(D)> return _6; during GIMPLE pass: phiopt dump file: sat_add.c.046t.phiopt1 sat_add.c:69:1: internal compiler error: verify_ssa failed 0x1db41ba verify_ssa(bool, bool /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/tree-ssa.cc:1203 0x18e3075 execute_function_todo /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:2096 0x18e1c52 do_per_function /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:1688 0x18e3222 execute_todo I bet the reason is that we created new stmt like stmt_cond and stmt_val but we don't insert it. Thus, there will be orphan nodes somewhere and we need something like rollback to recover the gimple up to a point. I tried sorts of release_xx or likewise but seems not working. So is there any suggest to take care of such gimple rollback or another solution for this? Below are The function to perform the convert from PHI to stmt = cond ? a : b for reference, thanks a lot. Pan diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc index 918cf50b589..7982b65bac4 100644 --- a/gcc/tree-ssa-phiopt.cc +++ b/gcc/tree-ssa-phiopt.cc @@ -486,6 +486,88 @@ phiopt_early_allow (gimple_seq &seq, gimple_match_op &op) } } +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree)); + +/* Try to match the phi expr to the gimple cond. Return true if we can + perform the convert or return false. There will be some restrictions + or such kind of conversion, aka: + + 1. Only selected pattern will try this convert. + 2. The generated gassign matched the selected IFN pattern. + 3. The backend has implement the standard name. + + From: + <bb 2> : + _1 = x_3(D) + y_4(D); + if (_1 >= x_3(D)) + goto <bb 3>; [INV] + else + goto <bb 4>; [INV] + + <bb 3> : + + <bb 4> : + # _2 = PHI <255(2), _1(3)> + + To: + <bb 2> : + _1 = x_3(D) + y_4(D); + phi_cond_6 = _1 >= x_3(D); + _2 = phi_cond_6 ? _1 : 255; */ + +static bool +match_phi_to_gimple_cond (basic_block cond_bb, gphi *phi, tree arg0, tree arg1) +{ + gcond *cond = as_a <gcond *> (*gsi_last_bb (cond_bb)); + + if (!cond) + return false; + + enum tree_code code = gimple_cond_code (cond); + tree phi_result = gimple_phi_result (phi); + tree cond_tree = make_temp_ssa_name (boolean_type_node, NULL, "phi_cond"); + tree cmp_tree = build2 (code, boolean_type_node, gimple_cond_lhs (cond), + gimple_cond_rhs (cond)); + tree rhs = build3 (COND_EXPR, TREE_TYPE (phi_result), cond_tree, arg0, arg1); + + gassign *stmt_cond = gimple_build_assign (cond_tree, cmp_tree); + gassign *stmt_val = gimple_build_assign (phi_result, rhs); + + tree ops[2]; + tree lhs = gimple_assign_lhs (stmt_val); + bool matched_p = (gimple_unsigned_integer_sat_add (lhs, ops, NULL) + && direct_internal_fn_supported_p (IFN_SAT_ADD, TREE_TYPE (lhs), + OPTIMIZE_FOR_BOTH)); + + if (matched_p) + { + gimple_stmt_iterator gsi = gsi_last_bb (cond_bb); + gimple_stmt_iterator psi = gsi_for_stmt (phi); + + gsi_insert_before (&gsi, stmt_cond, GSI_SAME_STMT); + gsi_insert_before (&gsi, stmt_val, GSI_SAME_STMT); + remove_phi_node (&psi, false); + } + else + { + // Clean up the stmt created, but non of blow works well. + // gsi = gsi_for_stmt (stmt_val); + // gsi_remove (&gsi, true); + // release_defs (stmt_val); + // ggc_free (stmt_val); + + // gsi = gsi_for_stmt (stmt_cond); + // gsi_remove (&gsi, true); + // release_defs (stmt_cond); + // ggc_free (stmt_cond); + + // release_defs (stmt_cond); + // release_defs (stmt_val); + release_ssa_name (cond_tree); + } + + return matched_p; +} + /* gimple_simplify_phiopt is like gimple_simplify but designed for PHIOPT. Return NULL if nothing can be simplified or the resulting simplified value with parts pushed if EARLY_P was true. Also rejects non allowed tree code @@ -826,6 +908,9 @@ match_simplify_replacement (basic_block cond_bb, basic_block middle_bb, So, given the condition COND, and the two PHI arguments, match and simplify can happen on (COND) ? arg0 : arg1. */ + if (match_phi_to_gimple_cond (cond_bb, phi, arg0, arg1)) + return true; + stmt = last_nondebug_stmt (cond_bb); /* We need to know which is the true edge and which is the false -----Original Message----- From: Jeff Law <jeffreyalaw@gmail.com> Sent: Thursday, May 23, 2024 10:59 PM To: Richard Biener <richard.guenther@gmail.com>; Li, Pan2 <pan2.li@intel.com> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; tamar.christina@arm.com; pinskia@gmail.com Subject: Re: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD On 5/23/24 6:14 AM, Richard Biener wrote: > On Thu, May 23, 2024 at 1:08 PM Li, Pan2 <pan2.li@intel.com> wrote: >> >> I have a try to convert the PHI from Part-A to Part-B, aka PHI to _2 = phi_cond ? _1 : 255. >> And then we can do the matching on COND_EXPR in the underlying widen-mul pass. >> >> Unfortunately, meet some ICE when verify_gimple_phi in sccopy1 pass => >> sat_add.c:66:1: internal compiler error: tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in useless_type_conversion_p, at gimple-expr.cc:86 > > Likely you have released _2, more comments below on your previous mail. You can be sure by calling debug_tree () on the SSA_NAME node in question. If it reports "in-free-list", then that's definitive that the SSA_NAME was released back to the SSA_NAME manager. If that SSA_NAME is still in the IL, then that's very bad. jeff
On Fri, May 24, 2024 at 8:37 AM Li, Pan2 <pan2.li@intel.com> wrote: > > Thanks Jeff and Richard for suggestion and reviewing. > > Have another try in phiopt to do the convert from PHI to stmt = cond ? a : b. > It can perform the convert from PHI to stmt = cond ? a : b successfully, and then > the widen-mul is able to do the recog to .SAT_ADD. > > For now, to limit the risck, the above convert from PHI to stmt = cond ? a : b only be performed when matched, > as well as the backend support the usadd standard name. Unfortunately, I am stuck in the case that when the lhs > is not matched, we need to clean up something like created stmt in previous, or we will have ICE for missing definition. > > sat_add.c: In function ‘sat_add_u_3_uint8_t’: > sat_add.c:69:1: error: missing definition > 69 | SAT_ADD_U_3(uint8_t); > | ^~~~~~~~~~~ > for SSA_NAME: _6 in statement: > # VUSE <.MEM_14(D)> > return _6; > during GIMPLE pass: phiopt > dump file: sat_add.c.046t.phiopt1 > sat_add.c:69:1: internal compiler error: verify_ssa failed > 0x1db41ba verify_ssa(bool, bool > /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/tree-ssa.cc:1203 > 0x18e3075 execute_function_todo > /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:2096 > 0x18e1c52 do_per_function > /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:1688 > 0x18e3222 execute_todo > > I bet the reason is that we created new stmt like stmt_cond and stmt_val but we don't insert it. > Thus, there will be orphan nodes somewhere and we need something like rollback to recover the > gimple up to a point. I tried sorts of release_xx or likewise but seems not working. > > So is there any suggest to take care of such gimple rollback or another solution for this? Below are > The function to perform the convert from PHI to stmt = cond ? a : b for reference, thanks a lot. > > Pan > > diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc > index 918cf50b589..7982b65bac4 100644 > --- a/gcc/tree-ssa-phiopt.cc > +++ b/gcc/tree-ssa-phiopt.cc > @@ -486,6 +486,88 @@ phiopt_early_allow (gimple_seq &seq, gimple_match_op &op) > } > } > > +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree)); > + > +/* Try to match the phi expr to the gimple cond. Return true if we can > + perform the convert or return false. There will be some restrictions > + or such kind of conversion, aka: > + > + 1. Only selected pattern will try this convert. > + 2. The generated gassign matched the selected IFN pattern. > + 3. The backend has implement the standard name. > + > + From: > + <bb 2> : > + _1 = x_3(D) + y_4(D); > + if (_1 >= x_3(D)) > + goto <bb 3>; [INV] > + else > + goto <bb 4>; [INV] > + > + <bb 3> : > + > + <bb 4> : > + # _2 = PHI <255(2), _1(3)> > + > + To: > + <bb 2> : > + _1 = x_3(D) + y_4(D); > + phi_cond_6 = _1 >= x_3(D); > + _2 = phi_cond_6 ? _1 : 255; */ > + > +static bool > +match_phi_to_gimple_cond (basic_block cond_bb, gphi *phi, tree arg0, tree arg1) You should do this in widen-mult and/or ISEL and if necessary for vectorization in tree-if-conv.cc, though eventually what if-convert creates might be good enough to match during pattern recognition. > +{ > + gcond *cond = as_a <gcond *> (*gsi_last_bb (cond_bb)); > + > + if (!cond) > + return false; > + > + enum tree_code code = gimple_cond_code (cond); > + tree phi_result = gimple_phi_result (phi); > + tree cond_tree = make_temp_ssa_name (boolean_type_node, NULL, "phi_cond"); > + tree cmp_tree = build2 (code, boolean_type_node, gimple_cond_lhs (cond), > + gimple_cond_rhs (cond)); > + tree rhs = build3 (COND_EXPR, TREE_TYPE (phi_result), cond_tree, arg0, arg1); phiopt directly uses cmp_tree, so you could do that as well and avoid stmt_cond. > + > + gassign *stmt_cond = gimple_build_assign (cond_tree, cmp_tree); > + gassign *stmt_val = gimple_build_assign (phi_result, rhs); > + > + tree ops[2]; > + tree lhs = gimple_assign_lhs (stmt_val); > + bool matched_p = (gimple_unsigned_integer_sat_add (lhs, ops, NULL) > + && direct_internal_fn_supported_p (IFN_SAT_ADD, TREE_TYPE (lhs), > + OPTIMIZE_FOR_BOTH)); > + > + if (matched_p) > + { > + gimple_stmt_iterator gsi = gsi_last_bb (cond_bb); > + gimple_stmt_iterator psi = gsi_for_stmt (phi); > + > + gsi_insert_before (&gsi, stmt_cond, GSI_SAME_STMT); > + gsi_insert_before (&gsi, stmt_val, GSI_SAME_STMT); > + remove_phi_node (&psi, false); You only matched but you do not insert the actual .SAT_ADD here and that's the definition that's missing. You probably shouldn't need to add the cond-stmt? > + } > + else > + { > + // Clean up the stmt created, but non of blow works well. > + // gsi = gsi_for_stmt (stmt_val); > + // gsi_remove (&gsi, true); > + // release_defs (stmt_val); > + // ggc_free (stmt_val); > + > + // gsi = gsi_for_stmt (stmt_cond); > + // gsi_remove (&gsi, true); > + // release_defs (stmt_cond); > + // ggc_free (stmt_cond); > + > + // release_defs (stmt_cond); > + // release_defs (stmt_val); > + release_ssa_name (cond_tree); As you don't insert the stmts you should be able to simply only release the SSA names and ggc_free the stmt. You can also look at maybe_fold_comparisons_from_match_pd in gimple-fold.cc for more "ugly" ways to do this. Building a helper in one place to match a PHI def as COND_EXPR might be nice. As said, avoiding all the mess by providing native support from genmatch would be even better, but I'm not asking you to do that. Richard. > + } > + > + return matched_p; > +} > + > /* gimple_simplify_phiopt is like gimple_simplify but designed for PHIOPT. > Return NULL if nothing can be simplified or the resulting simplified value > with parts pushed if EARLY_P was true. Also rejects non allowed tree code > @@ -826,6 +908,9 @@ match_simplify_replacement (basic_block cond_bb, basic_block middle_bb, > So, given the condition COND, and the two PHI arguments, match and simplify > can happen on (COND) ? arg0 : arg1. */ > > + if (match_phi_to_gimple_cond (cond_bb, phi, arg0, arg1)) > + return true; > + > stmt = last_nondebug_stmt (cond_bb); > > /* We need to know which is the true edge and which is the false > > > -----Original Message----- > From: Jeff Law <jeffreyalaw@gmail.com> > Sent: Thursday, May 23, 2024 10:59 PM > To: Richard Biener <richard.guenther@gmail.com>; Li, Pan2 <pan2.li@intel.com> > Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; tamar.christina@arm.com; pinskia@gmail.com > Subject: Re: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD > > > > On 5/23/24 6:14 AM, Richard Biener wrote: > > On Thu, May 23, 2024 at 1:08 PM Li, Pan2 <pan2.li@intel.com> wrote: > >> > >> I have a try to convert the PHI from Part-A to Part-B, aka PHI to _2 = phi_cond ? _1 : 255. > >> And then we can do the matching on COND_EXPR in the underlying widen-mul pass. > >> > >> Unfortunately, meet some ICE when verify_gimple_phi in sccopy1 pass => > >> sat_add.c:66:1: internal compiler error: tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in useless_type_conversion_p, at gimple-expr.cc:86 > > > > Likely you have released _2, more comments below on your previous mail. > You can be sure by calling debug_tree () on the SSA_NAME node in > question. If it reports "in-free-list", then that's definitive that the > SSA_NAME was released back to the SSA_NAME manager. If that SSA_NAME is > still in the IL, then that's very bad. > > jeff >
On Fri, May 24, 2024 at 8:56 AM Richard Biener <richard.guenther@gmail.com> wrote: > > On Fri, May 24, 2024 at 8:37 AM Li, Pan2 <pan2.li@intel.com> wrote: > > > > Thanks Jeff and Richard for suggestion and reviewing. > > > > Have another try in phiopt to do the convert from PHI to stmt = cond ? a : b. > > It can perform the convert from PHI to stmt = cond ? a : b successfully, and then > > the widen-mul is able to do the recog to .SAT_ADD. > > > > For now, to limit the risck, the above convert from PHI to stmt = cond ? a : b only be performed when matched, > > as well as the backend support the usadd standard name. Unfortunately, I am stuck in the case that when the lhs > > is not matched, we need to clean up something like created stmt in previous, or we will have ICE for missing definition. > > > > sat_add.c: In function ‘sat_add_u_3_uint8_t’: > > sat_add.c:69:1: error: missing definition > > 69 | SAT_ADD_U_3(uint8_t); > > | ^~~~~~~~~~~ > > for SSA_NAME: _6 in statement: > > # VUSE <.MEM_14(D)> > > return _6; > > during GIMPLE pass: phiopt > > dump file: sat_add.c.046t.phiopt1 > > sat_add.c:69:1: internal compiler error: verify_ssa failed > > 0x1db41ba verify_ssa(bool, bool > > /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/tree-ssa.cc:1203 > > 0x18e3075 execute_function_todo > > /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:2096 > > 0x18e1c52 do_per_function > > /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:1688 > > 0x18e3222 execute_todo > > > > I bet the reason is that we created new stmt like stmt_cond and stmt_val but we don't insert it. > > Thus, there will be orphan nodes somewhere and we need something like rollback to recover the > > gimple up to a point. I tried sorts of release_xx or likewise but seems not working. > > > > So is there any suggest to take care of such gimple rollback or another solution for this? Below are > > The function to perform the convert from PHI to stmt = cond ? a : b for reference, thanks a lot. > > > > Pan > > > > diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc > > index 918cf50b589..7982b65bac4 100644 > > --- a/gcc/tree-ssa-phiopt.cc > > +++ b/gcc/tree-ssa-phiopt.cc > > @@ -486,6 +486,88 @@ phiopt_early_allow (gimple_seq &seq, gimple_match_op &op) > > } > > } > > > > +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree)); > > + > > +/* Try to match the phi expr to the gimple cond. Return true if we can > > + perform the convert or return false. There will be some restrictions > > + or such kind of conversion, aka: > > + > > + 1. Only selected pattern will try this convert. > > + 2. The generated gassign matched the selected IFN pattern. > > + 3. The backend has implement the standard name. > > + > > + From: > > + <bb 2> : > > + _1 = x_3(D) + y_4(D); > > + if (_1 >= x_3(D)) > > + goto <bb 3>; [INV] > > + else > > + goto <bb 4>; [INV] > > + > > + <bb 3> : > > + > > + <bb 4> : > > + # _2 = PHI <255(2), _1(3)> > > + > > + To: > > + <bb 2> : > > + _1 = x_3(D) + y_4(D); > > + phi_cond_6 = _1 >= x_3(D); > > + _2 = phi_cond_6 ? _1 : 255; */ > > + > > +static bool > > +match_phi_to_gimple_cond (basic_block cond_bb, gphi *phi, tree arg0, tree arg1) > > You should do this in widen-mult and/or ISEL and if necessary for vectorization > in tree-if-conv.cc, though eventually what if-convert creates might be > good enough > to match during pattern recognition. > > > +{ > > + gcond *cond = as_a <gcond *> (*gsi_last_bb (cond_bb)); > > + > > + if (!cond) > > + return false; > > + > > + enum tree_code code = gimple_cond_code (cond); > > + tree phi_result = gimple_phi_result (phi); > > + tree cond_tree = make_temp_ssa_name (boolean_type_node, NULL, "phi_cond"); > > + tree cmp_tree = build2 (code, boolean_type_node, gimple_cond_lhs (cond), > > + gimple_cond_rhs (cond)); > > + tree rhs = build3 (COND_EXPR, TREE_TYPE (phi_result), cond_tree, arg0, arg1); > > phiopt directly uses cmp_tree, so you could do that as well and avoid stmt_cond. > > > + > > + gassign *stmt_cond = gimple_build_assign (cond_tree, cmp_tree); > > + gassign *stmt_val = gimple_build_assign (phi_result, rhs); > > + > > + tree ops[2]; > > + tree lhs = gimple_assign_lhs (stmt_val); > > + bool matched_p = (gimple_unsigned_integer_sat_add (lhs, ops, NULL) > > + && direct_internal_fn_supported_p (IFN_SAT_ADD, TREE_TYPE (lhs), > > + OPTIMIZE_FOR_BOTH)); > > + > > + if (matched_p) > > + { > > + gimple_stmt_iterator gsi = gsi_last_bb (cond_bb); > > + gimple_stmt_iterator psi = gsi_for_stmt (phi); > > + > > + gsi_insert_before (&gsi, stmt_cond, GSI_SAME_STMT); > > + gsi_insert_before (&gsi, stmt_val, GSI_SAME_STMT); > > + remove_phi_node (&psi, false); > > You only matched but you do not insert the actual .SAT_ADD here and that's > the definition that's missing. You probably shouldn't need to add the > cond-stmt? > > > + } > > + else > > + { > > + // Clean up the stmt created, but non of blow works well. > > + // gsi = gsi_for_stmt (stmt_val); > > + // gsi_remove (&gsi, true); > > + // release_defs (stmt_val); > > + // ggc_free (stmt_val); > > + > > + // gsi = gsi_for_stmt (stmt_cond); > > + // gsi_remove (&gsi, true); > > + // release_defs (stmt_cond); > > + // ggc_free (stmt_cond); > > + > > + // release_defs (stmt_cond); > > + // release_defs (stmt_val); > > + release_ssa_name (cond_tree); > > As you don't insert the stmts you should be able to simply > only release the SSA names and ggc_free the stmt. You can also > look at maybe_fold_comparisons_from_match_pd in gimple-fold.cc > for more "ugly" ways to do this. > > Building a helper in one place to match a PHI def as COND_EXPR > might be nice. As said, avoiding all the mess by providing native > support from genmatch would be even better, but I'm not asking you > to do that. For reference the attached outlines where/what to do if you are curious. Richard. > > Richard. > > > + } > > + > > + return matched_p; > > +} > > + > > /* gimple_simplify_phiopt is like gimple_simplify but designed for PHIOPT. > > Return NULL if nothing can be simplified or the resulting simplified value > > with parts pushed if EARLY_P was true. Also rejects non allowed tree code > > @@ -826,6 +908,9 @@ match_simplify_replacement (basic_block cond_bb, basic_block middle_bb, > > So, given the condition COND, and the two PHI arguments, match and simplify > > can happen on (COND) ? arg0 : arg1. */ > > > > + if (match_phi_to_gimple_cond (cond_bb, phi, arg0, arg1)) > > + return true; > > + > > stmt = last_nondebug_stmt (cond_bb); > > > > /* We need to know which is the true edge and which is the false > > > > > > -----Original Message----- > > From: Jeff Law <jeffreyalaw@gmail.com> > > Sent: Thursday, May 23, 2024 10:59 PM > > To: Richard Biener <richard.guenther@gmail.com>; Li, Pan2 <pan2.li@intel.com> > > Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; tamar.christina@arm.com; pinskia@gmail.com > > Subject: Re: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD > > > > > > > > On 5/23/24 6:14 AM, Richard Biener wrote: > > > On Thu, May 23, 2024 at 1:08 PM Li, Pan2 <pan2.li@intel.com> wrote: > > >> > > >> I have a try to convert the PHI from Part-A to Part-B, aka PHI to _2 = phi_cond ? _1 : 255. > > >> And then we can do the matching on COND_EXPR in the underlying widen-mul pass. > > >> > > >> Unfortunately, meet some ICE when verify_gimple_phi in sccopy1 pass => > > >> sat_add.c:66:1: internal compiler error: tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in useless_type_conversion_p, at gimple-expr.cc:86 > > > > > > Likely you have released _2, more comments below on your previous mail. > > You can be sure by calling debug_tree () on the SSA_NAME node in > > question. If it reports "in-free-list", then that's definitive that the > > SSA_NAME was released back to the SSA_NAME manager. If that SSA_NAME is > > still in the IL, then that's very bad. > > > > jeff > >
Thanks Richard for help and comments. If my understanding is correct, I should be able to follow below step(s) to support the branch form for unsigned SAT_ADD. 1. Building a helper in one place to match a PHI def as COND_EXPR (or even a better way to do it by providing native support from genmatch) 2. Leverage this helper in widen-mul and recog it as .SAT_ADD if matches. Will have a try and keep you posted. Pan -----Original Message----- From: Richard Biener <richard.guenther@gmail.com> Sent: Friday, May 24, 2024 3:21 PM To: Li, Pan2 <pan2.li@intel.com> Cc: Jeff Law <jeffreyalaw@gmail.com>; gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; tamar.christina@arm.com; pinskia@gmail.com Subject: Re: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD On Fri, May 24, 2024 at 8:56 AM Richard Biener <richard.guenther@gmail.com> wrote: > > On Fri, May 24, 2024 at 8:37 AM Li, Pan2 <pan2.li@intel.com> wrote: > > > > Thanks Jeff and Richard for suggestion and reviewing. > > > > Have another try in phiopt to do the convert from PHI to stmt = cond ? a : b. > > It can perform the convert from PHI to stmt = cond ? a : b successfully, and then > > the widen-mul is able to do the recog to .SAT_ADD. > > > > For now, to limit the risck, the above convert from PHI to stmt = cond ? a : b only be performed when matched, > > as well as the backend support the usadd standard name. Unfortunately, I am stuck in the case that when the lhs > > is not matched, we need to clean up something like created stmt in previous, or we will have ICE for missing definition. > > > > sat_add.c: In function ‘sat_add_u_3_uint8_t’: > > sat_add.c:69:1: error: missing definition > > 69 | SAT_ADD_U_3(uint8_t); > > | ^~~~~~~~~~~ > > for SSA_NAME: _6 in statement: > > # VUSE <.MEM_14(D)> > > return _6; > > during GIMPLE pass: phiopt > > dump file: sat_add.c.046t.phiopt1 > > sat_add.c:69:1: internal compiler error: verify_ssa failed > > 0x1db41ba verify_ssa(bool, bool > > /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/tree-ssa.cc:1203 > > 0x18e3075 execute_function_todo > > /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:2096 > > 0x18e1c52 do_per_function > > /home/pli/gcc/555/riscv-gnu-toolchain/gcc/__RISCV_BUILD__/../gcc/passes.cc:1688 > > 0x18e3222 execute_todo > > > > I bet the reason is that we created new stmt like stmt_cond and stmt_val but we don't insert it. > > Thus, there will be orphan nodes somewhere and we need something like rollback to recover the > > gimple up to a point. I tried sorts of release_xx or likewise but seems not working. > > > > So is there any suggest to take care of such gimple rollback or another solution for this? Below are > > The function to perform the convert from PHI to stmt = cond ? a : b for reference, thanks a lot. > > > > Pan > > > > diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc > > index 918cf50b589..7982b65bac4 100644 > > --- a/gcc/tree-ssa-phiopt.cc > > +++ b/gcc/tree-ssa-phiopt.cc > > @@ -486,6 +486,88 @@ phiopt_early_allow (gimple_seq &seq, gimple_match_op &op) > > } > > } > > > > +extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree)); > > + > > +/* Try to match the phi expr to the gimple cond. Return true if we can > > + perform the convert or return false. There will be some restrictions > > + or such kind of conversion, aka: > > + > > + 1. Only selected pattern will try this convert. > > + 2. The generated gassign matched the selected IFN pattern. > > + 3. The backend has implement the standard name. > > + > > + From: > > + <bb 2> : > > + _1 = x_3(D) + y_4(D); > > + if (_1 >= x_3(D)) > > + goto <bb 3>; [INV] > > + else > > + goto <bb 4>; [INV] > > + > > + <bb 3> : > > + > > + <bb 4> : > > + # _2 = PHI <255(2), _1(3)> > > + > > + To: > > + <bb 2> : > > + _1 = x_3(D) + y_4(D); > > + phi_cond_6 = _1 >= x_3(D); > > + _2 = phi_cond_6 ? _1 : 255; */ > > + > > +static bool > > +match_phi_to_gimple_cond (basic_block cond_bb, gphi *phi, tree arg0, tree arg1) > > You should do this in widen-mult and/or ISEL and if necessary for vectorization > in tree-if-conv.cc, though eventually what if-convert creates might be > good enough > to match during pattern recognition. > > > +{ > > + gcond *cond = as_a <gcond *> (*gsi_last_bb (cond_bb)); > > + > > + if (!cond) > > + return false; > > + > > + enum tree_code code = gimple_cond_code (cond); > > + tree phi_result = gimple_phi_result (phi); > > + tree cond_tree = make_temp_ssa_name (boolean_type_node, NULL, "phi_cond"); > > + tree cmp_tree = build2 (code, boolean_type_node, gimple_cond_lhs (cond), > > + gimple_cond_rhs (cond)); > > + tree rhs = build3 (COND_EXPR, TREE_TYPE (phi_result), cond_tree, arg0, arg1); > > phiopt directly uses cmp_tree, so you could do that as well and avoid stmt_cond. > > > + > > + gassign *stmt_cond = gimple_build_assign (cond_tree, cmp_tree); > > + gassign *stmt_val = gimple_build_assign (phi_result, rhs); > > + > > + tree ops[2]; > > + tree lhs = gimple_assign_lhs (stmt_val); > > + bool matched_p = (gimple_unsigned_integer_sat_add (lhs, ops, NULL) > > + && direct_internal_fn_supported_p (IFN_SAT_ADD, TREE_TYPE (lhs), > > + OPTIMIZE_FOR_BOTH)); > > + > > + if (matched_p) > > + { > > + gimple_stmt_iterator gsi = gsi_last_bb (cond_bb); > > + gimple_stmt_iterator psi = gsi_for_stmt (phi); > > + > > + gsi_insert_before (&gsi, stmt_cond, GSI_SAME_STMT); > > + gsi_insert_before (&gsi, stmt_val, GSI_SAME_STMT); > > + remove_phi_node (&psi, false); > > You only matched but you do not insert the actual .SAT_ADD here and that's > the definition that's missing. You probably shouldn't need to add the > cond-stmt? > > > + } > > + else > > + { > > + // Clean up the stmt created, but non of blow works well. > > + // gsi = gsi_for_stmt (stmt_val); > > + // gsi_remove (&gsi, true); > > + // release_defs (stmt_val); > > + // ggc_free (stmt_val); > > + > > + // gsi = gsi_for_stmt (stmt_cond); > > + // gsi_remove (&gsi, true); > > + // release_defs (stmt_cond); > > + // ggc_free (stmt_cond); > > + > > + // release_defs (stmt_cond); > > + // release_defs (stmt_val); > > + release_ssa_name (cond_tree); > > As you don't insert the stmts you should be able to simply > only release the SSA names and ggc_free the stmt. You can also > look at maybe_fold_comparisons_from_match_pd in gimple-fold.cc > for more "ugly" ways to do this. > > Building a helper in one place to match a PHI def as COND_EXPR > might be nice. As said, avoiding all the mess by providing native > support from genmatch would be even better, but I'm not asking you > to do that. For reference the attached outlines where/what to do if you are curious. Richard. > > Richard. > > > + } > > + > > + return matched_p; > > +} > > + > > /* gimple_simplify_phiopt is like gimple_simplify but designed for PHIOPT. > > Return NULL if nothing can be simplified or the resulting simplified value > > with parts pushed if EARLY_P was true. Also rejects non allowed tree code > > @@ -826,6 +908,9 @@ match_simplify_replacement (basic_block cond_bb, basic_block middle_bb, > > So, given the condition COND, and the two PHI arguments, match and simplify > > can happen on (COND) ? arg0 : arg1. */ > > > > + if (match_phi_to_gimple_cond (cond_bb, phi, arg0, arg1)) > > + return true; > > + > > stmt = last_nondebug_stmt (cond_bb); > > > > /* We need to know which is the true edge and which is the false > > > > > > -----Original Message----- > > From: Jeff Law <jeffreyalaw@gmail.com> > > Sent: Thursday, May 23, 2024 10:59 PM > > To: Richard Biener <richard.guenther@gmail.com>; Li, Pan2 <pan2.li@intel.com> > > Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; kito.cheng@gmail.com; tamar.christina@arm.com; pinskia@gmail.com > > Subject: Re: [PATCH v2] Match: Support __builtin_add_overflow branch form for unsigned SAT_ADD > > > > > > > > On 5/23/24 6:14 AM, Richard Biener wrote: > > > On Thu, May 23, 2024 at 1:08 PM Li, Pan2 <pan2.li@intel.com> wrote: > > >> > > >> I have a try to convert the PHI from Part-A to Part-B, aka PHI to _2 = phi_cond ? _1 : 255. > > >> And then we can do the matching on COND_EXPR in the underlying widen-mul pass. > > >> > > >> Unfortunately, meet some ICE when verify_gimple_phi in sccopy1 pass => > > >> sat_add.c:66:1: internal compiler error: tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in useless_type_conversion_p, at gimple-expr.cc:86 > > > > > > Likely you have released _2, more comments below on your previous mail. > > You can be sure by calling debug_tree () on the SSA_NAME node in > > question. If it reports "in-free-list", then that's definitive that the > > SSA_NAME was released back to the SSA_NAME manager. If that SSA_NAME is > > still in the IL, then that's very bad. > > > > jeff > >
diff --git a/gcc/match.pd b/gcc/match.pd index cff67c84498..2dc77a46e67 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3080,6 +3080,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (match (unsigned_integer_sat_add @0 @1) (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1))) +#if GIMPLE + +(simplify + (cond (ne (imagpart (IFN_ADD_OVERFLOW@2 @0 @1)) integer_zerop) + integer_minus_onep (realpart @2)) + (if (ternary_integer_types_match_p (type, @0, @1) && TYPE_UNSIGNED (type) + && direct_internal_fn_supported_p (IFN_SAT_ADD, type, OPTIMIZE_FOR_BOTH)) + (bit_ior (plus@3 @0 @1) (negate (convert (lt @3 @0)))))) + +#endif + /* x > y && x != XXX_MIN --> x > y x > y && x == XXX_MIN --> false . */ (for eqne (eq ne)