Message ID | a9c739df-eba4-e0e6-b59e-4d6ecc7511e9@arm.com |
---|---|
State | New |
Series | [1/3] Refactor to allow internal_fn's |
On Fri, 28 Apr 2023, Andre Vieira (lists) wrote:

> This patch replaces the existing tree_code widen_plus and widen_minus
> patterns with internal_fn versions.
>
> DEF_INTERNAL_OPTAB_HILO_FN is like DEF_INTERNAL_OPTAB_FN except it provides
> convenience wrappers for defining conversions that require a hi/lo split, like
> widening and narrowing operations. Each definition for <NAME> will require an
> optab named <OPTAB> and two other optabs that you specify for signed and
> unsigned. The hi/lo pair is necessary because the widening operations take n
> narrow elements as inputs and return n/2 wide elements as outputs. The 'lo'
> operation operates on the first n/2 elements of input. The 'hi' operation
> operates on the second n/2 elements of input. Defining an internal_fn along
> with hi/lo variations allows a single internal function to be returned from a
> vect_recog function that will later be expanded to hi/lo.
>
> DEF_INTERNAL_OPTAB_HILO_FN is used in internal-fn.def to register a widening
> internal_fn. It is defined differently in different places and internal-fn.def
> is sourced from those places so the parameters given can be reused.
> internal-fn.c: defined to expand to hi/lo signed/unsigned optabs, later
> defined to generate the 'expand_' functions for the hi/lo versions of the fn.
> internal-fn.def: defined to invoke DEF_INTERNAL_OPTAB_FN for the original
> and hi/lo variants of the internal_fn
>
> For example:
> IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
> for aarch64: IFN_VEC_WIDEN_PLUS_HI -> vec_widen_<su>addl_hi_<mode> ->
> (u/s)addl2
> IFN_VEC_WIDEN_PLUS_LO -> vec_widen_<su>addl_lo_<mode>
> -> (u/s)addl
>
> This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree
> codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.

I'll note that it's interesting we have widen multiplication as
the only existing example where we have both HI/LO and EVEN/ODD cases.
I think we want to share as much of the infrastructure to eventually
support targets doing even/odd (I guess all VLA vector targets will
be even/odd?).

DEF_INTERNAL_OPTAB_HILO_FN also looks to be implicitly directed to
widening operations (otherwise no signed/unsigned variants would be
necessary). What I don't understand is why we need an optab
without _hi/_lo but in that case no signed/unsigned variant?

Looks like all plus, plus_lo and plus_hi are commutative but
only plus is widening?! So is the setup that the vectorizer
doesn't know about the split and uses 'plus' but then the
expander performs the split? It does look a bit awkward here
(the plain 'plus' is just used for the scalar case during
pattern recog it seems).

I'd rather have DEF_INTERNAL_OPTAB_HILO_FN split up, declaring
the hi/lo pairs and the scalar variant separately using
DEF_INTERNAL_FN without expander for that, and having
DEF_INTERNAL_HILO_WIDEN_OPTAB_FN and DEF_INTERNAL_EVENODD_WIDEN_OPTAB_FN
for the signed/unsigned pairs? (if we need that helper at all)

Targets shouldn't need to implement the plain optab (it shouldn't
exist) and the vectorizer should query the hi/lo or even/odd
optabs for support instead.

The vectorizer parts look OK to me, I'd like Richard to chime
in on the optab parts as well.

Thanks,
Richard.

> gcc/ChangeLog:
>
> 2023-04-28 Andre Vieira <andre.simoesdiasvieira@arm.com>
>   Joel Hutton <joel.hutton@arm.com>
>   Tamar Christina <tamar.christina@arm.com>
>
> * internal-fn.cc (INCLUDE_MAP): Include maps for use in optab
>   lookup.
>   (DEF_INTERNAL_OPTAB_HILO_FN): Macro to define an internal_fn that
>   expands into multiple internal_fns (for widening).
>   (ifn_cmp): Function to compare ifn's for sorting/searching.
>   (lookup_hilo_ifn_optab): Add lookup function.
>   (lookup_hilo_internal_fn): Add lookup function.
>   (commutative_binary_fn_p): Add widen_plus fn's.
>   (widening_fn_p): New function.
>   (decomposes_to_hilo_fn_p): New function.
> * internal-fn.def (DEF_INTERNAL_OPTAB_HILO_FN): Define widening
>   plus,minus functions.
>   (VEC_WIDEN_PLUS): Replacement for VEC_WIDEN_PLUS tree code.
>   (VEC_WIDEN_MINUS): Replacement for VEC_WIDEN_MINUS tree code.
> * internal-fn.h (GCC_INTERNAL_FN_H): Add headers.
>   (lookup_hilo_ifn_optab): Add prototype.
>   (lookup_hilo_internal_fn): Likewise.
>   (widening_fn_p): Likewise.
>   (decomposes_to_hilo_fn_p): Likewise.
> * optabs.cc (commutative_optab_p): Add widening plus, minus optabs.
> * optabs.def (OPTAB_CD): widen add, sub optabs
> * tree-vect-patterns.cc (vect_recog_widen_op_pattern): Support
>   patterns with a hi/lo split.
>   (vect_recog_widen_plus_pattern): Refactor to return
>   IFN_VEC_WIDEN_PLUS.
>   (vect_recog_widen_minus_pattern): Refactor to return new
>   IFN_VEC_WIDEN_MINUS.
> * tree-vect-stmts.cc (vectorizable_conversion): Add widen plus/minus
>   ifn support.
>   (supportable_widening_operation): Add widen plus/minus ifn support.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/vect-widen-add.c: Test that new
>   IFN_VEC_WIDEN_PLUS is being used.
> * gcc.target/aarch64/vect-widen-sub.c: Test that new
>   IFN_VEC_WIDEN_MINUS is being used.
Richard Biener <rguenther@suse.de> writes:
> On Fri, 28 Apr 2023, Andre Vieira (lists) wrote:
>
>> This patch replaces the existing tree_code widen_plus and widen_minus
>> patterns with internal_fn versions.
>>
>> DEF_INTERNAL_OPTAB_HILO_FN is like DEF_INTERNAL_OPTAB_FN except it provides
>> convenience wrappers for defining conversions that require a hi/lo split, like
>> widening and narrowing operations. Each definition for <NAME> will require an
>> optab named <OPTAB> and two other optabs that you specify for signed and
>> unsigned. The hi/lo pair is necessary because the widening operations take n
>> narrow elements as inputs and return n/2 wide elements as outputs. The 'lo'
>> operation operates on the first n/2 elements of input. The 'hi' operation
>> operates on the second n/2 elements of input. Defining an internal_fn along
>> with hi/lo variations allows a single internal function to be returned from a
>> vect_recog function that will later be expanded to hi/lo.
>>
>> DEF_INTERNAL_OPTAB_HILO_FN is used in internal-fn.def to register a widening
>> internal_fn. It is defined differently in different places and internal-fn.def
>> is sourced from those places so the parameters given can be reused.
>> internal-fn.c: defined to expand to hi/lo signed/unsigned optabs, later
>> defined to generate the 'expand_' functions for the hi/lo versions of the fn.
>> internal-fn.def: defined to invoke DEF_INTERNAL_OPTAB_FN for the original
>> and hi/lo variants of the internal_fn
>>
>> For example:
>> IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
>> for aarch64: IFN_VEC_WIDEN_PLUS_HI -> vec_widen_<su>addl_hi_<mode> ->
>> (u/s)addl2
>> IFN_VEC_WIDEN_PLUS_LO -> vec_widen_<su>addl_lo_<mode>
>> -> (u/s)addl
>>
>> This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree
>> codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
>
> I'll note that it's interesting we have widen multiplication as
> the only existing example where we have both HI/LO and EVEN/ODD cases.
> I think we want to share as much of the infrastructure to eventually
> support targets doing even/odd (I guess all VLA vector targets will
> be even/odd?).

Can't speak for all, but SVE2 certainly is.

> DEF_INTERNAL_OPTAB_HILO_FN also looks to be implicitly directed to
> widening operations (otherwise no signed/unsigned variants would be
> necessary). What I don't understand is why we need an optab
> without _hi/_lo but in that case no signed/unsigned variant?
>
> Looks like all plus, plus_lo and plus_hi are commutative but
> only plus is widening?! So is the setup that the vectorizer
> doesn't know about the split and uses 'plus' but then the
> expander performs the split? It does look a bit awkward here
> (the plain 'plus' is just used for the scalar case during
> pattern recog it seems).
>
> I'd rather have DEF_INTERNAL_OPTAB_HILO_FN split up, declaring
> the hi/lo pairs and the scalar variant separately using
> DEF_INTERNAL_FN without expander for that, and having
> DEF_INTERNAL_HILO_WIDEN_OPTAB_FN and DEF_INTERNAL_EVENODD_WIDEN_OPTAB_FN
> for the signed/unsigned pairs? (if we need that helper at all)
>
> Targets shouldn't need to implement the plain optab (it shouldn't
> exist) and the vectorizer should query the hi/lo or even/odd
> optabs for support instead.

I dread these kinds of review because I think I'm almost certain to
flatly contradict something I said last time round, but +1 FWIW.

It seems OK to define an ifn to represent the combined effect, for the
scalar case, but that shouldn't leak into optabs unless we actually want
to use the ifn for "real" scalar ops (as opposed to a temporary
placeholder during pattern recognition).
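To make the HI/LO versus EVEN/ODD distinction in the exchange above concrete, here is a minimal sketch of which input lanes each half consumes. It models the semantics on plain arrays only; `split_kind` and `widen_add_part` are hypothetical names for illustration, not GCC interfaces, and target endianness (which can affect what "hi" and "lo" mean in practice) is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Which input lanes feed one half of a widening operation.
enum class split_kind { lo, hi, even, odd };

// Widening add of the n/2 lanes selected by KIND (hypothetical helper).
// Inputs hold n narrow elements; the result holds n/2 wide elements.
std::vector<uint16_t>
widen_add_part (const std::vector<uint8_t> &a, const std::vector<uint8_t> &b,
                split_kind kind)
{
  assert (a.size () == b.size () && a.size () % 2 == 0);
  const std::size_t n = a.size ();
  std::vector<uint16_t> out;
  out.reserve (n / 2);
  for (std::size_t i = 0; i < n / 2; ++i)
    {
      std::size_t lane = 0;
      switch (kind)
        {
        case split_kind::lo:   lane = i;         break; // first n/2 lanes
        case split_kind::hi:   lane = n / 2 + i; break; // second n/2 lanes
        case split_kind::even: lane = 2 * i;     break; // lanes 0, 2, 4, ...
        case split_kind::odd:  lane = 2 * i + 1; break; // lanes 1, 3, 5, ...
        }
      // The sum is formed at the wider width, so it cannot wrap at 8 bits.
      out.push_back (static_cast<uint16_t> (a[lane] + b[lane]));
    }
  return out;
}
```

Either pair (lo plus hi, or even plus odd) covers every input lane exactly once; the pairs differ only in how the results are ordered, which is why the two schemes could plausibly share infrastructure.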
On the optabs/ifn bits:

> +static int
> +ifn_cmp (const void *a_, const void *b_)
> +{
> +  typedef std::pair<enum internal_fn, unsigned> ifn_pair;
> +  auto *a = (const std::pair<ifn_pair, optab> *)a_;
> +  auto *b = (const std::pair<ifn_pair, optab> *)b_;
> +  return (int) (a->first.first) - (b->first.first);
> +}
> +
> +/* Return the optab belonging to the given internal function NAME for the given
> +   SIGN or unknown_optab. */
> +
> +optab
> +lookup_hilo_ifn_optab (enum internal_fn fn, unsigned sign)

There is no NAME parameter. It also isn't clear what SIGN means:
is 1 for unsigned or signed? Would be better to use signop and
TYPE_SIGN IMO.

> +{
> +  typedef std::pair<enum internal_fn, unsigned> ifn_pair;
> +  typedef auto_vec <std::pair<ifn_pair, optab>>fn_to_optab_map_type;
> +  static fn_to_optab_map_type *fn_to_optab_map;
> +
> +  if (!fn_to_optab_map)
> +    {
> +      unsigned num
> +        = sizeof (internal_fn_hilo_keys_array) / sizeof (enum internal_fn);
> +      fn_to_optab_map = new fn_to_optab_map_type ();
> +      for (unsigned int i = 0; i < num - 1; ++i)
> +        {
> +          enum internal_fn fn = internal_fn_hilo_keys_array[i];
> +          optab v1 = internal_fn_hilo_values_array[2*i];
> +          optab v2 = internal_fn_hilo_values_array[2*i + 1];
> +          ifn_pair key1 (fn, 0);
> +          fn_to_optab_map->safe_push ({key1, v1});
> +          ifn_pair key2 (fn, 1);
> +          fn_to_optab_map->safe_push ({key2, v2});
> +        }
> +      fn_to_optab_map->qsort (ifn_cmp);
> +    }
> +
> +  ifn_pair new_pair (fn, sign ? 1 : 0);
> +  optab tmp;
> +  std::pair<ifn_pair,optab> pair_wrap (new_pair, tmp);
> +  auto entry = fn_to_optab_map->bsearch (&pair_wrap, ifn_cmp);
> +  return entry != fn_to_optab_map->end () ? entry->second : unknown_optab;
> +}
> +

Do we need to use a map for this? It seems like it follows mechanically
from the macro definition and could be handled using a switch statement
and preprocessor logic.

Also, it would be good to make direct_internal_fn_optab DTRT for this
case, rather than needing a separate function.

> +extern void
> +lookup_hilo_internal_fn (enum internal_fn ifn, enum internal_fn *lo,
> +                         enum internal_fn *hi)
> +{
> +  gcc_assert (decomposes_to_hilo_fn_p (ifn));
> +
> +  *lo = internal_fn (ifn + 1);
> +  *hi = internal_fn (ifn + 2);
> +}

Nit: spurious extern. Function needs a comment. There have been
requests to drop redundant "enum" keywords from new code.

> +/* Return true if FN decomposes to _hi and _lo IFN. If true this should also
> +   be a widening function. */
> +
> +bool
> +decomposes_to_hilo_fn_p (internal_fn fn)
> +{
> +  if (!widening_fn_p (fn))
> +    return false;
> +
> +  switch (fn)
> +    {
> +    case IFN_VEC_WIDEN_PLUS:
> +    case IFN_VEC_WIDEN_MINUS:
> +      return true;
> +
> +    default:
> +      return false;
> +    }
> +}
> +

Similarly here I think we should use the preprocessor. It isn't clear
why this returns false for !widening_fn_p. Narrowing hi/lo functions
would decompose in a similar way.

As a general comment, how about naming the new macro:

DEF_INTERNAL_SIGNED_HILO_OPTAB_FN

and make it invoke DEF_INTERNAL_SIGNED_OPTAB_FN twice, once for the hi
and once for the lo?

The new optabs need to be documented in md.texi.

I think it'd be better to drop the "l" suffix in "addl" and "subl",
since that's an Arm convention and is redundant with the earlier "widen".

Sorry for the nitpicks and thanks for picking up this work.

Richard
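The suggestion to derive the predicate and the hi/lo lookup mechanically from the macro definition can be sketched outside GCC with an X-macro. Everything below is a toy stand-in: `HILO_FN_LIST` plays the role of the relevant entries in internal-fn.def, and the `toy_` names are hypothetical, not GCC's.

```cpp
#include <cassert>

// Hypothetical stand-in for the hi/lo entries of internal-fn.def.
#define HILO_FN_LIST(X) \
  X (VEC_WIDEN_PLUS)    \
  X (VEC_WIDEN_MINUS)

// Each entry expands to three consecutive enumerators: the combined
// function, then its _LO part, then its _HI part.
enum toy_internal_fn
{
  TOY_IFN_NONE,
#define X(NAME) TOY_IFN_##NAME, TOY_IFN_##NAME##_LO, TOY_IFN_##NAME##_HI,
  HILO_FN_LIST (X)
#undef X
  TOY_IFN_LAST
};

// Switch generated from the same list: true only for combined functions.
bool
toy_decomposes_to_hilo_fn_p (toy_internal_fn fn)
{
  switch (fn)
    {
#define X(NAME) case TOY_IFN_##NAME:
      HILO_FN_LIST (X)
#undef X
      return true;

    default:
      return false;
    }
}

// Relies on the enumerator order established by the macro above.
void
toy_lookup_hilo_internal_fn (toy_internal_fn fn, toy_internal_fn *lo,
                             toy_internal_fn *hi)
{
  assert (toy_decomposes_to_hilo_fn_p (fn));
  *lo = static_cast<toy_internal_fn> (fn + 1);
  *hi = static_cast<toy_internal_fn> (fn + 2);
}
```

Because the enumerators and the switch cases come from one list, adding an entry keeps both in step, and no runtime map or sort is needed. The `+ 1` / `+ 2` arithmetic is only safe while the enum layout is generated by the same macro.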
I think I have dealt with most of your comments. There are quite a few
changes, and I think it's all a bit simpler now. I made some other changes to
the costing in tree-inline.cc and gimple-range-op.cc in which I try to
preserve the same behaviour as we had with the tree codes before. Also added
some extra checks to tree-cfg.cc that made sense to me.

I am still regression testing the gimple-range-op change, as that was a
last-minute change, but the rest survived a bootstrap and regression test on
aarch64-unknown-linux-gnu.

cover letter:

This patch replaces the existing tree_code widen_plus and widen_minus
patterns with internal_fn versions.

DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN respectively
except they provide convenience wrappers for defining conversions that require
a hi/lo split. Each definition for <NAME> will require optabs for _hi and _lo
and each of those will also require a signed and unsigned version in the case
of widening. The hi/lo pair is necessary because the widening and narrowing
operations take n narrow elements as inputs and return n/2 wide elements as
outputs. The 'lo' operation operates on the first n/2 elements of input. The
'hi' operation operates on the second n/2 elements of input. Defining an
internal_fn along with hi/lo variations allows a single internal function to
be returned from a vect_recog function that will later be expanded to hi/lo.

For example:
IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
for aarch64: IFN_VEC_WIDEN_PLUS_HI -> vec_widen_<su>add_hi_<mode> -> (u/s)addl2
IFN_VEC_WIDEN_PLUS_LO -> vec_widen_<su>add_lo_<mode> -> (u/s)addl

This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree
codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
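As a side illustration of why the cover letter asks for signed and unsigned optabs only in the widening case, the scalar sketch below shows that the same 8-bit pattern widens to different 16-bit values depending on signedness, whereas a plain truncating narrowing yields the same low bits either way. The function names are hypothetical, and saturating narrowing, which would depend on signedness, is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Widen two 8-bit values to 16 bits and add, treating them as unsigned
// (zero extension), in the spirit of a "uadd" style operation.
uint16_t
widen_add_unsigned (uint8_t a, uint8_t b)
{
  return static_cast<uint16_t> (static_cast<uint16_t> (a)
                                + static_cast<uint16_t> (b));
}

// The same, but treating the values as signed (sign extension), in the
// spirit of an "sadd" style operation.
int16_t
widen_add_signed (int8_t a, int8_t b)
{
  return static_cast<int16_t> (static_cast<int16_t> (a)
                               + static_cast<int16_t> (b));
}

// Truncating narrowing keeps only the low 8 bits, which do not depend on
// whether the 16-bit input is viewed as signed or unsigned.
uint8_t
narrow_low_bits (uint16_t x)
{
  return static_cast<uint8_t> (x & 0xff);
}
```

With the bit pattern 0xff as the first operand and 1 as the second, the unsigned variant sees 255 + 1 and the signed variant sees -1 + 1, so the widened results differ; that difference is what the separate signed and unsigned optabs encode.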
gcc/ChangeLog:

2023-05-12 Andre Vieira <andre.simoesdiasvieira@arm.com>
  Joel Hutton <joel.hutton@arm.com>
  Tamar Christina <tamar.christina@arm.com>

* config/aarch64/aarch64-simd.md (vec_widen_<su>addl_lo_<mode>): Rename
  this ...
  (vec_widen_<su>add_lo_<mode>): ... to this.
  (vec_widen_<su>addl_hi_<mode>): Rename this ...
  (vec_widen_<su>add_hi_<mode>): ... to this.
  (vec_widen_<su>subl_lo_<mode>): Rename this ...
  (vec_widen_<su>sub_lo_<mode>): ... to this.
  (vec_widen_<su>subl_hi_<mode>): Rename this ...
  (vec_widen_<su>sub_hi_<mode>): ... to this.
* doc/generic.texi: Document new IFN codes.
* internal-fn.cc (DEF_INTERNAL_OPTAB_WIDENING_HILO_FN): Macro to define an
  internal_fn that expands into multiple internal_fns for widening.
  (DEF_INTERNAL_OPTAB_NARROWING_HILO_FN): Likewise but for narrowing.
  (ifn_cmp): Function to compare ifn's for sorting/searching.
  (lookup_hilo_internal_fn): Add lookup function.
  (commutative_binary_fn_p): Add widen_plus fn's.
  (widening_fn_p): New function.
  (narrowing_fn_p): New function.
  (decomposes_to_hilo_fn_p): New function.
  (direct_internal_fn_optab): Change visibility.
* internal-fn.def (DEF_INTERNAL_OPTAB_WIDENING_HILO_FN): Define widening
  plus,minus functions.
  (VEC_WIDEN_PLUS): Replacement for VEC_WIDEN_PLUS_EXPR tree code.
  (VEC_WIDEN_MINUS): Replacement for VEC_WIDEN_MINUS_EXPR tree code.
* internal-fn.h (GCC_INTERNAL_FN_H): Add headers.
  (direct_internal_fn_optab): Declare new prototype.
  (lookup_hilo_internal_fn): Likewise.
  (widening_fn_p): Likewise.
  (narrowing_fn_p): Likewise.
  (decomposes_to_hilo_fn_p): Likewise.
* optabs.cc (commutative_optab_p): Add widening plus optabs.
* optabs.def (OPTAB_D): Define widen add, sub optabs.
* tree-cfg.cc (verify_gimple_call): Add checks for new widen add and sub
  IFNs.
* tree-inline.cc (estimate_num_insns): Return same cost for widen add and
  sub IFNs as previous tree_codes.
* tree-vect-patterns.cc (vect_recog_widen_op_pattern): Support patterns
  with a hi/lo split.
(vect_recog_sad_pattern): Refactor to use new IFN codes. (vect_recog_widen_plus_pattern): Likewise. (vect_recog_widen_minus_pattern): Likewise. (vect_recog_average_pattern): Likewise. * tree-vect-stmts.cc (vectorizable_conversion): Add support for _HILO IFNs. (supportable_widening_operation): Likewise. * tree.def (WIDEN_SUM_EXPR): Update example to use new IFNs. gcc/testsuite/ChangeLog: * gcc.target/aarch64/vect-widen-add.c: Test that new IFN_VEC_WIDEN_PLUS is being used. * gcc.target/aarch64/vect-widen-sub.c: Test that new IFN_VEC_WIDEN_MINUS is being used. diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index bfc98a8d943467b33390defab9682f44efab5907..ffbbecb9409e1c2835d658c2a8855cd0e955c0f2 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -4626,7 +4626,7 @@ [(set_attr "type" "neon_<ADDSUB:optab>_long")] ) -(define_expand "vec_widen_<su>addl_lo_<mode>" +(define_expand "vec_widen_<su>add_lo_<mode>" [(match_operand:<VWIDE> 0 "register_operand") (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand")) (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))] @@ -4638,7 +4638,7 @@ DONE; }) -(define_expand "vec_widen_<su>addl_hi_<mode>" +(define_expand "vec_widen_<su>add_hi_<mode>" [(match_operand:<VWIDE> 0 "register_operand") (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand")) (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))] @@ -4650,7 +4650,7 @@ DONE; }) -(define_expand "vec_widen_<su>subl_lo_<mode>" +(define_expand "vec_widen_<su>sub_lo_<mode>" [(match_operand:<VWIDE> 0 "register_operand") (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand")) (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))] @@ -4662,7 +4662,7 @@ DONE; }) -(define_expand "vec_widen_<su>subl_hi_<mode>" +(define_expand "vec_widen_<su>sub_hi_<mode>" [(match_operand:<VWIDE> 0 "register_operand") (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand")) 
(ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))] diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi index 8b2882da4fe7da07d22b4e5384d049ba7d3907bf..0fd7e6cce8bbd4ecb8027b702722adcf6c32eb55 100644 --- a/gcc/doc/generic.texi +++ b/gcc/doc/generic.texi @@ -1811,6 +1811,10 @@ a value from @code{enum annot_expr_kind}, the third is an @code{INTEGER_CST}. @tindex VEC_RSHIFT_EXPR @tindex VEC_WIDEN_MULT_HI_EXPR @tindex VEC_WIDEN_MULT_LO_EXPR +@tindex IFN_VEC_WIDEN_PLUS_HI +@tindex IFN_VEC_WIDEN_PLUS_LO +@tindex IFN_VEC_WIDEN_MINUS_HI +@tindex IFN_VEC_WIDEN_MINUS_LO @tindex VEC_WIDEN_PLUS_HI_EXPR @tindex VEC_WIDEN_PLUS_LO_EXPR @tindex VEC_WIDEN_MINUS_HI_EXPR @@ -1861,6 +1865,33 @@ vector of @code{N/2} products. In the case of @code{VEC_WIDEN_MULT_LO_EXPR} the low @code{N/2} elements of the two vector are multiplied to produce the vector of @code{N/2} products. +@item IFN_VEC_WIDEN_PLUS_HI +@itemx IFN_VEC_WIDEN_PLUS_LO +These internal functions represent widening vector addition of the high and low +parts of the two input vectors, respectively. Their operands are vectors that +contain the same number of elements (@code{N}) of the same integral type. The +result is a vector that contains half as many elements, of an integral type +whose size is twice as wide. In the case of @code{IFN_VEC_WIDEN_PLUS_HI} the +high @code{N/2} elements of the two vectors are added to produce the vector of +@code{N/2} products. In the case of @code{IFN_VEC_WIDEN_PLUS_LO} the low +@code{N/2} elements of the two vectors are added to produce the vector of +@code{N/2} products. + +@item IFN_VEC_WIDEN_MINUS_HI +@itemx IFN_VEC_WIDEN_MINUS_LO +These internal functions represent widening vector subtraction of the high and +low parts of the two input vectors, respectively. Their operands are vectors +that contain the same number of elements (@code{N}) of the same integral type. +The high/low elements of the second vector are subtracted from the high/low +elements of the first. 
The result is a vector that contains half as many +elements, of an integral type whose size is twice as wide. In the case of +@code{IFN_VEC_WIDEN_MINUS_HI} the high @code{N/2} elements of the second +vector are subtracted from the high @code{N/2} of the first to produce the +vector of @code{N/2} products. In the case of +@code{IFN_VEC_WIDEN_MINUS_LO} the low @code{N/2} elements of the second +vector are subtracted from the low @code{N/2} of the first to produce the +vector of @code{N/2} products. + @item VEC_WIDEN_PLUS_HI_EXPR @itemx VEC_WIDEN_PLUS_LO_EXPR These nodes represent widening vector addition of the high and low parts of diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc index 594bd3043f0e944299ddfff219f757ef15a3dd61..66636d82df27626e7911efd0cb8526921b39633f 100644 --- a/gcc/gimple-range-op.cc +++ b/gcc/gimple-range-op.cc @@ -1187,6 +1187,7 @@ gimple_range_op_handler::maybe_non_standard () { range_operator *signed_op = ptr_op_widen_mult_signed; range_operator *unsigned_op = ptr_op_widen_mult_unsigned; + bool signed1, signed2, signed_ret; if (gimple_code (m_stmt) == GIMPLE_ASSIGN) switch (gimple_assign_rhs_code (m_stmt)) { @@ -1202,32 +1203,55 @@ gimple_range_op_handler::maybe_non_standard () m_op1 = gimple_assign_rhs1 (m_stmt); m_op2 = gimple_assign_rhs2 (m_stmt); tree ret = gimple_assign_lhs (m_stmt); - bool signed1 = TYPE_SIGN (TREE_TYPE (m_op1)) == SIGNED; - bool signed2 = TYPE_SIGN (TREE_TYPE (m_op2)) == SIGNED; - bool signed_ret = TYPE_SIGN (TREE_TYPE (ret)) == SIGNED; - - /* Normally these operands should all have the same sign, but - some passes and violate this by taking mismatched sign args. At - the moment the only one that's possible is mismatch inputs and - unsigned output. Once ranger supports signs for the operands we - can properly fix it, for now only accept the case we can do - correctly. 
*/ - if ((signed1 ^ signed2) && signed_ret) - return; - - m_valid = true; - if (signed2 && !signed1) - std::swap (m_op1, m_op2); - - if (signed1 || signed2) - m_int = signed_op; - else - m_int = unsigned_op; + signed1 = TYPE_SIGN (TREE_TYPE (m_op1)) == SIGNED; + signed2 = TYPE_SIGN (TREE_TYPE (m_op2)) == SIGNED; + signed_ret = TYPE_SIGN (TREE_TYPE (ret)) == SIGNED; break; } default: - break; + return; } + else if (gimple_code (m_stmt) == GIMPLE_CALL + && gimple_call_internal_p (m_stmt) + && gimple_get_lhs (m_stmt) != NULL_TREE) + switch (gimple_call_internal_fn (m_stmt)) + { + case IFN_VEC_WIDEN_PLUS_LO: + case IFN_VEC_WIDEN_PLUS_HI: + { + signed_op = ptr_op_widen_plus_signed; + unsigned_op = ptr_op_widen_plus_unsigned; + m_valid = false; + m_op1 = gimple_call_arg (m_stmt, 0); + m_op2 = gimple_call_arg (m_stmt, 1); + tree ret = gimple_get_lhs (m_stmt); + signed1 = TYPE_SIGN (TREE_TYPE (m_op1)) == SIGNED; + signed2 = TYPE_SIGN (TREE_TYPE (m_op2)) == SIGNED; + signed_ret = TYPE_SIGN (TREE_TYPE (ret)) == SIGNED; + break; + } + default: + return; + } + else + return; + + /* Normally these operands should all have the same sign, but some passes + and violate this by taking mismatched sign args. At the moment the only + one that's possible is mismatch inputs and unsigned output. Once ranger + supports signs for the operands we can properly fix it, for now only + accept the case we can do correctly. */ + if ((signed1 ^ signed2) && signed_ret) + return; + + m_valid = true; + if (signed2 && !signed1) + std::swap (m_op1, m_op2); + + if (signed1 || signed2) + m_int = signed_op; + else + m_int = unsigned_op; } // Set up a gimple_range_op_handler for any built in function which can be diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index 5c9da73ea11f8060b18dcf513599c9694fa4f2ad..1acea5ae33046b70de247b1688aea874d9956abc 100644 --- a/gcc/internal-fn.cc +++ b/gcc/internal-fn.cc @@ -90,6 +90,19 @@ lookup_internal_fn (const char *name) return entry ? 
*entry : IFN_LAST; } +/* Given an internal_fn IFN that is a HILO function, return its corresponding + LO and HI internal_fns. */ + +extern void +lookup_hilo_internal_fn (internal_fn ifn, internal_fn *lo, internal_fn *hi) +{ + gcc_assert (decomposes_to_hilo_fn_p (ifn)); + + *lo = internal_fn (ifn + 1); + *hi = internal_fn (ifn + 2); +} + + /* Fnspec of each internal function, indexed by function number. */ const_tree internal_fn_fnspec_array[IFN_LAST + 1]; @@ -137,7 +150,16 @@ const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = { #define DEF_INTERNAL_OPTAB_FN(CODE, FLAGS, OPTAB, TYPE) TYPE##_direct, #define DEF_INTERNAL_SIGNED_OPTAB_FN(CODE, FLAGS, SELECTOR, SIGNED_OPTAB, \ UNSIGNED_OPTAB, TYPE) TYPE##_direct, +#undef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN +#undef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN +#define DEF_INTERNAL_OPTAB_WIDENING_HILO_FN(CODE, FLAGS, SELECTOR, SIGNED_OPTAB, \ + UNSIGNED_OPTAB, TYPE) \ +TYPE##_direct, TYPE##_direct, TYPE##_direct, +#define DEF_INTERNAL_OPTAB_NARROWING_HILO_FN(CODE, FLAGS, OPTAB, TYPE) \ +TYPE##_direct, TYPE##_direct, TYPE##_direct, #include "internal-fn.def" +#undef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN +#undef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN not_direct }; @@ -3852,7 +3874,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types, /* Return the optab used by internal function FN. */ -static optab +optab direct_internal_fn_optab (internal_fn fn, tree_pair types) { switch (fn) @@ -3971,6 +3993,9 @@ commutative_binary_fn_p (internal_fn fn) case IFN_UBSAN_CHECK_MUL: case IFN_ADD_OVERFLOW: case IFN_MUL_OVERFLOW: + case IFN_VEC_WIDEN_PLUS_HILO: + case IFN_VEC_WIDEN_PLUS_LO: + case IFN_VEC_WIDEN_PLUS_HI: return true; default: @@ -4044,6 +4069,88 @@ first_commutative_argument (internal_fn fn) } } +/* Return true if this CODE describes an internal_fn that returns a vector with + elements twice as wide as the element size of the input vectors. 
*/ + +bool +widening_fn_p (code_helper code) +{ + if (!code.is_fn_code ()) + return false; + + if (!internal_fn_p ((combined_fn) code)) + return false; + + internal_fn fn = as_internal_fn ((combined_fn) code); + switch (fn) + { + #undef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN + #define DEF_INTERNAL_OPTAB_WIDENING_HILO_FN(NAME, F, S, SO, UO, T) \ + case IFN_##NAME##_HILO:\ + case IFN_##NAME##_HI: \ + case IFN_##NAME##_LO: \ + return true; + #include "internal-fn.def" + #undef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN + + default: + return false; + } +} + +/* Return true if this CODE describes an internal_fn that returns a vector with + elements twice as narrow as the element size of the input vectors. */ + +bool +narrowing_fn_p (code_helper code) +{ + if (!code.is_fn_code ()) + return false; + + if (!internal_fn_p ((combined_fn) code)) + return false; + + internal_fn fn = as_internal_fn ((combined_fn) code); + switch (fn) + { + #undef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN + #define DEF_INTERNAL_OPTAB_NARROWING_HILO_FN(NAME, F, O, T) \ + case IFN_##NAME##_HILO:\ + case IFN_##NAME##_HI: \ + case IFN_##NAME##_LO: \ + return true; + #include "internal-fn.def" + #undef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN + + default: + return false; + } +} + +/* Return true if FN decomposes to _hi and _lo IFN. */ + +bool +decomposes_to_hilo_fn_p (internal_fn fn) +{ + switch (fn) + { + #undef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN + #define DEF_INTERNAL_OPTAB_WIDENING_HILO_FN(NAME, F, S, SO, UO, T) \ + case IFN_##NAME##_HILO:\ + return true; + #undef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN + #define DEF_INTERNAL_OPTAB_NARROWING_HILO_FN(NAME, F, O, T) \ + case IFN_##NAME##_HILO:\ + return true; + #include "internal-fn.def" + #undef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN + #undef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN + + default: + return false; + } +} + /* Return true if IFN_SET_EDOM is supported. 
*/ bool @@ -4071,7 +4178,33 @@ set_edom_supported_p (void) optab which_optab = direct_internal_fn_optab (fn, types); \ expand_##TYPE##_optab_fn (fn, stmt, which_optab); \ } +#define DEF_INTERNAL_OPTAB_WIDENING_HILO_FN(CODE, FLAGS, SELECTOR, \ + SIGNED_OPTAB, UNSIGNED_OPTAB, \ + TYPE) \ + static void \ + expand_##CODE##_HILO (internal_fn fn ATTRIBUTE_UNUSED, \ + gcall *stmt ATTRIBUTE_UNUSED) \ + { \ + gcc_unreachable (); \ + } \ + DEF_INTERNAL_SIGNED_OPTAB_FN(CODE##_HI, FLAGS, SELECTOR, SIGNED_OPTAB, \ + UNSIGNED_OPTAB, TYPE) \ + DEF_INTERNAL_SIGNED_OPTAB_FN(CODE##_LO, FLAGS, SELECTOR, SIGNED_OPTAB, \ + UNSIGNED_OPTAB, TYPE) +#define DEF_INTERNAL_OPTAB_NARROWING_HILO_FN(CODE, FLAGS, OPTAB, TYPE) \ + static void \ + expand_##CODE##_HILO (internal_fn fn ATTRIBUTE_UNUSED, \ + gcall *stmt ATTRIBUTE_UNUSED) \ + { \ + gcc_unreachable (); \ + } \ + DEF_INTERNAL_OPTAB_FN(CODE##_LO, FLAGS, OPTAB, TYPE) \ + DEF_INTERNAL_OPTAB_FN(CODE##_HI, FLAGS, OPTAB, TYPE) #include "internal-fn.def" +#undef DEF_INTERNAL_OPTAB_FN +#undef DEF_INTERNAL_SIGNED_OPTAB_FN +#undef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN +#undef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN /* Routines to expand each internal function, indexed by function number. Each routine has the prototype: @@ -4080,6 +4213,7 @@ set_edom_supported_p (void) where STMT is the statement that performs the call. */ static void (*const internal_fn_expanders[]) (internal_fn, gcall *) = { + #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) expand_##CODE, #include "internal-fn.def" 0 diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index 7fe742c2ae713e7152ab05cfdfba86e4e0aa3456..012dd323b86dd7cfcc5c13d3a2bb2a453937155d 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -85,6 +85,13 @@ along with GCC; see the file COPYING3. If not see says that the function extends the C-level BUILT_IN_<NAME>{,L,LL,IMAX} group of functions to any integral mode (including vector modes). 
+ DEF_INTERNAL_SIGNED_OPTAB_HILO_FN is like DEF_INTERNAL_OPTAB_FN except it + provides convenience wrappers for defining conversions that require a + hi/lo split, like widening and narrowing operations. Each definition + for <NAME> will require an optab named <OPTAB> and two other optabs that + you specify for signed and unsigned. + + Each entry must have a corresponding expander of the form: void expand_NAME (gimple_call stmt) @@ -123,6 +130,20 @@ along with GCC; see the file COPYING3. If not see DEF_INTERNAL_OPTAB_FN (NAME, FLAGS, OPTAB, TYPE) #endif +#ifndef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN +#define DEF_INTERNAL_OPTAB_WIDENING_HILO_FN(NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE) \ + DEF_INTERNAL_FN (NAME##_HILO, FLAGS | ECF_LEAF, NULL) \ + DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _LO, FLAGS, SELECTOR, SOPTAB##_lo, UOPTAB##_lo, TYPE) \ + DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _HI, FLAGS, SELECTOR, SOPTAB##_hi, UOPTAB##_hi, TYPE) +#endif + +#ifndef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN +#define DEF_INTERNAL_OPTAB_NARROWING_HILO_FN(NAME, FLAGS, OPTAB, TYPE) \ + DEF_INTERNAL_FN (NAME##_HILO, FLAGS | ECF_LEAF, NULL) \ + DEF_INTERNAL_OPTAB_FN (NAME ## _LO, FLAGS, OPTAB##_lo, TYPE) \ + DEF_INTERNAL_OPTAB_FN (NAME ## _HI, FLAGS, OPTAB##_hi, TYPE) +#endif + DEF_INTERNAL_OPTAB_FN (MASK_LOAD, ECF_PURE, maskload, mask_load) DEF_INTERNAL_OPTAB_FN (LOAD_LANES, ECF_CONST, vec_load_lanes, load_lanes) DEF_INTERNAL_OPTAB_FN (MASK_LOAD_LANES, ECF_PURE, @@ -315,6 +336,16 @@ DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary) DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary) DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary) DEF_INTERNAL_OPTAB_FN (VEC_ADDSUB, ECF_CONST, vec_addsub, binary) +DEF_INTERNAL_OPTAB_WIDENING_HILO_FN (VEC_WIDEN_PLUS, + ECF_CONST | ECF_NOTHROW, + first, + vec_widen_sadd, vec_widen_uadd, + binary) +DEF_INTERNAL_OPTAB_WIDENING_HILO_FN (VEC_WIDEN_MINUS, + ECF_CONST | ECF_NOTHROW, + first, + vec_widen_ssub, 
vec_widen_usub, + binary) DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary) DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary) diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h index 08922ed4254898f5fffca3f33973e96ed9ce772f..8ba07d6d1338e75bc5a451d9e403112a608f3ea2 100644 --- a/gcc/internal-fn.h +++ b/gcc/internal-fn.h @@ -20,6 +20,10 @@ along with GCC; see the file COPYING3. If not see #ifndef GCC_INTERNAL_FN_H #define GCC_INTERNAL_FN_H +#include "insn-codes.h" +#include "insn-opinit.h" + + /* INTEGER_CST values for IFN_UNIQUE function arg-0. UNSPEC: Undifferentiated UNIQUE. @@ -112,6 +116,8 @@ internal_fn_name (enum internal_fn fn) } extern internal_fn lookup_internal_fn (const char *); +extern void lookup_hilo_internal_fn (internal_fn, internal_fn *, internal_fn *); +extern optab direct_internal_fn_optab (internal_fn, tree_pair); /* Return the ECF_* flags for function FN. */ @@ -210,6 +216,9 @@ extern bool commutative_binary_fn_p (internal_fn); extern bool commutative_ternary_fn_p (internal_fn); extern int first_commutative_argument (internal_fn); extern bool associative_binary_fn_p (internal_fn); +extern bool widening_fn_p (code_helper); +extern bool narrowing_fn_p (code_helper); +extern bool decomposes_to_hilo_fn_p (internal_fn); extern bool set_edom_supported_p (void); diff --git a/gcc/optabs.cc b/gcc/optabs.cc index c8e39c82d57a7d726e7da33d247b80f32ec9236c..5a08d91e550b2d92e9572211f811fdba99a33a38 100644 --- a/gcc/optabs.cc +++ b/gcc/optabs.cc @@ -1314,7 +1314,15 @@ commutative_optab_p (optab binoptab) || binoptab == smul_widen_optab || binoptab == umul_widen_optab || binoptab == smul_highpart_optab - || binoptab == umul_highpart_optab); + || binoptab == umul_highpart_optab + || binoptab == vec_widen_saddl_hi_optab + || binoptab == vec_widen_saddl_lo_optab + || binoptab == vec_widen_uaddl_hi_optab + || binoptab == vec_widen_uaddl_lo_optab + || binoptab == vec_widen_sadd_hi_optab + || binoptab == 
vec_widen_sadd_lo_optab + || binoptab == vec_widen_uadd_hi_optab + || binoptab == vec_widen_uadd_lo_optab); } /* X is to be used in mode MODE as operand OPN to BINOPTAB. If we're diff --git a/gcc/optabs.def b/gcc/optabs.def index 695f5911b300c9ca5737de9be809fa01aabe5e01..16d121722c8c5723d9b164f5a2c616dc7ec143de 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -410,6 +410,10 @@ OPTAB_D (vec_widen_ssubl_hi_optab, "vec_widen_ssubl_hi_$a") OPTAB_D (vec_widen_ssubl_lo_optab, "vec_widen_ssubl_lo_$a") OPTAB_D (vec_widen_saddl_hi_optab, "vec_widen_saddl_hi_$a") OPTAB_D (vec_widen_saddl_lo_optab, "vec_widen_saddl_lo_$a") +OPTAB_D (vec_widen_ssub_hi_optab, "vec_widen_ssub_hi_$a") +OPTAB_D (vec_widen_ssub_lo_optab, "vec_widen_ssub_lo_$a") +OPTAB_D (vec_widen_sadd_hi_optab, "vec_widen_sadd_hi_$a") +OPTAB_D (vec_widen_sadd_lo_optab, "vec_widen_sadd_lo_$a") OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a") OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a") OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a") @@ -422,6 +426,10 @@ OPTAB_D (vec_widen_usubl_hi_optab, "vec_widen_usubl_hi_$a") OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a") OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a") OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a") +OPTAB_D (vec_widen_usub_hi_optab, "vec_widen_usub_hi_$a") +OPTAB_D (vec_widen_usub_lo_optab, "vec_widen_usub_lo_$a") +OPTAB_D (vec_widen_uadd_hi_optab, "vec_widen_uadd_hi_$a") +OPTAB_D (vec_widen_uadd_lo_optab, "vec_widen_uadd_lo_$a") OPTAB_D (vec_addsub_optab, "vec_addsub$a3") OPTAB_D (vec_fmaddsub_optab, "vec_fmaddsub$a4") OPTAB_D (vec_fmsubadd_optab, "vec_fmsubadd$a4") diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c index 220bd9352a4c7acd2e3713e441d74898d3e92b30..7037673d32bd780e1c9b58a51e58e2bac3b30b7e 100644 --- a/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c +++ 
b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c @@ -1,5 +1,5 @@ /* { dg-do run } */ -/* { dg-options "-O3 -save-temps" } */ +/* { dg-options "-O3 -save-temps -fdump-tree-vect-all" } */ #include <stdint.h> #include <string.h> @@ -86,6 +86,8 @@ main() return 0; } +/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_PLUS_LO" "vect" } } */ +/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_PLUS_HI" "vect" } } */ /* { dg-final { scan-assembler-times {\tuaddl\t} 1} } */ /* { dg-final { scan-assembler-times {\tuaddl2\t} 1} } */ /* { dg-final { scan-assembler-times {\tsaddl\t} 1} } */ diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c index a2bed63affbd091977df95a126da1f5b8c1d41d2..83bc1edb6105f47114b665e24a13e6194b2179a2 100644 --- a/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c +++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c @@ -1,5 +1,5 @@ /* { dg-do run } */ -/* { dg-options "-O3 -save-temps" } */ +/* { dg-options "-O3 -save-temps -fdump-tree-vect-all" } */ #include <stdint.h> #include <string.h> @@ -86,6 +86,8 @@ main() return 0; } +/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_MINUS_LO" "vect" } } */ +/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_MINUS_HI" "vect" } } */ /* { dg-final { scan-assembler-times {\tusubl\t} 1} } */ /* { dg-final { scan-assembler-times {\tusubl2\t} 1} } */ /* { dg-final { scan-assembler-times {\tssubl\t} 1} } */ diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc index 0aeebb67fac864db284985f4a6f0653af281d62b..28464ad9e3a7ea25557ffebcdbdbc1340f9e0d8b 100644 --- a/gcc/tree-cfg.cc +++ b/gcc/tree-cfg.cc @@ -65,6 +65,7 @@ along with GCC; see the file COPYING3. If not see #include "asan.h" #include "profile.h" #include "sreal.h" +#include "internal-fn.h" /* This file contains functions for building the Control Flow Graph (CFG) for a function tree. 
*/ @@ -3411,6 +3412,52 @@ verify_gimple_call (gcall *stmt) debug_generic_stmt (fn); return true; } + internal_fn ifn = gimple_call_internal_fn (stmt); + if (ifn == IFN_LAST) + { + error ("gimple call has an invalid IFN"); + debug_generic_stmt (fn); + return true; + } + else if (decomposes_to_hilo_fn_p (ifn)) + { + /* Non decomposed HILO stmts should not appear in IL, these are + merely used as an internal representation to the auto-vectorizer + pass and should have been expanded to their _LO _HI variants. */ + error ("gimple call has an non decomposed HILO IFN"); + debug_generic_stmt (fn); + return true; + } + else if (ifn == IFN_VEC_WIDEN_PLUS_LO + || ifn == IFN_VEC_WIDEN_PLUS_HI + || ifn == IFN_VEC_WIDEN_MINUS_LO + || ifn == IFN_VEC_WIDEN_MINUS_HI) + { + tree rhs1_type = TREE_TYPE (gimple_call_arg (stmt, 0)); + tree rhs2_type = TREE_TYPE (gimple_call_arg (stmt, 1)); + tree lhs_type = TREE_TYPE (gimple_get_lhs (stmt)); + if (TREE_CODE (lhs_type) == VECTOR_TYPE) + { + if (TREE_CODE (rhs1_type) != VECTOR_TYPE + || TREE_CODE (rhs2_type) != VECTOR_TYPE) + { + error ("invalid non-vector operands in vector IFN call"); + debug_generic_stmt (fn); + return true; + } + lhs_type = TREE_TYPE (lhs_type); + rhs1_type = TREE_TYPE (rhs1_type); + rhs2_type = TREE_TYPE (rhs2_type); + } + if (POINTER_TYPE_P (lhs_type) + || POINTER_TYPE_P (rhs1_type) + || POINTER_TYPE_P (rhs2_type)) + { + error ("invalid (pointer) operands in vector IFN call"); + debug_generic_stmt (fn); + return true; + } + } } else { diff --git a/gcc/tree-inline.cc b/gcc/tree-inline.cc index 63a19f8d1d89c6bd5d8e55a299cbffaa324b4b84..d74d8db2173b1ab117250fea89de5212d5e354ec 100644 --- a/gcc/tree-inline.cc +++ b/gcc/tree-inline.cc @@ -4433,7 +4433,20 @@ estimate_num_insns (gimple *stmt, eni_weights *weights) tree decl; if (gimple_call_internal_p (stmt)) - return 0; + { + internal_fn fn = gimple_call_internal_fn (stmt); + switch (fn) + { + case IFN_VEC_WIDEN_PLUS_HI: + case IFN_VEC_WIDEN_PLUS_LO: + case 
IFN_VEC_WIDEN_MINUS_HI: + case IFN_VEC_WIDEN_MINUS_LO: + return 1; + + default: + return 0; + } + } else if ((decl = gimple_call_fndecl (stmt)) && fndecl_built_in_p (decl)) { diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index 1778af0242898e3dc73d94d22a5b8505628a53b5..93cebc72beb4f65249a69b2665dfeb8a0991c1d1 100644 --- a/gcc/tree-vect-patterns.cc +++ b/gcc/tree-vect-patterns.cc @@ -562,21 +562,30 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type) static unsigned int vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code, - tree_code widened_code, bool shift_p, + code_helper widened_code, bool shift_p, unsigned int max_nops, vect_unpromoted_value *unprom, tree *common_type, enum optab_subtype *subtype = NULL) { /* Check for an integer operation with the right code. */ - gassign *assign = dyn_cast <gassign *> (stmt_info->stmt); - if (!assign) + gimple* stmt = stmt_info->stmt; + if (!(is_gimple_assign (stmt) || is_gimple_call (stmt))) + return 0; + + code_helper rhs_code; + if (is_gimple_assign (stmt)) + rhs_code = gimple_assign_rhs_code (stmt); + else if (is_gimple_call (stmt)) + rhs_code = gimple_call_combined_fn (stmt); + else return 0; - tree_code rhs_code = gimple_assign_rhs_code (assign); - if (rhs_code != code && rhs_code != widened_code) + if (rhs_code != code + && rhs_code != widened_code) return 0; - tree type = TREE_TYPE (gimple_assign_lhs (assign)); + tree lhs = gimple_get_lhs (stmt); + tree type = TREE_TYPE (lhs); if (!INTEGRAL_TYPE_P (type)) return 0; @@ -589,7 +598,7 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code, { vect_unpromoted_value *this_unprom = &unprom[next_op]; unsigned int nops = 1; - tree op = gimple_op (assign, i + 1); + tree op = gimple_arg (stmt, i); if (i == 1 && TREE_CODE (op) == INTEGER_CST) { /* We already have a common type from earlier operands. @@ -1343,7 +1352,8 @@ vect_recog_sad_pattern (vec_info *vinfo, /* FORNOW. 
Can continue analyzing the def-use chain when this stmt in a phi inside the loop (in case we are analyzing an outer-loop). */ vect_unpromoted_value unprom[2]; - if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, WIDEN_MINUS_EXPR, + if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, + IFN_VEC_WIDEN_MINUS_HILO, false, 2, unprom, &half_type)) return NULL; @@ -1395,14 +1405,16 @@ static gimple * vect_recog_widen_op_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info, tree *type_out, tree_code orig_code, code_helper wide_code, - bool shift_p, const char *name) + bool shift_p, const char *name, + optab_subtype *subtype = NULL) { gimple *last_stmt = last_stmt_info->stmt; vect_unpromoted_value unprom[2]; tree half_type; if (!vect_widened_op_tree (vinfo, last_stmt_info, orig_code, orig_code, - shift_p, 2, unprom, &half_type)) + shift_p, 2, unprom, &half_type, subtype)) + return NULL; /* Pattern detected. */ @@ -1468,6 +1480,20 @@ vect_recog_widen_op_pattern (vec_info *vinfo, type, pattern_stmt, vecctype); } +static gimple * +vect_recog_widen_op_pattern (vec_info *vinfo, + stmt_vec_info last_stmt_info, tree *type_out, + tree_code orig_code, internal_fn wide_ifn, + bool shift_p, const char *name, + optab_subtype *subtype = NULL) +{ + combined_fn ifn = as_combined_fn (wide_ifn); + return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out, + orig_code, ifn, shift_p, name, + subtype); +} + + /* Try to detect multiplication on widened inputs, converting MULT_EXPR to WIDEN_MULT_EXPR. See vect_recog_widen_op_pattern for details. */ @@ -1481,26 +1507,30 @@ vect_recog_widen_mult_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info, } /* Try to detect addition on widened inputs, converting PLUS_EXPR - to WIDEN_PLUS_EXPR. See vect_recog_widen_op_pattern for details. */ + to IFN_VEC_WIDEN_PLUS_HILO. See vect_recog_widen_op_pattern for details. 
*/ static gimple * vect_recog_widen_plus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info, tree *type_out) { + optab_subtype subtype; return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out, - PLUS_EXPR, WIDEN_PLUS_EXPR, false, - "vect_recog_widen_plus_pattern"); + PLUS_EXPR, IFN_VEC_WIDEN_PLUS_HILO, + false, "vect_recog_widen_plus_pattern", + &subtype); } /* Try to detect subtraction on widened inputs, converting MINUS_EXPR - to WIDEN_MINUS_EXPR. See vect_recog_widen_op_pattern for details. */ + to IFN_VEC_WIDEN_MINUS_HILO. See vect_recog_widen_op_pattern for details. */ static gimple * vect_recog_widen_minus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info, tree *type_out) { + optab_subtype subtype; return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out, - MINUS_EXPR, WIDEN_MINUS_EXPR, false, - "vect_recog_widen_minus_pattern"); + MINUS_EXPR, IFN_VEC_WIDEN_MINUS_HILO, + false, "vect_recog_widen_minus_pattern", + &subtype); } /* Function vect_recog_ctz_ffs_pattern @@ -3078,7 +3108,7 @@ vect_recog_average_pattern (vec_info *vinfo, vect_unpromoted_value unprom[3]; tree new_type; unsigned int nops = vect_widened_op_tree (vinfo, plus_stmt_info, PLUS_EXPR, - WIDEN_PLUS_EXPR, false, 3, + IFN_VEC_WIDEN_PLUS_HILO, false, 3, unprom, &new_type); if (nops == 0) return NULL; @@ -6469,6 +6499,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = { { vect_recog_mask_conversion_pattern, "mask_conversion" }, { vect_recog_widen_plus_pattern, "widen_plus" }, { vect_recog_widen_minus_pattern, "widen_minus" }, + /* These must come after the double widening ones. 
*/ }; const unsigned int NUM_PATTERNS = ARRAY_SIZE (vect_vect_recog_func_ptrs); diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index d152ae9ab10b361b88c0f839d6951c43b954750a..24c811ebe01fb8b003100dea494cf64fea72a975 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -5038,7 +5038,9 @@ vectorizable_conversion (vec_info *vinfo, bool widen_arith = (code == WIDEN_PLUS_EXPR || code == WIDEN_MINUS_EXPR || code == WIDEN_MULT_EXPR - || code == WIDEN_LSHIFT_EXPR); + || code == WIDEN_LSHIFT_EXPR + || code == IFN_VEC_WIDEN_PLUS_HILO + || code == IFN_VEC_WIDEN_MINUS_HILO); if (!widen_arith && !CONVERT_EXPR_CODE_P (code) @@ -5088,7 +5090,9 @@ vectorizable_conversion (vec_info *vinfo, gcc_assert (code == WIDEN_MULT_EXPR || code == WIDEN_LSHIFT_EXPR || code == WIDEN_PLUS_EXPR - || code == WIDEN_MINUS_EXPR); + || code == WIDEN_MINUS_EXPR + || code == IFN_VEC_WIDEN_PLUS_HILO + || code == IFN_VEC_WIDEN_MINUS_HILO); op1 = is_gimple_assign (stmt) ? gimple_assign_rhs2 (stmt) : @@ -12478,10 +12482,43 @@ supportable_widening_operation (vec_info *vinfo, optab1 = vec_unpacks_sbool_lo_optab; optab2 = vec_unpacks_sbool_hi_optab; } - else + + if (code.is_fn_code ()) + { + internal_fn ifn = as_internal_fn ((combined_fn) code); + gcc_assert (decomposes_to_hilo_fn_p (ifn)); + + internal_fn lo, hi; + lookup_hilo_internal_fn (ifn, &lo, &hi); + *code1 = as_combined_fn (lo); + *code2 = as_combined_fn (hi); + optab1 = direct_internal_fn_optab (lo, {vectype, vectype}); + optab2 = direct_internal_fn_optab (hi, {vectype, vectype}); + } + else if (code.is_tree_code ()) { - optab1 = optab_for_tree_code (c1, vectype, optab_default); - optab2 = optab_for_tree_code (c2, vectype, optab_default); + if (code == FIX_TRUNC_EXPR) + { + /* The signedness is determined from output operand. 
*/ + optab1 = optab_for_tree_code (c1, vectype_out, optab_default); + optab2 = optab_for_tree_code (c2, vectype_out, optab_default); + } + else if (CONVERT_EXPR_CODE_P ((tree_code) code.safe_as_tree_code ()) + && VECTOR_BOOLEAN_TYPE_P (wide_vectype) + && VECTOR_BOOLEAN_TYPE_P (vectype) + && TYPE_MODE (wide_vectype) == TYPE_MODE (vectype) + && SCALAR_INT_MODE_P (TYPE_MODE (vectype))) + { + /* If the input and result modes are the same, a different optab + is needed where we pass in the number of units in vectype. */ + optab1 = vec_unpacks_sbool_lo_optab; + optab2 = vec_unpacks_sbool_hi_optab; + } + else + { + optab1 = optab_for_tree_code (c1, vectype, optab_default); + optab2 = optab_for_tree_code (c2, vectype, optab_default); + } } if (!optab1 || !optab2) diff --git a/gcc/tree.def b/gcc/tree.def index 90ceeec0b512bfa5f983359c0af03cc71de32007..b37b0b35927b92a6536e5c2d9805ffce8319a240 100644 --- a/gcc/tree.def +++ b/gcc/tree.def @@ -1374,15 +1374,16 @@ DEFTREECODE (DOT_PROD_EXPR, "dot_prod_expr", tcc_expression, 3) DEFTREECODE (WIDEN_SUM_EXPR, "widen_sum_expr", tcc_binary, 2) /* Widening sad (sum of absolute differences). - The first two arguments are of type t1 which should be integer. - The third argument and the result are of type t2, such that t2 is at least - twice the size of t1. Like DOT_PROD_EXPR, SAD_EXPR (arg1,arg2,arg3) is + The first two arguments are of type t1 which should be a vector of integers. + The third argument and the result are of type t2, such that the size of + the elements of t2 is at least twice the size of the elements of t1. + Like DOT_PROD_EXPR, SAD_EXPR (arg1,arg2,arg3) is equivalent to: - tmp = WIDEN_MINUS_EXPR (arg1, arg2) + tmp = IFN_VEC_WIDEN_MINUS_EXPR (arg1, arg2) tmp2 = ABS_EXPR (tmp) arg3 = PLUS_EXPR (tmp2, arg3) or: - tmp = WIDEN_MINUS_EXPR (arg1, arg2) + tmp = IFN_VEC_WIDEN_MINUS_EXPR (arg1, arg2) tmp2 = ABS_EXPR (tmp) arg3 = WIDEN_SUM_EXPR (tmp2, arg3) */
On Fri, 12 May 2023, Andre Vieira (lists) wrote:

> I have dealt with, I think..., most of your comments. There's quite a few
> changes, I think it's all a bit simpler now. I made some other changes to the
> costing in tree-inline.cc and gimple-range-op.cc in which I try to preserve
> the same behaviour as we had with the tree codes before. Also added some extra
> checks to tree-cfg.cc that made sense to me.
>
> I am still regression testing the gimple-range-op change, as that was a last
> minute change, but the rest survived a bootstrap and regression test on
> aarch64-unknown-linux-gnu.
>
> cover letter:
>
> This patch replaces the existing tree_code widen_plus and widen_minus
> patterns with internal_fn versions.
>
> DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
> are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN respectively
> except they provide convenience wrappers for defining conversions that require
> a hi/lo split. Each definition for <NAME> will require optabs for _hi and _lo
> and each of those will also require a signed and unsigned version in the case
> of widening. The hi/lo pair is necessary because the widening and narrowing
> operations take n narrow elements as inputs and return n/2 wide elements as
> outputs. The 'lo' operation operates on the first n/2 elements of input. The
> 'hi' operation operates on the second n/2 elements of input. Defining an
> internal_fn along with hi/lo variations allows a single internal function to
> be returned from a vect_recog function that will later be expanded to hi/lo.
>
>
> For example:
> IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
> for aarch64: IFN_VEC_WIDEN_PLUS_HI -> vec_widen_<su>add_hi_<mode> ->
> (u/s)addl2
> IFN_VEC_WIDEN_PLUS_LO -> vec_widen_<su>add_lo_<mode>
> -> (u/s)addl
>
> This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree
> codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.

What I still don't understand is how we are so narrowly focused on
HI/LO? We need a combined scalar IFN for pattern selection (not
sure why that's now called _HILO, I expected no suffix). Then there's
three possibilities the target can implement this:

 1) with a widen_[su]add<mode> instruction - I _think_ that's what
    RISCV is going to offer since it is a target where vector modes
    have "padding" (aka you cannot subreg a V2SI to get V4HI). Instead
    RVV can do a V4HI to V4SI widening and widening add/subtract
    using vwadd[u] and vwsub[u] (the HI->SI widening is actually
    done with a widening add of zero - eh).
    IIRC GCN is the same here.
 2) with a widen_[su]add{_lo,_hi}<mode> combo - that's what the tree
    codes currently support (exclusively)
 3) similar, but widen_[su]add{_even,_odd}<mode>

that said, things like decomposes_to_hilo_fn_p look to paint us into
a 2) corner without good reason.

Richard.

> gcc/ChangeLog:
>
> 2023-05-12  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>             Joel Hutton  <joel.hutton@arm.com>
>             Tamar Christina  <tamar.christina@arm.com>
>
>     * config/aarch64/aarch64-simd.md (vec_widen_<su>addl_lo_<mode>): Rename
>     this ...
>     (vec_widen_<su>add_lo_<mode>): ... to this.
>     (vec_widen_<su>addl_hi_<mode>): Rename this ...
>     (vec_widen_<su>add_hi_<mode>): ... to this.
>     (vec_widen_<su>subl_lo_<mode>): Rename this ...
>     (vec_widen_<su>sub_lo_<mode>): ... to this.
>     (vec_widen_<su>subl_hi_<mode>): Rename this ...
>     (vec_widen_<su>sub_hi_<mode>): ...to this.
>     * doc/generic.texi: Document new IFN codes.
>     * internal-fn.cc (DEF_INTERNAL_OPTAB_WIDENING_HILO_FN): Macro to define an
>     internal_fn that expands into multiple internal_fns for widening.
>     (DEF_INTERNAL_OPTAB_NARROWING_HILO_FN): Likewise but for narrowing.
>     (ifn_cmp): Function to compare ifn's for sorting/searching.
>     (lookup_hilo_internal_fn): Add lookup function.
>     (commutative_binary_fn_p): Add widen_plus fn's.
>     (widening_fn_p): New function.
>     (narrowing_fn_p): New function.
>     (decomposes_to_hilo_fn_p): New function.
>     (direct_internal_fn_optab): Change visibility.
>     * internal-fn.def (DEF_INTERNAL_OPTAB_WIDENING_HILO_FN): Define widening
>     plus,minus functions.
>     (VEC_WIDEN_PLUS): Replacement for VEC_WIDEN_PLUS_EXPR tree code.
>     (VEC_WIDEN_MINUS): Replacement for VEC_WIDEN_MINUS_EXPR tree code.
>     * internal-fn.h (GCC_INTERNAL_FN_H): Add headers.
>     (direct_internal_fn_optab): Declare new prototype.
>     (lookup_hilo_internal_fn): Likewise.
>     (widening_fn_p): Likewise.
>     (Narrowing_fn_p): Likewise.
>     (decomposes_to_hilo_fn_p): Likewise.
>     * optabs.cc (commutative_optab_p): Add widening plus optabs.
>     * optabs.def (OPTAB_D): Define widen add, sub optabs.
>     * tree-cfg.cc (verify_gimple_call): Add checks for new widen
>     add and sub IFNs.
>     * tree-inline.cc (estimate_num_insns): Return same
>     cost for widen add and sub IFNs as previous tree_codes.
>     * tree-vect-patterns.cc (vect_recog_widen_op_pattern): Support
>     patterns with a hi/lo split.
>     (vect_recog_sad_pattern): Refactor to use new IFN codes.
>     (vect_recog_widen_plus_pattern): Likewise.
>     (vect_recog_widen_minus_pattern): Likewise.
>     (vect_recog_average_pattern): Likewise.
>     * tree-vect-stmts.cc (vectorizable_conversion): Add support for
>     _HILO IFNs.
>     (supportable_widening_operation): Likewise.
>     * tree.def (WIDEN_SUM_EXPR): Update example to use new IFNs.
>
> gcc/testsuite/ChangeLog:
>
>     * gcc.target/aarch64/vect-widen-add.c: Test that new
>     IFN_VEC_WIDEN_PLUS is being used.
>     * gcc.target/aarch64/vect-widen-sub.c: Test that new
>     IFN_VEC_WIDEN_MINUS is being used.
>
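The three target strategies listed in the reply above (one full-width widening operation, a lo/hi pair, an even/odd pair) can be modelled on plain arrays. The following is only a scalar sketch in C++, not GCC code: every function name is hypothetical, lanes are int16_t widened to int32_t, and 'lo'/'hi' follow the cover letter's description (first and second n/2 input elements), ignoring endianness. The pair variants assume an even lane count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using narrow_vec = std::vector<int16_t>;
using wide_vec = std::vector<int32_t>;

// Widen both inputs to 32 bits first, then add, so the sum cannot wrap.
static int32_t widen_add_lane(int16_t a, int16_t b) {
  return static_cast<int32_t>(a) + static_cast<int32_t>(b);
}

// Hypothetical helper: apply the widening add to lanes
// start, start + step, start + 2 * step, ... and collect `count` results.
static wide_vec widen_add_lanes(const narrow_vec &a, const narrow_vec &b,
                                std::size_t start, std::size_t step,
                                std::size_t count) {
  assert(a.size() == b.size());
  wide_vec out;
  out.reserve(count);
  for (std::size_t i = 0; i < count; ++i)
    out.push_back(widen_add_lane(a[start + i * step], b[start + i * step]));
  return out;
}

// Strategy 1: a single operation produces all n wide lanes.
wide_vec widen_add_full(const narrow_vec &a, const narrow_vec &b) {
  return widen_add_lanes(a, b, 0, 1, a.size());
}

// Strategy 2: 'lo' handles the first n/2 lanes, 'hi' the second n/2.
wide_vec widen_add_lo(const narrow_vec &a, const narrow_vec &b) {
  return widen_add_lanes(a, b, 0, 1, a.size() / 2);
}
wide_vec widen_add_hi(const narrow_vec &a, const narrow_vec &b) {
  return widen_add_lanes(a, b, a.size() / 2, 1, a.size() / 2);
}

// Strategy 3: 'even' handles lanes 0, 2, 4, ...; 'odd' handles 1, 3, 5, ...
wide_vec widen_add_even(const narrow_vec &a, const narrow_vec &b) {
  return widen_add_lanes(a, b, 0, 2, a.size() / 2);
}
wide_vec widen_add_odd(const narrow_vec &a, const narrow_vec &b) {
  return widen_add_lanes(a, b, 1, 2, a.size() / 2);
}
```

In this model, concatenating the lo and hi results, or interleaving the even and odd results, gives the same lanes as the full-width operation; the variants differ only in how the n results are split across two half-sized outputs.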
On 12/05/2023 14:28, Richard Biener wrote:
> On Fri, 12 May 2023, Andre Vieira (lists) wrote:
>
>> I have dealt with, I think..., most of your comments. There's quite a few
>> changes, I think it's all a bit simpler now. I made some other changes to the
>> costing in tree-inline.cc and gimple-range-op.cc in which I try to preserve
>> the same behaviour as we had with the tree codes before. Also added some extra
>> checks to tree-cfg.cc that made sense to me.
>>
>> I am still regression testing the gimple-range-op change, as that was a last
>> minute change, but the rest survived a bootstrap and regression test on
>> aarch64-unknown-linux-gnu.
>>
>> cover letter:
>>
>> This patch replaces the existing tree_code widen_plus and widen_minus
>> patterns with internal_fn versions.
>>
>> DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
>> are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN respectively
>> except they provide convenience wrappers for defining conversions that require
>> a hi/lo split. Each definition for <NAME> will require optabs for _hi and _lo
>> and each of those will also require a signed and unsigned version in the case
>> of widening. The hi/lo pair is necessary because the widening and narrowing
>> operations take n narrow elements as inputs and return n/2 wide elements as
>> outputs. The 'lo' operation operates on the first n/2 elements of input. The
>> 'hi' operation operates on the second n/2 elements of input. Defining an
>> internal_fn along with hi/lo variations allows a single internal function to
>> be returned from a vect_recog function that will later be expanded to hi/lo.
>>
>>
>> For example:
>> IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
>> for aarch64: IFN_VEC_WIDEN_PLUS_HI -> vec_widen_<su>add_hi_<mode> ->
>> (u/s)addl2
>> IFN_VEC_WIDEN_PLUS_LO -> vec_widen_<su>add_lo_<mode>
>> -> (u/s)addl
>>
>> This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree
>> codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
>
> What I still don't understand is how we are so narrowly focused on
> HI/LO? We need a combined scalar IFN for pattern selection (not
> sure why that's now called _HILO, I expected no suffix). Then there's
> three possibilities the target can implement this:
>
>  1) with a widen_[su]add<mode> instruction - I _think_ that's what
>     RISCV is going to offer since it is a target where vector modes
>     have "padding" (aka you cannot subreg a V2SI to get V4HI). Instead
>     RVV can do a V4HI to V4SI widening and widening add/subtract
>     using vwadd[u] and vwsub[u] (the HI->SI widening is actually
>     done with a widening add of zero - eh).
>     IIRC GCN is the same here.
>  2) with a widen_[su]add{_lo,_hi}<mode> combo - that's what the tree
>     codes currently support (exclusively)
>  3) similar, but widen_[su]add{_even,_odd}<mode>
>
> that said, things like decomposes_to_hilo_fn_p look to paint us into
> a 2) corner without good reason.

I was just keeping the existing naming; I had forgotten to mention that I
was also going to add _EVENODD. But you are right, the pattern selection
IFN does not need to be restrictive. Then at supportable_widening_operation
we could check what the target offers support for (either 1, 2 or 3). We can
then get rid of decomposes_to_hilo_fn_p and just assume that for all
narrowing or widening IFNs there are optabs (that may or may not be
implemented by a target) for all three variants.

Having said that, that means we should have an optab to cover 1, which
should probably just have the original name. Let me write it out...

Say we have an IFN_VEC_WIDEN_PLUS pattern and assume it's signed.
supportable_widening_operation would then first check whether the target
supports vec_widen_sadd_optab for, say, V8HI -> V8SI. RISC-V would take this
path, I guess? If the target doesn't, it could then check for support for:
    vec_widen_sadd_lo_optab V4HI -> V4SI
    vec_widen_sadd_hi_optab V4HI -> V4SI
AArch64 Advanced SIMD would implement this.

If the target still didn't support this, it would check for (not sure about
the modes here):
    vec_widen_sadd_even_optab VNx8HI -> VNx4SI
    vec_widen_sadd_odd_optab VNx8HI -> VNx4SI
This is the one SVE would implement.

So that would mean that I'd probably end up rewriting
#define DEF_INTERNAL_OPTAB_WIDENING_FN (NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE)
as:
for 1)
  DEF_INTERNAL_SIGNED_OPTAB_FN (NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE)
for 2)
  DEF_INTERNAL_SIGNED_OPTAB_FN (NAME##_LO, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE)
  DEF_INTERNAL_SIGNED_OPTAB_FN (NAME##_HI, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE)
for 3)
  DEF_INTERNAL_SIGNED_OPTAB_FN (NAME##_EVEN, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE)
  DEF_INTERNAL_SIGNED_OPTAB_FN (NAME##_ODD, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE)

And the same for narrowing (but with DEF_INTERNAL_OPTAB_FN instead of
SIGNED_OPTAB).

So each widening and narrowing IFN would have optabs for all its variants,
and each target implements the ones it supports.

I'm happy to do this, but implementing support for the 1 and 3 variants
without having optabs for them right now seems a bit odd, and it would delay
this patch. So I suggest I add the framework and the optabs but leave adding
the vectorizer support for later? I can add comments where I think that
should go.

> Richard.
>
>> gcc/ChangeLog:
>>
>> 2023-05-12  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>>             Joel Hutton  <joel.hutton@arm.com>
>>             Tamar Christina  <tamar.christina@arm.com>
>>
>>     * config/aarch64/aarch64-simd.md (vec_widen_<su>addl_lo_<mode>): Rename
>>     this ...
>>     (vec_widen_<su>add_lo_<mode>): ... to this.
>>     (vec_widen_<su>addl_hi_<mode>): Rename this ...
>>     (vec_widen_<su>add_hi_<mode>): ... to this.
>>     (vec_widen_<su>subl_lo_<mode>): Rename this ...
>>     (vec_widen_<su>sub_lo_<mode>): ... to this.
>>     (vec_widen_<su>subl_hi_<mode>): Rename this ...
>>     (vec_widen_<su>sub_hi_<mode>): ...to this.
>>     * doc/generic.texi: Document new IFN codes.
>>     * internal-fn.cc (DEF_INTERNAL_OPTAB_WIDENING_HILO_FN): Macro to define an
>>     internal_fn that expands into multiple internal_fns for widening.
>>     (DEF_INTERNAL_OPTAB_NARROWING_HILO_FN): Likewise but for narrowing.
>>     (ifn_cmp): Function to compare ifn's for sorting/searching.
>>     (lookup_hilo_internal_fn): Add lookup function.
>>     (commutative_binary_fn_p): Add widen_plus fn's.
>>     (widening_fn_p): New function.
>>     (narrowing_fn_p): New function.
>>     (decomposes_to_hilo_fn_p): New function.
>>     (direct_internal_fn_optab): Change visibility.
>>     * internal-fn.def (DEF_INTERNAL_OPTAB_WIDENING_HILO_FN): Define widening
>>     plus,minus functions.
>>     (VEC_WIDEN_PLUS): Replacement for VEC_WIDEN_PLUS_EXPR tree code.
>>     (VEC_WIDEN_MINUS): Replacement for VEC_WIDEN_MINUS_EXPR tree code.
>>     * internal-fn.h (GCC_INTERNAL_FN_H): Add headers.
>>     (direct_internal_fn_optab): Declare new prototype.
>>     (lookup_hilo_internal_fn): Likewise.
>>     (widening_fn_p): Likewise.
>>     (Narrowing_fn_p): Likewise.
>>     (decomposes_to_hilo_fn_p): Likewise.
>>     * optabs.cc (commutative_optab_p): Add widening plus optabs.
>>     * optabs.def (OPTAB_D): Define widen add, sub optabs.
>>     * tree-cfg.cc (verify_gimple_call): Add checks for new widen
>>     add and sub IFNs.
>>     * tree-inline.cc (estimate_num_insns): Return same
>>     cost for widen add and sub IFNs as previous tree_codes.
>>     * tree-vect-patterns.cc (vect_recog_widen_op_pattern): Support
>>     patterns with a hi/lo split.
>>     (vect_recog_sad_pattern): Refactor to use new IFN codes.
>>     (vect_recog_widen_plus_pattern): Likewise.
>>     (vect_recog_widen_minus_pattern): Likewise.
>>     (vect_recog_average_pattern): Likewise.
>>     * tree-vect-stmts.cc (vectorizable_conversion): Add support for
>>     _HILO IFNs.
>>     (supportable_widening_operation): Likewise.
>>     * tree.def (WIDEN_SUM_EXPR): Update example to use new IFNs.
>>
>> gcc/testsuite/ChangeLog:
>>
>>     * gcc.target/aarch64/vect-widen-add.c: Test that new
>>     IFN_VEC_WIDEN_PLUS is being used.
>>     * gcc.target/aarch64/vect-widen-sub.c: Test that new
>>     IFN_VEC_WIDEN_MINUS is being used.
>>
>
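The fallback order proposed in the reply above (try the unsuffixed optab, then the lo/hi pair, then the even/odd pair) might look roughly like the sketch below. This is an illustration under stated assumptions, not the real supportable_widening_operation: `target_support`, `widen_variant` and `choose_widen_variant` are hypothetical names, and the three booleans stand in for the optab queries GCC would actually perform for a given mode.

```cpp
#include <cassert>

// Hypothetical summary of what a target provides for one widening
// operation and mode; GCC would query optabs instead of reading flags.
struct target_support {
  bool direct;    // unsuffixed optab, e.g. vec_widen_sadd
  bool hi_lo;     // e.g. vec_widen_sadd_lo plus vec_widen_sadd_hi
  bool even_odd;  // e.g. vec_widen_sadd_even plus vec_widen_sadd_odd
};

enum class widen_variant { direct, hi_lo, even_odd, unsupported };

// Pick the first variant the target supports, in the order described
// in the thread: 1) direct, 2) lo/hi, 3) even/odd.
widen_variant choose_widen_variant(const target_support &t) {
  if (t.direct)
    return widen_variant::direct;
  if (t.hi_lo)
    return widen_variant::hi_lo;
  if (t.even_odd)
    return widen_variant::even_odd;
  return widen_variant::unsupported;
}

// Usage example with made-up targets; the flags are illustrative only
// and say nothing about what any real port implements.
void example_choices() {
  assert(choose_widen_variant({true, true, false}) == widen_variant::direct);
  assert(choose_widen_variant({false, true, true}) == widen_variant::hi_lo);
  assert(choose_widen_variant({false, false, true}) == widen_variant::even_odd);
  assert(choose_widen_variant({false, false, false}) == widen_variant::unsupported);
}
```

With this shape the pattern-selection IFN stays generic, and only the selection step needs to know which variants exist.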
Richard Biener <rguenther@suse.de> writes: > On Fri, 12 May 2023, Andre Vieira (lists) wrote: > >> I have dealt with, I think..., most of your comments. There's quite a few >> changes, I think it's all a bit simpler now. I made some other changes to the >> costing in tree-inline.cc and gimple-range-op.cc in which I try to preserve >> the same behaviour as we had with the tree codes before. Also added some extra >> checks to tree-cfg.cc that made sense to me. >> >> I am still regression testing the gimple-range-op change, as that was a last >> minute change, but the rest survived a bootstrap and regression test on >> aarch64-unknown-linux-gnu. >> >> cover letter: >> >> This patch replaces the existing tree_code widen_plus and widen_minus >> patterns with internal_fn versions. >> >> DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and DEF_INTERNAL_OPTAB_NARROWING_HILO_FN >> are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN respectively >> except they provide convenience wrappers for defining conversions that require >> a hi/lo split. Each definition for <NAME> will require optabs for _hi and _lo >> and each of those will also require a signed and unsigned version in the case >> of widening. The hi/lo pair is necessary because the widening and narrowing >> operations take n narrow elements as inputs and return n/2 wide elements as >> outputs. The 'lo' operation operates on the first n/2 elements of input. The >> 'hi' operation operates on the second n/2 elements of input. Defining an >> internal_fn along with hi/lo variations allows a single internal function to >> be returned from a vect_recog function that will later be expanded to hi/lo. 
>> >> >> For example: >> IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO >> for aarch64: IFN_VEC_WIDEN_PLUS_HI -> vec_widen_<su>add_hi_<mode> -> >> (u/s)addl2 >> IFN_VEC_WIDEN_PLUS_LO -> vec_widen_<su>add_lo_<mode> >> -> (u/s)addl >> >> This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree >> codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI. > > What I still don't understand is how we are so narrowly focused on > HI/LO? We need a combined scalar IFN for pattern selection (not > sure why that's now called _HILO, I expected no suffix). Then there's > three possibilities the target can implement this: > > 1) with a widen_[su]add<mode> instruction - I _think_ that's what > RISCV is going to offer since it is a target where vector modes > have "padding" (aka you cannot subreg a V2SI to get V4HI). Instead > RVV can do a V4HI to V4SI widening and widening add/subtract > using vwadd[u] and vwsub[u] (the HI->SI widening is actually > done with a widening add of zero - eh). > IIRC GCN is the same here. SVE currently does this too, but the addition and widening are separate operations. E.g. in principle there's no reason why you can't sign-extend one operand, zero-extend the other, and then add the result together. Or you could extend them from different sizes (QI and HI). All of those are supported (if the costing allows them). If the target has operations to do combined extending and adding (or whatever), then at the moment we rely on combine to generate them. So I think this case is separate from Andre's work. The addition itself is just an ordinary addition, and any widening happens by vectorising a CONVERT/NOP_EXPR. > 2) with a widen_[su]add{_lo,_hi}<mode> combo - that's what the tree > codes currently support (exclusively) > 3) similar, but widen_[su]add{_even,_odd}<mode> > > that said, things like decomposes_to_hilo_fn_p look to paint us into > a 2) corner without good reason. 
I suppose one question is: how much of the patch is really specific to HI/LO, and how much is just grouping two halves together? The nice thing about the internal-fn grouping macros is that, if (3) is implemented in future, the structure will strongly encourage even/odd pairs to be supported for all operations that support hi/lo. That is, I would expect the grouping macros to be extended to define even/odd ifns alongside hi/lo ones, rather than adding separate definitions for even/odd functions. If so, at least from the internal-fn.* side of things, I think the question is whether it's OK to stick with hilo names for now, or whether we should use more forward-looking names. Thanks, Richard > > Richard. > >> gcc/ChangeLog: >> >> 2023-05-12 Andre Vieira <andre.simoesdiasvieira@arm.com> >> Joel Hutton <joel.hutton@arm.com> >> Tamar Christina <tamar.christina@arm.com> >> >> * config/aarch64/aarch64-simd.md (vec_widen_<su>addl_lo_<mode>): >> Rename >> this ... >> (vec_widen_<su>add_lo_<mode>): ... to this. >> (vec_widen_<su>addl_hi_<mode>): Rename this ... >> (vec_widen_<su>add_hi_<mode>): ... to this. >> (vec_widen_<su>subl_lo_<mode>): Rename this ... >> (vec_widen_<su>sub_lo_<mode>): ... to this. >> (vec_widen_<su>subl_hi_<mode>): Rename this ... >> (vec_widen_<su>sub_hi_<mode>): ...to this. >> * doc/generic.texi: Document new IFN codes. >> * internal-fn.cc (DEF_INTERNAL_OPTAB_WIDENING_HILO_FN): Macro to >> define an >> internal_fn that expands into multiple internal_fns for widening. >> (DEF_INTERNAL_OPTAB_NARROWING_HILO_FN): Likewise but for narrowing. >> (ifn_cmp): Function to compare ifn's for sorting/searching. >> (lookup_hilo_internal_fn): Add lookup function. >> (commutative_binary_fn_p): Add widen_plus fn's. >> (widening_fn_p): New function. >> (narrowing_fn_p): New function. >> (decomposes_to_hilo_fn_p): New function. >> (direct_internal_fn_optab): Change visibility. 
>> * internal-fn.def (DEF_INTERNAL_OPTAB_WIDENING_HILO_FN): Define >> widening >> plus,minus functions. >> (VEC_WIDEN_PLUS): Replacement for VEC_WIDEN_PLUS_EXPR tree code. >> (VEC_WIDEN_MINUS): Replacement for VEC_WIDEN_MINUS_EXPR tree code. >> * internal-fn.h (GCC_INTERNAL_FN_H): Add headers. >> (direct_internal_fn_optab): Declare new prototype. >> (lookup_hilo_internal_fn): Likewise. >> (widening_fn_p): Likewise. >> (Narrowing_fn_p): Likewise. >> (decomposes_to_hilo_fn_p): Likewise. >> * optabs.cc (commutative_optab_p): Add widening plus optabs. >> * optabs.def (OPTAB_D): Define widen add, sub optabs. >> * tree-cfg.cc (verify_gimple_call): Add checks for new widen >> add and sub IFNs. >> * tree-inline.cc (estimate_num_insns): Return same >> cost for widen add and sub IFNs as previous tree_codes. >> * tree-vect-patterns.cc (vect_recog_widen_op_pattern): Support >> patterns with a hi/lo split. >> (vect_recog_sad_pattern): Refactor to use new IFN codes. >> (vect_recog_widen_plus_pattern): Likewise. >> (vect_recog_widen_minus_pattern): Likewise. >> (vect_recog_average_pattern): Likewise. >> * tree-vect-stmts.cc (vectorizable_conversion): Add support for >> _HILO IFNs. >> (supportable_widening_operation): Likewise. >> * tree.def (WIDEN_SUM_EXPR): Update example to use new IFNs. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/aarch64/vect-widen-add.c: Test that new >> IFN_VEC_WIDEN_PLUS is being used. >> * gcc.target/aarch64/vect-widen-sub.c: Test that new >> IFN_VEC_WIDEN_MINUS is being used. >>
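A minimal sketch of the lane selection behind options 2) and 3), following the cover letter's wording that 'lo' works on the first n/2 input elements and 'hi' on the second n/2. The helper names are hypothetical and exist only for this illustration; they are not GCC functions, and the sketch assumes both inputs have the same, even number of lanes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative helpers only; these names do not exist in GCC.
// Both inputs are assumed to have the same, even number of lanes.

// Widening add over n/2 lanes, starting at lane `first`, stepping by `step`.
static std::vector<int32_t> widen_add_lanes(const std::vector<int16_t>& a,
                                            const std::vector<int16_t>& b,
                                            std::size_t first,
                                            std::size_t step) {
  std::vector<int32_t> out;
  out.reserve(a.size() / 2);
  for (std::size_t i = first, n = 0; n < a.size() / 2; i += step, ++n)
    // Extend each narrow lane before adding so the sum cannot wrap.
    out.push_back(static_cast<int32_t>(a[i]) + static_cast<int32_t>(b[i]));
  return out;
}

// Option 2): "lo" covers the first n/2 lanes, "hi" the second n/2.
std::vector<int32_t> widen_add_lo(const std::vector<int16_t>& a,
                                  const std::vector<int16_t>& b) {
  return widen_add_lanes(a, b, 0, 1);
}

std::vector<int32_t> widen_add_hi(const std::vector<int16_t>& a,
                                  const std::vector<int16_t>& b) {
  return widen_add_lanes(a, b, a.size() / 2, 1);
}

// Option 3): "even" covers lanes 0, 2, 4, ...; "odd" covers lanes 1, 3, 5, ...
std::vector<int32_t> widen_add_even(const std::vector<int16_t>& a,
                                    const std::vector<int16_t>& b) {
  return widen_add_lanes(a, b, 0, 2);
}

std::vector<int32_t> widen_add_odd(const std::vector<int16_t>& a,
                                   const std::vector<int16_t>& b) {
  return widen_add_lanes(a, b, 1, 2);
}
```

Either pair covers every input lane exactly once; the two schemes differ only in which lanes each half takes, which is why much of the grouping logic need not be specific to hi/lo.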
On Fri, 12 May 2023, Richard Sandiford wrote: > Richard Biener <rguenther@suse.de> writes: > > On Fri, 12 May 2023, Andre Vieira (lists) wrote: > > > >> I have dealt with, I think..., most of your comments. There's quite a few > >> changes, I think it's all a bit simpler now. I made some other changes to the > >> costing in tree-inline.cc and gimple-range-op.cc in which I try to preserve > >> the same behaviour as we had with the tree codes before. Also added some extra > >> checks to tree-cfg.cc that made sense to me. > >> > >> I am still regression testing the gimple-range-op change, as that was a last > >> minute change, but the rest survived a bootstrap and regression test on > >> aarch64-unknown-linux-gnu. > >> > >> cover letter: > >> > >> This patch replaces the existing tree_code widen_plus and widen_minus > >> patterns with internal_fn versions. > >> > >> DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and DEF_INTERNAL_OPTAB_NARROWING_HILO_FN > >> are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN respectively > >> except they provide convenience wrappers for defining conversions that require > >> a hi/lo split. Each definition for <NAME> will require optabs for _hi and _lo > >> and each of those will also require a signed and unsigned version in the case > >> of widening. The hi/lo pair is necessary because the widening and narrowing > >> operations take n narrow elements as inputs and return n/2 wide elements as > >> outputs. The 'lo' operation operates on the first n/2 elements of input. The > >> 'hi' operation operates on the second n/2 elements of input. Defining an > >> internal_fn along with hi/lo variations allows a single internal function to > >> be returned from a vect_recog function that will later be expanded to hi/lo. 
> >>
> >>
> >> For example:
> >> IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
> >> for aarch64: IFN_VEC_WIDEN_PLUS_HI -> vec_widen_<su>add_hi_<mode> ->
> >> (u/s)addl2
> >> IFN_VEC_WIDEN_PLUS_LO -> vec_widen_<su>add_lo_<mode>
> >> -> (u/s)addl
> >>
> >> This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree
> >> codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
> >
> > What I still don't understand is how we are so narrowly focused on
> > HI/LO? We need a combined scalar IFN for pattern selection (not
> > sure why that's now called _HILO, I expected no suffix). Then there's
> > three possibilities the target can implement this:
> >
> > 1) with a widen_[su]add<mode> instruction - I _think_ that's what
> > RISCV is going to offer since it is a target where vector modes
> > have "padding" (aka you cannot subreg a V2SI to get V4HI). Instead
> > RVV can do a V4HI to V4SI widening and widening add/subtract
> > using vwadd[u] and vwsub[u] (the HI->SI widening is actually
> > done with a widening add of zero - eh).
> > IIRC GCN is the same here.
>
> SVE currently does this too, but the addition and widening are
> separate operations. E.g. in principle there's no reason why
> you can't sign-extend one operand, zero-extend the other, and
> then add the result together. Or you could extend them from
> different sizes (QI and HI). All of those are supported
> (if the costing allows them).

I see. So why does the target expose widen_[su]add<mode> at all?

> If the target has operations to do combined extending and adding (or
> whatever), then at the moment we rely on combine to generate them.
>
> So I think this case is separate from Andre's work. The addition
> itself is just an ordinary addition, and any widening happens by
> vectorising a CONVERT/NOP_EXPR.
>
> > 2) with a widen_[su]add{_lo,_hi}<mode> combo - that's what the tree
> > codes currently support (exclusively)
> > 3) similar, but widen_[su]add{_even,_odd}<mode>
> >
> > that said, things like decomposes_to_hilo_fn_p look to paint us into
> > a 2) corner without good reason.
>
> I suppose one question is: how much of the patch is really specific
> to HI/LO, and how much is just grouping two halves together?

Yep, that I don't know for sure.

> The nice
> thing about the internal-fn grouping macros is that, if (3) is
> implemented in future, the structure will strongly encourage even/odd
> pairs to be supported for all operations that support hi/lo. That is,
> I would expect the grouping macros to be extended to define even/odd
> ifns alongside hi/lo ones, rather than adding separate definitions
> for even/odd functions.
>
> If so, at least from the internal-fn.* side of things, I think the question
> is whether it's OK to stick with hilo names for now, or whether we should
> use more forward-looking names.

I think for parts that are independent we could use a more
forward-looking name. Maybe _halves? But I'm also not sure
how much of that is really needed (it seems to be tied around
optimizing optabs space?)

Richard.

> Thanks,
> Richard
>
> >
> > Richard.
> >
> >> gcc/ChangeLog:
> >>
> >> 2023-05-12 Andre Vieira <andre.simoesdiasvieira@arm.com>
> >> Joel Hutton <joel.hutton@arm.com>
> >> Tamar Christina <tamar.christina@arm.com>
> >>
> >> * config/aarch64/aarch64-simd.md (vec_widen_<su>addl_lo_<mode>):
> >> Rename
> >> this ...
> >> (vec_widen_<su>add_lo_<mode>): ... to this.
> >> (vec_widen_<su>addl_hi_<mode>): Rename this ...
> >> (vec_widen_<su>add_hi_<mode>): ... to this.
> >> (vec_widen_<su>subl_lo_<mode>): Rename this ...
> >> (vec_widen_<su>sub_lo_<mode>): ... to this.
> >> (vec_widen_<su>subl_hi_<mode>): Rename this ...
> >> (vec_widen_<su>sub_hi_<mode>): ...to this.
> >> * doc/generic.texi: Document new IFN codes.
> >> * internal-fn.cc (DEF_INTERNAL_OPTAB_WIDENING_HILO_FN): Macro to > >> define an > >> internal_fn that expands into multiple internal_fns for widening. > >> (DEF_INTERNAL_OPTAB_NARROWING_HILO_FN): Likewise but for narrowing. > >> (ifn_cmp): Function to compare ifn's for sorting/searching. > >> (lookup_hilo_internal_fn): Add lookup function. > >> (commutative_binary_fn_p): Add widen_plus fn's. > >> (widening_fn_p): New function. > >> (narrowing_fn_p): New function. > >> (decomposes_to_hilo_fn_p): New function. > >> (direct_internal_fn_optab): Change visibility. > >> * internal-fn.def (DEF_INTERNAL_OPTAB_WIDENING_HILO_FN): Define > >> widening > >> plus,minus functions. > >> (VEC_WIDEN_PLUS): Replacement for VEC_WIDEN_PLUS_EXPR tree code. > >> (VEC_WIDEN_MINUS): Replacement for VEC_WIDEN_MINUS_EXPR tree code. > >> * internal-fn.h (GCC_INTERNAL_FN_H): Add headers. > >> (direct_internal_fn_optab): Declare new prototype. > >> (lookup_hilo_internal_fn): Likewise. > >> (widening_fn_p): Likewise. > >> (Narrowing_fn_p): Likewise. > >> (decomposes_to_hilo_fn_p): Likewise. > >> * optabs.cc (commutative_optab_p): Add widening plus optabs. > >> * optabs.def (OPTAB_D): Define widen add, sub optabs. > >> * tree-cfg.cc (verify_gimple_call): Add checks for new widen > >> add and sub IFNs. > >> * tree-inline.cc (estimate_num_insns): Return same > >> cost for widen add and sub IFNs as previous tree_codes. > >> * tree-vect-patterns.cc (vect_recog_widen_op_pattern): Support > >> patterns with a hi/lo split. > >> (vect_recog_sad_pattern): Refactor to use new IFN codes. > >> (vect_recog_widen_plus_pattern): Likewise. > >> (vect_recog_widen_minus_pattern): Likewise. > >> (vect_recog_average_pattern): Likewise. > >> * tree-vect-stmts.cc (vectorizable_conversion): Add support for > >> _HILO IFNs. > >> (supportable_widening_operation): Likewise. > >> * tree.def (WIDEN_SUM_EXPR): Update example to use new IFNs. 
> >> > >> gcc/testsuite/ChangeLog: > >> > >> * gcc.target/aarch64/vect-widen-add.c: Test that new > >> IFN_VEC_WIDEN_PLUS is being used. > >> * gcc.target/aarch64/vect-widen-sub.c: Test that new > >> IFN_VEC_WIDEN_MINUS is being used. > >> >
Richard Biener <rguenther@suse.de> writes: > On Fri, 12 May 2023, Richard Sandiford wrote: > >> Richard Biener <rguenther@suse.de> writes: >> > On Fri, 12 May 2023, Andre Vieira (lists) wrote: >> > >> >> I have dealt with, I think..., most of your comments. There's quite a few >> >> changes, I think it's all a bit simpler now. I made some other changes to the >> >> costing in tree-inline.cc and gimple-range-op.cc in which I try to preserve >> >> the same behaviour as we had with the tree codes before. Also added some extra >> >> checks to tree-cfg.cc that made sense to me. >> >> >> >> I am still regression testing the gimple-range-op change, as that was a last >> >> minute change, but the rest survived a bootstrap and regression test on >> >> aarch64-unknown-linux-gnu. >> >> >> >> cover letter: >> >> >> >> This patch replaces the existing tree_code widen_plus and widen_minus >> >> patterns with internal_fn versions. >> >> >> >> DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and DEF_INTERNAL_OPTAB_NARROWING_HILO_FN >> >> are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN respectively >> >> except they provide convenience wrappers for defining conversions that require >> >> a hi/lo split. Each definition for <NAME> will require optabs for _hi and _lo >> >> and each of those will also require a signed and unsigned version in the case >> >> of widening. The hi/lo pair is necessary because the widening and narrowing >> >> operations take n narrow elements as inputs and return n/2 wide elements as >> >> outputs. The 'lo' operation operates on the first n/2 elements of input. The >> >> 'hi' operation operates on the second n/2 elements of input. Defining an >> >> internal_fn along with hi/lo variations allows a single internal function to >> >> be returned from a vect_recog function that will later be expanded to hi/lo. 
>> >> >> >> >> >> For example: >> >> IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO >> >> for aarch64: IFN_VEC_WIDEN_PLUS_HI -> vec_widen_<su>add_hi_<mode> -> >> >> (u/s)addl2 >> >> IFN_VEC_WIDEN_PLUS_LO -> vec_widen_<su>add_lo_<mode> >> >> -> (u/s)addl >> >> >> >> This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree >> >> codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI. >> > >> > What I still don't understand is how we are so narrowly focused on >> > HI/LO? We need a combined scalar IFN for pattern selection (not >> > sure why that's now called _HILO, I expected no suffix). Then there's >> > three possibilities the target can implement this: >> > >> > 1) with a widen_[su]add<mode> instruction - I _think_ that's what >> > RISCV is going to offer since it is a target where vector modes >> > have "padding" (aka you cannot subreg a V2SI to get V4HI). Instead >> > RVV can do a V4HI to V4SI widening and widening add/subtract >> > using vwadd[u] and vwsub[u] (the HI->SI widening is actually >> > done with a widening add of zero - eh). >> > IIRC GCN is the same here. >> >> SVE currently does this too, but the addition and widening are >> separate operations. E.g. in principle there's no reason why >> you can't sign-extend one operand, zero-extend the other, and >> then add the result together. Or you could extend them from >> different sizes (QI and HI). All of those are supported >> (if the costing allows them). > > I see. So why does the target the expose widen_[su]add<mode> at all? It shouldn't (need to) do that. I don't think we should have an optab for the unsplit operation. At least on SVE, we really want the extensions to be fused with loads (where possible) rather than with arithmetic. We can still do the widening arithmetic in one go. It's just that fusing with the loads works for the mixed-sign and mixed-size cases, and can handle more than just doubling the element size. 
>> If the target has operations to do combined extending and adding (or >> whatever), then at the moment we rely on combine to generate them. >> >> So I think this case is separate from Andre's work. The addition >> itself is just an ordinary addition, and any widening happens by >> vectorising a CONVERT/NOP_EXPR. >> >> > 2) with a widen_[su]add{_lo,_hi}<mode> combo - that's what the tree >> > codes currently support (exclusively) >> > 3) similar, but widen_[su]add{_even,_odd}<mode> >> > >> > that said, things like decomposes_to_hilo_fn_p look to paint us into >> > a 2) corner without good reason. >> >> I suppose one question is: how much of the patch is really specific >> to HI/LO, and how much is just grouping two halves together? > > Yep, that I don't know for sure. > >> The nice >> thing about the internal-fn grouping macros is that, if (3) is >> implemented in future, the structure will strongly encourage even/odd >> pairs to be supported for all operations that support hi/lo. That is, >> I would expect the grouping macros to be extended to define even/odd >> ifns alongside hi/lo ones, rather than adding separate definitions >> for even/odd functions. >> >> If so, at least from the internal-fn.* side of things, I think the question >> is whether it's OK to stick with hilo names for now, or whether we should >> use more forward-looking names. > > I think for parts that are independent we could use a more > forward-looking name. Maybe _halves? Using _halves for the ifn macros sounds good to me FWIW. > But I'm also not sure > how much of that is really needed (it seems to be tied around > optimizing optabs space?) Not sure what you mean by "this". Optabs space shouldn't be a problem though. The optab encoding gives us a full int to play with, and it could easily go up to 64 bits if necessary/convenient. 
At least on the internal-fn.* side, the aim is really just to establish
a regular structure, so that we don't have arbitrary differences between
different widening operations, or too much cut-&-paste.

Thanks,
Richard
On Mon, 15 May 2023, Richard Sandiford wrote: > Richard Biener <rguenther@suse.de> writes: > > On Fri, 12 May 2023, Richard Sandiford wrote: > > > >> Richard Biener <rguenther@suse.de> writes: > >> > On Fri, 12 May 2023, Andre Vieira (lists) wrote: > >> > > >> >> I have dealt with, I think..., most of your comments. There's quite a few > >> >> changes, I think it's all a bit simpler now. I made some other changes to the > >> >> costing in tree-inline.cc and gimple-range-op.cc in which I try to preserve > >> >> the same behaviour as we had with the tree codes before. Also added some extra > >> >> checks to tree-cfg.cc that made sense to me. > >> >> > >> >> I am still regression testing the gimple-range-op change, as that was a last > >> >> minute change, but the rest survived a bootstrap and regression test on > >> >> aarch64-unknown-linux-gnu. > >> >> > >> >> cover letter: > >> >> > >> >> This patch replaces the existing tree_code widen_plus and widen_minus > >> >> patterns with internal_fn versions. > >> >> > >> >> DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and DEF_INTERNAL_OPTAB_NARROWING_HILO_FN > >> >> are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN respectively > >> >> except they provide convenience wrappers for defining conversions that require > >> >> a hi/lo split. Each definition for <NAME> will require optabs for _hi and _lo > >> >> and each of those will also require a signed and unsigned version in the case > >> >> of widening. The hi/lo pair is necessary because the widening and narrowing > >> >> operations take n narrow elements as inputs and return n/2 wide elements as > >> >> outputs. The 'lo' operation operates on the first n/2 elements of input. The > >> >> 'hi' operation operates on the second n/2 elements of input. Defining an > >> >> internal_fn along with hi/lo variations allows a single internal function to > >> >> be returned from a vect_recog function that will later be expanded to hi/lo. 
> >> >> > >> >> > >> >> For example: > >> >> IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO > >> >> for aarch64: IFN_VEC_WIDEN_PLUS_HI -> vec_widen_<su>add_hi_<mode> -> > >> >> (u/s)addl2 > >> >> IFN_VEC_WIDEN_PLUS_LO -> vec_widen_<su>add_lo_<mode> > >> >> -> (u/s)addl > >> >> > >> >> This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree > >> >> codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI. > >> > > >> > What I still don't understand is how we are so narrowly focused on > >> > HI/LO? We need a combined scalar IFN for pattern selection (not > >> > sure why that's now called _HILO, I expected no suffix). Then there's > >> > three possibilities the target can implement this: > >> > > >> > 1) with a widen_[su]add<mode> instruction - I _think_ that's what > >> > RISCV is going to offer since it is a target where vector modes > >> > have "padding" (aka you cannot subreg a V2SI to get V4HI). Instead > >> > RVV can do a V4HI to V4SI widening and widening add/subtract > >> > using vwadd[u] and vwsub[u] (the HI->SI widening is actually > >> > done with a widening add of zero - eh). > >> > IIRC GCN is the same here. > >> > >> SVE currently does this too, but the addition and widening are > >> separate operations. E.g. in principle there's no reason why > >> you can't sign-extend one operand, zero-extend the other, and > >> then add the result together. Or you could extend them from > >> different sizes (QI and HI). All of those are supported > >> (if the costing allows them). > > > > I see. So why does the target the expose widen_[su]add<mode> at all? > > It shouldn't (need to) do that. I don't think we should have an optab > for the unsplit operation. > > At least on SVE, we really want the extensions to be fused with loads > (where possible) rather than with arithmetic. > > We can still do the widening arithmetic in one go. 
It's just that > fusing with the loads works for the mixed-sign and mixed-size cases, > and can handle more than just doubling the element size. > > >> If the target has operations to do combined extending and adding (or > >> whatever), then at the moment we rely on combine to generate them. > >> > >> So I think this case is separate from Andre's work. The addition > >> itself is just an ordinary addition, and any widening happens by > >> vectorising a CONVERT/NOP_EXPR. > >> > >> > 2) with a widen_[su]add{_lo,_hi}<mode> combo - that's what the tree > >> > codes currently support (exclusively) > >> > 3) similar, but widen_[su]add{_even,_odd}<mode> > >> > > >> > that said, things like decomposes_to_hilo_fn_p look to paint us into > >> > a 2) corner without good reason. > >> > >> I suppose one question is: how much of the patch is really specific > >> to HI/LO, and how much is just grouping two halves together? > > > > Yep, that I don't know for sure. > > > >> The nice > >> thing about the internal-fn grouping macros is that, if (3) is > >> implemented in future, the structure will strongly encourage even/odd > >> pairs to be supported for all operations that support hi/lo. That is, > >> I would expect the grouping macros to be extended to define even/odd > >> ifns alongside hi/lo ones, rather than adding separate definitions > >> for even/odd functions. > >> > >> If so, at least from the internal-fn.* side of things, I think the question > >> is whether it's OK to stick with hilo names for now, or whether we should > >> use more forward-looking names. > > > > I think for parts that are independent we could use a more > > forward-looking name. Maybe _halves? > > Using _halves for the ifn macros sounds good to me FWIW. > > > But I'm also not sure > > how much of that is really needed (it seems to be tied around > > optimizing optabs space?) > > Not sure what you mean by "this". Optabs space shouldn't be a problem > though. 
> The optab encoding gives us a full int to play with, and it
> could easily go up to 64 bits if necessary/convenient.
>
> At least on the internal-fn.* side, the aim is really just to establish
> a regular structure, so that we don't have arbitrary differences between
> different widening operations, or too much cut-&-paste.

Hmm, I'm looking at the need for the std::map and
internal_fn_hilo_keys_array and internal_fn_hilo_values_array.
The vectorizer pieces contain

+  if (code.is_fn_code ())
+    {
+      internal_fn ifn = as_internal_fn ((combined_fn) code);
+      gcc_assert (decomposes_to_hilo_fn_p (ifn));
+
+      internal_fn lo, hi;
+      lookup_hilo_internal_fn (ifn, &lo, &hi);
+      *code1 = as_combined_fn (lo);
+      *code2 = as_combined_fn (hi);
+      optab1 = lookup_hilo_ifn_optab (lo, !TYPE_UNSIGNED (vectype));
+      optab2 = lookup_hilo_ifn_optab (hi, !TYPE_UNSIGNED (vectype));

so that tries to automatically associate the scalar widening IFN
with the set(s) of IFN pairs we can split to. But then this
list should be static and there's no need to create a std::map?
Maybe gencfn-macros.cc can be enhanced to output these static
cases? Or the vectorizer could (as it did previously) simply
open-code the handled cases (I guess since we deal with two
cases only now I'd prefer that).

Thanks,
Richard.

> Thanks,
> Richard
>
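A sketch of what the open-coded alternative could look like for the two handled cases. The enum is a local stand-in for GCC's generated `internal_fn`, and `lookup_hilo_open_coded` is a hypothetical name; the patch's own `lookup_hilo_internal_fn` may well differ in signature and behaviour.

```cpp
// Stand-in enum; the real internal_fn enum is generated from internal-fn.def.
enum internal_fn {
  IFN_VEC_WIDEN_PLUS,
  IFN_VEC_WIDEN_PLUS_LO,
  IFN_VEC_WIDEN_PLUS_HI,
  IFN_VEC_WIDEN_MINUS,
  IFN_VEC_WIDEN_MINUS_LO,
  IFN_VEC_WIDEN_MINUS_HI,
  IFN_OTHER
};

// Hypothetical open-coded mapping: no map and no runtime initialisation.
// Returns false, leaving *lo and *hi untouched, if IFN has no hi/lo split.
bool lookup_hilo_open_coded(internal_fn ifn, internal_fn* lo, internal_fn* hi) {
  switch (ifn) {
    case IFN_VEC_WIDEN_PLUS:
      *lo = IFN_VEC_WIDEN_PLUS_LO;
      *hi = IFN_VEC_WIDEN_PLUS_HI;
      return true;
    case IFN_VEC_WIDEN_MINUS:
      *lo = IFN_VEC_WIDEN_MINUS_LO;
      *hi = IFN_VEC_WIDEN_MINUS_HI;
      return true;
    default:
      return false;
  }
}
```

With only two cases the switch is short, but each new widening operation needs a hand-written case, which is the trade-off against a generated table.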
Richard Biener <rguenther@suse.de> writes:
> On Mon, 15 May 2023, Richard Sandiford wrote:
>
>> Richard Biener <rguenther@suse.de> writes:
>> > But I'm also not sure
>> > how much of that is really needed (it seems to be tied around
>> > optimizing optabs space?)
>>
>> Not sure what you mean by "this". Optabs space shouldn't be a problem
>> though. The optab encoding gives us a full int to play with, and it
>> could easily go up to 64 bits if necessary/convenient.
>>
>> At least on the internal-fn.* side, the aim is really just to establish
>> a regular structure, so that we don't have arbitrary differences between
>> different widening operations, or too much cut-&-paste.
>
> Hmm, I'm looking at the need for the std::map and
> internal_fn_hilo_keys_array and internal_fn_hilo_values_array.
> The vectorizer pieces contain
>
> +  if (code.is_fn_code ())
> +    {
> +      internal_fn ifn = as_internal_fn ((combined_fn) code);
> +      gcc_assert (decomposes_to_hilo_fn_p (ifn));
> +
> +      internal_fn lo, hi;
> +      lookup_hilo_internal_fn (ifn, &lo, &hi);
> +      *code1 = as_combined_fn (lo);
> +      *code2 = as_combined_fn (hi);
> +      optab1 = lookup_hilo_ifn_optab (lo, !TYPE_UNSIGNED (vectype));
> +      optab2 = lookup_hilo_ifn_optab (hi, !TYPE_UNSIGNED (vectype));
>
> so that tries to automatically associate the scalar widening IFN
> with the set(s) of IFN pairs we can split to. But then this
> list should be static and there's no need to create a std::map?
> Maybe gencfn-macros.cc can be enhanced to output these static
> cases? Or the vectorizer could (as it did previously) simply
> open-code the handled cases (I guess since we deal with two
> cases only now I'd prefer that).

Ah, yeah, I pushed back against that too. I think it should be possible
to do it using the preprocessor, if the macros are defined appropriately.
But if it isn't possible to do it with macros then I agree that a
generator would be better than initialisation within the compiler.

Thanks,
Richard
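One way the preprocessor route might look, as a sketch under assumptions: the same list macro that defines the functions also emits a constant table, so nothing is built when the compiler starts. The list macro, `halves_entry`, `halves_table` and `find_halves` are all hypothetical names for this illustration, and the enum is a local stand-in rather than GCC's real `internal_fn`.

```cpp
// Hypothetical stand-in for the relevant internal-fn.def entries.
#define WIDENING_HILO_FNS(X) \
  X(VEC_WIDEN_PLUS)          \
  X(VEC_WIDEN_MINUS)

// Stand-in enum generated from the list.
enum internal_fn {
#define X(NAME) IFN_##NAME, IFN_##NAME##_LO, IFN_##NAME##_HI,
  WIDENING_HILO_FNS(X)
#undef X
  IFN_LAST
};

struct halves_entry {
  internal_fn fn;  // combined function returned by pattern recognition
  internal_fn lo;  // variant for the first n/2 elements
  internal_fn hi;  // variant for the second n/2 elements
};

// The table is a compile-time constant emitted from the same list.
static const halves_entry halves_table[] = {
#define X(NAME) { IFN_##NAME, IFN_##NAME##_LO, IFN_##NAME##_HI },
  WIDENING_HILO_FNS(X)
#undef X
};

// Linear search is fine for a handful of entries; returns nullptr if FN
// does not decompose into halves.
const halves_entry* find_halves(internal_fn fn) {
  for (const halves_entry& e : halves_table)
    if (e.fn == fn)
      return &e;
  return nullptr;
}
```

If the real macros cannot be shaped this way, the same table could instead be written out by a generator program at build time, as the reply suggests.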
On 15/05/2023 12:01, Richard Biener wrote: > On Mon, 15 May 2023, Richard Sandiford wrote: > >> Richard Biener <rguenther@suse.de> writes: >>> On Fri, 12 May 2023, Richard Sandiford wrote: >>> >>>> Richard Biener <rguenther@suse.de> writes: >>>>> On Fri, 12 May 2023, Andre Vieira (lists) wrote: >>>>> >>>>>> I have dealt with, I think..., most of your comments. There's quite a few >>>>>> changes, I think it's all a bit simpler now. I made some other changes to the >>>>>> costing in tree-inline.cc and gimple-range-op.cc in which I try to preserve >>>>>> the same behaviour as we had with the tree codes before. Also added some extra >>>>>> checks to tree-cfg.cc that made sense to me. >>>>>> >>>>>> I am still regression testing the gimple-range-op change, as that was a last >>>>>> minute change, but the rest survived a bootstrap and regression test on >>>>>> aarch64-unknown-linux-gnu. >>>>>> >>>>>> cover letter: >>>>>> >>>>>> This patch replaces the existing tree_code widen_plus and widen_minus >>>>>> patterns with internal_fn versions. >>>>>> >>>>>> DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and DEF_INTERNAL_OPTAB_NARROWING_HILO_FN >>>>>> are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN respectively >>>>>> except they provide convenience wrappers for defining conversions that require >>>>>> a hi/lo split. Each definition for <NAME> will require optabs for _hi and _lo >>>>>> and each of those will also require a signed and unsigned version in the case >>>>>> of widening. The hi/lo pair is necessary because the widening and narrowing >>>>>> operations take n narrow elements as inputs and return n/2 wide elements as >>>>>> outputs. The 'lo' operation operates on the first n/2 elements of input. The >>>>>> 'hi' operation operates on the second n/2 elements of input. Defining an >>>>>> internal_fn along with hi/lo variations allows a single internal function to >>>>>> be returned from a vect_recog function that will later be expanded to hi/lo. 
>>>>>> >>>>>> >>>>>> For example: >>>>>> IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO >>>>>> for aarch64: IFN_VEC_WIDEN_PLUS_HI -> vec_widen_<su>add_hi_<mode> -> >>>>>> (u/s)addl2 >>>>>> IFN_VEC_WIDEN_PLUS_LO -> vec_widen_<su>add_lo_<mode> >>>>>> -> (u/s)addl >>>>>> >>>>>> This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree >>>>>> codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI. >>>>> >>>>> What I still don't understand is how we are so narrowly focused on >>>>> HI/LO? We need a combined scalar IFN for pattern selection (not >>>>> sure why that's now called _HILO, I expected no suffix). Then there's >>>>> three possibilities the target can implement this: >>>>> >>>>> 1) with a widen_[su]add<mode> instruction - I _think_ that's what >>>>> RISCV is going to offer since it is a target where vector modes >>>>> have "padding" (aka you cannot subreg a V2SI to get V4HI). Instead >>>>> RVV can do a V4HI to V4SI widening and widening add/subtract >>>>> using vwadd[u] and vwsub[u] (the HI->SI widening is actually >>>>> done with a widening add of zero - eh). >>>>> IIRC GCN is the same here. >>>> >>>> SVE currently does this too, but the addition and widening are >>>> separate operations. E.g. in principle there's no reason why >>>> you can't sign-extend one operand, zero-extend the other, and >>>> then add the result together. Or you could extend them from >>>> different sizes (QI and HI). All of those are supported >>>> (if the costing allows them). >>> >>> I see. So why does the target the expose widen_[su]add<mode> at all? >> >> It shouldn't (need to) do that. I don't think we should have an optab >> for the unsplit operation. >> >> At least on SVE, we really want the extensions to be fused with loads >> (where possible) rather than with arithmetic. >> >> We can still do the widening arithmetic in one go. 
It's just that >> fusing with the loads works for the mixed-sign and mixed-size cases, >> and can handle more than just doubling the element size. >> >>>> If the target has operations to do combined extending and adding (or >>>> whatever), then at the moment we rely on combine to generate them. >>>> >>>> So I think this case is separate from Andre's work. The addition >>>> itself is just an ordinary addition, and any widening happens by >>>> vectorising a CONVERT/NOP_EXPR. >>>> >>>>> 2) with a widen_[su]add{_lo,_hi}<mode> combo - that's what the tree >>>>> codes currently support (exclusively) >>>>> 3) similar, but widen_[su]add{_even,_odd}<mode> >>>>> >>>>> that said, things like decomposes_to_hilo_fn_p look to paint us into >>>>> a 2) corner without good reason. >>>> >>>> I suppose one question is: how much of the patch is really specific >>>> to HI/LO, and how much is just grouping two halves together? >>> >>> Yep, that I don't know for sure. >>> >>>> The nice >>>> thing about the internal-fn grouping macros is that, if (3) is >>>> implemented in future, the structure will strongly encourage even/odd >>>> pairs to be supported for all operations that support hi/lo. That is, >>>> I would expect the grouping macros to be extended to define even/odd >>>> ifns alongside hi/lo ones, rather than adding separate definitions >>>> for even/odd functions. >>>> >>>> If so, at least from the internal-fn.* side of things, I think the question >>>> is whether it's OK to stick with hilo names for now, or whether we should >>>> use more forward-looking names. >>> >>> I think for parts that are independent we could use a more >>> forward-looking name. Maybe _halves? >> >> Using _halves for the ifn macros sounds good to me FWIW. >> >>> But I'm also not sure >>> how much of that is really needed (it seems to be tied around >>> optimizing optabs space?) >> >> Not sure what you mean by "this". Optabs space shouldn't be a problem >> though. 
>> The optab encoding gives us a full int to play with, and it
>> could easily go up to 64 bits if necessary/convenient.
>>
>> At least on the internal-fn.* side, the aim is really just to establish
>> a regular structure, so that we don't have arbitrary differences between
>> different widening operations, or too much cut-&-paste.
>
> Hmm, I'm looking at the need for the std::map and
> internal_fn_hilo_keys_array and internal_fn_hilo_values_array.
> The vectorizer pieces contain
>
> +  if (code.is_fn_code ())
> +    {
> +      internal_fn ifn = as_internal_fn ((combined_fn) code);
> +      gcc_assert (decomposes_to_hilo_fn_p (ifn));
> +
> +      internal_fn lo, hi;
> +      lookup_hilo_internal_fn (ifn, &lo, &hi);
> +      *code1 = as_combined_fn (lo);
> +      *code2 = as_combined_fn (hi);
> +      optab1 = lookup_hilo_ifn_optab (lo, !TYPE_UNSIGNED (vectype));
> +      optab2 = lookup_hilo_ifn_optab (hi, !TYPE_UNSIGNED (vectype));
>
> so that tries to automatically associate the scalar widening IFN
> with the set(s) of IFN pairs we can split to. But then this
> list should be static and there's no need to create a std::map?
> Maybe gencfn-macros.cc can be enhanced to output these static
> cases? Or the vectorizer could (as it did previously) simply
> open-code the handled cases (I guess since we deal with two
> cases only now I'd prefer that).
>
> Thanks,
> Richard.
>
>
>> Thanks,
>> Richard
>>
>

The patch I uploaded last no longer has std::map,
internal_fn_hilo_keys_array or internal_fn_hilo_values_array. (I've
attached it again.)

I'm not sure I understand the _halves suggestion. Do you mean that, where I
had _hilo or _HILO before, we rename that to _halves/_HALVES so that it can
later represent both the _hi/_lo and the _even/_odd separation?

And am I correct to assume we are just giving up on the idea of having an
INTERNAL_OPTAB_FN for 1)?
Kind regards, Andre diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index bfc98a8d943467b33390defab9682f44efab5907..ffbbecb9409e1c2835d658c2a8855cd0e955c0f2 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -4626,7 +4626,7 @@ [(set_attr "type" "neon_<ADDSUB:optab>_long")] ) -(define_expand "vec_widen_<su>addl_lo_<mode>" +(define_expand "vec_widen_<su>add_lo_<mode>" [(match_operand:<VWIDE> 0 "register_operand") (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand")) (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))] @@ -4638,7 +4638,7 @@ DONE; }) -(define_expand "vec_widen_<su>addl_hi_<mode>" +(define_expand "vec_widen_<su>add_hi_<mode>" [(match_operand:<VWIDE> 0 "register_operand") (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand")) (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))] @@ -4650,7 +4650,7 @@ DONE; }) -(define_expand "vec_widen_<su>subl_lo_<mode>" +(define_expand "vec_widen_<su>sub_lo_<mode>" [(match_operand:<VWIDE> 0 "register_operand") (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand")) (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))] @@ -4662,7 +4662,7 @@ DONE; }) -(define_expand "vec_widen_<su>subl_hi_<mode>" +(define_expand "vec_widen_<su>sub_hi_<mode>" [(match_operand:<VWIDE> 0 "register_operand") (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand")) (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))] diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi index 8b2882da4fe7da07d22b4e5384d049ba7d3907bf..0fd7e6cce8bbd4ecb8027b702722adcf6c32eb55 100644 --- a/gcc/doc/generic.texi +++ b/gcc/doc/generic.texi @@ -1811,6 +1811,10 @@ a value from @code{enum annot_expr_kind}, the third is an @code{INTEGER_CST}. 
 @tindex VEC_RSHIFT_EXPR
 @tindex VEC_WIDEN_MULT_HI_EXPR
 @tindex VEC_WIDEN_MULT_LO_EXPR
+@tindex IFN_VEC_WIDEN_PLUS_HI
+@tindex IFN_VEC_WIDEN_PLUS_LO
+@tindex IFN_VEC_WIDEN_MINUS_HI
+@tindex IFN_VEC_WIDEN_MINUS_LO
 @tindex VEC_WIDEN_PLUS_HI_EXPR
 @tindex VEC_WIDEN_PLUS_LO_EXPR
 @tindex VEC_WIDEN_MINUS_HI_EXPR
@@ -1861,6 +1865,33 @@ vector of @code{N/2} products. In the case of @code{VEC_WIDEN_MULT_LO_EXPR} the
 low @code{N/2} elements of the two vector are multiplied to produce the
 vector of @code{N/2} products.

+@item IFN_VEC_WIDEN_PLUS_HI
+@itemx IFN_VEC_WIDEN_PLUS_LO
+These internal functions represent widening vector addition of the high and low
+parts of the two input vectors, respectively. Their operands are vectors that
+contain the same number of elements (@code{N}) of the same integral type. The
+result is a vector that contains half as many elements, of an integral type
+whose size is twice as wide. In the case of @code{IFN_VEC_WIDEN_PLUS_HI} the
+high @code{N/2} elements of the two vectors are added to produce the vector of
+@code{N/2} sums. In the case of @code{IFN_VEC_WIDEN_PLUS_LO} the low
+@code{N/2} elements of the two vectors are added to produce the vector of
+@code{N/2} sums.
+
+@item IFN_VEC_WIDEN_MINUS_HI
+@itemx IFN_VEC_WIDEN_MINUS_LO
+These internal functions represent widening vector subtraction of the high and
+low parts of the two input vectors, respectively. Their operands are vectors
+that contain the same number of elements (@code{N}) of the same integral type.
+The high/low elements of the second vector are subtracted from the high/low
+elements of the first. The result is a vector that contains half as many
+elements, of an integral type whose size is twice as wide. In the case of
+@code{IFN_VEC_WIDEN_MINUS_HI} the high @code{N/2} elements of the second
+vector are subtracted from the high @code{N/2} of the first to produce the
+vector of @code{N/2} differences. In the case of
+@code{IFN_VEC_WIDEN_MINUS_LO} the low @code{N/2} elements of the second
+vector are subtracted from the low @code{N/2} of the first to produce the
+vector of @code{N/2} differences.
+
 @item VEC_WIDEN_PLUS_HI_EXPR
 @itemx VEC_WIDEN_PLUS_LO_EXPR
 These nodes represent widening vector addition of the high and low parts of
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 594bd3043f0e944299ddfff219f757ef15a3dd61..66636d82df27626e7911efd0cb8526921b39633f 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1187,6 +1187,7 @@ gimple_range_op_handler::maybe_non_standard ()
 {
   range_operator *signed_op = ptr_op_widen_mult_signed;
   range_operator *unsigned_op = ptr_op_widen_mult_unsigned;
+  bool signed1, signed2, signed_ret;
   if (gimple_code (m_stmt) == GIMPLE_ASSIGN)
     switch (gimple_assign_rhs_code (m_stmt))
       {
@@ -1202,32 +1203,55 @@ gimple_range_op_handler::maybe_non_standard ()
           m_op1 = gimple_assign_rhs1 (m_stmt);
           m_op2 = gimple_assign_rhs2 (m_stmt);
           tree ret = gimple_assign_lhs (m_stmt);
-          bool signed1 = TYPE_SIGN (TREE_TYPE (m_op1)) == SIGNED;
-          bool signed2 = TYPE_SIGN (TREE_TYPE (m_op2)) == SIGNED;
-          bool signed_ret = TYPE_SIGN (TREE_TYPE (ret)) == SIGNED;
-
-          /* Normally these operands should all have the same sign, but
-             some passes and violate this by taking mismatched sign args. At
-             the moment the only one that's possible is mismatch inputs and
-             unsigned output. Once ranger supports signs for the operands we
-             can properly fix it, for now only accept the case we can do
-             correctly.
*/ - if ((signed1 ^ signed2) && signed_ret) - return; - - m_valid = true; - if (signed2 && !signed1) - std::swap (m_op1, m_op2); - - if (signed1 || signed2) - m_int = signed_op; - else - m_int = unsigned_op; + signed1 = TYPE_SIGN (TREE_TYPE (m_op1)) == SIGNED; + signed2 = TYPE_SIGN (TREE_TYPE (m_op2)) == SIGNED; + signed_ret = TYPE_SIGN (TREE_TYPE (ret)) == SIGNED; break; } default: - break; + return; } + else if (gimple_code (m_stmt) == GIMPLE_CALL + && gimple_call_internal_p (m_stmt) + && gimple_get_lhs (m_stmt) != NULL_TREE) + switch (gimple_call_internal_fn (m_stmt)) + { + case IFN_VEC_WIDEN_PLUS_LO: + case IFN_VEC_WIDEN_PLUS_HI: + { + signed_op = ptr_op_widen_plus_signed; + unsigned_op = ptr_op_widen_plus_unsigned; + m_valid = false; + m_op1 = gimple_call_arg (m_stmt, 0); + m_op2 = gimple_call_arg (m_stmt, 1); + tree ret = gimple_get_lhs (m_stmt); + signed1 = TYPE_SIGN (TREE_TYPE (m_op1)) == SIGNED; + signed2 = TYPE_SIGN (TREE_TYPE (m_op2)) == SIGNED; + signed_ret = TYPE_SIGN (TREE_TYPE (ret)) == SIGNED; + break; + } + default: + return; + } + else + return; + + /* Normally these operands should all have the same sign, but some passes + and violate this by taking mismatched sign args. At the moment the only + one that's possible is mismatch inputs and unsigned output. Once ranger + supports signs for the operands we can properly fix it, for now only + accept the case we can do correctly. */ + if ((signed1 ^ signed2) && signed_ret) + return; + + m_valid = true; + if (signed2 && !signed1) + std::swap (m_op1, m_op2); + + if (signed1 || signed2) + m_int = signed_op; + else + m_int = unsigned_op; } // Set up a gimple_range_op_handler for any built in function which can be diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index 5c9da73ea11f8060b18dcf513599c9694fa4f2ad..1acea5ae33046b70de247b1688aea874d9956abc 100644 --- a/gcc/internal-fn.cc +++ b/gcc/internal-fn.cc @@ -90,6 +90,19 @@ lookup_internal_fn (const char *name) return entry ? 
*entry : IFN_LAST; } +/* Given an internal_fn IFN that is a HILO function, return its corresponding + LO and HI internal_fns. */ + +extern void +lookup_hilo_internal_fn (internal_fn ifn, internal_fn *lo, internal_fn *hi) +{ + gcc_assert (decomposes_to_hilo_fn_p (ifn)); + + *lo = internal_fn (ifn + 1); + *hi = internal_fn (ifn + 2); +} + + /* Fnspec of each internal function, indexed by function number. */ const_tree internal_fn_fnspec_array[IFN_LAST + 1]; @@ -137,7 +150,16 @@ const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = { #define DEF_INTERNAL_OPTAB_FN(CODE, FLAGS, OPTAB, TYPE) TYPE##_direct, #define DEF_INTERNAL_SIGNED_OPTAB_FN(CODE, FLAGS, SELECTOR, SIGNED_OPTAB, \ UNSIGNED_OPTAB, TYPE) TYPE##_direct, +#undef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN +#undef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN +#define DEF_INTERNAL_OPTAB_WIDENING_HILO_FN(CODE, FLAGS, SELECTOR, SIGNED_OPTAB, \ + UNSIGNED_OPTAB, TYPE) \ +TYPE##_direct, TYPE##_direct, TYPE##_direct, +#define DEF_INTERNAL_OPTAB_NARROWING_HILO_FN(CODE, FLAGS, OPTAB, TYPE) \ +TYPE##_direct, TYPE##_direct, TYPE##_direct, #include "internal-fn.def" +#undef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN +#undef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN not_direct }; @@ -3852,7 +3874,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types, /* Return the optab used by internal function FN. */ -static optab +optab direct_internal_fn_optab (internal_fn fn, tree_pair types) { switch (fn) @@ -3971,6 +3993,9 @@ commutative_binary_fn_p (internal_fn fn) case IFN_UBSAN_CHECK_MUL: case IFN_ADD_OVERFLOW: case IFN_MUL_OVERFLOW: + case IFN_VEC_WIDEN_PLUS_HILO: + case IFN_VEC_WIDEN_PLUS_LO: + case IFN_VEC_WIDEN_PLUS_HI: return true; default: @@ -4044,6 +4069,88 @@ first_commutative_argument (internal_fn fn) } } +/* Return true if this CODE describes an internal_fn that returns a vector with + elements twice as wide as the element size of the input vectors. 
*/ + +bool +widening_fn_p (code_helper code) +{ + if (!code.is_fn_code ()) + return false; + + if (!internal_fn_p ((combined_fn) code)) + return false; + + internal_fn fn = as_internal_fn ((combined_fn) code); + switch (fn) + { + #undef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN + #define DEF_INTERNAL_OPTAB_WIDENING_HILO_FN(NAME, F, S, SO, UO, T) \ + case IFN_##NAME##_HILO:\ + case IFN_##NAME##_HI: \ + case IFN_##NAME##_LO: \ + return true; + #include "internal-fn.def" + #undef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN + + default: + return false; + } +} + +/* Return true if this CODE describes an internal_fn that returns a vector with + elements twice as narrow as the element size of the input vectors. */ + +bool +narrowing_fn_p (code_helper code) +{ + if (!code.is_fn_code ()) + return false; + + if (!internal_fn_p ((combined_fn) code)) + return false; + + internal_fn fn = as_internal_fn ((combined_fn) code); + switch (fn) + { + #undef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN + #define DEF_INTERNAL_OPTAB_NARROWING_HILO_FN(NAME, F, O, T) \ + case IFN_##NAME##_HILO:\ + case IFN_##NAME##_HI: \ + case IFN_##NAME##_LO: \ + return true; + #include "internal-fn.def" + #undef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN + + default: + return false; + } +} + +/* Return true if FN decomposes to _hi and _lo IFN. */ + +bool +decomposes_to_hilo_fn_p (internal_fn fn) +{ + switch (fn) + { + #undef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN + #define DEF_INTERNAL_OPTAB_WIDENING_HILO_FN(NAME, F, S, SO, UO, T) \ + case IFN_##NAME##_HILO:\ + return true; + #undef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN + #define DEF_INTERNAL_OPTAB_NARROWING_HILO_FN(NAME, F, O, T) \ + case IFN_##NAME##_HILO:\ + return true; + #include "internal-fn.def" + #undef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN + #undef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN + + default: + return false; + } +} + /* Return true if IFN_SET_EDOM is supported. 
*/ bool @@ -4071,7 +4178,33 @@ set_edom_supported_p (void) optab which_optab = direct_internal_fn_optab (fn, types); \ expand_##TYPE##_optab_fn (fn, stmt, which_optab); \ } +#define DEF_INTERNAL_OPTAB_WIDENING_HILO_FN(CODE, FLAGS, SELECTOR, \ + SIGNED_OPTAB, UNSIGNED_OPTAB, \ + TYPE) \ + static void \ + expand_##CODE##_HILO (internal_fn fn ATTRIBUTE_UNUSED, \ + gcall *stmt ATTRIBUTE_UNUSED) \ + { \ + gcc_unreachable (); \ + } \ + DEF_INTERNAL_SIGNED_OPTAB_FN(CODE##_HI, FLAGS, SELECTOR, SIGNED_OPTAB, \ + UNSIGNED_OPTAB, TYPE) \ + DEF_INTERNAL_SIGNED_OPTAB_FN(CODE##_LO, FLAGS, SELECTOR, SIGNED_OPTAB, \ + UNSIGNED_OPTAB, TYPE) +#define DEF_INTERNAL_OPTAB_NARROWING_HILO_FN(CODE, FLAGS, OPTAB, TYPE) \ + static void \ + expand_##CODE##_HILO (internal_fn fn ATTRIBUTE_UNUSED, \ + gcall *stmt ATTRIBUTE_UNUSED) \ + { \ + gcc_unreachable (); \ + } \ + DEF_INTERNAL_OPTAB_FN(CODE##_LO, FLAGS, OPTAB, TYPE) \ + DEF_INTERNAL_OPTAB_FN(CODE##_HI, FLAGS, OPTAB, TYPE) #include "internal-fn.def" +#undef DEF_INTERNAL_OPTAB_FN +#undef DEF_INTERNAL_SIGNED_OPTAB_FN +#undef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN +#undef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN /* Routines to expand each internal function, indexed by function number. Each routine has the prototype: @@ -4080,6 +4213,7 @@ set_edom_supported_p (void) where STMT is the statement that performs the call. */ static void (*const internal_fn_expanders[]) (internal_fn, gcall *) = { + #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) expand_##CODE, #include "internal-fn.def" 0 diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index 7fe742c2ae713e7152ab05cfdfba86e4e0aa3456..012dd323b86dd7cfcc5c13d3a2bb2a453937155d 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -85,6 +85,13 @@ along with GCC; see the file COPYING3. If not see says that the function extends the C-level BUILT_IN_<NAME>{,L,LL,IMAX} group of functions to any integral mode (including vector modes). 
+ DEF_INTERNAL_SIGNED_OPTAB_HILO_FN is like DEF_INTERNAL_OPTAB_FN except it + provides convenience wrappers for defining conversions that require a + hi/lo split, like widening and narrowing operations. Each definition + for <NAME> will require an optab named <OPTAB> and two other optabs that + you specify for signed and unsigned. + + Each entry must have a corresponding expander of the form: void expand_NAME (gimple_call stmt) @@ -123,6 +130,20 @@ along with GCC; see the file COPYING3. If not see DEF_INTERNAL_OPTAB_FN (NAME, FLAGS, OPTAB, TYPE) #endif +#ifndef DEF_INTERNAL_OPTAB_WIDENING_HILO_FN +#define DEF_INTERNAL_OPTAB_WIDENING_HILO_FN(NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE) \ + DEF_INTERNAL_FN (NAME##_HILO, FLAGS | ECF_LEAF, NULL) \ + DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _LO, FLAGS, SELECTOR, SOPTAB##_lo, UOPTAB##_lo, TYPE) \ + DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _HI, FLAGS, SELECTOR, SOPTAB##_hi, UOPTAB##_hi, TYPE) +#endif + +#ifndef DEF_INTERNAL_OPTAB_NARROWING_HILO_FN +#define DEF_INTERNAL_OPTAB_NARROWING_HILO_FN(NAME, FLAGS, OPTAB, TYPE) \ + DEF_INTERNAL_FN (NAME##_HILO, FLAGS | ECF_LEAF, NULL) \ + DEF_INTERNAL_OPTAB_FN (NAME ## _LO, FLAGS, OPTAB##_lo, TYPE) \ + DEF_INTERNAL_OPTAB_FN (NAME ## _HI, FLAGS, OPTAB##_hi, TYPE) +#endif + DEF_INTERNAL_OPTAB_FN (MASK_LOAD, ECF_PURE, maskload, mask_load) DEF_INTERNAL_OPTAB_FN (LOAD_LANES, ECF_CONST, vec_load_lanes, load_lanes) DEF_INTERNAL_OPTAB_FN (MASK_LOAD_LANES, ECF_PURE, @@ -315,6 +336,16 @@ DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary) DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary) DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary) DEF_INTERNAL_OPTAB_FN (VEC_ADDSUB, ECF_CONST, vec_addsub, binary) +DEF_INTERNAL_OPTAB_WIDENING_HILO_FN (VEC_WIDEN_PLUS, + ECF_CONST | ECF_NOTHROW, + first, + vec_widen_sadd, vec_widen_uadd, + binary) +DEF_INTERNAL_OPTAB_WIDENING_HILO_FN (VEC_WIDEN_MINUS, + ECF_CONST | ECF_NOTHROW, + first, + vec_widen_ssub, 
vec_widen_usub, + binary) DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary) DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary) diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h index 08922ed4254898f5fffca3f33973e96ed9ce772f..8ba07d6d1338e75bc5a451d9e403112a608f3ea2 100644 --- a/gcc/internal-fn.h +++ b/gcc/internal-fn.h @@ -20,6 +20,10 @@ along with GCC; see the file COPYING3. If not see #ifndef GCC_INTERNAL_FN_H #define GCC_INTERNAL_FN_H +#include "insn-codes.h" +#include "insn-opinit.h" + + /* INTEGER_CST values for IFN_UNIQUE function arg-0. UNSPEC: Undifferentiated UNIQUE. @@ -112,6 +116,8 @@ internal_fn_name (enum internal_fn fn) } extern internal_fn lookup_internal_fn (const char *); +extern void lookup_hilo_internal_fn (internal_fn, internal_fn *, internal_fn *); +extern optab direct_internal_fn_optab (internal_fn, tree_pair); /* Return the ECF_* flags for function FN. */ @@ -210,6 +216,9 @@ extern bool commutative_binary_fn_p (internal_fn); extern bool commutative_ternary_fn_p (internal_fn); extern int first_commutative_argument (internal_fn); extern bool associative_binary_fn_p (internal_fn); +extern bool widening_fn_p (code_helper); +extern bool narrowing_fn_p (code_helper); +extern bool decomposes_to_hilo_fn_p (internal_fn); extern bool set_edom_supported_p (void); diff --git a/gcc/optabs.cc b/gcc/optabs.cc index c8e39c82d57a7d726e7da33d247b80f32ec9236c..5a08d91e550b2d92e9572211f811fdba99a33a38 100644 --- a/gcc/optabs.cc +++ b/gcc/optabs.cc @@ -1314,7 +1314,15 @@ commutative_optab_p (optab binoptab) || binoptab == smul_widen_optab || binoptab == umul_widen_optab || binoptab == smul_highpart_optab - || binoptab == umul_highpart_optab); + || binoptab == umul_highpart_optab + || binoptab == vec_widen_saddl_hi_optab + || binoptab == vec_widen_saddl_lo_optab + || binoptab == vec_widen_uaddl_hi_optab + || binoptab == vec_widen_uaddl_lo_optab + || binoptab == vec_widen_sadd_hi_optab + || binoptab == 
vec_widen_sadd_lo_optab + || binoptab == vec_widen_uadd_hi_optab + || binoptab == vec_widen_uadd_lo_optab); } /* X is to be used in mode MODE as operand OPN to BINOPTAB. If we're diff --git a/gcc/optabs.def b/gcc/optabs.def index 695f5911b300c9ca5737de9be809fa01aabe5e01..16d121722c8c5723d9b164f5a2c616dc7ec143de 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -410,6 +410,10 @@ OPTAB_D (vec_widen_ssubl_hi_optab, "vec_widen_ssubl_hi_$a") OPTAB_D (vec_widen_ssubl_lo_optab, "vec_widen_ssubl_lo_$a") OPTAB_D (vec_widen_saddl_hi_optab, "vec_widen_saddl_hi_$a") OPTAB_D (vec_widen_saddl_lo_optab, "vec_widen_saddl_lo_$a") +OPTAB_D (vec_widen_ssub_hi_optab, "vec_widen_ssub_hi_$a") +OPTAB_D (vec_widen_ssub_lo_optab, "vec_widen_ssub_lo_$a") +OPTAB_D (vec_widen_sadd_hi_optab, "vec_widen_sadd_hi_$a") +OPTAB_D (vec_widen_sadd_lo_optab, "vec_widen_sadd_lo_$a") OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a") OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a") OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a") @@ -422,6 +426,10 @@ OPTAB_D (vec_widen_usubl_hi_optab, "vec_widen_usubl_hi_$a") OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a") OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a") OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a") +OPTAB_D (vec_widen_usub_hi_optab, "vec_widen_usub_hi_$a") +OPTAB_D (vec_widen_usub_lo_optab, "vec_widen_usub_lo_$a") +OPTAB_D (vec_widen_uadd_hi_optab, "vec_widen_uadd_hi_$a") +OPTAB_D (vec_widen_uadd_lo_optab, "vec_widen_uadd_lo_$a") OPTAB_D (vec_addsub_optab, "vec_addsub$a3") OPTAB_D (vec_fmaddsub_optab, "vec_fmaddsub$a4") OPTAB_D (vec_fmsubadd_optab, "vec_fmsubadd$a4") diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c index 220bd9352a4c7acd2e3713e441d74898d3e92b30..7037673d32bd780e1c9b58a51e58e2bac3b30b7e 100644 --- a/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c +++ 
b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c @@ -1,5 +1,5 @@ /* { dg-do run } */ -/* { dg-options "-O3 -save-temps" } */ +/* { dg-options "-O3 -save-temps -fdump-tree-vect-all" } */ #include <stdint.h> #include <string.h> @@ -86,6 +86,8 @@ main() return 0; } +/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_PLUS_LO" "vect" } } */ +/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_PLUS_HI" "vect" } } */ /* { dg-final { scan-assembler-times {\tuaddl\t} 1} } */ /* { dg-final { scan-assembler-times {\tuaddl2\t} 1} } */ /* { dg-final { scan-assembler-times {\tsaddl\t} 1} } */ diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c index a2bed63affbd091977df95a126da1f5b8c1d41d2..83bc1edb6105f47114b665e24a13e6194b2179a2 100644 --- a/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c +++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c @@ -1,5 +1,5 @@ /* { dg-do run } */ -/* { dg-options "-O3 -save-temps" } */ +/* { dg-options "-O3 -save-temps -fdump-tree-vect-all" } */ #include <stdint.h> #include <string.h> @@ -86,6 +86,8 @@ main() return 0; } +/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_MINUS_LO" "vect" } } */ +/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_MINUS_HI" "vect" } } */ /* { dg-final { scan-assembler-times {\tusubl\t} 1} } */ /* { dg-final { scan-assembler-times {\tusubl2\t} 1} } */ /* { dg-final { scan-assembler-times {\tssubl\t} 1} } */ diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc index 0aeebb67fac864db284985f4a6f0653af281d62b..28464ad9e3a7ea25557ffebcdbdbc1340f9e0d8b 100644 --- a/gcc/tree-cfg.cc +++ b/gcc/tree-cfg.cc @@ -65,6 +65,7 @@ along with GCC; see the file COPYING3. If not see #include "asan.h" #include "profile.h" #include "sreal.h" +#include "internal-fn.h" /* This file contains functions for building the Control Flow Graph (CFG) for a function tree. 
*/
@@ -3411,6 +3412,52 @@ verify_gimple_call (gcall *stmt)
           debug_generic_stmt (fn);
           return true;
         }
+      internal_fn ifn = gimple_call_internal_fn (stmt);
+      if (ifn == IFN_LAST)
+        {
+          error ("gimple call has an invalid IFN");
+          debug_generic_stmt (fn);
+          return true;
+        }
+      else if (decomposes_to_hilo_fn_p (ifn))
+        {
+          /* Non-decomposed HILO stmts should not appear in IL, these are
+             merely used as an internal representation to the auto-vectorizer
+             pass and should have been expanded to their _LO _HI variants. */
+          error ("gimple call has a non-decomposed HILO IFN");
+          debug_generic_stmt (fn);
+          return true;
+        }
+      else if (ifn == IFN_VEC_WIDEN_PLUS_LO
+               || ifn == IFN_VEC_WIDEN_PLUS_HI
+               || ifn == IFN_VEC_WIDEN_MINUS_LO
+               || ifn == IFN_VEC_WIDEN_MINUS_HI)
+        {
+          tree rhs1_type = TREE_TYPE (gimple_call_arg (stmt, 0));
+          tree rhs2_type = TREE_TYPE (gimple_call_arg (stmt, 1));
+          tree lhs_type = TREE_TYPE (gimple_get_lhs (stmt));
+          if (TREE_CODE (lhs_type) == VECTOR_TYPE)
+            {
+              if (TREE_CODE (rhs1_type) != VECTOR_TYPE
+                  || TREE_CODE (rhs2_type) != VECTOR_TYPE)
+                {
+                  error ("invalid non-vector operands in vector IFN call");
+                  debug_generic_stmt (fn);
+                  return true;
+                }
+              lhs_type = TREE_TYPE (lhs_type);
+              rhs1_type = TREE_TYPE (rhs1_type);
+              rhs2_type = TREE_TYPE (rhs2_type);
+            }
+          if (POINTER_TYPE_P (lhs_type)
+              || POINTER_TYPE_P (rhs1_type)
+              || POINTER_TYPE_P (rhs2_type))
+            {
+              error ("invalid (pointer) operands in vector IFN call");
+              debug_generic_stmt (fn);
+              return true;
+            }
+        }
     }
   else
     {
diff --git a/gcc/tree-inline.cc b/gcc/tree-inline.cc
index 63a19f8d1d89c6bd5d8e55a299cbffaa324b4b84..d74d8db2173b1ab117250fea89de5212d5e354ec 100644
--- a/gcc/tree-inline.cc
+++ b/gcc/tree-inline.cc
@@ -4433,7 +4433,20 @@ estimate_num_insns (gimple *stmt, eni_weights *weights)
         tree decl;

         if (gimple_call_internal_p (stmt))
-          return 0;
+          {
+            internal_fn fn = gimple_call_internal_fn (stmt);
+            switch (fn)
+              {
+              case IFN_VEC_WIDEN_PLUS_HI:
+              case IFN_VEC_WIDEN_PLUS_LO:
+              case
IFN_VEC_WIDEN_MINUS_HI: + case IFN_VEC_WIDEN_MINUS_LO: + return 1; + + default: + return 0; + } + } else if ((decl = gimple_call_fndecl (stmt)) && fndecl_built_in_p (decl)) { diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index 1778af0242898e3dc73d94d22a5b8505628a53b5..93cebc72beb4f65249a69b2665dfeb8a0991c1d1 100644 --- a/gcc/tree-vect-patterns.cc +++ b/gcc/tree-vect-patterns.cc @@ -562,21 +562,30 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type) static unsigned int vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code, - tree_code widened_code, bool shift_p, + code_helper widened_code, bool shift_p, unsigned int max_nops, vect_unpromoted_value *unprom, tree *common_type, enum optab_subtype *subtype = NULL) { /* Check for an integer operation with the right code. */ - gassign *assign = dyn_cast <gassign *> (stmt_info->stmt); - if (!assign) + gimple* stmt = stmt_info->stmt; + if (!(is_gimple_assign (stmt) || is_gimple_call (stmt))) + return 0; + + code_helper rhs_code; + if (is_gimple_assign (stmt)) + rhs_code = gimple_assign_rhs_code (stmt); + else if (is_gimple_call (stmt)) + rhs_code = gimple_call_combined_fn (stmt); + else return 0; - tree_code rhs_code = gimple_assign_rhs_code (assign); - if (rhs_code != code && rhs_code != widened_code) + if (rhs_code != code + && rhs_code != widened_code) return 0; - tree type = TREE_TYPE (gimple_assign_lhs (assign)); + tree lhs = gimple_get_lhs (stmt); + tree type = TREE_TYPE (lhs); if (!INTEGRAL_TYPE_P (type)) return 0; @@ -589,7 +598,7 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code, { vect_unpromoted_value *this_unprom = &unprom[next_op]; unsigned int nops = 1; - tree op = gimple_op (assign, i + 1); + tree op = gimple_arg (stmt, i); if (i == 1 && TREE_CODE (op) == INTEGER_CST) { /* We already have a common type from earlier operands. @@ -1343,7 +1352,8 @@ vect_recog_sad_pattern (vec_info *vinfo, /* FORNOW. 
Can continue analyzing the def-use chain when this stmt in a phi inside the loop (in case we are analyzing an outer-loop). */ vect_unpromoted_value unprom[2]; - if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, WIDEN_MINUS_EXPR, + if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, + IFN_VEC_WIDEN_MINUS_HILO, false, 2, unprom, &half_type)) return NULL; @@ -1395,14 +1405,16 @@ static gimple * vect_recog_widen_op_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info, tree *type_out, tree_code orig_code, code_helper wide_code, - bool shift_p, const char *name) + bool shift_p, const char *name, + optab_subtype *subtype = NULL) { gimple *last_stmt = last_stmt_info->stmt; vect_unpromoted_value unprom[2]; tree half_type; if (!vect_widened_op_tree (vinfo, last_stmt_info, orig_code, orig_code, - shift_p, 2, unprom, &half_type)) + shift_p, 2, unprom, &half_type, subtype)) + return NULL; /* Pattern detected. */ @@ -1468,6 +1480,20 @@ vect_recog_widen_op_pattern (vec_info *vinfo, type, pattern_stmt, vecctype); } +static gimple * +vect_recog_widen_op_pattern (vec_info *vinfo, + stmt_vec_info last_stmt_info, tree *type_out, + tree_code orig_code, internal_fn wide_ifn, + bool shift_p, const char *name, + optab_subtype *subtype = NULL) +{ + combined_fn ifn = as_combined_fn (wide_ifn); + return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out, + orig_code, ifn, shift_p, name, + subtype); +} + + /* Try to detect multiplication on widened inputs, converting MULT_EXPR to WIDEN_MULT_EXPR. See vect_recog_widen_op_pattern for details. */ @@ -1481,26 +1507,30 @@ vect_recog_widen_mult_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info, } /* Try to detect addition on widened inputs, converting PLUS_EXPR - to WIDEN_PLUS_EXPR. See vect_recog_widen_op_pattern for details. */ + to IFN_VEC_WIDEN_PLUS_HILO. See vect_recog_widen_op_pattern for details. 
*/ static gimple * vect_recog_widen_plus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info, tree *type_out) { + optab_subtype subtype; return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out, - PLUS_EXPR, WIDEN_PLUS_EXPR, false, - "vect_recog_widen_plus_pattern"); + PLUS_EXPR, IFN_VEC_WIDEN_PLUS_HILO, + false, "vect_recog_widen_plus_pattern", + &subtype); } /* Try to detect subtraction on widened inputs, converting MINUS_EXPR - to WIDEN_MINUS_EXPR. See vect_recog_widen_op_pattern for details. */ + to IFN_VEC_WIDEN_MINUS_HILO. See vect_recog_widen_op_pattern for details. */ static gimple * vect_recog_widen_minus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info, tree *type_out) { + optab_subtype subtype; return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out, - MINUS_EXPR, WIDEN_MINUS_EXPR, false, - "vect_recog_widen_minus_pattern"); + MINUS_EXPR, IFN_VEC_WIDEN_MINUS_HILO, + false, "vect_recog_widen_minus_pattern", + &subtype); } /* Function vect_recog_ctz_ffs_pattern @@ -3078,7 +3108,7 @@ vect_recog_average_pattern (vec_info *vinfo, vect_unpromoted_value unprom[3]; tree new_type; unsigned int nops = vect_widened_op_tree (vinfo, plus_stmt_info, PLUS_EXPR, - WIDEN_PLUS_EXPR, false, 3, + IFN_VEC_WIDEN_PLUS_HILO, false, 3, unprom, &new_type); if (nops == 0) return NULL; @@ -6469,6 +6499,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = { { vect_recog_mask_conversion_pattern, "mask_conversion" }, { vect_recog_widen_plus_pattern, "widen_plus" }, { vect_recog_widen_minus_pattern, "widen_minus" }, + /* These must come after the double widening ones. 
*/ }; const unsigned int NUM_PATTERNS = ARRAY_SIZE (vect_vect_recog_func_ptrs); diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index d152ae9ab10b361b88c0f839d6951c43b954750a..24c811ebe01fb8b003100dea494cf64fea72a975 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -5038,7 +5038,9 @@ vectorizable_conversion (vec_info *vinfo, bool widen_arith = (code == WIDEN_PLUS_EXPR || code == WIDEN_MINUS_EXPR || code == WIDEN_MULT_EXPR - || code == WIDEN_LSHIFT_EXPR); + || code == WIDEN_LSHIFT_EXPR + || code == IFN_VEC_WIDEN_PLUS_HILO + || code == IFN_VEC_WIDEN_MINUS_HILO); if (!widen_arith && !CONVERT_EXPR_CODE_P (code) @@ -5088,7 +5090,9 @@ vectorizable_conversion (vec_info *vinfo, gcc_assert (code == WIDEN_MULT_EXPR || code == WIDEN_LSHIFT_EXPR || code == WIDEN_PLUS_EXPR - || code == WIDEN_MINUS_EXPR); + || code == WIDEN_MINUS_EXPR + || code == IFN_VEC_WIDEN_PLUS_HILO + || code == IFN_VEC_WIDEN_MINUS_HILO); op1 = is_gimple_assign (stmt) ? gimple_assign_rhs2 (stmt) : @@ -12478,10 +12482,43 @@ supportable_widening_operation (vec_info *vinfo, optab1 = vec_unpacks_sbool_lo_optab; optab2 = vec_unpacks_sbool_hi_optab; } - else + + if (code.is_fn_code ()) + { + internal_fn ifn = as_internal_fn ((combined_fn) code); + gcc_assert (decomposes_to_hilo_fn_p (ifn)); + + internal_fn lo, hi; + lookup_hilo_internal_fn (ifn, &lo, &hi); + *code1 = as_combined_fn (lo); + *code2 = as_combined_fn (hi); + optab1 = direct_internal_fn_optab (lo, {vectype, vectype}); + optab2 = direct_internal_fn_optab (hi, {vectype, vectype}); + } + else if (code.is_tree_code ()) { - optab1 = optab_for_tree_code (c1, vectype, optab_default); - optab2 = optab_for_tree_code (c2, vectype, optab_default); + if (code == FIX_TRUNC_EXPR) + { + /* The signedness is determined from output operand. 
*/ + optab1 = optab_for_tree_code (c1, vectype_out, optab_default); + optab2 = optab_for_tree_code (c2, vectype_out, optab_default); + } + else if (CONVERT_EXPR_CODE_P ((tree_code) code.safe_as_tree_code ()) + && VECTOR_BOOLEAN_TYPE_P (wide_vectype) + && VECTOR_BOOLEAN_TYPE_P (vectype) + && TYPE_MODE (wide_vectype) == TYPE_MODE (vectype) + && SCALAR_INT_MODE_P (TYPE_MODE (vectype))) + { + /* If the input and result modes are the same, a different optab + is needed where we pass in the number of units in vectype. */ + optab1 = vec_unpacks_sbool_lo_optab; + optab2 = vec_unpacks_sbool_hi_optab; + } + else + { + optab1 = optab_for_tree_code (c1, vectype, optab_default); + optab2 = optab_for_tree_code (c2, vectype, optab_default); + } } if (!optab1 || !optab2) diff --git a/gcc/tree.def b/gcc/tree.def index 90ceeec0b512bfa5f983359c0af03cc71de32007..b37b0b35927b92a6536e5c2d9805ffce8319a240 100644 --- a/gcc/tree.def +++ b/gcc/tree.def @@ -1374,15 +1374,16 @@ DEFTREECODE (DOT_PROD_EXPR, "dot_prod_expr", tcc_expression, 3) DEFTREECODE (WIDEN_SUM_EXPR, "widen_sum_expr", tcc_binary, 2) /* Widening sad (sum of absolute differences). - The first two arguments are of type t1 which should be integer. - The third argument and the result are of type t2, such that t2 is at least - twice the size of t1. Like DOT_PROD_EXPR, SAD_EXPR (arg1,arg2,arg3) is + The first two arguments are of type t1 which should be a vector of integers. + The third argument and the result are of type t2, such that the size of + the elements of t2 is at least twice the size of the elements of t1. + Like DOT_PROD_EXPR, SAD_EXPR (arg1,arg2,arg3) is equivalent to: - tmp = WIDEN_MINUS_EXPR (arg1, arg2) + tmp = IFN_VEC_WIDEN_MINUS_EXPR (arg1, arg2) tmp2 = ABS_EXPR (tmp) arg3 = PLUS_EXPR (tmp2, arg3) or: - tmp = WIDEN_MINUS_EXPR (arg1, arg2) + tmp = IFN_VEC_WIDEN_MINUS_EXPR (arg1, arg2) tmp2 = ABS_EXPR (tmp) arg3 = WIDEN_SUM_EXPR (tmp2, arg3) */
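For readers following the hi/lo discussion, the semantics the cover letter describes (n narrow inputs, n/2 wide outputs per half, 'lo' taking the first n/2 elements and 'hi' the second n/2) can be modelled in plain scalar C++. This is only an illustrative sketch, not GCC code: the names widen_add_half, widen_add_lo and widen_add_hi are hypothetical, the element types (int16_t widened to int32_t) are an arbitrary choice, and real targets may assign the halves differently depending on endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model (not part of the patch) of a widening add
// that is split into two halves.  Each call consumes n narrow elements
// from each operand and yields n/2 elements of twice the width.
std::vector<int32_t>
widen_add_half (const std::vector<int16_t> &a,
                const std::vector<int16_t> &b, bool hi)
{
  // Both operands must have the same, even, number of elements.
  assert (a.size () == b.size () && a.size () % 2 == 0);

  const std::size_t half = a.size () / 2;
  // Following the cover letter's wording: 'lo' is the first n/2
  // elements, 'hi' is the second n/2 elements.
  const std::size_t base = hi ? half : 0;

  std::vector<int32_t> out (half);
  for (std::size_t i = 0; i < half; ++i)
    // Extend each operand before adding, so the sum cannot wrap
    // around in the narrow type.
    out[i] = int32_t (a[base + i]) + int32_t (b[base + i]);
  return out;
}

// Convenience wrappers mirroring the _LO / _HI naming used in the thread.
std::vector<int32_t>
widen_add_lo (const std::vector<int16_t> &a, const std::vector<int16_t> &b)
{
  return widen_add_half (a, b, false);
}

std::vector<int32_t>
widen_add_hi (const std::vector<int16_t> &a, const std::vector<int16_t> &b)
{
  return widen_add_half (a, b, true);
}
```

With a = {30000, 1, -2, 32767} and b = {30000, 2, -3, 32767}, the lo half yields {60000, 3} and the hi half yields {-5, 65534}. The values 60000 and 65534 do not fit in 16 bits, which is why the result elements have to be twice as wide and why one full-width input vector needs two output vectors.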
On Mon, 15 May 2023, Andre Vieira (lists) wrote: > > > On 15/05/2023 12:01, Richard Biener wrote: > > On Mon, 15 May 2023, Richard Sandiford wrote: > > > >> Richard Biener <rguenther@suse.de> writes: > >>> On Fri, 12 May 2023, Richard Sandiford wrote: > >>> > >>>> Richard Biener <rguenther@suse.de> writes: > >>>>> On Fri, 12 May 2023, Andre Vieira (lists) wrote: > >>>>> > >>>>>> I have dealt with, I think..., most of your comments. There's quite a > >>>>>> few > >>>>>> changes, I think it's all a bit simpler now. I made some other changes > >>>>>> to the > >>>>>> costing in tree-inline.cc and gimple-range-op.cc in which I try to > >>>>>> preserve > >>>>>> the same behaviour as we had with the tree codes before. Also added > >>>>>> some extra > >>>>>> checks to tree-cfg.cc that made sense to me. > >>>>>> > >>>>>> I am still regression testing the gimple-range-op change, as that was a > >>>>>> last > >>>>>> minute change, but the rest survived a bootstrap and regression test on > >>>>>> aarch64-unknown-linux-gnu. > >>>>>> > >>>>>> cover letter: > >>>>>> > >>>>>> This patch replaces the existing tree_code widen_plus and widen_minus > >>>>>> patterns with internal_fn versions. > >>>>>> > >>>>>> DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and > >>>>>> DEF_INTERNAL_OPTAB_NARROWING_HILO_FN > >>>>>> are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN > >>>>>> respectively > >>>>>> except they provide convenience wrappers for defining conversions that > >>>>>> require > >>>>>> a hi/lo split. Each definition for <NAME> will require optabs for _hi > >>>>>> and _lo > >>>>>> and each of those will also require a signed and unsigned version in > >>>>>> the case > >>>>>> of widening. The hi/lo pair is necessary because the widening and > >>>>>> narrowing > >>>>>> operations take n narrow elements as inputs and return n/2 wide > >>>>>> elements as > >>>>>> outputs. The 'lo' operation operates on the first n/2 elements of > >>>>>> input. 
The > >>>>>> 'hi' operation operates on the second n/2 elements of input. Defining > >>>>>> an > >>>>>> internal_fn along with hi/lo variations allows a single internal > >>>>>> function to > >>>>>> be returned from a vect_recog function that will later be expanded to > >>>>>> hi/lo. > >>>>>> > >>>>>> > >>>>>> For example: > >>>>>> IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO > >>>>>> for aarch64: IFN_VEC_WIDEN_PLUS_HI -> vec_widen_<su>add_hi_<mode> -> > >>>>>> (u/s)addl2 > >>>>>> IFN_VEC_WIDEN_PLUS_LO -> > >>>>>> vec_widen_<su>add_lo_<mode> > >>>>>> -> (u/s)addl > >>>>>> > >>>>>> This gives the same functionality as the previous > >>>>>> WIDEN_PLUS/WIDEN_MINUS tree > >>>>>> codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI. > >>>>> > >>>>> What I still don't understand is how we are so narrowly focused on > >>>>> HI/LO? We need a combined scalar IFN for pattern selection (not > >>>>> sure why that's now called _HILO, I expected no suffix). Then there's > >>>>> three possibilities the target can implement this: > >>>>> > >>>>> 1) with a widen_[su]add<mode> instruction - I _think_ that's what > >>>>> RISCV is going to offer since it is a target where vector modes > >>>>> have "padding" (aka you cannot subreg a V2SI to get V4HI). Instead > >>>>> RVV can do a V4HI to V4SI widening and widening add/subtract > >>>>> using vwadd[u] and vwsub[u] (the HI->SI widening is actually > >>>>> done with a widening add of zero - eh). > >>>>> IIRC GCN is the same here. > >>>> > >>>> SVE currently does this too, but the addition and widening are > >>>> separate operations. E.g. in principle there's no reason why > >>>> you can't sign-extend one operand, zero-extend the other, and > >>>> then add the result together. Or you could extend them from > >>>> different sizes (QI and HI). All of those are supported > >>>> (if the costing allows them). > >>> > >>> I see. So why does the target the expose widen_[su]add<mode> at all? 
> >> > >> It shouldn't (need to) do that. I don't think we should have an optab > >> for the unsplit operation. > >> > >> At least on SVE, we really want the extensions to be fused with loads > >> (where possible) rather than with arithmetic. > >> > >> We can still do the widening arithmetic in one go. It's just that > >> fusing with the loads works for the mixed-sign and mixed-size cases, > >> and can handle more than just doubling the element size. > >> > >>>> If the target has operations to do combined extending and adding (or > >>>> whatever), then at the moment we rely on combine to generate them. > >>>> > >>>> So I think this case is separate from Andre's work. The addition > >>>> itself is just an ordinary addition, and any widening happens by > >>>> vectorising a CONVERT/NOP_EXPR. > >>>> > >>>>> 2) with a widen_[su]add{_lo,_hi}<mode> combo - that's what the tree > >>>>> codes currently support (exclusively) > >>>>> 3) similar, but widen_[su]add{_even,_odd}<mode> > >>>>> > >>>>> that said, things like decomposes_to_hilo_fn_p look to paint us into > >>>>> a 2) corner without good reason. > >>>> > >>>> I suppose one question is: how much of the patch is really specific > >>>> to HI/LO, and how much is just grouping two halves together? > >>> > >>> Yep, that I don't know for sure. > >>> > >>>> The nice > >>>> thing about the internal-fn grouping macros is that, if (3) is > >>>> implemented in future, the structure will strongly encourage even/odd > >>>> pairs to be supported for all operations that support hi/lo. That is, > >>>> I would expect the grouping macros to be extended to define even/odd > >>>> ifns alongside hi/lo ones, rather than adding separate definitions > >>>> for even/odd functions. > >>>> > >>>> If so, at least from the internal-fn.* side of things, I think the > >>>> question > >>>> is whether it's OK to stick with hilo names for now, or whether we should > >>>> use more forward-looking names. 
> >>> > >>> I think for parts that are independent we could use a more > >>> forward-looking name. Maybe _halves? > >> > >> Using _halves for the ifn macros sounds good to me FWIW. > >> > >>> But I'm also not sure > >>> how much of that is really needed (it seems to be tied around > >>> optimizing optabs space?) > >> > >> Not sure what you mean by "this". Optabs space shouldn't be a problem > >> though. The optab encoding gives us a full int to play with, and it > >> could easily go up to 64 bits if necessary/convenient. > >> > >> At least on the internal-fn.* side, the aim is really just to establish > >> a regular structure, so that we don't have arbitrary differences between > >> different widening operations, or too much cut-&-paste. > > > > Hmm, I'm looking at the need for the std::map and > > internal_fn_hilo_keys_array and internal_fn_hilo_values_array. > > The vectorizer pieces contain > > > > + if (code.is_fn_code ()) > > + { > > + internal_fn ifn = as_internal_fn ((combined_fn) code); > > + gcc_assert (decomposes_to_hilo_fn_p (ifn)); > > + > > + internal_fn lo, hi; > > + lookup_hilo_internal_fn (ifn, &lo, &hi); > > + *code1 = as_combined_fn (lo); > > + *code2 = as_combined_fn (hi); > > + optab1 = lookup_hilo_ifn_optab (lo, !TYPE_UNSIGNED (vectype)); > > + optab2 = lookup_hilo_ifn_optab (hi, !TYPE_UNSIGNED (vectype)); > > > > so that tries to automatically associate the scalar widening IFN > > with the set(s) of IFN pairs we can split to. But then this > > list should be static and there's no need to create a std::map? > > Maybe gencfn-macros.cc can be enhanced to output these static > > cases? Or the vectorizer could (as it did previously) simply > > open-code the handled cases (I guess since we deal with two > > cases only now I'd prefer that). > > > > Thanks, > > Richard. > > > > > >> Thanks, > >> Richard > >> > > > The patch I uploaded last no longer has std::map nor > internal_fn_hilo_keys_array and internal_fn_hilo_values_array. 
(I've attached > it again) Whoops, too many patches ... > I'm not sure I understand the _halves, do you mean that for the case where I > had _hilo or _HILO before we rename that to _halves/_HALVES such that it later > represents both _hi/_lo separation and _even/_odd? I don't see much shared stuff, but I guess we'd see when we add a case for EVEN/ODD. The verifier contains + else if (decomposes_to_hilo_fn_p (ifn)) + { + /* Non decomposed HILO stmts should not appear in IL, these are + merely used as an internal representation to the auto-vectorizer + pass and should have been expanded to their _LO _HI variants. */ + error ("gimple call has an non decomposed HILO IFN"); + debug_generic_stmt (fn); + return true; I think to support case 1) that's not wanted. Instead what you could check is that the types involved are vector types, so a subset of what you check for IFN_VEC_WIDEN_PLUS_LO etc. (but oddly it's not verified those are all operating on vector types only?) +/* Given an internal_fn IFN that is a HILO function, return its corresponding + LO and HI internal_fns. */ + +extern void +lookup_hilo_internal_fn (internal_fn ifn, internal_fn *lo, internal_fn *hi) +{ + gcc_assert (decomposes_to_hilo_fn_p (ifn)); + + *lo = internal_fn (ifn + 1); + *hi = internal_fn (ifn + 2); that might become fragile if we add EVEN/ODD besides HI/LO unless we merge those with a DEF_INTERNAL_OPTAB_WIDENING_HILO_EVENODD_FN case, right? > And am I correct to assume we are just giving up on having a INTERNAL_OPTAB_FN > idea for 1)? Well, I think we want all of them in the end (or at least support them if target need arises). full vector, hi/lo and even/odd. Richard.
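To make the three cases discussed above concrete (full vector, hi/lo and even/odd), here is a small standalone scalar model of a widening add. It is only a sketch and not GCC code. It follows the cover letter's description that 'lo' works on the first n/2 input elements and 'hi' on the second n/2. The lane count, the element types and all names (`widen_plus_part`, `concat`, `interleave`, `kA`, `kB`, ...) are assumptions invented for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of the three schemes; none of this is GCC code.
constexpr std::size_t N = 8;                       // narrow lanes per input
using Narrow = std::array<std::int8_t, N>;         // input vector
using WideHalf = std::array<std::int16_t, N / 2>;  // one hi/lo or even/odd result
using WideFull = std::array<std::int16_t, N>;      // result of the unsplit form

// Widen both lanes before adding so the sum cannot wrap in 8 bits.
static std::int16_t widen_add(std::int8_t x, std::int8_t y) {
  return static_cast<std::int16_t>(static_cast<std::int16_t>(x) + y);
}

// Case 1: N narrow lanes in, N wide lanes out.
WideFull widen_plus(const Narrow &a, const Narrow &b) {
  WideFull r{};
  for (std::size_t i = 0; i < N; ++i) r[i] = widen_add(a[i], b[i]);
  return r;
}

// Cases 2 and 3 share one shape: pick N/2 input lanes by start and stride.
static WideHalf widen_plus_part(const Narrow &a, const Narrow &b,
                                std::size_t start, std::size_t stride) {
  WideHalf r{};
  for (std::size_t i = 0; i < N / 2; ++i)
    r[i] = widen_add(a[start + i * stride], b[start + i * stride]);
  return r;
}

WideHalf widen_plus_lo(const Narrow &a, const Narrow &b) { return widen_plus_part(a, b, 0, 1); }
WideHalf widen_plus_hi(const Narrow &a, const Narrow &b) { return widen_plus_part(a, b, N / 2, 1); }
WideHalf widen_plus_even(const Narrow &a, const Narrow &b) { return widen_plus_part(a, b, 0, 2); }
WideHalf widen_plus_odd(const Narrow &a, const Narrow &b) { return widen_plus_part(a, b, 1, 2); }

// hi/lo halves are concatenated to rebuild the full result.
WideFull concat(const WideHalf &lo, const WideHalf &hi) {
  WideFull r{};
  for (std::size_t i = 0; i < N / 2; ++i) { r[i] = lo[i]; r[i + N / 2] = hi[i]; }
  return r;
}

// even/odd halves are interleaved to rebuild the full result.
WideFull interleave(const WideHalf &even, const WideHalf &odd) {
  WideFull r{};
  for (std::size_t i = 0; i < N / 2; ++i) { r[2 * i] = even[i]; r[2 * i + 1] = odd[i]; }
  return r;
}

// Sample inputs whose sums do not fit in 8 bits.
const Narrow kA{100, 101, 102, 103, 104, 105, 106, 107};
const Narrow kB{100, 100, 100, 100, 100, 100, 100, 100};

// Both split schemes compute the same values as the unsplit form.
void self_check() {
  const WideFull full = widen_plus(kA, kB);
  assert(concat(widen_plus_lo(kA, kB), widen_plus_hi(kA, kB)) == full);
  assert(interleave(widen_plus_even(kA, kB), widen_plus_odd(kA, kB)) == full);
}
```

In this model the only difference between hi/lo and even/odd is the start/stride pair and how the two halves are recombined, which is roughly the "grouping two halves together" point raised earlier in the thread. Whether the real internal-fn macros can share that much is a separate question that this sketch does not answer.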
How about this? I'm not sure about the DEF_INTERNAL documentation I rewrote in internal-fn.def; I was struggling to word it, so improvements are welcome! gcc/ChangeLog: 2023-04-25 Andre Vieira <andre.simoesdiasvieira@arm.com> Joel Hutton <joel.hutton@arm.com> Tamar Christina <tamar.christina@arm.com> * config/aarch64/aarch64-simd.md (vec_widen_<su>addl_lo_<mode>): Rename this ... (vec_widen_<su>add_lo_<mode>): ... to this. (vec_widen_<su>addl_hi_<mode>): Rename this ... (vec_widen_<su>add_hi_<mode>): ... to this. (vec_widen_<su>subl_lo_<mode>): Rename this ... (vec_widen_<su>sub_lo_<mode>): ... to this. (vec_widen_<su>subl_hi_<mode>): Rename this ... (vec_widen_<su>sub_hi_<mode>): ... to this. * doc/generic.texi: Document new IFN codes. * internal-fn.cc (ifn_cmp): Function to compare ifn's for sorting/searching. (lookup_hilo_internal_fn): Add lookup function. (commutative_binary_fn_p): Add widen_plus fn's. (widening_fn_p): New function. (narrowing_fn_p): New function. (direct_internal_fn_optab): Change visibility. * internal-fn.def (DEF_INTERNAL_WIDENING_OPTAB_FN): Macro to define an internal_fn that expands into multiple internal_fns for widening. (DEF_INTERNAL_NARROWING_OPTAB_FN): Likewise but for narrowing. (IFN_VEC_WIDEN_PLUS, IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO, IFN_VEC_WIDEN_PLUS_EVEN, IFN_VEC_WIDEN_PLUS_ODD, IFN_VEC_WIDEN_MINUS, IFN_VEC_WIDEN_MINUS_HI, IFN_VEC_WIDEN_MINUS_LO, IFN_VEC_WIDEN_MINUS_ODD, IFN_VEC_WIDEN_MINUS_EVEN): Define widening plus and minus functions. * internal-fn.h (direct_internal_fn_optab): Declare new prototype. (lookup_hilo_internal_fn): Likewise. (widening_fn_p): Likewise. (narrowing_fn_p): Likewise. * optabs.cc (commutative_optab_p): Add widening plus optabs. * optabs.def (OPTAB_D): Define widen add, sub optabs. * tree-cfg.cc (verify_gimple_call): Add checks for widening ifns. * tree-inline.cc (estimate_num_insns): Return same cost for widen add and sub IFNs as previous tree_codes. 
* tree-vect-patterns.cc (vect_recog_widen_op_pattern): Support patterns with a hi/lo or even/odd split. (vect_recog_sad_pattern): Refactor to use new IFN codes. (vect_recog_widen_plus_pattern): Likewise. (vect_recog_widen_minus_pattern): Likewise. (vect_recog_average_pattern): Likewise. * tree-vect-stmts.cc (vectorizable_conversion): Add support for _HILO IFNs. (supportable_widening_operation): Likewise. * tree.def (WIDEN_SUM_EXPR): Update example to use new IFNs. gcc/testsuite/ChangeLog: * gcc.target/aarch64/vect-widen-add.c: Test that new IFN_VEC_WIDEN_PLUS is being used. * gcc.target/aarch64/vect-widen-sub.c: Test that new IFN_VEC_WIDEN_MINUS is being used. diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index bfc98a8d943467b33390defab9682f44efab5907..ffbbecb9409e1c2835d658c2a8855cd0e955c0f2 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -4626,7 +4626,7 @@ [(set_attr "type" "neon_<ADDSUB:optab>_long")] ) -(define_expand "vec_widen_<su>addl_lo_<mode>" +(define_expand "vec_widen_<su>add_lo_<mode>" [(match_operand:<VWIDE> 0 "register_operand") (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand")) (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))] @@ -4638,7 +4638,7 @@ DONE; }) -(define_expand "vec_widen_<su>addl_hi_<mode>" +(define_expand "vec_widen_<su>add_hi_<mode>" [(match_operand:<VWIDE> 0 "register_operand") (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand")) (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))] @@ -4650,7 +4650,7 @@ DONE; }) -(define_expand "vec_widen_<su>subl_lo_<mode>" +(define_expand "vec_widen_<su>sub_lo_<mode>" [(match_operand:<VWIDE> 0 "register_operand") (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand")) (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))] @@ -4662,7 +4662,7 @@ DONE; }) -(define_expand "vec_widen_<su>subl_hi_<mode>" +(define_expand "vec_widen_<su>sub_hi_<mode>" 
[(match_operand:<VWIDE> 0 "register_operand") (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand")) (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))] diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi index 8b2882da4fe7da07d22b4e5384d049ba7d3907bf..5e36dac2b1a10257616f12cdfb0b12d0f2879ae9 100644 --- a/gcc/doc/generic.texi +++ b/gcc/doc/generic.texi @@ -1811,10 +1811,16 @@ a value from @code{enum annot_expr_kind}, the third is an @code{INTEGER_CST}. @tindex VEC_RSHIFT_EXPR @tindex VEC_WIDEN_MULT_HI_EXPR @tindex VEC_WIDEN_MULT_LO_EXPR -@tindex VEC_WIDEN_PLUS_HI_EXPR -@tindex VEC_WIDEN_PLUS_LO_EXPR -@tindex VEC_WIDEN_MINUS_HI_EXPR -@tindex VEC_WIDEN_MINUS_LO_EXPR +@tindex IFN_VEC_WIDEN_PLUS +@tindex IFN_VEC_WIDEN_PLUS_HI +@tindex IFN_VEC_WIDEN_PLUS_LO +@tindex IFN_VEC_WIDEN_PLUS_EVEN +@tindex IFN_VEC_WIDEN_PLUS_ODD +@tindex IFN_VEC_WIDEN_MINUS +@tindex IFN_VEC_WIDEN_MINUS_HI +@tindex IFN_VEC_WIDEN_MINUS_LO +@tindex IFN_VEC_WIDEN_MINUS_EVEN +@tindex IFN_VEC_WIDEN_MINUS_ODD @tindex VEC_UNPACK_HI_EXPR @tindex VEC_UNPACK_LO_EXPR @tindex VEC_UNPACK_FLOAT_HI_EXPR @@ -1861,6 +1867,82 @@ vector of @code{N/2} products. In the case of @code{VEC_WIDEN_MULT_LO_EXPR} the low @code{N/2} elements of the two vector are multiplied to produce the vector of @code{N/2} products. +@item IFN_VEC_WIDEN_PLUS +This internal function represents widening vector addition of two input +vectors. Its operands are vectors that contain the same number of elements +(@code{N}) of the same integral type. The result is a vector that contains +the same amount (@code{N}) of elements, of an integral type whose size is twice +as wide, as the input vectors. If the current target does not implement the +corresponding optabs the vectorizer may choose to split it into either a pair +of @code{IFN_VEC_WIDEN_PLUS_HI} and @code{IFN_VEC_WIDEN_PLUS_LO} or +@code{IFN_VEC_WIDEN_PLUS_EVEN} and @code{IFN_VEC_WIDEN_PLUS_ODD}, depending +on what optabs the target implements. 
+ +@item IFN_VEC_WIDEN_PLUS_HI +@itemx IFN_VEC_WIDEN_PLUS_LO +These internal functions represent widening vector addition of the high and low +parts of the two input vectors, respectively. Their operands are vectors that +contain the same number of elements (@code{N}) of the same integral type. The +result is a vector that contains half as many elements, of an integral type +whose size is twice as wide. In the case of @code{IFN_VEC_WIDEN_PLUS_HI} the +high @code{N/2} elements of the two vectors are added to produce the vector of +@code{N/2} additions. In the case of @code{IFN_VEC_WIDEN_PLUS_LO} the low +@code{N/2} elements of the two vectors are added to produce the vector of +@code{N/2} additions. + +@item IFN_VEC_WIDEN_PLUS_EVEN +@itemx IFN_VEC_WIDEN_PLUS_ODD +These internal functions represent widening vector addition of the even and odd +elements of the two input vectors, respectively. Their operands are vectors +that contain the same number of elements (@code{N}) of the same integral type. +The result is a vector that contains half as many elements, of an integral type +whose size is twice as wide. In the case of @code{IFN_VEC_WIDEN_PLUS_EVEN} the +even @code{N/2} elements of the two vectors are added to produce the vector of +@code{N/2} additions. In the case of @code{IFN_VEC_WIDEN_PLUS_ODD} the odd +@code{N/2} elements of the two vectors are added to produce the vector of +@code{N/2} additions. + +@item IFN_VEC_WIDEN_MINUS +This internal function represents widening vector subtraction of two input +vectors. Its operands are vectors that contain the same number of elements +(@code{N}) of the same integral type. The result is a vector that contains +the same amount (@code{N}) of elements, of an integral type whose size is twice +as wide, as the input vectors. 
If the current target does not implement the +corresponding optabs the vectorizer may choose to split it into either a pair +of @code{IFN_VEC_WIDEN_MINUS_HI} and @code{IFN_VEC_WIDEN_MINUS_LO} or +@code{IFN_VEC_WIDEN_MINUS_EVEN} and @code{IFN_VEC_WIDEN_MINUS_ODD}, depending +on what optabs the target implements. + +@item IFN_VEC_WIDEN_MINUS_HI +@itemx IFN_VEC_WIDEN_MINUS_LO +These internal functions represent widening vector subtraction of the high and +low parts of the two input vectors, respectively. Their operands are vectors +that contain the same number of elements (@code{N}) of the same integral type. +The high/low elements of the second vector are subtracted from the high/low +elements of the first. The result is a vector that contains half as many +elements, of an integral type whose size is twice as wide. In the case of +@code{IFN_VEC_WIDEN_MINUS_HI} the high @code{N/2} elements of the second +vector are subtracted from the high @code{N/2} of the first to produce the +vector of @code{N/2} subtractions. In the case of +@code{IFN_VEC_WIDEN_MINUS_LO} the low @code{N/2} elements of the second +vector are subtracted from the low @code{N/2} of the first to produce the +vector of @code{N/2} subtractions. + +@item IFN_VEC_WIDEN_MINUS_EVEN +@itemx IFN_VEC_WIDEN_MINUS_ODD +These internal functions represent widening vector subtraction of the even and +odd parts of the two input vectors, respectively. Their operands are vectors +that contain the same number of elements (@code{N}) of the same integral type. +The even/odd elements of the second vector are subtracted from the even/odd +elements of the first. The result is a vector that contains half as many +elements, of an integral type whose size is twice as wide. In the case of +@code{IFN_VEC_WIDEN_MINUS_EVEN} the even @code{N/2} elements of the second +vector are subtracted from the even @code{N/2} of the first to produce the +vector of @code{N/2} subtractions. 
In the case of +@code{IFN_VEC_WIDEN_MINUS_ODD} the odd @code{N/2} elements of the second +vector are subtracted from the odd @code{N/2} of the first to produce the +vector of @code{N/2} subtractions. + @item VEC_WIDEN_PLUS_HI_EXPR @itemx VEC_WIDEN_PLUS_LO_EXPR These nodes represent widening vector addition of the high and low parts of diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc index 594bd3043f0e944299ddfff219f757ef15a3dd61..33f4b7064a2a22aad49f27b24b409e91a5b89c69 100644 --- a/gcc/gimple-range-op.cc +++ b/gcc/gimple-range-op.cc @@ -1187,6 +1187,7 @@ gimple_range_op_handler::maybe_non_standard () { range_operator *signed_op = ptr_op_widen_mult_signed; range_operator *unsigned_op = ptr_op_widen_mult_unsigned; + bool signed1, signed2, signed_ret; if (gimple_code (m_stmt) == GIMPLE_ASSIGN) switch (gimple_assign_rhs_code (m_stmt)) { @@ -1202,32 +1203,55 @@ gimple_range_op_handler::maybe_non_standard () m_op1 = gimple_assign_rhs1 (m_stmt); m_op2 = gimple_assign_rhs2 (m_stmt); tree ret = gimple_assign_lhs (m_stmt); - bool signed1 = TYPE_SIGN (TREE_TYPE (m_op1)) == SIGNED; - bool signed2 = TYPE_SIGN (TREE_TYPE (m_op2)) == SIGNED; - bool signed_ret = TYPE_SIGN (TREE_TYPE (ret)) == SIGNED; - - /* Normally these operands should all have the same sign, but - some passes and violate this by taking mismatched sign args. At - the moment the only one that's possible is mismatch inputs and - unsigned output. Once ranger supports signs for the operands we - can properly fix it, for now only accept the case we can do - correctly. 
*/ - if ((signed1 ^ signed2) && signed_ret) - return; - - m_valid = true; - if (signed2 && !signed1) - std::swap (m_op1, m_op2); - - if (signed1 || signed2) - m_int = signed_op; - else - m_int = unsigned_op; + signed1 = TYPE_SIGN (TREE_TYPE (m_op1)) == SIGNED; + signed2 = TYPE_SIGN (TREE_TYPE (m_op2)) == SIGNED; + signed_ret = TYPE_SIGN (TREE_TYPE (ret)) == SIGNED; break; } default: - break; + return; + } + else if (gimple_code (m_stmt) == GIMPLE_CALL + && gimple_call_internal_p (m_stmt) + && gimple_get_lhs (m_stmt) != NULL_TREE) + switch (gimple_call_internal_fn (m_stmt)) + { + case IFN_VEC_WIDEN_PLUS_LO: + case IFN_VEC_WIDEN_PLUS_HI: + { + signed_op = ptr_op_widen_plus_signed; + unsigned_op = ptr_op_widen_plus_unsigned; + m_valid = false; + m_op1 = gimple_call_arg (m_stmt, 0); + m_op2 = gimple_call_arg (m_stmt, 1); + tree ret = gimple_get_lhs (m_stmt); + signed1 = TYPE_SIGN (TREE_TYPE (m_op1)) == SIGNED; + signed2 = TYPE_SIGN (TREE_TYPE (m_op2)) == SIGNED; + signed_ret = TYPE_SIGN (TREE_TYPE (ret)) == SIGNED; + break; + } + default: + return; } + else + return; + + /* Normally these operands should all have the same sign, but some passes + and violate this by taking mismatched sign args. At the moment the only + one that's possible is mismatch inputs and unsigned output. Once ranger + supports signs for the operands we can properly fix it, for now only + accept the case we can do correctly. */ + if ((signed1 ^ signed2) && signed_ret) + return; + + m_valid = true; + if (signed2 && !signed1) + std::swap (m_op1, m_op2); + + if (signed1 || signed2) + m_int = signed_op; + else + m_int = unsigned_op; } // Set up a gimple_range_op_handler for any built in function which can be diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index 5c9da73ea11f8060b18dcf513599c9694fa4f2ad..348bee35a35ae4ed9a8652f5349f430c2733e1cb 100644 --- a/gcc/internal-fn.cc +++ b/gcc/internal-fn.cc @@ -90,6 +90,71 @@ lookup_internal_fn (const char *name) return entry ? 
*entry : IFN_LAST; } +/* Given an internal_fn IFN that is either a widening or narrowing function, return its + corresponding LO and HI internal_fns. */ + +extern void +lookup_hilo_internal_fn (internal_fn ifn, internal_fn *lo, internal_fn *hi) +{ + gcc_assert (widening_fn_p (ifn) || narrowing_fn_p (ifn)); + + switch (ifn) + { + default: + gcc_unreachable (); +#undef DEF_INTERNAL_FN +#undef DEF_INTERNAL_WIDENING_OPTAB_FN +#undef DEF_INTERNAL_NARROWING_OPTAB_FN +#define DEF_INTERNAL_FN(NAME, FLAGS, TYPE) +#define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, F, S, SO, UO, T) \ + case IFN_##NAME: \ + *lo = internal_fn (IFN_##NAME##_LO); \ + *hi = internal_fn (IFN_##NAME##_HI); \ + break; +#define DEF_INTERNAL_NARROWING_OPTAB_FN(NAME, F, O, T) \ + case IFN_##NAME: \ + *lo = internal_fn (IFN_##NAME##_LO); \ + *hi = internal_fn (IFN_##NAME##_HI); \ + break; +#include "internal-fn.def" +#undef DEF_INTERNAL_FN +#undef DEF_INTERNAL_WIDENING_OPTAB_FN +#undef DEF_INTERNAL_NARROWING_OPTAB_FN + } +} + +extern void +lookup_evenodd_internal_fn (internal_fn ifn, internal_fn *even, + internal_fn *odd) +{ + gcc_assert (widening_fn_p (ifn) || narrowing_fn_p (ifn)); + + switch (ifn) + { + default: + gcc_unreachable (); +#undef DEF_INTERNAL_FN +#undef DEF_INTERNAL_WIDENING_OPTAB_FN +#undef DEF_INTERNAL_NARROWING_OPTAB_FN +#define DEF_INTERNAL_FN(NAME, FLAGS, TYPE) +#define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, F, S, SO, UO, T) \ + case IFN_##NAME: \ + *even = internal_fn (IFN_##NAME##_EVEN); \ + *odd = internal_fn (IFN_##NAME##_ODD); \ + break; +#define DEF_INTERNAL_NARROWING_OPTAB_FN(NAME, F, O, T) \ + case IFN_##NAME: \ + *even = internal_fn (IFN_##NAME##_EVEN); \ + *odd = internal_fn (IFN_##NAME##_ODD); \ + break; +#include "internal-fn.def" +#undef DEF_INTERNAL_FN +#undef DEF_INTERNAL_WIDENING_OPTAB_FN +#undef DEF_INTERNAL_NARROWING_OPTAB_FN + } +} + + /* Fnspec of each internal function, indexed by function number. 
*/ const_tree internal_fn_fnspec_array[IFN_LAST + 1]; @@ -3852,7 +3917,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types, /* Return the optab used by internal function FN. */ -static optab +optab direct_internal_fn_optab (internal_fn fn, tree_pair types) { switch (fn) @@ -3971,6 +4036,9 @@ commutative_binary_fn_p (internal_fn fn) case IFN_UBSAN_CHECK_MUL: case IFN_ADD_OVERFLOW: case IFN_MUL_OVERFLOW: + case IFN_VEC_WIDEN_PLUS: + case IFN_VEC_WIDEN_PLUS_LO: + case IFN_VEC_WIDEN_PLUS_HI: return true; default: @@ -4044,6 +4112,68 @@ first_commutative_argument (internal_fn fn) } } +/* Return true if this CODE describes an internal_fn that returns a vector with + elements twice as wide as the element size of the input vectors. */ + +bool +widening_fn_p (code_helper code) +{ + if (!code.is_fn_code ()) + return false; + + if (!internal_fn_p ((combined_fn) code)) + return false; + + internal_fn fn = as_internal_fn ((combined_fn) code); + switch (fn) + { + #undef DEF_INTERNAL_WIDENING_OPTAB_FN + #define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, F, S, SO, UO, T) \ + case IFN_##NAME: \ + case IFN_##NAME##_HI: \ + case IFN_##NAME##_LO: \ + case IFN_##NAME##_EVEN: \ + case IFN_##NAME##_ODD: \ + return true; + #include "internal-fn.def" + #undef DEF_INTERNAL_WIDENING_OPTAB_FN + + default: + return false; + } +} + +/* Return true if this CODE describes an internal_fn that returns a vector with + elements twice as narrow as the element size of the input vectors. 
*/ + +bool +narrowing_fn_p (code_helper code) +{ + if (!code.is_fn_code ()) + return false; + + if (!internal_fn_p ((combined_fn) code)) + return false; + + internal_fn fn = as_internal_fn ((combined_fn) code); + switch (fn) + { + #undef DEF_INTERNAL_NARROWING_OPTAB_FN + #define DEF_INTERNAL_NARROWING_OPTAB_FN(NAME, F, O, T) \ + case IFN_##NAME: \ + case IFN_##NAME##_HI: \ + case IFN_##NAME##_LO: \ + case IFN_##NAME##_EVEN: \ + case IFN_##NAME##_ODD: \ + return true; + #include "internal-fn.def" + #undef DEF_INTERNAL_NARROWING_OPTAB_FN + + default: + return false; + } +} + /* Return true if IFN_SET_EDOM is supported. */ bool @@ -4072,6 +4202,8 @@ set_edom_supported_p (void) expand_##TYPE##_optab_fn (fn, stmt, which_optab); \ } #include "internal-fn.def" +#undef DEF_INTERNAL_OPTAB_FN +#undef DEF_INTERNAL_SIGNED_OPTAB_FN /* Routines to expand each internal function, indexed by function number. Each routine has the prototype: @@ -4080,6 +4212,7 @@ set_edom_supported_p (void) where STMT is the statement that performs the call. */ static void (*const internal_fn_expanders[]) (internal_fn, gcall *) = { + #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) expand_##CODE, #include "internal-fn.def" 0 diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index 7fe742c2ae713e7152ab05cfdfba86e4e0aa3456..e9edaa201ad4ad171a49119efa9d6bff49add9f4 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -85,6 +85,34 @@ along with GCC; see the file COPYING3. If not see says that the function extends the C-level BUILT_IN_<NAME>{,L,LL,IMAX} group of functions to any integral mode (including vector modes). 
+ DEF_INTERNAL_WIDENING_OPTAB_FN is a wrapper that defines five internal + functions with DEF_INTERNAL_SIGNED_OPTAB_FN: + - one that describes a widening operation with the same number of elements + in the output and input vectors, + - two that describe a pair of high-low widening operations where the output + vectors each have half the number of elements of the input vectors, + corresponding to the result of the widening operation on the top half and + bottom half; these have the suffixes _HI and _LO, + - and two that describe a pair of even-odd widening operations where the + output vectors each have half the number of elements of the input vectors, + corresponding to the result of the widening operation on the even and odd + elements; these have the suffixes _EVEN and _ODD. + These five internal functions will require two optabs each, a SIGNED_OPTAB + and an UNSIGNED_OPTAB. + + DEF_INTERNAL_NARROWING_OPTAB_FN is a wrapper that defines five internal + functions with DEF_INTERNAL_OPTAB_FN: + - one that describes a narrowing operation with the same number of elements + in the output and input vectors, + - two that describe a pair of high-low narrowing operations where the output + vector has the same number of elements in the top or bottom halves as the + full input vectors; these have the suffixes _HI and _LO, + - and two that describe a pair of even-odd narrowing operations where the + output vector has the same number of elements, in the even or odd positions, + as the full input vectors; these have the suffixes _EVEN and _ODD. + These five internal functions will require an optab each. + + Each entry must have a corresponding expander of the form: void expand_NAME (gimple_call stmt) @@ -123,6 +151,24 @@ along with GCC; see the file COPYING3. 
If not see DEF_INTERNAL_OPTAB_FN (NAME, FLAGS, OPTAB, TYPE) #endif +#ifndef DEF_INTERNAL_WIDENING_OPTAB_FN +#define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE) \ + DEF_INTERNAL_SIGNED_OPTAB_FN (NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE) \ + DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _LO, FLAGS, SELECTOR, SOPTAB##_lo, UOPTAB##_lo, TYPE) \ + DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _HI, FLAGS, SELECTOR, SOPTAB##_hi, UOPTAB##_hi, TYPE) \ + DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _EVEN, FLAGS, SELECTOR, SOPTAB##_even, UOPTAB##_even, TYPE) \ + DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _ODD, FLAGS, SELECTOR, SOPTAB##_odd, UOPTAB##_odd, TYPE) +#endif + +#ifndef DEF_INTERNAL_NARROWING_OPTAB_FN +#define DEF_INTERNAL_NARROWING_OPTAB_FN(NAME, FLAGS, OPTAB, TYPE) \ + DEF_INTERNAL_OPTAB_FN (NAME, FLAGS, OPTAB, TYPE) \ + DEF_INTERNAL_OPTAB_FN (NAME ## _LO, FLAGS, OPTAB##_lo, TYPE) \ + DEF_INTERNAL_OPTAB_FN (NAME ## _HI, FLAGS, OPTAB##_hi, TYPE) \ + DEF_INTERNAL_OPTAB_FN (NAME ## _EVEN, FLAGS, OPTAB##_even, TYPE) \ + DEF_INTERNAL_OPTAB_FN (NAME ## _ODD, FLAGS, OPTAB##_odd, TYPE) +#endif + DEF_INTERNAL_OPTAB_FN (MASK_LOAD, ECF_PURE, maskload, mask_load) DEF_INTERNAL_OPTAB_FN (LOAD_LANES, ECF_CONST, vec_load_lanes, load_lanes) DEF_INTERNAL_OPTAB_FN (MASK_LOAD_LANES, ECF_PURE, @@ -315,6 +361,16 @@ DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary) DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary) DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary) DEF_INTERNAL_OPTAB_FN (VEC_ADDSUB, ECF_CONST, vec_addsub, binary) +DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_PLUS, + ECF_CONST | ECF_NOTHROW, + first, + vec_widen_sadd, vec_widen_uadd, + binary) +DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_MINUS, + ECF_CONST | ECF_NOTHROW, + first, + vec_widen_ssub, vec_widen_usub, + binary) DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary) DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary) diff 
--git a/gcc/internal-fn.h b/gcc/internal-fn.h index 08922ed4254898f5fffca3f33973e96ed9ce772f..3904ba3ca36949d844532a6a9303f550533311a4 100644 --- a/gcc/internal-fn.h +++ b/gcc/internal-fn.h @@ -20,6 +20,10 @@ along with GCC; see the file COPYING3. If not see #ifndef GCC_INTERNAL_FN_H #define GCC_INTERNAL_FN_H +#include "insn-codes.h" +#include "insn-opinit.h" + + /* INTEGER_CST values for IFN_UNIQUE function arg-0. UNSPEC: Undifferentiated UNIQUE. @@ -112,6 +116,10 @@ internal_fn_name (enum internal_fn fn) } extern internal_fn lookup_internal_fn (const char *); +extern void lookup_hilo_internal_fn (internal_fn, internal_fn *, internal_fn *); +extern void lookup_evenodd_internal_fn (internal_fn, internal_fn *, + internal_fn *); +extern optab direct_internal_fn_optab (internal_fn, tree_pair); /* Return the ECF_* flags for function FN. */ @@ -210,6 +218,8 @@ extern bool commutative_binary_fn_p (internal_fn); extern bool commutative_ternary_fn_p (internal_fn); extern int first_commutative_argument (internal_fn); extern bool associative_binary_fn_p (internal_fn); +extern bool widening_fn_p (code_helper); +extern bool narrowing_fn_p (code_helper); extern bool set_edom_supported_p (void); diff --git a/gcc/optabs.cc b/gcc/optabs.cc index c8e39c82d57a7d726e7da33d247b80f32ec9236c..5a08d91e550b2d92e9572211f811fdba99a33a38 100644 --- a/gcc/optabs.cc +++ b/gcc/optabs.cc @@ -1314,7 +1314,15 @@ commutative_optab_p (optab binoptab) || binoptab == smul_widen_optab || binoptab == umul_widen_optab || binoptab == smul_highpart_optab - || binoptab == umul_highpart_optab); + || binoptab == umul_highpart_optab + || binoptab == vec_widen_saddl_hi_optab + || binoptab == vec_widen_saddl_lo_optab + || binoptab == vec_widen_uaddl_hi_optab + || binoptab == vec_widen_uaddl_lo_optab + || binoptab == vec_widen_sadd_hi_optab + || binoptab == vec_widen_sadd_lo_optab + || binoptab == vec_widen_uadd_hi_optab + || binoptab == vec_widen_uadd_lo_optab); } /* X is to be used in mode MODE as operand OPN 
to BINOPTAB. If we're diff --git a/gcc/optabs.def b/gcc/optabs.def index 695f5911b300c9ca5737de9be809fa01aabe5e01..d41ed6e1afaddd019c7470f965c0ad21c8b2b9d7 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -410,6 +410,16 @@ OPTAB_D (vec_widen_ssubl_hi_optab, "vec_widen_ssubl_hi_$a") OPTAB_D (vec_widen_ssubl_lo_optab, "vec_widen_ssubl_lo_$a") OPTAB_D (vec_widen_saddl_hi_optab, "vec_widen_saddl_hi_$a") OPTAB_D (vec_widen_saddl_lo_optab, "vec_widen_saddl_lo_$a") +OPTAB_D (vec_widen_ssub_optab, "vec_widen_ssub_$a") +OPTAB_D (vec_widen_ssub_hi_optab, "vec_widen_ssub_hi_$a") +OPTAB_D (vec_widen_ssub_lo_optab, "vec_widen_ssub_lo_$a") +OPTAB_D (vec_widen_ssub_odd_optab, "vec_widen_ssub_odd_$a") +OPTAB_D (vec_widen_ssub_even_optab, "vec_widen_ssub_even_$a") +OPTAB_D (vec_widen_sadd_optab, "vec_widen_sadd_$a") +OPTAB_D (vec_widen_sadd_hi_optab, "vec_widen_sadd_hi_$a") +OPTAB_D (vec_widen_sadd_lo_optab, "vec_widen_sadd_lo_$a") +OPTAB_D (vec_widen_sadd_odd_optab, "vec_widen_sadd_odd_$a") +OPTAB_D (vec_widen_sadd_even_optab, "vec_widen_sadd_even_$a") OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a") OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a") OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a") @@ -422,6 +432,16 @@ OPTAB_D (vec_widen_usubl_hi_optab, "vec_widen_usubl_hi_$a") OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a") OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a") OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a") +OPTAB_D (vec_widen_usub_optab, "vec_widen_usub_$a") +OPTAB_D (vec_widen_usub_hi_optab, "vec_widen_usub_hi_$a") +OPTAB_D (vec_widen_usub_lo_optab, "vec_widen_usub_lo_$a") +OPTAB_D (vec_widen_usub_odd_optab, "vec_widen_usub_odd_$a") +OPTAB_D (vec_widen_usub_even_optab, "vec_widen_usub_even_$a") +OPTAB_D (vec_widen_uadd_optab, "vec_widen_uadd_$a") +OPTAB_D (vec_widen_uadd_hi_optab, "vec_widen_uadd_hi_$a") +OPTAB_D (vec_widen_uadd_lo_optab, "vec_widen_uadd_lo_$a") +OPTAB_D 
(vec_widen_uadd_odd_optab, "vec_widen_uadd_odd_$a") +OPTAB_D (vec_widen_uadd_even_optab, "vec_widen_uadd_even_$a") OPTAB_D (vec_addsub_optab, "vec_addsub$a3") OPTAB_D (vec_fmaddsub_optab, "vec_fmaddsub$a4") OPTAB_D (vec_fmsubadd_optab, "vec_fmsubadd$a4") diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c index 220bd9352a4c7acd2e3713e441d74898d3e92b30..7037673d32bd780e1c9b58a51e58e2bac3b30b7e 100644 --- a/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c +++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c @@ -1,5 +1,5 @@ /* { dg-do run } */ -/* { dg-options "-O3 -save-temps" } */ +/* { dg-options "-O3 -save-temps -fdump-tree-vect-all" } */ #include <stdint.h> #include <string.h> @@ -86,6 +86,8 @@ main() return 0; } +/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_PLUS_LO" "vect" } } */ +/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_PLUS_HI" "vect" } } */ /* { dg-final { scan-assembler-times {\tuaddl\t} 1} } */ /* { dg-final { scan-assembler-times {\tuaddl2\t} 1} } */ /* { dg-final { scan-assembler-times {\tsaddl\t} 1} } */ diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c index a2bed63affbd091977df95a126da1f5b8c1d41d2..83bc1edb6105f47114b665e24a13e6194b2179a2 100644 --- a/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c +++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c @@ -1,5 +1,5 @@ /* { dg-do run } */ -/* { dg-options "-O3 -save-temps" } */ +/* { dg-options "-O3 -save-temps -fdump-tree-vect-all" } */ #include <stdint.h> #include <string.h> @@ -86,6 +86,8 @@ main() return 0; } +/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_MINUS_LO" "vect" } } */ +/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_MINUS_HI" "vect" } } */ /* { dg-final { scan-assembler-times {\tusubl\t} 1} } */ /* { dg-final { scan-assembler-times {\tusubl2\t} 1} } */ /* { dg-final { scan-assembler-times {\tssubl\t} 1} } */ diff 
--git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc index 0aeebb67fac864db284985f4a6f0653af281d62b..0e847cd04ca6e33f67a86a78a36d35d42aba2627 100644 --- a/gcc/tree-cfg.cc +++ b/gcc/tree-cfg.cc @@ -65,6 +65,7 @@ along with GCC; see the file COPYING3. If not see #include "asan.h" #include "profile.h" #include "sreal.h" +#include "internal-fn.h" /* This file contains functions for building the Control Flow Graph (CFG) for a function tree. */ @@ -3411,6 +3412,40 @@ verify_gimple_call (gcall *stmt) debug_generic_stmt (fn); return true; } + internal_fn ifn = gimple_call_internal_fn (stmt); + if (ifn == IFN_LAST) + { + error ("gimple call has an invalid IFN"); + debug_generic_stmt (fn); + return true; + } + else if (widening_fn_p (ifn) + || narrowing_fn_p (ifn)) + { + tree lhs = gimple_get_lhs (stmt); + if (!lhs) + { + error ("vector IFN call with no lhs"); + debug_generic_stmt (fn); + return true; + } + + bool non_vector_operands = false; + for (unsigned i = 0; i < gimple_call_num_args (stmt); ++i) + if (!VECTOR_TYPE_P (TREE_TYPE (gimple_call_arg (stmt, i)))) + { + non_vector_operands = true; + break; + } + + if (non_vector_operands + || !VECTOR_TYPE_P (TREE_TYPE (lhs))) + { + error ("invalid non-vector operands in vector IFN call"); + debug_generic_stmt (fn); + return true; + } + } } else { diff --git a/gcc/tree-inline.cc b/gcc/tree-inline.cc index 63a19f8d1d89c6bd5d8e55a299cbffaa324b4b84..d74d8db2173b1ab117250fea89de5212d5e354ec 100644 --- a/gcc/tree-inline.cc +++ b/gcc/tree-inline.cc @@ -4433,7 +4433,20 @@ estimate_num_insns (gimple *stmt, eni_weights *weights) tree decl; if (gimple_call_internal_p (stmt)) - return 0; + { + internal_fn fn = gimple_call_internal_fn (stmt); + switch (fn) + { + case IFN_VEC_WIDEN_PLUS_HI: + case IFN_VEC_WIDEN_PLUS_LO: + case IFN_VEC_WIDEN_MINUS_HI: + case IFN_VEC_WIDEN_MINUS_LO: + return 1; + + default: + return 0; + } + } else if ((decl = gimple_call_fndecl (stmt)) && fndecl_built_in_p (decl)) { diff --git a/gcc/tree-vect-patterns.cc 
b/gcc/tree-vect-patterns.cc index 1778af0242898e3dc73d94d22a5b8505628a53b5..dcd4b5561600346a2c10bd5133507329206e8837 100644 --- a/gcc/tree-vect-patterns.cc +++ b/gcc/tree-vect-patterns.cc @@ -562,21 +562,30 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type) static unsigned int vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code, - tree_code widened_code, bool shift_p, + code_helper widened_code, bool shift_p, unsigned int max_nops, vect_unpromoted_value *unprom, tree *common_type, enum optab_subtype *subtype = NULL) { /* Check for an integer operation with the right code. */ - gassign *assign = dyn_cast <gassign *> (stmt_info->stmt); - if (!assign) + gimple* stmt = stmt_info->stmt; + if (!(is_gimple_assign (stmt) || is_gimple_call (stmt))) + return 0; + + code_helper rhs_code; + if (is_gimple_assign (stmt)) + rhs_code = gimple_assign_rhs_code (stmt); + else if (is_gimple_call (stmt)) + rhs_code = gimple_call_combined_fn (stmt); + else return 0; - tree_code rhs_code = gimple_assign_rhs_code (assign); - if (rhs_code != code && rhs_code != widened_code) + if (rhs_code != code + && rhs_code != widened_code) return 0; - tree type = TREE_TYPE (gimple_assign_lhs (assign)); + tree lhs = gimple_get_lhs (stmt); + tree type = TREE_TYPE (lhs); if (!INTEGRAL_TYPE_P (type)) return 0; @@ -589,7 +598,7 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code, { vect_unpromoted_value *this_unprom = &unprom[next_op]; unsigned int nops = 1; - tree op = gimple_op (assign, i + 1); + tree op = gimple_arg (stmt, i); if (i == 1 && TREE_CODE (op) == INTEGER_CST) { /* We already have a common type from earlier operands. @@ -1343,7 +1352,8 @@ vect_recog_sad_pattern (vec_info *vinfo, /* FORNOW. Can continue analyzing the def-use chain when this stmt in a phi inside the loop (in case we are analyzing an outer-loop). 
*/ vect_unpromoted_value unprom[2]; - if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, WIDEN_MINUS_EXPR, + if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, + IFN_VEC_WIDEN_MINUS, false, 2, unprom, &half_type)) return NULL; @@ -1395,14 +1405,16 @@ static gimple * vect_recog_widen_op_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info, tree *type_out, tree_code orig_code, code_helper wide_code, - bool shift_p, const char *name) + bool shift_p, const char *name, + optab_subtype *subtype = NULL) { gimple *last_stmt = last_stmt_info->stmt; vect_unpromoted_value unprom[2]; tree half_type; if (!vect_widened_op_tree (vinfo, last_stmt_info, orig_code, orig_code, - shift_p, 2, unprom, &half_type)) + shift_p, 2, unprom, &half_type, subtype)) + return NULL; /* Pattern detected. */ @@ -1468,6 +1480,20 @@ vect_recog_widen_op_pattern (vec_info *vinfo, type, pattern_stmt, vecctype); } +static gimple * +vect_recog_widen_op_pattern (vec_info *vinfo, + stmt_vec_info last_stmt_info, tree *type_out, + tree_code orig_code, internal_fn wide_ifn, + bool shift_p, const char *name, + optab_subtype *subtype = NULL) +{ + combined_fn ifn = as_combined_fn (wide_ifn); + return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out, + orig_code, ifn, shift_p, name, + subtype); +} + + /* Try to detect multiplication on widened inputs, converting MULT_EXPR to WIDEN_MULT_EXPR. See vect_recog_widen_op_pattern for details. */ @@ -1481,26 +1507,30 @@ vect_recog_widen_mult_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info, } /* Try to detect addition on widened inputs, converting PLUS_EXPR - to WIDEN_PLUS_EXPR. See vect_recog_widen_op_pattern for details. */ + to IFN_VEC_WIDEN_PLUS. See vect_recog_widen_op_pattern for details. 
*/ static gimple * vect_recog_widen_plus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info, tree *type_out) { + optab_subtype subtype; return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out, - PLUS_EXPR, WIDEN_PLUS_EXPR, false, - "vect_recog_widen_plus_pattern"); + PLUS_EXPR, IFN_VEC_WIDEN_PLUS, + false, "vect_recog_widen_plus_pattern", + &subtype); } /* Try to detect subtraction on widened inputs, converting MINUS_EXPR - to WIDEN_MINUS_EXPR. See vect_recog_widen_op_pattern for details. */ + to IFN_VEC_WIDEN_MINUS. See vect_recog_widen_op_pattern for details. */ static gimple * vect_recog_widen_minus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info, tree *type_out) { + optab_subtype subtype; return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out, - MINUS_EXPR, WIDEN_MINUS_EXPR, false, - "vect_recog_widen_minus_pattern"); + MINUS_EXPR, IFN_VEC_WIDEN_MINUS, + false, "vect_recog_widen_minus_pattern", + &subtype); } /* Function vect_recog_ctz_ffs_pattern @@ -3078,7 +3108,7 @@ vect_recog_average_pattern (vec_info *vinfo, vect_unpromoted_value unprom[3]; tree new_type; unsigned int nops = vect_widened_op_tree (vinfo, plus_stmt_info, PLUS_EXPR, - WIDEN_PLUS_EXPR, false, 3, + IFN_VEC_WIDEN_PLUS, false, 3, unprom, &new_type); if (nops == 0) return NULL; @@ -6469,6 +6499,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = { { vect_recog_mask_conversion_pattern, "mask_conversion" }, { vect_recog_widen_plus_pattern, "widen_plus" }, { vect_recog_widen_minus_pattern, "widen_minus" }, + /* These must come after the double widening ones. 
*/ }; const unsigned int NUM_PATTERNS = ARRAY_SIZE (vect_vect_recog_func_ptrs); diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index d152ae9ab10b361b88c0f839d6951c43b954750a..132c0337b7f541bfb114c0a3d2abbeffdad79880 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -5038,7 +5038,8 @@ vectorizable_conversion (vec_info *vinfo, bool widen_arith = (code == WIDEN_PLUS_EXPR || code == WIDEN_MINUS_EXPR || code == WIDEN_MULT_EXPR - || code == WIDEN_LSHIFT_EXPR); + || code == WIDEN_LSHIFT_EXPR + || widening_fn_p (code)); if (!widen_arith && !CONVERT_EXPR_CODE_P (code) @@ -5088,8 +5089,8 @@ vectorizable_conversion (vec_info *vinfo, gcc_assert (code == WIDEN_MULT_EXPR || code == WIDEN_LSHIFT_EXPR || code == WIDEN_PLUS_EXPR - || code == WIDEN_MINUS_EXPR); - + || code == WIDEN_MINUS_EXPR + || widening_fn_p (code)); op1 = is_gimple_assign (stmt) ? gimple_assign_rhs2 (stmt) : gimple_call_arg (stmt, 0); @@ -12478,26 +12479,69 @@ supportable_widening_operation (vec_info *vinfo, optab1 = vec_unpacks_sbool_lo_optab; optab2 = vec_unpacks_sbool_hi_optab; } - else - { - optab1 = optab_for_tree_code (c1, vectype, optab_default); - optab2 = optab_for_tree_code (c2, vectype, optab_default); + + vec_mode = TYPE_MODE (vectype); + if (widening_fn_p (code)) + { + /* If this is an internal fn then we must check whether the target + supports either a low-high split or an even-odd split. */ + internal_fn ifn = as_internal_fn ((combined_fn) code); + + internal_fn lo, hi, even, odd; + lookup_hilo_internal_fn (ifn, &lo, &hi); + *code1 = as_combined_fn (lo); + *code2 = as_combined_fn (hi); + optab1 = direct_internal_fn_optab (lo, {vectype, vectype}); + optab2 = direct_internal_fn_optab (hi, {vectype, vectype}); + + /* If we don't support low-high, then check for even-odd. 
*/ + if (!optab1 + || (icode1 = optab_handler (optab1, vec_mode)) == CODE_FOR_nothing + || !optab2 + || (icode2 = optab_handler (optab2, vec_mode)) == CODE_FOR_nothing) + { + lookup_evenodd_internal_fn (ifn, &even, &odd); + *code1 = as_combined_fn (even); + *code2 = as_combined_fn (odd); + optab1 = direct_internal_fn_optab (even, {vectype, vectype}); + optab2 = direct_internal_fn_optab (odd, {vectype, vectype}); + } + } + else if (code.is_tree_code ()) + { + if (code == FIX_TRUNC_EXPR) + { + /* The signedness is determined from output operand. */ + optab1 = optab_for_tree_code (c1, vectype_out, optab_default); + optab2 = optab_for_tree_code (c2, vectype_out, optab_default); + } + else if (CONVERT_EXPR_CODE_P ((tree_code) code.safe_as_tree_code ()) + && VECTOR_BOOLEAN_TYPE_P (wide_vectype) + && VECTOR_BOOLEAN_TYPE_P (vectype) + && TYPE_MODE (wide_vectype) == TYPE_MODE (vectype) + && SCALAR_INT_MODE_P (TYPE_MODE (vectype))) + { + /* If the input and result modes are the same, a different optab + is needed where we pass in the number of units in vectype. 
*/ + optab1 = vec_unpacks_sbool_lo_optab; + optab2 = vec_unpacks_sbool_hi_optab; + } + else + { + optab1 = optab_for_tree_code (c1, vectype, optab_default); + optab2 = optab_for_tree_code (c2, vectype, optab_default); + } + *code1 = c1; + *code2 = c2; } if (!optab1 || !optab2) return false; - vec_mode = TYPE_MODE (vectype); if ((icode1 = optab_handler (optab1, vec_mode)) == CODE_FOR_nothing || (icode2 = optab_handler (optab2, vec_mode)) == CODE_FOR_nothing) return false; - if (code.is_tree_code ()) - { - *code1 = c1; - *code2 = c2; - } - if (insn_data[icode1].operand[0].mode == TYPE_MODE (wide_vectype) && insn_data[icode2].operand[0].mode == TYPE_MODE (wide_vectype)) diff --git a/gcc/tree.def b/gcc/tree.def index 90ceeec0b512bfa5f983359c0af03cc71de32007..b37b0b35927b92a6536e5c2d9805ffce8319a240 100644 --- a/gcc/tree.def +++ b/gcc/tree.def @@ -1374,15 +1374,16 @@ DEFTREECODE (DOT_PROD_EXPR, "dot_prod_expr", tcc_expression, 3) DEFTREECODE (WIDEN_SUM_EXPR, "widen_sum_expr", tcc_binary, 2) /* Widening sad (sum of absolute differences). - The first two arguments are of type t1 which should be integer. - The third argument and the result are of type t2, such that t2 is at least - twice the size of t1. Like DOT_PROD_EXPR, SAD_EXPR (arg1,arg2,arg3) is + The first two arguments are of type t1 which should be a vector of integers. + The third argument and the result are of type t2, such that the size of + the elements of t2 is at least twice the size of the elements of t1. + Like DOT_PROD_EXPR, SAD_EXPR (arg1,arg2,arg3) is equivalent to: - tmp = WIDEN_MINUS_EXPR (arg1, arg2) + tmp = IFN_VEC_WIDEN_MINUS_EXPR (arg1, arg2) tmp2 = ABS_EXPR (tmp) arg3 = PLUS_EXPR (tmp2, arg3) or: - tmp = WIDEN_MINUS_EXPR (arg1, arg2) + tmp = IFN_VEC_WIDEN_MINUS_EXPR (arg1, arg2) tmp2 = ABS_EXPR (tmp) arg3 = WIDEN_SUM_EXPR (tmp2, arg3) */
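As a reading aid for the hi/lo versus even/odd choice made in supportable_widening_operation above, here is a minimal scalar sketch in C. It is not GCC code: the names `model_widen_add*` and the lane count `N` are made up for illustration, and "lo" is assumed to mean the first N/2 input lanes, as in the original cover letter. Real targets may number lanes differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model only.  N narrow (8-bit) lanes go in; each half-operation
   produces N/2 wide (16-bit) lanes.  */
enum { N = 8 };

/* Add N/2 lane pairs, starting at lane START and stepping by STRIDE.  */
static void
model_widen_add (const uint8_t *a, const uint8_t *b, uint16_t *out,
                 size_t start, size_t stride)
{
  /* The last lane touched must still be inside the input vectors.  */
  assert (start + (N / 2 - 1) * stride < N);
  for (size_t i = 0; i < N / 2; ++i)
    out[i] = (uint16_t) (a[start + i * stride] + b[start + i * stride]);
}

/* Models a *_LO function: lanes 0 .. N/2-1.  */
static void
model_widen_add_lo (const uint8_t *a, const uint8_t *b, uint16_t *out)
{
  model_widen_add (a, b, out, 0, 1);
}

/* Models a *_HI function: lanes N/2 .. N-1.  */
static void
model_widen_add_hi (const uint8_t *a, const uint8_t *b, uint16_t *out)
{
  model_widen_add (a, b, out, N / 2, 1);
}

/* Models a *_EVEN function: lanes 0, 2, 4, ...  */
static void
model_widen_add_even (const uint8_t *a, const uint8_t *b, uint16_t *out)
{
  model_widen_add (a, b, out, 0, 2);
}

/* Models a *_ODD function: lanes 1, 3, 5, ...  */
static void
model_widen_add_odd (const uint8_t *a, const uint8_t *b, uint16_t *out)
{
  model_widen_add (a, b, out, 1, 2);
}
```

In this model, concatenating the lo and hi results gives the wide sums in the original lane order, whereas concatenating the even and odd results gives them in a permuted order, so the two splits are not interchangeable without an extra interleave.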
On Thu, 18 May 2023, Andre Vieira (lists) wrote:

> How about this?
>
> Not sure about the DEF_INTERNAL documentation I rewrote in internal-fn.def,
> was struggling to word these, so improvements welcome!

The even/odd variant optabs are also commutative_optab_p, so is
the vec_widen_sadd without hi/lo or even/odd.

+/* { dg-options "-O3 -save-temps -fdump-tree-vect-all" } */

do you really want -all? I think you want -details

+      else if (widening_fn_p (ifn)
+               || narrowing_fn_p (ifn))
+        {
+          tree lhs = gimple_get_lhs (stmt);
+          if (!lhs)
+            {
+              error ("vector IFN call with no lhs");
+              debug_generic_stmt (fn);

that's an error because ...? Maybe we want to verify this
for all ECF_CONST|ECF_NOTHROW (or pure instead of const) internal
function calls, but I wouldn't add any verification as part
of this patch (not special to widening/narrowing fns either).

   if (gimple_call_internal_p (stmt))
-    return 0;
+    {
+      internal_fn fn = gimple_call_internal_fn (stmt);
+      switch (fn)
+        {
+        case IFN_VEC_WIDEN_PLUS_HI:
+        case IFN_VEC_WIDEN_PLUS_LO:
+        case IFN_VEC_WIDEN_MINUS_HI:
+        case IFN_VEC_WIDEN_MINUS_LO:
+          return 1;

this now looks incomplete. I think that we want instead to
have a default: returning 1 and then special-cases we want
to cost as zero. Not sure which - maybe blame tells why
this was added? I think we can deal with this as followup
(likewise the ranger additions).

Otherwise looks good to me.

Thanks,
Richard.

> gcc/ChangeLog:
>
> 2023-04-25  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>             Joel Hutton  <joel.hutton@arm.com>
>             Tamar Christina  <tamar.christina@arm.com>
>
>     * config/aarch64/aarch64-simd.md (vec_widen_<su>addl_lo_<mode>):
>     Rename this ...
>     (vec_widen_<su>add_lo_<mode>): ... to this.
>     (vec_widen_<su>addl_hi_<mode>): Rename this ...
>     (vec_widen_<su>add_hi_<mode>): ... to this.
>     (vec_widen_<su>subl_lo_<mode>): Rename this ...
>     (vec_widen_<su>sub_lo_<mode>): ... to this.
>     (vec_widen_<su>subl_hi_<mode>): Rename this ...
>     (vec_widen_<su>sub_hi_<mode>): ...to this.
>     * doc/generic.texi: Document new IFN codes.
>     * internal-fn.cc (ifn_cmp): Function to compare ifn's for
>     sorting/searching.
>     (lookup_hilo_internal_fn): Add lookup function.
>     (commutative_binary_fn_p): Add widen_plus fn's.
>     (widening_fn_p): New function.
>     (narrowing_fn_p): New function.
>     (direct_internal_fn_optab): Change visibility.
>     * internal-fn.def (DEF_INTERNAL_WIDENING_OPTAB_FN): Macro to define an
>     internal_fn that expands into multiple internal_fns for widening.
>     (DEF_INTERNAL_NARROWING_OPTAB_FN): Likewise but for narrowing.
>     (IFN_VEC_WIDEN_PLUS, IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO,
>     IFN_VEC_WIDEN_PLUS_EVEN, IFN_VEC_WIDEN_PLUS_ODD,
>     IFN_VEC_WIDEN_MINUS, IFN_VEC_WIDEN_MINUS_HI,
>     IFN_VEC_WIDEN_MINUS_LO,
>     IFN_VEC_WIDEN_MINUS_ODD, IFN_VEC_WIDEN_MINUS_EVEN): Define widening
>     plus,minus functions.
>     * internal-fn.h (direct_internal_fn_optab): Declare new prototype.
>     (lookup_hilo_internal_fn): Likewise.
>     (widening_fn_p): Likewise.
>     (Narrowing_fn_p): Likewise.
>     * optabs.cc (commutative_optab_p): Add widening plus optabs.
>     * optabs.def (OPTAB_D): Define widen add, sub optabs.
>     * tree-cfg.cc (verify_gimple_call): Add checks for widening ifns.
>     * tree-inline.cc (estimate_num_insns): Return same
>     cost for widen add and sub IFNs as previous tree_codes.
>     * tree-vect-patterns.cc (vect_recog_widen_op_pattern): Support
>     patterns with a hi/lo or even/odd split.
>     (vect_recog_sad_pattern): Refactor to use new IFN codes.
>     (vect_recog_widen_plus_pattern): Likewise.
>     (vect_recog_widen_minus_pattern): Likewise.
>     (vect_recog_average_pattern): Likewise.
>     * tree-vect-stmts.cc (vectorizable_conversion): Add support for
>     _HILO IFNs.
>     (supportable_widening_operation): Likewise.
>     * tree.def (WIDEN_SUM_EXPR): Update example to use new IFNs.
>
> gcc/testsuite/ChangeLog:
>
>     * gcc.target/aarch64/vect-widen-add.c: Test that new
>     IFN_VEC_WIDEN_PLUS is being used.
>     * gcc.target/aarch64/vect-widen-sub.c: Test that new
>     IFN_VEC_WIDEN_MINUS is being used.
>
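The costing shape suggested in the review (default to 1, with the zero-cost cases listed explicitly) could look roughly like the sketch below. Everything in it is a hypothetical stand-in: the enum is not GCC's internal_fn, and `COST_MODEL_IFN_MARKER` is an invented example of a bookkeeping-only call. The review leaves open which internal functions should actually cost zero.

```c
#include <assert.h>

/* Hypothetical stand-ins for internal_fn values; not GCC's real enum.  */
enum cost_model_ifn
{
  COST_MODEL_IFN_MARKER,              /* Assumed bookkeeping-only call.  */
  COST_MODEL_IFN_VEC_WIDEN_PLUS_HI,
  COST_MODEL_IFN_VEC_WIDEN_PLUS_LO,
  COST_MODEL_IFN_VEC_WIDEN_PLUS_EVEN,
  COST_MODEL_IFN_VEC_WIDEN_PLUS_ODD,
  COST_MODEL_IFN_LAST
};

/* Default every internal function call to one instruction and list only
   the exceptions that are expected to expand to nothing.  */
static int
cost_model_estimate (enum cost_model_ifn fn)
{
  assert (fn < COST_MODEL_IFN_LAST);
  switch (fn)
    {
    case COST_MODEL_IFN_MARKER:
      return 0;

    default:
      return 1;
    }
}
```

With this shape the _EVEN and _ODD variants are costed without being named, which addresses the "looks incomplete" remark about listing only the _HI and _LO cases.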
Hi,

This is the updated patch and cover letter. Patches for the inline and
gimple-op changes will follow soon.

DEF_INTERNAL_WIDENING_OPTAB_FN and DEF_INTERNAL_NARROWING_OPTAB_FN are like
DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN respectively, except
that they also provide convenience wrappers for a single vector-to-vector
conversion, a hi/lo split or an even/odd split. Each definition for <NAME>
requires either a signed and an unsigned optab, named <SOPTAB> and <UOPTAB>
(for widening), or a single <OPTAB> (for narrowing), for each of the five
functions it creates.

For example, for widening addition DEF_INTERNAL_WIDENING_OPTAB_FN creates
five internal functions: IFN_VEC_WIDEN_PLUS, IFN_VEC_WIDEN_PLUS_HI,
IFN_VEC_WIDEN_PLUS_LO, IFN_VEC_WIDEN_PLUS_EVEN and IFN_VEC_WIDEN_PLUS_ODD,
each requiring two optabs, one for signed and one for unsigned.

AArch64 implements the hi/lo split optabs:
IFN_VEC_WIDEN_PLUS_HI -> vec_widen_<su>add_hi_<mode> -> (u/s)addl2
IFN_VEC_WIDEN_PLUS_LO -> vec_widen_<su>add_lo_<mode> -> (u/s)addl

This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree
codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.

gcc/ChangeLog:

2023-04-25  Andre Vieira  <andre.simoesdiasvieira@arm.com>
            Joel Hutton  <joel.hutton@arm.com>
            Tamar Christina  <tamar.christina@arm.com>

    * config/aarch64/aarch64-simd.md (vec_widen_<su>addl_lo_<mode>): Rename
    this ...
    (vec_widen_<su>add_lo_<mode>): ... to this.
    (vec_widen_<su>addl_hi_<mode>): Rename this ...
    (vec_widen_<su>add_hi_<mode>): ... to this.
    (vec_widen_<su>subl_lo_<mode>): Rename this ...
    (vec_widen_<su>sub_lo_<mode>): ... to this.
    (vec_widen_<su>subl_hi_<mode>): Rename this ...
    (vec_widen_<su>sub_hi_<mode>): ... to this.
    * doc/generic.texi: Document new IFN codes.
    * internal-fn.cc (ifn_cmp): Function to compare ifn's for
    sorting/searching.
    (lookup_hilo_internal_fn): Add lookup function.
    (commutative_binary_fn_p): Add widen_plus fn's.
    (widening_fn_p): New function.
    (narrowing_fn_p): New function.
    (direct_internal_fn_optab): Change visibility.
    * internal-fn.def (DEF_INTERNAL_WIDENING_OPTAB_FN): Macro to define an
    internal_fn that expands into multiple internal_fns for widening.
    (DEF_INTERNAL_NARROWING_OPTAB_FN): Likewise but for narrowing.
    (IFN_VEC_WIDEN_PLUS, IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO,
    IFN_VEC_WIDEN_PLUS_EVEN, IFN_VEC_WIDEN_PLUS_ODD,
    IFN_VEC_WIDEN_MINUS, IFN_VEC_WIDEN_MINUS_HI, IFN_VEC_WIDEN_MINUS_LO,
    IFN_VEC_WIDEN_MINUS_ODD, IFN_VEC_WIDEN_MINUS_EVEN): Define widening
    plus,minus functions.
    * internal-fn.h (direct_internal_fn_optab): Declare new prototype.
    (lookup_hilo_internal_fn): Likewise.
    (widening_fn_p): Likewise.
    (narrowing_fn_p): Likewise.
    * optabs.cc (commutative_optab_p): Add widening plus optabs.
    * optabs.def (OPTAB_D): Define widen add, sub optabs.
    * tree-vect-patterns.cc (vect_recog_widen_op_pattern): Support
    patterns with a hi/lo or even/odd split.
    (vect_recog_sad_pattern): Refactor to use new IFN codes.
    (vect_recog_widen_plus_pattern): Likewise.
    (vect_recog_widen_minus_pattern): Likewise.
    (vect_recog_average_pattern): Likewise.
    * tree-vect-stmts.cc (vectorizable_conversion): Add support for
    _HILO IFNs.
    (supportable_widening_operation): Likewise.
    * tree.def (WIDEN_SUM_EXPR): Update example to use new IFNs.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/vect-widen-add.c: Test that new
    IFN_VEC_WIDEN_PLUS is being used.
    * gcc.target/aarch64/vect-widen-sub.c: Test that new
    IFN_VEC_WIDEN_MINUS is being used.

On 22/05/2023 14:06, Richard Biener wrote:
> On Thu, 18 May 2023, Andre Vieira (lists) wrote:
>
>> How about this?
>>
>> Not sure about the DEF_INTERNAL documentation I rewrote in internal-fn.def,
>> was struggling to word these, so improvements welcome!
>
> The even/odd variant optabs are also commutative_optab_p, so is
> the vec_widen_sadd without hi/lo or even/odd.
>
> +/* { dg-options "-O3 -save-temps -fdump-tree-vect-all" } */
>
> do you really want -all? I think you want -details
>
> +      else if (widening_fn_p (ifn)
> +               || narrowing_fn_p (ifn))
> +        {
> +          tree lhs = gimple_get_lhs (stmt);
> +          if (!lhs)
> +            {
> +              error ("vector IFN call with no lhs");
> +              debug_generic_stmt (fn);
>
> that's an error because ...? Maybe we want to verify this
> for all ECF_CONST|ECF_NOTHROW (or pure instead of const) internal
> function calls, but I wouldn't add any verification as part
> of this patch (not special to widening/narrowing fns either).
>
>    if (gimple_call_internal_p (stmt))
> -    return 0;
> +    {
> +      internal_fn fn = gimple_call_internal_fn (stmt);
> +      switch (fn)
> +        {
> +        case IFN_VEC_WIDEN_PLUS_HI:
> +        case IFN_VEC_WIDEN_PLUS_LO:
> +        case IFN_VEC_WIDEN_MINUS_HI:
> +        case IFN_VEC_WIDEN_MINUS_LO:
> +          return 1;
>
> this now looks incomplete. I think that we want instead to
> have a default: returning 1 and then special-cases we want
> to cost as zero. Not sure which - maybe blame tells why
> this was added? I think we can deal with this as followup
> (likewise the ranger additions).
>
> Otherwise looks good to me.
>
> Thanks,
> Richard.
>
>> gcc/ChangeLog:
>>
>> 2023-04-25  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>>             Joel Hutton  <joel.hutton@arm.com>
>>             Tamar Christina  <tamar.christina@arm.com>
>>
>>     * config/aarch64/aarch64-simd.md (vec_widen_<su>addl_lo_<mode>):
>>     Rename this ...
>>     (vec_widen_<su>add_lo_<mode>): ... to this.
>>     (vec_widen_<su>addl_hi_<mode>): Rename this ...
>>     (vec_widen_<su>add_hi_<mode>): ... to this.
>>     (vec_widen_<su>subl_lo_<mode>): Rename this ...
>>     (vec_widen_<su>sub_lo_<mode>): ... to this.
>>     (vec_widen_<su>subl_hi_<mode>): Rename this ...
>>     (vec_widen_<su>sub_hi_<mode>): ...to this.
>>     * doc/generic.texi: Document new IFN codes.
>>     * internal-fn.cc (ifn_cmp): Function to compare ifn's for
>>     sorting/searching.
>>     (lookup_hilo_internal_fn): Add lookup function.
>>     (commutative_binary_fn_p): Add widen_plus fn's.
>>     (widening_fn_p): New function.
>>     (narrowing_fn_p): New function.
>>     (direct_internal_fn_optab): Change visibility.
>>     * internal-fn.def (DEF_INTERNAL_WIDENING_OPTAB_FN): Macro to define an
>>     internal_fn that expands into multiple internal_fns for widening.
>>     (DEF_INTERNAL_NARROWING_OPTAB_FN): Likewise but for narrowing.
>>     (IFN_VEC_WIDEN_PLUS, IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO,
>>     IFN_VEC_WIDEN_PLUS_EVEN, IFN_VEC_WIDEN_PLUS_ODD,
>>     IFN_VEC_WIDEN_MINUS, IFN_VEC_WIDEN_MINUS_HI,
>>     IFN_VEC_WIDEN_MINUS_LO,
>>     IFN_VEC_WIDEN_MINUS_ODD, IFN_VEC_WIDEN_MINUS_EVEN): Define widening
>>     plus,minus functions.
>>     * internal-fn.h (direct_internal_fn_optab): Declare new prototype.
>>     (lookup_hilo_internal_fn): Likewise.
>>     (widening_fn_p): Likewise.
>>     (Narrowing_fn_p): Likewise.
>>     * optabs.cc (commutative_optab_p): Add widening plus optabs.
>>     * optabs.def (OPTAB_D): Define widen add, sub optabs.
>>     * tree-cfg.cc (verify_gimple_call): Add checks for widening ifns.
>>     * tree-inline.cc (estimate_num_insns): Return same
>>     cost for widen add and sub IFNs as previous tree_codes.
>>     * tree-vect-patterns.cc (vect_recog_widen_op_pattern): Support
>>     patterns with a hi/lo or even/odd split.
>>     (vect_recog_sad_pattern): Refactor to use new IFN codes.
>>     (vect_recog_widen_plus_pattern): Likewise.
>>     (vect_recog_widen_minus_pattern): Likewise.
>>     (vect_recog_average_pattern): Likewise.
>>     * tree-vect-stmts.cc (vectorizable_conversion): Add support for
>>     _HILO IFNs.
>>     (supportable_widening_operation): Likewise.
>>     * tree.def (WIDEN_SUM_EXPR): Update example to use new IFNs.
>>
>> gcc/testsuite/ChangeLog:
>>
>>     * gcc.target/aarch64/vect-widen-add.c: Test that new
>>     IFN_VEC_WIDEN_PLUS is being used.
>>     * gcc.target/aarch64/vect-widen-sub.c: Test that new
>>     IFN_VEC_WIDEN_MINUS is being used.
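The "one definition creates five functions" scheme from the cover letter, and the way lookup_hilo_internal_fn redefines the same macro to generate its switch cases, can be illustrated with a small self-contained X-macro sketch. This is only a model under stated assumptions: `MODEL_WIDENING_FNS`, `enum model_ifn` and `model_lookup_hilo` are hypothetical names, and a list macro stands in for re-including internal-fn.def.

```c
#include <assert.h>

/* Hypothetical stand-in for internal-fn.def: one entry per widening
   operation.  The patch re-includes the .def file instead.  */
#define MODEL_WIDENING_FNS(X) \
  X (VEC_WIDEN_PLUS)          \
  X (VEC_WIDEN_MINUS)

/* First expansion: each entry yields the base function and four variants.  */
enum model_ifn
{
#define X(NAME)                                                   \
  MODEL_IFN_##NAME, MODEL_IFN_##NAME##_LO, MODEL_IFN_##NAME##_HI, \
  MODEL_IFN_##NAME##_EVEN, MODEL_IFN_##NAME##_ODD,
  MODEL_WIDENING_FNS (X)
#undef X
  MODEL_IFN_LAST
};

/* Second expansion: map a base function to its LO/HI pair.  Returns 0 for
   anything that is not a base widening function; the patch uses
   gcc_unreachable in that case.  */
static int
model_lookup_hilo (enum model_ifn ifn, enum model_ifn *lo, enum model_ifn *hi)
{
  assert (lo && hi);
  switch (ifn)
    {
#define X(NAME)                    \
    case MODEL_IFN_##NAME:         \
      *lo = MODEL_IFN_##NAME##_LO; \
      *hi = MODEL_IFN_##NAME##_HI; \
      return 1;
      MODEL_WIDENING_FNS (X)
#undef X
    default:
      return 0;
    }
}
```

Adding a third entry to the list would extend both the enum and the lookup without touching either expansion site, which is the property the widening and narrowing macros rely on.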
>> > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index da9c59e655465a74926b81b95b4ac8c353efb1b7..b404d5cabf9df8ea8c70ea4537deb978d351c51e 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -4626,7 +4626,7 @@ [(set_attr "type" "neon_<ADDSUB:optab>_long")] ) -(define_expand "vec_widen_<su>addl_lo_<mode>" +(define_expand "vec_widen_<su>add_lo_<mode>" [(match_operand:<VWIDE> 0 "register_operand") (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand")) (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))] @@ -4638,7 +4638,7 @@ DONE; }) -(define_expand "vec_widen_<su>addl_hi_<mode>" +(define_expand "vec_widen_<su>add_hi_<mode>" [(match_operand:<VWIDE> 0 "register_operand") (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand")) (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))] @@ -4650,7 +4650,7 @@ DONE; }) -(define_expand "vec_widen_<su>subl_lo_<mode>" +(define_expand "vec_widen_<su>sub_lo_<mode>" [(match_operand:<VWIDE> 0 "register_operand") (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand")) (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))] @@ -4662,7 +4662,7 @@ DONE; }) -(define_expand "vec_widen_<su>subl_hi_<mode>" +(define_expand "vec_widen_<su>sub_hi_<mode>" [(match_operand:<VWIDE> 0 "register_operand") (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand")) (ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand"))] diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi index 8b2882da4fe7da07d22b4e5384d049ba7d3907bf..5e36dac2b1a10257616f12cdfb0b12d0f2879ae9 100644 --- a/gcc/doc/generic.texi +++ b/gcc/doc/generic.texi @@ -1811,10 +1811,16 @@ a value from @code{enum annot_expr_kind}, the third is an @code{INTEGER_CST}. 
 @tindex VEC_RSHIFT_EXPR
 @tindex VEC_WIDEN_MULT_HI_EXPR
 @tindex VEC_WIDEN_MULT_LO_EXPR
-@tindex VEC_WIDEN_PLUS_HI_EXPR
-@tindex VEC_WIDEN_PLUS_LO_EXPR
-@tindex VEC_WIDEN_MINUS_HI_EXPR
-@tindex VEC_WIDEN_MINUS_LO_EXPR
+@tindex IFN_VEC_WIDEN_PLUS
+@tindex IFN_VEC_WIDEN_PLUS_HI
+@tindex IFN_VEC_WIDEN_PLUS_LO
+@tindex IFN_VEC_WIDEN_PLUS_EVEN
+@tindex IFN_VEC_WIDEN_PLUS_ODD
+@tindex IFN_VEC_WIDEN_MINUS
+@tindex IFN_VEC_WIDEN_MINUS_HI
+@tindex IFN_VEC_WIDEN_MINUS_LO
+@tindex IFN_VEC_WIDEN_MINUS_EVEN
+@tindex IFN_VEC_WIDEN_MINUS_ODD
 @tindex VEC_UNPACK_HI_EXPR
 @tindex VEC_UNPACK_LO_EXPR
 @tindex VEC_UNPACK_FLOAT_HI_EXPR
@@ -1861,6 +1867,82 @@ vector of @code{N/2} products. In the case of @code{VEC_WIDEN_MULT_LO_EXPR} the
 low @code{N/2} elements of the two vector are multiplied to produce the
 vector of @code{N/2} products.
 
+@item IFN_VEC_WIDEN_PLUS
+This internal function represents widening vector addition of two input
+vectors. Its operands are vectors that contain the same number of elements
+(@code{N}) of the same integral type. The result is a vector that contains
+the same amount (@code{N}) of elements, of an integral type whose size is twice
+as wide, as the input vectors. If the current target does not implement the
+corresponding optabs the vectorizer may choose to split it into either a pair
+of @code{IFN_VEC_WIDEN_PLUS_HI} and @code{IFN_VEC_WIDEN_PLUS_LO} or
+@code{IFN_VEC_WIDEN_PLUS_EVEN} and @code{IFN_VEC_WIDEN_PLUS_ODD}, depending
+on what optabs the target implements.
+
+@item IFN_VEC_WIDEN_PLUS_HI
+@itemx IFN_VEC_WIDEN_PLUS_LO
+These internal functions represent widening vector addition of the high and low
+parts of the two input vectors, respectively. Their operands are vectors that
+contain the same number of elements (@code{N}) of the same integral type. The
+result is a vector that contains half as many elements, of an integral type
+whose size is twice as wide. In the case of @code{IFN_VEC_WIDEN_PLUS_HI} the
+high @code{N/2} elements of the two vectors are added to produce the vector of
+@code{N/2} additions. In the case of @code{IFN_VEC_WIDEN_PLUS_LO} the low
+@code{N/2} elements of the two vectors are added to produce the vector of
+@code{N/2} additions.
+
+@item IFN_VEC_WIDEN_PLUS_EVEN
+@itemx IFN_VEC_WIDEN_PLUS_ODD
+These internal functions represent widening vector addition of the even and odd
+elements of the two input vectors, respectively. Their operands are vectors
+that contain the same number of elements (@code{N}) of the same integral type.
+The result is a vector that contains half as many elements, of an integral type
+whose size is twice as wide. In the case of @code{IFN_VEC_WIDEN_PLUS_EVEN} the
+even @code{N/2} elements of the two vectors are added to produce the vector of
+@code{N/2} additions. In the case of @code{IFN_VEC_WIDEN_PLUS_ODD} the odd
+@code{N/2} elements of the two vectors are added to produce the vector of
+@code{N/2} additions.
+
+@item IFN_VEC_WIDEN_MINUS
+This internal function represents widening vector subtraction of two input
+vectors. Its operands are vectors that contain the same number of elements
+(@code{N}) of the same integral type. The result is a vector that contains
+the same amount (@code{N}) of elements, of an integral type whose size is twice
+as wide, as the input vectors. If the current target does not implement the
+corresponding optabs the vectorizer may choose to split it into either a pair
+of @code{IFN_VEC_WIDEN_MINUS_HI} and @code{IFN_VEC_WIDEN_MINUS_LO} or
+@code{IFN_VEC_WIDEN_MINUS_EVEN} and @code{IFN_VEC_WIDEN_MINUS_ODD}, depending
+on what optabs the target implements.
+
+@item IFN_VEC_WIDEN_MINUS_HI
+@itemx IFN_VEC_WIDEN_MINUS_LO
+These internal functions represent widening vector subtraction of the high and
+low parts of the two input vectors, respectively. Their operands are vectors
+that contain the same number of elements (@code{N}) of the same integral type.
+The high/low elements of the second vector are subtracted from the high/low
+elements of the first. The result is a vector that contains half as many
+elements, of an integral type whose size is twice as wide. In the case of
+@code{IFN_VEC_WIDEN_MINUS_HI} the high @code{N/2} elements of the second
+vector are subtracted from the high @code{N/2} of the first to produce the
+vector of @code{N/2} subtractions. In the case of
+@code{IFN_VEC_WIDEN_MINUS_LO} the low @code{N/2} elements of the second
+vector are subtracted from the low @code{N/2} of the first to produce the
+vector of @code{N/2} subtractions.
+
+@item IFN_VEC_WIDEN_MINUS_EVEN
+@itemx IFN_VEC_WIDEN_MINUS_ODD
+These internal functions represent widening vector subtraction of the even and
+odd parts of the two input vectors, respectively. Their operands are vectors
+that contain the same number of elements (@code{N}) of the same integral type.
+The even/odd elements of the second vector are subtracted from the even/odd
+elements of the first. The result is a vector that contains half as many
+elements, of an integral type whose size is twice as wide. In the case of
+@code{IFN_VEC_WIDEN_MINUS_EVEN} the even @code{N/2} elements of the second
+vector are subtracted from the even @code{N/2} of the first to produce the
+vector of @code{N/2} subtractions. In the case of
+@code{IFN_VEC_WIDEN_MINUS_ODD} the odd @code{N/2} elements of the second
+vector are subtracted from the odd @code{N/2} of the first to produce the
+vector of @code{N/2} subtractions.
+ @item VEC_WIDEN_PLUS_HI_EXPR @itemx VEC_WIDEN_PLUS_LO_EXPR These nodes represent widening vector addition of the high and low parts of diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index 5c9da73ea11f8060b18dcf513599c9694fa4f2ad..348bee35a35ae4ed9a8652f5349f430c2733e1cb 100644 --- a/gcc/internal-fn.cc +++ b/gcc/internal-fn.cc @@ -90,6 +90,71 @@ lookup_internal_fn (const char *name) return entry ? *entry : IFN_LAST; } +/* Given an internal_fn IFN that is either a widening or narrowing function, return its + corresponding LO and HI internal_fns. */ + +extern void +lookup_hilo_internal_fn (internal_fn ifn, internal_fn *lo, internal_fn *hi) +{ + gcc_assert (widening_fn_p (ifn) || narrowing_fn_p (ifn)); + + switch (ifn) + { + default: + gcc_unreachable (); +#undef DEF_INTERNAL_FN +#undef DEF_INTERNAL_WIDENING_OPTAB_FN +#undef DEF_INTERNAL_NARROWING_OPTAB_FN +#define DEF_INTERNAL_FN(NAME, FLAGS, TYPE) +#define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, F, S, SO, UO, T) \ + case IFN_##NAME: \ + *lo = internal_fn (IFN_##NAME##_LO); \ + *hi = internal_fn (IFN_##NAME##_HI); \ + break; +#define DEF_INTERNAL_NARROWING_OPTAB_FN(NAME, F, O, T) \ + case IFN_##NAME: \ + *lo = internal_fn (IFN_##NAME##_LO); \ + *hi = internal_fn (IFN_##NAME##_HI); \ + break; +#include "internal-fn.def" +#undef DEF_INTERNAL_FN +#undef DEF_INTERNAL_WIDENING_OPTAB_FN +#undef DEF_INTERNAL_NARROWING_OPTAB_FN + } +} + +extern void +lookup_evenodd_internal_fn (internal_fn ifn, internal_fn *even, + internal_fn *odd) +{ + gcc_assert (widening_fn_p (ifn) || narrowing_fn_p (ifn)); + + switch (ifn) + { + default: + gcc_unreachable (); +#undef DEF_INTERNAL_FN +#undef DEF_INTERNAL_WIDENING_OPTAB_FN +#undef DEF_INTERNAL_NARROWING_OPTAB_FN +#define DEF_INTERNAL_FN(NAME, FLAGS, TYPE) +#define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, F, S, SO, UO, T) \ + case IFN_##NAME: \ + *even = internal_fn (IFN_##NAME##_EVEN); \ + *odd = internal_fn (IFN_##NAME##_ODD); \ + break; +#define DEF_INTERNAL_NARROWING_OPTAB_FN(NAME, F, 
O, T) \ + case IFN_##NAME: \ + *even = internal_fn (IFN_##NAME##_EVEN); \ + *odd = internal_fn (IFN_##NAME##_ODD); \ + break; +#include "internal-fn.def" +#undef DEF_INTERNAL_FN +#undef DEF_INTERNAL_WIDENING_OPTAB_FN +#undef DEF_INTERNAL_NARROWING_OPTAB_FN + } +} + + /* Fnspec of each internal function, indexed by function number. */ const_tree internal_fn_fnspec_array[IFN_LAST + 1]; @@ -3852,7 +3917,7 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types, /* Return the optab used by internal function FN. */ -static optab +optab direct_internal_fn_optab (internal_fn fn, tree_pair types) { switch (fn) @@ -3971,6 +4036,9 @@ commutative_binary_fn_p (internal_fn fn) case IFN_UBSAN_CHECK_MUL: case IFN_ADD_OVERFLOW: case IFN_MUL_OVERFLOW: + case IFN_VEC_WIDEN_PLUS: + case IFN_VEC_WIDEN_PLUS_LO: + case IFN_VEC_WIDEN_PLUS_HI: return true; default: @@ -4044,6 +4112,68 @@ first_commutative_argument (internal_fn fn) } } +/* Return true if this CODE describes an internal_fn that returns a vector with + elements twice as wide as the element size of the input vectors. */ + +bool +widening_fn_p (code_helper code) +{ + if (!code.is_fn_code ()) + return false; + + if (!internal_fn_p ((combined_fn) code)) + return false; + + internal_fn fn = as_internal_fn ((combined_fn) code); + switch (fn) + { + #undef DEF_INTERNAL_WIDENING_OPTAB_FN + #define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, F, S, SO, UO, T) \ + case IFN_##NAME: \ + case IFN_##NAME##_HI: \ + case IFN_##NAME##_LO: \ + case IFN_##NAME##_EVEN: \ + case IFN_##NAME##_ODD: \ + return true; + #include "internal-fn.def" + #undef DEF_INTERNAL_WIDENING_OPTAB_FN + + default: + return false; + } +} + +/* Return true if this CODE describes an internal_fn that returns a vector with + elements twice as narrow as the element size of the input vectors. 
*/ + +bool +narrowing_fn_p (code_helper code) +{ + if (!code.is_fn_code ()) + return false; + + if (!internal_fn_p ((combined_fn) code)) + return false; + + internal_fn fn = as_internal_fn ((combined_fn) code); + switch (fn) + { + #undef DEF_INTERNAL_NARROWING_OPTAB_FN + #define DEF_INTERNAL_NARROWING_OPTAB_FN(NAME, F, O, T) \ + case IFN_##NAME: \ + case IFN_##NAME##_HI: \ + case IFN_##NAME##_LO: \ + case IFN_##NAME##_EVEN: \ + case IFN_##NAME##_ODD: \ + return true; + #include "internal-fn.def" + #undef DEF_INTERNAL_NARROWING_OPTAB_FN + + default: + return false; + } +} + /* Return true if IFN_SET_EDOM is supported. */ bool @@ -4072,6 +4202,8 @@ set_edom_supported_p (void) expand_##TYPE##_optab_fn (fn, stmt, which_optab); \ } #include "internal-fn.def" +#undef DEF_INTERNAL_OPTAB_FN +#undef DEF_INTERNAL_SIGNED_OPTAB_FN /* Routines to expand each internal function, indexed by function number. Each routine has the prototype: @@ -4080,6 +4212,7 @@ set_edom_supported_p (void) where STMT is the statement that performs the call. */ static void (*const internal_fn_expanders[]) (internal_fn, gcall *) = { + #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) expand_##CODE, #include "internal-fn.def" 0 diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index 7fe742c2ae713e7152ab05cfdfba86e4e0aa3456..e9edaa201ad4ad171a49119efa9d6bff49add9f4 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -85,6 +85,34 @@ along with GCC; see the file COPYING3. If not see says that the function extends the C-level BUILT_IN_<NAME>{,L,LL,IMAX} group of functions to any integral mode (including vector modes).
+ DEF_INTERNAL_WIDENING_OPTAB_FN is a wrapper that defines five internal + functions with DEF_INTERNAL_SIGNED_OPTAB_FN: + - one that describes a widening operation with the same number of elements + in the output and input vectors, + - two that describe a pair of high-low widening operations where the output + vectors each have half the number of elements of the input vectors, + corresponding to the result of the widening operation on the top half and + bottom half, these have the suffixes _HI and _LO, + - and two that describe a pair of even-odd widening operations where the + output vectors each have half the number of elements of the input vectors, + corresponding to the result of the widening operation on the even and odd + elements, these have the suffixes _EVEN and _ODD. + These five internal functions will require two optabs each, a SIGNED_OPTAB + and an UNSIGNED_OPTAB. + + DEF_INTERNAL_NARROWING_OPTAB_FN is a wrapper that defines five internal + functions with DEF_INTERNAL_OPTAB_FN: + - one that describes a narrowing operation with the same number of elements + in the output and input vectors, + - two that describe a pair of high-low narrowing operations where the output + vector has the same number of elements in the top or bottom halves as the + full input vectors, these have the suffixes _HI and _LO, + - and two that describe a pair of even-odd narrowing operations where the + output vector has the same number of elements, in the even or odd positions, + as the full input vectors, these have the suffixes _EVEN and _ODD. + These five internal functions will require an optab each. + + Each entry must have a corresponding expander of the form: void expand_NAME (gimple_call stmt) @@ -123,6 +151,24 @@ along with GCC; see the file COPYING3.
If not see DEF_INTERNAL_OPTAB_FN (NAME, FLAGS, OPTAB, TYPE) #endif +#ifndef DEF_INTERNAL_WIDENING_OPTAB_FN +#define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE) \ + DEF_INTERNAL_SIGNED_OPTAB_FN (NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB, TYPE) \ + DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _LO, FLAGS, SELECTOR, SOPTAB##_lo, UOPTAB##_lo, TYPE) \ + DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _HI, FLAGS, SELECTOR, SOPTAB##_hi, UOPTAB##_hi, TYPE) \ + DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _EVEN, FLAGS, SELECTOR, SOPTAB##_even, UOPTAB##_even, TYPE) \ + DEF_INTERNAL_SIGNED_OPTAB_FN (NAME ## _ODD, FLAGS, SELECTOR, SOPTAB##_odd, UOPTAB##_odd, TYPE) +#endif + +#ifndef DEF_INTERNAL_NARROWING_OPTAB_FN +#define DEF_INTERNAL_NARROWING_OPTAB_FN(NAME, FLAGS, OPTAB, TYPE) \ + DEF_INTERNAL_OPTAB_FN (NAME, FLAGS, OPTAB, TYPE) \ + DEF_INTERNAL_OPTAB_FN (NAME ## _LO, FLAGS, OPTAB##_lo, TYPE) \ + DEF_INTERNAL_OPTAB_FN (NAME ## _HI, FLAGS, OPTAB##_hi, TYPE) \ + DEF_INTERNAL_OPTAB_FN (NAME ## _EVEN, FLAGS, OPTAB##_even, TYPE) \ + DEF_INTERNAL_OPTAB_FN (NAME ## _ODD, FLAGS, OPTAB##_odd, TYPE) +#endif + DEF_INTERNAL_OPTAB_FN (MASK_LOAD, ECF_PURE, maskload, mask_load) DEF_INTERNAL_OPTAB_FN (LOAD_LANES, ECF_CONST, vec_load_lanes, load_lanes) DEF_INTERNAL_OPTAB_FN (MASK_LOAD_LANES, ECF_PURE, @@ -315,6 +361,16 @@ DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary) DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary) DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary) DEF_INTERNAL_OPTAB_FN (VEC_ADDSUB, ECF_CONST, vec_addsub, binary) +DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_PLUS, + ECF_CONST | ECF_NOTHROW, + first, + vec_widen_sadd, vec_widen_uadd, + binary) +DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_MINUS, + ECF_CONST | ECF_NOTHROW, + first, + vec_widen_ssub, vec_widen_usub, + binary) DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary) DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary) diff 
--git a/gcc/internal-fn.h b/gcc/internal-fn.h index 08922ed4254898f5fffca3f33973e96ed9ce772f..3904ba3ca36949d844532a6a9303f550533311a4 100644 --- a/gcc/internal-fn.h +++ b/gcc/internal-fn.h @@ -20,6 +20,10 @@ along with GCC; see the file COPYING3. If not see #ifndef GCC_INTERNAL_FN_H #define GCC_INTERNAL_FN_H +#include "insn-codes.h" +#include "insn-opinit.h" + + /* INTEGER_CST values for IFN_UNIQUE function arg-0. UNSPEC: Undifferentiated UNIQUE. @@ -112,6 +116,10 @@ internal_fn_name (enum internal_fn fn) } extern internal_fn lookup_internal_fn (const char *); +extern void lookup_hilo_internal_fn (internal_fn, internal_fn *, internal_fn *); +extern void lookup_evenodd_internal_fn (internal_fn, internal_fn *, + internal_fn *); +extern optab direct_internal_fn_optab (internal_fn, tree_pair); /* Return the ECF_* flags for function FN. */ @@ -210,6 +218,8 @@ extern bool commutative_binary_fn_p (internal_fn); extern bool commutative_ternary_fn_p (internal_fn); extern int first_commutative_argument (internal_fn); extern bool associative_binary_fn_p (internal_fn); +extern bool widening_fn_p (code_helper); +extern bool narrowing_fn_p (code_helper); extern bool set_edom_supported_p (void); diff --git a/gcc/optabs.cc b/gcc/optabs.cc index a12333c7169fc6219b0e34b6169780f78e033ee3..aab6ab6faf244a8236dac81be2d68fc28819bc9a 100644 --- a/gcc/optabs.cc +++ b/gcc/optabs.cc @@ -1314,7 +1314,17 @@ commutative_optab_p (optab binoptab) || binoptab == smul_widen_optab || binoptab == umul_widen_optab || binoptab == smul_highpart_optab - || binoptab == umul_highpart_optab); + || binoptab == umul_highpart_optab + || binoptab == vec_widen_sadd_optab + || binoptab == vec_widen_uadd_optab + || binoptab == vec_widen_sadd_hi_optab + || binoptab == vec_widen_sadd_lo_optab + || binoptab == vec_widen_uadd_hi_optab + || binoptab == vec_widen_uadd_lo_optab + || binoptab == vec_widen_sadd_even_optab + || binoptab == vec_widen_sadd_odd_optab + || binoptab == vec_widen_uadd_even_optab + || binoptab == 
vec_widen_uadd_odd_optab); } /* X is to be used in mode MODE as operand OPN to BINOPTAB. If we're diff --git a/gcc/optabs.def b/gcc/optabs.def index 695f5911b300c9ca5737de9be809fa01aabe5e01..d41ed6e1afaddd019c7470f965c0ad21c8b2b9d7 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -410,6 +410,16 @@ OPTAB_D (vec_widen_ssubl_hi_optab, "vec_widen_ssubl_hi_$a") OPTAB_D (vec_widen_ssubl_lo_optab, "vec_widen_ssubl_lo_$a") OPTAB_D (vec_widen_saddl_hi_optab, "vec_widen_saddl_hi_$a") OPTAB_D (vec_widen_saddl_lo_optab, "vec_widen_saddl_lo_$a") +OPTAB_D (vec_widen_ssub_optab, "vec_widen_ssub_$a") +OPTAB_D (vec_widen_ssub_hi_optab, "vec_widen_ssub_hi_$a") +OPTAB_D (vec_widen_ssub_lo_optab, "vec_widen_ssub_lo_$a") +OPTAB_D (vec_widen_ssub_odd_optab, "vec_widen_ssub_odd_$a") +OPTAB_D (vec_widen_ssub_even_optab, "vec_widen_ssub_even_$a") +OPTAB_D (vec_widen_sadd_optab, "vec_widen_sadd_$a") +OPTAB_D (vec_widen_sadd_hi_optab, "vec_widen_sadd_hi_$a") +OPTAB_D (vec_widen_sadd_lo_optab, "vec_widen_sadd_lo_$a") +OPTAB_D (vec_widen_sadd_odd_optab, "vec_widen_sadd_odd_$a") +OPTAB_D (vec_widen_sadd_even_optab, "vec_widen_sadd_even_$a") OPTAB_D (vec_widen_sshiftl_hi_optab, "vec_widen_sshiftl_hi_$a") OPTAB_D (vec_widen_sshiftl_lo_optab, "vec_widen_sshiftl_lo_$a") OPTAB_D (vec_widen_umult_even_optab, "vec_widen_umult_even_$a") @@ -422,6 +432,16 @@ OPTAB_D (vec_widen_usubl_hi_optab, "vec_widen_usubl_hi_$a") OPTAB_D (vec_widen_usubl_lo_optab, "vec_widen_usubl_lo_$a") OPTAB_D (vec_widen_uaddl_hi_optab, "vec_widen_uaddl_hi_$a") OPTAB_D (vec_widen_uaddl_lo_optab, "vec_widen_uaddl_lo_$a") +OPTAB_D (vec_widen_usub_optab, "vec_widen_usub_$a") +OPTAB_D (vec_widen_usub_hi_optab, "vec_widen_usub_hi_$a") +OPTAB_D (vec_widen_usub_lo_optab, "vec_widen_usub_lo_$a") +OPTAB_D (vec_widen_usub_odd_optab, "vec_widen_usub_odd_$a") +OPTAB_D (vec_widen_usub_even_optab, "vec_widen_usub_even_$a") +OPTAB_D (vec_widen_uadd_optab, "vec_widen_uadd_$a") +OPTAB_D (vec_widen_uadd_hi_optab, "vec_widen_uadd_hi_$a") 
+OPTAB_D (vec_widen_uadd_lo_optab, "vec_widen_uadd_lo_$a") +OPTAB_D (vec_widen_uadd_odd_optab, "vec_widen_uadd_odd_$a") +OPTAB_D (vec_widen_uadd_even_optab, "vec_widen_uadd_even_$a") OPTAB_D (vec_addsub_optab, "vec_addsub$a3") OPTAB_D (vec_fmaddsub_optab, "vec_fmaddsub$a4") OPTAB_D (vec_fmsubadd_optab, "vec_fmsubadd$a4") diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c index 220bd9352a4c7acd2e3713e441d74898d3e92b30..b5a73867e44ec3fa04d1201decf81353a67b4c82 100644 --- a/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c +++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c @@ -1,5 +1,5 @@ /* { dg-do run } */ -/* { dg-options "-O3 -save-temps" } */ +/* { dg-options "-O3 -save-temps -fdump-tree-vect-details" } */ #include <stdint.h> #include <string.h> @@ -86,6 +86,8 @@ main() return 0; } +/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_PLUS_LO" "vect" } } */ +/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_PLUS_HI" "vect" } } */ /* { dg-final { scan-assembler-times {\tuaddl\t} 1} } */ /* { dg-final { scan-assembler-times {\tuaddl2\t} 1} } */ /* { dg-final { scan-assembler-times {\tsaddl\t} 1} } */ diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c index a2bed63affbd091977df95a126da1f5b8c1d41d2..1686c3f2f344c367ebb9cf34e558d0878849f9bc 100644 --- a/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c +++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c @@ -1,5 +1,5 @@ /* { dg-do run } */ -/* { dg-options "-O3 -save-temps" } */ +/* { dg-options "-O3 -save-temps -fdump-tree-vect-details" } */ #include <stdint.h> #include <string.h> @@ -86,6 +86,8 @@ main() return 0; } +/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_MINUS_LO" "vect" } } */ +/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_MINUS_HI" "vect" } } */ /* { dg-final { scan-assembler-times {\tusubl\t} 1} } */ /* { dg-final { scan-assembler-times 
{\tusubl2\t} 1} } */ /* { dg-final { scan-assembler-times {\tssubl\t} 1} } */ diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index 1778af0242898e3dc73d94d22a5b8505628a53b5..dcd4b5561600346a2c10bd5133507329206e8837 100644 --- a/gcc/tree-vect-patterns.cc +++ b/gcc/tree-vect-patterns.cc @@ -562,21 +562,30 @@ vect_joust_widened_type (tree type, tree new_type, tree *common_type) static unsigned int vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code, - tree_code widened_code, bool shift_p, + code_helper widened_code, bool shift_p, unsigned int max_nops, vect_unpromoted_value *unprom, tree *common_type, enum optab_subtype *subtype = NULL) { /* Check for an integer operation with the right code. */ - gassign *assign = dyn_cast <gassign *> (stmt_info->stmt); - if (!assign) + gimple* stmt = stmt_info->stmt; + if (!(is_gimple_assign (stmt) || is_gimple_call (stmt))) + return 0; + + code_helper rhs_code; + if (is_gimple_assign (stmt)) + rhs_code = gimple_assign_rhs_code (stmt); + else if (is_gimple_call (stmt)) + rhs_code = gimple_call_combined_fn (stmt); + else return 0; - tree_code rhs_code = gimple_assign_rhs_code (assign); - if (rhs_code != code && rhs_code != widened_code) + if (rhs_code != code + && rhs_code != widened_code) return 0; - tree type = TREE_TYPE (gimple_assign_lhs (assign)); + tree lhs = gimple_get_lhs (stmt); + tree type = TREE_TYPE (lhs); if (!INTEGRAL_TYPE_P (type)) return 0; @@ -589,7 +598,7 @@ vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info, tree_code code, { vect_unpromoted_value *this_unprom = &unprom[next_op]; unsigned int nops = 1; - tree op = gimple_op (assign, i + 1); + tree op = gimple_arg (stmt, i); if (i == 1 && TREE_CODE (op) == INTEGER_CST) { /* We already have a common type from earlier operands. @@ -1343,7 +1352,8 @@ vect_recog_sad_pattern (vec_info *vinfo, /* FORNOW. 
Can continue analyzing the def-use chain when this stmt in a phi inside the loop (in case we are analyzing an outer-loop). */ vect_unpromoted_value unprom[2]; - if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, WIDEN_MINUS_EXPR, + if (!vect_widened_op_tree (vinfo, diff_stmt_vinfo, MINUS_EXPR, + IFN_VEC_WIDEN_MINUS, false, 2, unprom, &half_type)) return NULL; @@ -1395,14 +1405,16 @@ static gimple * vect_recog_widen_op_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info, tree *type_out, tree_code orig_code, code_helper wide_code, - bool shift_p, const char *name) + bool shift_p, const char *name, + optab_subtype *subtype = NULL) { gimple *last_stmt = last_stmt_info->stmt; vect_unpromoted_value unprom[2]; tree half_type; if (!vect_widened_op_tree (vinfo, last_stmt_info, orig_code, orig_code, - shift_p, 2, unprom, &half_type)) + shift_p, 2, unprom, &half_type, subtype)) + return NULL; /* Pattern detected. */ @@ -1468,6 +1480,20 @@ vect_recog_widen_op_pattern (vec_info *vinfo, type, pattern_stmt, vecctype); } +static gimple * +vect_recog_widen_op_pattern (vec_info *vinfo, + stmt_vec_info last_stmt_info, tree *type_out, + tree_code orig_code, internal_fn wide_ifn, + bool shift_p, const char *name, + optab_subtype *subtype = NULL) +{ + combined_fn ifn = as_combined_fn (wide_ifn); + return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out, + orig_code, ifn, shift_p, name, + subtype); +} + + /* Try to detect multiplication on widened inputs, converting MULT_EXPR to WIDEN_MULT_EXPR. See vect_recog_widen_op_pattern for details. */ @@ -1481,26 +1507,30 @@ vect_recog_widen_mult_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info, } /* Try to detect addition on widened inputs, converting PLUS_EXPR - to WIDEN_PLUS_EXPR. See vect_recog_widen_op_pattern for details. */ + to IFN_VEC_WIDEN_PLUS. See vect_recog_widen_op_pattern for details. 
*/ static gimple * vect_recog_widen_plus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info, tree *type_out) { + optab_subtype subtype; return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out, - PLUS_EXPR, WIDEN_PLUS_EXPR, false, - "vect_recog_widen_plus_pattern"); + PLUS_EXPR, IFN_VEC_WIDEN_PLUS, + false, "vect_recog_widen_plus_pattern", + &subtype); } /* Try to detect subtraction on widened inputs, converting MINUS_EXPR - to WIDEN_MINUS_EXPR. See vect_recog_widen_op_pattern for details. */ + to IFN_VEC_WIDEN_MINUS. See vect_recog_widen_op_pattern for details. */ static gimple * vect_recog_widen_minus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info, tree *type_out) { + optab_subtype subtype; return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out, - MINUS_EXPR, WIDEN_MINUS_EXPR, false, - "vect_recog_widen_minus_pattern"); + MINUS_EXPR, IFN_VEC_WIDEN_MINUS, + false, "vect_recog_widen_minus_pattern", + &subtype); } /* Function vect_recog_ctz_ffs_pattern @@ -3078,7 +3108,7 @@ vect_recog_average_pattern (vec_info *vinfo, vect_unpromoted_value unprom[3]; tree new_type; unsigned int nops = vect_widened_op_tree (vinfo, plus_stmt_info, PLUS_EXPR, - WIDEN_PLUS_EXPR, false, 3, + IFN_VEC_WIDEN_PLUS, false, 3, unprom, &new_type); if (nops == 0) return NULL; @@ -6469,6 +6499,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = { { vect_recog_mask_conversion_pattern, "mask_conversion" }, { vect_recog_widen_plus_pattern, "widen_plus" }, { vect_recog_widen_minus_pattern, "widen_minus" }, + /* These must come after the double widening ones. 
*/ }; const unsigned int NUM_PATTERNS = ARRAY_SIZE (vect_vect_recog_func_ptrs); diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index d73e7f0936435951fe05fa6b787ba053233635aa..4f1569023a4e42ad6d058bccf62687dc3fe1302e 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -5038,7 +5038,8 @@ vectorizable_conversion (vec_info *vinfo, bool widen_arith = (code == WIDEN_PLUS_EXPR || code == WIDEN_MINUS_EXPR || code == WIDEN_MULT_EXPR - || code == WIDEN_LSHIFT_EXPR); + || code == WIDEN_LSHIFT_EXPR + || widening_fn_p (code)); if (!widen_arith && !CONVERT_EXPR_CODE_P (code) @@ -5088,8 +5089,8 @@ vectorizable_conversion (vec_info *vinfo, gcc_assert (code == WIDEN_MULT_EXPR || code == WIDEN_LSHIFT_EXPR || code == WIDEN_PLUS_EXPR - || code == WIDEN_MINUS_EXPR); - + || code == WIDEN_MINUS_EXPR + || widening_fn_p (code)); op1 = is_gimple_assign (stmt) ? gimple_assign_rhs2 (stmt) : gimple_call_arg (stmt, 0); @@ -12500,26 +12501,69 @@ supportable_widening_operation (vec_info *vinfo, optab1 = vec_unpacks_sbool_lo_optab; optab2 = vec_unpacks_sbool_hi_optab; } - else - { - optab1 = optab_for_tree_code (c1, vectype, optab_default); - optab2 = optab_for_tree_code (c2, vectype, optab_default); + + vec_mode = TYPE_MODE (vectype); + if (widening_fn_p (code)) + { + /* If this is an internal fn then we must check whether the target + supports either a low-high split or an even-odd split. */ + internal_fn ifn = as_internal_fn ((combined_fn) code); + + internal_fn lo, hi, even, odd; + lookup_hilo_internal_fn (ifn, &lo, &hi); + *code1 = as_combined_fn (lo); + *code2 = as_combined_fn (hi); + optab1 = direct_internal_fn_optab (lo, {vectype, vectype}); + optab2 = direct_internal_fn_optab (hi, {vectype, vectype}); + + /* If we don't support low-high, then check for even-odd. 
*/ + if (!optab1 + || (icode1 = optab_handler (optab1, vec_mode)) == CODE_FOR_nothing + || !optab2 + || (icode2 = optab_handler (optab2, vec_mode)) == CODE_FOR_nothing) + { + lookup_evenodd_internal_fn (ifn, &even, &odd); + *code1 = as_combined_fn (even); + *code2 = as_combined_fn (odd); + optab1 = direct_internal_fn_optab (even, {vectype, vectype}); + optab2 = direct_internal_fn_optab (odd, {vectype, vectype}); + } + } + else if (code.is_tree_code ()) + { + if (code == FIX_TRUNC_EXPR) + { + /* The signedness is determined from output operand. */ + optab1 = optab_for_tree_code (c1, vectype_out, optab_default); + optab2 = optab_for_tree_code (c2, vectype_out, optab_default); + } + else if (CONVERT_EXPR_CODE_P ((tree_code) code.safe_as_tree_code ()) + && VECTOR_BOOLEAN_TYPE_P (wide_vectype) + && VECTOR_BOOLEAN_TYPE_P (vectype) + && TYPE_MODE (wide_vectype) == TYPE_MODE (vectype) + && SCALAR_INT_MODE_P (TYPE_MODE (vectype))) + { + /* If the input and result modes are the same, a different optab + is needed where we pass in the number of units in vectype. 
*/ + optab1 = vec_unpacks_sbool_lo_optab; + optab2 = vec_unpacks_sbool_hi_optab; + } + else + { + optab1 = optab_for_tree_code (c1, vectype, optab_default); + optab2 = optab_for_tree_code (c2, vectype, optab_default); + } + *code1 = c1; + *code2 = c2; } if (!optab1 || !optab2) return false; - vec_mode = TYPE_MODE (vectype); if ((icode1 = optab_handler (optab1, vec_mode)) == CODE_FOR_nothing || (icode2 = optab_handler (optab2, vec_mode)) == CODE_FOR_nothing) return false; - if (code.is_tree_code ()) - { - *code1 = c1; - *code2 = c2; - } - if (insn_data[icode1].operand[0].mode == TYPE_MODE (wide_vectype) && insn_data[icode2].operand[0].mode == TYPE_MODE (wide_vectype)) diff --git a/gcc/tree.def b/gcc/tree.def index 90ceeec0b512bfa5f983359c0af03cc71de32007..b37b0b35927b92a6536e5c2d9805ffce8319a240 100644 --- a/gcc/tree.def +++ b/gcc/tree.def @@ -1374,15 +1374,16 @@ DEFTREECODE (DOT_PROD_EXPR, "dot_prod_expr", tcc_expression, 3) DEFTREECODE (WIDEN_SUM_EXPR, "widen_sum_expr", tcc_binary, 2) /* Widening sad (sum of absolute differences). - The first two arguments are of type t1 which should be integer. - The third argument and the result are of type t2, such that t2 is at least - twice the size of t1. Like DOT_PROD_EXPR, SAD_EXPR (arg1,arg2,arg3) is + The first two arguments are of type t1 which should be a vector of integers. + The third argument and the result are of type t2, such that the size of + the elements of t2 is at least twice the size of the elements of t1. + Like DOT_PROD_EXPR, SAD_EXPR (arg1,arg2,arg3) is equivalent to: - tmp = WIDEN_MINUS_EXPR (arg1, arg2) + tmp = IFN_VEC_WIDEN_MINUS_EXPR (arg1, arg2) tmp2 = ABS_EXPR (tmp) arg3 = PLUS_EXPR (tmp2, arg3) or: - tmp = WIDEN_MINUS_EXPR (arg1, arg2) + tmp = IFN_VEC_WIDEN_MINUS_EXPR (arg1, arg2) tmp2 = ABS_EXPR (tmp) arg3 = WIDEN_SUM_EXPR (tmp2, arg3) */
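The difference between the HI/LO and EVEN/ODD splits documented in the patch above can be illustrated with a small scalar model. This is only a sketch and not GCC code: the names `widen_plus`, `narrow_vec` and `wide_half` are invented for the illustration, the element types are fixed to 8-bit inputs and 16-bit outputs, and "low" is taken to mean the first N/2 elements as stated in the cover letter (endianness is ignored).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model: N narrow inputs, N/2 wide outputs per split half.
constexpr std::size_t N = 8;
using narrow_vec = std::array<std::uint8_t, N>;
using wide_half = std::array<std::uint16_t, N / 2>;

// Add N/2 element pairs, starting at index FIRST and stepping by STRIDE.
// The sum is computed in int and stored in the wider element type, so
// 255 + 255 gives 510 instead of wrapping around at 8 bits.
wide_half widen_plus(const narrow_vec &a, const narrow_vec &b,
                     std::size_t first, std::size_t stride) {
  wide_half r{};
  for (std::size_t i = 0; i < N / 2; ++i) {
    const std::size_t j = first + i * stride;
    r[i] = static_cast<std::uint16_t>(a[j] + b[j]);
  }
  return r;
}

// Models of the four split variants (hypothetical names).
wide_half widen_plus_lo(const narrow_vec &a, const narrow_vec &b) {
  return widen_plus(a, b, 0, 1);      // elements 0 .. N/2-1
}
wide_half widen_plus_hi(const narrow_vec &a, const narrow_vec &b) {
  return widen_plus(a, b, N / 2, 1);  // elements N/2 .. N-1
}
wide_half widen_plus_even(const narrow_vec &a, const narrow_vec &b) {
  return widen_plus(a, b, 0, 2);      // elements 0, 2, 4, ...
}
wide_half widen_plus_odd(const narrow_vec &a, const narrow_vec &b) {
  return widen_plus(a, b, 1, 2);      // elements 1, 3, 5, ...
}
```

Either pair covers all N input elements exactly once; only the order in which the wide results are laid out differs, which is why the vectorizer can fall back from one split to the other depending on the optabs a target provides.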
Just some very minor things.

"Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 5c9da73ea11f8060b18dcf513599c9694fa4f2ad..348bee35a35ae4ed9a8652f5349f430c2733e1cb 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -90,6 +90,71 @@ lookup_internal_fn (const char *name)
>    return entry ? *entry : IFN_LAST;
>  }
>
> +/* Given an internal_fn IFN that is either a widening or narrowing function, return its
> +   corresponding LO and HI internal_fns.  */

Long line and too much space after "/*":

/* Given an internal_fn IFN that is either a widening or narrowing function,
   return its corresponding _LO and _HI internal_fns in *LO and *HI.  */

> +extern void
> +lookup_hilo_internal_fn (internal_fn ifn, internal_fn *lo, internal_fn *hi)
> +{
> +  gcc_assert (widening_fn_p (ifn) || narrowing_fn_p (ifn));
> +
> +  switch (ifn)
> +    {
> +    default:
> +      gcc_unreachable ();
> +#undef DEF_INTERNAL_FN
> +#undef DEF_INTERNAL_WIDENING_OPTAB_FN
> +#undef DEF_INTERNAL_NARROWING_OPTAB_FN
> +#define DEF_INTERNAL_FN(NAME, FLAGS, TYPE)
> +#define DEF_INTERNAL_WIDENING_OPTAB_FN(NAME, F, S, SO, UO, T) \
> +    case IFN_##NAME: \
> +      *lo = internal_fn (IFN_##NAME##_LO); \
> +      *hi = internal_fn (IFN_##NAME##_HI); \
> +      break;
> +#define DEF_INTERNAL_NARROWING_OPTAB_FN(NAME, F, O, T) \
> +    case IFN_##NAME: \
> +      *lo = internal_fn (IFN_##NAME##_LO); \
> +      *hi = internal_fn (IFN_##NAME##_HI); \
> +      break;
> +#include "internal-fn.def"
> +#undef DEF_INTERNAL_FN
> +#undef DEF_INTERNAL_WIDENING_OPTAB_FN
> +#undef DEF_INTERNAL_NARROWING_OPTAB_FN
> +    }
> +}
> +
> +extern void
> +lookup_evenodd_internal_fn (internal_fn ifn, internal_fn *even,
> +			    internal_fn *odd)

This needs a similar comment:

/* Given an internal_fn IFN that is either a widening or narrowing function,
   return its corresponding _EVEN and _ODD internal_fns in *EVEN and *ODD.  */

> @@ -3971,6 +4036,9 @@ commutative_binary_fn_p (internal_fn fn)
>      case IFN_UBSAN_CHECK_MUL:
>      case IFN_ADD_OVERFLOW:
>      case IFN_MUL_OVERFLOW:
> +    case IFN_VEC_WIDEN_PLUS:
> +    case IFN_VEC_WIDEN_PLUS_LO:
> +    case IFN_VEC_WIDEN_PLUS_HI:

Should include even & odd as well.

I'd suggest leaving out the narrowing stuff for now.  There are some
questions that would be easier to answer once we add the first use,
such as whether one of the hi/lo pair and one of the even/odd pair
merge with a vector containing the other half, whether all four
define the other half to be zero, etc.

OK for the optab/internal-fn parts with those changes from my POV.

Thanks again for doing this!

Richard
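The missing even and odd cases pointed out in the review come from listing the variants by hand. As a sketch of one alternative, the five variants and a commutativity predicate can both be generated from a single list, in the same X-macro style that internal-fn.def uses. Everything here is hypothetical and simplified: `WIDENING_FN_LIST`, the `TOY_` prefix and the `COMMUTATIVE` flag do not exist in GCC, and the patch itself keeps the hand-written `case` labels.

```cpp
// Hypothetical stand-in for internal-fn.def: one entry per widening
// function, with an invented flag saying whether it is commutative.
#define WIDENING_FN_LIST(X)   \
  X(VEC_WIDEN_PLUS, true)     \
  X(VEC_WIDEN_MINUS, false)

// Each entry expands to the base function plus its four split variants,
// mirroring what DEF_INTERNAL_WIDENING_OPTAB_FN does in the patch.
enum toy_fn {
#define X(NAME, COMMUTATIVE)                            \
  TOY_##NAME, TOY_##NAME##_LO, TOY_##NAME##_HI,         \
  TOY_##NAME##_EVEN, TOY_##NAME##_ODD,
  WIDENING_FN_LIST(X)
#undef X
  TOY_LAST
};

// Commutativity is derived from the same list, so the _EVEN and _ODD
// variants cannot be forgotten when a new entry is added.
bool toy_commutative_p(toy_fn fn) {
  switch (fn) {
#define X(NAME, COMMUTATIVE)                            \
  case TOY_##NAME:                                      \
  case TOY_##NAME##_LO:                                 \
  case TOY_##NAME##_HI:                                 \
  case TOY_##NAME##_EVEN:                               \
  case TOY_##NAME##_ODD:                                \
    return COMMUTATIVE;
    WIDENING_FN_LIST(X)
#undef X
    default:
      return false;
  }
}
```

With this layout `toy_commutative_p (TOY_VEC_WIDEN_PLUS_ODD)` is true and `toy_commutative_p (TOY_VEC_WIDEN_MINUS_HI)` is false without either case being written out explicitly.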
On Thu, Jun 01, 2023 at 05:27:56PM +0100, Andre Vieira (lists) via Gcc-patches wrote:
> --- a/gcc/internal-fn.h
> +++ b/gcc/internal-fn.h
> @@ -20,6 +20,10 @@ along with GCC; see the file COPYING3.  If not see
>  #ifndef GCC_INTERNAL_FN_H
>  #define GCC_INTERNAL_FN_H
>
> +#include "insn-codes.h"
> +#include "insn-opinit.h"

My i686-linux build configured with
../configure --enable-languages=default,obj-c++,lto,go,d,rust,m2 --enable-checking=yes,rtl,extra --enable-libstdcxx-backtrace=yes
just died with

In file included from ../../gcc/m2/gm2-gcc/gcc-consolidation.h:74,
                 from ../../gcc/m2/gm2-gcc/m2except.cc:22:
../../gcc/internal-fn.h:24:10: fatal error: insn-opinit.h: No such file or directory
   24 | #include "insn-opinit.h"
      |          ^~~~~~~~~~~~~~~
compilation terminated.
In file included from ../../gcc/m2/gm2-gcc/gcc-consolidation.h:74,
                 from ../../gcc/m2/m2pp.cc:23:
../../gcc/internal-fn.h:24:10: fatal error: insn-opinit.h: No such file or directory
   24 | #include "insn-opinit.h"
      |          ^~~~~~~~~~~~~~~
In file included from ../../gcc/m2/gm2-gcc/gcc-consolidation.h:74,
                 from ../../gcc/m2/gm2-gcc/rtegraph.cc:22:
../../gcc/internal-fn.h:24:10: fatal error: insn-opinit.h: No such file or directory
   24 | #include "insn-opinit.h"
      |          ^~~~~~~~~~~~~~~
compilation terminated.
compilation terminated.

supposedly because of this change.

Do you really need those includes there?
If yes, what is supposed to ensure that the generated includes
are generated before compiling files which include those?

From what I can see, gcc/Makefile.in has generated_files var which
includes among other things insn-opinit.h, and

# Dependency information.

# In order for parallel make to really start compiling the expensive
# objects from $(OBJS) as early as possible, build all their
# prerequisites strictly before all objects.
$(ALL_HOST_OBJS) : | $(generated_files)

rule, plus I see $(generated_files) mentioned in a couple of dependencies
in gcc/m2/Make-lang.in .  But supposedly because of this change it now
needs to be added to tons of other spots.

	Jakub
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 6e81dc05e0e0714256759b0594816df451415a2d..e4d815cd577d266d2bccf6fb68d62aac91a8b4cf 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -17,6 +17,7 @@ You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3. If not see
 <http://www.gnu.org/licenses/>. */
 
+#define INCLUDE_MAP
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
@@ -70,6 +71,26 @@ const int internal_fn_flags_array[] = {
   0
 };
 
+const enum internal_fn internal_fn_hilo_keys_array[] = {
+#undef DEF_INTERNAL_OPTAB_HILO_FN
+#define DEF_INTERNAL_OPTAB_HILO_FN(NAME, FLAGS, OPTAB, SOPTAB, UOPTAB, TYPE) \
+  IFN_##NAME##_LO, \
+  IFN_##NAME##_HI,
+#include "internal-fn.def"
+  IFN_LAST
+#undef DEF_INTERNAL_OPTAB_HILO_FN
+};
+
+const optab internal_fn_hilo_values_array[] = {
+#undef DEF_INTERNAL_OPTAB_HILO_FN
+#define DEF_INTERNAL_OPTAB_HILO_FN(NAME, FLAGS, OPTAB, SOPTAB, UOPTAB, TYPE) \
+  SOPTAB##_lo_optab, UOPTAB##_lo_optab, \
+  SOPTAB##_hi_optab, UOPTAB##_hi_optab,
+#include "internal-fn.def"
+  unknown_optab, unknown_optab
+#undef DEF_INTERNAL_OPTAB_HILO_FN
+};
+
 /* Return the internal function called NAME, or IFN_LAST if there's
    no such function. */
 
@@ -90,6 +111,61 @@ lookup_internal_fn (const char *name)
   return entry ? *entry : IFN_LAST;
 }
 
+static int
+ifn_cmp (const void *a_, const void *b_)
+{
+  typedef std::pair<enum internal_fn, unsigned> ifn_pair;
+  auto *a = (const std::pair<ifn_pair, optab> *)a_;
+  auto *b = (const std::pair<ifn_pair, optab> *)b_;
+  return (int) (a->first.first) - (b->first.first);
+}
+
+/* Return the optab belonging to the given internal function NAME for the given
+   SIGN or unknown_optab. */
+
+optab
+lookup_hilo_ifn_optab (enum internal_fn fn, unsigned sign)
+{
+  typedef std::pair<enum internal_fn, unsigned> ifn_pair;
+  typedef auto_vec <std::pair<ifn_pair, optab>>fn_to_optab_map_type;
+  static fn_to_optab_map_type *fn_to_optab_map;
+
+  if (!fn_to_optab_map)
+    {
+      unsigned num
+        = sizeof (internal_fn_hilo_keys_array) / sizeof (enum internal_fn);
+      fn_to_optab_map = new fn_to_optab_map_type ();
+      for (unsigned int i = 0; i < num - 1; ++i)
+        {
+          enum internal_fn fn = internal_fn_hilo_keys_array[i];
+          optab v1 = internal_fn_hilo_values_array[2*i];
+          optab v2 = internal_fn_hilo_values_array[2*i + 1];
+          ifn_pair key1 (fn, 0);
+          fn_to_optab_map->safe_push ({key1, v1});
+          ifn_pair key2 (fn, 1);
+          fn_to_optab_map->safe_push ({key2, v2});
+        }
+      fn_to_optab_map->qsort (ifn_cmp);
+    }
+
+  ifn_pair new_pair (fn, sign ? 1 : 0);
+  optab tmp;
+  std::pair<ifn_pair,optab> pair_wrap (new_pair, tmp);
+  auto entry = fn_to_optab_map->bsearch (&pair_wrap, ifn_cmp);
+  return entry != fn_to_optab_map->end () ? entry->second : unknown_optab;
+}
+
+extern void
+lookup_hilo_internal_fn (enum internal_fn ifn, enum internal_fn *lo,
+                         enum internal_fn *hi)
+{
+  gcc_assert (decomposes_to_hilo_fn_p (ifn));
+
+  *lo = internal_fn (ifn + 1);
+  *hi = internal_fn (ifn + 2);
+}
+
+
 /* Fnspec of each internal function, indexed by function number. */
 const_tree internal_fn_fnspec_array[IFN_LAST + 1];
 
@@ -3970,6 +4046,9 @@ commutative_binary_fn_p (internal_fn fn)
     case IFN_UBSAN_CHECK_MUL:
     case IFN_ADD_OVERFLOW:
     case IFN_MUL_OVERFLOW:
+    case IFN_VEC_WIDEN_PLUS:
+    case IFN_VEC_WIDEN_PLUS_LO:
+    case IFN_VEC_WIDEN_PLUS_HI:
       return true;
 
     default:
@@ -4043,6 +4122,42 @@ first_commutative_argument (internal_fn fn)
     }
 }
 
+/* Return true if FN has a wider output type than its argument types. */
+
+bool
+widening_fn_p (internal_fn fn)
+{
+  switch (fn)
+    {
+    case IFN_VEC_WIDEN_PLUS:
+    case IFN_VEC_WIDEN_MINUS:
+      return true;
+
+    default:
+      return false;
+    }
+}
+
+/* Return true if FN decomposes to _hi and _lo IFN. If true this should also
+   be a widening function. */
+
+bool
+decomposes_to_hilo_fn_p (internal_fn fn)
+{
+  if (!widening_fn_p (fn))
+    return false;
+
+  switch (fn)
+    {
+    case IFN_VEC_WIDEN_PLUS:
+    case IFN_VEC_WIDEN_MINUS:
+      return true;
+
+    default:
+      return false;
+    }
+}
+
 /* Return true if IFN_SET_EDOM is supported. */
 
 bool
@@ -4055,6 +4170,32 @@ set_edom_supported_p (void)
 #endif
 }
 
+#undef DEF_INTERNAL_OPTAB_HILO_FN
+#define DEF_INTERNAL_OPTAB_HILO_FN(CODE, FLAGS, OPTAB, SOPTAB, UOPTAB, TYPE) \
+  static void \
+  expand_##CODE (internal_fn, gcall *) \
+  { \
+    gcc_unreachable (); \
+  } \
+  static void \
+  expand_##CODE##_LO (internal_fn fn, gcall *stmt) \
+  { \
+    tree ty = TREE_TYPE (gimple_get_lhs (stmt)); \
+    if (!TYPE_UNSIGNED (ty)) \
+      expand_##TYPE##_optab_fn (fn, stmt, SOPTAB##_lo##_optab); \
+    else \
+      expand_##TYPE##_optab_fn (fn, stmt, UOPTAB##_lo##_optab); \
+  } \
+  static void \
+  expand_##CODE##_HI (internal_fn fn, gcall *stmt) \
+  { \
+    tree ty = TREE_TYPE (gimple_get_lhs (stmt)); \
+    if (!TYPE_UNSIGNED (ty)) \
+      expand_##TYPE##_optab_fn (fn, stmt, SOPTAB##_hi##_optab); \
+    else \
+      expand_##TYPE##_optab_fn (fn, stmt, UOPTAB##_hi##_optab); \
+  }
+
 #define DEF_INTERNAL_OPTAB_FN(CODE, FLAGS, OPTAB, TYPE) \
   static void \
   expand_##CODE (internal_fn fn, gcall *stmt) \
@@ -4071,6 +4212,7 @@ set_edom_supported_p (void)
     expand_##TYPE##_optab_fn (fn, stmt, which_optab); \
   }
 #include "internal-fn.def"
+#undef DEF_INTERNAL_OPTAB_HILO_FN
 
 /* Routines to expand each internal function, indexed by function number.
    Each routine has the prototype:
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 7fe742c2ae713e7152ab05cfdfba86e4e0aa3456..347ed667d92620e0ee3ea15c58ecac6c242ebe73 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -85,6 +85,13 @@ along with GCC; see the file COPYING3. If not see
    says that the function extends the C-level BUILT_IN_<NAME>{,L,LL,IMAX}
    group of functions to any integral mode (including vector modes).
 
+   DEF_INTERNAL_OPTAB_HILO_FN is like DEF_INTERNAL_OPTAB_FN except it
+   provides convenience wrappers for defining conversions that require a
+   hi/lo split, like widening and narrowing operations. Each definition
+   for <NAME> will require an optab named <OPTAB> and two other optabs that
+   you specify for signed and unsigned.
+
+
    Each entry must have a corresponding expander of the form:
 
      void expand_NAME (gimple_call stmt)
@@ -123,6 +130,14 @@ along with GCC; see the file COPYING3. If not see
   DEF_INTERNAL_OPTAB_FN (NAME, FLAGS, OPTAB, TYPE)
 #endif
 
+#ifndef DEF_INTERNAL_OPTAB_HILO_FN
+#define DEF_INTERNAL_OPTAB_HILO_FN(NAME, FLAGS, OPTAB, SOPTAB, UOPTAB, TYPE) \
+  DEF_INTERNAL_OPTAB_FN (NAME, FLAGS, OPTAB, TYPE) \
+  DEF_INTERNAL_OPTAB_FN (NAME ## _LO, FLAGS, unknown, TYPE) \
+  DEF_INTERNAL_OPTAB_FN (NAME ## _HI, FLAGS, unknown, TYPE)
+#endif
+
+
 DEF_INTERNAL_OPTAB_FN (MASK_LOAD, ECF_PURE, maskload, mask_load)
 DEF_INTERNAL_OPTAB_FN (LOAD_LANES, ECF_CONST, vec_load_lanes, load_lanes)
 DEF_INTERNAL_OPTAB_FN (MASK_LOAD_LANES, ECF_PURE,
@@ -315,6 +330,14 @@ DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
 DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary)
 DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary)
 DEF_INTERNAL_OPTAB_FN (VEC_ADDSUB, ECF_CONST, vec_addsub, binary)
+DEF_INTERNAL_OPTAB_HILO_FN (VEC_WIDEN_PLUS,
+                            ECF_CONST | ECF_NOTHROW,
+                            vec_widen_add, vec_widen_saddl, vec_widen_uaddl,
+                            binary)
+DEF_INTERNAL_OPTAB_HILO_FN (VEC_WIDEN_MINUS,
+                            ECF_CONST | ECF_NOTHROW,
+                            vec_widen_sub, vec_widen_ssubl, vec_widen_usubl,
+                            binary)
 DEF_INTERNAL_OPTAB_FN (VEC_FMADDSUB, ECF_CONST, vec_fmaddsub, ternary)
 DEF_INTERNAL_OPTAB_FN (VEC_FMSUBADD, ECF_CONST, vec_fmsubadd, ternary)
 
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 08922ed4254898f5fffca3f33973e96ed9ce772f..6a5f8762e872ad2ef64ce2986a678e3b40622d81 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -20,6 +20,10 @@ along with GCC; see the file COPYING3. If not see
 #ifndef GCC_INTERNAL_FN_H
 #define GCC_INTERNAL_FN_H
 
+#include "insn-codes.h"
+#include "insn-opinit.h"
+
+
 /* INTEGER_CST values for IFN_UNIQUE function arg-0.
 
      UNSPEC: Undifferentiated UNIQUE.
@@ -112,6 +116,9 @@ internal_fn_name (enum internal_fn fn)
 }
 
 extern internal_fn lookup_internal_fn (const char *);
+extern optab lookup_hilo_ifn_optab (enum internal_fn, unsigned);
+extern void lookup_hilo_internal_fn (enum internal_fn, enum internal_fn *,
+                                     enum internal_fn *);
 
 /* Return the ECF_* flags for function FN. */
 
@@ -210,6 +217,8 @@ extern bool commutative_binary_fn_p (internal_fn);
 extern bool commutative_ternary_fn_p (internal_fn);
 extern int first_commutative_argument (internal_fn);
 extern bool associative_binary_fn_p (internal_fn);
+extern bool widening_fn_p (internal_fn);
+extern bool decomposes_to_hilo_fn_p (internal_fn);
 
 extern bool set_edom_supported_p (void);
 
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index c8e39c82d57a7d726e7da33d247b80f32ec9236c..d4dd7ee3d34d01c32ab432ae4e4ce9e4b522b2f7 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -1314,7 +1314,12 @@ commutative_optab_p (optab binoptab)
           || binoptab == smul_widen_optab
           || binoptab == umul_widen_optab
           || binoptab == smul_highpart_optab
-          || binoptab == umul_highpart_optab);
+          || binoptab == umul_highpart_optab
+          || binoptab == vec_widen_add_optab
+          || binoptab == vec_widen_saddl_hi_optab
+          || binoptab == vec_widen_saddl_lo_optab
+          || binoptab == vec_widen_uaddl_hi_optab
+          || binoptab == vec_widen_uaddl_lo_optab);
 }
 
 /* X is to be used in mode MODE as operand OPN to BINOPTAB. If we're
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 695f5911b300c9ca5737de9be809fa01aabe5e01..e064189103b3be70644468d11f3c91ac45ffe0d0 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -78,6 +78,8 @@ OPTAB_CD(smsub_widen_optab, "msub$b$a4")
 OPTAB_CD(umsub_widen_optab, "umsub$b$a4")
 OPTAB_CD(ssmsub_widen_optab, "ssmsub$b$a4")
 OPTAB_CD(usmsub_widen_optab, "usmsub$a$b4")
+OPTAB_CD(vec_widen_add_optab, "add$a$b3")
+OPTAB_CD(vec_widen_sub_optab, "sub$a$b3")
 OPTAB_CD(vec_load_lanes_optab, "vec_load_lanes$a$b")
 OPTAB_CD(vec_store_lanes_optab, "vec_store_lanes$a$b")
 OPTAB_CD(vec_mask_load_lanes_optab, "vec_mask_load_lanes$a$b")
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
index 220bd9352a4c7acd2e3713e441d74898d3e92b30..7037673d32bd780e1c9b58a51e58e2bac3b30b7e 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-add.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O3 -save-temps" } */
+/* { dg-options "-O3 -save-temps -fdump-tree-vect-all" } */
 
 #include <stdint.h>
 #include <string.h>
@@ -86,6 +86,8 @@ main()
   return 0;
 }
 
+/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_PLUS_LO" "vect" } } */
+/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_PLUS_HI" "vect" } } */
 /* { dg-final { scan-assembler-times {\tuaddl\t} 1} } */
 /* { dg-final { scan-assembler-times {\tuaddl2\t} 1} } */
 /* { dg-final { scan-assembler-times {\tsaddl\t} 1} } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
index a2bed63affbd091977df95a126da1f5b8c1d41d2..83bc1edb6105f47114b665e24a13e6194b2179a2 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect-widen-sub.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-O3 -save-temps" } */
+/* { dg-options "-O3 -save-temps -fdump-tree-vect-all" } */
 
 #include <stdint.h>
 #include <string.h>
@@ -86,6 +86,8 @@ main()
   return 0;
 }
 
+/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_MINUS_LO" "vect" } } */
+/* { dg-final { scan-tree-dump "add new stmt.*VEC_WIDEN_MINUS_HI" "vect" } } */
 /* { dg-final { scan-assembler-times {\tusubl\t} 1} } */
 /* { dg-final { scan-assembler-times {\tusubl2\t} 1} } */
 /* { dg-final { scan-assembler-times {\tssubl\t} 1} } */
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index b35023adade94c1996cd076c4b7419560e819c6b..3175dd92187c0935f78ebbf2eb476bdcf8b4ccd1 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -1394,14 +1394,16 @@ static gimple *
 vect_recog_widen_op_pattern (vec_info *vinfo,
                              stmt_vec_info last_stmt_info, tree *type_out,
                              tree_code orig_code, code_helper wide_code,
-                             bool shift_p, const char *name)
+                             bool shift_p, const char *name,
+                             enum optab_subtype *subtype = NULL)
 {
   gimple *last_stmt = last_stmt_info->stmt;
 
   vect_unpromoted_value unprom[2];
   tree half_type;
   if (!vect_widened_op_tree (vinfo, last_stmt_info, orig_code, orig_code,
-                             shift_p, 2, unprom, &half_type))
+                             shift_p, 2, unprom, &half_type, subtype))
+
     return NULL;
 
   /* Pattern detected. */
@@ -1467,6 +1469,20 @@ vect_recog_widen_op_pattern (vec_info *vinfo,
                                       type, pattern_stmt, vecctype);
 }
 
+static gimple *
+vect_recog_widen_op_pattern (vec_info *vinfo,
+                             stmt_vec_info last_stmt_info, tree *type_out,
+                             tree_code orig_code, internal_fn wide_ifn,
+                             bool shift_p, const char *name,
+                             enum optab_subtype *subtype = NULL)
+{
+  combined_fn ifn = as_combined_fn (wide_ifn);
+  return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
+                                      orig_code, ifn, shift_p, name,
+                                      subtype);
+}
+
+
 /* Try to detect multiplication on widened inputs, converting MULT_EXPR
    to WIDEN_MULT_EXPR. See vect_recog_widen_op_pattern for details. */
 
@@ -1480,26 +1496,30 @@ vect_recog_widen_mult_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
 }
 
 /* Try to detect addition on widened inputs, converting PLUS_EXPR
-   to WIDEN_PLUS_EXPR. See vect_recog_widen_op_pattern for details. */
+   to IFN_VEC_WIDEN_PLUS. See vect_recog_widen_op_pattern for details. */
 
 static gimple *
 vect_recog_widen_plus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
                                tree *type_out)
 {
+  enum optab_subtype subtype;
   return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
-                                      PLUS_EXPR, WIDEN_PLUS_EXPR, false,
-                                      "vect_recog_widen_plus_pattern");
+                                      PLUS_EXPR, IFN_VEC_WIDEN_PLUS,
+                                      false, "vect_recog_widen_plus_pattern",
+                                      &subtype);
 }
 
 /* Try to detect subtraction on widened inputs, converting MINUS_EXPR
-   to WIDEN_MINUS_EXPR. See vect_recog_widen_op_pattern for details. */
+   to IFN_VEC_WIDEN_MINUS. See vect_recog_widen_op_pattern for details. */
 
 static gimple *
 vect_recog_widen_minus_pattern (vec_info *vinfo, stmt_vec_info last_stmt_info,
                                 tree *type_out)
 {
+  enum optab_subtype subtype;
   return vect_recog_widen_op_pattern (vinfo, last_stmt_info, type_out,
-                                      MINUS_EXPR, WIDEN_MINUS_EXPR, false,
-                                      "vect_recog_widen_minus_pattern");
+                                      MINUS_EXPR, IFN_VEC_WIDEN_MINUS,
+                                      false, "vect_recog_widen_minus_pattern",
+                                      &subtype);
 }
 
 /* Function vect_recog_popcount_pattern
@@ -6067,6 +6087,7 @@ static vect_recog_func vect_vect_recog_func_ptrs[] = {
   { vect_recog_mask_conversion_pattern, "mask_conversion" },
   { vect_recog_widen_plus_pattern, "widen_plus" },
   { vect_recog_widen_minus_pattern, "widen_minus" },
+  /* These must come after the double widening ones. */
 };
 
 const unsigned int NUM_PATTERNS = ARRAY_SIZE (vect_vect_recog_func_ptrs);
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index ce47f4940fa9a1baca4ba1162065cfc3b4072eba..2a7ef2439e12d1966e8884433963a3d387a856b7 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -5035,7 +5035,9 @@ vectorizable_conversion (vec_info *vinfo,
   bool widen_arith = (code == WIDEN_PLUS_EXPR
                       || code == WIDEN_MINUS_EXPR
                       || code == WIDEN_MULT_EXPR
-                      || code == WIDEN_LSHIFT_EXPR);
+                      || code == WIDEN_LSHIFT_EXPR
+                      || code == IFN_VEC_WIDEN_PLUS
+                      || code == IFN_VEC_WIDEN_MINUS);
 
   if (!widen_arith
       && !CONVERT_EXPR_CODE_P (code)
@@ -5085,7 +5087,9 @@ vectorizable_conversion (vec_info *vinfo,
       gcc_assert (code == WIDEN_MULT_EXPR
                   || code == WIDEN_LSHIFT_EXPR
                   || code == WIDEN_PLUS_EXPR
-                  || code == WIDEN_MINUS_EXPR);
+                  || code == WIDEN_MINUS_EXPR
+                  || code == IFN_VEC_WIDEN_PLUS
+                  || code == IFN_VEC_WIDEN_MINUS);
 
       op1 = is_gimple_assign (stmt) ? gimple_assign_rhs2 (stmt) :
@@ -12335,12 +12339,46 @@ supportable_widening_operation (vec_info *vinfo,
       optab1 = vec_unpacks_sbool_lo_optab;
       optab2 = vec_unpacks_sbool_hi_optab;
     }
-  else
-    {
-      optab1 = optab_for_tree_code (c1, vectype, optab_default);
-      optab2 = optab_for_tree_code (c2, vectype, optab_default);
+
+  if (code.is_fn_code ())
+    {
+      internal_fn ifn = as_internal_fn ((combined_fn) code);
+      gcc_assert (decomposes_to_hilo_fn_p (ifn));
+
+      internal_fn lo, hi;
+      lookup_hilo_internal_fn (ifn, &lo, &hi);
+      *code1 = as_combined_fn (lo);
+      *code2 = as_combined_fn (hi);
+      optab1 = lookup_hilo_ifn_optab (lo, !TYPE_UNSIGNED (vectype));
+      optab2 = lookup_hilo_ifn_optab (hi, !TYPE_UNSIGNED (vectype));
     }
 
+  if (code.is_tree_code ())
+    {
+      if (code == FIX_TRUNC_EXPR)
+        {
+          /* The signedness is determined from output operand. */
+          optab1 = optab_for_tree_code (c1, vectype_out, optab_default);
+          optab2 = optab_for_tree_code (c2, vectype_out, optab_default);
+        }
+      else if (CONVERT_EXPR_CODE_P (code.safe_as_tree_code ())
+               && VECTOR_BOOLEAN_TYPE_P (wide_vectype)
+               && VECTOR_BOOLEAN_TYPE_P (vectype)
+               && TYPE_MODE (wide_vectype) == TYPE_MODE (vectype)
+               && SCALAR_INT_MODE_P (TYPE_MODE (vectype)))
+        {
+          /* If the input and result modes are the same, a different optab
+             is needed where we pass in the number of units in vectype. */
+          optab1 = vec_unpacks_sbool_lo_optab;
+          optab2 = vec_unpacks_sbool_hi_optab;
+        }
+      else
+        {
+          optab1 = optab_for_tree_code (c1, vectype, optab_default);
+          optab2 = optab_for_tree_code (c2, vectype, optab_default);
+        }
+    }
+
   if (!optab1 || !optab2)
     return false;