Message ID | 20231030122256.3710809-1-pan2.li@intel.com |
---|---|
State | New |
Series | [v3] VECT: Refine the type size restriction of call vectorizer |
On Mon, Oct 30, 2023 at 1:23 PM <pan2.li@intel.com> wrote:
>
> From: Pan Li <pan2.li@intel.com>
>
> Update in v3:
>
> * Add func to predicate type size is legal or not for vectorizer call.
>
> Update in v2:
>
> * Fix one ICE of type assertion.
> * Adjust some test cases for aarch64 sve and riscv vector.
>
> Original log:
>
> The vectoriable_call has one restriction of the size of data type.
> Aka DF to DI is allowed but SF to DI isn't. You may see below message
> when try to vectorize function call like lrintf.
>
> void
> test_lrintf (long *out, float *in, unsigned count)
> {
>   for (unsigned i = 0; i < count; i++)
>     out[i] = __builtin_lrintf (in[i]);
> }
>
> lrintf.c:5:26: missed: couldn't vectorize loop
> lrintf.c:5:26: missed: not vectorized: unsupported data-type
>
> Then the standard name pattern like lrintmn2 cannot work for different
> data type size like SF => DI. This patch would like to refine this data
> type size check and unblock the standard name like lrintmn2 on conditions.
>
> The type size of vectype_out need to be exactly the same as the type
> size of vectype_in when the vectype_out size isn't participating in
> the optab selection. While there is no such restriction when the
> vectype_out is somehow a part of the optab query.
>
> The below test are passed for this patch.
>
> * The x86 bootstrap and regression test.
> * The aarch64 regression test.
> * The risc-v regression tests.
> * Ensure the lrintf standard name in risc-v.
>
> gcc/ChangeLog:
>
>         * tree-vect-stmts.cc (vectorizable_type_size_legal_p): New
>         func impl to predicate the type size is legal or not.
>         (vectorizable_call): Leverage vectorizable_type_size_legal_p.
>
> Signed-off-by: Pan Li <pan2.li@intel.com>
> ---
>  gcc/tree-vect-stmts.cc | 51 +++++++++++++++++++++++++++++++-----------
>  1 file changed, 38 insertions(+), 13 deletions(-)
>
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index a9200767f67..24b3448d961 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -1430,6 +1430,35 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
>    return IFN_LAST;
>  }
>
> +/* Return TRUE when the type size is legal for the call vectorizer,
> +   or FALSE.
> +   The type size of both the vectype_in and vectype_out should be
> +   exactly the same when vectype_out isn't participating the optab.
> +   While there is no restriction for type size when vectype_out
> +   is part of the optab query.
> + */
> +static bool
> +vectorizable_type_size_legal_p (internal_fn ifn, tree vectype_out,
> +                                tree vectype_in)
> +{
> +  bool same_size_p = TYPE_SIZE (vectype_in) == TYPE_SIZE (vectype_out);
> +
> +  if (ifn == IFN_LAST || !direct_internal_fn_p (ifn))
> +    return same_size_p;
> +
> +  const direct_internal_fn_info &difn_info = direct_internal_fn (ifn);
> +
> +  if (!difn_info.vectorizable)
> +    return same_size_p;
> +
> +  /* According to vectorizable_internal_function, the type0/1 < 0 indicates
> +     the vectype_out participating the optable selection. Aka the type size
> +     check can be skipped here. */
> +  if (difn_info.type0 < 0 || difn_info.type1 < 0)
> +    return true;

can you instead amend vectorizable_internal_function to contain the check,
returning IFN_LAST if it doesn't hold?

> +
> +  return same_size_p;
> +}
>
>  static tree permute_vec_elements (vec_info *, tree, tree, tree, stmt_vec_info,
>                                    gimple_stmt_iterator *);
> @@ -3361,19 +3390,6 @@ vectorizable_call (vec_info *vinfo,
>
>        return false;
>      }
> -  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
> -     just mixtures of nunits. E.g. DI->SI versions of __builtin_ctz*
> -     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
> -     by a pack of the two vectors into an SI vector. We would need
> -     separate code to handle direct VnDI->VnSI IFN_CTZs. */
> -  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
> -    {
> -      if (dump_enabled_p ())
> -        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                         "mismatched vector sizes %T and %T\n",
> -                         vectype_in, vectype_out);
> -      return false;
> -    }
>
>    if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
>        != VECTOR_BOOLEAN_TYPE_P (vectype_in))
> @@ -3431,6 +3447,15 @@ vectorizable_call (vec_info *vinfo,
>      ifn = vectorizable_internal_function (cfn, callee, vectype_out,
>                                            vectype_in);
>
> +  if (!vectorizable_type_size_legal_p (ifn, vectype_out, vectype_in))
> +    {
> +      if (dump_enabled_p ())
> +        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                         "mismatched vector sizes %T and %T\n",
> +                         vectype_in, vectype_out);
> +      return false;
> +    }
> +
>    /* If that fails, try asking for a target-specific built-in function. */
>    if (ifn == IFN_LAST)
>      {
> --
> 2.34.1
>
> can you instead amend vectorizable_internal_function to contain the check,
> returning IFN_LAST if it doesn't hold?

Sure, will send v4 for this.

Pan

-----Original Message-----
From: Richard Biener <richard.guenther@gmail.com>
Sent: Tuesday, October 31, 2023 8:58 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>
Subject: Re: [PATCH v3] VECT: Refine the type size restriction of call vectorizer
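The review request in this exchange, moving the size check into the lookup so that it answers IFN_LAST when the check fails, could be sketched roughly as follows. This is only a standalone model and not the v4 patch: `MockIfn`, `MockVecType`, `MockDirectInfo`, `mock_info` and `mock_lookup` are hypothetical stand-ins for the GCC internals, the table entries are assumed values picked for illustration, and rejecting a non-vectorizable function outright is an assumption of the sketch.

```cpp
#include <cassert>

// Everything below is a hypothetical stand-in for GCC internals, written
// only to illustrate the control flow; none of it is the real GCC API.
enum MockIfn { MOCK_IFN_LRINT, MOCK_IFN_CTZ, MOCK_IFN_LAST };

// A vector type reduced to its total size in bits (cf. TYPE_SIZE).
struct MockVecType { unsigned size_bits; };

// Loosely mirrors the three fields the patch reads from
// direct_internal_fn_info.
struct MockDirectInfo {
  int type0;          // < 0: the output type takes part in optab selection,
  int type1;          //      as described by the comment in the patch
  bool vectorizable;
};

// Assumed table entries, chosen for illustration only.
static MockDirectInfo mock_info(MockIfn ifn) {
  switch (ifn) {
    case MOCK_IFN_LRINT: return {-1, 0, true};  // output type is part of the query
    case MOCK_IFN_CTZ:   return {0, 0, true};   // only the input type is used
    default:             return {0, 0, false};
  }
}

// Lookup with the size rule folded in: a candidate that fails the rule is
// reported as MOCK_IFN_LAST, so the caller needs no separate predicate.
static MockIfn mock_lookup(MockIfn candidate, MockVecType out, MockVecType in) {
  if (candidate == MOCK_IFN_LAST)
    return MOCK_IFN_LAST;
  const MockDirectInfo info = mock_info(candidate);
  if (!info.vectorizable)  // assumption of this sketch: reject outright
    return MOCK_IFN_LAST;
  const bool out_in_query = info.type0 < 0 || info.type1 < 0;
  if (!out_in_query && in.size_bits != out.size_bits)
    return MOCK_IFN_LAST;
  return candidate;
}

// Small usage example: a 128-bit input vector and a 256-bit output vector.
static void mock_lookup_examples() {
  const MockVecType v128{128};
  const MockVecType v256{256};
  assert(mock_lookup(MOCK_IFN_LRINT, v256, v128) == MOCK_IFN_LRINT);
  assert(mock_lookup(MOCK_IFN_CTZ, v256, v128) == MOCK_IFN_LAST);
}
```

With the rule inside the lookup, the caller only has to handle one failure signal. The sketch does not model what happens afterwards on the target-specific built-in path.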
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index a9200767f67..24b3448d961 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1430,6 +1430,35 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
   return IFN_LAST;
 }
 
+/* Return TRUE when the type size is legal for the call vectorizer,
+   or FALSE.
+   The type size of both the vectype_in and vectype_out should be
+   exactly the same when vectype_out isn't participating the optab.
+   While there is no restriction for type size when vectype_out
+   is part of the optab query.
+ */
+static bool
+vectorizable_type_size_legal_p (internal_fn ifn, tree vectype_out,
+                                tree vectype_in)
+{
+  bool same_size_p = TYPE_SIZE (vectype_in) == TYPE_SIZE (vectype_out);
+
+  if (ifn == IFN_LAST || !direct_internal_fn_p (ifn))
+    return same_size_p;
+
+  const direct_internal_fn_info &difn_info = direct_internal_fn (ifn);
+
+  if (!difn_info.vectorizable)
+    return same_size_p;
+
+  /* According to vectorizable_internal_function, the type0/1 < 0 indicates
+     the vectype_out participating the optable selection. Aka the type size
+     check can be skipped here. */
+  if (difn_info.type0 < 0 || difn_info.type1 < 0)
+    return true;
+
+  return same_size_p;
+}
 
 static tree permute_vec_elements (vec_info *, tree, tree, tree, stmt_vec_info,
                                   gimple_stmt_iterator *);
@@ -3361,19 +3390,6 @@ vectorizable_call (vec_info *vinfo,
 
       return false;
     }
-  /* FORNOW: we don't yet support mixtures of vector sizes for calls,
-     just mixtures of nunits. E.g. DI->SI versions of __builtin_ctz*
-     are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed
-     by a pack of the two vectors into an SI vector. We would need
-     separate code to handle direct VnDI->VnSI IFN_CTZs. */
-  if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out))
-    {
-      if (dump_enabled_p ())
-        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                         "mismatched vector sizes %T and %T\n",
-                         vectype_in, vectype_out);
-      return false;
-    }
 
   if (VECTOR_BOOLEAN_TYPE_P (vectype_out)
       != VECTOR_BOOLEAN_TYPE_P (vectype_in))
@@ -3431,6 +3447,15 @@ vectorizable_call (vec_info *vinfo,
     ifn = vectorizable_internal_function (cfn, callee, vectype_out,
                                           vectype_in);
 
+  if (!vectorizable_type_size_legal_p (ifn, vectype_out, vectype_in))
+    {
+      if (dump_enabled_p ())
+        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                         "mismatched vector sizes %T and %T\n",
+                         vectype_in, vectype_out);
+      return false;
+    }
+
   /* If that fails, try asking for a target-specific built-in function. */
   if (ifn == IFN_LAST)
     {
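The "mismatched vector sizes" test that this diff relocates comes down to simple arithmetic, which may help explain why DF to DI passed the old check while SF to DI did not. The following is a minimal sketch under stated assumptions: `LaneType`, `vector_bits` and `same_vector_size` are hypothetical names, the element widths are the usual 32 bits for SF and 64 bits for DF and DI, and both vectors are assumed to carry the same number of lanes, as a direct one-instruction conversion would.

```cpp
#include <cassert>

// Hypothetical model: a vector type described by element width and lane count.
struct LaneType {
  unsigned elem_bits;
  unsigned nunits;
};

// Total vector size in bits, standing in for what TYPE_SIZE compares.
constexpr unsigned vector_bits(LaneType t) { return t.elem_bits * t.nunits; }

// Element widths assumed for the scalar modes named in the commit message.
constexpr unsigned SF_BITS = 32;  // float
constexpr unsigned DF_BITS = 64;  // double
constexpr unsigned DI_BITS = 64;  // 64-bit integer

// One output lane per input lane: do the two vectors have the same size?
constexpr bool same_vector_size(unsigned in_bits, unsigned out_bits,
                                unsigned nunits) {
  return vector_bits({in_bits, nunits}) == vector_bits({out_bits, nunits});
}

static_assert(same_vector_size(DF_BITS, DI_BITS, 4),
              "DF->DI keeps the vector size");
static_assert(!same_vector_size(SF_BITS, DI_BITS, 4),
              "SF->DI changes the vector size");

// Runtime version of the same comparison for four lanes.
static void lane_size_examples() {
  assert(vector_bits({SF_BITS, 4}) == 128);
  assert(vector_bits({DI_BITS, 4}) == 256);
}
```

With equal lane counts, an SF input vector is half the size of the DI result, so a strict size-equality test rejects it regardless of whether the target provides a pattern for that conversion.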