Message ID | 20231026021816.1633907-1-pan2.li@intel.com |
---|---|
State | New |
Headers | show |
Series | [v2] VECT: Remove the type size restriction of vectorizer | expand |
On Thu, Oct 26, 2023 at 4:18 AM <pan2.li@intel.com> wrote: > > From: Pan Li <pan2.li@intel.com> > > Update in v2: > > * Fix one ICE of type assertion. > * Adjust some test cases for aarch64 sve and riscv vector. > > Original log: > > The vectoriable_call has one restriction of the size of data type. > Aka DF to DI is allowed but SF to DI isn't. You may see below message > when try to vectorize function call like lrintf. > > void > test_lrintf (long *out, float *in, unsigned count) > { > for (unsigned i = 0; i < count; i++) > out[i] = __builtin_lrintf (in[i]); > } > > lrintf.c:5:26: missed: couldn't vectorize loop > lrintf.c:5:26: missed: not vectorized: unsupported data-type > > Then the standard name pattern like lrintmn2 cannot work for different > data type size like SF => DI. This patch would like to remove this data > type size check and unblock the standard name like lrintmn2. > > The below test are passed for this patch. > > * The x86 bootstrap and regression test. > * The aarch64 regression test. > * The risc-v regression tests. > > gcc/ChangeLog: > > * internal-fn.cc (expand_fn_using_insn): Add vector int assertion. > * tree-vect-stmts.cc (vectorizable_call): Remove size check. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/sve/clrsb_1.c: Adjust checker. > * gcc.target/aarch64/sve/clz_1.c: Ditto. > * gcc.target/aarch64/sve/popcount_1.c: Ditto. > * gcc.target/riscv/rvv/autovec/unop/popcount.c: Ditto. > > Signed-off-by: Pan Li <pan2.li@intel.com> > --- > gcc/internal-fn.cc | 3 ++- > gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c | 3 +-- > gcc/testsuite/gcc.target/aarch64/sve/clz_1.c | 3 +-- > gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c | 3 +-- > .../gcc.target/riscv/rvv/autovec/unop/popcount.c | 2 +- > gcc/tree-vect-stmts.cc | 13 ------------- > 6 files changed, 6 insertions(+), 21 deletions(-) > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc > index 61d5a9e4772..17c0f4c3805 100644 > --- a/gcc/internal-fn.cc > +++ b/gcc/internal-fn.cc > @@ -281,7 +281,8 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs, > emit_move_insn (lhs_rtx, ops[0].value); > else > { > - gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs))); > + gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs)) > + || VECTOR_INTEGER_TYPE_P (TREE_TYPE (lhs))); Can you explain why this is necessary? In particular what is lhs_rtx mode vs ops[0].value mode? > convert_move (lhs_rtx, ops[0].value, 0); I'm not sure convert_move handles vector modes correctly. Richard probably added this code, CCed. Richard. > } > } > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c > index bdc9856faaf..940d08bbc7b 100644 > --- a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c > +++ b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c > @@ -18,5 +18,4 @@ clrsb_64 (unsigned int *restrict dst, uint64_t *restrict src, int size) > } > > /* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */ > -/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */ > -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c > index 0c7a4e6d768..58b8ff406d2 100644 > --- a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c > +++ b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c > @@ -18,5 +18,4 @@ clz_64 (unsigned int *restrict dst, uint64_t *restrict src, int size) > } > > /* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */ > -/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */ > -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c > index dfb6f4ac7a5..0eba898307c 100644 > --- a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c > +++ b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c > @@ -18,5 +18,4 @@ popcount_64 (unsigned int *restrict dst, uint64_t *restrict src, int size) > } > > /* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */ > -/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */ > -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */ > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c > index 585a522aa81..e6e3c70f927 100644 > --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c > +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c > @@ -1461,4 +1461,4 @@ main () > RUN_ALL () > } > > -/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 229 "vect" } } */ > +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 384 "vect" } } */ > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > index a9200767f67..fa4ca0634e8 100644 > --- a/gcc/tree-vect-stmts.cc > +++ b/gcc/tree-vect-stmts.cc > @@ -3361,19 +3361,6 @@ vectorizable_call (vec_info *vinfo, > > return false; > } > - /* FORNOW: we don't yet support mixtures of vector sizes for calls, > - just mixtures of nunits. E.g. DI->SI versions of __builtin_ctz* > - are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed > - by a pack of the two vectors into an SI vector. We would need > - separate code to handle direct VnDI->VnSI IFN_CTZs. */ > - if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out)) > - { > - if (dump_enabled_p ()) > - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > - "mismatched vector sizes %T and %T\n", > - vectype_in, vectype_out); > - return false; > - } > > if (VECTOR_BOOLEAN_TYPE_P (vectype_out) > != VECTOR_BOOLEAN_TYPE_P (vectype_in)) > -- > 2.34.1 >
Thanks Richard for comments. > Can you explain why this is necessary? In particular what is lhs_rtx > mode vs ops[0].value mode? For testcase gcc.target/aarch64/sve/popcount_1.c, the rtl are list as below. The lhs_rtx is (reg:VNx2SI 98 [ vect__5.36 ]). The ops[0].value is (reg:VNx2DI 104). The restriction removing make the vector rtl enter expand_fn_using_insn and of course hit the INTEGER_P assertion. Pan -----Original Message----- From: Richard Biener <richard.guenther@gmail.com> Sent: Thursday, October 26, 2023 4:38 PM To: Li, Pan2 <pan2.li@intel.com> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>; Richard Sandiford <richard.sandiford@arm.com> Subject: Re: [PATCH v2] VECT: Remove the type size restriction of vectorizer On Thu, Oct 26, 2023 at 4:18 AM <pan2.li@intel.com> wrote: > > From: Pan Li <pan2.li@intel.com> > > Update in v2: > > * Fix one ICE of type assertion. > * Adjust some test cases for aarch64 sve and riscv vector. > > Original log: > > The vectoriable_call has one restriction of the size of data type. > Aka DF to DI is allowed but SF to DI isn't. You may see below message > when try to vectorize function call like lrintf. > > void > test_lrintf (long *out, float *in, unsigned count) > { > for (unsigned i = 0; i < count; i++) > out[i] = __builtin_lrintf (in[i]); > } > > lrintf.c:5:26: missed: couldn't vectorize loop > lrintf.c:5:26: missed: not vectorized: unsupported data-type > > Then the standard name pattern like lrintmn2 cannot work for different > data type size like SF => DI. This patch would like to remove this data > type size check and unblock the standard name like lrintmn2. > > The below test are passed for this patch. > > * The x86 bootstrap and regression test. > * The aarch64 regression test. > * The risc-v regression tests. > > gcc/ChangeLog: > > * internal-fn.cc (expand_fn_using_insn): Add vector int assertion. > * tree-vect-stmts.cc (vectorizable_call): Remove size check. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/sve/clrsb_1.c: Adjust checker. > * gcc.target/aarch64/sve/clz_1.c: Ditto. > * gcc.target/aarch64/sve/popcount_1.c: Ditto. > * gcc.target/riscv/rvv/autovec/unop/popcount.c: Ditto. > > Signed-off-by: Pan Li <pan2.li@intel.com> > --- > gcc/internal-fn.cc | 3 ++- > gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c | 3 +-- > gcc/testsuite/gcc.target/aarch64/sve/clz_1.c | 3 +-- > gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c | 3 +-- > .../gcc.target/riscv/rvv/autovec/unop/popcount.c | 2 +- > gcc/tree-vect-stmts.cc | 13 ------------- > 6 files changed, 6 insertions(+), 21 deletions(-) > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc > index 61d5a9e4772..17c0f4c3805 100644 > --- a/gcc/internal-fn.cc > +++ b/gcc/internal-fn.cc > @@ -281,7 +281,8 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs, > emit_move_insn (lhs_rtx, ops[0].value); > else > { > - gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs))); > + gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs)) > + || VECTOR_INTEGER_TYPE_P (TREE_TYPE (lhs))); Can you explain why this is necessary? In particular what is lhs_rtx mode vs ops[0].value mode? > convert_move (lhs_rtx, ops[0].value, 0); I'm not sure convert_move handles vector modes correctly. Richard probably added this code, CCed. Richard. > } > } > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c > index bdc9856faaf..940d08bbc7b 100644 > --- a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c > +++ b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c > @@ -18,5 +18,4 @@ clrsb_64 (unsigned int *restrict dst, uint64_t *restrict src, int size) > } > > /* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */ > -/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */ > -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c > index 0c7a4e6d768..58b8ff406d2 100644 > --- a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c > +++ b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c > @@ -18,5 +18,4 @@ clz_64 (unsigned int *restrict dst, uint64_t *restrict src, int size) > } > > /* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */ > -/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */ > -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c > index dfb6f4ac7a5..0eba898307c 100644 > --- a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c > +++ b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c > @@ -18,5 +18,4 @@ popcount_64 (unsigned int *restrict dst, uint64_t *restrict src, int size) > } > > /* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */ > -/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */ > -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */ > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c > index 585a522aa81..e6e3c70f927 100644 > --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c > +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c > @@ -1461,4 +1461,4 @@ main () > RUN_ALL () > } > > -/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 229 "vect" } } */ > +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 384 "vect" } } */ > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > index a9200767f67..fa4ca0634e8 100644 > --- a/gcc/tree-vect-stmts.cc > +++ b/gcc/tree-vect-stmts.cc > @@ -3361,19 +3361,6 @@ vectorizable_call (vec_info *vinfo, > > return false; > } > - /* FORNOW: we don't yet support mixtures of vector sizes for calls, > - just mixtures of nunits. E.g. DI->SI versions of __builtin_ctz* > - are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed > - by a pack of the two vectors into an SI vector. We would need > - separate code to handle direct VnDI->VnSI IFN_CTZs. */ > - if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out)) > - { > - if (dump_enabled_p ()) > - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > - "mismatched vector sizes %T and %T\n", > - vectype_in, vectype_out); > - return false; > - } > > if (VECTOR_BOOLEAN_TYPE_P (vectype_out) > != VECTOR_BOOLEAN_TYPE_P (vectype_in)) > -- > 2.34.1 >
> Am 26.10.2023 um 13:59 schrieb Li, Pan2 <pan2.li@intel.com>: > > Thanks Richard for comments. > >> Can you explain why this is necessary? In particular what is lhs_rtx >> mode vs ops[0].value mode? > > For testcase gcc.target/aarch64/sve/popcount_1.c, the rtl are list as below. > > The lhs_rtx is (reg:VNx2SI 98 [ vect__5.36 ]). > The ops[0].value is (reg:VNx2DI 104). > > The restriction removing make the vector rtl enter expand_fn_using_insn and of course hit the INTEGER_P assertion. But I think this shows we mid-selected the optab, a convert_move is certainly not correct unconditionally here (the target might not support that) > Pan > > -----Original Message----- > From: Richard Biener <richard.guenther@gmail.com> > Sent: Thursday, October 26, 2023 4:38 PM > To: Li, Pan2 <pan2.li@intel.com> > Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>; Richard Sandiford <richard.sandiford@arm.com> > Subject: Re: [PATCH v2] VECT: Remove the type size restriction of vectorizer > >> On Thu, Oct 26, 2023 at 4:18 AM <pan2.li@intel.com> wrote: >> >> From: Pan Li <pan2.li@intel.com> >> >> Update in v2: >> >> * Fix one ICE of type assertion. >> * Adjust some test cases for aarch64 sve and riscv vector. >> >> Original log: >> >> The vectoriable_call has one restriction of the size of data type. >> Aka DF to DI is allowed but SF to DI isn't. You may see below message >> when try to vectorize function call like lrintf. >> >> void >> test_lrintf (long *out, float *in, unsigned count) >> { >> for (unsigned i = 0; i < count; i++) >> out[i] = __builtin_lrintf (in[i]); >> } >> >> lrintf.c:5:26: missed: couldn't vectorize loop >> lrintf.c:5:26: missed: not vectorized: unsupported data-type >> >> Then the standard name pattern like lrintmn2 cannot work for different >> data type size like SF => DI. This patch would like to remove this data >> type size check and unblock the standard name like lrintmn2. >> >> The below test are passed for this patch. >> >> * The x86 bootstrap and regression test. >> * The aarch64 regression test. >> * The risc-v regression tests. >> >> gcc/ChangeLog: >> >> * internal-fn.cc (expand_fn_using_insn): Add vector int assertion. >> * tree-vect-stmts.cc (vectorizable_call): Remove size check. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/aarch64/sve/clrsb_1.c: Adjust checker. >> * gcc.target/aarch64/sve/clz_1.c: Ditto. >> * gcc.target/aarch64/sve/popcount_1.c: Ditto. >> * gcc.target/riscv/rvv/autovec/unop/popcount.c: Ditto. >> >> Signed-off-by: Pan Li <pan2.li@intel.com> >> --- >> gcc/internal-fn.cc | 3 ++- >> gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c | 3 +-- >> gcc/testsuite/gcc.target/aarch64/sve/clz_1.c | 3 +-- >> gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c | 3 +-- >> .../gcc.target/riscv/rvv/autovec/unop/popcount.c | 2 +- >> gcc/tree-vect-stmts.cc | 13 ------------- >> 6 files changed, 6 insertions(+), 21 deletions(-) >> >> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc >> index 61d5a9e4772..17c0f4c3805 100644 >> --- a/gcc/internal-fn.cc >> +++ b/gcc/internal-fn.cc >> @@ -281,7 +281,8 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs, >> emit_move_insn (lhs_rtx, ops[0].value); >> else >> { >> - gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs))); >> + gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs)) >> + || VECTOR_INTEGER_TYPE_P (TREE_TYPE (lhs))); > > Can you explain why this is necessary? In particular what is lhs_rtx > mode vs ops[0].value mode? > >> convert_move (lhs_rtx, ops[0].value, 0); > > I'm not sure convert_move handles vector modes correctly. Richard > probably added this code, CCed. > > Richard. > >> } >> } >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c >> index bdc9856faaf..940d08bbc7b 100644 >> --- a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c >> @@ -18,5 +18,4 @@ clrsb_64 (unsigned int *restrict dst, uint64_t *restrict src, int size) >> } >> >> /* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */ >> -/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */ >> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ >> +/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */ >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c >> index 0c7a4e6d768..58b8ff406d2 100644 >> --- a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c >> @@ -18,5 +18,4 @@ clz_64 (unsigned int *restrict dst, uint64_t *restrict src, int size) >> } >> >> /* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */ >> -/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */ >> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ >> +/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */ >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c >> index dfb6f4ac7a5..0eba898307c 100644 >> --- a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c >> @@ -18,5 +18,4 @@ popcount_64 (unsigned int *restrict dst, uint64_t *restrict src, int size) >> } >> >> /* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */ >> -/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */ >> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ >> +/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */ >> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c >> index 585a522aa81..e6e3c70f927 100644 >> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c >> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c >> @@ -1461,4 +1461,4 @@ main () >> RUN_ALL () >> } >> >> -/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 229 "vect" } } */ >> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 384 "vect" } } */ >> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc >> index a9200767f67..fa4ca0634e8 100644 >> --- a/gcc/tree-vect-stmts.cc >> +++ b/gcc/tree-vect-stmts.cc >> @@ -3361,19 +3361,6 @@ vectorizable_call (vec_info *vinfo, >> >> return false; >> } >> - /* FORNOW: we don't yet support mixtures of vector sizes for calls, >> - just mixtures of nunits. E.g. DI->SI versions of __builtin_ctz* >> - are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed >> - by a pack of the two vectors into an SI vector. We would need >> - separate code to handle direct VnDI->VnSI IFN_CTZs. */ >> - if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out)) >> - { >> - if (dump_enabled_p ()) >> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >> - "mismatched vector sizes %T and %T\n", >> - vectype_in, vectype_out); >> - return false; >> - } >> >> if (VECTOR_BOOLEAN_TYPE_P (vectype_out) >> != VECTOR_BOOLEAN_TYPE_P (vectype_in)) >> -- >> 2.34.1 >>
> But I think this shows we mid-selected the optab, a convert_move is certainly not correct unconditionally here (the target might not support that) Make sense, we can wait a while for the confirmation from Richard S. If convert_move is not designed for Vector (looks like mostly up to a point), I am not sure if we can fix the assertion like below ... else If (VECTOR_INTERGER_TYPE (TREE_TYPE(lhs))) return; else { gcc_checking_assert (INTEGRAL_TYPE_TYPE_P (TREE_TYPE (lhs))); convert_move (lhs_rtx, ops[0].value, 0); } Aka bypass the vector here, but I am afraid this change may make the llrintf (SF => DI) not working on standard name. Let me have a try and keep you posted. Pan -----Original Message----- From: Richard Biener <richard.guenther@gmail.com> Sent: Thursday, October 26, 2023 10:00 PM To: Li, Pan2 <pan2.li@intel.com> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>; Richard Sandiford <richard.sandiford@arm.com> Subject: Re: [PATCH v2] VECT: Remove the type size restriction of vectorizer > Am 26.10.2023 um 13:59 schrieb Li, Pan2 <pan2.li@intel.com>: > > Thanks Richard for comments. > >> Can you explain why this is necessary? In particular what is lhs_rtx >> mode vs ops[0].value mode? > > For testcase gcc.target/aarch64/sve/popcount_1.c, the rtl are list as below. > > The lhs_rtx is (reg:VNx2SI 98 [ vect__5.36 ]). > The ops[0].value is (reg:VNx2DI 104). > > The restriction removing make the vector rtl enter expand_fn_using_insn and of course hit the INTEGER_P assertion. But I think this shows we mid-selected the optab, a convert_move is certainly not correct unconditionally here (the target might not support that) > Pan > > -----Original Message----- > From: Richard Biener <richard.guenther@gmail.com> > Sent: Thursday, October 26, 2023 4:38 PM > To: Li, Pan2 <pan2.li@intel.com> > Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>; Richard Sandiford <richard.sandiford@arm.com> > Subject: Re: [PATCH v2] VECT: Remove the type size restriction of vectorizer > >> On Thu, Oct 26, 2023 at 4:18 AM <pan2.li@intel.com> wrote: >> >> From: Pan Li <pan2.li@intel.com> >> >> Update in v2: >> >> * Fix one ICE of type assertion. >> * Adjust some test cases for aarch64 sve and riscv vector. >> >> Original log: >> >> The vectoriable_call has one restriction of the size of data type. >> Aka DF to DI is allowed but SF to DI isn't. You may see below message >> when try to vectorize function call like lrintf. >> >> void >> test_lrintf (long *out, float *in, unsigned count) >> { >> for (unsigned i = 0; i < count; i++) >> out[i] = __builtin_lrintf (in[i]); >> } >> >> lrintf.c:5:26: missed: couldn't vectorize loop >> lrintf.c:5:26: missed: not vectorized: unsupported data-type >> >> Then the standard name pattern like lrintmn2 cannot work for different >> data type size like SF => DI. This patch would like to remove this data >> type size check and unblock the standard name like lrintmn2. >> >> The below test are passed for this patch. >> >> * The x86 bootstrap and regression test. >> * The aarch64 regression test. >> * The risc-v regression tests. >> >> gcc/ChangeLog: >> >> * internal-fn.cc (expand_fn_using_insn): Add vector int assertion. >> * tree-vect-stmts.cc (vectorizable_call): Remove size check. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/aarch64/sve/clrsb_1.c: Adjust checker. >> * gcc.target/aarch64/sve/clz_1.c: Ditto. >> * gcc.target/aarch64/sve/popcount_1.c: Ditto. >> * gcc.target/riscv/rvv/autovec/unop/popcount.c: Ditto. >> >> Signed-off-by: Pan Li <pan2.li@intel.com> >> --- >> gcc/internal-fn.cc | 3 ++- >> gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c | 3 +-- >> gcc/testsuite/gcc.target/aarch64/sve/clz_1.c | 3 +-- >> gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c | 3 +-- >> .../gcc.target/riscv/rvv/autovec/unop/popcount.c | 2 +- >> gcc/tree-vect-stmts.cc | 13 ------------- >> 6 files changed, 6 insertions(+), 21 deletions(-) >> >> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc >> index 61d5a9e4772..17c0f4c3805 100644 >> --- a/gcc/internal-fn.cc >> +++ b/gcc/internal-fn.cc >> @@ -281,7 +281,8 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs, >> emit_move_insn (lhs_rtx, ops[0].value); >> else >> { >> - gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs))); >> + gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs)) >> + || VECTOR_INTEGER_TYPE_P (TREE_TYPE (lhs))); > > Can you explain why this is necessary? In particular what is lhs_rtx > mode vs ops[0].value mode? > >> convert_move (lhs_rtx, ops[0].value, 0); > > I'm not sure convert_move handles vector modes correctly. Richard > probably added this code, CCed. > > Richard. > >> } >> } >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c >> index bdc9856faaf..940d08bbc7b 100644 >> --- a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c >> @@ -18,5 +18,4 @@ clrsb_64 (unsigned int *restrict dst, uint64_t *restrict src, int size) >> } >> >> /* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */ >> -/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */ >> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ >> +/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */ >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c >> index 0c7a4e6d768..58b8ff406d2 100644 >> --- a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c >> @@ -18,5 +18,4 @@ clz_64 (unsigned int *restrict dst, uint64_t *restrict src, int size) >> } >> >> /* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */ >> -/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */ >> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ >> +/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */ >> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c >> index dfb6f4ac7a5..0eba898307c 100644 >> --- a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c >> +++ b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c >> @@ -18,5 +18,4 @@ popcount_64 (unsigned int *restrict dst, uint64_t *restrict src, int size) >> } >> >> /* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */ >> -/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */ >> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ >> +/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */ >> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c >> index 585a522aa81..e6e3c70f927 100644 >> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c >> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c >> @@ -1461,4 +1461,4 @@ main () >> RUN_ALL () >> } >> >> -/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 229 "vect" } } */ >> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 384 "vect" } } */ >> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc >> index a9200767f67..fa4ca0634e8 100644 >> --- a/gcc/tree-vect-stmts.cc >> +++ b/gcc/tree-vect-stmts.cc >> @@ -3361,19 +3361,6 @@ vectorizable_call (vec_info *vinfo, >> >> return false; >> } >> - /* FORNOW: we don't yet support mixtures of vector sizes for calls, >> - just mixtures of nunits. E.g. DI->SI versions of __builtin_ctz* >> - are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed >> - by a pack of the two vectors into an SI vector. We would need >> - separate code to handle direct VnDI->VnSI IFN_CTZs. */ >> - if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out)) >> - { >> - if (dump_enabled_p ()) >> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >> - "mismatched vector sizes %T and %T\n", >> - vectype_in, vectype_out); >> - return false; >> - } >> >> if (VECTOR_BOOLEAN_TYPE_P (vectype_out) >> != VECTOR_BOOLEAN_TYPE_P (vectype_in)) >> -- >> 2.34.1 >>
Richard Biener <richard.guenther@gmail.com> writes: >> Am 26.10.2023 um 13:59 schrieb Li, Pan2 <pan2.li@intel.com>: >> >> Thanks Richard for comments. >> >>> Can you explain why this is necessary? In particular what is lhs_rtx >>> mode vs ops[0].value mode? >> >> For testcase gcc.target/aarch64/sve/popcount_1.c, the rtl are list as below. >> >> The lhs_rtx is (reg:VNx2SI 98 [ vect__5.36 ]). >> The ops[0].value is (reg:VNx2DI 104). >> >> The restriction removing make the vector rtl enter expand_fn_using_insn and of course hit the INTEGER_P assertion. > > But I think this shows we mid-selected the optab, a convert_move is certainly not correct unconditionally here (the target might not support that) Agreed. Allowing TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out) makes sense if the called function allows the input and output modes to vary. That's true for internal functions that eventually map to two-mode optabs. But we can't remove the condition for calls to other functions, at least not without some fix-ups. ISTM that the problem being hit is the one described by the removed comment. In other words, I don't think simply removing the test from the vectoriser is correct. It needs to be replaced by something more selective. Thanks, Richard >> Pan >> >> -----Original Message----- >> From: Richard Biener <richard.guenther@gmail.com> >> Sent: Thursday, October 26, 2023 4:38 PM >> To: Li, Pan2 <pan2.li@intel.com> >> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>; Richard Sandiford <richard.sandiford@arm.com> >> Subject: Re: [PATCH v2] VECT: Remove the type size restriction of vectorizer >> >>> On Thu, Oct 26, 2023 at 4:18 AM <pan2.li@intel.com> wrote: >>> >>> From: Pan Li <pan2.li@intel.com> >>> >>> Update in v2: >>> >>> * Fix one ICE of type assertion. >>> * Adjust some test cases for aarch64 sve and riscv vector. >>> >>> Original log: >>> >>> The vectoriable_call has one restriction of the size of data type. >>> Aka DF to DI is allowed but SF to DI isn't. You may see below message >>> when try to vectorize function call like lrintf. >>> >>> void >>> test_lrintf (long *out, float *in, unsigned count) >>> { >>> for (unsigned i = 0; i < count; i++) >>> out[i] = __builtin_lrintf (in[i]); >>> } >>> >>> lrintf.c:5:26: missed: couldn't vectorize loop >>> lrintf.c:5:26: missed: not vectorized: unsupported data-type >>> >>> Then the standard name pattern like lrintmn2 cannot work for different >>> data type size like SF => DI. This patch would like to remove this data >>> type size check and unblock the standard name like lrintmn2. >>> >>> The below test are passed for this patch. >>> >>> * The x86 bootstrap and regression test. >>> * The aarch64 regression test. >>> * The risc-v regression tests. >>> >>> gcc/ChangeLog: >>> >>> * internal-fn.cc (expand_fn_using_insn): Add vector int assertion. >>> * tree-vect-stmts.cc (vectorizable_call): Remove size check. >>> >>> gcc/testsuite/ChangeLog: >>> >>> * gcc.target/aarch64/sve/clrsb_1.c: Adjust checker. >>> * gcc.target/aarch64/sve/clz_1.c: Ditto. >>> * gcc.target/aarch64/sve/popcount_1.c: Ditto. >>> * gcc.target/riscv/rvv/autovec/unop/popcount.c: Ditto. >>> >>> Signed-off-by: Pan Li <pan2.li@intel.com> >>> --- >>> gcc/internal-fn.cc | 3 ++- >>> gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c | 3 +-- >>> gcc/testsuite/gcc.target/aarch64/sve/clz_1.c | 3 +-- >>> gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c | 3 +-- >>> .../gcc.target/riscv/rvv/autovec/unop/popcount.c | 2 +- >>> gcc/tree-vect-stmts.cc | 13 ------------- >>> 6 files changed, 6 insertions(+), 21 deletions(-) >>> >>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc >>> index 61d5a9e4772..17c0f4c3805 100644 >>> --- a/gcc/internal-fn.cc >>> +++ b/gcc/internal-fn.cc >>> @@ -281,7 +281,8 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs, >>> emit_move_insn (lhs_rtx, ops[0].value); >>> else >>> { >>> - gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs))); >>> + gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs)) >>> + || VECTOR_INTEGER_TYPE_P (TREE_TYPE (lhs))); >> >> Can you explain why this is necessary? In particular what is lhs_rtx >> mode vs ops[0].value mode? >> >>> convert_move (lhs_rtx, ops[0].value, 0); >> >> I'm not sure convert_move handles vector modes correctly. Richard >> probably added this code, CCed. >> >> Richard. >> >>> } >>> } >>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c >>> index bdc9856faaf..940d08bbc7b 100644 >>> --- a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c >>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c >>> @@ -18,5 +18,4 @@ clrsb_64 (unsigned int *restrict dst, uint64_t *restrict src, int size) >>> } >>> >>> /* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */ >>> -/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */ >>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ >>> +/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */ >>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c >>> index 0c7a4e6d768..58b8ff406d2 100644 >>> --- a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c >>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c >>> @@ -18,5 +18,4 @@ clz_64 (unsigned int *restrict dst, uint64_t *restrict src, int size) >>> } >>> >>> /* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */ >>> -/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */ >>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ >>> +/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */ >>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c >>> index dfb6f4ac7a5..0eba898307c 100644 >>> --- a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c >>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c >>> @@ -18,5 +18,4 @@ popcount_64 (unsigned int *restrict dst, uint64_t *restrict src, int size) >>> } >>> >>> /* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */ >>> -/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */ >>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ >>> +/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */ >>> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c >>> index 585a522aa81..e6e3c70f927 100644 >>> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c >>> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c >>> @@ -1461,4 +1461,4 @@ main () >>> RUN_ALL () >>> } >>> >>> -/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 229 "vect" } } */ >>> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 384 "vect" } } */ >>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc >>> index a9200767f67..fa4ca0634e8 100644 >>> --- a/gcc/tree-vect-stmts.cc >>> +++ b/gcc/tree-vect-stmts.cc >>> @@ -3361,19 +3361,6 @@ vectorizable_call (vec_info *vinfo, >>> >>> return false; >>> } >>> - /* FORNOW: we don't yet support mixtures of vector sizes for calls, >>> - just mixtures of nunits. E.g. DI->SI versions of __builtin_ctz* >>> - are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed >>> - by a pack of the two vectors into an SI vector. We would need >>> - separate code to handle direct VnDI->VnSI IFN_CTZs. */ >>> - if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out)) >>> - { >>> - if (dump_enabled_p ()) >>> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >>> - "mismatched vector sizes %T and %T\n", >>> - vectype_in, vectype_out); >>> - return false; >>> - } >>> >>> if (VECTOR_BOOLEAN_TYPE_P (vectype_out) >>> != VECTOR_BOOLEAN_TYPE_P (vectype_in)) >>> -- >>> 2.34.1 >>>
Thanks Richard S for comments. > In other words, I don't think simply removing the test from the vectoriser > is correct. It needs to be replaced by something more selective. Does it mean we need to check if the internal fun allow different modes/sizes here? For example, standard name lrintmn2 (m, n mode) is allowed here, while rintm2 (only m mode) isn't. Pan -----Original Message----- From: Richard Sandiford <richard.sandiford@arm.com> Sent: Friday, October 27, 2023 1:47 AM To: Richard Biener <richard.guenther@gmail.com> Cc: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com> Subject: Re: [PATCH v2] VECT: Remove the type size restriction of vectorizer Richard Biener <richard.guenther@gmail.com> writes: >> Am 26.10.2023 um 13:59 schrieb Li, Pan2 <pan2.li@intel.com>: >> >> Thanks Richard for comments. >> >>> Can you explain why this is necessary? In particular what is lhs_rtx >>> mode vs ops[0].value mode? >> >> For testcase gcc.target/aarch64/sve/popcount_1.c, the rtl are list as below. >> >> The lhs_rtx is (reg:VNx2SI 98 [ vect__5.36 ]). >> The ops[0].value is (reg:VNx2DI 104). >> >> The restriction removing make the vector rtl enter expand_fn_using_insn and of course hit the INTEGER_P assertion. > > But I think this shows we mid-selected the optab, a convert_move is certainly not correct unconditionally here (the target might not support that) Agreed. Allowing TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out) makes sense if the called function allows the input and output modes to vary. That's true for internal functions that eventually map to two-mode optabs. But we can't remove the condition for calls to other functions, at least not without some fix-ups. ISTM that the problem being hit is the one described by the removed comment. In other words, I don't think simply removing the test from the vectoriser is correct. It needs to be replaced by something more selective. Thanks, Richard >> Pan >> >> -----Original Message----- >> From: Richard Biener <richard.guenther@gmail.com> >> Sent: Thursday, October 26, 2023 4:38 PM >> To: Li, Pan2 <pan2.li@intel.com> >> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>; Richard Sandiford <richard.sandiford@arm.com> >> Subject: Re: [PATCH v2] VECT: Remove the type size restriction of vectorizer >> >>> On Thu, Oct 26, 2023 at 4:18 AM <pan2.li@intel.com> wrote: >>> >>> From: Pan Li <pan2.li@intel.com> >>> >>> Update in v2: >>> >>> * Fix one ICE of type assertion. >>> * Adjust some test cases for aarch64 sve and riscv vector. >>> >>> Original log: >>> >>> The vectoriable_call has one restriction of the size of data type. >>> Aka DF to DI is allowed but SF to DI isn't. You may see below message >>> when try to vectorize function call like lrintf. >>> >>> void >>> test_lrintf (long *out, float *in, unsigned count) >>> { >>> for (unsigned i = 0; i < count; i++) >>> out[i] = __builtin_lrintf (in[i]); >>> } >>> >>> lrintf.c:5:26: missed: couldn't vectorize loop >>> lrintf.c:5:26: missed: not vectorized: unsupported data-type >>> >>> Then the standard name pattern like lrintmn2 cannot work for different >>> data type size like SF => DI. This patch would like to remove this data >>> type size check and unblock the standard name like lrintmn2. >>> >>> The below test are passed for this patch. >>> >>> * The x86 bootstrap and regression test. >>> * The aarch64 regression test. >>> * The risc-v regression tests. >>> >>> gcc/ChangeLog: >>> >>> * internal-fn.cc (expand_fn_using_insn): Add vector int assertion. >>> * tree-vect-stmts.cc (vectorizable_call): Remove size check. >>> >>> gcc/testsuite/ChangeLog: >>> >>> * gcc.target/aarch64/sve/clrsb_1.c: Adjust checker. >>> * gcc.target/aarch64/sve/clz_1.c: Ditto. >>> * gcc.target/aarch64/sve/popcount_1.c: Ditto. >>> * gcc.target/riscv/rvv/autovec/unop/popcount.c: Ditto. >>> >>> Signed-off-by: Pan Li <pan2.li@intel.com> >>> --- >>> gcc/internal-fn.cc | 3 ++- >>> gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c | 3 +-- >>> gcc/testsuite/gcc.target/aarch64/sve/clz_1.c | 3 +-- >>> gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c | 3 +-- >>> .../gcc.target/riscv/rvv/autovec/unop/popcount.c | 2 +- >>> gcc/tree-vect-stmts.cc | 13 ------------- >>> 6 files changed, 6 insertions(+), 21 deletions(-) >>> >>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc >>> index 61d5a9e4772..17c0f4c3805 100644 >>> --- a/gcc/internal-fn.cc >>> +++ b/gcc/internal-fn.cc >>> @@ -281,7 +281,8 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs, >>> emit_move_insn (lhs_rtx, ops[0].value); >>> else >>> { >>> - gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs))); >>> + gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs)) >>> + || VECTOR_INTEGER_TYPE_P (TREE_TYPE (lhs))); >> >> Can you explain why this is necessary? In particular what is lhs_rtx >> mode vs ops[0].value mode? >> >>> convert_move (lhs_rtx, ops[0].value, 0); >> >> I'm not sure convert_move handles vector modes correctly. Richard >> probably added this code, CCed. >> >> Richard. >> >>> } >>> } >>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c >>> index bdc9856faaf..940d08bbc7b 100644 >>> --- a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c >>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c >>> @@ -18,5 +18,4 @@ clrsb_64 (unsigned int *restrict dst, uint64_t *restrict src, int size) >>> } >>> >>> /* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */ >>> -/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */ >>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ >>> +/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */ >>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c >>> index 0c7a4e6d768..58b8ff406d2 100644 >>> --- a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c >>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c >>> @@ -18,5 +18,4 @@ clz_64 (unsigned int *restrict dst, uint64_t *restrict src, int size) >>> } >>> >>> /* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */ >>> -/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */ >>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ >>> +/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */ >>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c >>> index dfb6f4ac7a5..0eba898307c 100644 >>> --- a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c >>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c >>> @@ -18,5 +18,4 @@ popcount_64 (unsigned int *restrict dst, uint64_t *restrict src, int size) >>> } >>> >>> /* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */ >>> -/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */ >>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ >>> +/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */ >>> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c >>> index 585a522aa81..e6e3c70f927 100644 >>> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c >>> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c >>> @@ -1461,4 +1461,4 @@ main () >>> RUN_ALL () >>> } >>> >>> -/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 229 "vect" } } */ >>> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 384 "vect" } } */ >>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc >>> index a9200767f67..fa4ca0634e8 100644 >>> --- a/gcc/tree-vect-stmts.cc >>> +++ b/gcc/tree-vect-stmts.cc >>> @@ -3361,19 +3361,6 @@ vectorizable_call (vec_info *vinfo, >>> >>> return false; >>> } >>> - /* FORNOW: we don't yet support mixtures of vector sizes for calls, >>> - just mixtures of nunits. E.g. DI->SI versions of __builtin_ctz* >>> - are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed >>> - by a pack of the two vectors into an SI vector. We would need >>> - separate code to handle direct VnDI->VnSI IFN_CTZs. */ >>> - if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out)) >>> - { >>> - if (dump_enabled_p ()) >>> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >>> - "mismatched vector sizes %T and %T\n", >>> - vectype_in, vectype_out); >>> - return false; >>> - } >>> >>> if (VECTOR_BOOLEAN_TYPE_P (vectype_out) >>> != VECTOR_BOOLEAN_TYPE_P (vectype_in)) >>> -- >>> 2.34.1 >>>
On Fri, Oct 27, 2023 at 4:17 AM Li, Pan2 <pan2.li@intel.com> wrote: > > Thanks Richard S for comments. > > > In other words, I don't think simply removing the test from the vectoriser > > is correct. It needs to be replaced by something more selective. > > Does it mean we need to check if the internal fun allow different modes/sizes here? > > For example, standard name lrintmn2 (m, n mode) is allowed here, while rintm2 (only m mode) isn't. We need to check whether the "size" of the LHS is somehow participating in the optab query. I think the direct_internal_fn_info type0/1 members, when -1 say that. If none is -1 (and -2) then the LHS has to match one of the arguments (if there are two different I'm not sure which we'd pick). So patch-wise the existing check can probably be skipped when vectorizable_internal_function returns an IFN but that function should have the very same check when vectype_out isn't participating in the optab selection. Richard. > > Pan > > -----Original Message----- > From: Richard Sandiford <richard.sandiford@arm.com> > Sent: Friday, October 27, 2023 1:47 AM > To: Richard Biener <richard.guenther@gmail.com> > Cc: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com> > Subject: Re: [PATCH v2] VECT: Remove the type size restriction of vectorizer > > Richard Biener <richard.guenther@gmail.com> writes: > >> Am 26.10.2023 um 13:59 schrieb Li, Pan2 <pan2.li@intel.com>: > >> > >> Thanks Richard for comments. > >> > >>> Can you explain why this is necessary? In particular what is lhs_rtx > >>> mode vs ops[0].value mode? > >> > >> For testcase gcc.target/aarch64/sve/popcount_1.c, the rtl are list as below. > >> > >> The lhs_rtx is (reg:VNx2SI 98 [ vect__5.36 ]). > >> The ops[0].value is (reg:VNx2DI 104). > >> > >> The restriction removing make the vector rtl enter expand_fn_using_insn and of course hit the INTEGER_P assertion. > > > > But I think this shows we mid-selected the optab, a convert_move is certainly not correct unconditionally here (the target might not support that) > > Agreed. Allowing TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out) > makes sense if the called function allows the input and output modes > to vary. That's true for internal functions that eventually map to > two-mode optabs. But we can't remove the condition for calls to > other functions, at least not without some fix-ups. > > ISTM that the problem being hit is the one described by the removed > comment. > > In other words, I don't think simply removing the test from the vectoriser > is correct. It needs to be replaced by something more selective. > > Thanks, > Richard > > >> Pan > >> > >> -----Original Message----- > >> From: Richard Biener <richard.guenther@gmail.com> > >> Sent: Thursday, October 26, 2023 4:38 PM > >> To: Li, Pan2 <pan2.li@intel.com> > >> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com>; Richard Sandiford <richard.sandiford@arm.com> > >> Subject: Re: [PATCH v2] VECT: Remove the type size restriction of vectorizer > >> > >>> On Thu, Oct 26, 2023 at 4:18 AM <pan2.li@intel.com> wrote: > >>> > >>> From: Pan Li <pan2.li@intel.com> > >>> > >>> Update in v2: > >>> > >>> * Fix one ICE of type assertion. > >>> * Adjust some test cases for aarch64 sve and riscv vector. > >>> > >>> Original log: > >>> > >>> The vectoriable_call has one restriction of the size of data type. > >>> Aka DF to DI is allowed but SF to DI isn't. You may see below message > >>> when try to vectorize function call like lrintf. > >>> > >>> void > >>> test_lrintf (long *out, float *in, unsigned count) > >>> { > >>> for (unsigned i = 0; i < count; i++) > >>> out[i] = __builtin_lrintf (in[i]); > >>> } > >>> > >>> lrintf.c:5:26: missed: couldn't vectorize loop > >>> lrintf.c:5:26: missed: not vectorized: unsupported data-type > >>> > >>> Then the standard name pattern like lrintmn2 cannot work for different > >>> data type size like SF => DI. This patch would like to remove this data > >>> type size check and unblock the standard name like lrintmn2. > >>> > >>> The below test are passed for this patch. > >>> > >>> * The x86 bootstrap and regression test. > >>> * The aarch64 regression test. > >>> * The risc-v regression tests. > >>> > >>> gcc/ChangeLog: > >>> > >>> * internal-fn.cc (expand_fn_using_insn): Add vector int assertion. > >>> * tree-vect-stmts.cc (vectorizable_call): Remove size check. > >>> > >>> gcc/testsuite/ChangeLog: > >>> > >>> * gcc.target/aarch64/sve/clrsb_1.c: Adjust checker. > >>> * gcc.target/aarch64/sve/clz_1.c: Ditto. > >>> * gcc.target/aarch64/sve/popcount_1.c: Ditto. > >>> * gcc.target/riscv/rvv/autovec/unop/popcount.c: Ditto. > >>> > >>> Signed-off-by: Pan Li <pan2.li@intel.com> > >>> --- > >>> gcc/internal-fn.cc | 3 ++- > >>> gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c | 3 +-- > >>> gcc/testsuite/gcc.target/aarch64/sve/clz_1.c | 3 +-- > >>> gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c | 3 +-- > >>> .../gcc.target/riscv/rvv/autovec/unop/popcount.c | 2 +- > >>> gcc/tree-vect-stmts.cc | 13 ------------- > >>> 6 files changed, 6 insertions(+), 21 deletions(-) > >>> > >>> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc > >>> index 61d5a9e4772..17c0f4c3805 100644 > >>> --- a/gcc/internal-fn.cc > >>> +++ b/gcc/internal-fn.cc > >>> @@ -281,7 +281,8 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs, > >>> emit_move_insn (lhs_rtx, ops[0].value); > >>> else > >>> { > >>> - gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs))); > >>> + gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs)) > >>> + || VECTOR_INTEGER_TYPE_P (TREE_TYPE (lhs))); > >> > >> Can you explain why this is necessary? In particular what is lhs_rtx > >> mode vs ops[0].value mode? > >> > >>> convert_move (lhs_rtx, ops[0].value, 0); > >> > >> I'm not sure convert_move handles vector modes correctly. Richard > >> probably added this code, CCed. > >> > >> Richard. > >> > >>> } > >>> } > >>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c > >>> index bdc9856faaf..940d08bbc7b 100644 > >>> --- a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c > >>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c > >>> @@ -18,5 +18,4 @@ clrsb_64 (unsigned int *restrict dst, uint64_t *restrict src, int size) > >>> } > >>> > >>> /* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */ > >>> -/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */ > >>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ > >>> +/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */ > >>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c > >>> index 0c7a4e6d768..58b8ff406d2 100644 > >>> --- a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c > >>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c > >>> @@ -18,5 +18,4 @@ clz_64 (unsigned int *restrict dst, uint64_t *restrict src, int size) > >>> } > >>> > >>> /* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */ > >>> -/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */ > >>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ > >>> +/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */ > >>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c > >>> index dfb6f4ac7a5..0eba898307c 100644 > >>> --- a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c > >>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c > >>> @@ -18,5 +18,4 @@ popcount_64 (unsigned int *restrict dst, uint64_t *restrict src, int size) > >>> } > >>> > >>> /* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */ > >>> -/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */ > >>> -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ > >>> +/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */ > >>> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c > >>> index 585a522aa81..e6e3c70f927 100644 > >>> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c > >>> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c > >>> @@ -1461,4 +1461,4 @@ main () > >>> RUN_ALL () > >>> } > >>> > >>> -/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 229 "vect" } } */ > >>> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 384 "vect" } } */ > >>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > >>> index a9200767f67..fa4ca0634e8 100644 > >>> --- a/gcc/tree-vect-stmts.cc > >>> +++ b/gcc/tree-vect-stmts.cc > >>> @@ -3361,19 +3361,6 @@ vectorizable_call (vec_info *vinfo, > >>> > >>> return false; > >>> } > >>> - /* FORNOW: we don't yet support mixtures of vector sizes for calls, > >>> - just mixtures of nunits. E.g. DI->SI versions of __builtin_ctz* > >>> - are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed > >>> - by a pack of the two vectors into an SI vector. We would need > >>> - separate code to handle direct VnDI->VnSI IFN_CTZs. */ > >>> - if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out)) > >>> - { > >>> - if (dump_enabled_p ()) > >>> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > >>> - "mismatched vector sizes %T and %T\n", > >>> - vectype_in, vectype_out); > >>> - return false; > >>> - } > >>> > >>> if (VECTOR_BOOLEAN_TYPE_P (vectype_out) > >>> != VECTOR_BOOLEAN_TYPE_P (vectype_in)) > >>> -- > >>> 2.34.1 > >>>
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index 61d5a9e4772..17c0f4c3805 100644 --- a/gcc/internal-fn.cc +++ b/gcc/internal-fn.cc @@ -281,7 +281,8 @@ expand_fn_using_insn (gcall *stmt, insn_code icode, unsigned int noutputs, emit_move_insn (lhs_rtx, ops[0].value); else { - gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs))); + gcc_checking_assert (INTEGRAL_TYPE_P (TREE_TYPE (lhs)) + || VECTOR_INTEGER_TYPE_P (TREE_TYPE (lhs))); convert_move (lhs_rtx, ops[0].value, 0); } } diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c index bdc9856faaf..940d08bbc7b 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/clrsb_1.c @@ -18,5 +18,4 @@ clrsb_64 (unsigned int *restrict dst, uint64_t *restrict src, int size) } /* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */ -/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */ -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tcls\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c index 0c7a4e6d768..58b8ff406d2 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/clz_1.c @@ -18,5 +18,4 @@ clz_64 (unsigned int *restrict dst, uint64_t *restrict src, int size) } /* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */ -/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */ -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tclz\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c index dfb6f4ac7a5..0eba898307c 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/popcount_1.c @@ -18,5 +18,4 @@ popcount_64 (unsigned int *restrict dst, uint64_t *restrict src, int size) } /* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s\n} 1 } } */ -/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 2 } } */ -/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tcnt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d\n} 1 } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c index 585a522aa81..e6e3c70f927 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/popcount.c @@ -1461,4 +1461,4 @@ main () RUN_ALL () } -/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 229 "vect" } } */ +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 384 "vect" } } */ diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index a9200767f67..fa4ca0634e8 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -3361,19 +3361,6 @@ vectorizable_call (vec_info *vinfo, return false; } - /* FORNOW: we don't yet support mixtures of vector sizes for calls, - just mixtures of nunits. E.g. DI->SI versions of __builtin_ctz* - are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed - by a pack of the two vectors into an SI vector. We would need - separate code to handle direct VnDI->VnSI IFN_CTZs. */ - if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out)) - { - if (dump_enabled_p ()) - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, - "mismatched vector sizes %T and %T\n", - vectype_in, vectype_out); - return false; - } if (VECTOR_BOOLEAN_TYPE_P (vectype_out) != VECTOR_BOOLEAN_TYPE_P (vectype_in))