Message ID | 20231018012009.849697-1-pan2.li@intel.com |
---|---|
State | New |
Headers | show |
Series | [v1] RISC-V: Remove the type size restriction of vectorizer | expand |
On Wed, Oct 18, 2023 at 3:20 AM <pan2.li@intel.com> wrote: > > From: Pan Li <pan2.li@intel.com> > > The vectoriable_call has one restriction of the size of data type. > Aka DF to DI is allowed but SF to DI isn't. You may see below message > when try to vectorize function call like lrintf. > > void > test_lrintf (long *out, float *in, unsigned count) > { > for (unsigned i = 0; i < count; i++) > out[i] = __builtin_lrintf (in[i]); > } > > lrintf.c:5:26: missed: couldn't vectorize loop > lrintf.c:5:26: missed: not vectorized: unsupported data-type > > Then the standard name pattern like lrintmn2 cannot work for different > data type size like SF => DI. This patch would like to remove this data > type size check and unblock the standard name like lrintmn2. > > Passed the x86 bootstrap and regression test already. OK. On x86 we seem to have lrintsfdi2 but not lrintv4sfv4di2, with SLP vectorization we could expect to see the following vectorized after the patch (with loop vectorization you'll see us pre-select same sized vector types) long int x[4]; float y[4]; void foo () { x[0] = __builtin_lrintf (y[0]); x[1] = __builtin_lrintf (y[1]); x[2] = __builtin_lrintf (y[2]); x[3] = __builtin_lrintf (y[3]); } > gcc/ChangeLog: > > * tree-vect-stmts.cc (vectorizable_call): Remove data size > check. > > Signed-off-by: Pan Li <pan2.li@intel.com> > --- > gcc/tree-vect-stmts.cc | 13 ------------- > 1 file changed, 13 deletions(-) > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > index b3a56498595..326e000a71d 100644 > --- a/gcc/tree-vect-stmts.cc > +++ b/gcc/tree-vect-stmts.cc > @@ -3529,19 +3529,6 @@ vectorizable_call (vec_info *vinfo, > > return false; > } > - /* FORNOW: we don't yet support mixtures of vector sizes for calls, > - just mixtures of nunits. E.g. DI->SI versions of __builtin_ctz* > - are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed > - by a pack of the two vectors into an SI vector. We would need > - separate code to handle direct VnDI->VnSI IFN_CTZs. */ > - if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out)) > - { > - if (dump_enabled_p ()) > - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > - "mismatched vector sizes %T and %T\n", > - vectype_in, vectype_out); > - return false; > - } > > if (VECTOR_BOOLEAN_TYPE_P (vectype_out) > != VECTOR_BOOLEAN_TYPE_P (vectype_in)) > -- > 2.34.1 >
Thanks Richard, let's wait for a while incase there are comments from others due to not familiar with these parts. Pan -----Original Message----- From: Richard Biener <richard.guenther@gmail.com> Sent: Wednesday, October 18, 2023 2:34 PM To: Li, Pan2 <pan2.li@intel.com> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com> Subject: Re: [PATCH v1] RISC-V: Remove the type size restriction of vectorizer On Wed, Oct 18, 2023 at 3:20 AM <pan2.li@intel.com> wrote: > > From: Pan Li <pan2.li@intel.com> > > The vectoriable_call has one restriction of the size of data type. > Aka DF to DI is allowed but SF to DI isn't. You may see below message > when try to vectorize function call like lrintf. > > void > test_lrintf (long *out, float *in, unsigned count) > { > for (unsigned i = 0; i < count; i++) > out[i] = __builtin_lrintf (in[i]); > } > > lrintf.c:5:26: missed: couldn't vectorize loop > lrintf.c:5:26: missed: not vectorized: unsupported data-type > > Then the standard name pattern like lrintmn2 cannot work for different > data type size like SF => DI. This patch would like to remove this data > type size check and unblock the standard name like lrintmn2. > > Passed the x86 bootstrap and regression test already. OK. On x86 we seem to have lrintsfdi2 but not lrintv4sfv4di2, with SLP vectorization we could expect to see the following vectorized after the patch (with loop vectorization you'll see us pre-select same sized vector types) long int x[4]; float y[4]; void foo () { x[0] = __builtin_lrintf (y[0]); x[1] = __builtin_lrintf (y[1]); x[2] = __builtin_lrintf (y[2]); x[3] = __builtin_lrintf (y[3]); } > gcc/ChangeLog: > > * tree-vect-stmts.cc (vectorizable_call): Remove data size > check. > > Signed-off-by: Pan Li <pan2.li@intel.com> > --- > gcc/tree-vect-stmts.cc | 13 ------------- > 1 file changed, 13 deletions(-) > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > index b3a56498595..326e000a71d 100644 > --- a/gcc/tree-vect-stmts.cc > +++ b/gcc/tree-vect-stmts.cc > @@ -3529,19 +3529,6 @@ vectorizable_call (vec_info *vinfo, > > return false; > } > - /* FORNOW: we don't yet support mixtures of vector sizes for calls, > - just mixtures of nunits. E.g. DI->SI versions of __builtin_ctz* > - are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed > - by a pack of the two vectors into an SI vector. We would need > - separate code to handle direct VnDI->VnSI IFN_CTZs. */ > - if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out)) > - { > - if (dump_enabled_p ()) > - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > - "mismatched vector sizes %T and %T\n", > - vectype_in, vectype_out); > - return false; > - } > > if (VECTOR_BOOLEAN_TYPE_P (vectype_out) > != VECTOR_BOOLEAN_TYPE_P (vectype_in)) > -- > 2.34.1 >
Hi Richard Biener, The CI of linaro-toolchain@lists.linaro.org reports some aarch64 regression of this change, I will double check about it soon. FAIL: 12 regressions regressions.sum: === gcc tests === Running gcc:gcc.target/aarch64/sve/aarch64-sve.exp ... FAIL: gcc.target/aarch64/sve/clrsb_1.c (internal compiler error: in expand_fn_using_insn, at internal-fn.cc:284) FAIL: gcc.target/aarch64/sve/clrsb_1.c (test for excess errors) FAIL: gcc.target/aarch64/sve/clrsb_1.c scan-assembler-times \\tcls\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.d\\n 2 FAIL: gcc.target/aarch64/sve/clrsb_1.c scan-assembler-times \\tuzp1\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1 FAIL: gcc.target/aarch64/sve/clz_1.c (internal compiler error: in expand_fn_using_insn, at internal-fn.cc:284) FAIL: gcc.target/aarch64/sve/clz_1.c (test for excess errors) FAIL: gcc.target/aarch64/sve/clz_1.c scan-assembler-times \\tclz\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.d\\n 2 ... and 7 more entries Pan -----Original Message----- From: Richard Biener <richard.guenther@gmail.com> Sent: Wednesday, October 18, 2023 2:34 PM To: Li, Pan2 <pan2.li@intel.com> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com> Subject: Re: [PATCH v1] RISC-V: Remove the type size restriction of vectorizer On Wed, Oct 18, 2023 at 3:20 AM <pan2.li@intel.com> wrote: > > From: Pan Li <pan2.li@intel.com> > > The vectoriable_call has one restriction of the size of data type. > Aka DF to DI is allowed but SF to DI isn't. You may see below message > when try to vectorize function call like lrintf. > > void > test_lrintf (long *out, float *in, unsigned count) > { > for (unsigned i = 0; i < count; i++) > out[i] = __builtin_lrintf (in[i]); > } > > lrintf.c:5:26: missed: couldn't vectorize loop > lrintf.c:5:26: missed: not vectorized: unsupported data-type > > Then the standard name pattern like lrintmn2 cannot work for different > data type size like SF => DI. This patch would like to remove this data > type size check and unblock the standard name like lrintmn2. > > Passed the x86 bootstrap and regression test already. OK. On x86 we seem to have lrintsfdi2 but not lrintv4sfv4di2, with SLP vectorization we could expect to see the following vectorized after the patch (with loop vectorization you'll see us pre-select same sized vector types) long int x[4]; float y[4]; void foo () { x[0] = __builtin_lrintf (y[0]); x[1] = __builtin_lrintf (y[1]); x[2] = __builtin_lrintf (y[2]); x[3] = __builtin_lrintf (y[3]); } > gcc/ChangeLog: > > * tree-vect-stmts.cc (vectorizable_call): Remove data size > check. > > Signed-off-by: Pan Li <pan2.li@intel.com> > --- > gcc/tree-vect-stmts.cc | 13 ------------- > 1 file changed, 13 deletions(-) > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > index b3a56498595..326e000a71d 100644 > --- a/gcc/tree-vect-stmts.cc > +++ b/gcc/tree-vect-stmts.cc > @@ -3529,19 +3529,6 @@ vectorizable_call (vec_info *vinfo, > > return false; > } > - /* FORNOW: we don't yet support mixtures of vector sizes for calls, > - just mixtures of nunits. E.g. DI->SI versions of __builtin_ctz* > - are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed > - by a pack of the two vectors into an SI vector. We would need > - separate code to handle direct VnDI->VnSI IFN_CTZs. */ > - if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out)) > - { > - if (dump_enabled_p ()) > - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > - "mismatched vector sizes %T and %T\n", > - vectype_in, vectype_out); > - return false; > - } > > if (VECTOR_BOOLEAN_TYPE_P (vectype_out) > != VECTOR_BOOLEAN_TYPE_P (vectype_in)) > -- > 2.34.1 >
/* We only enable VLS modes for VLA vectorization since fixed length VLMAX mode - is the highest priority choice and should not conflict with VLS modes. */ -#define TARGET_VECTOR_VLS \ - (TARGET_VECTOR && riscv_autovec_preference == RVV_SCALABLE) + is the highest priority choice and should not conflict with VLS modes. + + We also enable VLS modes for some cases in fixed-vlmax, aka the bitsize of + the VLS mode are smaller than the minimal vla. + */ +#define TARGET_VECTOR_VLS(mode) \ + (TARGET_VECTOR \ + && (riscv_autovec_preference == RVV_SCALABLE \ + || (riscv_autovec_preference == RVV_FIXED_VLMAX \ + && riscv_vector::legal_vls_mode_in_fixed_vlmax_p (mode)))) Remove this macro. Create a function called "vls_mode_valid_p" and use riscv_vector::vls_mode_valid_p in all places. Indicate this PR in commit title: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111857 -> RISC-V: Support partial VLS mode when preference fixed-vlmax[PR111857] +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/vls-mode-6.c Change all tests name into pr111857-1.c...etc. juzhe.zhong@rivai.ai From: Li, Pan2 Date: 2023-10-20 16:43 To: Richard Biener CC: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang; kito.cheng@gmail.com; Liu, Hongtao Subject: RE: [PATCH v1] RISC-V: Remove the type size restriction of vectorizer Hi Richard Biener, The CI of linaro-toolchain@lists.linaro.org reports some aarch64 regression of this change, I will double check about it soon. FAIL: 12 regressions regressions.sum: === gcc tests === Running gcc:gcc.target/aarch64/sve/aarch64-sve.exp ... FAIL: gcc.target/aarch64/sve/clrsb_1.c (internal compiler error: in expand_fn_using_insn, at internal-fn.cc:284) FAIL: gcc.target/aarch64/sve/clrsb_1.c (test for excess errors) FAIL: gcc.target/aarch64/sve/clrsb_1.c scan-assembler-times \\tcls\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.d\\n 2 FAIL: gcc.target/aarch64/sve/clrsb_1.c scan-assembler-times \\tuzp1\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1 FAIL: gcc.target/aarch64/sve/clz_1.c (internal compiler error: in expand_fn_using_insn, at internal-fn.cc:284) FAIL: gcc.target/aarch64/sve/clz_1.c (test for excess errors) FAIL: gcc.target/aarch64/sve/clz_1.c scan-assembler-times \\tclz\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.d\\n 2 ... and 7 more entries Pan -----Original Message----- From: Richard Biener <richard.guenther@gmail.com> Sent: Wednesday, October 18, 2023 2:34 PM To: Li, Pan2 <pan2.li@intel.com> Cc: gcc-patches@gcc.gnu.org; juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; Liu, Hongtao <hongtao.liu@intel.com> Subject: Re: [PATCH v1] RISC-V: Remove the type size restriction of vectorizer On Wed, Oct 18, 2023 at 3:20 AM <pan2.li@intel.com> wrote: > > From: Pan Li <pan2.li@intel.com> > > The vectoriable_call has one restriction of the size of data type. > Aka DF to DI is allowed but SF to DI isn't. You may see below message > when try to vectorize function call like lrintf. > > void > test_lrintf (long *out, float *in, unsigned count) > { > for (unsigned i = 0; i < count; i++) > out[i] = __builtin_lrintf (in[i]); > } > > lrintf.c:5:26: missed: couldn't vectorize loop > lrintf.c:5:26: missed: not vectorized: unsupported data-type > > Then the standard name pattern like lrintmn2 cannot work for different > data type size like SF => DI. This patch would like to remove this data > type size check and unblock the standard name like lrintmn2. > > Passed the x86 bootstrap and regression test already. OK. On x86 we seem to have lrintsfdi2 but not lrintv4sfv4di2, with SLP vectorization we could expect to see the following vectorized after the patch (with loop vectorization you'll see us pre-select same sized vector types) long int x[4]; float y[4]; void foo () { x[0] = __builtin_lrintf (y[0]); x[1] = __builtin_lrintf (y[1]); x[2] = __builtin_lrintf (y[2]); x[3] = __builtin_lrintf (y[3]); } > gcc/ChangeLog: > > * tree-vect-stmts.cc (vectorizable_call): Remove data size > check. > > Signed-off-by: Pan Li <pan2.li@intel.com> > --- > gcc/tree-vect-stmts.cc | 13 ------------- > 1 file changed, 13 deletions(-) > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > index b3a56498595..326e000a71d 100644 > --- a/gcc/tree-vect-stmts.cc > +++ b/gcc/tree-vect-stmts.cc > @@ -3529,19 +3529,6 @@ vectorizable_call (vec_info *vinfo, > > return false; > } > - /* FORNOW: we don't yet support mixtures of vector sizes for calls, > - just mixtures of nunits. E.g. DI->SI versions of __builtin_ctz* > - are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed > - by a pack of the two vectors into an SI vector. We would need > - separate code to handle direct VnDI->VnSI IFN_CTZs. */ > - if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out)) > - { > - if (dump_enabled_p ()) > - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > - "mismatched vector sizes %T and %T\n", > - vectype_in, vectype_out); > - return false; > - } > > if (VECTOR_BOOLEAN_TYPE_P (vectype_out) > != VECTOR_BOOLEAN_TYPE_P (vectype_in)) > -- > 2.34.1 >
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index b3a56498595..326e000a71d 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -3529,19 +3529,6 @@ vectorizable_call (vec_info *vinfo, return false; } - /* FORNOW: we don't yet support mixtures of vector sizes for calls, - just mixtures of nunits. E.g. DI->SI versions of __builtin_ctz* - are traditionally vectorized as two VnDI->VnDI IFN_CTZs followed - by a pack of the two vectors into an SI vector. We would need - separate code to handle direct VnDI->VnSI IFN_CTZs. */ - if (TYPE_SIZE (vectype_in) != TYPE_SIZE (vectype_out)) - { - if (dump_enabled_p ()) - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, - "mismatched vector sizes %T and %T\n", - vectype_in, vectype_out); - return false; - } if (VECTOR_BOOLEAN_TYPE_P (vectype_out) != VECTOR_BOOLEAN_TYPE_P (vectype_in))