Message ID | 20240530002147.7740-1-quic_pzheng@quicinc.com |
---|---|
State | New |
Headers | show |
Series | aarch64: Add vector floating point extend patterns [PR113880, PR113869] | expand |
Pengxuan Zheng <quic_pzheng@quicinc.com> writes: > This patch improves vectorization of certain floating point widening operations > for the aarch64 target by adding vector floating point extend patterns for > V2SF->V2DF and V4HF->V4SF conversions. > > PR target/113880 > PR target/113869 > > gcc/ChangeLog: > > * config/aarch64/aarch64-simd.md (extend<mode><Vwide>2): New expand. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/extend-vec.c: New test. > > Signed-off-by: Pengxuan Zheng <quic_pzheng@quicinc.com> Thanks for doing this. Could we instead rename aarch64_float_extend_lo_<Vwide> to extend<mode><Vwide>2 and use something similar to: ----------------------------------------------------------------------- /* The builtins below should be expanded through the standard optabs CODE_FOR_[u]avg<mode>3_[floor,ceil]. However the mapping scheme in aarch64-simd-builtins.def does not easily allow us to have a pre-mode ("uavg") and post-mode string ("_ceil") in the CODE_FOR_* construction. So the builtins use a name that is natural for AArch64 instructions e.g. "aarch64_srhadd<mode>" and we re-map these to the optab-related CODE_FOR_ here. */ #undef VAR1 #define VAR1(F,T1,T2,I,M) \ constexpr insn_code CODE_FOR_aarch64_##F##M = CODE_FOR_##T1##M##3##T2; BUILTIN_VDQ_BHSI (srhadd, avg, _ceil, 0) BUILTIN_VDQ_BHSI (urhadd, uavg, _ceil, 0) BUILTIN_VDQ_BHSI (shadd, avg, _floor, 0) BUILTIN_VDQ_BHSI (uhadd, uavg, _floor, 0) #undef VAR1 ----------------------------------------------------------------------- (from aarch64-builtins.cc) to handle the intrinsics? The idea is to try to avoid adding new patterns just to satisfy the internal naming convention. Richard > --- > gcc/config/aarch64/aarch64-simd.md | 7 +++++++ > gcc/testsuite/gcc.target/aarch64/extend-vec.c | 21 +++++++++++++++++++ > 2 files changed, 28 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/aarch64/extend-vec.c > > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md > index 868f4486218..8febb411d06 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -3141,6 +3141,13 @@ (define_insn "aarch64_float_extend_lo_<Vwide>" > [(set_attr "type" "neon_fp_cvt_widen_s")] > ) > > +(define_expand "extend<mode><Vwide>2" > + [(set (match_operand:<VWIDE> 0 "register_operand" "=w") > + (float_extend:<VWIDE> > + (match_operand:VDF 1 "register_operand" "w")))] > + "TARGET_SIMD" > +) > + > ;; Float narrowing operations. > > (define_insn "aarch64_float_trunc_rodd_df" > diff --git a/gcc/testsuite/gcc.target/aarch64/extend-vec.c b/gcc/testsuite/gcc.target/aarch64/extend-vec.c > new file mode 100644 > index 00000000000..f62418888d5 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/extend-vec.c > @@ -0,0 +1,21 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +/* { dg-final { scan-assembler-times {fcvtl\tv[0-9]+.2d, v[0-9]+.2s} 1 } } */ > +void > +f (float *__restrict a, double *__restrict b) > +{ > + b[0] = a[0]; > + b[1] = a[1]; > +} > + > +/* { dg-final { scan-assembler-times {fcvtl\tv[0-9]+.4s, v[0-9]+.4h} 1 } } */ > +void > +f1 (_Float16 *__restrict a, float *__restrict b) > +{ > + > + b[0] = a[0]; > + b[1] = a[1]; > + b[2] = a[2]; > + b[3] = a[3]; > +}
> Pengxuan Zheng <quic_pzheng@quicinc.com> writes: > > This patch improves vectorization of certain floating point widening > > operations for the aarch64 target by adding vector floating point > > extend patterns for > > V2SF->V2DF and V4HF->V4SF conversions. > > > > PR target/113880 > > PR target/113869 > > > > gcc/ChangeLog: > > > > * config/aarch64/aarch64-simd.md (extend<mode><Vwide>2): New > expand. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/aarch64/extend-vec.c: New test. > > > > Signed-off-by: Pengxuan Zheng <quic_pzheng@quicinc.com> > > Thanks for doing this. Could we instead rename > aarch64_float_extend_lo_<Vwide> to extend<mode><Vwide>2 and use > something similar to: > > ----------------------------------------------------------------------- > /* The builtins below should be expanded through the standard optabs > CODE_FOR_[u]avg<mode>3_[floor,ceil]. However the mapping scheme in > aarch64-simd-builtins.def does not easily allow us to have a pre-mode > ("uavg") and post-mode string ("_ceil") in the CODE_FOR_* construction. > So the builtins use a name that is natural for AArch64 instructions > e.g. "aarch64_srhadd<mode>" and we re-map these to the optab-related > CODE_FOR_ here. */ > #undef VAR1 > #define VAR1(F,T1,T2,I,M) \ > constexpr insn_code CODE_FOR_aarch64_##F##M = > CODE_FOR_##T1##M##3##T2; > > BUILTIN_VDQ_BHSI (srhadd, avg, _ceil, 0) BUILTIN_VDQ_BHSI (urhadd, uavg, > _ceil, 0) BUILTIN_VDQ_BHSI (shadd, avg, _floor, 0) BUILTIN_VDQ_BHSI > (uhadd, uavg, _floor, 0) > > #undef VAR1 > ----------------------------------------------------------------------- > > (from aarch64-builtins.cc) to handle the intrinsics? The idea is to try to avoid > adding new patterns just to satisfy the internal naming convention. Sure, Richard. Here's the updated patch https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653177.html. Please let me know if I missed anything. Thanks, Pengxuan > > Richard > > > --- > > gcc/config/aarch64/aarch64-simd.md | 7 +++++++ > > gcc/testsuite/gcc.target/aarch64/extend-vec.c | 21 > > +++++++++++++++++++ > > 2 files changed, 28 insertions(+) > > create mode 100644 gcc/testsuite/gcc.target/aarch64/extend-vec.c > > > > diff --git a/gcc/config/aarch64/aarch64-simd.md > > b/gcc/config/aarch64/aarch64-simd.md > > index 868f4486218..8febb411d06 100644 > > --- a/gcc/config/aarch64/aarch64-simd.md > > +++ b/gcc/config/aarch64/aarch64-simd.md > > @@ -3141,6 +3141,13 @@ (define_insn > "aarch64_float_extend_lo_<Vwide>" > > [(set_attr "type" "neon_fp_cvt_widen_s")] > > ) > > > > +(define_expand "extend<mode><Vwide>2" > > + [(set (match_operand:<VWIDE> 0 "register_operand" "=w") > > + (float_extend:<VWIDE> > > + (match_operand:VDF 1 "register_operand" "w")))] > > + "TARGET_SIMD" > > +) > > + > > ;; Float narrowing operations. > > > > (define_insn "aarch64_float_trunc_rodd_df" > > diff --git a/gcc/testsuite/gcc.target/aarch64/extend-vec.c > > b/gcc/testsuite/gcc.target/aarch64/extend-vec.c > > new file mode 100644 > > index 00000000000..f62418888d5 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/aarch64/extend-vec.c > > @@ -0,0 +1,21 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2" } */ > > + > > +/* { dg-final { scan-assembler-times {fcvtl\tv[0-9]+.2d, v[0-9]+.2s} > > +1 } } */ void f (float *__restrict a, double *__restrict b) { > > + b[0] = a[0]; > > + b[1] = a[1]; > > +} > > + > > +/* { dg-final { scan-assembler-times {fcvtl\tv[0-9]+.4s, v[0-9]+.4h} > > +1 } } */ void > > +f1 (_Float16 *__restrict a, float *__restrict b) { > > + > > + b[0] = a[0]; > > + b[1] = a[1]; > > + b[2] = a[2]; > > + b[3] = a[3]; > > +}
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 868f4486218..8febb411d06 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -3141,6 +3141,13 @@ (define_insn "aarch64_float_extend_lo_<Vwide>" [(set_attr "type" "neon_fp_cvt_widen_s")] ) +(define_expand "extend<mode><Vwide>2" + [(set (match_operand:<VWIDE> 0 "register_operand" "=w") + (float_extend:<VWIDE> + (match_operand:VDF 1 "register_operand" "w")))] + "TARGET_SIMD" +) + ;; Float narrowing operations. (define_insn "aarch64_float_trunc_rodd_df" diff --git a/gcc/testsuite/gcc.target/aarch64/extend-vec.c b/gcc/testsuite/gcc.target/aarch64/extend-vec.c new file mode 100644 index 00000000000..f62418888d5 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/extend-vec.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +/* { dg-final { scan-assembler-times {fcvtl\tv[0-9]+.2d, v[0-9]+.2s} 1 } } */ +void +f (float *__restrict a, double *__restrict b) +{ + b[0] = a[0]; + b[1] = a[1]; +} + +/* { dg-final { scan-assembler-times {fcvtl\tv[0-9]+.4s, v[0-9]+.4h} 1 } } */ +void +f1 (_Float16 *__restrict a, float *__restrict b) +{ + + b[0] = a[0]; + b[1] = a[1]; + b[2] = a[2]; + b[3] = a[3]; +}
This patch improves vectorization of certain floating point widening operations for the aarch64 target by adding vector floating point extend patterns for V2SF->V2DF and V4HF->V4SF conversions. PR target/113880 PR target/113869 gcc/ChangeLog: * config/aarch64/aarch64-simd.md (extend<mode><Vwide>2): New expand. gcc/testsuite/ChangeLog: * gcc.target/aarch64/extend-vec.c: New test. Signed-off-by: Pengxuan Zheng <quic_pzheng@quicinc.com> --- gcc/config/aarch64/aarch64-simd.md | 7 +++++++ gcc/testsuite/gcc.target/aarch64/extend-vec.c | 21 +++++++++++++++++++ 2 files changed, 28 insertions(+) create mode 100644 gcc/testsuite/gcc.target/aarch64/extend-vec.c