Message ID | mptpmb0kzir.fsf@arm.com |
---|---|
State | New |
Headers | show |
Series | [1/2] Add support for conditional xorsign [PR96373] | expand |
On Fri, 27 Jan 2023, Richard Sandiford wrote: > This patch is an optimisation, but it's also a prerequisite for > fixing PR96373 without regressing vect-xorsign_exec.c. > > Currently the vectoriser vectorises: > > for (i = 0; i < N; i++) > r[i] = a[i] * __builtin_copysignf (1.0f, b[i]); > > as two unconditional operations (copysign and mult). > tree-ssa-math-opts.cc later combines them into an "xorsign" function. > This works for both Advanced SIMD and SVE. > > However, with the fix for PR96373, the vectoriser will instead > generate a conditional multiplication (IFN_COND_MUL). Something then > needs to fold copysign & IFN_COND_MUL to the equivalent of a conditional > xorsign. Three obvious options were: > > (1) Extend tree-ssa-math-opts.cc. > (2) Do the fold in match.pd. > (3) Leave it to rtl combine. > > I'm against (3), because this isn't a target-specific optimisation. > (1) would be possible, but would involve open-coding a lot of what > match.pd does for us. And, in contrast to doing the current > tree-ssa-math-opts.cc optimisation in match.pd, there should be > no danger of (2) happening too early. If we have an IFN_COND_MUL > then we're already past the stage of simplifying the original > source code. > > There was also a choice between adding a conditional xorsign ifn > and simply open-coding the xorsign. The latter seems simpler, > and means less boiler-plate for target-specific code. > > The signed_or_unsigned_type_for change is needed to make sure > that we stay in "SVE space" when doing the optimisation on 128-bit > fixed-length SVE. > > Tested on aarch64-linux-gnu. OK to install? OK. Thanks, Richard. > Richard > > > gcc/ > PR tree-optimization/96373 > * tree.h (sign_mask_for): Declare. > * tree.cc (sign_mask_for): New function. > (signed_or_unsigned_type_for): For vector types, try to use the > related_int_vector_mode. > * genmatch.cc (commutative_op): Handle conditional internal functions. > * match.pd: Fold an IFN_COND_MUL+copysign into an IFN_COND_XOR+and. > > gcc/testsuite/ > PR tree-optimization/96373 > * gcc.target/aarch64/sve/cond_xorsign_1.c: New test. > * gcc.target/aarch64/sve/cond_xorsign_2.c: Likewise. > --- > gcc/genmatch.cc | 15 ++++++++ > gcc/match.pd | 14 ++++++++ > .../gcc.target/aarch64/sve/cond_xorsign_1.c | 34 +++++++++++++++++++ > .../gcc.target/aarch64/sve/cond_xorsign_2.c | 17 ++++++++++ > gcc/tree.cc | 33 ++++++++++++++++++ > gcc/tree.h | 1 + > 6 files changed, 114 insertions(+) > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_1.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_2.c > > diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc > index d4cb439a851..e147ab9db7a 100644 > --- a/gcc/genmatch.cc > +++ b/gcc/genmatch.cc > @@ -489,6 +489,21 @@ commutative_op (id_base *id) > case CFN_FNMS: > return 0; > > + case CFN_COND_ADD: > + case CFN_COND_MUL: > + case CFN_COND_MIN: > + case CFN_COND_MAX: > + case CFN_COND_FMIN: > + case CFN_COND_FMAX: > + case CFN_COND_AND: > + case CFN_COND_IOR: > + case CFN_COND_XOR: > + case CFN_COND_FMA: > + case CFN_COND_FMS: > + case CFN_COND_FNMA: > + case CFN_COND_FNMS: > + return 1; > + > default: > return -1; > } > diff --git a/gcc/match.pd b/gcc/match.pd > index 56ac743aa6d..f605b798c44 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -339,6 +339,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (if (REAL_VALUE_NEGATIVE (TREE_REAL_CST (@0))) > (COPYSIGN_ALL (negate @0) @1))) > > +/* Transform c ? x * copysign (1, y) : z to c ? x ^ signs(y) : z. > + tree-ssa-math-opts.cc does the corresponding optimization for > + unconditional multiplications (via xorsign). */ > +(simplify > + (IFN_COND_MUL:c @0 @1 (IFN_COPYSIGN real_onep @2) @3) > + (with { tree signs = sign_mask_for (type); } > + (if (signs) > + (with { tree inttype = TREE_TYPE (signs); } > + (view_convert:type > + (IFN_COND_XOR:inttype @0 > + (view_convert:inttype @1) > + (bit_and (view_convert:inttype @2) { signs; }) > + (view_convert:inttype @3))))))) > + > /* (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) -> abs x. */ > (simplify > (plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop)) > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_1.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_1.c > new file mode 100644 > index 00000000000..338ca605923 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_1.c > @@ -0,0 +1,34 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +#define xorsign(A, B, SUFFIX) ((A) * __builtin_copysign##SUFFIX (1.0, B)) > + > +#define DEF_LOOP(TYPE, SUFFIX) \ > + void __attribute__ ((noinline, noclone)) \ > + test_##TYPE (TYPE *__restrict r, TYPE *__restrict a, \ > + TYPE *__restrict b, TYPE *__restrict c, \ > + int n) \ > + { \ > + for (int i = 0; i < n; ++i) \ > + r[i] = a[i] < 20 ? xorsign(b[i], c[i], SUFFIX) : b[i]; \ > + } > + > +#define TEST_ALL(T) \ > + T (_Float16, f16) \ > + T (float, f) \ > + T (double, ) > + > +TEST_ALL (DEF_LOOP) > + > +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.h, z[0-9]+\.h,} 1 } } */ > +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.s, z[0-9]+\.s,} 1 } } */ > +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.d, z[0-9]+\.d,} 1 } } */ > + > +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.h, p[0-7]/m,} 1 } } */ > +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.s, p[0-7]/m,} 1 } } */ > +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.d, p[0-7]/m,} 1 } } */ > + > +/* { dg-final { scan-assembler-not {\tfmul} } } */ > +/* { dg-final { scan-assembler-not {\tmov\tz[^,]*z} } } */ > +/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */ > +/* { dg-final { scan-assembler-not {\tsel\t} } } */ > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_2.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_2.c > new file mode 100644 > index 00000000000..274dd0ede59 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_2.c > @@ -0,0 +1,17 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -msve-vector-bits=128 --param aarch64-autovec-preference=2" } */ > + > +#include "cond_xorsign_1.c" > + > +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.h, z[0-9]+\.h,} 1 } } */ > +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.s, z[0-9]+\.s,} 1 } } */ > +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.d, z[0-9]+\.d,} 1 } } */ > + > +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.h, p[0-7]/m,} 1 } } */ > +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.s, p[0-7]/m,} 1 } } */ > +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.d, p[0-7]/m,} 1 } } */ > + > +/* { dg-final { scan-assembler-not {\tfmul} } } */ > +/* { dg-final { scan-assembler-not {\tmov\tz[^,]*z} } } */ > +/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */ > +/* { dg-final { scan-assembler-not {\tsel\t} } } */ > diff --git a/gcc/tree.cc b/gcc/tree.cc > index 7473912a065..94d20eaba17 100644 > --- a/gcc/tree.cc > +++ b/gcc/tree.cc > @@ -2669,6 +2669,35 @@ build_zero_cst (tree type) > } > } > > +/* If floating-point type TYPE has an IEEE-style sign bit, return an > + unsigned constant in which only the sign bit is set. Return null > + otherwise. */ > + > +tree > +sign_mask_for (tree type) > +{ > + /* Avoid having to choose between a real-only sign and a pair of signs. > + This could be relaxed if the choice becomes obvious later. */ > + if (TREE_CODE (type) == COMPLEX_TYPE) > + return NULL_TREE; > + > + auto eltmode = as_a<scalar_float_mode> (element_mode (type)); > + auto bits = REAL_MODE_FORMAT (eltmode)->ieee_bits; > + if (!bits || !pow2p_hwi (bits)) > + return NULL_TREE; > + > + tree inttype = unsigned_type_for (type); > + if (!inttype) > + return NULL_TREE; > + > + auto mask = wi::set_bit_in_zero (bits - 1, bits); > + if (TREE_CODE (inttype) == VECTOR_TYPE) > + { > + tree elt = wide_int_to_tree (TREE_TYPE (inttype), mask); > + return build_vector_from_val (inttype, elt); > + } > + return wide_int_to_tree (inttype, mask); > +} > > /* Build a BINFO with LEN language slots. */ > > @@ -10961,6 +10990,10 @@ signed_or_unsigned_type_for (int unsignedp, tree type) > return NULL_TREE; > if (inner == inner2) > return type; > + machine_mode new_mode; > + if (VECTOR_MODE_P (TYPE_MODE (type)) > + && related_int_vector_mode (TYPE_MODE (type)).exists (&new_mode)) > + return build_vector_type_for_mode (inner2, new_mode); > return build_vector_type (inner2, TYPE_VECTOR_SUBPARTS (type)); > } > > diff --git a/gcc/tree.h b/gcc/tree.h > index e730a2a3e56..c656cd5b7bf 100644 > --- a/gcc/tree.h > +++ b/gcc/tree.h > @@ -4675,6 +4675,7 @@ extern tree build_one_cst (tree); > extern tree build_minus_one_cst (tree); > extern tree build_all_ones_cst (tree); > extern tree build_zero_cst (tree); > +extern tree sign_mask_for (tree); > extern tree build_string (unsigned, const char * = NULL); > extern tree build_poly_int_cst (tree, const poly_wide_int_ref &); > extern tree build_tree_list (tree, tree CXX_MEM_STAT_INFO); >
diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc index d4cb439a851..e147ab9db7a 100644 --- a/gcc/genmatch.cc +++ b/gcc/genmatch.cc @@ -489,6 +489,21 @@ commutative_op (id_base *id) case CFN_FNMS: return 0; + case CFN_COND_ADD: + case CFN_COND_MUL: + case CFN_COND_MIN: + case CFN_COND_MAX: + case CFN_COND_FMIN: + case CFN_COND_FMAX: + case CFN_COND_AND: + case CFN_COND_IOR: + case CFN_COND_XOR: + case CFN_COND_FMA: + case CFN_COND_FMS: + case CFN_COND_FNMA: + case CFN_COND_FNMS: + return 1; + default: return -1; } diff --git a/gcc/match.pd b/gcc/match.pd index 56ac743aa6d..f605b798c44 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -339,6 +339,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (REAL_VALUE_NEGATIVE (TREE_REAL_CST (@0))) (COPYSIGN_ALL (negate @0) @1))) +/* Transform c ? x * copysign (1, y) : z to c ? x ^ signs(y) : z. + tree-ssa-math-opts.cc does the corresponding optimization for + unconditional multiplications (via xorsign). */ +(simplify + (IFN_COND_MUL:c @0 @1 (IFN_COPYSIGN real_onep @2) @3) + (with { tree signs = sign_mask_for (type); } + (if (signs) + (with { tree inttype = TREE_TYPE (signs); } + (view_convert:type + (IFN_COND_XOR:inttype @0 + (view_convert:inttype @1) + (bit_and (view_convert:inttype @2) { signs; }) + (view_convert:inttype @3))))))) + /* (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) -> abs x. */ (simplify (plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop)) diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_1.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_1.c new file mode 100644 index 00000000000..338ca605923 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_1.c @@ -0,0 +1,34 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#define xorsign(A, B, SUFFIX) ((A) * __builtin_copysign##SUFFIX (1.0, B)) + +#define DEF_LOOP(TYPE, SUFFIX) \ + void __attribute__ ((noinline, noclone)) \ + test_##TYPE (TYPE *__restrict r, TYPE *__restrict a, \ + TYPE *__restrict b, TYPE *__restrict c, \ + int n) \ + { \ + for (int i = 0; i < n; ++i) \ + r[i] = a[i] < 20 ? xorsign(b[i], c[i], SUFFIX) : b[i]; \ + } + +#define TEST_ALL(T) \ + T (_Float16, f16) \ + T (float, f) \ + T (double, ) + +TEST_ALL (DEF_LOOP) + +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.h, z[0-9]+\.h,} 1 } } */ +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.s, z[0-9]+\.s,} 1 } } */ +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.d, z[0-9]+\.d,} 1 } } */ + +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.h, p[0-7]/m,} 1 } } */ +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.s, p[0-7]/m,} 1 } } */ +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.d, p[0-7]/m,} 1 } } */ + +/* { dg-final { scan-assembler-not {\tfmul} } } */ +/* { dg-final { scan-assembler-not {\tmov\tz[^,]*z} } } */ +/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */ +/* { dg-final { scan-assembler-not {\tsel\t} } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_2.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_2.c new file mode 100644 index 00000000000..274dd0ede59 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_xorsign_2.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -msve-vector-bits=128 --param aarch64-autovec-preference=2" } */ + +#include "cond_xorsign_1.c" + +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.h, z[0-9]+\.h,} 1 } } */ +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.s, z[0-9]+\.s,} 1 } } */ +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.d, z[0-9]+\.d,} 1 } } */ + +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.h, p[0-7]/m,} 1 } } */ +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.s, p[0-7]/m,} 1 } } */ +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.d, p[0-7]/m,} 1 } } */ + +/* { dg-final { scan-assembler-not {\tfmul} } } */ +/* { dg-final { scan-assembler-not {\tmov\tz[^,]*z} } } */ +/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */ +/* { dg-final { scan-assembler-not {\tsel\t} } } */ diff --git a/gcc/tree.cc b/gcc/tree.cc index 7473912a065..94d20eaba17 100644 --- a/gcc/tree.cc +++ b/gcc/tree.cc @@ -2669,6 +2669,35 @@ build_zero_cst (tree type) } } +/* If floating-point type TYPE has an IEEE-style sign bit, return an + unsigned constant in which only the sign bit is set. Return null + otherwise. */ + +tree +sign_mask_for (tree type) +{ + /* Avoid having to choose between a real-only sign and a pair of signs. + This could be relaxed if the choice becomes obvious later. */ + if (TREE_CODE (type) == COMPLEX_TYPE) + return NULL_TREE; + + auto eltmode = as_a<scalar_float_mode> (element_mode (type)); + auto bits = REAL_MODE_FORMAT (eltmode)->ieee_bits; + if (!bits || !pow2p_hwi (bits)) + return NULL_TREE; + + tree inttype = unsigned_type_for (type); + if (!inttype) + return NULL_TREE; + + auto mask = wi::set_bit_in_zero (bits - 1, bits); + if (TREE_CODE (inttype) == VECTOR_TYPE) + { + tree elt = wide_int_to_tree (TREE_TYPE (inttype), mask); + return build_vector_from_val (inttype, elt); + } + return wide_int_to_tree (inttype, mask); +} /* Build a BINFO with LEN language slots. */ @@ -10961,6 +10990,10 @@ signed_or_unsigned_type_for (int unsignedp, tree type) return NULL_TREE; if (inner == inner2) return type; + machine_mode new_mode; + if (VECTOR_MODE_P (TYPE_MODE (type)) + && related_int_vector_mode (TYPE_MODE (type)).exists (&new_mode)) + return build_vector_type_for_mode (inner2, new_mode); return build_vector_type (inner2, TYPE_VECTOR_SUBPARTS (type)); } diff --git a/gcc/tree.h b/gcc/tree.h index e730a2a3e56..c656cd5b7bf 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -4675,6 +4675,7 @@ extern tree build_one_cst (tree); extern tree build_minus_one_cst (tree); extern tree build_all_ones_cst (tree); extern tree build_zero_cst (tree); +extern tree sign_mask_for (tree); extern tree build_string (unsigned, const char * = NULL); extern tree build_poly_int_cst (tree, const poly_wide_int_ref &); extern tree build_tree_list (tree, tree CXX_MEM_STAT_INFO);