Message ID | mptmsmd5lc4.fsf@arm.com |
---|---|
State | New |
Headers | show |
Series | Treat boolean vector elements as 0/-1 [PR115406] | expand |
> Am 19.07.2024 um 19:44 schrieb Richard Sandiford <richard.sandiford@arm.com>: > > Previously we built vector boolean constants using 1 for true > elements and 0 for false elements. This matches the predicates > produced by SVE's PTRUE instruction, but leads to a miscompilation > on AVX512, where all bits of a boolean element should be set. Note it’s for AVX with the issue that QImode masks are handled differently than larger mode masks. AVX512 has only single-bit element masks where this does not make a difference. > One option for RTL would be to make this target-configurable. > But that isn't really possible at the tree level, where vectors > should work in a more target-independent way. (There is currently > no way to create a "generic" packed boolean vector, but never say > never :)) And, if we were going to pick a generic behaviour, > it would make sense to use 0/-1 rather than 0/1, for consistency > with integer vectors. > > Both behaviours should work with SVE on read, since SVE ignores > the upper bits in each predicate element. And the choice shouldn't > make much difference for RTL, since all SVE predicate modes are > expressed as vectors of BI, rather than of multi-bit booleans. > > I suspect there might be some fallout from this change on SVE. > But I think we should at least give it a go, and see whether any > fallout provides a strong counterargument against the approach. > > Tested on aarch64-linux-gnu (configured with --with-cpu=neoverse-v1 > for extra coverage) and x86_64-linux-gnu. OK to install? OK, Richard > Richard > > > gcc/ > PR middle-end/115406 > * fold-const.cc (native_encode_vector_part): For vector booleans, > check whether an element is nonzero and, if so, set all of the > correspending bits in the target image. > * simplify-rtx.cc (native_encode_rtx): Likewise. > > gcc/testsuite/ > PR middle-end/115406 > * gcc.dg/torture/pr115406.c: New test. > --- > gcc/fold-const.cc | 5 +++-- > gcc/simplify-rtx.cc | 3 ++- > gcc/testsuite/gcc.dg/torture/pr115406.c | 18 ++++++++++++++++++ > 3 files changed, 23 insertions(+), 3 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/torture/pr115406.c > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > index 710d697c021..a509b46b904 100644 > --- a/gcc/fold-const.cc > +++ b/gcc/fold-const.cc > @@ -8097,16 +8097,17 @@ native_encode_vector_part (const_tree expr, unsigned char *ptr, int len, > unsigned int elts_per_byte = BITS_PER_UNIT / elt_bits; > unsigned int first_elt = off * elts_per_byte; > unsigned int extract_elts = extract_bytes * elts_per_byte; > + unsigned int elt_mask = (1 << elt_bits) - 1; > for (unsigned int i = 0; i < extract_elts; ++i) > { > tree elt = VECTOR_CST_ELT (expr, first_elt + i); > if (TREE_CODE (elt) != INTEGER_CST) > return 0; > > - if (ptr && wi::extract_uhwi (wi::to_wide (elt), 0, 1)) > + if (ptr && integer_nonzerop (elt)) > { > unsigned int bit = i * elt_bits; > - ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT); > + ptr[bit / BITS_PER_UNIT] |= elt_mask << (bit % BITS_PER_UNIT); > } > } > return extract_bytes; > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc > index 35ba54c6292..a49eefb34d4 100644 > --- a/gcc/simplify-rtx.cc > +++ b/gcc/simplify-rtx.cc > @@ -7232,7 +7232,8 @@ native_encode_rtx (machine_mode mode, rtx x, vec<target_unit> &bytes, > target_unit value = 0; > for (unsigned int j = 0; j < BITS_PER_UNIT; j += elt_bits) > { > - value |= (INTVAL (CONST_VECTOR_ELT (x, elt)) & mask) << j; > + if (INTVAL (CONST_VECTOR_ELT (x, elt))) > + value |= mask << j; > elt += 1; > } > bytes.quick_push (value); > diff --git a/gcc/testsuite/gcc.dg/torture/pr115406.c b/gcc/testsuite/gcc.dg/torture/pr115406.c > new file mode 100644 > index 00000000000..800ef2f8317 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/torture/pr115406.c > @@ -0,0 +1,18 @@ > +// { dg-do run } > +// { dg-additional-options "-mavx512f" { target avx512f_runtime } } > + > +typedef __attribute__((__vector_size__ (1))) signed char V; > + > +signed char > +foo (V v) > +{ > + return ((V) v == v)[0]; > +} > + > +int > +main () > +{ > + signed char x = foo ((V) { }); > + if (x != -1) > + __builtin_abort (); > +} > -- > 2.25.1 >
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc index 710d697c021..a509b46b904 100644 --- a/gcc/fold-const.cc +++ b/gcc/fold-const.cc @@ -8097,16 +8097,17 @@ native_encode_vector_part (const_tree expr, unsigned char *ptr, int len, unsigned int elts_per_byte = BITS_PER_UNIT / elt_bits; unsigned int first_elt = off * elts_per_byte; unsigned int extract_elts = extract_bytes * elts_per_byte; + unsigned int elt_mask = (1 << elt_bits) - 1; for (unsigned int i = 0; i < extract_elts; ++i) { tree elt = VECTOR_CST_ELT (expr, first_elt + i); if (TREE_CODE (elt) != INTEGER_CST) return 0; - if (ptr && wi::extract_uhwi (wi::to_wide (elt), 0, 1)) + if (ptr && integer_nonzerop (elt)) { unsigned int bit = i * elt_bits; - ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT); + ptr[bit / BITS_PER_UNIT] |= elt_mask << (bit % BITS_PER_UNIT); } } return extract_bytes; diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc index 35ba54c6292..a49eefb34d4 100644 --- a/gcc/simplify-rtx.cc +++ b/gcc/simplify-rtx.cc @@ -7232,7 +7232,8 @@ native_encode_rtx (machine_mode mode, rtx x, vec<target_unit> &bytes, target_unit value = 0; for (unsigned int j = 0; j < BITS_PER_UNIT; j += elt_bits) { - value |= (INTVAL (CONST_VECTOR_ELT (x, elt)) & mask) << j; + if (INTVAL (CONST_VECTOR_ELT (x, elt))) + value |= mask << j; elt += 1; } bytes.quick_push (value); diff --git a/gcc/testsuite/gcc.dg/torture/pr115406.c b/gcc/testsuite/gcc.dg/torture/pr115406.c new file mode 100644 index 00000000000..800ef2f8317 --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/pr115406.c @@ -0,0 +1,18 @@ +// { dg-do run } +// { dg-additional-options "-mavx512f" { target avx512f_runtime } } + +typedef __attribute__((__vector_size__ (1))) signed char V; + +signed char +foo (V v) +{ + return ((V) v == v)[0]; +} + +int +main () +{ + signed char x = foo ((V) { }); + if (x != -1) + __builtin_abort (); +}