Message ID | 20240628011455.3980128-1-hongtao.liu@intel.com |
---|---|
State | New |
Headers | show |
Series | Fix native_encode_vector_part for itype when TYPE_PRECISION (itype) == BITS_PER_UNIT | expand |
On Fri, Jun 28, 2024 at 3:15 AM liuhongt <hongtao.liu@intel.com> wrote: > > for the testcase in the PR115406, here is part of the dump. > > char D.4882; > vector(1) <signed-boolean:8> _1; > vector(1) signed char _2; > char _5; > > <bb 2> : > _1 = { -1 }; > > When assign { -1 } to vector(1} {signed-boolean:8}, > Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of dest > with each vector elemnet. But i think the bit setting should only apply for > TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1). > <signed-boolean:16>, it will be assigned as -1, instead of 1. > Is there any specific reason vector(1) <signed-boolean:8> is handled > differently from vector<1> <signed-boolean:16>? > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision code should work for 8 bit entities as well, it seems we only set the LSB of each element in the "mask". ISTR that SVE masks can have up to 8 bit elements (for 8 byte data elements), so maybe that's why <= BITS_PER_UNIT. So maybe instead of just setting one bit in ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT); we should set elt_bits bits, aka (without testing) ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit % BITS_PER_UNIT); ? > gcc/ChangeLog: > > PR middle-end/115406 > * fold-const.cc (native_encode_vector_part): Don't set each > bit to the dest when TYPE_PRECISION (itype) == BITS_PER_UNIT. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/pr115406.c: New test. > --- > gcc/fold-const.cc | 2 +- > gcc/testsuite/gcc.target/i386/pr115406.c | 23 +++++++++++++++++++++++ > 2 files changed, 24 insertions(+), 1 deletion(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr115406.c > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > index 710d697c021..0f045f851d1 100644 > --- a/gcc/fold-const.cc > +++ b/gcc/fold-const.cc > @@ -8077,7 +8077,7 @@ native_encode_vector_part (const_tree expr, unsigned char *ptr, int len, > { > tree itype = TREE_TYPE (TREE_TYPE (expr)); > if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr)) > - && TYPE_PRECISION (itype) <= BITS_PER_UNIT) > + && TYPE_PRECISION (itype) < BITS_PER_UNIT) > { > /* This is the only case in which elements can be smaller than a byte. > Element 0 is always in the lsb of the containing byte. */ > diff --git a/gcc/testsuite/gcc.target/i386/pr115406.c b/gcc/testsuite/gcc.target/i386/pr115406.c > new file mode 100644 > index 00000000000..623dff06fc3 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr115406.c > @@ -0,0 +1,23 @@ > +/* { dg-do run } */ > +/* { dg-options "-O0 -mavx512f" } */ > +/* { dg-require-effective-target avx512f } */ > + > +typedef __attribute__((__vector_size__ (1))) char V; > + > +char > +foo (V v) > +{ > + return ((V) v == v)[0]; > +} > + > +int > +main () > +{ > + if (!__builtin_cpu_supports ("avx512f")) > + return 0; > + > + char x = foo ((V) { }); > + if (x != -1) > + __builtin_abort (); > + return 0; > +} > -- > 2.31.1 >
On Fri, Jun 28, 2024 at 8:01 AM Richard Biener <richard.guenther@gmail.com> wrote: > > On Fri, Jun 28, 2024 at 3:15 AM liuhongt <hongtao.liu@intel.com> wrote: > > > > for the testcase in the PR115406, here is part of the dump. > > > > char D.4882; > > vector(1) <signed-boolean:8> _1; > > vector(1) signed char _2; > > char _5; > > > > <bb 2> : > > _1 = { -1 }; > > > > When assign { -1 } to vector(1} {signed-boolean:8}, > > Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of dest > > with each vector elemnet. But i think the bit setting should only apply for > > TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1). > > <signed-boolean:16>, it will be assigned as -1, instead of 1. > > Is there any specific reason vector(1) <signed-boolean:8> is handled > > differently from vector<1> <signed-boolean:16>? > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > Ok for trunk? > > I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision > code should work for 8 bit > entities as well, it seems we only set the LSB of each element in the > "mask". ISTR that SVE > masks can have up to 8 bit elements (for 8 byte data elements), so > maybe that's why > <= BITS_PER_UNIT. So maybe instead of just setting one bit in > > ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT); > > we should set elt_bits bits, aka (without testing) > > ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit > % BITS_PER_UNIT); > > ? Alternatively if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr)) && TYPE_PRECISION (itype) <= BITS_PER_UNIT) should be amended with && GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) != MODE_VECTOR_INT maybe. Still for the possibility of vector(n) <signed-boolean:16> mask for a int128 element vector we'd have 16bit mask elements, encoding that differently would be inconsistent as well (but of course 16bit elements are not handled by the code right now). Richard. > > gcc/ChangeLog: > > > > PR middle-end/115406 > > * fold-const.cc (native_encode_vector_part): Don't set each > > bit to the dest when TYPE_PRECISION (itype) == BITS_PER_UNIT. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/i386/pr115406.c: New test. > > --- > > gcc/fold-const.cc | 2 +- > > gcc/testsuite/gcc.target/i386/pr115406.c | 23 +++++++++++++++++++++++ > > 2 files changed, 24 insertions(+), 1 deletion(-) > > create mode 100644 gcc/testsuite/gcc.target/i386/pr115406.c > > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > > index 710d697c021..0f045f851d1 100644 > > --- a/gcc/fold-const.cc > > +++ b/gcc/fold-const.cc > > @@ -8077,7 +8077,7 @@ native_encode_vector_part (const_tree expr, unsigned char *ptr, int len, > > { > > tree itype = TREE_TYPE (TREE_TYPE (expr)); > > if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr)) > > - && TYPE_PRECISION (itype) <= BITS_PER_UNIT) > > + && TYPE_PRECISION (itype) < BITS_PER_UNIT) > > { > > /* This is the only case in which elements can be smaller than a byte. > > Element 0 is always in the lsb of the containing byte. */ > > diff --git a/gcc/testsuite/gcc.target/i386/pr115406.c b/gcc/testsuite/gcc.target/i386/pr115406.c > > new file mode 100644 > > index 00000000000..623dff06fc3 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/i386/pr115406.c > > @@ -0,0 +1,23 @@ > > +/* { dg-do run } */ > > +/* { dg-options "-O0 -mavx512f" } */ > > +/* { dg-require-effective-target avx512f } */ > > + > > +typedef __attribute__((__vector_size__ (1))) char V; > > + > > +char > > +foo (V v) > > +{ > > + return ((V) v == v)[0]; > > +} > > + > > +int > > +main () > > +{ > > + if (!__builtin_cpu_supports ("avx512f")) > > + return 0; > > + > > + char x = foo ((V) { }); > > + if (x != -1) > > + __builtin_abort (); > > + return 0; > > +} > > -- > > 2.31.1 > >
Richard Biener <richard.guenther@gmail.com> writes: > On Fri, Jun 28, 2024 at 8:01 AM Richard Biener > <richard.guenther@gmail.com> wrote: >> >> On Fri, Jun 28, 2024 at 3:15 AM liuhongt <hongtao.liu@intel.com> wrote: >> > >> > for the testcase in the PR115406, here is part of the dump. >> > >> > char D.4882; >> > vector(1) <signed-boolean:8> _1; >> > vector(1) signed char _2; >> > char _5; >> > >> > <bb 2> : >> > _1 = { -1 }; >> > >> > When assign { -1 } to vector(1} {signed-boolean:8}, >> > Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of dest >> > with each vector elemnet. But i think the bit setting should only apply for >> > TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1). >> > <signed-boolean:16>, it will be assigned as -1, instead of 1. >> > Is there any specific reason vector(1) <signed-boolean:8> is handled >> > differently from vector<1> <signed-boolean:16>? >> > >> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. >> > Ok for trunk? >> >> I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision >> code should work for 8 bit >> entities as well, it seems we only set the LSB of each element in the >> "mask". ISTR that SVE >> masks can have up to 8 bit elements (for 8 byte data elements), so >> maybe that's why >> <= BITS_PER_UNIT. Yeah. >> So maybe instead of just setting one bit in >> >> ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT); >> >> we should set elt_bits bits, aka (without testing) >> >> ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit >> % BITS_PER_UNIT); >> >> ? > > Alternatively > > if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr)) > && TYPE_PRECISION (itype) <= BITS_PER_UNIT) > > should be amended with > > && GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) != MODE_VECTOR_INT How about: if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) == MODE_VECTOR_BOOL) { gcc_assert (TYPE_PRECISION (itype) <= BITS_PER_UNIT); ? Is it OK for TYPE_MODE to affect tree-level semantics though, especially since it can change with the target attribute? (At least TYPE_MODE_RAW would be stable.) > maybe. Still for the possibility of vector(n) <signed-boolean:16> > mask for a int128 element vector > we'd have 16bit mask elements, encoding that differently would be > inconsistent as well > (but of course 16bit elements are not handled by the code right now). Yeah, 16-bit predicate elements aren't a thing for SVE, so we've not had to add support for them. Richard
> Am 28.06.2024 um 10:27 schrieb Richard Sandiford <richard.sandiford@arm.com>: > > Richard Biener <richard.guenther@gmail.com> writes: >>> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener >>> <richard.guenther@gmail.com> wrote: >>> >>> On Fri, Jun 28, 2024 at 3:15 AM liuhongt <hongtao.liu@intel.com> wrote: >>>> >>>> for the testcase in the PR115406, here is part of the dump. >>>> >>>> char D.4882; >>>> vector(1) <signed-boolean:8> _1; >>>> vector(1) signed char _2; >>>> char _5; >>>> >>>> <bb 2> : >>>> _1 = { -1 }; >>>> >>>> When assign { -1 } to vector(1} {signed-boolean:8}, >>>> Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of dest >>>> with each vector elemnet. But i think the bit setting should only apply for >>>> TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1). >>>> <signed-boolean:16>, it will be assigned as -1, instead of 1. >>>> Is there any specific reason vector(1) <signed-boolean:8> is handled >>>> differently from vector<1> <signed-boolean:16>? >>>> >>>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. >>>> Ok for trunk? >>> >>> I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision >>> code should work for 8 bit >>> entities as well, it seems we only set the LSB of each element in the >>> "mask". ISTR that SVE >>> masks can have up to 8 bit elements (for 8 byte data elements), so >>> maybe that's why >>> <= BITS_PER_UNIT. > > Yeah. So is it necessary that only one bit is set for SVE? >>> So maybe instead of just setting one bit in >>> >>> ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT); >>> >>> we should set elt_bits bits, aka (without testing) >>> >>> ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit >>> % BITS_PER_UNIT); >>> >>> ? >> >> Alternatively >> >> if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr)) >> && TYPE_PRECISION (itype) <= BITS_PER_UNIT) >> >> should be amended with >> >> && GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) != MODE_VECTOR_INT > > How about: > > if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) == MODE_VECTOR_BOOL) > { > gcc_assert (TYPE_PRECISION (itype) <= BITS_PER_UNIT); > > ? Note the path is also necessary for avx512 and gcn mask modes which are integer modes. > Is it OK for TYPE_MODE to affect tree-level semantics though, especially > since it can change with the target attribute? (At least TYPE_MODE_RAW > would be stable.) That’s a good question and also related to GCC vector extension which can result in both BLKmode and integer modes to be used. But I’m not sure how we expose masks to the middle end here. A too large vector bool could be lowered to AVX512 mode. Maybe we should simply reject interpret/encode of BLKmode vectors and make sure to never assign integer modes to vector bools (if the target didn’t specify that mode)? I guess some test coverage would be nice here. >> maybe. Still for the possibility of vector(n) <signed-boolean:16> >> mask for a int128 element vector >> we'd have 16bit mask elements, encoding that differently would be >> inconsistent as well >> (but of course 16bit elements are not handled by the code right now). > > Yeah, 16-bit predicate elements aren't a thing for SVE, so we've not > had to add support for them. > > Richard
On Fri, Jun 28, 2024 at 11:06 AM Richard Biener <richard.guenther@gmail.com> wrote: > > > > > Am 28.06.2024 um 10:27 schrieb Richard Sandiford <richard.sandiford@arm.com>: > > > > Richard Biener <richard.guenther@gmail.com> writes: > >>> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener > >>> <richard.guenther@gmail.com> wrote: > >>> > >>> On Fri, Jun 28, 2024 at 3:15 AM liuhongt <hongtao.liu@intel.com> wrote: > >>>> > >>>> for the testcase in the PR115406, here is part of the dump. > >>>> > >>>> char D.4882; > >>>> vector(1) <signed-boolean:8> _1; > >>>> vector(1) signed char _2; > >>>> char _5; > >>>> > >>>> <bb 2> : > >>>> _1 = { -1 }; > >>>> > >>>> When assign { -1 } to vector(1} {signed-boolean:8}, > >>>> Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of dest > >>>> with each vector elemnet. But i think the bit setting should only apply for > >>>> TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1). > >>>> <signed-boolean:16>, it will be assigned as -1, instead of 1. > >>>> Is there any specific reason vector(1) <signed-boolean:8> is handled > >>>> differently from vector<1> <signed-boolean:16>? > >>>> > >>>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > >>>> Ok for trunk? > >>> > >>> I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision > >>> code should work for 8 bit > >>> entities as well, it seems we only set the LSB of each element in the > >>> "mask". ISTR that SVE > >>> masks can have up to 8 bit elements (for 8 byte data elements), so > >>> maybe that's why > >>> <= BITS_PER_UNIT. > > > > Yeah. > > So is it necessary that only one bit is set for SVE? > > >>> So maybe instead of just setting one bit in > >>> > >>> ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT); > >>> > >>> we should set elt_bits bits, aka (without testing) > >>> > >>> ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit > >>> % BITS_PER_UNIT); > >>> > >>> ? > >> > >> Alternatively > >> > >> if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr)) > >> && TYPE_PRECISION (itype) <= BITS_PER_UNIT) > >> > >> should be amended with > >> > >> && GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) != MODE_VECTOR_INT > > > > How about: > > > > if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) == MODE_VECTOR_BOOL) > > { > > gcc_assert (TYPE_PRECISION (itype) <= BITS_PER_UNIT); > > > > ? > > Note the path is also necessary for avx512 and gcn mask modes which are integer modes. > > > Is it OK for TYPE_MODE to affect tree-level semantics though, especially > > since it can change with the target attribute? (At least TYPE_MODE_RAW > > would be stable.) > > That’s a good question and also related to GCC vector extension which can result in both BLKmode and integer modes to be used. But I’m not sure how we expose masks to the middle end here. A too large vector bool could be lowered to AVX512 mode. Maybe we should simply reject interpret/encode of BLKmode vectors and make sure to never assign integer modes to vector bools (if the target didn’t specify that mode)? > > I guess some test coverage would be nice here. To continue on that, we do not currently have a way to capture a vector comparison output but the C++ frontend has vector ?: typedef int v8si __attribute__((vector_size(32))); void foo (v8si *a, v8si *b, v8si *c) { *c = *a < *b ? (v8si){-1,-1,-1,-1,-1,-1,-1,-1 } : (v8si){0,0,0,0,0,0,0,0}; } with SSE2 we get a <signed-boolean:32> temporary, with AVX512 enabled that becomes <singed-boolean:1>. When we enlarge the vector to size 128 then even with AVX512 enabled I see <signed-boolean:32> here and vector lowering decomposes that to scalar (also with AVX or SSE, so maybe just a missed optimization). But note that to decompose this into two AVX512 vectors the temporary would have to change from <signed-boolean:32> elements to <signed-boolean:1>. The not supported vector bool types have BLKmode sofar. But for example on i?86-linux with -mno-sse (like -march=i586) for typedef short v2hi __attribute__((vector_size(4))); void foo (v2hi *a, v2hi *b, v2hi *c) { *c = *a < *b ? (v2hi){-1,-1} : (v2hi){0,0}; } we get a SImode vector <signed-boolean:16> as I feared. That means <signed-boolean:8> (the BITS_PER_UNIT case) can be ambiguous between SVE (bool for a 8byte data vector) and emulated vectors ("word-mode" vectors; for 1byte data vectors). And without knowing that SVE would have used VnBImode given that AVX512 uses an integer mode. Aside from the too large vector and AVX512 issue above I think we can use MODE_VECTOR_BOOL || TYPE_PRECISION == 1 and for the latter we can assert the mode is a scalar integer mode (AVX512 or GCN)? Richard. > >> maybe. Still for the possibility of vector(n) <signed-boolean:16> > >> mask for a int128 element vector > >> we'd have 16bit mask elements, encoding that differently would be > >> inconsistent as well > >> (but of course 16bit elements are not handled by the code right now). > > > > Yeah, 16-bit predicate elements aren't a thing for SVE, so we've not > > had to add support for them. > > > > Richard
On Fri, Jun 28, 2024 at 2:16 PM Richard Biener <richard.guenther@gmail.com> wrote: > > On Fri, Jun 28, 2024 at 11:06 AM Richard Biener > <richard.guenther@gmail.com> wrote: > > > > > > > > > Am 28.06.2024 um 10:27 schrieb Richard Sandiford <richard.sandiford@arm.com>: > > > > > > Richard Biener <richard.guenther@gmail.com> writes: > > >>> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener > > >>> <richard.guenther@gmail.com> wrote: > > >>> > > >>> On Fri, Jun 28, 2024 at 3:15 AM liuhongt <hongtao.liu@intel.com> wrote: > > >>>> > > >>>> for the testcase in the PR115406, here is part of the dump. > > >>>> > > >>>> char D.4882; > > >>>> vector(1) <signed-boolean:8> _1; > > >>>> vector(1) signed char _2; > > >>>> char _5; > > >>>> > > >>>> <bb 2> : > > >>>> _1 = { -1 }; > > >>>> > > >>>> When assign { -1 } to vector(1} {signed-boolean:8}, > > >>>> Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of dest > > >>>> with each vector elemnet. But i think the bit setting should only apply for > > >>>> TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1). > > >>>> <signed-boolean:16>, it will be assigned as -1, instead of 1. > > >>>> Is there any specific reason vector(1) <signed-boolean:8> is handled > > >>>> differently from vector<1> <signed-boolean:16>? > > >>>> > > >>>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > >>>> Ok for trunk? > > >>> > > >>> I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision > > >>> code should work for 8 bit > > >>> entities as well, it seems we only set the LSB of each element in the > > >>> "mask". ISTR that SVE > > >>> masks can have up to 8 bit elements (for 8 byte data elements), so > > >>> maybe that's why > > >>> <= BITS_PER_UNIT. > > > > > > Yeah. > > > > So is it necessary that only one bit is set for SVE? > > > > >>> So maybe instead of just setting one bit in > > >>> > > >>> ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT); > > >>> > > >>> we should set elt_bits bits, aka (without testing) > > >>> > > >>> ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit > > >>> % BITS_PER_UNIT); > > >>> > > >>> ? > > >> > > >> Alternatively > > >> > > >> if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr)) > > >> && TYPE_PRECISION (itype) <= BITS_PER_UNIT) > > >> > > >> should be amended with > > >> > > >> && GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) != MODE_VECTOR_INT > > > > > > How about: > > > > > > if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) == MODE_VECTOR_BOOL) > > > { > > > gcc_assert (TYPE_PRECISION (itype) <= BITS_PER_UNIT); > > > > > > ? > > > > Note the path is also necessary for avx512 and gcn mask modes which are integer modes. > > > > > Is it OK for TYPE_MODE to affect tree-level semantics though, especially > > > since it can change with the target attribute? (At least TYPE_MODE_RAW > > > would be stable.) > > > > That’s a good question and also related to GCC vector extension which can result in both BLKmode and integer modes to be used. But I’m not sure how we expose masks to the middle end here. A too large vector bool could be lowered to AVX512 mode. Maybe we should simply reject interpret/encode of BLKmode vectors and make sure to never assign integer modes to vector bools (if the target didn’t specify that mode)? > > > > I guess some test coverage would be nice here. > > To continue on that, we do not currently have a way to capture a > vector comparison output > but the C++ frontend has vector ?: > > typedef int v8si __attribute__((vector_size(32))); > > void foo (v8si *a, v8si *b, v8si *c) > { > *c = *a < *b ? (v8si){-1,-1,-1,-1,-1,-1,-1,-1 } : (v8si){0,0,0,0,0,0,0,0}; > } > > with SSE2 we get a <signed-boolean:32> temporary, with AVX512 enabled > that becomes <singed-boolean:1>. When we enlarge the vector to size 128 > then even with AVX512 enabled I see <signed-boolean:32> here and > vector lowering decomposes that to scalar (also with AVX or SSE, so maybe > just a missed optimization). But note that to decompose this into two > AVX512 vectors the temporary would have to change from <signed-boolean:32> > elements to <signed-boolean:1>. > > The not supported vector bool types have BLKmode sofar. > > But for example on i?86-linux with -mno-sse (like -march=i586) for > > typedef short v2hi __attribute__((vector_size(4))); > > void foo (v2hi *a, v2hi *b, v2hi *c) > { > *c = *a < *b ? (v2hi){-1,-1} : (v2hi){0,0}; > } > > we get a SImode vector <signed-boolean:16> as I feared. That means > <signed-boolean:8> (the BITS_PER_UNIT case) can be ambiguous > between SVE (bool for a 8byte data vector) and emulated vectors > ("word-mode" vectors; for 1byte data vectors). > > And without knowing that SVE would have used VnBImode given that > AVX512 uses an integer mode. > > Aside from the too large vector and AVX512 issue above I think we can use > MODE_VECTOR_BOOL || TYPE_PRECISION == 1 and for the latter we > can assert the mode is a scalar integer mode (AVX512 or GCN)? So like the attached? Richard.
Richard Biener <richard.guenther@gmail.com> writes: > On Fri, Jun 28, 2024 at 2:16 PM Richard Biener > <richard.guenther@gmail.com> wrote: >> >> On Fri, Jun 28, 2024 at 11:06 AM Richard Biener >> <richard.guenther@gmail.com> wrote: >> > >> > >> > >> > > Am 28.06.2024 um 10:27 schrieb Richard Sandiford <richard.sandiford@arm.com>: >> > > >> > > Richard Biener <richard.guenther@gmail.com> writes: >> > >>> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener >> > >>> <richard.guenther@gmail.com> wrote: >> > >>> >> > >>> On Fri, Jun 28, 2024 at 3:15 AM liuhongt <hongtao.liu@intel.com> wrote: >> > >>>> >> > >>>> for the testcase in the PR115406, here is part of the dump. >> > >>>> >> > >>>> char D.4882; >> > >>>> vector(1) <signed-boolean:8> _1; >> > >>>> vector(1) signed char _2; >> > >>>> char _5; >> > >>>> >> > >>>> <bb 2> : >> > >>>> _1 = { -1 }; >> > >>>> >> > >>>> When assign { -1 } to vector(1} {signed-boolean:8}, >> > >>>> Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of dest >> > >>>> with each vector elemnet. But i think the bit setting should only apply for >> > >>>> TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1). >> > >>>> <signed-boolean:16>, it will be assigned as -1, instead of 1. >> > >>>> Is there any specific reason vector(1) <signed-boolean:8> is handled >> > >>>> differently from vector<1> <signed-boolean:16>? >> > >>>> >> > >>>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. >> > >>>> Ok for trunk? >> > >>> >> > >>> I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision >> > >>> code should work for 8 bit >> > >>> entities as well, it seems we only set the LSB of each element in the >> > >>> "mask". ISTR that SVE >> > >>> masks can have up to 8 bit elements (for 8 byte data elements), so >> > >>> maybe that's why >> > >>> <= BITS_PER_UNIT. >> > > >> > > Yeah. >> > >> > So is it necessary that only one bit is set for SVE? TBH I can't remember now. It matches what SVE instructions produce, and lines up with the associated RTL code (which at the time was SVE-specific). But when dealing with multibyte elements, upper predicate element bits are ignored on read, so matching the instructions might not matter. >> > >>> So maybe instead of just setting one bit in >> > >>> >> > >>> ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT); >> > >>> >> > >>> we should set elt_bits bits, aka (without testing) >> > >>> >> > >>> ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit >> > >>> % BITS_PER_UNIT); >> > >>> >> > >>> ? >> > >> >> > >> Alternatively >> > >> >> > >> if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr)) >> > >> && TYPE_PRECISION (itype) <= BITS_PER_UNIT) >> > >> >> > >> should be amended with >> > >> >> > >> && GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) != MODE_VECTOR_INT >> > > >> > > How about: >> > > >> > > if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) == MODE_VECTOR_BOOL) >> > > { >> > > gcc_assert (TYPE_PRECISION (itype) <= BITS_PER_UNIT); >> > > >> > > ? >> > >> > Note the path is also necessary for avx512 and gcn mask modes which are integer modes. >> > >> > > Is it OK for TYPE_MODE to affect tree-level semantics though, especially >> > > since it can change with the target attribute? (At least TYPE_MODE_RAW >> > > would be stable.) >> > >> > That’s a good question and also related to GCC vector extension which can result in both BLKmode and integer modes to be used. But I’m not sure how we expose masks to the middle end here. A too large vector bool could be lowered to AVX512 mode. Maybe we should simply reject interpret/encode of BLKmode vectors and make sure to never assign integer modes to vector bools (if the target didn’t specify that mode)? >> > >> > I guess some test coverage would be nice here. >> >> To continue on that, we do not currently have a way to capture a >> vector comparison output >> but the C++ frontend has vector ?: >> >> typedef int v8si __attribute__((vector_size(32))); >> >> void foo (v8si *a, v8si *b, v8si *c) >> { >> *c = *a < *b ? (v8si){-1,-1,-1,-1,-1,-1,-1,-1 } : (v8si){0,0,0,0,0,0,0,0}; >> } >> >> with SSE2 we get a <signed-boolean:32> temporary, with AVX512 enabled >> that becomes <singed-boolean:1>. When we enlarge the vector to size 128 >> then even with AVX512 enabled I see <signed-boolean:32> here and >> vector lowering decomposes that to scalar (also with AVX or SSE, so maybe >> just a missed optimization). But note that to decompose this into two >> AVX512 vectors the temporary would have to change from <signed-boolean:32> >> elements to <signed-boolean:1>. >> >> The not supported vector bool types have BLKmode sofar. >> >> But for example on i?86-linux with -mno-sse (like -march=i586) for >> >> typedef short v2hi __attribute__((vector_size(4))); >> >> void foo (v2hi *a, v2hi *b, v2hi *c) >> { >> *c = *a < *b ? (v2hi){-1,-1} : (v2hi){0,0}; >> } >> >> we get a SImode vector <signed-boolean:16> as I feared. That means >> <signed-boolean:8> (the BITS_PER_UNIT case) can be ambiguous >> between SVE (bool for a 8byte data vector) and emulated vectors >> ("word-mode" vectors; for 1byte data vectors). >> >> And without knowing that SVE would have used VnBImode given that >> AVX512 uses an integer mode. >> >> Aside from the too large vector and AVX512 issue above I think we can use >> MODE_VECTOR_BOOL || TYPE_PRECISION == 1 and for the latter we >> can assert the mode is a scalar integer mode (AVX512 or GCN)? > > So like the attached? Can you give me a few days to see whether swapping to -1 : 0 masks works for SVE? I guess it would make things easier in the long run. Thanks, Richard
Richard Sandiford <richard.sandiford@arm.com> writes: > Richard Biener <richard.guenther@gmail.com> writes: >> On Fri, Jun 28, 2024 at 2:16 PM Richard Biener >> <richard.guenther@gmail.com> wrote: >>> >>> On Fri, Jun 28, 2024 at 11:06 AM Richard Biener >>> <richard.guenther@gmail.com> wrote: >>> > >>> > >>> > >>> > > Am 28.06.2024 um 10:27 schrieb Richard Sandiford <richard.sandiford@arm.com>: >>> > > >>> > > Richard Biener <richard.guenther@gmail.com> writes: >>> > >>> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener >>> > >>> <richard.guenther@gmail.com> wrote: >>> > >>> >>> > >>> On Fri, Jun 28, 2024 at 3:15 AM liuhongt <hongtao.liu@intel.com> wrote: >>> > >>>> >>> > >>>> for the testcase in the PR115406, here is part of the dump. >>> > >>>> >>> > >>>> char D.4882; >>> > >>>> vector(1) <signed-boolean:8> _1; >>> > >>>> vector(1) signed char _2; >>> > >>>> char _5; >>> > >>>> >>> > >>>> <bb 2> : >>> > >>>> _1 = { -1 }; >>> > >>>> >>> > >>>> When assign { -1 } to vector(1} {signed-boolean:8}, >>> > >>>> Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of dest >>> > >>>> with each vector elemnet. But i think the bit setting should only apply for >>> > >>>> TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1). >>> > >>>> <signed-boolean:16>, it will be assigned as -1, instead of 1. >>> > >>>> Is there any specific reason vector(1) <signed-boolean:8> is handled >>> > >>>> differently from vector<1> <signed-boolean:16>? >>> > >>>> >>> > >>>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. >>> > >>>> Ok for trunk? >>> > >>> >>> > >>> I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision >>> > >>> code should work for 8 bit >>> > >>> entities as well, it seems we only set the LSB of each element in the >>> > >>> "mask". ISTR that SVE >>> > >>> masks can have up to 8 bit elements (for 8 byte data elements), so >>> > >>> maybe that's why >>> > >>> <= BITS_PER_UNIT. >>> > > >>> > > Yeah. >>> > >>> > So is it necessary that only one bit is set for SVE? > > TBH I can't remember now. It matches what SVE instructions produce, and > lines up with the associated RTL code (which at the time was SVE-specific). > But when dealing with multibyte elements, upper predicate element bits > are ignored on read, so matching the instructions might not matter. > >>> > >>> So maybe instead of just setting one bit in >>> > >>> >>> > >>> ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT); >>> > >>> >>> > >>> we should set elt_bits bits, aka (without testing) >>> > >>> >>> > >>> ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit >>> > >>> % BITS_PER_UNIT); >>> > >>> >>> > >>> ? >>> > >> >>> > >> Alternatively >>> > >> >>> > >> if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr)) >>> > >> && TYPE_PRECISION (itype) <= BITS_PER_UNIT) >>> > >> >>> > >> should be amended with >>> > >> >>> > >> && GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) != MODE_VECTOR_INT >>> > > >>> > > How about: >>> > > >>> > > if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) == MODE_VECTOR_BOOL) >>> > > { >>> > > gcc_assert (TYPE_PRECISION (itype) <= BITS_PER_UNIT); >>> > > >>> > > ? >>> > >>> > Note the path is also necessary for avx512 and gcn mask modes which are integer modes. >>> > >>> > > Is it OK for TYPE_MODE to affect tree-level semantics though, especially >>> > > since it can change with the target attribute? (At least TYPE_MODE_RAW >>> > > would be stable.) >>> > >>> > That’s a good question and also related to GCC vector extension which can result in both BLKmode and integer modes to be used. But I’m not sure how we expose masks to the middle end here. A too large vector bool could be lowered to AVX512 mode. Maybe we should simply reject interpret/encode of BLKmode vectors and make sure to never assign integer modes to vector bools (if the target didn’t specify that mode)? >>> > >>> > I guess some test coverage would be nice here. >>> >>> To continue on that, we do not currently have a way to capture a >>> vector comparison output >>> but the C++ frontend has vector ?: >>> >>> typedef int v8si __attribute__((vector_size(32))); >>> >>> void foo (v8si *a, v8si *b, v8si *c) >>> { >>> *c = *a < *b ? (v8si){-1,-1,-1,-1,-1,-1,-1,-1 } : (v8si){0,0,0,0,0,0,0,0}; >>> } >>> >>> with SSE2 we get a <signed-boolean:32> temporary, with AVX512 enabled >>> that becomes <singed-boolean:1>. When we enlarge the vector to size 128 >>> then even with AVX512 enabled I see <signed-boolean:32> here and >>> vector lowering decomposes that to scalar (also with AVX or SSE, so maybe >>> just a missed optimization). But note that to decompose this into two >>> AVX512 vectors the temporary would have to change from <signed-boolean:32> >>> elements to <signed-boolean:1>. >>> >>> The not supported vector bool types have BLKmode sofar. >>> >>> But for example on i?86-linux with -mno-sse (like -march=i586) for >>> >>> typedef short v2hi __attribute__((vector_size(4))); >>> >>> void foo (v2hi *a, v2hi *b, v2hi *c) >>> { >>> *c = *a < *b ? (v2hi){-1,-1} : (v2hi){0,0}; >>> } >>> >>> we get a SImode vector <signed-boolean:16> as I feared. That means >>> <signed-boolean:8> (the BITS_PER_UNIT case) can be ambiguous >>> between SVE (bool for a 8byte data vector) and emulated vectors >>> ("word-mode" vectors; for 1byte data vectors). >>> >>> And without knowing that SVE would have used VnBImode given that >>> AVX512 uses an integer mode. >>> >>> Aside from the too large vector and AVX512 issue above I think we can use >>> MODE_VECTOR_BOOL || TYPE_PRECISION == 1 and for the latter we >>> can assert the mode is a scalar integer mode (AVX512 or GCN)? >> >> So like the attached? > > Can you give me a few days to see whether swapping to -1 : 0 masks > works for SVE? I guess it would make things easier in the long run. It took longer than expected, but FWIW, the SVE results look good with the attached. I suppose it does raise the question whether: if (ptr && wi::extract_uhwi (wi::to_wide (elt), 0, 1)) is the right way to test a boolean vector element at the tree level. For SVE it is, but I suppose for other targets 0b10 should be treated as true? The same question doesn't really arise on the RTL side because the SVE predicate elements are single bits. So perhaps one way would be to base the test above on GET_MODE_UNIT_PRECISION for boolean vector modes, and test for nonzero otherwise. Doesn't feel very clean though... Alternatively, we could say that the semantics of boolean vectors at the tree level are independent of the target, and try to fix things up at the gimple->rtl boundary. Thanks, Richard gcc/ * fold-const.cc (native_encode_vector_part): Set all bits of a boolean element. * simplify-rtx.cc (native_encode_rtx): Likewise. --- gcc/fold-const.cc | 2 +- gcc/simplify-rtx.cc | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc index 710d697c021..8b7aaca2500 100644 --- a/gcc/fold-const.cc +++ b/gcc/fold-const.cc @@ -8106,7 +8106,7 @@ native_encode_vector_part (const_tree expr, unsigned char *ptr, int len, if (ptr && wi::extract_uhwi (wi::to_wide (elt), 0, 1)) { unsigned int bit = i * elt_bits; - ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT); + ptr[bit / BITS_PER_UNIT] |= ((1 << elt_bits) - 1) << (bit % BITS_PER_UNIT); } } return extract_bytes; diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc index 35ba54c6292..bfc33bd2f19 100644 --- a/gcc/simplify-rtx.cc +++ b/gcc/simplify-rtx.cc @@ -7232,7 +7232,7 @@ native_encode_rtx (machine_mode mode, rtx x, vec<target_unit> &bytes, target_unit value = 0; for (unsigned int j = 0; j < BITS_PER_UNIT; j += elt_bits) { - value |= (INTVAL (CONST_VECTOR_ELT (x, elt)) & mask) << j; + value |= (INTVAL (CONST_VECTOR_ELT (x, elt)) ? mask : 0) << j; elt += 1; } bytes.quick_push (value);
On Fri, Jul 5, 2024 at 11:09 AM Richard Sandiford <richard.sandiford@arm.com> wrote: > > Richard Sandiford <richard.sandiford@arm.com> writes: > > Richard Biener <richard.guenther@gmail.com> writes: > >> On Fri, Jun 28, 2024 at 2:16 PM Richard Biener > >> <richard.guenther@gmail.com> wrote: > >>> > >>> On Fri, Jun 28, 2024 at 11:06 AM Richard Biener > >>> <richard.guenther@gmail.com> wrote: > >>> > > >>> > > >>> > > >>> > > Am 28.06.2024 um 10:27 schrieb Richard Sandiford <richard.sandiford@arm.com>: > >>> > > > >>> > > Richard Biener <richard.guenther@gmail.com> writes: > >>> > >>> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener > >>> > >>> <richard.guenther@gmail.com> wrote: > >>> > >>> > >>> > >>> On Fri, Jun 28, 2024 at 3:15 AM liuhongt <hongtao.liu@intel.com> wrote: > >>> > >>>> > >>> > >>>> for the testcase in the PR115406, here is part of the dump. > >>> > >>>> > >>> > >>>> char D.4882; > >>> > >>>> vector(1) <signed-boolean:8> _1; > >>> > >>>> vector(1) signed char _2; > >>> > >>>> char _5; > >>> > >>>> > >>> > >>>> <bb 2> : > >>> > >>>> _1 = { -1 }; > >>> > >>>> > >>> > >>>> When assign { -1 } to vector(1} {signed-boolean:8}, > >>> > >>>> Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of dest > >>> > >>>> with each vector elemnet. But i think the bit setting should only apply for > >>> > >>>> TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1). > >>> > >>>> <signed-boolean:16>, it will be assigned as -1, instead of 1. > >>> > >>>> Is there any specific reason vector(1) <signed-boolean:8> is handled > >>> > >>>> differently from vector<1> <signed-boolean:16>? > >>> > >>>> > >>> > >>>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > >>> > >>>> Ok for trunk? > >>> > >>> > >>> > >>> I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision > >>> > >>> code should work for 8 bit > >>> > >>> entities as well, it seems we only set the LSB of each element in the > >>> > >>> "mask". ISTR that SVE > >>> > >>> masks can have up to 8 bit elements (for 8 byte data elements), so > >>> > >>> maybe that's why > >>> > >>> <= BITS_PER_UNIT. > >>> > > > >>> > > Yeah. > >>> > > >>> > So is it necessary that only one bit is set for SVE? > > > > TBH I can't remember now. It matches what SVE instructions produce, and > > lines up with the associated RTL code (which at the time was SVE-specific). > > But when dealing with multibyte elements, upper predicate element bits > > are ignored on read, so matching the instructions might not matter. > > > >>> > >>> So maybe instead of just setting one bit in > >>> > >>> > >>> > >>> ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT); > >>> > >>> > >>> > >>> we should set elt_bits bits, aka (without testing) > >>> > >>> > >>> > >>> ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit > >>> > >>> % BITS_PER_UNIT); > >>> > >>> > >>> > >>> ? > >>> > >> > >>> > >> Alternatively > >>> > >> > >>> > >> if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr)) > >>> > >> && TYPE_PRECISION (itype) <= BITS_PER_UNIT) > >>> > >> > >>> > >> should be amended with > >>> > >> > >>> > >> && GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) != MODE_VECTOR_INT > >>> > > > >>> > > How about: > >>> > > > >>> > > if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) == MODE_VECTOR_BOOL) > >>> > > { > >>> > > gcc_assert (TYPE_PRECISION (itype) <= BITS_PER_UNIT); > >>> > > > >>> > > ? > >>> > > >>> > Note the path is also necessary for avx512 and gcn mask modes which are integer modes. > >>> > > >>> > > Is it OK for TYPE_MODE to affect tree-level semantics though, especially > >>> > > since it can change with the target attribute? (At least TYPE_MODE_RAW > >>> > > would be stable.) > >>> > > >>> > That’s a good question and also related to GCC vector extension which can result in both BLKmode and integer modes to be used. But I’m not sure how we expose masks to the middle end here. A too large vector bool could be lowered to AVX512 mode. Maybe we should simply reject interpret/encode of BLKmode vectors and make sure to never assign integer modes to vector bools (if the target didn’t specify that mode)? > >>> > > >>> > I guess some test coverage would be nice here. > >>> > >>> To continue on that, we do not currently have a way to capture a > >>> vector comparison output > >>> but the C++ frontend has vector ?: > >>> > >>> typedef int v8si __attribute__((vector_size(32))); > >>> > >>> void foo (v8si *a, v8si *b, v8si *c) > >>> { > >>> *c = *a < *b ? (v8si){-1,-1,-1,-1,-1,-1,-1,-1 } : (v8si){0,0,0,0,0,0,0,0}; > >>> } > >>> > >>> with SSE2 we get a <signed-boolean:32> temporary, with AVX512 enabled > >>> that becomes <singed-boolean:1>. When we enlarge the vector to size 128 > >>> then even with AVX512 enabled I see <signed-boolean:32> here and > >>> vector lowering decomposes that to scalar (also with AVX or SSE, so maybe > >>> just a missed optimization). But note that to decompose this into two > >>> AVX512 vectors the temporary would have to change from <signed-boolean:32> > >>> elements to <signed-boolean:1>. > >>> > >>> The not supported vector bool types have BLKmode sofar. > >>> > >>> But for example on i?86-linux with -mno-sse (like -march=i586) for > >>> > >>> typedef short v2hi __attribute__((vector_size(4))); > >>> > >>> void foo (v2hi *a, v2hi *b, v2hi *c) > >>> { > >>> *c = *a < *b ? (v2hi){-1,-1} : (v2hi){0,0}; > >>> } > >>> > >>> we get a SImode vector <signed-boolean:16> as I feared. That means > >>> <signed-boolean:8> (the BITS_PER_UNIT case) can be ambiguous > >>> between SVE (bool for a 8byte data vector) and emulated vectors > >>> ("word-mode" vectors; for 1byte data vectors). > >>> > >>> And without knowing that SVE would have used VnBImode given that > >>> AVX512 uses an integer mode. > >>> > >>> Aside from the too large vector and AVX512 issue above I think we can use > >>> MODE_VECTOR_BOOL || TYPE_PRECISION == 1 and for the latter we > >>> can assert the mode is a scalar integer mode (AVX512 or GCN)? > >> > >> So like the attached? > > > > Can you give me a few days to see whether swapping to -1 : 0 masks > > works for SVE? I guess it would make things easier in the long run. > > It took longer than expected, but FWIW, the SVE results look good > with the attached. > > I suppose it does raise the question whether: > > if (ptr && wi::extract_uhwi (wi::to_wide (elt), 0, 1)) > > is the right way to test a boolean vector element at the tree level. > For SVE it is, but I suppose for other targets 0b10 should be treated > as true? > > The same question doesn't really arise on the RTL side because > the SVE predicate elements are single bits. So perhaps one way > would be to base the test above on GET_MODE_UNIT_PRECISION for > boolean vector modes, and test for nonzero otherwise. Doesn't feel > very clean though... native_interpret_vector_part also tests the LSB. > Alternatively, we could say that the semantics of boolean vectors > at the tree level are independent of the target, and try to fix > things up at the gimple->rtl boundary. I think since we do not expose boolean vectors to users via the vector extension we're free to choose. Intrinsics may get them ways to spill/load predicates to/from memory and thus write target appropriate literal constants. And do sth like (fix syntax for me): svebool b = 0x3452; int main() { auto a = (v4si) { 1, 2, 3, 4 }; auto b = (V4si) { 5, 6, 7, 8 }; svebool c = sve_scmp_lt (a, b); if (c != b) abort (); } and when we'd expose sve_scmp_lt to GIMPLE (or RTL) for constant folding we have to make sure that constant folding is in line with the svebool representation the user expects. Hardware-wise I think setting all bits when constant folding is on the safe side. But if this might expose inconsistencies like above to the user (I'm not sure this is actually observable), then that would be a bug. But this is likely a problem right now already. On x86 and GCN where there's only one bit per lane this is not ambiguous but I think both RISC-V and SVE have "padding" bits and we do not expose their detail (LSB or MSB significant) to the middle-end? But I'd say we care once somebody comes up with a test like above. So I think your patch is OK - it should also solve the original reported issue. Note this is a latent issue that we should fix on branches as well, otherwise VnQImode mask vectors are wrongly constant folded. You are best to assess whether your or my approach is better on branches - it doesn't matter for x86 or gcn. Thanks, Richard. > Thanks, > Richard > > > gcc/ > * fold-const.cc (native_encode_vector_part): Set all bits of > a boolean element. > * simplify-rtx.cc (native_encode_rtx): Likewise. > --- > gcc/fold-const.cc | 2 +- > gcc/simplify-rtx.cc | 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > index 710d697c021..8b7aaca2500 100644 > --- a/gcc/fold-const.cc > +++ b/gcc/fold-const.cc > @@ -8106,7 +8106,7 @@ native_encode_vector_part (const_tree expr, unsigned char *ptr, int len, > if (ptr && wi::extract_uhwi (wi::to_wide (elt), 0, 1)) > { > unsigned int bit = i * elt_bits; > - ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT); > + ptr[bit / BITS_PER_UNIT] |= ((1 << elt_bits) - 1) << (bit % BITS_PER_UNIT); > } > } > return extract_bytes; > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc > index 35ba54c6292..bfc33bd2f19 100644 > --- a/gcc/simplify-rtx.cc > +++ b/gcc/simplify-rtx.cc > @@ -7232,7 +7232,7 @@ native_encode_rtx (machine_mode mode, rtx x, vec<target_unit> &bytes, > target_unit value = 0; > for (unsigned int j = 0; j < BITS_PER_UNIT; j += elt_bits) > { > - value |= (INTVAL (CONST_VECTOR_ELT (x, elt)) & mask) << j; > + value |= (INTVAL (CONST_VECTOR_ELT (x, elt)) ? mask : 0) << j; > elt += 1; > } > bytes.quick_push (value); > -- > 2.25.1 >
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc index 710d697c021..0f045f851d1 100644 --- a/gcc/fold-const.cc +++ b/gcc/fold-const.cc @@ -8077,7 +8077,7 @@ native_encode_vector_part (const_tree expr, unsigned char *ptr, int len, { tree itype = TREE_TYPE (TREE_TYPE (expr)); if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr)) - && TYPE_PRECISION (itype) <= BITS_PER_UNIT) + && TYPE_PRECISION (itype) < BITS_PER_UNIT) { /* This is the only case in which elements can be smaller than a byte. Element 0 is always in the lsb of the containing byte. */ diff --git a/gcc/testsuite/gcc.target/i386/pr115406.c b/gcc/testsuite/gcc.target/i386/pr115406.c new file mode 100644 index 00000000000..623dff06fc3 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr115406.c @@ -0,0 +1,23 @@ +/* { dg-do run } */ +/* { dg-options "-O0 -mavx512f" } */ +/* { dg-require-effective-target avx512f } */ + +typedef __attribute__((__vector_size__ (1))) char V; + +char +foo (V v) +{ + return ((V) v == v)[0]; +} + +int +main () +{ + if (!__builtin_cpu_supports ("avx512f")) + return 0; + + char x = foo ((V) { }); + if (x != -1) + __builtin_abort (); + return 0; +}