Fix native_encode_vector_part for itype when TYPE_PRECISION (itype) == BITS_PER_UNIT

Message ID 20240628011455.3980128-1-hongtao.liu@intel.com
State New
Series Fix native_encode_vector_part for itype when TYPE_PRECISION (itype) == BITS_PER_UNIT

Commit Message

liuhongt June 28, 2024, 1:14 a.m. UTC
For the testcase in PR115406, here is part of the dump.

  char D.4882;
  vector(1) <signed-boolean:8> _1;
  vector(1) signed char _2;
  char _5;

  <bb 2> :
  _1 = { -1 };

When assigning { -1 } to vector(1) <signed-boolean:8>, since
TYPE_PRECISION (itype) <= BITS_PER_UNIT, each bit of the destination is
set from a vector element.  But I think the bit setting should only
apply for TYPE_PRECISION (itype) < BITS_PER_UNIT; i.e. for
vector(1) <signed-boolean:16> the element is encoded as -1, not as 1.
Is there any specific reason vector(1) <signed-boolean:8> is handled
differently from vector(1) <signed-boolean:16>?
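
To make the difference concrete, here is a minimal standalone sketch
(plain C, not GCC code; the encode logic is only paraphrased) of what
the two behaviours would produce for the { -1 } element above:

#include <stdio.h>

int
main ()
{
  signed char elt = -1;            /* the <signed-boolean:8> element */
  unsigned char lsb_only = 0, whole = 0;

  if (elt & 1)                     /* current path: one bit per element */
    lsb_only |= 1 << 0;
  whole = (unsigned char) elt;     /* byte-sized element copied whole */

  /* Prints "lsb-only: 1, whole-element: -1".  */
  printf ("lsb-only: %d, whole-element: %d\n",
          (int) (signed char) lsb_only, (int) (signed char) whole);
  return 0;
}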

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

	PR middle-end/115406
	* fold-const.cc (native_encode_vector_part): Don't set each
	bit to the dest when TYPE_PRECISION (itype) == BITS_PER_UNIT.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr115406.c: New test.
---
 gcc/fold-const.cc                        |  2 +-
 gcc/testsuite/gcc.target/i386/pr115406.c | 23 +++++++++++++++++++++++
 2 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr115406.c

Comments

Richard Biener June 28, 2024, 6:01 a.m. UTC | #1
On Fri, Jun 28, 2024 at 3:15 AM liuhongt <hongtao.liu@intel.com> wrote:
>
> for the testcase in the PR115406, here is part of the dump.
>
>   char D.4882;
>   vector(1) <signed-boolean:8> _1;
>   vector(1) signed char _2;
>   char _5;
>
>   <bb 2> :
>   _1 = { -1 };
>
> When assign { -1 } to vector(1} {signed-boolean:8},
> Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of dest
> with each vector elemnet. But i think the bit setting should only apply for
> TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1).
> <signed-boolean:16>, it will be assigned as -1, instead of 1.
> Is there any specific reason vector(1) <signed-boolean:8> is handled
> differently from vector<1> <signed-boolean:16>?
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?

I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision
code should work for 8-bit entities as well; it seems we only set the
LSB of each element in the "mask".  ISTR that SVE masks can have up to
8-bit elements (for 8-byte data elements), so maybe that's why
<= BITS_PER_UNIT.  So maybe instead of just setting one bit in

              ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT);

we should set elt_bits bits, aka (without testing)

              ptr[bit / BITS_PER_UNIT] |= ((1 << elt_bits) - 1) << (bit % BITS_PER_UNIT);

?
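
In context, the element loop in native_encode_vector_part would then
read roughly as follows (untested; names as in the existing code):

          if (ptr && wi::extract_uhwi (wi::to_wide (elt), 0, 1))
            {
              unsigned int bit = i * elt_bits;
              ptr[bit / BITS_PER_UNIT]
                |= ((1 << elt_bits) - 1) << (bit % BITS_PER_UNIT);
            }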

> gcc/ChangeLog:
>
>         PR middle-end/115406
>         * fold-const.cc (native_encode_vector_part): Don't set each
>         bit to the dest when TYPE_PRECISION (itype) == BITS_PER_UNIT.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/pr115406.c: New test.
> ---
>  gcc/fold-const.cc                        |  2 +-
>  gcc/testsuite/gcc.target/i386/pr115406.c | 23 +++++++++++++++++++++++
>  2 files changed, 24 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr115406.c
>
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 710d697c021..0f045f851d1 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -8077,7 +8077,7 @@ native_encode_vector_part (const_tree expr, unsigned char *ptr, int len,
>  {
>    tree itype = TREE_TYPE (TREE_TYPE (expr));
>    if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr))
> -      && TYPE_PRECISION (itype) <= BITS_PER_UNIT)
> +      && TYPE_PRECISION (itype) < BITS_PER_UNIT)
>      {
>        /* This is the only case in which elements can be smaller than a byte.
>          Element 0 is always in the lsb of the containing byte.  */
> diff --git a/gcc/testsuite/gcc.target/i386/pr115406.c b/gcc/testsuite/gcc.target/i386/pr115406.c
> new file mode 100644
> index 00000000000..623dff06fc3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr115406.c
> @@ -0,0 +1,23 @@
> +/* { dg-do run } */
> +/* { dg-options "-O0 -mavx512f" } */
> +/* { dg-require-effective-target avx512f } */
> +
> +typedef __attribute__((__vector_size__ (1))) char V;
> +
> +char
> +foo (V v)
> +{
> +  return ((V) v == v)[0];
> +}
> +
> +int
> +main ()
> +{
> +  if (!__builtin_cpu_supports ("avx512f"))
> +    return 0;
> +
> +  char x = foo ((V) { });
> +  if (x != -1)
> +    __builtin_abort ();
> +  return 0;
> +}
> --
> 2.31.1
>
Richard Biener June 28, 2024, 6:08 a.m. UTC | #2
On Fri, Jun 28, 2024 at 8:01 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Fri, Jun 28, 2024 at 3:15 AM liuhongt <hongtao.liu@intel.com> wrote:
> >
> > for the testcase in the PR115406, here is part of the dump.
> >
> >   char D.4882;
> >   vector(1) <signed-boolean:8> _1;
> >   vector(1) signed char _2;
> >   char _5;
> >
> >   <bb 2> :
> >   _1 = { -1 };
> >
> > When assign { -1 } to vector(1} {signed-boolean:8},
> > Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of dest
> > with each vector elemnet. But i think the bit setting should only apply for
> > TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1).
> > <signed-boolean:16>, it will be assigned as -1, instead of 1.
> > Is there any specific reason vector(1) <signed-boolean:8> is handled
> > differently from vector<1> <signed-boolean:16>?
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ok for trunk?
>
> I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision
> code should work for 8 bit
> entities as well, it seems we only set the LSB of each element in the
> "mask".  ISTR that SVE
> masks can have up to 8 bit elements (for 8 byte data elements), so
> maybe that's why
> <= BITS_PER_UNIT.  So maybe instead of just setting one bit in
>
>               ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT);
>
> we should set elt_bits bits, aka (without testing)
>
>               ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit
> % BITS_PER_UNIT);
>
> ?

Alternatively

  if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr))
      && TYPE_PRECISION (itype) <= BITS_PER_UNIT)

should be amended with

   && GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) != MODE_VECTOR_INT

maybe.  Still, for the possibility of a vector(n) <signed-boolean:16>
mask for an int128 element vector we'd have 16-bit mask elements;
encoding that differently would be inconsistent as well (but of course
16-bit elements are not handled by the code right now).
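
Put together, the guard would then read roughly (untested):

  if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr))
      && TYPE_PRECISION (itype) <= BITS_PER_UNIT
      && GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) != MODE_VECTOR_INT)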

Richard.

> > gcc/ChangeLog:
> >
> >         PR middle-end/115406
> >         * fold-const.cc (native_encode_vector_part): Don't set each
> >         bit to the dest when TYPE_PRECISION (itype) == BITS_PER_UNIT.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/i386/pr115406.c: New test.
> > ---
> >  gcc/fold-const.cc                        |  2 +-
> >  gcc/testsuite/gcc.target/i386/pr115406.c | 23 +++++++++++++++++++++++
> >  2 files changed, 24 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr115406.c
> >
> > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > index 710d697c021..0f045f851d1 100644
> > --- a/gcc/fold-const.cc
> > +++ b/gcc/fold-const.cc
> > @@ -8077,7 +8077,7 @@ native_encode_vector_part (const_tree expr, unsigned char *ptr, int len,
> >  {
> >    tree itype = TREE_TYPE (TREE_TYPE (expr));
> >    if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr))
> > -      && TYPE_PRECISION (itype) <= BITS_PER_UNIT)
> > +      && TYPE_PRECISION (itype) < BITS_PER_UNIT)
> >      {
> >        /* This is the only case in which elements can be smaller than a byte.
> >          Element 0 is always in the lsb of the containing byte.  */
> > diff --git a/gcc/testsuite/gcc.target/i386/pr115406.c b/gcc/testsuite/gcc.target/i386/pr115406.c
> > new file mode 100644
> > index 00000000000..623dff06fc3
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr115406.c
> > @@ -0,0 +1,23 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-O0 -mavx512f" } */
> > +/* { dg-require-effective-target avx512f } */
> > +
> > +typedef __attribute__((__vector_size__ (1))) char V;
> > +
> > +char
> > +foo (V v)
> > +{
> > +  return ((V) v == v)[0];
> > +}
> > +
> > +int
> > +main ()
> > +{
> > +  if (!__builtin_cpu_supports ("avx512f"))
> > +    return 0;
> > +
> > +  char x = foo ((V) { });
> > +  if (x != -1)
> > +    __builtin_abort ();
> > +  return 0;
> > +}
> > --
> > 2.31.1
> >
Richard Sandiford June 28, 2024, 8:27 a.m. UTC | #3
Richard Biener <richard.guenther@gmail.com> writes:
> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
>>
>> On Fri, Jun 28, 2024 at 3:15 AM liuhongt <hongtao.liu@intel.com> wrote:
>> >
>> > for the testcase in the PR115406, here is part of the dump.
>> >
>> >   char D.4882;
>> >   vector(1) <signed-boolean:8> _1;
>> >   vector(1) signed char _2;
>> >   char _5;
>> >
>> >   <bb 2> :
>> >   _1 = { -1 };
>> >
>> > When assign { -1 } to vector(1} {signed-boolean:8},
>> > Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of dest
>> > with each vector elemnet. But i think the bit setting should only apply for
>> > TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1).
>> > <signed-boolean:16>, it will be assigned as -1, instead of 1.
>> > Is there any specific reason vector(1) <signed-boolean:8> is handled
>> > differently from vector<1> <signed-boolean:16>?
>> >
>> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
>> > Ok for trunk?
>>
>> I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision
>> code should work for 8 bit
>> entities as well, it seems we only set the LSB of each element in the
>> "mask".  ISTR that SVE
>> masks can have up to 8 bit elements (for 8 byte data elements), so
>> maybe that's why
>> <= BITS_PER_UNIT.

Yeah.

>>  So maybe instead of just setting one bit in
>>
>>               ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT);
>>
>> we should set elt_bits bits, aka (without testing)
>>
>>               ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit
>> % BITS_PER_UNIT);
>>
>> ?
>
> Alternatively
>
>   if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr))
>       && TYPE_PRECISION (itype) <= BITS_PER_UNIT)
>
> should be amended with
>
>    && GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) != MODE_VECTOR_INT

How about:

  if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) == MODE_VECTOR_BOOL)
    {
      gcc_assert (TYPE_PRECISION (itype) <= BITS_PER_UNIT);

?

Is it OK for TYPE_MODE to affect tree-level semantics though, especially
since it can change with the target attribute?  (At least TYPE_MODE_RAW
would be stable.)

> maybe.  Still for the possibility of vector(n) <signed-boolean:16>
> mask for a int128 element vector
> we'd have 16bit mask elements, encoding that differently would be
> inconsistent as well
> (but of course 16bit elements are not handled by the code right now).

Yeah, 16-bit predicate elements aren't a thing for SVE, so we've not
had to add support for them.

Richard
Richard Biener June 28, 2024, 9:06 a.m. UTC | #4
> Am 28.06.2024 um 10:27 schrieb Richard Sandiford <richard.sandiford@arm.com>:
> 
> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>> 
>>> On Fri, Jun 28, 2024 at 3:15 AM liuhongt <hongtao.liu@intel.com> wrote:
>>>> 
>>>> for the testcase in the PR115406, here is part of the dump.
>>>> 
>>>>  char D.4882;
>>>>  vector(1) <signed-boolean:8> _1;
>>>>  vector(1) signed char _2;
>>>>  char _5;
>>>> 
>>>>  <bb 2> :
>>>>  _1 = { -1 };
>>>> 
>>>> When assign { -1 } to vector(1} {signed-boolean:8},
>>>> Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of dest
>>>> with each vector elemnet. But i think the bit setting should only apply for
>>>> TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1).
>>>> <signed-boolean:16>, it will be assigned as -1, instead of 1.
>>>> Is there any specific reason vector(1) <signed-boolean:8> is handled
>>>> differently from vector<1> <signed-boolean:16>?
>>>> 
>>>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
>>>> Ok for trunk?
>>> 
>>> I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision
>>> code should work for 8 bit
>>> entities as well, it seems we only set the LSB of each element in the
>>> "mask".  ISTR that SVE
>>> masks can have up to 8 bit elements (for 8 byte data elements), so
>>> maybe that's why
>>> <= BITS_PER_UNIT.
> 
> Yeah.

So is it necessary that only one bit is set for SVE?

>>> So maybe instead of just setting one bit in
>>> 
>>>              ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT);
>>> 
>>> we should set elt_bits bits, aka (without testing)
>>> 
>>>              ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit
>>> % BITS_PER_UNIT);
>>> 
>>> ?
>> 
>> Alternatively
>> 
>>  if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr))
>>      && TYPE_PRECISION (itype) <= BITS_PER_UNIT)
>> 
>> should be amended with
>> 
>>   && GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) != MODE_VECTOR_INT
> 
> How about:
> 
>  if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) == MODE_VECTOR_BOOL)
>    {
>      gcc_assert (TYPE_PRECISION (itype) <= BITS_PER_UNIT);
> 
> ?

Note the path is also necessary for AVX512 and GCN mask modes, which are integer modes.

> Is it OK for TYPE_MODE to affect tree-level semantics though, especially
> since it can change with the target attribute?  (At least TYPE_MODE_RAW
> would be stable.)

That’s a good question, and it is also related to the GCC vector extension, which can result in both BLKmode and integer modes being used.  But I’m not sure how we expose masks to the middle end here.  A too-large vector bool could be lowered to an AVX512 mode.  Maybe we should simply reject interpret/encode of BLKmode vectors and make sure never to assign integer modes to vector bools (if the target didn’t specify that mode)?

I guess some test coverage would be nice here.

>> maybe.  Still for the possibility of vector(n) <signed-boolean:16>
>> mask for a int128 element vector
>> we'd have 16bit mask elements, encoding that differently would be
>> inconsistent as well
>> (but of course 16bit elements are not handled by the code right now).
> 
> Yeah, 16-bit predicate elements aren't a thing for SVE, so we've not
> had to add support for them.
> 
> Richard
Richard Biener June 28, 2024, 12:16 p.m. UTC | #5
On Fri, Jun 28, 2024 at 11:06 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
>
>
> > Am 28.06.2024 um 10:27 schrieb Richard Sandiford <richard.sandiford@arm.com>:
> >
> > Richard Biener <richard.guenther@gmail.com> writes:
> >>> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener
> >>> <richard.guenther@gmail.com> wrote:
> >>>
> >>> On Fri, Jun 28, 2024 at 3:15 AM liuhongt <hongtao.liu@intel.com> wrote:
> >>>>
> >>>> for the testcase in the PR115406, here is part of the dump.
> >>>>
> >>>>  char D.4882;
> >>>>  vector(1) <signed-boolean:8> _1;
> >>>>  vector(1) signed char _2;
> >>>>  char _5;
> >>>>
> >>>>  <bb 2> :
> >>>>  _1 = { -1 };
> >>>>
> >>>> When assign { -1 } to vector(1} {signed-boolean:8},
> >>>> Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of dest
> >>>> with each vector elemnet. But i think the bit setting should only apply for
> >>>> TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1).
> >>>> <signed-boolean:16>, it will be assigned as -1, instead of 1.
> >>>> Is there any specific reason vector(1) <signed-boolean:8> is handled
> >>>> differently from vector<1> <signed-boolean:16>?
> >>>>
> >>>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> >>>> Ok for trunk?
> >>>
> >>> I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision
> >>> code should work for 8 bit
> >>> entities as well, it seems we only set the LSB of each element in the
> >>> "mask".  ISTR that SVE
> >>> masks can have up to 8 bit elements (for 8 byte data elements), so
> >>> maybe that's why
> >>> <= BITS_PER_UNIT.
> >
> > Yeah.
>
> So is it necessary that only one bit is set for SVE?
>
> >>> So maybe instead of just setting one bit in
> >>>
> >>>              ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT);
> >>>
> >>> we should set elt_bits bits, aka (without testing)
> >>>
> >>>              ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit
> >>> % BITS_PER_UNIT);
> >>>
> >>> ?
> >>
> >> Alternatively
> >>
> >>  if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr))
> >>      && TYPE_PRECISION (itype) <= BITS_PER_UNIT)
> >>
> >> should be amended with
> >>
> >>   && GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) != MODE_VECTOR_INT
> >
> > How about:
> >
> >  if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) == MODE_VECTOR_BOOL)
> >    {
> >      gcc_assert (TYPE_PRECISION (itype) <= BITS_PER_UNIT);
> >
> > ?
>
> Note the path is also necessary for avx512 and gcn mask modes which are integer modes.
>
> > Is it OK for TYPE_MODE to affect tree-level semantics though, especially
> > since it can change with the target attribute?  (At least TYPE_MODE_RAW
> > would be stable.)
>
> That’s a good question and also related to GCC vector extension which can result in both BLKmode and integer modes to be used.  But I’m not sure how we expose masks to the middle end here.  A too large vector bool could be lowered to AVX512 mode.  Maybe we should simply reject interpret/encode of BLKmode vectors and make sure to never assign integer modes to vector bools (if the target didn’t specify that mode)?
>
> I guess some test coverage would be nice here.

To continue on that, we do not currently have a way to capture a
vector comparison output, but the C++ frontend has vector ?:

typedef int v8si __attribute__((vector_size(32)));

void foo (v8si *a, v8si *b, v8si *c)
{
  *c = *a < *b ? (v8si){-1,-1,-1,-1,-1,-1,-1,-1 } : (v8si){0,0,0,0,0,0,0,0};
}

with SSE2 we get a <signed-boolean:32> temporary; with AVX512 enabled
that becomes <signed-boolean:1>.  When we enlarge the vector to size 128
then even with AVX512 enabled I see <signed-boolean:32> here, and
vector lowering decomposes that to scalar (also with AVX or SSE, so maybe
just a missed optimization).  But note that to decompose this into two
AVX512 vectors the temporary would have to change from <signed-boolean:32>
elements to <signed-boolean:1>.

The unsupported vector bool types have BLKmode so far.

But for example on i?86-linux with -mno-sse (like -march=i586) for

typedef short v2hi __attribute__((vector_size(4)));

void foo (v2hi *a, v2hi *b, v2hi *c)
{
  *c = *a < *b ? (v2hi){-1,-1} : (v2hi){0,0};
}

we get a SImode vector <signed-boolean:16> as I feared.  That means
<signed-boolean:8> (the BITS_PER_UNIT case) can be ambiguous
between SVE (bool for an 8-byte data vector) and emulated vectors
("word-mode" vectors; for 1-byte data vectors).

And we can't tell without knowing that SVE would have used VnBImode,
given that AVX512 uses an integer mode.

Aside from the too-large-vector and AVX512 issue above, I think we can
use MODE_VECTOR_BOOL || TYPE_PRECISION == 1, and for the latter we can
assert the mode is a scalar integer mode (AVX512 or GCN)?
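
i.e. something along these lines (an untested sketch; names as in
native_encode_vector_part):

  if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr))
      && (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) == MODE_VECTOR_BOOL
          || TYPE_PRECISION (itype) == 1))
    {
      if (TYPE_PRECISION (itype) == 1)
        /* AVX512 / GCN style masks: the vector as a whole has a scalar
           integer mode.  */
        gcc_assert (SCALAR_INT_MODE_P (TYPE_MODE (TREE_TYPE (expr))));
      ...
    }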

Richard.


> >> maybe.  Still for the possibility of vector(n) <signed-boolean:16>
> >> mask for a int128 element vector
> >> we'd have 16bit mask elements, encoding that differently would be
> >> inconsistent as well
> >> (but of course 16bit elements are not handled by the code right now).
> >
> > Yeah, 16-bit predicate elements aren't a thing for SVE, so we've not
> > had to add support for them.
> >
> > Richard
Richard Biener June 28, 2024, 12:22 p.m. UTC | #6
On Fri, Jun 28, 2024 at 2:16 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Fri, Jun 28, 2024 at 11:06 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> >
> >
> > > Am 28.06.2024 um 10:27 schrieb Richard Sandiford <richard.sandiford@arm.com>:
> > >
> > > Richard Biener <richard.guenther@gmail.com> writes:
> > >>> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener
> > >>> <richard.guenther@gmail.com> wrote:
> > >>>
> > >>> On Fri, Jun 28, 2024 at 3:15 AM liuhongt <hongtao.liu@intel.com> wrote:
> > >>>>
> > >>>> for the testcase in the PR115406, here is part of the dump.
> > >>>>
> > >>>>  char D.4882;
> > >>>>  vector(1) <signed-boolean:8> _1;
> > >>>>  vector(1) signed char _2;
> > >>>>  char _5;
> > >>>>
> > >>>>  <bb 2> :
> > >>>>  _1 = { -1 };
> > >>>>
> > >>>> When assign { -1 } to vector(1} {signed-boolean:8},
> > >>>> Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of dest
> > >>>> with each vector elemnet. But i think the bit setting should only apply for
> > >>>> TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1).
> > >>>> <signed-boolean:16>, it will be assigned as -1, instead of 1.
> > >>>> Is there any specific reason vector(1) <signed-boolean:8> is handled
> > >>>> differently from vector<1> <signed-boolean:16>?
> > >>>>
> > >>>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > >>>> Ok for trunk?
> > >>>
> > >>> I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision
> > >>> code should work for 8 bit
> > >>> entities as well, it seems we only set the LSB of each element in the
> > >>> "mask".  ISTR that SVE
> > >>> masks can have up to 8 bit elements (for 8 byte data elements), so
> > >>> maybe that's why
> > >>> <= BITS_PER_UNIT.
> > >
> > > Yeah.
> >
> > So is it necessary that only one bit is set for SVE?
> >
> > >>> So maybe instead of just setting one bit in
> > >>>
> > >>>              ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT);
> > >>>
> > >>> we should set elt_bits bits, aka (without testing)
> > >>>
> > >>>              ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit
> > >>> % BITS_PER_UNIT);
> > >>>
> > >>> ?
> > >>
> > >> Alternatively
> > >>
> > >>  if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr))
> > >>      && TYPE_PRECISION (itype) <= BITS_PER_UNIT)
> > >>
> > >> should be amended with
> > >>
> > >>   && GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) != MODE_VECTOR_INT
> > >
> > > How about:
> > >
> > >  if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) == MODE_VECTOR_BOOL)
> > >    {
> > >      gcc_assert (TYPE_PRECISION (itype) <= BITS_PER_UNIT);
> > >
> > > ?
> >
> > Note the path is also necessary for avx512 and gcn mask modes which are integer modes.
> >
> > > Is it OK for TYPE_MODE to affect tree-level semantics though, especially
> > > since it can change with the target attribute?  (At least TYPE_MODE_RAW
> > > would be stable.)
> >
> > That’s a good question and also related to GCC vector extension which can result in both BLKmode and integer modes to be used.  But I’m not sure how we expose masks to the middle end here.  A too large vector bool could be lowered to AVX512 mode.  Maybe we should simply reject interpret/encode of BLKmode vectors and make sure to never assign integer modes to vector bools (if the target didn’t specify that mode)?
> >
> > I guess some test coverage would be nice here.
>
> To continue on that, we do not currently have a way to capture a
> vector comparison output
> but the C++ frontend has vector ?:
>
> typedef int v8si __attribute__((vector_size(32)));
>
> void foo (v8si *a, v8si *b, v8si *c)
> {
>   *c = *a < *b ? (v8si){-1,-1,-1,-1,-1,-1,-1,-1 } : (v8si){0,0,0,0,0,0,0,0};
> }
>
> with SSE2 we get a <signed-boolean:32> temporary, with AVX512 enabled
> that becomes <singed-boolean:1>.  When we enlarge the vector to size 128
> then even with AVX512 enabled I see <signed-boolean:32> here and
> vector lowering decomposes that to scalar (also with AVX or SSE, so maybe
> just a missed optimization).  But note that to decompose this into two
> AVX512 vectors the temporary would have to change from <signed-boolean:32>
> elements to <signed-boolean:1>.
>
> The not supported vector bool types have BLKmode sofar.
>
> But for example on i?86-linux with -mno-sse (like -march=i586) for
>
> typedef short v2hi __attribute__((vector_size(4)));
>
> void foo (v2hi *a, v2hi *b, v2hi *c)
> {
>   *c = *a < *b ? (v2hi){-1,-1} : (v2hi){0,0};
> }
>
> we get a SImode vector <signed-boolean:16> as I feared.  That means
> <signed-boolean:8> (the BITS_PER_UNIT case) can be ambiguous
> between SVE (bool for a 8byte data vector) and emulated vectors
> ("word-mode" vectors; for 1byte data vectors).
>
> And without knowing that SVE would have used VnBImode given that
> AVX512 uses an integer mode.
>
> Aside from the too large vector and AVX512 issue above I think we can use
> MODE_VECTOR_BOOL || TYPE_PRECISION == 1 and for the latter we
> can assert the mode is a scalar integer mode (AVX512 or GCN)?

So like the attached?

Richard.
Richard Sandiford June 28, 2024, 12:28 p.m. UTC | #7
Richard Biener <richard.guenther@gmail.com> writes:
> On Fri, Jun 28, 2024 at 2:16 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
>>
>> On Fri, Jun 28, 2024 at 11:06 AM Richard Biener
>> <richard.guenther@gmail.com> wrote:
>> >
>> >
>> >
>> > > Am 28.06.2024 um 10:27 schrieb Richard Sandiford <richard.sandiford@arm.com>:
>> > >
>> > > Richard Biener <richard.guenther@gmail.com> writes:
>> > >>> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener
>> > >>> <richard.guenther@gmail.com> wrote:
>> > >>>
>> > >>> On Fri, Jun 28, 2024 at 3:15 AM liuhongt <hongtao.liu@intel.com> wrote:
>> > >>>>
>> > >>>> for the testcase in the PR115406, here is part of the dump.
>> > >>>>
>> > >>>>  char D.4882;
>> > >>>>  vector(1) <signed-boolean:8> _1;
>> > >>>>  vector(1) signed char _2;
>> > >>>>  char _5;
>> > >>>>
>> > >>>>  <bb 2> :
>> > >>>>  _1 = { -1 };
>> > >>>>
>> > >>>> When assign { -1 } to vector(1} {signed-boolean:8},
>> > >>>> Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of dest
>> > >>>> with each vector elemnet. But i think the bit setting should only apply for
>> > >>>> TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1).
>> > >>>> <signed-boolean:16>, it will be assigned as -1, instead of 1.
>> > >>>> Is there any specific reason vector(1) <signed-boolean:8> is handled
>> > >>>> differently from vector<1> <signed-boolean:16>?
>> > >>>>
>> > >>>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
>> > >>>> Ok for trunk?
>> > >>>
>> > >>> I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision
>> > >>> code should work for 8 bit
>> > >>> entities as well, it seems we only set the LSB of each element in the
>> > >>> "mask".  ISTR that SVE
>> > >>> masks can have up to 8 bit elements (for 8 byte data elements), so
>> > >>> maybe that's why
>> > >>> <= BITS_PER_UNIT.
>> > >
>> > > Yeah.
>> >
>> > So is it necessary that only one bit is set for SVE?

TBH I can't remember now.  It matches what SVE instructions produce, and
lines up with the associated RTL code (which at the time was SVE-specific).
But when dealing with multibyte elements, upper predicate element bits
are ignored on read, so matching the instructions might not matter.

>> > >>> So maybe instead of just setting one bit in
>> > >>>
>> > >>>              ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT);
>> > >>>
>> > >>> we should set elt_bits bits, aka (without testing)
>> > >>>
>> > >>>              ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit
>> > >>> % BITS_PER_UNIT);
>> > >>>
>> > >>> ?
>> > >>
>> > >> Alternatively
>> > >>
>> > >>  if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr))
>> > >>      && TYPE_PRECISION (itype) <= BITS_PER_UNIT)
>> > >>
>> > >> should be amended with
>> > >>
>> > >>   && GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) != MODE_VECTOR_INT
>> > >
>> > > How about:
>> > >
>> > >  if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) == MODE_VECTOR_BOOL)
>> > >    {
>> > >      gcc_assert (TYPE_PRECISION (itype) <= BITS_PER_UNIT);
>> > >
>> > > ?
>> >
>> > Note the path is also necessary for avx512 and gcn mask modes which are integer modes.
>> >
>> > > Is it OK for TYPE_MODE to affect tree-level semantics though, especially
>> > > since it can change with the target attribute?  (At least TYPE_MODE_RAW
>> > > would be stable.)
>> >
>> > That’s a good question and also related to GCC vector extension which can result in both BLKmode and integer modes to be used.  But I’m not sure how we expose masks to the middle end here.  A too large vector bool could be lowered to AVX512 mode.  Maybe we should simply reject interpret/encode of BLKmode vectors and make sure to never assign integer modes to vector bools (if the target didn’t specify that mode)?
>> >
>> > I guess some test coverage would be nice here.
>>
>> To continue on that, we do not currently have a way to capture a
>> vector comparison output
>> but the C++ frontend has vector ?:
>>
>> typedef int v8si __attribute__((vector_size(32)));
>>
>> void foo (v8si *a, v8si *b, v8si *c)
>> {
>>   *c = *a < *b ? (v8si){-1,-1,-1,-1,-1,-1,-1,-1 } : (v8si){0,0,0,0,0,0,0,0};
>> }
>>
>> with SSE2 we get a <signed-boolean:32> temporary, with AVX512 enabled
>> that becomes <singed-boolean:1>.  When we enlarge the vector to size 128
>> then even with AVX512 enabled I see <signed-boolean:32> here and
>> vector lowering decomposes that to scalar (also with AVX or SSE, so maybe
>> just a missed optimization).  But note that to decompose this into two
>> AVX512 vectors the temporary would have to change from <signed-boolean:32>
>> elements to <signed-boolean:1>.
>>
>> The not supported vector bool types have BLKmode sofar.
>>
>> But for example on i?86-linux with -mno-sse (like -march=i586) for
>>
>> typedef short v2hi __attribute__((vector_size(4)));
>>
>> void foo (v2hi *a, v2hi *b, v2hi *c)
>> {
>>   *c = *a < *b ? (v2hi){-1,-1} : (v2hi){0,0};
>> }
>>
>> we get a SImode vector <signed-boolean:16> as I feared.  That means
>> <signed-boolean:8> (the BITS_PER_UNIT case) can be ambiguous
>> between SVE (bool for a 8byte data vector) and emulated vectors
>> ("word-mode" vectors; for 1byte data vectors).
>>
>> And without knowing that SVE would have used VnBImode given that
>> AVX512 uses an integer mode.
>>
>> Aside from the too large vector and AVX512 issue above I think we can use
>> MODE_VECTOR_BOOL || TYPE_PRECISION == 1 and for the latter we
>> can assert the mode is a scalar integer mode (AVX512 or GCN)?
>
> So like the attached?

Can you give me a few days to see whether swapping to -1 : 0 masks
works for SVE?  I guess it would make things easier in the long run.

Thanks,
Richard
Richard Sandiford July 5, 2024, 9:09 a.m. UTC | #8
Richard Sandiford <richard.sandiford@arm.com> writes:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Fri, Jun 28, 2024 at 2:16 PM Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>>
>>> On Fri, Jun 28, 2024 at 11:06 AM Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>> >
>>> >
>>> >
>>> > > Am 28.06.2024 um 10:27 schrieb Richard Sandiford <richard.sandiford@arm.com>:
>>> > >
>>> > > Richard Biener <richard.guenther@gmail.com> writes:
>>> > >>> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener
>>> > >>> <richard.guenther@gmail.com> wrote:
>>> > >>>
>>> > >>> On Fri, Jun 28, 2024 at 3:15 AM liuhongt <hongtao.liu@intel.com> wrote:
>>> > >>>>
>>> > >>>> for the testcase in the PR115406, here is part of the dump.
>>> > >>>>
>>> > >>>>  char D.4882;
>>> > >>>>  vector(1) <signed-boolean:8> _1;
>>> > >>>>  vector(1) signed char _2;
>>> > >>>>  char _5;
>>> > >>>>
>>> > >>>>  <bb 2> :
>>> > >>>>  _1 = { -1 };
>>> > >>>>
>>> > >>>> When assign { -1 } to vector(1} {signed-boolean:8},
>>> > >>>> Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of dest
>>> > >>>> with each vector elemnet. But i think the bit setting should only apply for
>>> > >>>> TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1).
>>> > >>>> <signed-boolean:16>, it will be assigned as -1, instead of 1.
>>> > >>>> Is there any specific reason vector(1) <signed-boolean:8> is handled
>>> > >>>> differently from vector<1> <signed-boolean:16>?
>>> > >>>>
>>> > >>>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
>>> > >>>> Ok for trunk?
>>> > >>>
>>> > >>> I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision
>>> > >>> code should work for 8 bit
>>> > >>> entities as well, it seems we only set the LSB of each element in the
>>> > >>> "mask".  ISTR that SVE
>>> > >>> masks can have up to 8 bit elements (for 8 byte data elements), so
>>> > >>> maybe that's why
>>> > >>> <= BITS_PER_UNIT.
>>> > >
>>> > > Yeah.
>>> >
>>> > So is it necessary that only one bit is set for SVE?
>
> TBH I can't remember now.  It matches what SVE instructions produce, and
> lines up with the associated RTL code (which at the time was SVE-specific).
> But when dealing with multibyte elements, upper predicate element bits
> are ignored on read, so matching the instructions might not matter.
>
>>> > >>> So maybe instead of just setting one bit in
>>> > >>>
>>> > >>>              ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT);
>>> > >>>
>>> > >>> we should set elt_bits bits, aka (without testing)
>>> > >>>
>>> > >>>              ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit
>>> > >>> % BITS_PER_UNIT);
>>> > >>>
>>> > >>> ?
>>> > >>
>>> > >> Alternatively
>>> > >>
>>> > >>  if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr))
>>> > >>      && TYPE_PRECISION (itype) <= BITS_PER_UNIT)
>>> > >>
>>> > >> should be amended with
>>> > >>
>>> > >>   && GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) != MODE_VECTOR_INT
>>> > >
>>> > > How about:
>>> > >
>>> > >  if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) == MODE_VECTOR_BOOL)
>>> > >    {
>>> > >      gcc_assert (TYPE_PRECISION (itype) <= BITS_PER_UNIT);
>>> > >
>>> > > ?
>>> >
>>> > Note the path is also necessary for avx512 and gcn mask modes which are integer modes.
>>> >
>>> > > Is it OK for TYPE_MODE to affect tree-level semantics though, especially
>>> > > since it can change with the target attribute?  (At least TYPE_MODE_RAW
>>> > > would be stable.)
>>> >
>>> > That’s a good question and also related to GCC vector extension which can result in both BLKmode and integer modes to be used.  But I’m not sure how we expose masks to the middle end here.  A too large vector bool could be lowered to AVX512 mode.  Maybe we should simply reject interpret/encode of BLKmode vectors and make sure to never assign integer modes to vector bools (if the target didn’t specify that mode)?
>>> >
>>> > I guess some test coverage would be nice here.
>>>
>>> To continue on that, we do not currently have a way to capture a
>>> vector comparison output
>>> but the C++ frontend has vector ?:
>>>
>>> typedef int v8si __attribute__((vector_size(32)));
>>>
>>> void foo (v8si *a, v8si *b, v8si *c)
>>> {
>>>   *c = *a < *b ? (v8si){-1,-1,-1,-1,-1,-1,-1,-1 } : (v8si){0,0,0,0,0,0,0,0};
>>> }
>>>
>>> with SSE2 we get a <signed-boolean:32> temporary, with AVX512 enabled
>>> that becomes <singed-boolean:1>.  When we enlarge the vector to size 128
>>> then even with AVX512 enabled I see <signed-boolean:32> here and
>>> vector lowering decomposes that to scalar (also with AVX or SSE, so maybe
>>> just a missed optimization).  But note that to decompose this into two
>>> AVX512 vectors the temporary would have to change from <signed-boolean:32>
>>> elements to <signed-boolean:1>.
>>>
>>> The not supported vector bool types have BLKmode sofar.
>>>
>>> But for example on i?86-linux with -mno-sse (like -march=i586) for
>>>
>>> typedef short v2hi __attribute__((vector_size(4)));
>>>
>>> void foo (v2hi *a, v2hi *b, v2hi *c)
>>> {
>>>   *c = *a < *b ? (v2hi){-1,-1} : (v2hi){0,0};
>>> }
>>>
>>> we get a SImode vector <signed-boolean:16> as I feared.  That means
>>> <signed-boolean:8> (the BITS_PER_UNIT case) can be ambiguous
>>> between SVE (bool for a 8byte data vector) and emulated vectors
>>> ("word-mode" vectors; for 1byte data vectors).
>>>
>>> And without knowing that SVE would have used VnBImode given that
>>> AVX512 uses an integer mode.
>>>
>>> Aside from the too large vector and AVX512 issue above I think we can use
>>> MODE_VECTOR_BOOL || TYPE_PRECISION == 1 and for the latter we
>>> can assert the mode is a scalar integer mode (AVX512 or GCN)?
>>
>> So like the attached?
>
> Can you give me a few days to see whether swapping to -1 : 0 masks
> works for SVE?  I guess it would make things easier in the long run.

It took longer than expected, but FWIW, the SVE results look good
with the attached.

I suppose it does raise the question whether:

	  if (ptr && wi::extract_uhwi (wi::to_wide (elt), 0, 1))

is the right way to test a boolean vector element at the tree level.
For SVE it is, but I suppose for other targets 0b10 should be treated
as true?

The same question doesn't really arise on the RTL side because
the SVE predicate elements are single bits.  So perhaps one way
would be to base the test above on GET_MODE_UNIT_PRECISION for
boolean vector modes, and test for nonzero otherwise.  Doesn't feel
very clean though...
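
For reference, a rough, untested sketch of that variant (reusing the
names from native_encode_vector_part) could be:

  machine_mode vmode = TYPE_MODE (TREE_TYPE (expr));
  unsigned int sig_bits
    = (GET_MODE_CLASS (vmode) == MODE_VECTOR_BOOL
       ? GET_MODE_UNIT_PRECISION (vmode) : TYPE_PRECISION (itype));
  if (ptr && wi::extract_uhwi (wi::to_wide (elt), 0, sig_bits) != 0)
    ...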

Alternatively, we could say that the semantics of boolean vectors
at the tree level are independent of the target, and try to fix
things up at the gimple->rtl boundary.

Thanks,
Richard


gcc/
	* fold-const.cc (native_encode_vector_part): Set all bits of
	a boolean element.
	* simplify-rtx.cc (native_encode_rtx): Likewise.
---
 gcc/fold-const.cc   | 2 +-
 gcc/simplify-rtx.cc | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 710d697c021..8b7aaca2500 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -8106,7 +8106,7 @@ native_encode_vector_part (const_tree expr, unsigned char *ptr, int len,
 	  if (ptr && wi::extract_uhwi (wi::to_wide (elt), 0, 1))
 	    {
 	      unsigned int bit = i * elt_bits;
-	      ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT);
+	      ptr[bit / BITS_PER_UNIT] |= ((1 << elt_bits) - 1) << (bit % BITS_PER_UNIT);
 	    }
 	}
       return extract_bytes;
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 35ba54c6292..bfc33bd2f19 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -7232,7 +7232,7 @@ native_encode_rtx (machine_mode mode, rtx x, vec<target_unit> &bytes,
 	      target_unit value = 0;
 	      for (unsigned int j = 0; j < BITS_PER_UNIT; j += elt_bits)
 		{
-		  value |= (INTVAL (CONST_VECTOR_ELT (x, elt)) & mask) << j;
+		  value |= (INTVAL (CONST_VECTOR_ELT (x, elt)) ? mask : 0) << j;
 		  elt += 1;
 		}
 	      bytes.quick_push (value);
Richard Biener July 5, 2024, 12:02 p.m. UTC | #9
On Fri, Jul 5, 2024 at 11:09 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Sandiford <richard.sandiford@arm.com> writes:
> > Richard Biener <richard.guenther@gmail.com> writes:
> >> On Fri, Jun 28, 2024 at 2:16 PM Richard Biener
> >> <richard.guenther@gmail.com> wrote:
> >>>
> >>> On Fri, Jun 28, 2024 at 11:06 AM Richard Biener
> >>> <richard.guenther@gmail.com> wrote:
> >>> >
> >>> >
> >>> >
> >>> > > Am 28.06.2024 um 10:27 schrieb Richard Sandiford <richard.sandiford@arm.com>:
> >>> > >
> >>> > > Richard Biener <richard.guenther@gmail.com> writes:
> >>> > >>> On Fri, Jun 28, 2024 at 8:01 AM Richard Biener
> >>> > >>> <richard.guenther@gmail.com> wrote:
> >>> > >>>
> >>> > >>> On Fri, Jun 28, 2024 at 3:15 AM liuhongt <hongtao.liu@intel.com> wrote:
> >>> > >>>>
> >>> > >>>> for the testcase in the PR115406, here is part of the dump.
> >>> > >>>>
> >>> > >>>>  char D.4882;
> >>> > >>>>  vector(1) <signed-boolean:8> _1;
> >>> > >>>>  vector(1) signed char _2;
> >>> > >>>>  char _5;
> >>> > >>>>
> >>> > >>>>  <bb 2> :
> >>> > >>>>  _1 = { -1 };
> >>> > >>>>
> >>> > >>>> When assign { -1 } to vector(1} {signed-boolean:8},
> >>> > >>>> Since TYPE_PRECISION (itype) <= BITS_PER_UNIT, so it set each bit of dest
> >>> > >>>> with each vector elemnet. But i think the bit setting should only apply for
> >>> > >>>> TYPE_PRECISION (itype) < BITS_PER_UNIT. .i.e for vector(1).
> >>> > >>>> <signed-boolean:16>, it will be assigned as -1, instead of 1.
> >>> > >>>> Is there any specific reason vector(1) <signed-boolean:8> is handled
> >>> > >>>> differently from vector<1> <signed-boolean:16>?
> >>> > >>>>
> >>> > >>>> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> >>> > >>>> Ok for trunk?
> >>> > >>>
> >>> > >>> I agree that <= BITS_PER_UNIT is suspicious, but the bit-precision
> >>> > >>> code should work for 8 bit
> >>> > >>> entities as well, it seems we only set the LSB of each element in the
> >>> > >>> "mask".  ISTR that SVE
> >>> > >>> masks can have up to 8 bit elements (for 8 byte data elements), so
> >>> > >>> maybe that's why
> >>> > >>> <= BITS_PER_UNIT.
> >>> > >
> >>> > > Yeah.
> >>> >
> >>> > So is it necessary that only one bit is set for SVE?
> >
> > TBH I can't remember now.  It matches what SVE instructions produce, and
> > lines up with the associated RTL code (which at the time was SVE-specific).
> > But when dealing with multibyte elements, upper predicate element bits
> > are ignored on read, so matching the instructions might not matter.
> >
> >>> > >>> So maybe instead of just setting one bit in
> >>> > >>>
> >>> > >>>              ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT);
> >>> > >>>
> >>> > >>> we should set elt_bits bits, aka (without testing)
> >>> > >>>
> >>> > >>>              ptr[bit / BITS_PER_UNIT] |= (1 << elt_bits - 1) << (bit
> >>> > >>> % BITS_PER_UNIT);
> >>> > >>>
> >>> > >>> ?
> >>> > >>
> >>> > >> Alternatively
> >>> > >>
> >>> > >>  if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr))
> >>> > >>      && TYPE_PRECISION (itype) <= BITS_PER_UNIT)
> >>> > >>
> >>> > >> should be amended with
> >>> > >>
> >>> > >>   && GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) != MODE_VECTOR_INT
> >>> > >
> >>> > > How about:
> >>> > >
> >>> > >  if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (expr))) == MODE_VECTOR_BOOL)
> >>> > >    {
> >>> > >      gcc_assert (TYPE_PRECISION (itype) <= BITS_PER_UNIT);
> >>> > >
> >>> > > ?
> >>> >
> >>> > Note the path is also necessary for avx512 and gcn mask modes which are integer modes.
> >>> >
> >>> > > Is it OK for TYPE_MODE to affect tree-level semantics though, especially
> >>> > > since it can change with the target attribute?  (At least TYPE_MODE_RAW
> >>> > > would be stable.)
> >>> >
> >>> > That’s a good question and also related to GCC vector extension which can result in both BLKmode and integer modes to be used.  But I’m not sure how we expose masks to the middle end here.  A too large vector bool could be lowered to AVX512 mode.  Maybe we should simply reject interpret/encode of BLKmode vectors and make sure to never assign integer modes to vector bools (if the target didn’t specify that mode)?
> >>> >
> >>> > I guess some test coverage would be nice here.
> >>>
> >>> To continue on that, we do not currently have a way to capture a
> >>> vector comparison output
> >>> but the C++ frontend has vector ?:
> >>>
> >>> typedef int v8si __attribute__((vector_size(32)));
> >>>
> >>> void foo (v8si *a, v8si *b, v8si *c)
> >>> {
> >>>   *c = *a < *b ? (v8si){-1,-1,-1,-1,-1,-1,-1,-1 } : (v8si){0,0,0,0,0,0,0,0};
> >>> }
> >>>
> >>> with SSE2 we get a <signed-boolean:32> temporary, with AVX512 enabled
> >>> that becomes <singed-boolean:1>.  When we enlarge the vector to size 128
> >>> then even with AVX512 enabled I see <signed-boolean:32> here and
> >>> vector lowering decomposes that to scalar (also with AVX or SSE, so maybe
> >>> just a missed optimization).  But note that to decompose this into two
> >>> AVX512 vectors the temporary would have to change from <signed-boolean:32>
> >>> elements to <signed-boolean:1>.
> >>>
> >>> The not supported vector bool types have BLKmode sofar.
> >>>
> >>> But for example on i?86-linux with -mno-sse (like -march=i586) for
> >>>
> >>> typedef short v2hi __attribute__((vector_size(4)));
> >>>
> >>> void foo (v2hi *a, v2hi *b, v2hi *c)
> >>> {
> >>>   *c = *a < *b ? (v2hi){-1,-1} : (v2hi){0,0};
> >>> }
> >>>
> >>> we get a SImode vector <signed-boolean:16> as I feared.  That means
> >>> <signed-boolean:8> (the BITS_PER_UNIT case) can be ambiguous
> >>> between SVE (bool for a 8byte data vector) and emulated vectors
> >>> ("word-mode" vectors; for 1byte data vectors).
> >>>
> >>> And without knowing that SVE would have used VnBImode given that
> >>> AVX512 uses an integer mode.
> >>>
> >>> Aside from the too large vector and AVX512 issue above I think we can use
> >>> MODE_VECTOR_BOOL || TYPE_PRECISION == 1 and for the latter we
> >>> can assert the mode is a scalar integer mode (AVX512 or GCN)?
> >>
> >> So like the attached?
> >
> > Can you give me a few days to see whether swapping to -1 : 0 masks
> > works for SVE?  I guess it would make things easier in the long run.
>
> It took longer than expected, but FWIW, the SVE results look good
> with the attached.
>
> I suppose it does raise the question whether:
>
>           if (ptr && wi::extract_uhwi (wi::to_wide (elt), 0, 1))
>
> is the right way to test a boolean vector element at the tree level.
> For SVE it is, but I suppose for other targets 0b10 should be treated
> as true?
>
> The same question doesn't really arise on the RTL side because
> the SVE predicate elements are single bits.  So perhaps one way
> would be to base the test above on GET_MODE_UNIT_PRECISION for
> boolean vector modes, and test for nonzero otherwise.  Doesn't feel
> very clean though...

native_interpret_vector_part also tests the LSB.

> Alternatively, we could say that the semantics of boolean vectors
> at the tree level are independent of the target, and try to fix
> things up at the gimple->rtl boundary.

I think since we do not expose boolean vectors to users via the
vector extension, we're free to choose.  Intrinsics may give users
ways to spill/load predicates to/from memory and thus write
target-appropriate literal constants.  And do something like (fix the
syntax for me):

svebool expected = 0x3452;

int main()
{
  auto a = (v4si) { 1, 2, 3, 4 };
  auto b = (v4si) { 5, 6, 7, 8 };
  svebool c = sve_scmp_lt (a, b);
  if (c != expected)
    abort ();
}

and when we'd expose sve_scmp_lt to GIMPLE (or RTL) for constant
folding we have to make sure that constant folding is in line with
the svebool representation the user expects.

Hardware-wise I think setting all bits when constant folding is on the
safe side.  But if this might expose inconsistencies like above to the
user (I'm not sure this is actually observable), then that would be a bug.
But this is likely a problem right now already.

On x86 and GCN, where there's only one bit per lane, this is not ambiguous,
but I think both RISC-V and SVE have "padding" bits and we do not expose
their details (LSB or MSB significant) to the middle end?

But I'd say we care once somebody comes up with a test like above.

So I think your patch is OK - it should also solve the originally reported
issue.  Note this is a latent issue that we should fix on branches as well;
otherwise VnQImode mask vectors are wrongly constant folded.  You are best
placed to assess whether your or my approach is better on branches - it
doesn't matter for x86 or GCN.

Thanks,
Richard.

> Thanks,
> Richard
>
>
> gcc/
>         * fold-const.cc (native_encode_vector_part): Set all bits of
>         a boolean element.
>         * simplify-rtx.cc (native_encode_rtx): Likewise.
> ---
>  gcc/fold-const.cc   | 2 +-
>  gcc/simplify-rtx.cc | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 710d697c021..8b7aaca2500 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -8106,7 +8106,7 @@ native_encode_vector_part (const_tree expr, unsigned char *ptr, int len,
>           if (ptr && wi::extract_uhwi (wi::to_wide (elt), 0, 1))
>             {
>               unsigned int bit = i * elt_bits;
> -             ptr[bit / BITS_PER_UNIT] |= 1 << (bit % BITS_PER_UNIT);
> +             ptr[bit / BITS_PER_UNIT] |= ((1 << elt_bits) - 1) << (bit % BITS_PER_UNIT);
>             }
>         }
>        return extract_bytes;
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index 35ba54c6292..bfc33bd2f19 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -7232,7 +7232,7 @@ native_encode_rtx (machine_mode mode, rtx x, vec<target_unit> &bytes,
>               target_unit value = 0;
>               for (unsigned int j = 0; j < BITS_PER_UNIT; j += elt_bits)
>                 {
> -                 value |= (INTVAL (CONST_VECTOR_ELT (x, elt)) & mask) << j;
> +                 value |= (INTVAL (CONST_VECTOR_ELT (x, elt)) ? mask : 0) << j;
>                   elt += 1;
>                 }
>               bytes.quick_push (value);
> --
> 2.25.1
>
Patch

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index 710d697c021..0f045f851d1 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -8077,7 +8077,7 @@  native_encode_vector_part (const_tree expr, unsigned char *ptr, int len,
 {
   tree itype = TREE_TYPE (TREE_TYPE (expr));
   if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (expr))
-      && TYPE_PRECISION (itype) <= BITS_PER_UNIT)
+      && TYPE_PRECISION (itype) < BITS_PER_UNIT)
     {
       /* This is the only case in which elements can be smaller than a byte.
 	 Element 0 is always in the lsb of the containing byte.  */
diff --git a/gcc/testsuite/gcc.target/i386/pr115406.c b/gcc/testsuite/gcc.target/i386/pr115406.c
new file mode 100644
index 00000000000..623dff06fc3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr115406.c
@@ -0,0 +1,23 @@ 
+/* { dg-do run } */
+/* { dg-options "-O0 -mavx512f" } */
+/* { dg-require-effective-target avx512f } */
+
+typedef __attribute__((__vector_size__ (1))) char V;
+
+char
+foo (V v)
+{
+  return ((V) v == v)[0];
+}
+
+int
+main ()
+{
+  if (!__builtin_cpu_supports ("avx512f"))
+    return 0;
+
+  char x = foo ((V) { });
+  if (x != -1)
+    __builtin_abort ();
+  return 0;
+}