VEC_COND_EXPR optimizations

Message ID	alpine.DEB.2.23.453.2007291859410.6927@stedding.saclay.inria.fr
State	New
Headers	show Return-Path: <gcc-patches-bounces@gcc.gnu.org> DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 0635E3858D35 Date: Thu, 30 Jul 2020 09:49:10 +0200 (CEST) From: Marc Glisse <marc.glisse@inria.fr> To: gcc-patches@gcc.gnu.org Subject: VEC_COND_EXPR optimizations Message-ID: <alpine.DEB.2.23.453.2007291859410.6927@stedding.saclay.inria.fr> User-Agent: Alpine 2.02 (DEB 1266 2009-07-14) MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="8323329-1199518297-1596059937=:48307" Content-ID: <alpine.DEB.2.02.2007300929040.1785401@grove.saclay.inria.fr> Precedence: list Errors-To: gcc-patches-bounces@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces@gcc.gnu.org>
Series	VEC_COND_EXPR optimizations \| expand VEC_COND_EXPR optimizations

Marc Glisse July 30, 2020, 7:49 a.m. UTC

When vector comparisons were forced to use vec_cond_expr, we lost a number 
of optimizations (my fault for not adding enough testcases to prevent 
that). This patch tries to unwrap vec_cond_expr a bit so some 
optimizations can still happen.

I wasn't planning to add all those transformations together, but adding 
one caused a regression, whose fix introduced a second regression, etc.

Using a simple fold_binary internally looks like an ok compromise to me. 
It remains cheap enough (not recursive, and vector instructions are not 
that frequent), while still allowing more than const_binop (X|0 or X&X for 
instance). The transformations are quite conservative with :s and folding 
only if everything simplifies, we may want to relax this later. And of 
course we are going to miss things like a?b:c + a?c:b -> b+c.

In terms of number of operations, some transformations turning 2 
VEC_COND_EXPR into VEC_COND_EXPR + BIT_IOR_EXPR + BIT_NOT_EXPR might not 
look like a gain... I expect the bit_not disappears in most cases, and 
VEC_COND_EXPR looks more costly than a simpler BIT_IOR_EXPR.

I am a bit confused that with avx512 we get types like "vector(4) 
<signed-boolean:2>" with :2 and not :1 (is it a hack so true is 1 and not 
-1?), but that doesn't matter for this patch.

Regtest+bootstrap on x86_64-pc-linux-gnu

2020-07-30  Marc Glisse  <marc.glisse@inria.fr>

 	PR tree-optimization/95906
 	PR target/70314
 	* match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e),
 	(v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): New transformations.

 	* gcc.dg/tree-ssa/andnot-2.c: New file.
 	* gcc.dg/tree-ssa/pr95906.c: Likewise.
 	* gcc.target/i386/pr70314.c: Likewise.

Richard Sandiford July 31, 2020, 11:18 a.m. UTC | #1

Marc Glisse <marc.glisse@inria.fr> writes:
> +/* (c ? a : b) op (c ? d : e)  -->  c ? (a op d) : (b op e) */
> + (simplify
> +  (op (vec_cond:s @0 @1 @2) (vec_cond:s @0 @3 @4))
> +  (with
> +   {
> +     tree rhs1, rhs2 = NULL;
> +     rhs1 = fold_binary (op, type, @1, @3);
> +     if (rhs1 && is_gimple_val (rhs1))
> +       rhs2 = fold_binary (op, type, @2, @4);
> +   }
> +   (if (rhs2 && is_gimple_val (rhs2))
> +    (vec_cond @0 { rhs1; } { rhs2; })))))
> +#endif

This one looks dangerous for potentially-trapping ops.

> +/* (v ? w : 0) ? a : b is just (v & w) ? a : b  */
> +(simplify
> + (vec_cond (vec_cond:s @0 @3 integer_zerop) @1 @2)
> + (vec_cond (bit_and @0 @3) @1 @2))

Does something check automatically that @0 and @3 have compatible types?
Same question for the later folds.

Thanks,
Richard

Richard Biener July 31, 2020, 11:35 a.m. UTC | #2

On Thu, Jul 30, 2020 at 9:49 AM Marc Glisse <marc.glisse@inria.fr> wrote:
>
> When vector comparisons were forced to use vec_cond_expr, we lost a number
> of optimizations (my fault for not adding enough testcases to prevent
> that). This patch tries to unwrap vec_cond_expr a bit so some
> optimizations can still happen.
>
> I wasn't planning to add all those transformations together, but adding
> one caused a regression, whose fix introduced a second regression, etc.
>
> Using a simple fold_binary internally looks like an ok compromise to me.
> It remains cheap enough (not recursive, and vector instructions are not
> that frequent), while still allowing more than const_binop (X|0 or X&X for
> instance). The transformations are quite conservative with :s and folding
> only if everything simplifies, we may want to relax this later. And of
> course we are going to miss things like a?b:c + a?c:b -> b+c.
>
> In terms of number of operations, some transformations turning 2
> VEC_COND_EXPR into VEC_COND_EXPR + BIT_IOR_EXPR + BIT_NOT_EXPR might not
> look like a gain... I expect the bit_not disappears in most cases, and
> VEC_COND_EXPR looks more costly than a simpler BIT_IOR_EXPR.
>
> I am a bit confused that with avx512 we get types like "vector(4)
> <signed-boolean:2>" with :2 and not :1 (is it a hack so true is 1 and not
> -1?), but that doesn't matter for this patch.
>
> Regtest+bootstrap on x86_64-pc-linux-gnu

+  (with
+   {
+     tree rhs1, rhs2 = NULL;
+     rhs1 = fold_binary (op, type, @1, @3);
+     if (rhs1 && is_gimple_val (rhs1))
+       rhs2 = fold_binary (op, type, @2, @3);

ICK.  I guess a more match-and-simplify way would be

   (with
    {
      tree rhs1, rhs2;
      gimple_match_op op (gimple_match_cond::UNCOND, op,
                                      type, @1, @3);
      if (op.resimplify (NULL, valueize)
          && gimple_simplified_result_is_gimple_val (op))
       {
         rhs1 = op.ops[0];
         ... other operand ...
       }

now in theory we could invent some new syntax for this, like

 (simplify
  (op (vec_cond:s @0 @1 @2) @3)
  (vec_cond @0 (op:x @1 @3) (op:x @2 @3)))

and pick something better instead of :x (:s is taken,
would be 'simplified', :c is taken would be 'constexpr', ...).

_Maybe_ just

 (simplify
  (op (vec_cond:s @0 @1 @2) @3)
  (vec_cond:x @0 (op @1 @3) (op @2 @3)))

which would have the same practical meaning as passing
NULL for the seq argument to simplification - do not allow
any intermediate stmt to be generated.

The other "simple" patterns look good, you can commit
them separately if you like.

Richard.

> 2020-07-30  Marc Glisse  <marc.glisse@inria.fr>
>
>         PR tree-optimization/95906
>         PR target/70314
>         * match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e),
>         (v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): New transformations.
>
>         * gcc.dg/tree-ssa/andnot-2.c: New file.
>         * gcc.dg/tree-ssa/pr95906.c: Likewise.
>         * gcc.target/i386/pr70314.c: Likewise.
>
> --
> Marc Glisse

Marc Glisse July 31, 2020, 11:38 a.m. UTC | #3

On Fri, 31 Jul 2020, Richard Sandiford wrote:

> Marc Glisse <marc.glisse@inria.fr> writes:
>> +/* (c ? a : b) op (c ? d : e)  -->  c ? (a op d) : (b op e) */
>> + (simplify
>> +  (op (vec_cond:s @0 @1 @2) (vec_cond:s @0 @3 @4))
>> +  (with
>> +   {
>> +     tree rhs1, rhs2 = NULL;
>> +     rhs1 = fold_binary (op, type, @1, @3);
>> +     if (rhs1 && is_gimple_val (rhs1))
>> +       rhs2 = fold_binary (op, type, @2, @4);
>> +   }
>> +   (if (rhs2 && is_gimple_val (rhs2))
>> +    (vec_cond @0 { rhs1; } { rhs2; })))))
>> +#endif
>
> This one looks dangerous for potentially-trapping ops.

I would expect the operation not to be folded if it can trap. Is that too 
optimistic?

>> +/* (v ? w : 0) ? a : b is just (v & w) ? a : b  */
>> +(simplify
>> + (vec_cond (vec_cond:s @0 @3 integer_zerop) @1 @2)
>> + (vec_cond (bit_and @0 @3) @1 @2))
>
> Does something check automatically that @0 and @3 have compatible types?

My memory of this dates from before the avx512-related changes, but at 
least at the time, the type of the condition in vec_cond_expr had to have 
the same size and number of elements as the result, and have signed 
integral elements. Now the size constraint has changed, but it still looks 
like for a given number of elements and size (this last one ignored for 
avx512), there is essentially a single type that can appear as condition. 
Is this wrong for x86? For SVE?

I could add some types_match conditions if that seems too unsafe...

Richard Biener July 31, 2020, 11:39 a.m. UTC | #4

On Fri, Jul 31, 2020 at 1:35 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Thu, Jul 30, 2020 at 9:49 AM Marc Glisse <marc.glisse@inria.fr> wrote:
> >
> > When vector comparisons were forced to use vec_cond_expr, we lost a number
> > of optimizations (my fault for not adding enough testcases to prevent
> > that). This patch tries to unwrap vec_cond_expr a bit so some
> > optimizations can still happen.
> >
> > I wasn't planning to add all those transformations together, but adding
> > one caused a regression, whose fix introduced a second regression, etc.
> >
> > Using a simple fold_binary internally looks like an ok compromise to me.
> > It remains cheap enough (not recursive, and vector instructions are not
> > that frequent), while still allowing more than const_binop (X|0 or X&X for
> > instance). The transformations are quite conservative with :s and folding
> > only if everything simplifies, we may want to relax this later. And of
> > course we are going to miss things like a?b:c + a?c:b -> b+c.
> >
> > In terms of number of operations, some transformations turning 2
> > VEC_COND_EXPR into VEC_COND_EXPR + BIT_IOR_EXPR + BIT_NOT_EXPR might not
> > look like a gain... I expect the bit_not disappears in most cases, and
> > VEC_COND_EXPR looks more costly than a simpler BIT_IOR_EXPR.
> >
> > I am a bit confused that with avx512 we get types like "vector(4)
> > <signed-boolean:2>" with :2 and not :1 (is it a hack so true is 1 and not
> > -1?), but that doesn't matter for this patch.
> >
> > Regtest+bootstrap on x86_64-pc-linux-gnu
>
> +  (with
> +   {
> +     tree rhs1, rhs2 = NULL;
> +     rhs1 = fold_binary (op, type, @1, @3);
> +     if (rhs1 && is_gimple_val (rhs1))
> +       rhs2 = fold_binary (op, type, @2, @3);
>
> ICK.  I guess a more match-and-simplify way would be
>
>    (with
>     {
>       tree rhs1, rhs2;
>       gimple_match_op op (gimple_match_cond::UNCOND, op,
>                                       type, @1, @3);
>       if (op.resimplify (NULL, valueize)
>           && gimple_simplified_result_is_gimple_val (op))
>        {
>          rhs1 = op.ops[0];
>          ... other operand ...
>        }
>
> now in theory we could invent some new syntax for this, like
>
>  (simplify
>   (op (vec_cond:s @0 @1 @2) @3)
>   (vec_cond @0 (op:x @1 @3) (op:x @2 @3)))
>
> and pick something better instead of :x (:s is taken,
> would be 'simplified', :c is taken would be 'constexpr', ...).
>
> _Maybe_ just
>
>  (simplify
>   (op (vec_cond:s @0 @1 @2) @3)
>   (vec_cond:x @0 (op @1 @3) (op @2 @3)))
>
> which would have the same practical meaning as passing
> NULL for the seq argument to simplification - do not allow
> any intermediate stmt to be generated.

Note I specifically do not like those if (it-simplifies) checks
because we already would code-generate those anyway.  For

(simplify
  (plus (vec_cond:s @0 @1 @2) @3)
  (vec_cond @0 (plus @1 @3) (plus @2 @3)))

we get

                    res_op->set_op (VEC_COND_EXPR, type, 3);
                    res_op->ops[0] = captures[1];
                    res_op->ops[0] = unshare_expr (res_op->ops[0]);
                    {
                      tree _o1[2], _r1;
                      _o1[0] = captures[2];
                      _o1[1] = captures[4];
                      gimple_match_op tem_op (res_op->cond.any_else
(), PLUS_EXPR, TREE_TYPE (_o1[0]), _o1[0], _o1[1]);
                      tem_op.resimplify (lseq, valueize);
                      _r1 = maybe_push_res_to_seq (&tem_op, lseq);  (****)
                      if (!_r1) return false;
                      res_op->ops[1] = _r1;
                    }
                    {
                      tree _o1[2], _r1;
                      _o1[0] = captures[3];
                      _o1[1] = captures[4];
                      gimple_match_op tem_op (res_op->cond.any_else
(), PLUS_EXPR, TREE_TYPE (_o1[0]), _o1[0], _o1[1]);
                      tem_op.resimplify (lseq, valueize);
                      _r1 = maybe_push_res_to_seq (&tem_op, lseq);  (***)
                      if (!_r1) return false;
                      res_op->ops[2] = _r1;
                    }
                    res_op->resimplify (lseq, valueize);
                    return true;

and the only change required would be to pass NULL to maybe_push_res_to_seq
here instead of lseq at the (***) marked points.

Richard.

> The other "simple" patterns look good, you can commit
> them separately if you like.
>
> Richard.
>
> > 2020-07-30  Marc Glisse  <marc.glisse@inria.fr>
> >
> >         PR tree-optimization/95906
> >         PR target/70314
> >         * match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e),
> >         (v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): New transformations.
> >
> >         * gcc.dg/tree-ssa/andnot-2.c: New file.
> >         * gcc.dg/tree-ssa/pr95906.c: Likewise.
> >         * gcc.target/i386/pr70314.c: Likewise.
> >
> > --
> > Marc Glisse

Richard Biener July 31, 2020, 11:43 a.m. UTC | #5

On Fri, Jul 31, 2020 at 1:39 PM Marc Glisse <marc.glisse@inria.fr> wrote:
>
> On Fri, 31 Jul 2020, Richard Sandiford wrote:
>
> > Marc Glisse <marc.glisse@inria.fr> writes:
> >> +/* (c ? a : b) op (c ? d : e)  -->  c ? (a op d) : (b op e) */
> >> + (simplify
> >> +  (op (vec_cond:s @0 @1 @2) (vec_cond:s @0 @3 @4))
> >> +  (with
> >> +   {
> >> +     tree rhs1, rhs2 = NULL;
> >> +     rhs1 = fold_binary (op, type, @1, @3);
> >> +     if (rhs1 && is_gimple_val (rhs1))
> >> +       rhs2 = fold_binary (op, type, @2, @4);
> >> +   }
> >> +   (if (rhs2 && is_gimple_val (rhs2))
> >> +    (vec_cond @0 { rhs1; } { rhs2; })))))
> >> +#endif
> >
> > This one looks dangerous for potentially-trapping ops.
>
> I would expect the operation not to be folded if it can trap. Is that too
> optimistic?
>
> >> +/* (v ? w : 0) ? a : b is just (v & w) ? a : b  */
> >> +(simplify
> >> + (vec_cond (vec_cond:s @0 @3 integer_zerop) @1 @2)
> >> + (vec_cond (bit_and @0 @3) @1 @2))
> >
> > Does something check automatically that @0 and @3 have compatible types?

@0 should always have a vector boolean type and thus will not be generally
compatible with @3.  But OTOH then when you see (vec_cond (vec_cond ...
then @3 must be vector boolean as well...

But in theory with AVX512 the inner vec_cond could have a SImode
condition @0 producing a regular V4SImode vector mask for an outer
AVX512 SSE-style vec-cond and you then would get a mismatch.

So indeed better add a type compatibility check.

> My memory of this dates from before the avx512-related changes, but at
> least at the time, the type of the condition in vec_cond_expr had to have
> the same size and number of elements as the result, and have signed
> integral elements. Now the size constraint has changed, but it still looks
> like for a given number of elements and size (this last one ignored for
> avx512), there is essentially a single type that can appear as condition.
> Is this wrong for x86? For SVE?
>
> I could add some types_match conditions if that seems too unsafe...
>
> --
> Marc Glisse

Richard Biener July 31, 2020, 11:47 a.m. UTC | #6

On Fri, Jul 31, 2020 at 1:39 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Fri, Jul 31, 2020 at 1:35 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Thu, Jul 30, 2020 at 9:49 AM Marc Glisse <marc.glisse@inria.fr> wrote:
> > >
> > > When vector comparisons were forced to use vec_cond_expr, we lost a number
> > > of optimizations (my fault for not adding enough testcases to prevent
> > > that). This patch tries to unwrap vec_cond_expr a bit so some
> > > optimizations can still happen.
> > >
> > > I wasn't planning to add all those transformations together, but adding
> > > one caused a regression, whose fix introduced a second regression, etc.
> > >
> > > Using a simple fold_binary internally looks like an ok compromise to me.
> > > It remains cheap enough (not recursive, and vector instructions are not
> > > that frequent), while still allowing more than const_binop (X|0 or X&X for
> > > instance). The transformations are quite conservative with :s and folding
> > > only if everything simplifies, we may want to relax this later. And of
> > > course we are going to miss things like a?b:c + a?c:b -> b+c.
> > >
> > > In terms of number of operations, some transformations turning 2
> > > VEC_COND_EXPR into VEC_COND_EXPR + BIT_IOR_EXPR + BIT_NOT_EXPR might not
> > > look like a gain... I expect the bit_not disappears in most cases, and
> > > VEC_COND_EXPR looks more costly than a simpler BIT_IOR_EXPR.
> > >
> > > I am a bit confused that with avx512 we get types like "vector(4)
> > > <signed-boolean:2>" with :2 and not :1 (is it a hack so true is 1 and not
> > > -1?), but that doesn't matter for this patch.
> > >
> > > Regtest+bootstrap on x86_64-pc-linux-gnu
> >
> > +  (with
> > +   {
> > +     tree rhs1, rhs2 = NULL;
> > +     rhs1 = fold_binary (op, type, @1, @3);
> > +     if (rhs1 && is_gimple_val (rhs1))
> > +       rhs2 = fold_binary (op, type, @2, @3);
> >
> > ICK.  I guess a more match-and-simplify way would be
> >
> >    (with
> >     {
> >       tree rhs1, rhs2;
> >       gimple_match_op op (gimple_match_cond::UNCOND, op,
> >                                       type, @1, @3);
> >       if (op.resimplify (NULL, valueize)
> >           && gimple_simplified_result_is_gimple_val (op))
> >        {
> >          rhs1 = op.ops[0];
> >          ... other operand ...
> >        }
> >
> > now in theory we could invent some new syntax for this, like
> >
> >  (simplify
> >   (op (vec_cond:s @0 @1 @2) @3)
> >   (vec_cond @0 (op:x @1 @3) (op:x @2 @3)))
> >
> > and pick something better instead of :x (:s is taken,
> > would be 'simplified', :c is taken would be 'constexpr', ...).
> >
> > _Maybe_ just
> >
> >  (simplify
> >   (op (vec_cond:s @0 @1 @2) @3)
> >   (vec_cond:x @0 (op @1 @3) (op @2 @3)))
> >
> > which would have the same practical meaning as passing
> > NULL for the seq argument to simplification - do not allow
> > any intermediate stmt to be generated.
>
> Note I specifically do not like those if (it-simplifies) checks
> because we already would code-generate those anyway.  For
>
> (simplify
>   (plus (vec_cond:s @0 @1 @2) @3)
>   (vec_cond @0 (plus @1 @3) (plus @2 @3)))
>
> we get
>
>                     res_op->set_op (VEC_COND_EXPR, type, 3);
>                     res_op->ops[0] = captures[1];
>                     res_op->ops[0] = unshare_expr (res_op->ops[0]);
>                     {
>                       tree _o1[2], _r1;
>                       _o1[0] = captures[2];
>                       _o1[1] = captures[4];
>                       gimple_match_op tem_op (res_op->cond.any_else
> (), PLUS_EXPR, TREE_TYPE (_o1[0]), _o1[0], _o1[1]);
>                       tem_op.resimplify (lseq, valueize);
>                       _r1 = maybe_push_res_to_seq (&tem_op, lseq);  (****)
>                       if (!_r1) return false;
>                       res_op->ops[1] = _r1;
>                     }
>                     {
>                       tree _o1[2], _r1;
>                       _o1[0] = captures[3];
>                       _o1[1] = captures[4];
>                       gimple_match_op tem_op (res_op->cond.any_else
> (), PLUS_EXPR, TREE_TYPE (_o1[0]), _o1[0], _o1[1]);
>                       tem_op.resimplify (lseq, valueize);
>                       _r1 = maybe_push_res_to_seq (&tem_op, lseq);  (***)
>                       if (!_r1) return false;
>                       res_op->ops[2] = _r1;
>                     }
>                     res_op->resimplify (lseq, valueize);
>                     return true;
>
> and the only change required would be to pass NULL to maybe_push_res_to_seq
> here instead of lseq at the (***) marked points.

(simplify
  (plus (vec_cond:s @0 @1 @2) @3)
  (vec_cond:l @0 (plus @1 @3) (plus @2 @3)))

'l' for 'force leaf'.  I'll see if I can quickly cme up with a patch.

Richard.



> Richard.
>
> > The other "simple" patterns look good, you can commit
> > them separately if you like.
> >
> > Richard.
> >
> > > 2020-07-30  Marc Glisse  <marc.glisse@inria.fr>
> > >
> > >         PR tree-optimization/95906
> > >         PR target/70314
> > >         * match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e),
> > >         (v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): New transformations.
> > >
> > >         * gcc.dg/tree-ssa/andnot-2.c: New file.
> > >         * gcc.dg/tree-ssa/pr95906.c: Likewise.
> > >         * gcc.target/i386/pr70314.c: Likewise.
> > >
> > > --
> > > Marc Glisse

Marc Glisse July 31, 2020, 11:57 a.m. UTC | #7

On Fri, 31 Jul 2020, Richard Biener wrote:

>>>> +/* (v ? w : 0) ? a : b is just (v & w) ? a : b  */
>>>> +(simplify
>>>> + (vec_cond (vec_cond:s @0 @3 integer_zerop) @1 @2)
>>>> + (vec_cond (bit_and @0 @3) @1 @2))
>>>
>>> Does something check automatically that @0 and @3 have compatible types?
>
> @0 should always have a vector boolean type and thus will not be generally
> compatible with @3.  But OTOH then when you see (vec_cond (vec_cond ...
> then @3 must be vector boolean as well...
>
> But in theory with AVX512 the inner vec_cond could have a SImode
> condition @0 producing a regular V4SImode vector mask for an outer
> AVX512 SSE-style vec-cond and you then would get a mismatch.

Ah, I thought the SSE-style vec_cond was impossible in AVX512 mode, at 
least I couldn't generate one in a few tests, but I didn't try very hard.

> So indeed better add a type compatibility check.

Ok, it can't hurt.

Richard Biener July 31, 2020, 12:08 p.m. UTC | #8

On Fri, Jul 31, 2020 at 1:47 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Fri, Jul 31, 2020 at 1:39 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Fri, Jul 31, 2020 at 1:35 PM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > > On Thu, Jul 30, 2020 at 9:49 AM Marc Glisse <marc.glisse@inria.fr> wrote:
> > > >
> > > > When vector comparisons were forced to use vec_cond_expr, we lost a number
> > > > of optimizations (my fault for not adding enough testcases to prevent
> > > > that). This patch tries to unwrap vec_cond_expr a bit so some
> > > > optimizations can still happen.
> > > >
> > > > I wasn't planning to add all those transformations together, but adding
> > > > one caused a regression, whose fix introduced a second regression, etc.
> > > >
> > > > Using a simple fold_binary internally looks like an ok compromise to me.
> > > > It remains cheap enough (not recursive, and vector instructions are not
> > > > that frequent), while still allowing more than const_binop (X|0 or X&X for
> > > > instance). The transformations are quite conservative with :s and folding
> > > > only if everything simplifies, we may want to relax this later. And of
> > > > course we are going to miss things like a?b:c + a?c:b -> b+c.
> > > >
> > > > In terms of number of operations, some transformations turning 2
> > > > VEC_COND_EXPR into VEC_COND_EXPR + BIT_IOR_EXPR + BIT_NOT_EXPR might not
> > > > look like a gain... I expect the bit_not disappears in most cases, and
> > > > VEC_COND_EXPR looks more costly than a simpler BIT_IOR_EXPR.
> > > >
> > > > I am a bit confused that with avx512 we get types like "vector(4)
> > > > <signed-boolean:2>" with :2 and not :1 (is it a hack so true is 1 and not
> > > > -1?), but that doesn't matter for this patch.
> > > >
> > > > Regtest+bootstrap on x86_64-pc-linux-gnu
> > >
> > > +  (with
> > > +   {
> > > +     tree rhs1, rhs2 = NULL;
> > > +     rhs1 = fold_binary (op, type, @1, @3);
> > > +     if (rhs1 && is_gimple_val (rhs1))
> > > +       rhs2 = fold_binary (op, type, @2, @3);
> > >
> > > ICK.  I guess a more match-and-simplify way would be
> > >
> > >    (with
> > >     {
> > >       tree rhs1, rhs2;
> > >       gimple_match_op op (gimple_match_cond::UNCOND, op,
> > >                                       type, @1, @3);
> > >       if (op.resimplify (NULL, valueize)
> > >           && gimple_simplified_result_is_gimple_val (op))
> > >        {
> > >          rhs1 = op.ops[0];
> > >          ... other operand ...
> > >        }
> > >
> > > now in theory we could invent some new syntax for this, like
> > >
> > >  (simplify
> > >   (op (vec_cond:s @0 @1 @2) @3)
> > >   (vec_cond @0 (op:x @1 @3) (op:x @2 @3)))
> > >
> > > and pick something better instead of :x (:s is taken,
> > > would be 'simplified', :c is taken would be 'constexpr', ...).
> > >
> > > _Maybe_ just
> > >
> > >  (simplify
> > >   (op (vec_cond:s @0 @1 @2) @3)
> > >   (vec_cond:x @0 (op @1 @3) (op @2 @3)))
> > >
> > > which would have the same practical meaning as passing
> > > NULL for the seq argument to simplification - do not allow
> > > any intermediate stmt to be generated.
> >
> > Note I specifically do not like those if (it-simplifies) checks
> > because we already would code-generate those anyway.  For
> >
> > (simplify
> >   (plus (vec_cond:s @0 @1 @2) @3)
> >   (vec_cond @0 (plus @1 @3) (plus @2 @3)))
> >
> > we get
> >
> >                     res_op->set_op (VEC_COND_EXPR, type, 3);
> >                     res_op->ops[0] = captures[1];
> >                     res_op->ops[0] = unshare_expr (res_op->ops[0]);
> >                     {
> >                       tree _o1[2], _r1;
> >                       _o1[0] = captures[2];
> >                       _o1[1] = captures[4];
> >                       gimple_match_op tem_op (res_op->cond.any_else
> > (), PLUS_EXPR, TREE_TYPE (_o1[0]), _o1[0], _o1[1]);
> >                       tem_op.resimplify (lseq, valueize);
> >                       _r1 = maybe_push_res_to_seq (&tem_op, lseq);  (****)
> >                       if (!_r1) return false;
> >                       res_op->ops[1] = _r1;
> >                     }
> >                     {
> >                       tree _o1[2], _r1;
> >                       _o1[0] = captures[3];
> >                       _o1[1] = captures[4];
> >                       gimple_match_op tem_op (res_op->cond.any_else
> > (), PLUS_EXPR, TREE_TYPE (_o1[0]), _o1[0], _o1[1]);
> >                       tem_op.resimplify (lseq, valueize);
> >                       _r1 = maybe_push_res_to_seq (&tem_op, lseq);  (***)
> >                       if (!_r1) return false;
> >                       res_op->ops[2] = _r1;
> >                     }
> >                     res_op->resimplify (lseq, valueize);
> >                     return true;
> >
> > and the only change required would be to pass NULL to maybe_push_res_to_seq
> > here instead of lseq at the (***) marked points.
>
> (simplify
>   (plus (vec_cond:s @0 @1 @2) @3)
>   (vec_cond:l @0 (plus @1 @3) (plus @2 @3)))
>
> 'l' for 'force leaf'.  I'll see if I can quickly cme up with a patch.

The attached prototype works for

(simplify
  (plus (vec_cond:s @0 @1 @2) @3)
  (vec_cond @0 (plus:l @1 @3) (plus:l @2 @3)))

but ':...' is already taken for an explicitly specified type so I have
to think about sth better.  As you see I've also moved it to
the actual ops that should simplify.  It doesn't work on the
outermost expression but I guess it doesn't make sense there
(adding support would be possible).

Now I need some non-ambiguous syntax...  it currently
is id[:type][@cid] so maybe id[!][:type][@cid].  I guess
non-ambiguous is good enough?

Richard.

> Richard.
>
>
>
> > Richard.
> >
> > > The other "simple" patterns look good, you can commit
> > > them separately if you like.
> > >
> > > Richard.
> > >
> > > > 2020-07-30  Marc Glisse  <marc.glisse@inria.fr>
> > > >
> > > >         PR tree-optimization/95906
> > > >         PR target/70314
> > > >         * match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e),
> > > >         (v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): New transformations.
> > > >
> > > >         * gcc.dg/tree-ssa/andnot-2.c: New file.
> > > >         * gcc.dg/tree-ssa/pr95906.c: Likewise.
> > > >         * gcc.target/i386/pr70314.c: Likewise.
> > > >
> > > > --
> > > > Marc Glisse

Marc Glisse July 31, 2020, 12:12 p.m. UTC | #9

On Fri, 31 Jul 2020, Richard Biener wrote:

> On Fri, Jul 31, 2020 at 1:39 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
>>
>> On Fri, Jul 31, 2020 at 1:35 PM Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>>
>>> On Thu, Jul 30, 2020 at 9:49 AM Marc Glisse <marc.glisse@inria.fr> wrote:
>>>>
>>>> When vector comparisons were forced to use vec_cond_expr, we lost a number
>>>> of optimizations (my fault for not adding enough testcases to prevent
>>>> that). This patch tries to unwrap vec_cond_expr a bit so some
>>>> optimizations can still happen.
>>>>
>>>> I wasn't planning to add all those transformations together, but adding
>>>> one caused a regression, whose fix introduced a second regression, etc.
>>>>
>>>> Using a simple fold_binary internally looks like an ok compromise to me.
>>>> It remains cheap enough (not recursive, and vector instructions are not
>>>> that frequent), while still allowing more than const_binop (X|0 or X&X for
>>>> instance). The transformations are quite conservative with :s and folding
>>>> only if everything simplifies, we may want to relax this later. And of
>>>> course we are going to miss things like a?b:c + a?c:b -> b+c.
>>>>
>>>> In terms of number of operations, some transformations turning 2
>>>> VEC_COND_EXPR into VEC_COND_EXPR + BIT_IOR_EXPR + BIT_NOT_EXPR might not
>>>> look like a gain... I expect the bit_not disappears in most cases, and
>>>> VEC_COND_EXPR looks more costly than a simpler BIT_IOR_EXPR.
>>>>
>>>> I am a bit confused that with avx512 we get types like "vector(4)
>>>> <signed-boolean:2>" with :2 and not :1 (is it a hack so true is 1 and not
>>>> -1?), but that doesn't matter for this patch.
>>>>
>>>> Regtest+bootstrap on x86_64-pc-linux-gnu
>>>
>>> +  (with
>>> +   {
>>> +     tree rhs1, rhs2 = NULL;
>>> +     rhs1 = fold_binary (op, type, @1, @3);
>>> +     if (rhs1 && is_gimple_val (rhs1))
>>> +       rhs2 = fold_binary (op, type, @2, @3);
>>>
>>> ICK.  I guess a more match-and-simplify way would be

I was focused on avoiding recursion, but indeed that's independent, I 
could have set a trivial valueize function for that.

>>>    (with
>>>     {
>>>       tree rhs1, rhs2;
>>>       gimple_match_op op (gimple_match_cond::UNCOND, op,
>>>                                       type, @1, @3);
>>>       if (op.resimplify (NULL, valueize)

Oh, so you would be ok with something that recurses without limit? I 
thought we were avoiding this because it may allow some compile time 
explosion with pathological examples.

>>>           && gimple_simplified_result_is_gimple_val (op))
>>>        {
>>>          rhs1 = op.ops[0];
>>>          ... other operand ...
>>>        }
>>>
>>> now in theory we could invent some new syntax for this, like
>>>
>>>  (simplify
>>>   (op (vec_cond:s @0 @1 @2) @3)
>>>   (vec_cond @0 (op:x @1 @3) (op:x @2 @3)))
>>>
>>> and pick something better instead of :x (:s is taken,
>>> would be 'simplified', :c is taken would be 'constexpr', ...).
>>>
>>> _Maybe_ just
>>>
>>>  (simplify
>>>   (op (vec_cond:s @0 @1 @2) @3)
>>>   (vec_cond:x @0 (op @1 @3) (op @2 @3)))
>>>
>>> which would have the same practical meaning as passing
>>> NULL for the seq argument to simplification - do not allow
>>> any intermediate stmt to be generated.
>>
>> Note I specifically do not like those if (it-simplifies) checks
>> because we already would code-generate those anyway.  For
>>
>> (simplify
>>   (plus (vec_cond:s @0 @1 @2) @3)
>>   (vec_cond @0 (plus @1 @3) (plus @2 @3)))
>>
>> we get
>>
>>                     res_op->set_op (VEC_COND_EXPR, type, 3);
>>                     res_op->ops[0] = captures[1];
>>                     res_op->ops[0] = unshare_expr (res_op->ops[0]);
>>                     {
>>                       tree _o1[2], _r1;
>>                       _o1[0] = captures[2];
>>                       _o1[1] = captures[4];
>>                       gimple_match_op tem_op (res_op->cond.any_else
>> (), PLUS_EXPR, TREE_TYPE (_o1[0]), _o1[0], _o1[1]);
>>                       tem_op.resimplify (lseq, valueize);
>>                       _r1 = maybe_push_res_to_seq (&tem_op, lseq);  (****)
>>                       if (!_r1) return false;
>>                       res_op->ops[1] = _r1;
>>                     }
>>                     {
>>                       tree _o1[2], _r1;
>>                       _o1[0] = captures[3];
>>                       _o1[1] = captures[4];
>>                       gimple_match_op tem_op (res_op->cond.any_else
>> (), PLUS_EXPR, TREE_TYPE (_o1[0]), _o1[0], _o1[1]);
>>                       tem_op.resimplify (lseq, valueize);
>>                       _r1 = maybe_push_res_to_seq (&tem_op, lseq);  (***)
>>                       if (!_r1) return false;
>>                       res_op->ops[2] = _r1;
>>                     }
>>                     res_op->resimplify (lseq, valueize);
>>                     return true;
>>
>> and the only change required would be to pass NULL to maybe_push_res_to_seq
>> here instead of lseq at the (***) marked points.
>
> (simplify
>  (plus (vec_cond:s @0 @1 @2) @3)
>  (vec_cond:l @0 (plus @1 @3) (plus @2 @3)))
>
> 'l' for 'force leaf'.  I'll see if I can quickly cme up with a patch.

Does it have a clear meaning for GENERIC? I guess that's probably not such 
a big problem.

There are a number of transformations that we would like to perform "if 
<something> simplifies", but I don't know if it will always have exactly 
this form of "if this part of the output pattern doesn't simplify, give 
up". In some cases, we might prefer "ok if at least one of the branches 
simplifies" for instance. Or "ok if this generates at most one extra 
statement". Even for this particular transformation, I am not sure this is 
the right condition.

Your suggestion looks good (although I understand better the version with 
a mark on plus instead of vec_cond_expr), I just want to avoid wasting 
your time if it turns out we don't use it that much in the end (I have no 
idea yet).

Richard Sandiford July 31, 2020, 12:50 p.m. UTC | #10

Marc Glisse <marc.glisse@inria.fr> writes:
> On Fri, 31 Jul 2020, Richard Sandiford wrote:
>
>> Marc Glisse <marc.glisse@inria.fr> writes:
>>> +/* (c ? a : b) op (c ? d : e)  -->  c ? (a op d) : (b op e) */
>>> + (simplify
>>> +  (op (vec_cond:s @0 @1 @2) (vec_cond:s @0 @3 @4))
>>> +  (with
>>> +   {
>>> +     tree rhs1, rhs2 = NULL;
>>> +     rhs1 = fold_binary (op, type, @1, @3);
>>> +     if (rhs1 && is_gimple_val (rhs1))
>>> +       rhs2 = fold_binary (op, type, @2, @4);
>>> +   }
>>> +   (if (rhs2 && is_gimple_val (rhs2))
>>> +    (vec_cond @0 { rhs1; } { rhs2; })))))
>>> +#endif
>>
>> This one looks dangerous for potentially-trapping ops.
>
> I would expect the operation not to be folded if it can trap. Is that too 
> optimistic?

Not sure TBH.  I was thinking of “trapping” in the sense of raising
an IEEE exception, rather than in the could-throw/must-end-bb sense.
I thought match.pd applied to things like FP addition as normal and
it was up to individual patterns to check the appropriate properties.

Thanks,
Richard

Richard Biener July 31, 2020, 12:59 p.m. UTC | #11

On Fri, Jul 31, 2020 at 2:50 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Marc Glisse <marc.glisse@inria.fr> writes:
> > On Fri, 31 Jul 2020, Richard Sandiford wrote:
> >
> >> Marc Glisse <marc.glisse@inria.fr> writes:
> >>> +/* (c ? a : b) op (c ? d : e)  -->  c ? (a op d) : (b op e) */
> >>> + (simplify
> >>> +  (op (vec_cond:s @0 @1 @2) (vec_cond:s @0 @3 @4))
> >>> +  (with
> >>> +   {
> >>> +     tree rhs1, rhs2 = NULL;
> >>> +     rhs1 = fold_binary (op, type, @1, @3);
> >>> +     if (rhs1 && is_gimple_val (rhs1))
> >>> +       rhs2 = fold_binary (op, type, @2, @4);
> >>> +   }
> >>> +   (if (rhs2 && is_gimple_val (rhs2))
> >>> +    (vec_cond @0 { rhs1; } { rhs2; })))))
> >>> +#endif
> >>
> >> This one looks dangerous for potentially-trapping ops.
> >
> > I would expect the operation not to be folded if it can trap. Is that too
> > optimistic?
>
> Not sure TBH.  I was thinking of “trapping” in the sense of raising
> an IEEE exception, rather than in the could-throw/must-end-bb sense.
> I thought match.pd applied to things like FP addition as normal and
> it was up to individual patterns to check the appropriate properties.

I think it can be indeed defered to the simplification of (op @0 @2)
because that would be invalid if it removes a IEEE exception.
The VEC_COND_EXPR itself cannot trap.

Richard.

> Thanks,
> Richard

Marc Glisse July 31, 2020, 1:01 p.m. UTC | #12

On Fri, 31 Jul 2020, Richard Sandiford wrote:

> Marc Glisse <marc.glisse@inria.fr> writes:
>> On Fri, 31 Jul 2020, Richard Sandiford wrote:
>>
>>> Marc Glisse <marc.glisse@inria.fr> writes:
>>>> +/* (c ? a : b) op (c ? d : e)  -->  c ? (a op d) : (b op e) */
>>>> + (simplify
>>>> +  (op (vec_cond:s @0 @1 @2) (vec_cond:s @0 @3 @4))
>>>> +  (with
>>>> +   {
>>>> +     tree rhs1, rhs2 = NULL;
>>>> +     rhs1 = fold_binary (op, type, @1, @3);
>>>> +     if (rhs1 && is_gimple_val (rhs1))
>>>> +       rhs2 = fold_binary (op, type, @2, @4);
>>>> +   }
>>>> +   (if (rhs2 && is_gimple_val (rhs2))
>>>> +    (vec_cond @0 { rhs1; } { rhs2; })))))
>>>> +#endif
>>>
>>> This one looks dangerous for potentially-trapping ops.
>>
>> I would expect the operation not to be folded if it can trap. Is that too
>> optimistic?
>
> Not sure TBH.  I was thinking of “trapping” in the sense of raising
> an IEEE exception, rather than in the could-throw/must-end-bb sense.

That's what I understood from your message :-)

> I thought match.pd applied to things like FP addition as normal and
> it was up to individual patterns to check the appropriate properties.

Yes, and in this case I am delegating that to fold_binary, which already 
performs this check.

I tried with this C++ program

typedef double vecf __attribute__((vector_size(16)));
typedef long long veci __attribute__((vector_size(16)));
vecf f(veci c){
   return (c?1.:2.)/(c?3.:7.);
}

the folding happens by default, but not with -frounding-math, which seems 
like exactly what we want.

Marc Glisse July 31, 2020, 1:13 p.m. UTC | #13

On Fri, 31 Jul 2020, Marc Glisse wrote:

> On Fri, 31 Jul 2020, Richard Sandiford wrote:
>
>> Marc Glisse <marc.glisse@inria.fr> writes:
>>> On Fri, 31 Jul 2020, Richard Sandiford wrote:
>>> 
>>>> Marc Glisse <marc.glisse@inria.fr> writes:
>>>>> +/* (c ? a : b) op (c ? d : e)  -->  c ? (a op d) : (b op e) */
>>>>> + (simplify
>>>>> +  (op (vec_cond:s @0 @1 @2) (vec_cond:s @0 @3 @4))
>>>>> +  (with
>>>>> +   {
>>>>> +     tree rhs1, rhs2 = NULL;
>>>>> +     rhs1 = fold_binary (op, type, @1, @3);
>>>>> +     if (rhs1 && is_gimple_val (rhs1))
>>>>> +       rhs2 = fold_binary (op, type, @2, @4);
>>>>> +   }
>>>>> +   (if (rhs2 && is_gimple_val (rhs2))
>>>>> +    (vec_cond @0 { rhs1; } { rhs2; })))))
>>>>> +#endif
>>>> 
>>>> This one looks dangerous for potentially-trapping ops.
>>> 
>>> I would expect the operation not to be folded if it can trap. Is that too
>>> optimistic?
>> 
>> Not sure TBH.  I was thinking of “trapping” in the sense of raising
>> an IEEE exception, rather than in the could-throw/must-end-bb sense.
>
> That's what I understood from your message :-)
>
>> I thought match.pd applied to things like FP addition as normal and
>> it was up to individual patterns to check the appropriate properties.
>
> Yes, and in this case I am delegating that to fold_binary, which already 
> performs this check.
>
> I tried with this C++ program
>
> typedef double vecf __attribute__((vector_size(16)));
> typedef long long veci __attribute__((vector_size(16)));
> vecf f(veci c){
>  return (c?1.:2.)/(c?3.:7.);
> }
>
> the folding happens by default, but not with -frounding-math, which seems 
> like exactly what we want.

That was for rounding. -ftrapping-math does not disable folding of

typedef double vecf __attribute__((vector_size(16)));
typedef long long veci __attribute__((vector_size(16)));
vecf f(veci c){
     vecf z={};
   return (z+1)/(z+3);
}

despite a possible inexact flag, so it doesn't disable vec_cond_expr 
folding either.

(not sure we want to fix this unless -fno-trapping-math becomes the 
default)

VEC_COND_EXPR optimizations

Commit Message

Comments

Patch