Message ID | alpine.DEB.2.23.453.2007291859410.6927@stedding.saclay.inria.fr |
---|---|
State | New |
Headers | show |
Series | VEC_COND_EXPR optimizations | expand |
Marc Glisse <marc.glisse@inria.fr> writes: > +/* (c ? a : b) op (c ? d : e) --> c ? (a op d) : (b op e) */ > + (simplify > + (op (vec_cond:s @0 @1 @2) (vec_cond:s @0 @3 @4)) > + (with > + { > + tree rhs1, rhs2 = NULL; > + rhs1 = fold_binary (op, type, @1, @3); > + if (rhs1 && is_gimple_val (rhs1)) > + rhs2 = fold_binary (op, type, @2, @4); > + } > + (if (rhs2 && is_gimple_val (rhs2)) > + (vec_cond @0 { rhs1; } { rhs2; }))))) > +#endif This one looks dangerous for potentially-trapping ops. > +/* (v ? w : 0) ? a : b is just (v & w) ? a : b */ > +(simplify > + (vec_cond (vec_cond:s @0 @3 integer_zerop) @1 @2) > + (vec_cond (bit_and @0 @3) @1 @2)) Does something check automatically that @0 and @3 have compatible types? Same question for the later folds. Thanks, Richard
On Thu, Jul 30, 2020 at 9:49 AM Marc Glisse <marc.glisse@inria.fr> wrote: > > When vector comparisons were forced to use vec_cond_expr, we lost a number > of optimizations (my fault for not adding enough testcases to prevent > that). This patch tries to unwrap vec_cond_expr a bit so some > optimizations can still happen. > > I wasn't planning to add all those transformations together, but adding > one caused a regression, whose fix introduced a second regression, etc. > > Using a simple fold_binary internally looks like an ok compromise to me. > It remains cheap enough (not recursive, and vector instructions are not > that frequent), while still allowing more than const_binop (X|0 or X&X for > instance). The transformations are quite conservative with :s and folding > only if everything simplifies, we may want to relax this later. And of > course we are going to miss things like a?b:c + a?c:b -> b+c. > > In terms of number of operations, some transformations turning 2 > VEC_COND_EXPR into VEC_COND_EXPR + BIT_IOR_EXPR + BIT_NOT_EXPR might not > look like a gain... I expect the bit_not disappears in most cases, and > VEC_COND_EXPR looks more costly than a simpler BIT_IOR_EXPR. > > I am a bit confused that with avx512 we get types like "vector(4) > <signed-boolean:2>" with :2 and not :1 (is it a hack so true is 1 and not > -1?), but that doesn't matter for this patch. > > Regtest+bootstrap on x86_64-pc-linux-gnu + (with + { + tree rhs1, rhs2 = NULL; + rhs1 = fold_binary (op, type, @1, @3); + if (rhs1 && is_gimple_val (rhs1)) + rhs2 = fold_binary (op, type, @2, @3); ICK. I guess a more match-and-simplify way would be (with { tree rhs1, rhs2; gimple_match_op op (gimple_match_cond::UNCOND, op, type, @1, @3); if (op.resimplify (NULL, valueize) && gimple_simplified_result_is_gimple_val (op)) { rhs1 = op.ops[0]; ... other operand ... } now in theory we could invent some new syntax for this, like (simplify (op (vec_cond:s @0 @1 @2) @3) (vec_cond @0 (op:x @1 @3) (op:x @2 @3))) and pick something better instead of :x (:s is taken, would be 'simplified', :c is taken would be 'constexpr', ...). _Maybe_ just (simplify (op (vec_cond:s @0 @1 @2) @3) (vec_cond:x @0 (op @1 @3) (op @2 @3))) which would have the same practical meaning as passing NULL for the seq argument to simplification - do not allow any intermediate stmt to be generated. The other "simple" patterns look good, you can commit them separately if you like. Richard. > 2020-07-30 Marc Glisse <marc.glisse@inria.fr> > > PR tree-optimization/95906 > PR target/70314 > * match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e), > (v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): New transformations. > > * gcc.dg/tree-ssa/andnot-2.c: New file. > * gcc.dg/tree-ssa/pr95906.c: Likewise. > * gcc.target/i386/pr70314.c: Likewise. > > -- > Marc Glisse
On Fri, 31 Jul 2020, Richard Sandiford wrote: > Marc Glisse <marc.glisse@inria.fr> writes: >> +/* (c ? a : b) op (c ? d : e) --> c ? (a op d) : (b op e) */ >> + (simplify >> + (op (vec_cond:s @0 @1 @2) (vec_cond:s @0 @3 @4)) >> + (with >> + { >> + tree rhs1, rhs2 = NULL; >> + rhs1 = fold_binary (op, type, @1, @3); >> + if (rhs1 && is_gimple_val (rhs1)) >> + rhs2 = fold_binary (op, type, @2, @4); >> + } >> + (if (rhs2 && is_gimple_val (rhs2)) >> + (vec_cond @0 { rhs1; } { rhs2; }))))) >> +#endif > > This one looks dangerous for potentially-trapping ops. I would expect the operation not to be folded if it can trap. Is that too optimistic? >> +/* (v ? w : 0) ? a : b is just (v & w) ? a : b */ >> +(simplify >> + (vec_cond (vec_cond:s @0 @3 integer_zerop) @1 @2) >> + (vec_cond (bit_and @0 @3) @1 @2)) > > Does something check automatically that @0 and @3 have compatible types? My memory of this dates from before the avx512-related changes, but at least at the time, the type of the condition in vec_cond_expr had to have the same size and number of elements as the result, and have signed integral elements. Now the size constraint has changed, but it still looks like for a given number of elements and size (this last one ignored for avx512), there is essentially a single type that can appear as condition. Is this wrong for x86? For SVE? I could add some types_match conditions if that seems too unsafe...
On Fri, Jul 31, 2020 at 1:35 PM Richard Biener <richard.guenther@gmail.com> wrote: > > On Thu, Jul 30, 2020 at 9:49 AM Marc Glisse <marc.glisse@inria.fr> wrote: > > > > When vector comparisons were forced to use vec_cond_expr, we lost a number > > of optimizations (my fault for not adding enough testcases to prevent > > that). This patch tries to unwrap vec_cond_expr a bit so some > > optimizations can still happen. > > > > I wasn't planning to add all those transformations together, but adding > > one caused a regression, whose fix introduced a second regression, etc. > > > > Using a simple fold_binary internally looks like an ok compromise to me. > > It remains cheap enough (not recursive, and vector instructions are not > > that frequent), while still allowing more than const_binop (X|0 or X&X for > > instance). The transformations are quite conservative with :s and folding > > only if everything simplifies, we may want to relax this later. And of > > course we are going to miss things like a?b:c + a?c:b -> b+c. > > > > In terms of number of operations, some transformations turning 2 > > VEC_COND_EXPR into VEC_COND_EXPR + BIT_IOR_EXPR + BIT_NOT_EXPR might not > > look like a gain... I expect the bit_not disappears in most cases, and > > VEC_COND_EXPR looks more costly than a simpler BIT_IOR_EXPR. > > > > I am a bit confused that with avx512 we get types like "vector(4) > > <signed-boolean:2>" with :2 and not :1 (is it a hack so true is 1 and not > > -1?), but that doesn't matter for this patch. > > > > Regtest+bootstrap on x86_64-pc-linux-gnu > > + (with > + { > + tree rhs1, rhs2 = NULL; > + rhs1 = fold_binary (op, type, @1, @3); > + if (rhs1 && is_gimple_val (rhs1)) > + rhs2 = fold_binary (op, type, @2, @3); > > ICK. I guess a more match-and-simplify way would be > > (with > { > tree rhs1, rhs2; > gimple_match_op op (gimple_match_cond::UNCOND, op, > type, @1, @3); > if (op.resimplify (NULL, valueize) > && gimple_simplified_result_is_gimple_val (op)) > { > rhs1 = op.ops[0]; > ... other operand ... > } > > now in theory we could invent some new syntax for this, like > > (simplify > (op (vec_cond:s @0 @1 @2) @3) > (vec_cond @0 (op:x @1 @3) (op:x @2 @3))) > > and pick something better instead of :x (:s is taken, > would be 'simplified', :c is taken would be 'constexpr', ...). > > _Maybe_ just > > (simplify > (op (vec_cond:s @0 @1 @2) @3) > (vec_cond:x @0 (op @1 @3) (op @2 @3))) > > which would have the same practical meaning as passing > NULL for the seq argument to simplification - do not allow > any intermediate stmt to be generated. Note I specifically do not like those if (it-simplifies) checks because we already would code-generate those anyway. For (simplify (plus (vec_cond:s @0 @1 @2) @3) (vec_cond @0 (plus @1 @3) (plus @2 @3))) we get res_op->set_op (VEC_COND_EXPR, type, 3); res_op->ops[0] = captures[1]; res_op->ops[0] = unshare_expr (res_op->ops[0]); { tree _o1[2], _r1; _o1[0] = captures[2]; _o1[1] = captures[4]; gimple_match_op tem_op (res_op->cond.any_else (), PLUS_EXPR, TREE_TYPE (_o1[0]), _o1[0], _o1[1]); tem_op.resimplify (lseq, valueize); _r1 = maybe_push_res_to_seq (&tem_op, lseq); (****) if (!_r1) return false; res_op->ops[1] = _r1; } { tree _o1[2], _r1; _o1[0] = captures[3]; _o1[1] = captures[4]; gimple_match_op tem_op (res_op->cond.any_else (), PLUS_EXPR, TREE_TYPE (_o1[0]), _o1[0], _o1[1]); tem_op.resimplify (lseq, valueize); _r1 = maybe_push_res_to_seq (&tem_op, lseq); (***) if (!_r1) return false; res_op->ops[2] = _r1; } res_op->resimplify (lseq, valueize); return true; and the only change required would be to pass NULL to maybe_push_res_to_seq here instead of lseq at the (***) marked points. Richard. > The other "simple" patterns look good, you can commit > them separately if you like. > > Richard. > > > 2020-07-30 Marc Glisse <marc.glisse@inria.fr> > > > > PR tree-optimization/95906 > > PR target/70314 > > * match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e), > > (v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): New transformations. > > > > * gcc.dg/tree-ssa/andnot-2.c: New file. > > * gcc.dg/tree-ssa/pr95906.c: Likewise. > > * gcc.target/i386/pr70314.c: Likewise. > > > > -- > > Marc Glisse
On Fri, Jul 31, 2020 at 1:39 PM Marc Glisse <marc.glisse@inria.fr> wrote: > > On Fri, 31 Jul 2020, Richard Sandiford wrote: > > > Marc Glisse <marc.glisse@inria.fr> writes: > >> +/* (c ? a : b) op (c ? d : e) --> c ? (a op d) : (b op e) */ > >> + (simplify > >> + (op (vec_cond:s @0 @1 @2) (vec_cond:s @0 @3 @4)) > >> + (with > >> + { > >> + tree rhs1, rhs2 = NULL; > >> + rhs1 = fold_binary (op, type, @1, @3); > >> + if (rhs1 && is_gimple_val (rhs1)) > >> + rhs2 = fold_binary (op, type, @2, @4); > >> + } > >> + (if (rhs2 && is_gimple_val (rhs2)) > >> + (vec_cond @0 { rhs1; } { rhs2; }))))) > >> +#endif > > > > This one looks dangerous for potentially-trapping ops. > > I would expect the operation not to be folded if it can trap. Is that too > optimistic? > > >> +/* (v ? w : 0) ? a : b is just (v & w) ? a : b */ > >> +(simplify > >> + (vec_cond (vec_cond:s @0 @3 integer_zerop) @1 @2) > >> + (vec_cond (bit_and @0 @3) @1 @2)) > > > > Does something check automatically that @0 and @3 have compatible types? @0 should always have a vector boolean type and thus will not be generally compatible with @3. But OTOH then when you see (vec_cond (vec_cond ... then @3 must be vector boolean as well... But in theory with AVX512 the inner vec_cond could have a SImode condition @0 producing a regular V4SImode vector mask for an outer AVX512 SSE-style vec-cond and you then would get a mismatch. So indeed better add a type compatibility check. > My memory of this dates from before the avx512-related changes, but at > least at the time, the type of the condition in vec_cond_expr had to have > the same size and number of elements as the result, and have signed > integral elements. Now the size constraint has changed, but it still looks > like for a given number of elements and size (this last one ignored for > avx512), there is essentially a single type that can appear as condition. > Is this wrong for x86? For SVE? > > I could add some types_match conditions if that seems too unsafe... > > -- > Marc Glisse
On Fri, Jul 31, 2020 at 1:39 PM Richard Biener <richard.guenther@gmail.com> wrote: > > On Fri, Jul 31, 2020 at 1:35 PM Richard Biener > <richard.guenther@gmail.com> wrote: > > > > On Thu, Jul 30, 2020 at 9:49 AM Marc Glisse <marc.glisse@inria.fr> wrote: > > > > > > When vector comparisons were forced to use vec_cond_expr, we lost a number > > > of optimizations (my fault for not adding enough testcases to prevent > > > that). This patch tries to unwrap vec_cond_expr a bit so some > > > optimizations can still happen. > > > > > > I wasn't planning to add all those transformations together, but adding > > > one caused a regression, whose fix introduced a second regression, etc. > > > > > > Using a simple fold_binary internally looks like an ok compromise to me. > > > It remains cheap enough (not recursive, and vector instructions are not > > > that frequent), while still allowing more than const_binop (X|0 or X&X for > > > instance). The transformations are quite conservative with :s and folding > > > only if everything simplifies, we may want to relax this later. And of > > > course we are going to miss things like a?b:c + a?c:b -> b+c. > > > > > > In terms of number of operations, some transformations turning 2 > > > VEC_COND_EXPR into VEC_COND_EXPR + BIT_IOR_EXPR + BIT_NOT_EXPR might not > > > look like a gain... I expect the bit_not disappears in most cases, and > > > VEC_COND_EXPR looks more costly than a simpler BIT_IOR_EXPR. > > > > > > I am a bit confused that with avx512 we get types like "vector(4) > > > <signed-boolean:2>" with :2 and not :1 (is it a hack so true is 1 and not > > > -1?), but that doesn't matter for this patch. > > > > > > Regtest+bootstrap on x86_64-pc-linux-gnu > > > > + (with > > + { > > + tree rhs1, rhs2 = NULL; > > + rhs1 = fold_binary (op, type, @1, @3); > > + if (rhs1 && is_gimple_val (rhs1)) > > + rhs2 = fold_binary (op, type, @2, @3); > > > > ICK. I guess a more match-and-simplify way would be > > > > (with > > { > > tree rhs1, rhs2; > > gimple_match_op op (gimple_match_cond::UNCOND, op, > > type, @1, @3); > > if (op.resimplify (NULL, valueize) > > && gimple_simplified_result_is_gimple_val (op)) > > { > > rhs1 = op.ops[0]; > > ... other operand ... > > } > > > > now in theory we could invent some new syntax for this, like > > > > (simplify > > (op (vec_cond:s @0 @1 @2) @3) > > (vec_cond @0 (op:x @1 @3) (op:x @2 @3))) > > > > and pick something better instead of :x (:s is taken, > > would be 'simplified', :c is taken would be 'constexpr', ...). > > > > _Maybe_ just > > > > (simplify > > (op (vec_cond:s @0 @1 @2) @3) > > (vec_cond:x @0 (op @1 @3) (op @2 @3))) > > > > which would have the same practical meaning as passing > > NULL for the seq argument to simplification - do not allow > > any intermediate stmt to be generated. > > Note I specifically do not like those if (it-simplifies) checks > because we already would code-generate those anyway. For > > (simplify > (plus (vec_cond:s @0 @1 @2) @3) > (vec_cond @0 (plus @1 @3) (plus @2 @3))) > > we get > > res_op->set_op (VEC_COND_EXPR, type, 3); > res_op->ops[0] = captures[1]; > res_op->ops[0] = unshare_expr (res_op->ops[0]); > { > tree _o1[2], _r1; > _o1[0] = captures[2]; > _o1[1] = captures[4]; > gimple_match_op tem_op (res_op->cond.any_else > (), PLUS_EXPR, TREE_TYPE (_o1[0]), _o1[0], _o1[1]); > tem_op.resimplify (lseq, valueize); > _r1 = maybe_push_res_to_seq (&tem_op, lseq); (****) > if (!_r1) return false; > res_op->ops[1] = _r1; > } > { > tree _o1[2], _r1; > _o1[0] = captures[3]; > _o1[1] = captures[4]; > gimple_match_op tem_op (res_op->cond.any_else > (), PLUS_EXPR, TREE_TYPE (_o1[0]), _o1[0], _o1[1]); > tem_op.resimplify (lseq, valueize); > _r1 = maybe_push_res_to_seq (&tem_op, lseq); (***) > if (!_r1) return false; > res_op->ops[2] = _r1; > } > res_op->resimplify (lseq, valueize); > return true; > > and the only change required would be to pass NULL to maybe_push_res_to_seq > here instead of lseq at the (***) marked points. (simplify (plus (vec_cond:s @0 @1 @2) @3) (vec_cond:l @0 (plus @1 @3) (plus @2 @3))) 'l' for 'force leaf'. I'll see if I can quickly cme up with a patch. Richard. > Richard. > > > The other "simple" patterns look good, you can commit > > them separately if you like. > > > > Richard. > > > > > 2020-07-30 Marc Glisse <marc.glisse@inria.fr> > > > > > > PR tree-optimization/95906 > > > PR target/70314 > > > * match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e), > > > (v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): New transformations. > > > > > > * gcc.dg/tree-ssa/andnot-2.c: New file. > > > * gcc.dg/tree-ssa/pr95906.c: Likewise. > > > * gcc.target/i386/pr70314.c: Likewise. > > > > > > -- > > > Marc Glisse
On Fri, 31 Jul 2020, Richard Biener wrote: >>>> +/* (v ? w : 0) ? a : b is just (v & w) ? a : b */ >>>> +(simplify >>>> + (vec_cond (vec_cond:s @0 @3 integer_zerop) @1 @2) >>>> + (vec_cond (bit_and @0 @3) @1 @2)) >>> >>> Does something check automatically that @0 and @3 have compatible types? > > @0 should always have a vector boolean type and thus will not be generally > compatible with @3. But OTOH then when you see (vec_cond (vec_cond ... > then @3 must be vector boolean as well... > > But in theory with AVX512 the inner vec_cond could have a SImode > condition @0 producing a regular V4SImode vector mask for an outer > AVX512 SSE-style vec-cond and you then would get a mismatch. Ah, I thought the SSE-style vec_cond was impossible in AVX512 mode, at least I couldn't generate one in a few tests, but I didn't try very hard. > So indeed better add a type compatibility check. Ok, it can't hurt.
On Fri, Jul 31, 2020 at 1:47 PM Richard Biener <richard.guenther@gmail.com> wrote: > > On Fri, Jul 31, 2020 at 1:39 PM Richard Biener > <richard.guenther@gmail.com> wrote: > > > > On Fri, Jul 31, 2020 at 1:35 PM Richard Biener > > <richard.guenther@gmail.com> wrote: > > > > > > On Thu, Jul 30, 2020 at 9:49 AM Marc Glisse <marc.glisse@inria.fr> wrote: > > > > > > > > When vector comparisons were forced to use vec_cond_expr, we lost a number > > > > of optimizations (my fault for not adding enough testcases to prevent > > > > that). This patch tries to unwrap vec_cond_expr a bit so some > > > > optimizations can still happen. > > > > > > > > I wasn't planning to add all those transformations together, but adding > > > > one caused a regression, whose fix introduced a second regression, etc. > > > > > > > > Using a simple fold_binary internally looks like an ok compromise to me. > > > > It remains cheap enough (not recursive, and vector instructions are not > > > > that frequent), while still allowing more than const_binop (X|0 or X&X for > > > > instance). The transformations are quite conservative with :s and folding > > > > only if everything simplifies, we may want to relax this later. And of > > > > course we are going to miss things like a?b:c + a?c:b -> b+c. > > > > > > > > In terms of number of operations, some transformations turning 2 > > > > VEC_COND_EXPR into VEC_COND_EXPR + BIT_IOR_EXPR + BIT_NOT_EXPR might not > > > > look like a gain... I expect the bit_not disappears in most cases, and > > > > VEC_COND_EXPR looks more costly than a simpler BIT_IOR_EXPR. > > > > > > > > I am a bit confused that with avx512 we get types like "vector(4) > > > > <signed-boolean:2>" with :2 and not :1 (is it a hack so true is 1 and not > > > > -1?), but that doesn't matter for this patch. > > > > > > > > Regtest+bootstrap on x86_64-pc-linux-gnu > > > > > > + (with > > > + { > > > + tree rhs1, rhs2 = NULL; > > > + rhs1 = fold_binary (op, type, @1, @3); > > > + if (rhs1 && is_gimple_val (rhs1)) > > > + rhs2 = fold_binary (op, type, @2, @3); > > > > > > ICK. I guess a more match-and-simplify way would be > > > > > > (with > > > { > > > tree rhs1, rhs2; > > > gimple_match_op op (gimple_match_cond::UNCOND, op, > > > type, @1, @3); > > > if (op.resimplify (NULL, valueize) > > > && gimple_simplified_result_is_gimple_val (op)) > > > { > > > rhs1 = op.ops[0]; > > > ... other operand ... > > > } > > > > > > now in theory we could invent some new syntax for this, like > > > > > > (simplify > > > (op (vec_cond:s @0 @1 @2) @3) > > > (vec_cond @0 (op:x @1 @3) (op:x @2 @3))) > > > > > > and pick something better instead of :x (:s is taken, > > > would be 'simplified', :c is taken would be 'constexpr', ...). > > > > > > _Maybe_ just > > > > > > (simplify > > > (op (vec_cond:s @0 @1 @2) @3) > > > (vec_cond:x @0 (op @1 @3) (op @2 @3))) > > > > > > which would have the same practical meaning as passing > > > NULL for the seq argument to simplification - do not allow > > > any intermediate stmt to be generated. > > > > Note I specifically do not like those if (it-simplifies) checks > > because we already would code-generate those anyway. For > > > > (simplify > > (plus (vec_cond:s @0 @1 @2) @3) > > (vec_cond @0 (plus @1 @3) (plus @2 @3))) > > > > we get > > > > res_op->set_op (VEC_COND_EXPR, type, 3); > > res_op->ops[0] = captures[1]; > > res_op->ops[0] = unshare_expr (res_op->ops[0]); > > { > > tree _o1[2], _r1; > > _o1[0] = captures[2]; > > _o1[1] = captures[4]; > > gimple_match_op tem_op (res_op->cond.any_else > > (), PLUS_EXPR, TREE_TYPE (_o1[0]), _o1[0], _o1[1]); > > tem_op.resimplify (lseq, valueize); > > _r1 = maybe_push_res_to_seq (&tem_op, lseq); (****) > > if (!_r1) return false; > > res_op->ops[1] = _r1; > > } > > { > > tree _o1[2], _r1; > > _o1[0] = captures[3]; > > _o1[1] = captures[4]; > > gimple_match_op tem_op (res_op->cond.any_else > > (), PLUS_EXPR, TREE_TYPE (_o1[0]), _o1[0], _o1[1]); > > tem_op.resimplify (lseq, valueize); > > _r1 = maybe_push_res_to_seq (&tem_op, lseq); (***) > > if (!_r1) return false; > > res_op->ops[2] = _r1; > > } > > res_op->resimplify (lseq, valueize); > > return true; > > > > and the only change required would be to pass NULL to maybe_push_res_to_seq > > here instead of lseq at the (***) marked points. > > (simplify > (plus (vec_cond:s @0 @1 @2) @3) > (vec_cond:l @0 (plus @1 @3) (plus @2 @3))) > > 'l' for 'force leaf'. I'll see if I can quickly cme up with a patch. The attached prototype works for (simplify (plus (vec_cond:s @0 @1 @2) @3) (vec_cond @0 (plus:l @1 @3) (plus:l @2 @3))) but ':...' is already taken for an explicitly specified type so I have to think about sth better. As you see I've also moved it to the actual ops that should simplify. It doesn't work on the outermost expression but I guess it doesn't make sense there (adding support would be possible). Now I need some non-ambiguous syntax... it currently is id[:type][@cid] so maybe id[!][:type][@cid]. I guess non-ambiguous is good enough? Richard. > Richard. > > > > > Richard. > > > > > The other "simple" patterns look good, you can commit > > > them separately if you like. > > > > > > Richard. > > > > > > > 2020-07-30 Marc Glisse <marc.glisse@inria.fr> > > > > > > > > PR tree-optimization/95906 > > > > PR target/70314 > > > > * match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e), > > > > (v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): New transformations. > > > > > > > > * gcc.dg/tree-ssa/andnot-2.c: New file. > > > > * gcc.dg/tree-ssa/pr95906.c: Likewise. > > > > * gcc.target/i386/pr70314.c: Likewise. > > > > > > > > -- > > > > Marc Glisse
On Fri, 31 Jul 2020, Richard Biener wrote: > On Fri, Jul 31, 2020 at 1:39 PM Richard Biener > <richard.guenther@gmail.com> wrote: >> >> On Fri, Jul 31, 2020 at 1:35 PM Richard Biener >> <richard.guenther@gmail.com> wrote: >>> >>> On Thu, Jul 30, 2020 at 9:49 AM Marc Glisse <marc.glisse@inria.fr> wrote: >>>> >>>> When vector comparisons were forced to use vec_cond_expr, we lost a number >>>> of optimizations (my fault for not adding enough testcases to prevent >>>> that). This patch tries to unwrap vec_cond_expr a bit so some >>>> optimizations can still happen. >>>> >>>> I wasn't planning to add all those transformations together, but adding >>>> one caused a regression, whose fix introduced a second regression, etc. >>>> >>>> Using a simple fold_binary internally looks like an ok compromise to me. >>>> It remains cheap enough (not recursive, and vector instructions are not >>>> that frequent), while still allowing more than const_binop (X|0 or X&X for >>>> instance). The transformations are quite conservative with :s and folding >>>> only if everything simplifies, we may want to relax this later. And of >>>> course we are going to miss things like a?b:c + a?c:b -> b+c. >>>> >>>> In terms of number of operations, some transformations turning 2 >>>> VEC_COND_EXPR into VEC_COND_EXPR + BIT_IOR_EXPR + BIT_NOT_EXPR might not >>>> look like a gain... I expect the bit_not disappears in most cases, and >>>> VEC_COND_EXPR looks more costly than a simpler BIT_IOR_EXPR. >>>> >>>> I am a bit confused that with avx512 we get types like "vector(4) >>>> <signed-boolean:2>" with :2 and not :1 (is it a hack so true is 1 and not >>>> -1?), but that doesn't matter for this patch. >>>> >>>> Regtest+bootstrap on x86_64-pc-linux-gnu >>> >>> + (with >>> + { >>> + tree rhs1, rhs2 = NULL; >>> + rhs1 = fold_binary (op, type, @1, @3); >>> + if (rhs1 && is_gimple_val (rhs1)) >>> + rhs2 = fold_binary (op, type, @2, @3); >>> >>> ICK. I guess a more match-and-simplify way would be I was focused on avoiding recursion, but indeed that's independent, I could have set a trivial valueize function for that. >>> (with >>> { >>> tree rhs1, rhs2; >>> gimple_match_op op (gimple_match_cond::UNCOND, op, >>> type, @1, @3); >>> if (op.resimplify (NULL, valueize) Oh, so you would be ok with something that recurses without limit? I thought we were avoiding this because it may allow some compile time explosion with pathological examples. >>> && gimple_simplified_result_is_gimple_val (op)) >>> { >>> rhs1 = op.ops[0]; >>> ... other operand ... >>> } >>> >>> now in theory we could invent some new syntax for this, like >>> >>> (simplify >>> (op (vec_cond:s @0 @1 @2) @3) >>> (vec_cond @0 (op:x @1 @3) (op:x @2 @3))) >>> >>> and pick something better instead of :x (:s is taken, >>> would be 'simplified', :c is taken would be 'constexpr', ...). >>> >>> _Maybe_ just >>> >>> (simplify >>> (op (vec_cond:s @0 @1 @2) @3) >>> (vec_cond:x @0 (op @1 @3) (op @2 @3))) >>> >>> which would have the same practical meaning as passing >>> NULL for the seq argument to simplification - do not allow >>> any intermediate stmt to be generated. >> >> Note I specifically do not like those if (it-simplifies) checks >> because we already would code-generate those anyway. For >> >> (simplify >> (plus (vec_cond:s @0 @1 @2) @3) >> (vec_cond @0 (plus @1 @3) (plus @2 @3))) >> >> we get >> >> res_op->set_op (VEC_COND_EXPR, type, 3); >> res_op->ops[0] = captures[1]; >> res_op->ops[0] = unshare_expr (res_op->ops[0]); >> { >> tree _o1[2], _r1; >> _o1[0] = captures[2]; >> _o1[1] = captures[4]; >> gimple_match_op tem_op (res_op->cond.any_else >> (), PLUS_EXPR, TREE_TYPE (_o1[0]), _o1[0], _o1[1]); >> tem_op.resimplify (lseq, valueize); >> _r1 = maybe_push_res_to_seq (&tem_op, lseq); (****) >> if (!_r1) return false; >> res_op->ops[1] = _r1; >> } >> { >> tree _o1[2], _r1; >> _o1[0] = captures[3]; >> _o1[1] = captures[4]; >> gimple_match_op tem_op (res_op->cond.any_else >> (), PLUS_EXPR, TREE_TYPE (_o1[0]), _o1[0], _o1[1]); >> tem_op.resimplify (lseq, valueize); >> _r1 = maybe_push_res_to_seq (&tem_op, lseq); (***) >> if (!_r1) return false; >> res_op->ops[2] = _r1; >> } >> res_op->resimplify (lseq, valueize); >> return true; >> >> and the only change required would be to pass NULL to maybe_push_res_to_seq >> here instead of lseq at the (***) marked points. > > (simplify > (plus (vec_cond:s @0 @1 @2) @3) > (vec_cond:l @0 (plus @1 @3) (plus @2 @3))) > > 'l' for 'force leaf'. I'll see if I can quickly cme up with a patch. Does it have a clear meaning for GENERIC? I guess that's probably not such a big problem. There are a number of transformations that we would like to perform "if <something> simplifies", but I don't know if it will always have exactly this form of "if this part of the output pattern doesn't simplify, give up". In some cases, we might prefer "ok if at least one of the branches simplifies" for instance. Or "ok if this generates at most one extra statement". Even for this particular transformation, I am not sure this is the right condition. Your suggestion looks good (although I understand better the version with a mark on plus instead of vec_cond_expr), I just want to avoid wasting your time if it turns out we don't use it that much in the end (I have no idea yet).
Marc Glisse <marc.glisse@inria.fr> writes: > On Fri, 31 Jul 2020, Richard Sandiford wrote: > >> Marc Glisse <marc.glisse@inria.fr> writes: >>> +/* (c ? a : b) op (c ? d : e) --> c ? (a op d) : (b op e) */ >>> + (simplify >>> + (op (vec_cond:s @0 @1 @2) (vec_cond:s @0 @3 @4)) >>> + (with >>> + { >>> + tree rhs1, rhs2 = NULL; >>> + rhs1 = fold_binary (op, type, @1, @3); >>> + if (rhs1 && is_gimple_val (rhs1)) >>> + rhs2 = fold_binary (op, type, @2, @4); >>> + } >>> + (if (rhs2 && is_gimple_val (rhs2)) >>> + (vec_cond @0 { rhs1; } { rhs2; }))))) >>> +#endif >> >> This one looks dangerous for potentially-trapping ops. > > I would expect the operation not to be folded if it can trap. Is that too > optimistic? Not sure TBH. I was thinking of “trapping” in the sense of raising an IEEE exception, rather than in the could-throw/must-end-bb sense. I thought match.pd applied to things like FP addition as normal and it was up to individual patterns to check the appropriate properties. Thanks, Richard
On Fri, Jul 31, 2020 at 2:50 PM Richard Sandiford <richard.sandiford@arm.com> wrote: > > Marc Glisse <marc.glisse@inria.fr> writes: > > On Fri, 31 Jul 2020, Richard Sandiford wrote: > > > >> Marc Glisse <marc.glisse@inria.fr> writes: > >>> +/* (c ? a : b) op (c ? d : e) --> c ? (a op d) : (b op e) */ > >>> + (simplify > >>> + (op (vec_cond:s @0 @1 @2) (vec_cond:s @0 @3 @4)) > >>> + (with > >>> + { > >>> + tree rhs1, rhs2 = NULL; > >>> + rhs1 = fold_binary (op, type, @1, @3); > >>> + if (rhs1 && is_gimple_val (rhs1)) > >>> + rhs2 = fold_binary (op, type, @2, @4); > >>> + } > >>> + (if (rhs2 && is_gimple_val (rhs2)) > >>> + (vec_cond @0 { rhs1; } { rhs2; }))))) > >>> +#endif > >> > >> This one looks dangerous for potentially-trapping ops. > > > > I would expect the operation not to be folded if it can trap. Is that too > > optimistic? > > Not sure TBH. I was thinking of “trapping” in the sense of raising > an IEEE exception, rather than in the could-throw/must-end-bb sense. > I thought match.pd applied to things like FP addition as normal and > it was up to individual patterns to check the appropriate properties. I think it can be indeed defered to the simplification of (op @0 @2) because that would be invalid if it removes a IEEE exception. The VEC_COND_EXPR itself cannot trap. Richard. > Thanks, > Richard
On Fri, 31 Jul 2020, Richard Sandiford wrote: > Marc Glisse <marc.glisse@inria.fr> writes: >> On Fri, 31 Jul 2020, Richard Sandiford wrote: >> >>> Marc Glisse <marc.glisse@inria.fr> writes: >>>> +/* (c ? a : b) op (c ? d : e) --> c ? (a op d) : (b op e) */ >>>> + (simplify >>>> + (op (vec_cond:s @0 @1 @2) (vec_cond:s @0 @3 @4)) >>>> + (with >>>> + { >>>> + tree rhs1, rhs2 = NULL; >>>> + rhs1 = fold_binary (op, type, @1, @3); >>>> + if (rhs1 && is_gimple_val (rhs1)) >>>> + rhs2 = fold_binary (op, type, @2, @4); >>>> + } >>>> + (if (rhs2 && is_gimple_val (rhs2)) >>>> + (vec_cond @0 { rhs1; } { rhs2; }))))) >>>> +#endif >>> >>> This one looks dangerous for potentially-trapping ops. >> >> I would expect the operation not to be folded if it can trap. Is that too >> optimistic? > > Not sure TBH. I was thinking of “trapping” in the sense of raising > an IEEE exception, rather than in the could-throw/must-end-bb sense. That's what I understood from your message :-) > I thought match.pd applied to things like FP addition as normal and > it was up to individual patterns to check the appropriate properties. Yes, and in this case I am delegating that to fold_binary, which already performs this check. I tried with this C++ program typedef double vecf __attribute__((vector_size(16))); typedef long long veci __attribute__((vector_size(16))); vecf f(veci c){ return (c?1.:2.)/(c?3.:7.); } the folding happens by default, but not with -frounding-math, which seems like exactly what we want.
On Fri, 31 Jul 2020, Marc Glisse wrote: > On Fri, 31 Jul 2020, Richard Sandiford wrote: > >> Marc Glisse <marc.glisse@inria.fr> writes: >>> On Fri, 31 Jul 2020, Richard Sandiford wrote: >>> >>>> Marc Glisse <marc.glisse@inria.fr> writes: >>>>> +/* (c ? a : b) op (c ? d : e) --> c ? (a op d) : (b op e) */ >>>>> + (simplify >>>>> + (op (vec_cond:s @0 @1 @2) (vec_cond:s @0 @3 @4)) >>>>> + (with >>>>> + { >>>>> + tree rhs1, rhs2 = NULL; >>>>> + rhs1 = fold_binary (op, type, @1, @3); >>>>> + if (rhs1 && is_gimple_val (rhs1)) >>>>> + rhs2 = fold_binary (op, type, @2, @4); >>>>> + } >>>>> + (if (rhs2 && is_gimple_val (rhs2)) >>>>> + (vec_cond @0 { rhs1; } { rhs2; }))))) >>>>> +#endif >>>> >>>> This one looks dangerous for potentially-trapping ops. >>> >>> I would expect the operation not to be folded if it can trap. Is that too >>> optimistic? >> >> Not sure TBH. I was thinking of “trapping” in the sense of raising >> an IEEE exception, rather than in the could-throw/must-end-bb sense. > > That's what I understood from your message :-) > >> I thought match.pd applied to things like FP addition as normal and >> it was up to individual patterns to check the appropriate properties. > > Yes, and in this case I am delegating that to fold_binary, which already > performs this check. > > I tried with this C++ program > > typedef double vecf __attribute__((vector_size(16))); > typedef long long veci __attribute__((vector_size(16))); > vecf f(veci c){ > return (c?1.:2.)/(c?3.:7.); > } > > the folding happens by default, but not with -frounding-math, which seems > like exactly what we want. That was for rounding. -ftrapping-math does not disable folding of typedef double vecf __attribute__((vector_size(16))); typedef long long veci __attribute__((vector_size(16))); vecf f(veci c){ vecf z={}; return (z+1)/(z+3); } despite a possible inexact flag, so it doesn't disable vec_cond_expr folding either. (not sure we want to fix this unless -fno-trapping-math becomes the default)
diff --git a/gcc/match.pd b/gcc/match.pd index c6ae7a7db7a..af52d56162b 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3451,6 +3451,77 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (cst1 && cst2) (vec_cond @0 { cst1; } { cst2; }))))) +/* Sink binary operation to branches, but only if we can fold it. */ +#if GIMPLE +(for op (tcc_comparison plus minus mult bit_and bit_ior bit_xor + rdiv trunc_div ceil_div floor_div round_div + trunc_mod ceil_mod floor_mod round_mod min max) +/* (c ? a : b) op d --> c ? (a op d) : (b op d) */ + (simplify + (op (vec_cond:s @0 @1 @2) @3) + (with + { + tree rhs1, rhs2 = NULL; + rhs1 = fold_binary (op, type, @1, @3); + if (rhs1 && is_gimple_val (rhs1)) + rhs2 = fold_binary (op, type, @2, @3); + } + (if (rhs2 && is_gimple_val (rhs2)) + (vec_cond @0 { rhs1; } { rhs2; })))) + (simplify + (op @3 (vec_cond:s @0 @1 @2)) + (with + { + tree rhs1, rhs2 = NULL; + rhs1 = fold_binary (op, type, @3, @1); + if (rhs1 && is_gimple_val (rhs1)) + rhs2 = fold_binary (op, type, @3, @2); + } + (if (rhs2 && is_gimple_val (rhs2)) + (vec_cond @0 { rhs1; } { rhs2; })))) + +/* (c ? a : b) op (c ? d : e) --> c ? (a op d) : (b op e) */ + (simplify + (op (vec_cond:s @0 @1 @2) (vec_cond:s @0 @3 @4)) + (with + { + tree rhs1, rhs2 = NULL; + rhs1 = fold_binary (op, type, @1, @3); + if (rhs1 && is_gimple_val (rhs1)) + rhs2 = fold_binary (op, type, @2, @4); + } + (if (rhs2 && is_gimple_val (rhs2)) + (vec_cond @0 { rhs1; } { rhs2; }))))) +#endif + +/* (v ? w : 0) ? a : b is just (v & w) ? a : b */ +(simplify + (vec_cond (vec_cond:s @0 @3 integer_zerop) @1 @2) + (vec_cond (bit_and @0 @3) @1 @2)) +(simplify + (vec_cond (vec_cond:s @0 integer_all_onesp @3) @1 @2) + (vec_cond (bit_ior @0 @3) @1 @2)) +(simplify + (vec_cond (vec_cond:s @0 integer_zerop @3) @1 @2) + (vec_cond (bit_ior @0 (bit_not @3)) @2 @1)) +(simplify + (vec_cond (vec_cond:s @0 @3 integer_all_onesp) @1 @2) + (vec_cond (bit_and @0 (bit_not @3)) @2 @1)) + +/* c1 ? c2 ? a : b : b --> (c1 & c2) ? a : b */ +(simplify + (vec_cond @0 (vec_cond:s @1 @2 @3) @3) + (vec_cond (bit_and @0 @1) @2 @3)) +(simplify + (vec_cond @0 @2 (vec_cond:s @1 @2 @3)) + (vec_cond (bit_ior @0 @1) @2 @3)) +(simplify + (vec_cond @0 (vec_cond:s @1 @2 @3) @2) + (vec_cond (bit_ior (bit_not @0) @1) @2 @3)) +(simplify + (vec_cond @0 @3 (vec_cond:s @1 @2 @3)) + (vec_cond (bit_and (bit_not @0) @1) @2 @3)) + /* Simplification moved from fold_cond_expr_with_comparison. It may also be extended. */ /* This pattern implements two kinds simplification: diff --git a/gcc/testsuite/gcc.dg/tree-ssa/andnot-2.c b/gcc/testsuite/gcc.dg/tree-ssa/andnot-2.c new file mode 100644 index 00000000000..e0955ce3ffd --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/andnot-2.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-forwprop3-raw -w -Wno-psabi" } */ + +typedef long vec __attribute__((vector_size(16))); +vec f(vec x){ + vec y = x < 10; + return y & (y == 0); +} + +/* { dg-final { scan-tree-dump-not "_expr" "forwprop3" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr95906.c b/gcc/testsuite/gcc.dg/tree-ssa/pr95906.c new file mode 100644 index 00000000000..3d820a58e93 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr95906.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-forwprop3-raw -w -Wno-psabi" } */ + +// FIXME: this should further optimize to a MAX_EXPR +typedef signed char v16i8 __attribute__((vector_size(16))); +v16i8 f(v16i8 a, v16i8 b) +{ + v16i8 cmp = (a > b); + return (cmp & a) | (~cmp & b); +} + +/* { dg-final { scan-tree-dump-not "bit_(and|ior)_expr" "forwprop3" } } */ +/* { dg-final { scan-tree-dump-times "vec_cond_expr" 1 "forwprop3" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr70314.c b/gcc/testsuite/gcc.target/i386/pr70314.c new file mode 100644 index 00000000000..aad8dd9b57e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr70314.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-march=skylake-avx512 -O2" } */ +/* { dg-final { scan-assembler-times "cmp" 2 } } */ +/* { dg-final { scan-assembler-not "and" } } */ + +typedef long vec __attribute__((vector_size(16))); +vec f(vec x, vec y){ + return (x < 5) & (y < 8); +} + +/* On x86_64, currently + vpcmpq $2, .LC1(%rip), %xmm1, %k1 + vpcmpq $2, .LC0(%rip), %xmm0, %k0{%k1} + vpmovm2q %k0, %xmm0 +*/