Message ID | 20240621070942.16086-1-quic_eikagupt@quicinc.com |
---|---|
State | New |
Series | MATCH: Simplify (vec CMP vec) eq/ne (vec CMP vec) [PR111150] |
On Fri, Jun 21, 2024 at 9:12 AM Eikansh Gupta <quic_eikagupt@quicinc.com> wrote:
>
> We can optimize (vec_cond eq/ne vec_cond) when vec_cond is a
> result of (vec CMP vec). The optimization is because of the
> observation that in vec_cond, (-1 != 0) is true. So, we can
> generate vec_cond of xor of vec resulting in a single
> VEC_COND_EXPR instead of 3.
>
> The patch adds match pattern for vec a, b:
> (a ? -1 : 0) != (b ? -1 : 0) --> (a^b) ? -1 : 0
> (a ? -1 : 0) == (b ? -1 : 0) --> ~(a^b) ? -1 : 0

Why should this only work for uniform -1 and 0 vectors?
It seems to me it's valid for arbitrary values, thus

(a ? x : y) != (b ? x : y) -> a^b ? x : y
(a ? x : y) == (b ? x : y) -> a^b ? y : x

no?

> PR tree-optimization/111150
>
> gcc/ChangeLog:
>
> * match.pd: Optimization for above mentioned pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/pr111150.c: New test.
>
> Signed-off-by: Eikansh Gupta <quic_eikagupt@quicinc.com>
> ---
>  gcc/match.pd | 18 ++++++++++++++++++
>  gcc/testsuite/gcc.dg/tree-ssa/pr111150.c | 19 +++++++++++++++++++
>  2 files changed, 37 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr111150.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 3d0689c9312..5cb78bd7ff9 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5522,6 +5522,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>     (vec_cond (bit_and (bit_not @0) @1) @2 @3)))
>  #endif
>
> +/* (a ? -1 : 0) != (b ? -1 : 0) --> (a^b) ? -1 : 0 */
> +/* (a ? -1 : 0) == (b ? -1 : 0) --> ~(a^b) ? -1 : 0 */
> +(for eqne (eq ne)
> + (simplify
> +  (eqne:c (vec_cond @0 uniform_integer_cst_p@2 uniform_integer_cst_p@3)
> +          (vec_cond @1 @2 @3))
> +  (with
> +   {
> +     tree newop1 = @2;
> +     tree newop2 = @3;
> +     if (eqne == NE_EXPR)
> +       std::swap (newop1, newop2);
> +   }
> +   (if (integer_all_onesp (@2) && integer_zerop (@3))
> +    (vec_cond (bit_xor @0 @1) {newop1;} {newop2;})
> +    (if (integer_all_onesp (@3) && integer_zerop (@2))
> +     (vec_cond (bit_xor @0 @1) {newop2;} {newop1;}))))))
> +
>  /* Canonicalize mask ? { 0, ... } : { -1, ...} to ~mask if the mask
>     types are compatible. */
>  (simplify
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr111150.c b/gcc/testsuite/gcc.dg/tree-ssa/pr111150.c
> new file mode 100644
> index 00000000000..d10564fd722
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr111150.c
> @@ -0,0 +1,19 @@
> +/* PR tree-optimization/111150 */
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fdump-tree-forwprop1" } */
> +
> +typedef int v4si __attribute((__vector_size__(4 * sizeof(int))));
> +
> +v4si f1_(v4si a, v4si b, v4si c, v4si d) {
> +  v4si X = a == b;
> +  v4si Y = c == d;
> +  return (X != Y);
> +}
> +
> +v4si f2_(v4si a, v4si b, v4si c, v4si d) {
> +  v4si X = a == b;
> +  v4si Y = c == d;
> +  return (X == Y);
> +}
> +
> +/* { dg-final { scan-tree-dump-times " VEC_COND_EXPR " 2 "forwprop1" } } */
> --
> 2.17.1
>
On Fri, Jun 21, 2024 at 1:04 AM Richard Biener <richard.guenther@gmail.com> wrote:
>
> On Fri, Jun 21, 2024 at 9:12 AM Eikansh Gupta <quic_eikagupt@quicinc.com> wrote:
> >
> > We can optimize (vec_cond eq/ne vec_cond) when vec_cond is a
> > result of (vec CMP vec). The optimization is because of the
> > observation that in vec_cond, (-1 != 0) is true. So, we can
> > generate vec_cond of xor of vec resulting in a single
> > VEC_COND_EXPR instead of 3.
> >
> > The patch adds match pattern for vec a, b:
> > (a ? -1 : 0) != (b ? -1 : 0) --> (a^b) ? -1 : 0
> > (a ? -1 : 0) == (b ? -1 : 0) --> ~(a^b) ? -1 : 0
>
> Why should this only work for uniform -1 and 0 vectors?
> It seems to me it's valid for arbitrary values, thus
>
> (a ? x : y) != (b ? x : y) -> a^b ? x : y
> (a ? x : y) == (b ? x : y) -> a^b ? y : x
>
> no?

Well I think it should be:
(a ? x : y) != (b ? x : y) -> a^b ? TRUE : FALSE
(a ? x : y) == (b ? x : y) -> a^b ? FALSE : TRUE
In that the values of x/y .
This is also true for scalar (cond) too, Gimple testcase which can be used:
```
__GIMPLE() _Bool f4_ (int a, int b, int c, int d, int e, int f)
{
  _Bool X;
  _Bool Y;
  _Bool t;
  int t1;
  int t2;
  X = a == b;
  Y = c == d;
  t1 = X ? e : f;
  t2 = Y ? e : f;
  t = t1 == t2;
  return t;
}
```
I will work with Eikansh to finish this off list.

>
> > PR tree-optimization/111150
> >
> > gcc/ChangeLog:
> >
> > * match.pd: Optimization for above mentioned pattern.

Oh I just noticed the changelog should be improved too.
Eikansh, The wording there needs to be independent from the commit message
as it gets added to ChangeLog and has no real reference back to the
commit message.

Thanks,
Andrew Pinski

> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/tree-ssa/pr111150.c: New test.
> >
> > Signed-off-by: Eikansh Gupta <quic_eikagupt@quicinc.com>
> > ---
> >  gcc/match.pd | 18 ++++++++++++++++++
> >  gcc/testsuite/gcc.dg/tree-ssa/pr111150.c | 19 +++++++++++++++++++
> >  2 files changed, 37 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr111150.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 3d0689c9312..5cb78bd7ff9 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -5522,6 +5522,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >     (vec_cond (bit_and (bit_not @0) @1) @2 @3)))
> >  #endif
> >
> > +/* (a ? -1 : 0) != (b ? -1 : 0) --> (a^b) ? -1 : 0 */
> > +/* (a ? -1 : 0) == (b ? -1 : 0) --> ~(a^b) ? -1 : 0 */
> > +(for eqne (eq ne)
> > + (simplify
> > +  (eqne:c (vec_cond @0 uniform_integer_cst_p@2 uniform_integer_cst_p@3)
> > +          (vec_cond @1 @2 @3))
> > +  (with
> > +   {
> > +     tree newop1 = @2;
> > +     tree newop2 = @3;
> > +     if (eqne == NE_EXPR)
> > +       std::swap (newop1, newop2);
> > +   }
> > +   (if (integer_all_onesp (@2) && integer_zerop (@3))
> > +    (vec_cond (bit_xor @0 @1) {newop1;} {newop2;})
> > +    (if (integer_all_onesp (@3) && integer_zerop (@2))
> > +     (vec_cond (bit_xor @0 @1) {newop2;} {newop1;}))))))
> > +
> >  /* Canonicalize mask ? { 0, ... } : { -1, ...} to ~mask if the mask
> >     types are compatible. */
> >  (simplify
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr111150.c b/gcc/testsuite/gcc.dg/tree-ssa/pr111150.c
> > new file mode 100644
> > index 00000000000..d10564fd722
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr111150.c
> > @@ -0,0 +1,19 @@
> > +/* PR tree-optimization/111150 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-O1 -fdump-tree-forwprop1" } */
> > +
> > +typedef int v4si __attribute((__vector_size__(4 * sizeof(int))));
> > +
> > +v4si f1_(v4si a, v4si b, v4si c, v4si d) {
> > +  v4si X = a == b;
> > +  v4si Y = c == d;
> > +  return (X != Y);
> > +}
> > +
> > +v4si f2_(v4si a, v4si b, v4si c, v4si d) {
> > +  v4si X = a == b;
> > +  v4si Y = c == d;
> > +  return (X == Y);
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times " VEC_COND_EXPR " 2 "forwprop1" } } */
> > --
> > 2.17.1
> >
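The generalized form discussed in the replies can be checked exhaustively on scalars. What follows is only a minimal sketch, not part of the patch: it models a single lane with plain `bool` conditions, and the names `select_ne`, `select_eq` and `count_mismatches` are made up for illustration. It compares `(a ? x : y) != (b ? x : y)` against `a ^ b` (and the `==` form against its negation) for all four `(a, b)` pairs. One assumption is worth stating loudly: the rewrite only holds when the two arms differ (`x != y`); with equal arms the comparison is constant.

```c
#include <stdbool.h>

/* Unfolded form: compare the results of two selects over the same arms. */
bool select_ne(bool a, bool b, int x, int y)
{
    return (a ? x : y) != (b ? x : y);
}

bool select_eq(bool a, bool b, int x, int y)
{
    return (a ? x : y) == (b ? x : y);
}

/* Count how many checks disagree between the unfolded and folded forms
   over all (a, b) pairs.  Returns 0 when the rewrite is valid for the
   given arms x and y.  (Hypothetical helper, for illustration only.)  */
int count_mismatches(int x, int y)
{
    int bad = 0;
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++) {
            bool folded_ne = (a ^ b) != 0;  /* a^b ? TRUE : FALSE */
            bool folded_eq = !folded_ne;    /* a^b ? FALSE : TRUE */
            bad += select_ne(a, b, x, y) != folded_ne;
            bad += select_eq(a, b, x, y) != folded_eq;
        }
    return bad;
}
```

Under these assumptions `count_mismatches(-1, 0)` and `count_mismatches(7, 3)` report no disagreement, while `count_mismatches(5, 5)` does, which suggests a pattern for arbitrary `x`/`y` would need to guard against equal arms.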
diff --git a/gcc/match.pd b/gcc/match.pd
index 3d0689c9312..5cb78bd7ff9 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5522,6 +5522,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    (vec_cond (bit_and (bit_not @0) @1) @2 @3)))
 #endif

+/* (a ? -1 : 0) != (b ? -1 : 0) --> (a^b) ? -1 : 0 */
+/* (a ? -1 : 0) == (b ? -1 : 0) --> ~(a^b) ? -1 : 0 */
+(for eqne (eq ne)
+ (simplify
+  (eqne:c (vec_cond @0 uniform_integer_cst_p@2 uniform_integer_cst_p@3)
+          (vec_cond @1 @2 @3))
+  (with
+   {
+     tree newop1 = @2;
+     tree newop2 = @3;
+     if (eqne == NE_EXPR)
+       std::swap (newop1, newop2);
+   }
+   (if (integer_all_onesp (@2) && integer_zerop (@3))
+    (vec_cond (bit_xor @0 @1) {newop1;} {newop2;})
+    (if (integer_all_onesp (@3) && integer_zerop (@2))
+     (vec_cond (bit_xor @0 @1) {newop2;} {newop1;}))))))
+
 /* Canonicalize mask ? { 0, ... } : { -1, ...} to ~mask if the mask
    types are compatible. */
 (simplify
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr111150.c b/gcc/testsuite/gcc.dg/tree-ssa/pr111150.c
new file mode 100644
index 00000000000..d10564fd722
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr111150.c
@@ -0,0 +1,19 @@
+/* PR tree-optimization/111150 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-forwprop1" } */
+
+typedef int v4si __attribute((__vector_size__(4 * sizeof(int))));
+
+v4si f1_(v4si a, v4si b, v4si c, v4si d) {
+  v4si X = a == b;
+  v4si Y = c == d;
+  return (X != Y);
+}
+
+v4si f2_(v4si a, v4si b, v4si c, v4si d) {
+  v4si X = a == b;
+  v4si Y = c == d;
+  return (X == Y);
+}
+
+/* { dg-final { scan-tree-dump-times " VEC_COND_EXPR " 2 "forwprop1" } } */
We can optimize (vec_cond eq/ne vec_cond) when vec_cond is a
result of (vec CMP vec). The optimization is because of the
observation that in vec_cond, (-1 != 0) is true. So, we can
generate vec_cond of xor of vec resulting in a single
VEC_COND_EXPR instead of 3.

The patch adds match pattern for vec a, b:
(a ? -1 : 0) != (b ? -1 : 0) --> (a^b) ? -1 : 0
(a ? -1 : 0) == (b ? -1 : 0) --> ~(a^b) ? -1 : 0

PR tree-optimization/111150

gcc/ChangeLog:

* match.pd: Optimization for above mentioned pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr111150.c: New test.

Signed-off-by: Eikansh Gupta <quic_eikagupt@quicinc.com>
---
 gcc/match.pd | 18 ++++++++++++++++++
 gcc/testsuite/gcc.dg/tree-ssa/pr111150.c | 19 +++++++++++++++++++
 2 files changed, 37 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr111150.c
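The identity in the commit message can also be observed at source level with the GCC/Clang vector extension used by the testcase. The sketch below is an illustration under stated assumptions, not the GIMPLE-level fold itself: it XORs the two comparison masks directly rather than XORing the conditions inside a `vec_cond`, and the function names (`mask_ne`, `mask_ne_folded`, `folds_agree`, and so on) are hypothetical. The four lanes of the sample vectors are chosen to cover every combination of `a == b` and `c == d`.

```c
/* GCC/Clang vector extension, same element type and width as the testcase. */
typedef int v4si __attribute__((vector_size(4 * sizeof(int))));

/* Unfolded forms, as in f1_ and f2_ of the testcase. */
v4si mask_ne(v4si a, v4si b, v4si c, v4si d)
{
    v4si X = a == b;
    v4si Y = c == d;
    return X != Y;
}

v4si mask_eq(v4si a, v4si b, v4si c, v4si d)
{
    v4si X = a == b;
    v4si Y = c == d;
    return X == Y;
}

/* Folded forms: every mask lane is -1 or 0, so XOR of the two masks is
   the != mask and its bitwise complement is the == mask. */
v4si mask_ne_folded(v4si a, v4si b, v4si c, v4si d)
{
    return (a == b) ^ (c == d);
}

v4si mask_eq_folded(v4si a, v4si b, v4si c, v4si d)
{
    return ~((a == b) ^ (c == d));
}

/* Lane-by-lane equality of two vectors; returns 1 when all lanes match. */
int same_lanes(v4si p, v4si q)
{
    for (int i = 0; i < 4; i++)
        if (p[i] != q[i])
            return 0;
    return 1;
}

/* Returns 1 when folded and unfolded forms agree on sample inputs whose
   lanes cover all four (a == b, c == d) combinations. */
int folds_agree(void)
{
    v4si a = {1, 2, 3, 4};
    v4si b = {1, 2, 0, 0};  /* a == b per lane: T T F F */
    v4si c = {5, 6, 7, 8};
    v4si d = {5, 0, 7, 0};  /* c == d per lane: T F T F */
    return same_lanes(mask_ne(a, b, c, d), mask_ne_folded(a, b, c, d))
        && same_lanes(mask_eq(a, b, c, d), mask_eq_folded(a, b, c, d));
}
```

Calling `folds_agree()` is enough to exercise both rewrites; it says nothing about whether forwprop1 actually performs the fold, which is what the `scan-tree-dump-times` directive in the testcase is for.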