Message ID | mptsffxmhbj.fsf@arm.com |
---|---|
State | New |
Headers | show |
Series | vect/aarch64: Fix various sve/cond*.c failures | expand |
On Thu, 26 Jan 2023, Richard Sandiford wrote: > Quite a few gcc.target/aarch64/sve/cond*.c tests started failing > after g:68e0063397ba820e71adc220b2da0581dce29ffa, but it turns out > that we were cheating passes before the patch. > > The tests involve comparing the cost of N wide compares, a pack > sequence, and a narrow COND_EXPR with the cost of a single COND_EXPR > on fewer elements. The costs for the former included all operations, > but the costs for the latter didn't model the comparison embedded in > the COND_EXPR. The patch made us include the comparison on both sides, > making it apples-for-apples, but that's enough to tip the balance in > favour of using the wider types. > > I think the new choice does reflect the current SVE cost model > correctly. (Whether and how the model should be tweaked is a > different question.) This patch therefore changes the tuning > vector length to one that makes the choice more obvious. > > That in turn needs a tweak to compare_inside_loop_cost. > The function compares body_cost1/vf1 with body_cost2/vf2, > but for fully-amsked loops, it limits vf to the actual number > of iterations. This is so that (say) an expensive 16-element > vector body doesn't win over a cheaper 8-element vector body > when there are only 7 elements to process. > > However, the limit was applied using known_le, regardless of > the tuning target. For a heuristic like this, it seems better > to use the likely minimum (which is a concept that was only > added after this code went in). > > g:68e0063397ba820e71adc220b2da0581dce29ffa also fixed > vcond_4_costly.c. > > Tested on aarch64-linux-gnu. OK to install? OK. > Richard > > > gcc/ > * tree-vectorizer.cc (vector_costs::compare_inside_loop_cost): > Use the likely minimum VF when bounding the denominators to > the estimated number of iterations. > > gcc/testsuite/ > * gcc.target/aarch64/sve/cond_asrd_1.c: Tune for a 256-bit > vector length. > * gcc.target/aarch64/sve/cond_cnot_4.c: Likewise. > * gcc.target/aarch64/sve/cond_cnot_6.c: Likewise. > * gcc.target/aarch64/sve/cond_unary_5.c: Likewise. > * gcc.target/aarch64/sve/cond_unary_6.c: Likewise. > * gcc.target/aarch64/sve/cond_uxt_5.c: Likewise. > * gcc.target/aarch64/sve/vcond_4_costly.c: Remove XFAILs. > --- > gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c | 2 +- > gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c | 2 +- > gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_6.c | 2 +- > gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c | 2 +- > gcc/testsuite/gcc.target/aarch64/sve/cond_unary_6.c | 2 +- > gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c | 2 +- > gcc/testsuite/gcc.target/aarch64/sve/vcond_4_costly.c | 4 ++-- > gcc/tree-vectorizer.cc | 6 ++++-- > 8 files changed, 12 insertions(+), 10 deletions(-) > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c > index 478b52ac27c..aac06bd8093 100644 > --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c > +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O2 -ftree-vectorize" } */ > +/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */ > > #include <stdint.h> > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c > index 729d3f4f2ac..f6278916e1a 100644 > --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c > +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O2 -ftree-vectorize" } */ > +/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */ > > #include <stdint.h> > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_6.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_6.c > index d44e357f44a..ef1b067172f 100644 > --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_6.c > +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_6.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O2 -ftree-vectorize" } */ > +/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */ > > #include <stdint.h> > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c > index 17b3f86c8c6..03a6636f2d2 100644 > --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c > +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O2 -ftree-vectorize" } */ > +/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */ > > #include <stdint.h> > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_6.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_6.c > index 1bd342b65d4..c49a3040b21 100644 > --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_6.c > +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_6.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O2 -ftree-vectorize" } */ > +/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */ > > #include <stdint.h> > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c > index 18866286b7f..9a2bd8f152f 100644 > --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c > +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O2 -ftree-vectorize" } */ > +/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */ > > #include <stdint.h> > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vcond_4_costly.c b/gcc/testsuite/gcc.target/aarch64/sve/vcond_4_costly.c > index 4aa567e3709..76d7a288612 100644 > --- a/gcc/testsuite/gcc.target/aarch64/sve/vcond_4_costly.c > +++ b/gcc/testsuite/gcc.target/aarch64/sve/vcond_4_costly.c > @@ -61,8 +61,8 @@ TEST_CMP (nuge) > TEST_CMP (nugt) > > /* 2 each for: eq, ne, ueq, nueq. */ > -/* { dg-final { scan-assembler-times {\tfcm(?:eq|ne)\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 8 { xfail *-*-* } } } */ > -/* { dg-final { scan-assembler-times {\tfcm(?:eq|ne)\tp[0-9]+\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 16 { xfail *-*-* } } } */ > +/* { dg-final { scan-assembler-times {\tfcm(?:eq|ne)\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 8 } } */ > +/* { dg-final { scan-assembler-times {\tfcm(?:eq|ne)\tp[0-9]+\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 16 } } */ > > /* 2 each for: olt, ult, nult, ogt, ugt, nugt. */ > /* { dg-final { scan-assembler-times {\tfcm[lg]t\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 12 } } */ > diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc > index 875acbbf948..89cd0b88b61 100644 > --- a/gcc/tree-vectorizer.cc > +++ b/gcc/tree-vectorizer.cc > @@ -1973,9 +1973,11 @@ vector_costs::compare_inside_loop_cost (const vector_costs *other) const > HOST_WIDE_INT estimated_max_niter = likely_max_stmt_executions_int (loop); > if (estimated_max_niter != -1) > { > - if (known_le (estimated_max_niter, this_vf)) > + if (estimated_poly_value (this_vf, POLY_VALUE_MIN) > + >= estimated_max_niter) > this_vf = estimated_max_niter; > - if (known_le (estimated_max_niter, other_vf)) > + if (estimated_poly_value (other_vf, POLY_VALUE_MIN) > + >= estimated_max_niter) > other_vf = estimated_max_niter; > } > >
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c index 478b52ac27c..aac06bd8093 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -ftree-vectorize" } */ +/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */ #include <stdint.h> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c index 729d3f4f2ac..f6278916e1a 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -ftree-vectorize" } */ +/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */ #include <stdint.h> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_6.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_6.c index d44e357f44a..ef1b067172f 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_6.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_6.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -ftree-vectorize" } */ +/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */ #include <stdint.h> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c index 17b3f86c8c6..03a6636f2d2 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -ftree-vectorize" } */ +/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */ #include <stdint.h> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_6.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_6.c index 1bd342b65d4..c49a3040b21 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_6.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_6.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -ftree-vectorize" } */ +/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */ #include <stdint.h> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c index 18866286b7f..9a2bd8f152f 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -ftree-vectorize" } */ +/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */ #include <stdint.h> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vcond_4_costly.c b/gcc/testsuite/gcc.target/aarch64/sve/vcond_4_costly.c index 4aa567e3709..76d7a288612 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/vcond_4_costly.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/vcond_4_costly.c @@ -61,8 +61,8 @@ TEST_CMP (nuge) TEST_CMP (nugt) /* 2 each for: eq, ne, ueq, nueq. */ -/* { dg-final { scan-assembler-times {\tfcm(?:eq|ne)\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 8 { xfail *-*-* } } } */ -/* { dg-final { scan-assembler-times {\tfcm(?:eq|ne)\tp[0-9]+\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 16 { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-times {\tfcm(?:eq|ne)\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 8 } } */ +/* { dg-final { scan-assembler-times {\tfcm(?:eq|ne)\tp[0-9]+\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 16 } } */ /* 2 each for: olt, ult, nult, ogt, ugt, nugt. */ /* { dg-final { scan-assembler-times {\tfcm[lg]t\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 12 } } */ diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc index 875acbbf948..89cd0b88b61 100644 --- a/gcc/tree-vectorizer.cc +++ b/gcc/tree-vectorizer.cc @@ -1973,9 +1973,11 @@ vector_costs::compare_inside_loop_cost (const vector_costs *other) const HOST_WIDE_INT estimated_max_niter = likely_max_stmt_executions_int (loop); if (estimated_max_niter != -1) { - if (known_le (estimated_max_niter, this_vf)) + if (estimated_poly_value (this_vf, POLY_VALUE_MIN) + >= estimated_max_niter) this_vf = estimated_max_niter; - if (known_le (estimated_max_niter, other_vf)) + if (estimated_poly_value (other_vf, POLY_VALUE_MIN) + >= estimated_max_niter) other_vf = estimated_max_niter; }