Message ID | AM5PR0802MB2610A35A5419228F752E3B9583B80@AM5PR0802MB2610.eurprd08.prod.outlook.com |
---|---|
State | New |
On 10/11/16 17:10, Wilco Dijkstra wrote:
> The existing vector costs stop some beneficial vectorization. This is mostly due
> to vector statement cost being set to 3 as well as vector loads having a higher
> cost than scalar loads. This means that even when we vectorize 4x, it is possible
> that the cost of a vectorized loop is similar to the scalar version, and we fail
> to vectorize. For example for a particular loop the costs for -mcpu=generic are:
>
> note: Cost model analysis:
>   Vector inside of loop cost: 146
>   Vector prologue cost: 5
>   Vector epilogue cost: 0
>   Scalar iteration cost: 50
>   Scalar outside cost: 0
>   Vector outside cost: 5
>   prologue iterations: 0
>   epilogue iterations: 0
>   Calculated minimum iters for profitability: 1
> note:   Runtime profitability threshold = 3
> note:   Static estimate profitability threshold = 3
> note: loop vectorized
>
> While -mcpu=cortex-a57 reports:
>
> note: Cost model analysis:
>   Vector inside of loop cost: 294
>   Vector prologue cost: 15
>   Vector epilogue cost: 0
>   Scalar iteration cost: 74
>   Scalar outside cost: 0
>   Vector outside cost: 15
>   prologue iterations: 0
>   epilogue iterations: 0
>   Calculated minimum iters for profitability: 31
> note:   Runtime profitability threshold = 30
> note:   Static estimate profitability threshold = 30
> note: not vectorized: vectorization not profitable.
> note: not vectorized: iteration count smaller than user specified loop bound parameter or minimum profitable iterations (whichever is more conservative).
>
> Using a cost of 3 for a vector operation suggests they are 3 times as
> expensive as scalar operations. Since most vector operations have a
> similar throughput as scalar operations, this is not correct.
>
> Using slightly lower values for these heuristics now allows this loop
> and many others to be vectorized. On a proprietary benchmark the gain
> from vectorizing this loop is around 15-30% which shows vectorizing it is
> indeed beneficial.
>
> ChangeLog:
> 2016-11-10  Wilco Dijkstra  <wdijkstr@arm.com>
>
> 	* config/aarch64/aarch64.c (cortexa57_vector_cost):
> 	Change vec_stmt_cost, vec_align_load_cost and vec_unalign_load_cost.
>

OK.

R.

> --
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 279a6dfaa4a9c306bc7a8dba9f4f53704f61fefe..cff2e8fc6e9309e6aa4f68a5aba3bfac3b737283 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -382,12 +382,12 @@ static const struct cpu_vector_cost cortexa57_vector_cost =
>    1, /* scalar_stmt_cost */
>    4, /* scalar_load_cost */
>    1, /* scalar_store_cost */
> -  3, /* vec_stmt_cost */
> +  2, /* vec_stmt_cost */
>    3, /* vec_permute_cost */
>    8, /* vec_to_scalar_cost */
>    8, /* scalar_to_vec_cost */
> -  5, /* vec_align_load_cost */
> -  5, /* vec_unalign_load_cost */
> +  4, /* vec_align_load_cost */
> +  4, /* vec_unalign_load_cost */
>    1, /* vec_unalign_store_cost */
>    1, /* vec_store_cost */
>    1, /* cond_taken_branch_cost */
>