Message ID | 3329c840-dccc-5d06-740f-7e669fd5e39a@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | [rs6000] Adjust vectorization cost for scalar COND_EXPR | expand |
Hi! I can't approve this, but for what it's worth it looks fine to me. Bill On 12/11/19 6:31 AM, Kewen.Lin wrote: > Hi, > > We found that the vectorization cost modeling on scalar COND_EXPR is a bit off > on rs6000. One typical case is 548.exchange2_r, -Ofast -mcpu=power9 -mrecip > -fvect-cost-model=unlimited is better than -Ofast -mcpu=power9 -mrecip (the > default is -fvect-cost-model=dynamic) by 1.94%. Scalar COND_EXPR is expanded > into compare + branch or compare + isel normally, either of them should be > priced more than the simple FXU operation. This patch is to add additional > vectorization cost onto scalar COND_EXPR on top of builtin_vectorization_cost. > The idea to use additional cost value 2 instead of the others: 1) try various > possible value candidates from 1 to 5, 2 is the best measured on Power9. 2) > from latency view, compare takes 3 cycles and isel takes 2 on Power9, it's > 2.5 times of simple FXU instruction which takes cost 1 in the current > modeling, it's close. 3) get fine SPEC2017 ratio on Power8 as well. > > The SPEC2017 performance evaluation on Power9 with explicit unrolling shows > 548.exchange2_r +2.35% gains, but 526.blender_r -1.99% degradation, the others > is trivial. By further investigation on 526.blender_r, the assembly of 10 > hottest functions are unchanged, the impact should be due to some side effects. > SPECINT geomean +0.16%, SPECFP geomean -0.16% (mainly due to blender_r). > Without explicit unrolling, 548.exchange2_r +1.78% gains and the others are > trivial. SPECINT geomean +0.19%, SPECINT geomean +0.06%. > > While the SPEC2017 performance evaluation on Power8 shows 500.perlbench_r > +1.32% gain and 511.povray_r +2.03% gain, the others are trivial. SPECINT > geomean +0.08%, SPECINT geomean +0.18%. > > Bootstrapped and regress tested on powerpc64le-linux-gnu. > Is OK for trunk? > > BR, > Kewen > --- > > gcc/ChangeLog > > 2019-12-11 Kewen Lin <linkw@gcc.gnu.org> > > * config/rs6000/rs6000.c (adjust_vectorization_cost): New function. > (rs6000_add_stmt_cost): Call adjust_vectorization_cost and update > stmt_cost. >
On Wed, Dec 11, 2019 at 07:39:38AM -0600, Bill Schmidt wrote: > I can't approve this, but for what it's worth it looks fine to me. But I can :-) Thanks for looking Bill! The patch is okay for trunk. Thanks Ke Wen! Segher > >2019-12-11 Kewen Lin <linkw@gcc.gnu.org> > > > > * config/rs6000/rs6000.c (adjust_vectorization_cost): New function. > > (rs6000_add_stmt_cost): Call adjust_vectorization_cost and update > > stmt_cost.
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 2995348..5dad3cc 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -5016,6 +5016,29 @@ rs6000_init_cost (struct loop *loop_info) return data; } +/* Adjust vectorization cost after calling rs6000_builtin_vectorization_cost. + For some statement, we would like to further fine-grain tweak the cost on + top of rs6000_builtin_vectorization_cost handling which doesn't have any + information on statement operation codes etc. One typical case here is + COND_EXPR, it takes the same cost to simple FXU instruction when evaluating + for scalar cost, but it should be priced more whatever transformed to either + compare + branch or compare + isel instructions. */ + +static unsigned +adjust_vectorization_cost (enum vect_cost_for_stmt kind, + struct _stmt_vec_info *stmt_info) +{ + if (kind == scalar_stmt && stmt_info && stmt_info->stmt + && gimple_code (stmt_info->stmt) == GIMPLE_ASSIGN) + { + tree_code subcode = gimple_assign_rhs_code (stmt_info->stmt); + if (subcode == COND_EXPR) + return 2; + } + + return 0; +} + /* Implement targetm.vectorize.add_stmt_cost. */ static unsigned @@ -5031,6 +5054,7 @@ rs6000_add_stmt_cost (void *data, int count, enum vect_cost_for_stmt kind, tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE; int stmt_cost = rs6000_builtin_vectorization_cost (kind, vectype, misalign); + stmt_cost += adjust_vectorization_cost (kind, stmt_info); /* Statements in an inner loop relative to the loop being vectorized are weighted more heavily. The value here is arbitrary and could potentially be improved with analysis. */