Message ID | 1333633759.19102.46.camel@gnopaine |
---|---|
State | New |
Headers | show |
On Thu, Apr 5, 2012 at 3:49 PM, William J. Schmidt <wschmidt@linux.vnet.ibm.com> wrote: > On Thu, 2012-04-05 at 11:23 +0200, Richard Guenther wrote: >> On Wed, Apr 4, 2012 at 9:15 PM, William J. Schmidt >> <wschmidt@linux.vnet.ibm.com> wrote: >> > >> > Unfortunately this seems to be necessary if I name the two passes >> > "reassoc1" and "reassoc2". If I try to name both of them "reassoc" I >> > get failures in other tests like gfortran.dg/reassoc_4, where >> > -fdump-tree-reassoc1 doesn't work. Unless I'm missing something >> > obvious, I think I need to keep that change. >> >> Hm, naming them "reassoc1" and "reassoc2" is a hack. Naming both >> "reassoc" will not trigger re-naming them to reassoc1 and reassoc2 >> I think. How ugly. Especially that -fdump-tree-reassoc will no longer >> work. Maybe instead of using two pass structs resort to using >> the existing hack with using first_pass_instance and TODO_mark_first_instance. > > OK, that seems to be the best among evils. Using the > first_pass_instance hack, the patch is transformed as below. > Regstrapped on powerpc64-linux, no additional failures. OK for trunk? Ok. Thanks, Richard. > Thanks, > Bill > > > gcc: > > 2012-04-05 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > PR tree-optimization/18589 > * tree-ssa-reassoc.c (reassociate_stats): Add two fields. > (operand_entry): Add count field. > (add_repeat_to_ops_vec): New function. > (completely_remove_stmt): Likewise. > (remove_def_if_absorbed_call): Likewise. > (remove_visited_stmt_chain): Remove feeding builtin pow/powi calls. > (acceptable_pow_call): New function. > (linearize_expr_tree): Look for builtin pow/powi calls and add operand > entries with repeat counts when found. > (repeat_factor_d): New struct and associated typedefs. > (repeat_factor_vec): New static vector variable. > (compare_repeat_factors): New function. > (get_reassoc_pow_ssa_name): Likewise. > (attempt_builtin_powi): Likewise. > (reassociate_bb): Call attempt_builtin_powi. > (fini_reassoc): Two new calls to statistics_counter_event. > > gcc/testsuite: > > 2012-04-05 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > PR tree-optimization/18589 > * gcc.dg/tree-ssa/pr18589-1.c: New test. > * gcc.dg/tree-ssa/pr18589-2.c: Likewise. > * gcc.dg/tree-ssa/pr18589-3.c: Likewise. > * gcc.dg/tree-ssa/pr18589-4.c: Likewise. > * gcc.dg/tree-ssa/pr18589-5.c: Likewise. > * gcc.dg/tree-ssa/pr18589-6.c: Likewise. > * gcc.dg/tree-ssa/pr18589-7.c: Likewise. > * gcc.dg/tree-ssa/pr18589-8.c: Likewise. > * gcc.dg/tree-ssa/pr18589-9.c: Likewise. > * gcc.dg/tree-ssa/pr18589-10.c: Likewise. > > > Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c > =================================================================== > --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c (revision 0) > +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c (revision 0) > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ > + > +double baz (double x, double y, double z, double u) > +{ > + return x * x * y * y * y * z * z * z * z * u; > +} > + > +/* { dg-final { scan-tree-dump-times " \\* " 6 "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-5.c > =================================================================== > --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-5.c (revision 0) > +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-5.c (revision 0) > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ > + > +double baz (double x, double y, double z, double u) > +{ > + return x * x * x * y * y * y * z * z * z * z * u * u * u * u; > +} > + > +/* { dg-final { scan-tree-dump-times " \\* " 6 "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-6.c > =================================================================== > --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-6.c (revision 0) > +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-6.c (revision 0) > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ > + > +double baz (double x, double y) > +{ > + return __builtin_pow (x, 3.0) * __builtin_pow (y, 4.0); > +} > + > +/* { dg-final { scan-tree-dump-times " \\* " 4 "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-7.c > =================================================================== > --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-7.c (revision 0) > +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-7.c (revision 0) > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ > + > +float baz (float x, float y) > +{ > + return x * x * x * x * y * y * y * y; > +} > + > +/* { dg-final { scan-tree-dump-times " \\* " 3 "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-8.c > =================================================================== > --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-8.c (revision 0) > +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-8.c (revision 0) > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ > + > +long double baz (long double x, long double y) > +{ > + return x * x * x * x * y * y * y * y; > +} > + > +/* { dg-final { scan-tree-dump-times " \\* " 3 "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-9.c > =================================================================== > --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-9.c (revision 0) > +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-9.c (revision 0) > @@ -0,0 +1,11 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ > + > +double baz (double x, double y, double z) > +{ > + return (__builtin_pow (x, 3.0) * __builtin_pow (y, 2.0) > + * __builtin_pow (z, 5.0)); > +} > + > +/* { dg-final { scan-tree-dump-times " \\* " 6 "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-10.c > =================================================================== > --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-10.c (revision 0) > +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-10.c (revision 0) > @@ -0,0 +1,11 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ > + > +double baz (double x, double y, double z) > +{ > + return (__builtin_pow (x, 4.0) * __builtin_pow (y, 2.0) > + * __builtin_pow (z, 4.0)); > +} > + > +/* { dg-final { scan-tree-dump-times " \\* " 5 "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-1.c > =================================================================== > --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-1.c (revision 0) > +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-1.c (revision 0) > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ > + > +double baz (double x, double y) > +{ > + return x * x * x * x * y * y * y * y; > +} > + > +/* { dg-final { scan-tree-dump-times " \\* " 3 "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-2.c > =================================================================== > --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-2.c (revision 0) > +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-2.c (revision 0) > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ > + > +double baz (double x, double y) > +{ > + return x * y * y * x * y * x * x * y; > +} > + > +/* { dg-final { scan-tree-dump-times " \\* " 3 "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-3.c > =================================================================== > --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-3.c (revision 0) > +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-3.c (revision 0) > @@ -0,0 +1,10 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ > + > +double baz (double x, double y, double z) > +{ > + return x * x * y * y * y * z * z * z * z; > +} > + > +/* { dg-final { scan-tree-dump-times " \\* " 5 "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > Index: gcc/tree-ssa-reassoc.c > =================================================================== > --- gcc/tree-ssa-reassoc.c (revision 186108) > +++ gcc/tree-ssa-reassoc.c (working copy) > @@ -61,6 +61,10 @@ along with GCC; see the file COPYING3. If not see > 3. Optimization of the operand lists, eliminating things like a + > -a, a & a, etc. > > + 3a. Combine repeated factors with the same occurrence counts > + into a __builtin_powi call that will later be optimized into > + an optimal number of multiplies. > + > 4. Rewrite the expression trees we linearized and optimized so > they are in proper rank order. > > @@ -169,6 +173,8 @@ static struct > int constants_eliminated; > int ops_eliminated; > int rewritten; > + int pows_encountered; > + int pows_created; > } reassociate_stats; > > /* Operator, rank pair. */ > @@ -177,6 +183,7 @@ typedef struct operand_entry > unsigned int rank; > int id; > tree op; > + unsigned int count; > } *operand_entry_t; > > static alloc_pool operand_entry_pool; > @@ -515,9 +522,28 @@ add_to_ops_vec (VEC(operand_entry_t, heap) **ops, > oe->op = op; > oe->rank = get_rank (op); > oe->id = next_operand_entry_id++; > + oe->count = 1; > VEC_safe_push (operand_entry_t, heap, *ops, oe); > } > > +/* Add an operand entry to *OPS for the tree operand OP with repeat > + count REPEAT. */ > + > +static void > +add_repeat_to_ops_vec (VEC(operand_entry_t, heap) **ops, tree op, > + HOST_WIDE_INT repeat) > +{ > + operand_entry_t oe = (operand_entry_t) pool_alloc (operand_entry_pool); > + > + oe->op = op; > + oe->rank = get_rank (op); > + oe->id = next_operand_entry_id++; > + oe->count = repeat; > + VEC_safe_push (operand_entry_t, heap, *ops, oe); > + > + reassociate_stats.pows_encountered++; > +} > + > /* Return true if STMT is reassociable operation containing a binary > operation with tree code CODE, and is inside LOOP. */ > > @@ -2049,6 +2075,32 @@ is_phi_for_stmt (gimple stmt, tree operand) > return false; > } > > +/* Remove STMT, unlink its virtual defs, and release its SSA defs. */ > + > +static inline void > +completely_remove_stmt (gimple stmt) > +{ > + gimple_stmt_iterator gsi = gsi_for_stmt (stmt); > + gsi_remove (&gsi, true); > + unlink_stmt_vdef (stmt); > + release_defs (stmt); > +} > + > +/* If OP is defined by a builtin call that has been absorbed by > + reassociation, remove its defining statement completely. */ > + > +static inline void > +remove_def_if_absorbed_call (tree op) > +{ > + gimple stmt; > + > + if (TREE_CODE (op) == SSA_NAME > + && has_zero_uses (op) > + && is_gimple_call ((stmt = SSA_NAME_DEF_STMT (op))) > + && gimple_visited_p (stmt)) > + completely_remove_stmt (stmt); > +} > + > /* Remove def stmt of VAR if VAR has zero uses and recurse > on rhs1 operand if so. */ > > @@ -2057,19 +2109,31 @@ remove_visited_stmt_chain (tree var) > { > gimple stmt; > gimple_stmt_iterator gsi; > + tree var2; > > while (1) > { > if (TREE_CODE (var) != SSA_NAME || !has_zero_uses (var)) > return; > stmt = SSA_NAME_DEF_STMT (var); > - if (!is_gimple_assign (stmt) > - || !gimple_visited_p (stmt)) > + if (is_gimple_assign (stmt) && gimple_visited_p (stmt)) > + { > + var = gimple_assign_rhs1 (stmt); > + var2 = gimple_assign_rhs2 (stmt); > + gsi = gsi_for_stmt (stmt); > + gsi_remove (&gsi, true); > + release_defs (stmt); > + /* A multiply whose operands are both fed by builtin pow/powi > + calls must check whether to remove rhs2 as well. */ > + remove_def_if_absorbed_call (var2); > + } > + else if (is_gimple_call (stmt) && gimple_visited_p (stmt)) > + { > + completely_remove_stmt (stmt); > + return; > + } > + else > return; > - var = gimple_assign_rhs1 (stmt); > - gsi = gsi_for_stmt (stmt); > - gsi_remove (&gsi, true); > - release_defs (stmt); > } > } > > @@ -2564,6 +2628,75 @@ break_up_subtract (gimple stmt, gimple_stmt_iterat > update_stmt (stmt); > } > > +/* Determine whether STMT is a builtin call that raises an SSA name > + to an integer power and has only one use. If so, and this is early > + reassociation and unsafe math optimizations are permitted, place > + the SSA name in *BASE and the exponent in *EXPONENT, and return TRUE. > + If any of these conditions does not hold, return FALSE. */ > + > +static bool > +acceptable_pow_call (gimple stmt, tree *base, HOST_WIDE_INT *exponent) > +{ > + tree fndecl, arg1; > + REAL_VALUE_TYPE c, cint; > + > + if (!first_pass_instance > + || !flag_unsafe_math_optimizations > + || !is_gimple_call (stmt) > + || !has_single_use (gimple_call_lhs (stmt))) > + return false; > + > + fndecl = gimple_call_fndecl (stmt); > + > + if (!fndecl > + || DECL_BUILT_IN_CLASS (fndecl) != BUILT_IN_NORMAL) > + return false; > + > + switch (DECL_FUNCTION_CODE (fndecl)) > + { > + CASE_FLT_FN (BUILT_IN_POW): > + *base = gimple_call_arg (stmt, 0); > + arg1 = gimple_call_arg (stmt, 1); > + > + if (TREE_CODE (arg1) != REAL_CST) > + return false; > + > + c = TREE_REAL_CST (arg1); > + > + if (REAL_EXP (&c) > HOST_BITS_PER_WIDE_INT) > + return false; > + > + *exponent = real_to_integer (&c); > + real_from_integer (&cint, VOIDmode, *exponent, > + *exponent < 0 ? -1 : 0, 0); > + if (!real_identical (&c, &cint)) > + return false; > + > + break; > + > + CASE_FLT_FN (BUILT_IN_POWI): > + *base = gimple_call_arg (stmt, 0); > + arg1 = gimple_call_arg (stmt, 1); > + > + if (!host_integerp (arg1, 0)) > + return false; > + > + *exponent = TREE_INT_CST_LOW (arg1); > + break; > + > + default: > + return false; > + } > + > + /* Expanding negative exponents is generally unproductive, so we don't > + complicate matters with those. Exponents of zero and one should > + have been handled by expression folding. */ > + if (*exponent < 2 || TREE_CODE (*base) != SSA_NAME) > + return false; > + > + return true; > +} > + > /* Recursively linearize a binary expression that is the RHS of STMT. > Place the operands of the expression tree in the vector named OPS. */ > > @@ -2573,11 +2706,13 @@ linearize_expr_tree (VEC(operand_entry_t, heap) ** > { > tree binlhs = gimple_assign_rhs1 (stmt); > tree binrhs = gimple_assign_rhs2 (stmt); > - gimple binlhsdef, binrhsdef; > + gimple binlhsdef = NULL, binrhsdef = NULL; > bool binlhsisreassoc = false; > bool binrhsisreassoc = false; > enum tree_code rhscode = gimple_assign_rhs_code (stmt); > struct loop *loop = loop_containing_stmt (stmt); > + tree base = NULL_TREE; > + HOST_WIDE_INT exponent = 0; > > if (set_visited) > gimple_set_visited (stmt, true); > @@ -2615,8 +2750,26 @@ linearize_expr_tree (VEC(operand_entry_t, heap) ** > > if (!binrhsisreassoc) > { > - add_to_ops_vec (ops, binrhs); > - add_to_ops_vec (ops, binlhs); > + if (rhscode == MULT_EXPR > + && TREE_CODE (binrhs) == SSA_NAME > + && acceptable_pow_call (binrhsdef, &base, &exponent)) > + { > + add_repeat_to_ops_vec (ops, base, exponent); > + gimple_set_visited (binrhsdef, true); > + } > + else > + add_to_ops_vec (ops, binrhs); > + > + if (rhscode == MULT_EXPR > + && TREE_CODE (binlhs) == SSA_NAME > + && acceptable_pow_call (binlhsdef, &base, &exponent)) > + { > + add_repeat_to_ops_vec (ops, base, exponent); > + gimple_set_visited (binlhsdef, true); > + } > + else > + add_to_ops_vec (ops, binlhs); > + > return; > } > > @@ -2655,7 +2808,16 @@ linearize_expr_tree (VEC(operand_entry_t, heap) ** > rhscode, loop)); > linearize_expr_tree (ops, SSA_NAME_DEF_STMT (binlhs), > is_associative, set_visited); > - add_to_ops_vec (ops, binrhs); > + > + if (rhscode == MULT_EXPR > + && TREE_CODE (binrhs) == SSA_NAME > + && acceptable_pow_call (SSA_NAME_DEF_STMT (binrhs), &base, &exponent)) > + { > + add_repeat_to_ops_vec (ops, base, exponent); > + gimple_set_visited (SSA_NAME_DEF_STMT (binrhs), true); > + } > + else > + add_to_ops_vec (ops, binrhs); > } > > /* Repropagate the negates back into subtracts, since no other pass > @@ -2815,6 +2977,347 @@ break_up_subtract_bb (basic_block bb) > break_up_subtract_bb (son); > } > > +/* Used for repeated factor analysis. */ > +struct repeat_factor_d > +{ > + /* An SSA name that occurs in a multiply chain. */ > + tree factor; > + > + /* Cached rank of the factor. */ > + unsigned rank; > + > + /* Number of occurrences of the factor in the chain. */ > + HOST_WIDE_INT count; > + > + /* An SSA name representing the product of this factor and > + all factors appearing later in the repeated factor vector. */ > + tree repr; > +}; > + > +typedef struct repeat_factor_d repeat_factor, *repeat_factor_t; > +typedef const struct repeat_factor_d *const_repeat_factor_t; > + > +DEF_VEC_O (repeat_factor); > +DEF_VEC_ALLOC_O (repeat_factor, heap); > + > +static VEC (repeat_factor, heap) *repeat_factor_vec; > + > +/* Used for sorting the repeat factor vector. Sort primarily by > + ascending occurrence count, secondarily by descending rank. */ > + > +static int > +compare_repeat_factors (const void *x1, const void *x2) > +{ > + const_repeat_factor_t rf1 = (const_repeat_factor_t) x1; > + const_repeat_factor_t rf2 = (const_repeat_factor_t) x2; > + > + if (rf1->count != rf2->count) > + return rf1->count - rf2->count; > + > + return rf2->rank - rf1->rank; > +} > + > +/* Get a new SSA name for register variable *TARGET of type TYPE. > + If *TARGET is null or incompatible with TYPE, create the variable > + first. */ > + > +static tree > +get_reassoc_pow_ssa_name (tree *target, tree type) > +{ > + if (!*target || !types_compatible_p (type, TREE_TYPE (*target))) > + { > + *target = create_tmp_reg (type, "reassocpow"); > + add_referenced_var (*target); > + } > + > + return make_ssa_name (*target, NULL); > +} > + > +/* Look for repeated operands in OPS in the multiply tree rooted at > + STMT. Replace them with an optimal sequence of multiplies and powi > + builtin calls, and remove the used operands from OPS. Push new > + SSA names onto OPS that represent the introduced multiplies and > + builtin calls. */ > + > +static void > +attempt_builtin_powi (gimple stmt, VEC(operand_entry_t, heap) **ops, > + tree *target) > +{ > + unsigned i, j, vec_len; > + int ii; > + operand_entry_t oe; > + repeat_factor_t rf1, rf2; > + repeat_factor rfnew; > + tree target_ssa, iter_result; > + tree type = TREE_TYPE (gimple_get_lhs (stmt)); > + tree powi_fndecl = mathfn_built_in (type, BUILT_IN_POWI); > + gimple_stmt_iterator gsi = gsi_for_stmt (stmt); > + gimple mul_stmt, pow_stmt; > + > + /* Nothing to do if BUILT_IN_POWI doesn't exist for this type and > + target. */ > + if (!powi_fndecl) > + return; > + > + /* Allocate the repeated factor vector. */ > + repeat_factor_vec = VEC_alloc (repeat_factor, heap, 10); > + > + /* Scan the OPS vector for all SSA names in the product and build > + up a vector of occurrence counts for each factor. */ > + FOR_EACH_VEC_ELT (operand_entry_t, *ops, i, oe) > + { > + if (TREE_CODE (oe->op) == SSA_NAME) > + { > + FOR_EACH_VEC_ELT (repeat_factor, repeat_factor_vec, j, rf1) > + { > + if (rf1->factor == oe->op) > + { > + rf1->count += oe->count; > + break; > + } > + } > + > + if (j >= VEC_length (repeat_factor, repeat_factor_vec)) > + { > + rfnew.factor = oe->op; > + rfnew.rank = oe->rank; > + rfnew.count = oe->count; > + rfnew.repr = NULL_TREE; > + VEC_safe_push (repeat_factor, heap, repeat_factor_vec, &rfnew); > + } > + } > + } > + > + /* Sort the repeated factor vector by (a) increasing occurrence count, > + and (b) decreasing rank. */ > + VEC_qsort (repeat_factor, repeat_factor_vec, compare_repeat_factors); > + > + /* It is generally best to combine as many base factors as possible > + into a product before applying __builtin_powi to the result. > + However, the sort order chosen for the repeated factor vector > + allows us to cache partial results for the product of the base > + factors for subsequent use. When we already have a cached partial > + result from a previous iteration, it is best to make use of it > + before looking for another __builtin_pow opportunity. > + > + As an example, consider x * x * y * y * y * z * z * z * z. > + We want to first compose the product x * y * z, raise it to the > + second power, then multiply this by y * z, and finally multiply > + by z. This can be done in 5 multiplies provided we cache y * z > + for use in both expressions: > + > + t1 = y * z > + t2 = t1 * x > + t3 = t2 * t2 > + t4 = t1 * t3 > + result = t4 * z > + > + If we instead ignored the cached y * z and first multiplied by > + the __builtin_pow opportunity z * z, we would get the inferior: > + > + t1 = y * z > + t2 = t1 * x > + t3 = t2 * t2 > + t4 = z * z > + t5 = t3 * t4 > + result = t5 * y */ > + > + vec_len = VEC_length (repeat_factor, repeat_factor_vec); > + > + /* Repeatedly look for opportunities to create a builtin_powi call. */ > + while (true) > + { > + HOST_WIDE_INT power; > + > + /* First look for the largest cached product of factors from > + preceding iterations. If found, create a builtin_powi for > + it if the minimum occurrence count for its factors is at > + least 2, or just use this cached product as our next > + multiplicand if the minimum occurrence count is 1. */ > + FOR_EACH_VEC_ELT (repeat_factor, repeat_factor_vec, j, rf1) > + { > + if (rf1->repr && rf1->count > 0) > + break; > + } > + > + if (j < vec_len) > + { > + power = rf1->count; > + > + if (power == 1) > + { > + iter_result = rf1->repr; > + > + if (dump_file && (dump_flags & TDF_DETAILS)) > + { > + unsigned elt; > + repeat_factor_t rf; > + fputs ("Multiplying by cached product ", dump_file); > + for (elt = j; elt < vec_len; elt++) > + { > + rf = VEC_index (repeat_factor, repeat_factor_vec, elt); > + print_generic_expr (dump_file, rf->factor, 0); > + if (elt < vec_len - 1) > + fputs (" * ", dump_file); > + } > + fputs ("\n", dump_file); > + } > + } > + else > + { > + iter_result = get_reassoc_pow_ssa_name (target, type); > + pow_stmt = gimple_build_call (powi_fndecl, 2, rf1->repr, > + build_int_cst (integer_type_node, > + power)); > + gimple_call_set_lhs (pow_stmt, iter_result); > + gimple_set_location (pow_stmt, gimple_location (stmt)); > + gsi_insert_before (&gsi, pow_stmt, GSI_SAME_STMT); > + > + if (dump_file && (dump_flags & TDF_DETAILS)) > + { > + unsigned elt; > + repeat_factor_t rf; > + fputs ("Building __builtin_pow call for cached product (", > + dump_file); > + for (elt = j; elt < vec_len; elt++) > + { > + rf = VEC_index (repeat_factor, repeat_factor_vec, elt); > + print_generic_expr (dump_file, rf->factor, 0); > + if (elt < vec_len - 1) > + fputs (" * ", dump_file); > + } > + fprintf (dump_file, ")^%ld\n", power); > + } > + } > + } > + else > + { > + /* Otherwise, find the first factor in the repeated factor > + vector whose occurrence count is at least 2. If no such > + factor exists, there are no builtin_powi opportunities > + remaining. */ > + FOR_EACH_VEC_ELT (repeat_factor, repeat_factor_vec, j, rf1) > + { > + if (rf1->count >= 2) > + break; > + } > + > + if (j >= vec_len) > + break; > + > + power = rf1->count; > + > + if (dump_file && (dump_flags & TDF_DETAILS)) > + { > + unsigned elt; > + repeat_factor_t rf; > + fputs ("Building __builtin_pow call for (", dump_file); > + for (elt = j; elt < vec_len; elt++) > + { > + rf = VEC_index (repeat_factor, repeat_factor_vec, elt); > + print_generic_expr (dump_file, rf->factor, 0); > + if (elt < vec_len - 1) > + fputs (" * ", dump_file); > + } > + fprintf (dump_file, ")^%ld\n", power); > + } > + > + reassociate_stats.pows_created++; > + > + /* Visit each element of the vector in reverse order (so that > + high-occurrence elements are visited first, and within the > + same occurrence count, lower-ranked elements are visited > + first). Form a linear product of all elements in this order > + whose occurrencce count is at least that of element J. > + Record the SSA name representing the product of each element > + with all subsequent elements in the vector. */ > + if (j == vec_len - 1) > + rf1->repr = rf1->factor; > + else > + { > + for (ii = vec_len - 2; ii >= (int)j; ii--) > + { > + tree op1, op2; > + > + rf1 = VEC_index (repeat_factor, repeat_factor_vec, ii); > + rf2 = VEC_index (repeat_factor, repeat_factor_vec, ii + 1); > + > + /* Init the last factor's representative to be itself. */ > + if (!rf2->repr) > + rf2->repr = rf2->factor; > + > + op1 = rf1->factor; > + op2 = rf2->repr; > + > + target_ssa = get_reassoc_pow_ssa_name (target, type); > + mul_stmt = gimple_build_assign_with_ops (MULT_EXPR, > + target_ssa, > + op1, op2); > + gimple_set_location (mul_stmt, gimple_location (stmt)); > + gsi_insert_before (&gsi, mul_stmt, GSI_SAME_STMT); > + rf1->repr = target_ssa; > + > + /* Don't reprocess the multiply we just introduced. */ > + gimple_set_visited (mul_stmt, true); > + } > + } > + > + /* Form a call to __builtin_powi for the maximum product > + just formed, raised to the power obtained earlier. */ > + rf1 = VEC_index (repeat_factor, repeat_factor_vec, j); > + iter_result = get_reassoc_pow_ssa_name (target, type); > + pow_stmt = gimple_build_call (powi_fndecl, 2, rf1->repr, > + build_int_cst (integer_type_node, > + power)); > + gimple_call_set_lhs (pow_stmt, iter_result); > + gimple_set_location (pow_stmt, gimple_location (stmt)); > + gsi_insert_before (&gsi, pow_stmt, GSI_SAME_STMT); > + } > + > + /* Append the result of this iteration to the ops vector. */ > + add_to_ops_vec (ops, iter_result); > + > + /* Decrement the occurrence count of each element in the product > + by the count found above, and remove this many copies of each > + factor from OPS. */ > + for (i = j; i < vec_len; i++) > + { > + unsigned k = power; > + unsigned n; > + > + rf1 = VEC_index (repeat_factor, repeat_factor_vec, i); > + rf1->count -= power; > + > + FOR_EACH_VEC_ELT_REVERSE (operand_entry_t, *ops, n, oe) > + { > + if (oe->op == rf1->factor) > + { > + if (oe->count <= k) > + { > + VEC_ordered_remove (operand_entry_t, *ops, n); > + k -= oe->count; > + > + if (k == 0) > + break; > + } > + else > + { > + oe->count -= k; > + break; > + } > + } > + } > + } > + } > + > + /* At this point all elements in the repeated factor vector have a > + remaining occurrence count of 0 or 1, and those with a count of 1 > + don't have cached representatives. Re-sort the ops vector and > + clean up. */ > + VEC_qsort (operand_entry_t, *ops, sort_by_operand_rank); > + VEC_free (repeat_factor, heap, repeat_factor_vec); > +} > + > /* Reassociate expressions in basic block BB and its post-dominator as > children. */ > > @@ -2823,6 +3326,7 @@ reassociate_bb (basic_block bb) > { > gimple_stmt_iterator gsi; > basic_block son; > + tree target = NULL_TREE; > > for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi)) > { > @@ -2904,6 +3408,11 @@ reassociate_bb (basic_block bb) > if (rhs_code == BIT_IOR_EXPR || rhs_code == BIT_AND_EXPR) > optimize_range_tests (rhs_code, &ops); > > + if (first_pass_instance > + && rhs_code == MULT_EXPR > + && flag_unsafe_math_optimizations) > + attempt_builtin_powi (stmt, &ops, &target); > + > if (VEC_length (operand_entry_t, ops) == 1) > { > if (dump_file && (dump_flags & TDF_DETAILS)) > @@ -3054,6 +3563,10 @@ fini_reassoc (void) > reassociate_stats.ops_eliminated); > statistics_counter_event (cfun, "Statements rewritten", > reassociate_stats.rewritten); > + statistics_counter_event (cfun, "Built-in pow[i] calls encountered", > + reassociate_stats.pows_encountered); > + statistics_counter_event (cfun, "Built-in powi calls created", > + reassociate_stats.pows_created); > > pointer_map_destroy (operand_rank); > free_alloc_pool (operand_entry_pool); > >
On Thu, Apr 5, 2012 at 6:49 AM, William J. Schmidt <wschmidt@linux.vnet.ibm.com> wrote: > On Thu, 2012-04-05 at 11:23 +0200, Richard Guenther wrote: >> On Wed, Apr 4, 2012 at 9:15 PM, William J. Schmidt >> <wschmidt@linux.vnet.ibm.com> wrote: >> > >> > Unfortunately this seems to be necessary if I name the two passes >> > "reassoc1" and "reassoc2". If I try to name both of them "reassoc" I >> > get failures in other tests like gfortran.dg/reassoc_4, where >> > -fdump-tree-reassoc1 doesn't work. Unless I'm missing something >> > obvious, I think I need to keep that change. >> >> Hm, naming them "reassoc1" and "reassoc2" is a hack. Naming both >> "reassoc" will not trigger re-naming them to reassoc1 and reassoc2 >> I think. How ugly. Especially that -fdump-tree-reassoc will no longer >> work. Maybe instead of using two pass structs resort to using >> the existing hack with using first_pass_instance and TODO_mark_first_instance. > > OK, that seems to be the best among evils. Using the > first_pass_instance hack, the patch is transformed as below. > Regstrapped on powerpc64-linux, no additional failures. OK for trunk? > > Thanks, > Bill > > > gcc: > > 2012-04-05 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > PR tree-optimization/18589 > * tree-ssa-reassoc.c (reassociate_stats): Add two fields. > (operand_entry): Add count field. > (add_repeat_to_ops_vec): New function. > (completely_remove_stmt): Likewise. > (remove_def_if_absorbed_call): Likewise. > (remove_visited_stmt_chain): Remove feeding builtin pow/powi calls. > (acceptable_pow_call): New function. > (linearize_expr_tree): Look for builtin pow/powi calls and add operand > entries with repeat counts when found. > (repeat_factor_d): New struct and associated typedefs. > (repeat_factor_vec): New static vector variable. > (compare_repeat_factors): New function. > (get_reassoc_pow_ssa_name): Likewise. > (attempt_builtin_powi): Likewise. > (reassociate_bb): Call attempt_builtin_powi. > (fini_reassoc): Two new calls to statistics_counter_event. > It breaks bootstrap on Linux/ia32: ../../src-trunk/gcc/tree-ssa-reassoc.c: In function 'void attempt_builtin_powi(gimple, VEC_operand_entry_t_heap**, tree_node**)': ../../src-trunk/gcc/tree-ssa-reassoc.c:3189:41: error: format '%ld' expects argument of type 'long int', but argument 3 has type 'long long int' [-Werror=format] fprintf (dump_file, ")^%ld\n", power); ^ ../../src-trunk/gcc/tree-ssa-reassoc.c:3222:44: error: format '%ld' expects argument of type 'long int', but argument 3 has type 'long long int' [-Werror=format] fprintf (dump_file, ")^%ld\n", power); ^ cc1plus: all warnings being treated as errors H.J.
On Thu, 2012-04-12 at 09:50 -0700, H.J. Lu wrote: > On Thu, Apr 5, 2012 at 6:49 AM, William J. Schmidt > <wschmidt@linux.vnet.ibm.com> wrote: > > On Thu, 2012-04-05 at 11:23 +0200, Richard Guenther wrote: > >> On Wed, Apr 4, 2012 at 9:15 PM, William J. Schmidt > >> <wschmidt@linux.vnet.ibm.com> wrote: > >> > > >> > Unfortunately this seems to be necessary if I name the two passes > >> > "reassoc1" and "reassoc2". If I try to name both of them "reassoc" I > >> > get failures in other tests like gfortran.dg/reassoc_4, where > >> > -fdump-tree-reassoc1 doesn't work. Unless I'm missing something > >> > obvious, I think I need to keep that change. > >> > >> Hm, naming them "reassoc1" and "reassoc2" is a hack. Naming both > >> "reassoc" will not trigger re-naming them to reassoc1 and reassoc2 > >> I think. How ugly. Especially that -fdump-tree-reassoc will no longer > >> work. Maybe instead of using two pass structs resort to using > >> the existing hack with using first_pass_instance and TODO_mark_first_instance. > > > > OK, that seems to be the best among evils. Using the > > first_pass_instance hack, the patch is transformed as below. > > Regstrapped on powerpc64-linux, no additional failures. OK for trunk? > > > > Thanks, > > Bill > > > > > > gcc: > > > > 2012-04-05 Bill Schmidt <wschmidt@linux.vnet.ibm.com> > > > > PR tree-optimization/18589 > > * tree-ssa-reassoc.c (reassociate_stats): Add two fields. > > (operand_entry): Add count field. > > (add_repeat_to_ops_vec): New function. > > (completely_remove_stmt): Likewise. > > (remove_def_if_absorbed_call): Likewise. > > (remove_visited_stmt_chain): Remove feeding builtin pow/powi calls. > > (acceptable_pow_call): New function. > > (linearize_expr_tree): Look for builtin pow/powi calls and add operand > > entries with repeat counts when found. > > (repeat_factor_d): New struct and associated typedefs. > > (repeat_factor_vec): New static vector variable. > > (compare_repeat_factors): New function. > > (get_reassoc_pow_ssa_name): Likewise. > > (attempt_builtin_powi): Likewise. > > (reassociate_bb): Call attempt_builtin_powi. > > (fini_reassoc): Two new calls to statistics_counter_event. > > > > It breaks bootstrap on Linux/ia32: > > ../../src-trunk/gcc/tree-ssa-reassoc.c: In function 'void > attempt_builtin_powi(gimple, VEC_operand_entry_t_heap**, > tree_node**)': > ../../src-trunk/gcc/tree-ssa-reassoc.c:3189:41: error: format '%ld' > expects argument of type 'long int', but argument 3 has type 'long > long int' [-Werror=format] > fprintf (dump_file, ")^%ld\n", power); > ^ > ../../src-trunk/gcc/tree-ssa-reassoc.c:3222:44: error: format '%ld' > expects argument of type 'long int', but argument 3 has type 'long > long int' [-Werror=format] > fprintf (dump_file, ")^%ld\n", power); > ^ > cc1plus: all warnings being treated as errors > > > H.J. > Whoops. Looks like I need to use HOST_WIDE_INT_PRINT_DEC instead of %ld in those spots. I'll get a fix prepared.
Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c (revision 0) @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ + +double baz (double x, double y, double z, double u) +{ + return x * x * y * y * y * z * z * z * z * u; +} + +/* { dg-final { scan-tree-dump-times " \\* " 6 "optimized" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-5.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-5.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-5.c (revision 0) @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ + +double baz (double x, double y, double z, double u) +{ + return x * x * x * y * y * y * z * z * z * z * u * u * u * u; +} + +/* { dg-final { scan-tree-dump-times " \\* " 6 "optimized" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-6.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-6.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-6.c (revision 0) @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ + +double baz (double x, double y) +{ + return __builtin_pow (x, 3.0) * __builtin_pow (y, 4.0); +} + +/* { dg-final { scan-tree-dump-times " \\* " 4 "optimized" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-7.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-7.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-7.c (revision 0) @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ + +float baz (float x, float y) +{ + return x * x * x * x * y * y * y * y; +} + +/* { dg-final { scan-tree-dump-times " \\* " 3 "optimized" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-8.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-8.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-8.c (revision 0) @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ + +long double baz (long double x, long double y) +{ + return x * x * x * x * y * y * y * y; +} + +/* { dg-final { scan-tree-dump-times " \\* " 3 "optimized" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-9.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-9.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-9.c (revision 0) @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ + +double baz (double x, double y, double z) +{ + return (__builtin_pow (x, 3.0) * __builtin_pow (y, 2.0) + * __builtin_pow (z, 5.0)); +} + +/* { dg-final { scan-tree-dump-times " \\* " 6 "optimized" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-10.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-10.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-10.c (revision 0) @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ + +double baz (double x, double y, double z) +{ + return (__builtin_pow (x, 4.0) * __builtin_pow (y, 2.0) + * __builtin_pow (z, 4.0)); +} + +/* { dg-final { scan-tree-dump-times " \\* " 5 "optimized" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-1.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-1.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-1.c (revision 0) @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ + +double baz (double x, double y) +{ + return x * x * x * x * y * y * y * y; +} + +/* { dg-final { scan-tree-dump-times " \\* " 3 "optimized" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-2.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-2.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-2.c (revision 0) @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ + +double baz (double x, double y) +{ + return x * y * y * x * y * x * x * y; +} + +/* { dg-final { scan-tree-dump-times " \\* " 3 "optimized" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-3.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-3.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-3.c (revision 0) @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */ + +double baz (double x, double y, double z) +{ + return x * x * y * y * y * z * z * z * z; +} + +/* { dg-final { scan-tree-dump-times " \\* " 5 "optimized" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ Index: gcc/tree-ssa-reassoc.c =================================================================== --- gcc/tree-ssa-reassoc.c (revision 186108) +++ gcc/tree-ssa-reassoc.c (working copy) @@ -61,6 +61,10 @@ along with GCC; see the file COPYING3. If not see 3. Optimization of the operand lists, eliminating things like a + -a, a & a, etc. + 3a. Combine repeated factors with the same occurrence counts + into a __builtin_powi call that will later be optimized into + an optimal number of multiplies. + 4. Rewrite the expression trees we linearized and optimized so they are in proper rank order. @@ -169,6 +173,8 @@ static struct int constants_eliminated; int ops_eliminated; int rewritten; + int pows_encountered; + int pows_created; } reassociate_stats; /* Operator, rank pair. */ @@ -177,6 +183,7 @@ typedef struct operand_entry unsigned int rank; int id; tree op; + unsigned int count; } *operand_entry_t; static alloc_pool operand_entry_pool; @@ -515,9 +522,28 @@ add_to_ops_vec (VEC(operand_entry_t, heap) **ops, oe->op = op; oe->rank = get_rank (op); oe->id = next_operand_entry_id++; + oe->count = 1; VEC_safe_push (operand_entry_t, heap, *ops, oe); } +/* Add an operand entry to *OPS for the tree operand OP with repeat + count REPEAT. */ + +static void +add_repeat_to_ops_vec (VEC(operand_entry_t, heap) **ops, tree op, + HOST_WIDE_INT repeat) +{ + operand_entry_t oe = (operand_entry_t) pool_alloc (operand_entry_pool); + + oe->op = op; + oe->rank = get_rank (op); + oe->id = next_operand_entry_id++; + oe->count = repeat; + VEC_safe_push (operand_entry_t, heap, *ops, oe); + + reassociate_stats.pows_encountered++; +} + /* Return true if STMT is reassociable operation containing a binary operation with tree code CODE, and is inside LOOP. */ @@ -2049,6 +2075,32 @@ is_phi_for_stmt (gimple stmt, tree operand) return false; } +/* Remove STMT, unlink its virtual defs, and release its SSA defs. */ + +static inline void +completely_remove_stmt (gimple stmt) +{ + gimple_stmt_iterator gsi = gsi_for_stmt (stmt); + gsi_remove (&gsi, true); + unlink_stmt_vdef (stmt); + release_defs (stmt); +} + +/* If OP is defined by a builtin call that has been absorbed by + reassociation, remove its defining statement completely. */ + +static inline void +remove_def_if_absorbed_call (tree op) +{ + gimple stmt; + + if (TREE_CODE (op) == SSA_NAME + && has_zero_uses (op) + && is_gimple_call ((stmt = SSA_NAME_DEF_STMT (op))) + && gimple_visited_p (stmt)) + completely_remove_stmt (stmt); +} + /* Remove def stmt of VAR if VAR has zero uses and recurse on rhs1 operand if so. */ @@ -2057,19 +2109,31 @@ remove_visited_stmt_chain (tree var) { gimple stmt; gimple_stmt_iterator gsi; + tree var2; while (1) { if (TREE_CODE (var) != SSA_NAME || !has_zero_uses (var)) return; stmt = SSA_NAME_DEF_STMT (var); - if (!is_gimple_assign (stmt) - || !gimple_visited_p (stmt)) + if (is_gimple_assign (stmt) && gimple_visited_p (stmt)) + { + var = gimple_assign_rhs1 (stmt); + var2 = gimple_assign_rhs2 (stmt); + gsi = gsi_for_stmt (stmt); + gsi_remove (&gsi, true); + release_defs (stmt); + /* A multiply whose operands are both fed by builtin pow/powi + calls must check whether to remove rhs2 as well. */ + remove_def_if_absorbed_call (var2); + } + else if (is_gimple_call (stmt) && gimple_visited_p (stmt)) + { + completely_remove_stmt (stmt); + return; + } + else return; - var = gimple_assign_rhs1 (stmt); - gsi = gsi_for_stmt (stmt); - gsi_remove (&gsi, true); - release_defs (stmt); } } @@ -2564,6 +2628,75 @@ break_up_subtract (gimple stmt, gimple_stmt_iterat update_stmt (stmt); } +/* Determine whether STMT is a builtin call that raises an SSA name + to an integer power and has only one use. If so, and this is early + reassociation and unsafe math optimizations are permitted, place + the SSA name in *BASE and the exponent in *EXPONENT, and return TRUE. + If any of these conditions does not hold, return FALSE. */ + +static bool +acceptable_pow_call (gimple stmt, tree *base, HOST_WIDE_INT *exponent) +{ + tree fndecl, arg1; + REAL_VALUE_TYPE c, cint; + + if (!first_pass_instance + || !flag_unsafe_math_optimizations + || !is_gimple_call (stmt) + || !has_single_use (gimple_call_lhs (stmt))) + return false; + + fndecl = gimple_call_fndecl (stmt); + + if (!fndecl + || DECL_BUILT_IN_CLASS (fndecl) != BUILT_IN_NORMAL) + return false; + + switch (DECL_FUNCTION_CODE (fndecl)) + { + CASE_FLT_FN (BUILT_IN_POW): + *base = gimple_call_arg (stmt, 0); + arg1 = gimple_call_arg (stmt, 1); + + if (TREE_CODE (arg1) != REAL_CST) + return false; + + c = TREE_REAL_CST (arg1); + + if (REAL_EXP (&c) > HOST_BITS_PER_WIDE_INT) + return false; + + *exponent = real_to_integer (&c); + real_from_integer (&cint, VOIDmode, *exponent, + *exponent < 0 ? -1 : 0, 0); + if (!real_identical (&c, &cint)) + return false; + + break; + + CASE_FLT_FN (BUILT_IN_POWI): + *base = gimple_call_arg (stmt, 0); + arg1 = gimple_call_arg (stmt, 1); + + if (!host_integerp (arg1, 0)) + return false; + + *exponent = TREE_INT_CST_LOW (arg1); + break; + + default: + return false; + } + + /* Expanding negative exponents is generally unproductive, so we don't + complicate matters with those. Exponents of zero and one should + have been handled by expression folding. */ + if (*exponent < 2 || TREE_CODE (*base) != SSA_NAME) + return false; + + return true; +} + /* Recursively linearize a binary expression that is the RHS of STMT. Place the operands of the expression tree in the vector named OPS. */ @@ -2573,11 +2706,13 @@ linearize_expr_tree (VEC(operand_entry_t, heap) ** { tree binlhs = gimple_assign_rhs1 (stmt); tree binrhs = gimple_assign_rhs2 (stmt); - gimple binlhsdef, binrhsdef; + gimple binlhsdef = NULL, binrhsdef = NULL; bool binlhsisreassoc = false; bool binrhsisreassoc = false; enum tree_code rhscode = gimple_assign_rhs_code (stmt); struct loop *loop = loop_containing_stmt (stmt); + tree base = NULL_TREE; + HOST_WIDE_INT exponent = 0; if (set_visited) gimple_set_visited (stmt, true); @@ -2615,8 +2750,26 @@ linearize_expr_tree (VEC(operand_entry_t, heap) ** if (!binrhsisreassoc) { - add_to_ops_vec (ops, binrhs); - add_to_ops_vec (ops, binlhs); + if (rhscode == MULT_EXPR + && TREE_CODE (binrhs) == SSA_NAME + && acceptable_pow_call (binrhsdef, &base, &exponent)) + { + add_repeat_to_ops_vec (ops, base, exponent); + gimple_set_visited (binrhsdef, true); + } + else + add_to_ops_vec (ops, binrhs); + + if (rhscode == MULT_EXPR + && TREE_CODE (binlhs) == SSA_NAME + && acceptable_pow_call (binlhsdef, &base, &exponent)) + { + add_repeat_to_ops_vec (ops, base, exponent); + gimple_set_visited (binlhsdef, true); + } + else + add_to_ops_vec (ops, binlhs); + return; } @@ -2655,7 +2808,16 @@ linearize_expr_tree (VEC(operand_entry_t, heap) ** rhscode, loop)); linearize_expr_tree (ops, SSA_NAME_DEF_STMT (binlhs), is_associative, set_visited); - add_to_ops_vec (ops, binrhs); + + if (rhscode == MULT_EXPR + && TREE_CODE (binrhs) == SSA_NAME + && acceptable_pow_call (SSA_NAME_DEF_STMT (binrhs), &base, &exponent)) + { + add_repeat_to_ops_vec (ops, base, exponent); + gimple_set_visited (SSA_NAME_DEF_STMT (binrhs), true); + } + else + add_to_ops_vec (ops, binrhs); } /* Repropagate the negates back into subtracts, since no other pass @@ -2815,6 +2977,347 @@ break_up_subtract_bb (basic_block bb) break_up_subtract_bb (son); } +/* Used for repeated factor analysis. */ +struct repeat_factor_d +{ + /* An SSA name that occurs in a multiply chain. */ + tree factor; + + /* Cached rank of the factor. */ + unsigned rank; + + /* Number of occurrences of the factor in the chain. */ + HOST_WIDE_INT count; + + /* An SSA name representing the product of this factor and + all factors appearing later in the repeated factor vector. */ + tree repr; +}; + +typedef struct repeat_factor_d repeat_factor, *repeat_factor_t; +typedef const struct repeat_factor_d *const_repeat_factor_t; + +DEF_VEC_O (repeat_factor); +DEF_VEC_ALLOC_O (repeat_factor, heap); + +static VEC (repeat_factor, heap) *repeat_factor_vec; + +/* Used for sorting the repeat factor vector. Sort primarily by + ascending occurrence count, secondarily by descending rank. */ + +static int +compare_repeat_factors (const void *x1, const void *x2) +{ + const_repeat_factor_t rf1 = (const_repeat_factor_t) x1; + const_repeat_factor_t rf2 = (const_repeat_factor_t) x2; + + if (rf1->count != rf2->count) + return rf1->count - rf2->count; + + return rf2->rank - rf1->rank; +} + +/* Get a new SSA name for register variable *TARGET of type TYPE. + If *TARGET is null or incompatible with TYPE, create the variable + first. */ + +static tree +get_reassoc_pow_ssa_name (tree *target, tree type) +{ + if (!*target || !types_compatible_p (type, TREE_TYPE (*target))) + { + *target = create_tmp_reg (type, "reassocpow"); + add_referenced_var (*target); + } + + return make_ssa_name (*target, NULL); +} + +/* Look for repeated operands in OPS in the multiply tree rooted at + STMT. Replace them with an optimal sequence of multiplies and powi + builtin calls, and remove the used operands from OPS. Push new + SSA names onto OPS that represent the introduced multiplies and + builtin calls. */ + +static void +attempt_builtin_powi (gimple stmt, VEC(operand_entry_t, heap) **ops, + tree *target) +{ + unsigned i, j, vec_len; + int ii; + operand_entry_t oe; + repeat_factor_t rf1, rf2; + repeat_factor rfnew; + tree target_ssa, iter_result; + tree type = TREE_TYPE (gimple_get_lhs (stmt)); + tree powi_fndecl = mathfn_built_in (type, BUILT_IN_POWI); + gimple_stmt_iterator gsi = gsi_for_stmt (stmt); + gimple mul_stmt, pow_stmt; + + /* Nothing to do if BUILT_IN_POWI doesn't exist for this type and + target. */ + if (!powi_fndecl) + return; + + /* Allocate the repeated factor vector. */ + repeat_factor_vec = VEC_alloc (repeat_factor, heap, 10); + + /* Scan the OPS vector for all SSA names in the product and build + up a vector of occurrence counts for each factor. */ + FOR_EACH_VEC_ELT (operand_entry_t, *ops, i, oe) + { + if (TREE_CODE (oe->op) == SSA_NAME) + { + FOR_EACH_VEC_ELT (repeat_factor, repeat_factor_vec, j, rf1) + { + if (rf1->factor == oe->op) + { + rf1->count += oe->count; + break; + } + } + + if (j >= VEC_length (repeat_factor, repeat_factor_vec)) + { + rfnew.factor = oe->op; + rfnew.rank = oe->rank; + rfnew.count = oe->count; + rfnew.repr = NULL_TREE; + VEC_safe_push (repeat_factor, heap, repeat_factor_vec, &rfnew); + } + } + } + + /* Sort the repeated factor vector by (a) increasing occurrence count, + and (b) decreasing rank. */ + VEC_qsort (repeat_factor, repeat_factor_vec, compare_repeat_factors); + + /* It is generally best to combine as many base factors as possible + into a product before applying __builtin_powi to the result. + However, the sort order chosen for the repeated factor vector + allows us to cache partial results for the product of the base + factors for subsequent use. When we already have a cached partial + result from a previous iteration, it is best to make use of it + before looking for another __builtin_pow opportunity. + + As an example, consider x * x * y * y * y * z * z * z * z. + We want to first compose the product x * y * z, raise it to the + second power, then multiply this by y * z, and finally multiply + by z. This can be done in 5 multiplies provided we cache y * z + for use in both expressions: + + t1 = y * z + t2 = t1 * x + t3 = t2 * t2 + t4 = t1 * t3 + result = t4 * z + + If we instead ignored the cached y * z and first multiplied by + the __builtin_pow opportunity z * z, we would get the inferior: + + t1 = y * z + t2 = t1 * x + t3 = t2 * t2 + t4 = z * z + t5 = t3 * t4 + result = t5 * y */ + + vec_len = VEC_length (repeat_factor, repeat_factor_vec); + + /* Repeatedly look for opportunities to create a builtin_powi call. */ + while (true) + { + HOST_WIDE_INT power; + + /* First look for the largest cached product of factors from + preceding iterations. If found, create a builtin_powi for + it if the minimum occurrence count for its factors is at + least 2, or just use this cached product as our next + multiplicand if the minimum occurrence count is 1. */ + FOR_EACH_VEC_ELT (repeat_factor, repeat_factor_vec, j, rf1) + { + if (rf1->repr && rf1->count > 0) + break; + } + + if (j < vec_len) + { + power = rf1->count; + + if (power == 1) + { + iter_result = rf1->repr; + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + unsigned elt; + repeat_factor_t rf; + fputs ("Multiplying by cached product ", dump_file); + for (elt = j; elt < vec_len; elt++) + { + rf = VEC_index (repeat_factor, repeat_factor_vec, elt); + print_generic_expr (dump_file, rf->factor, 0); + if (elt < vec_len - 1) + fputs (" * ", dump_file); + } + fputs ("\n", dump_file); + } + } + else + { + iter_result = get_reassoc_pow_ssa_name (target, type); + pow_stmt = gimple_build_call (powi_fndecl, 2, rf1->repr, + build_int_cst (integer_type_node, + power)); + gimple_call_set_lhs (pow_stmt, iter_result); + gimple_set_location (pow_stmt, gimple_location (stmt)); + gsi_insert_before (&gsi, pow_stmt, GSI_SAME_STMT); + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + unsigned elt; + repeat_factor_t rf; + fputs ("Building __builtin_pow call for cached product (", + dump_file); + for (elt = j; elt < vec_len; elt++) + { + rf = VEC_index (repeat_factor, repeat_factor_vec, elt); + print_generic_expr (dump_file, rf->factor, 0); + if (elt < vec_len - 1) + fputs (" * ", dump_file); + } + fprintf (dump_file, ")^%ld\n", power); + } + } + } + else + { + /* Otherwise, find the first factor in the repeated factor + vector whose occurrence count is at least 2. If no such + factor exists, there are no builtin_powi opportunities + remaining. */ + FOR_EACH_VEC_ELT (repeat_factor, repeat_factor_vec, j, rf1) + { + if (rf1->count >= 2) + break; + } + + if (j >= vec_len) + break; + + power = rf1->count; + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + unsigned elt; + repeat_factor_t rf; + fputs ("Building __builtin_pow call for (", dump_file); + for (elt = j; elt < vec_len; elt++) + { + rf = VEC_index (repeat_factor, repeat_factor_vec, elt); + print_generic_expr (dump_file, rf->factor, 0); + if (elt < vec_len - 1) + fputs (" * ", dump_file); + } + fprintf (dump_file, ")^%ld\n", power); + } + + reassociate_stats.pows_created++; + + /* Visit each element of the vector in reverse order (so that + high-occurrence elements are visited first, and within the + same occurrence count, lower-ranked elements are visited + first). Form a linear product of all elements in this order + whose occurrencce count is at least that of element J. + Record the SSA name representing the product of each element + with all subsequent elements in the vector. */ + if (j == vec_len - 1) + rf1->repr = rf1->factor; + else + { + for (ii = vec_len - 2; ii >= (int)j; ii--) + { + tree op1, op2; + + rf1 = VEC_index (repeat_factor, repeat_factor_vec, ii); + rf2 = VEC_index (repeat_factor, repeat_factor_vec, ii + 1); + + /* Init the last factor's representative to be itself. */ + if (!rf2->repr) + rf2->repr = rf2->factor; + + op1 = rf1->factor; + op2 = rf2->repr; + + target_ssa = get_reassoc_pow_ssa_name (target, type); + mul_stmt = gimple_build_assign_with_ops (MULT_EXPR, + target_ssa, + op1, op2); + gimple_set_location (mul_stmt, gimple_location (stmt)); + gsi_insert_before (&gsi, mul_stmt, GSI_SAME_STMT); + rf1->repr = target_ssa; + + /* Don't reprocess the multiply we just introduced. */ + gimple_set_visited (mul_stmt, true); + } + } + + /* Form a call to __builtin_powi for the maximum product + just formed, raised to the power obtained earlier. */ + rf1 = VEC_index (repeat_factor, repeat_factor_vec, j); + iter_result = get_reassoc_pow_ssa_name (target, type); + pow_stmt = gimple_build_call (powi_fndecl, 2, rf1->repr, + build_int_cst (integer_type_node, + power)); + gimple_call_set_lhs (pow_stmt, iter_result); + gimple_set_location (pow_stmt, gimple_location (stmt)); + gsi_insert_before (&gsi, pow_stmt, GSI_SAME_STMT); + } + + /* Append the result of this iteration to the ops vector. */ + add_to_ops_vec (ops, iter_result); + + /* Decrement the occurrence count of each element in the product + by the count found above, and remove this many copies of each + factor from OPS. */ + for (i = j; i < vec_len; i++) + { + unsigned k = power; + unsigned n; + + rf1 = VEC_index (repeat_factor, repeat_factor_vec, i); + rf1->count -= power; + + FOR_EACH_VEC_ELT_REVERSE (operand_entry_t, *ops, n, oe) + { + if (oe->op == rf1->factor) + { + if (oe->count <= k) + { + VEC_ordered_remove (operand_entry_t, *ops, n); + k -= oe->count; + + if (k == 0) + break; + } + else + { + oe->count -= k; + break; + } + } + } + } + } + + /* At this point all elements in the repeated factor vector have a + remaining occurrence count of 0 or 1, and those with a count of 1 + don't have cached representatives. Re-sort the ops vector and + clean up. */ + VEC_qsort (operand_entry_t, *ops, sort_by_operand_rank); + VEC_free (repeat_factor, heap, repeat_factor_vec); +} + /* Reassociate expressions in basic block BB and its post-dominator as children. */ @@ -2823,6 +3326,7 @@ reassociate_bb (basic_block bb) { gimple_stmt_iterator gsi; basic_block son; + tree target = NULL_TREE; for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi)) { @@ -2904,6 +3408,11 @@ reassociate_bb (basic_block bb) if (rhs_code == BIT_IOR_EXPR || rhs_code == BIT_AND_EXPR) optimize_range_tests (rhs_code, &ops); + if (first_pass_instance + && rhs_code == MULT_EXPR + && flag_unsafe_math_optimizations) + attempt_builtin_powi (stmt, &ops, &target); + if (VEC_length (operand_entry_t, ops) == 1) { if (dump_file && (dump_flags & TDF_DETAILS)) @@ -3054,6 +3563,10 @@ fini_reassoc (void) reassociate_stats.ops_eliminated); statistics_counter_event (cfun, "Statements rewritten", reassociate_stats.rewritten); + statistics_counter_event (cfun, "Built-in pow[i] calls encountered", + reassociate_stats.pows_encountered); + statistics_counter_event (cfun, "Built-in powi calls created", + reassociate_stats.pows_created); pointer_map_destroy (operand_rank); free_alloc_pool (operand_entry_pool);