Message ID | 1477086370.5636.18.camel@oc8801110288.ibm.com
State | New
On Fri, Oct 21, 2016 at 11:46 PM, Bill Schmidt <wschmidt@linux.vnet.ibm.com> wrote:
> Hi,
>
> I've been meaning for some time now to do a better job handling strength
> reduction candidates where the stride of the candidate and its basis
> involves a cast (usually widening) from another type.  The two PRs
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71490 and
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71915 show that we miss
> strength reduction opportunities in such cases.  Essentially the way I
> was handling this before required a cast to go back to the original
> type, which was illegal when this was a narrowing operation or a cast
> from a wrapping type to a non-wrapping type.
>
> This patch improves matters by tracking the effective type of the stride
> so that we perform arithmetic in that type directly.  This allows us to
> strength-reduce some cases missed before.  We still have to be careful
> not to ever perform a narrowing or wrap-to-nonwrap conversion, but these
> cases don't come up nearly so often in practice with this patch.
>
> The patch is pretty straightforward but rather large, so I'd like to
> leave it up for review for a week or so before committing it -- extra
> eyes welcome!
>
> There is an existing test case that checks for the narrowing problem,
> and now fails with this patch as it should.  That is, it's now legal to
> insert the initializer that we formerly knew to be bad for this test.
> Since the test no longer serves any purpose, I'm deleting it.
>
> gcc.dg/tree-ssa/slsr-8.c generates quite different intermediate code now
> than when I first added it to the test suite.  As a result of various
> optimization changes, it's no longer maintained as a single block;
> instead, the optimizable computations are sunk into two legs of a
> conditional.  This exposed another similar case, leading to the bug
> reports.  This test now passes, but I had to adjust it for the new code
> generation we get.  I also added some commentary to indicate what we
> expect to happen in that test, since it isn't too obvious.
>
> I've bootstrapped this and tested it on powerpc64le-unknown-linux-gnu
> with no regressions.  To avoid the Wrath of Octoploid ;), I've also
> tested it against ffmpeg using an x86_64-pc-linux-gnu cross with -O3
> -march=amdfam10, also with no failures.  I've also verified that we hit
> this code about 35 times in the test suite, so it looks like we have
> some decent additional test coverage.
>
> Thanks in advance to anyone willing to take a look.

LGTM.

Richard.

> Bill
>
>
> [gcc]
>
> 2016-10-21  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>     PR tree-optimization/71915
>     * gimple-ssa-strength-reduction.c (struct slsr_cand_d): Add
>     stride_type field.
>     (find_basis_for_base_expr): Require stride types to match when
>     seeking a basis.
>     (alloc_cand_and_find_basis): Record the stride type.
>     (slsr_process_phi): Pass stride type to alloc_cand_and_find_basis.
>     (backtrace_base_for_ref): Pass types to legal_cast_p_1 rather than
>     the expressions having those types.
>     (slsr_process_ref): Pass stride type to alloc_cand_and_find_basis.
>     (create_mul_ssa_cand): Likewise.
>     (create_mul_imm_cand): Likewise.
>     (create_add_ssa_cand): Likewise.
>     (create_add_imm_cand): Likewise.
>     (legal_cast_p_1): Change interface to accept types rather than the
>     expressions having those types.
>     (legal_cast_p): Pass types to legal_cast_p_1.
>     (slsr_process_cast): Pass stride type to
>     alloc_cand_and_find_basis.
>     (slsr_process_copy): Likewise.
>     (dump_candidate): Display stride type when a cast exists.
>     (create_add_on_incoming_edge): Introduce a cast when necessary for
>     the stride type.
>     (analyze_increments): Change the code checking for invalid casts
>     to rely on the stride type, and update the documentation and
>     example.  Change the code checking for pointer multiplies to rely
>     on the stride type.
>     (insert_initializers): Introduce a cast when necessary for the
>     stride type.  Use the stride type for the type of the initializer.
>
> [gcc/testsuite]
>
> 2016-10-21  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>     PR tree-optimization/71915
>     * gcc.dg/tree-ssa/pr54245.c: Delete.
>     * gcc.dg/tree-ssa/slsr-8.c: Adjust for new optimization and
>     document why.
>
>
> Index: gcc/gimple-ssa-strength-reduction.c
> ===================================================================
> --- gcc/gimple-ssa-strength-reduction.c	(revision 241379)
> +++ gcc/gimple-ssa-strength-reduction.c	(working copy)
> @@ -246,6 +246,13 @@ struct slsr_cand_d
>       replacement MEM_REF.)  */
>    tree cand_type;
>
> +  /* The type to be used to interpret the stride field when the stride
> +     is not a constant.  Normally the same as the type of the recorded
> +     stride, but when the stride has been cast we need to maintain that
> +     knowledge in order to make legal substitutions without losing
> +     precision.  When the stride is a constant, this will be sizetype.  */
> +  tree stride_type;
> +
>    /* The kind of candidate (CAND_MULT, etc.).  */
>    enum cand_kind kind;
>
> @@ -502,6 +509,7 @@ find_basis_for_base_expr (slsr_cand_t c, tree base
>        || one_basis->cand_stmt == c->cand_stmt
>        || !operand_equal_p (one_basis->stride, c->stride, 0)
>        || !types_compatible_p (one_basis->cand_type, c->cand_type)
> +      || !types_compatible_p (one_basis->stride_type, c->stride_type)
>        || !dominated_by_p (CDI_DOMINATORS,
>                            gimple_bb (c->cand_stmt),
>                            gimple_bb (one_basis->cand_stmt)))
> @@ -615,7 +623,7 @@ record_potential_basis (slsr_cand_t c, tree base)
>  static slsr_cand_t
>  alloc_cand_and_find_basis (enum cand_kind kind, gimple *gs, tree base,
>                             const widest_int &index, tree stride, tree ctype,
> -                           unsigned savings)
> +                           tree stype, unsigned savings)
>  {
>    slsr_cand_t c = (slsr_cand_t) obstack_alloc (&cand_obstack,
>                                                 sizeof (slsr_cand));
> @@ -624,6 +632,7 @@ alloc_cand_and_find_basis (enum cand_kind kind, gi
>    c->stride = stride;
>    c->index = index;
>    c->cand_type = ctype;
> +  c->stride_type = stype;
>    c->kind = kind;
>    c->cand_num = cand_vec.length () + 1;
>    c->next_interp = 0;
> @@ -809,7 +818,8 @@ slsr_process_phi (gphi *phi, bool speed)
>      base_type = TREE_TYPE (arg0_base);
>
>    c = alloc_cand_and_find_basis (CAND_PHI, phi, arg0_base,
> -                                 0, integer_one_node, base_type, savings);
> +                                 0, integer_one_node, base_type,
> +                                 sizetype, savings);
>
>    /* Add the candidate to the statement-candidate mapping.  */
>    add_cand_for_stmt (phi, c);
> @@ -838,7 +848,8 @@ backtrace_base_for_ref (tree *pbase)
>       e.g. 'B' is widened from an 'int' in order to calculate
>       a 64-bit address.  */
>    if (CONVERT_EXPR_P (base_in)
> -      && legal_cast_p_1 (base_in, TREE_OPERAND (base_in, 0)))
> +      && legal_cast_p_1 (TREE_TYPE (base_in),
> +                         TREE_TYPE (TREE_OPERAND (base_in, 0))))
>      base_in = get_unwidened (base_in, NULL_TREE);
>
>    if (TREE_CODE (base_in) != SSA_NAME)
> @@ -995,7 +1006,7 @@ slsr_process_ref (gimple *gs)
>      return;
>
>    c = alloc_cand_and_find_basis (CAND_REF, gs, base, index, offset,
> -                                 type, 0);
> +                                 type, sizetype, 0);
>
>    /* Add the candidate to the statement-candidate mapping.  */
>    add_cand_for_stmt (gs, c);
> @@ -1010,6 +1021,7 @@ static slsr_cand_t
>  create_mul_ssa_cand (gimple *gs, tree base_in, tree stride_in, bool speed)
>  {
>    tree base = NULL_TREE, stride = NULL_TREE, ctype = NULL_TREE;
> +  tree stype = NULL_TREE;
>    widest_int index;
>    unsigned savings = 0;
>    slsr_cand_t c;
> @@ -1030,6 +1042,7 @@ create_mul_ssa_cand (gimple *gs, tree base_in, tre
>        index = base_cand->index;
>        stride = stride_in;
>        ctype = base_cand->cand_type;
> +      stype = TREE_TYPE (stride_in);
>        if (has_single_use (base_in))
>          savings = (base_cand->dead_savings
>                     + stmt_cost (base_cand->cand_stmt, speed));
> @@ -1045,6 +1058,7 @@ create_mul_ssa_cand (gimple *gs, tree base_in, tre
>        index = base_cand->index * wi::to_widest (base_cand->stride);
>        stride = stride_in;
>        ctype = base_cand->cand_type;
> +      stype = TREE_TYPE (stride_in);
>        if (has_single_use (base_in))
>          savings = (base_cand->dead_savings
>                     + stmt_cost (base_cand->cand_stmt, speed));
> @@ -1064,10 +1078,11 @@ create_mul_ssa_cand (gimple *gs, tree base_in, tre
>        index = 0;
>        stride = stride_in;
>        ctype = TREE_TYPE (base_in);
> +      stype = TREE_TYPE (stride_in);
>      }
>
>    c = alloc_cand_and_find_basis (CAND_MULT, gs, base, index, stride,
> -                                 ctype, savings);
> +                                 ctype, stype, savings);
>    return c;
>  }
>
> @@ -1156,7 +1171,7 @@ create_mul_imm_cand (gimple *gs, tree base_in, tre
>      }
>
>    c = alloc_cand_and_find_basis (CAND_MULT, gs, base, index, stride,
> -                                 ctype, savings);
> +                                 ctype, sizetype, savings);
>    return c;
>  }
>
> @@ -1212,7 +1227,8 @@ static slsr_cand_t
>  create_add_ssa_cand (gimple *gs, tree base_in, tree addend_in,
>                       bool subtract_p, bool speed)
>  {
> -  tree base = NULL_TREE, stride = NULL_TREE, ctype = NULL;
> +  tree base = NULL_TREE, stride = NULL_TREE, ctype = NULL_TREE;
> +  tree stype = NULL_TREE;
>    widest_int index;
>    unsigned savings = 0;
>    slsr_cand_t c;
> @@ -1237,6 +1253,7 @@ create_add_ssa_cand (gimple *gs, tree base_in, tre
>        index = -index;
>        stride = addend_cand->base_expr;
>        ctype = TREE_TYPE (base_in);
> +      stype = addend_cand->cand_type;
>        if (has_single_use (addend_in))
>          savings = (addend_cand->dead_savings
>                     + stmt_cost (addend_cand->cand_stmt, speed));
> @@ -1263,6 +1280,8 @@ create_add_ssa_cand (gimple *gs, tree base_in, tre
>        index = subtract_p ? -1 : 1;
>        stride = addend_in;
>        ctype = base_cand->cand_type;
> +      stype = (TREE_CODE (addend_in) == INTEGER_CST ? sizetype
> +               : TREE_TYPE (addend_in));
>        if (has_single_use (base_in))
>          savings = (base_cand->dead_savings
>                     + stmt_cost (base_cand->cand_stmt, speed));
> @@ -1286,6 +1305,7 @@ create_add_ssa_cand (gimple *gs, tree base_in, tre
>        index = -index;
>        stride = subtrahend_cand->base_expr;
>        ctype = TREE_TYPE (base_in);
> +      stype = subtrahend_cand->cand_type;
>        if (has_single_use (addend_in))
>          savings = (subtrahend_cand->dead_savings
>                     + stmt_cost (subtrahend_cand->cand_stmt, speed));
> @@ -1312,10 +1332,12 @@ create_add_ssa_cand (gimple *gs, tree base_in, tre
>        index = subtract_p ? -1 : 1;
>        stride = addend_in;
>        ctype = TREE_TYPE (base_in);
> +      stype = (TREE_CODE (addend_in) == INTEGER_CST ? sizetype
> +               : TREE_TYPE (addend_in));
>      }
>
>    c = alloc_cand_and_find_basis (CAND_ADD, gs, base, index, stride,
> -                                 ctype, savings);
> +                                 ctype, stype, savings);
>    return c;
>  }
>
> @@ -1329,6 +1351,7 @@ create_add_imm_cand (gimple *gs, tree base_in, con
>  {
>    enum cand_kind kind = CAND_ADD;
>    tree base = NULL_TREE, stride = NULL_TREE, ctype = NULL_TREE;
> +  tree stype = NULL_TREE;
>    widest_int index, multiple;
>    unsigned savings = 0;
>    slsr_cand_t c;
> @@ -1356,6 +1379,7 @@ create_add_imm_cand (gimple *gs, tree base_in, con
>        index = base_cand->index + multiple;
>        stride = base_cand->stride;
>        ctype = base_cand->cand_type;
> +      stype = base_cand->stride_type;
>        if (has_single_use (base_in))
>          savings = (base_cand->dead_savings
>                     + stmt_cost (base_cand->cand_stmt, speed));
> @@ -1376,10 +1400,11 @@ create_add_imm_cand (gimple *gs, tree base_in, con
>        index = index_in;
>        stride = integer_one_node;
>        ctype = TREE_TYPE (base_in);
> +      stype = sizetype;
>      }
>
>    c = alloc_cand_and_find_basis (kind, gs, base, index, stride,
> -                                 ctype, savings);
> +                                 ctype, stype, savings);
>    return c;
>  }
>
> @@ -1456,14 +1481,11 @@ slsr_process_neg (gimple *gs, tree rhs1, bool spee
>     for more details.  */
>
>  static bool
> -legal_cast_p_1 (tree lhs, tree rhs)
> +legal_cast_p_1 (tree lhs_type, tree rhs_type)
>  {
> -  tree lhs_type, rhs_type;
>    unsigned lhs_size, rhs_size;
>    bool lhs_wraps, rhs_wraps;
>
> -  lhs_type = TREE_TYPE (lhs);
> -  rhs_type = TREE_TYPE (rhs);
>    lhs_size = TYPE_PRECISION (lhs_type);
>    rhs_size = TYPE_PRECISION (rhs_type);
>    lhs_wraps = ANY_INTEGRAL_TYPE_P (lhs_type) && TYPE_OVERFLOW_WRAPS (lhs_type);
> @@ -1521,7 +1543,7 @@ legal_cast_p (gimple *gs, tree rhs)
>        || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (gs)))
>      return false;
>
> -  return legal_cast_p_1 (gimple_assign_lhs (gs), rhs);
> +  return legal_cast_p_1 (TREE_TYPE (gimple_assign_lhs (gs)), TREE_TYPE (rhs));
>  }
>
>  /* Given GS which is a cast to a scalar integer type, determine whether
> @@ -1556,7 +1578,8 @@ slsr_process_cast (gimple *gs, tree rhs1, bool spe
>          c = alloc_cand_and_find_basis (base_cand->kind, gs,
>                                         base_cand->base_expr,
>                                         base_cand->index, base_cand->stride,
> -                                       ctype, savings);
> +                                       ctype, base_cand->stride_type,
> +                                       savings);
>          if (base_cand->next_interp)
>            base_cand = lookup_cand (base_cand->next_interp);
>          else
> @@ -1574,10 +1597,10 @@ slsr_process_cast (gimple *gs, tree rhs1, bool spe
>       The first of these is somewhat arbitrary, but the choice of
>       1 for the stride simplifies the logic for propagating casts
>       into their uses.  */
> -  c = alloc_cand_and_find_basis (CAND_ADD, gs, rhs1,
> -                                 0, integer_one_node, ctype, 0);
> -  c2 = alloc_cand_and_find_basis (CAND_MULT, gs, rhs1,
> -                                  0, integer_one_node, ctype, 0);
> +  c = alloc_cand_and_find_basis (CAND_ADD, gs, rhs1, 0,
> +                                 integer_one_node, ctype, sizetype, 0);
> +  c2 = alloc_cand_and_find_basis (CAND_MULT, gs, rhs1, 0,
> +                                  integer_one_node, ctype, sizetype, 0);
>    c->next_interp = c2->cand_num;
>  }
>
> @@ -1613,7 +1636,8 @@ slsr_process_copy (gimple *gs, tree rhs1, bool spe
>          c = alloc_cand_and_find_basis (base_cand->kind, gs,
>                                         base_cand->base_expr,
>                                         base_cand->index, base_cand->stride,
> -                                       base_cand->cand_type, savings);
> +                                       base_cand->cand_type,
> +                                       base_cand->stride_type, savings);
>          if (base_cand->next_interp)
>            base_cand = lookup_cand (base_cand->next_interp);
>          else
> @@ -1631,10 +1655,12 @@ slsr_process_copy (gimple *gs, tree rhs1, bool spe
>       The first of these is somewhat arbitrary, but the choice of
>       1 for the stride simplifies the logic for propagating casts
>       into their uses.  */
> -  c = alloc_cand_and_find_basis (CAND_ADD, gs, rhs1,
> -                                 0, integer_one_node, TREE_TYPE (rhs1), 0);
> -  c2 = alloc_cand_and_find_basis (CAND_MULT, gs, rhs1,
> -                                  0, integer_one_node, TREE_TYPE (rhs1), 0);
> +  c = alloc_cand_and_find_basis (CAND_ADD, gs, rhs1, 0,
> +                                 integer_one_node, TREE_TYPE (rhs1),
> +                                 sizetype, 0);
> +  c2 = alloc_cand_and_find_basis (CAND_MULT, gs, rhs1, 0,
> +                                  integer_one_node, TREE_TYPE (rhs1),
> +                                  sizetype, 0);
>    c->next_interp = c2->cand_num;
>  }
>
> @@ -1755,6 +1781,13 @@ dump_candidate (slsr_cand_t c)
>        fputs (" + ", dump_file);
>        print_decs (c->index, dump_file);
>        fputs (") * ", dump_file);
> +      if (TREE_CODE (c->stride) != INTEGER_CST
> +          && c->stride_type != TREE_TYPE (c->stride))
> +        {
> +          fputs ("(", dump_file);
> +          print_generic_expr (dump_file, c->stride_type, 0);
> +          fputs (")", dump_file);
> +        }
>        print_generic_expr (dump_file, c->stride, 0);
>        fputs (" : ", dump_file);
>        break;
> @@ -1764,6 +1797,13 @@ dump_candidate (slsr_cand_t c)
>        fputs (" + (", dump_file);
>        print_decs (c->index, dump_file);
>        fputs (" * ", dump_file);
> +      if (TREE_CODE (c->stride) != INTEGER_CST
> +          && c->stride_type != TREE_TYPE (c->stride))
> +        {
> +          fputs ("(", dump_file);
> +          print_generic_expr (dump_file, c->stride_type, 0);
> +          fputs (")", dump_file);
> +        }
>        print_generic_expr (dump_file, c->stride, 0);
>        fputs (") : ", dump_file);
>        break;
> @@ -2143,7 +2183,7 @@ create_add_on_incoming_edge (slsr_cand_t c, tree b
>    basic_block insert_bb;
>    gimple_stmt_iterator gsi;
>    tree lhs, basis_type;
> -  gassign *new_stmt;
> +  gassign *new_stmt, *cast_stmt = NULL;
>
>    /* If the add candidate along this incoming edge has the same
>       index as C's hidden basis, the hidden basis represents this
> @@ -2187,13 +2227,27 @@ create_add_on_incoming_edge (slsr_cand_t c, tree b
>        new_stmt = gimple_build_assign (lhs, code, basis_name,
>                                        incr_vec[i].initializer);
>      }
> -  else if (increment == 1)
> -    new_stmt = gimple_build_assign (lhs, plus_code, basis_name, c->stride);
> -  else if (increment == -1)
> -    new_stmt = gimple_build_assign (lhs, MINUS_EXPR, basis_name,
> -                                    c->stride);
> -  else
> -    gcc_unreachable ();
> +  else {
> +    tree stride;
> +
> +    if (!types_compatible_p (TREE_TYPE (c->stride), c->stride_type))
> +      {
> +        tree cast_stride = make_temp_ssa_name (c->stride_type, NULL,
> +                                               "slsr");
> +        cast_stmt = gimple_build_assign (cast_stride, NOP_EXPR,
> +                                         c->stride);
> +        stride = cast_stride;
> +      }
> +    else
> +      stride = c->stride;
> +
> +    if (increment == 1)
> +      new_stmt = gimple_build_assign (lhs, plus_code, basis_name, stride);
> +    else if (increment == -1)
> +      new_stmt = gimple_build_assign (lhs, MINUS_EXPR, basis_name, stride);
> +    else
> +      gcc_unreachable ();
> +  }
>    }
>
>    insert_bb = single_succ_p (e->src) ? e->src : split_edge (e);
> @@ -2200,14 +2254,34 @@ create_add_on_incoming_edge (slsr_cand_t c, tree b
>    gsi = gsi_last_bb (insert_bb);
>
>    if (!gsi_end_p (gsi) && is_ctrl_stmt (gsi_stmt (gsi)))
> -    gsi_insert_before (&gsi, new_stmt, GSI_NEW_STMT);
> +    {
> +      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
> +      if (cast_stmt)
> +        {
> +          gsi_insert_before (&gsi, cast_stmt, GSI_SAME_STMT);
> +          gimple_set_location (cast_stmt, loc);
> +        }
> +    }
>    else
> -    gsi_insert_after (&gsi, new_stmt, GSI_NEW_STMT);
> +    {
> +      if (cast_stmt)
> +        {
> +          gsi_insert_after (&gsi, cast_stmt, GSI_NEW_STMT);
> +          gimple_set_location (cast_stmt, loc);
> +        }
> +      gsi_insert_after (&gsi, new_stmt, GSI_NEW_STMT);
> +    }
>
>    gimple_set_location (new_stmt, loc);
>
>    if (dump_file && (dump_flags & TDF_DETAILS))
>      {
> +      if (cast_stmt)
> +        {
> +          fprintf (dump_file, "Inserting cast in block %d: ",
> +                   insert_bb->index);
> +          print_gimple_stmt (dump_file, cast_stmt, 0, 0);
> +        }
>        fprintf (dump_file, "Inserting in block %d: ", insert_bb->index);
>        print_gimple_stmt (dump_file, new_stmt, 0, 0);
>      }
> @@ -2825,40 +2899,35 @@ analyze_increments (slsr_cand_t first_dep, machine
>                && !POINTER_TYPE_P (first_dep->cand_type)))
>         incr_vec[i].cost = COST_NEUTRAL;
>
> -      /* FORNOW: If we need to add an initializer, give up if a cast from
> -         the candidate's type to its stride's type can lose precision.
> -         This could eventually be handled better by expressly retaining the
> -         result of a cast to a wider type in the stride.  Example:
> +      /* If we need to add an initializer, give up if a cast from the
> +         candidate's type to its stride's type can lose precision.
> +         Note that this already takes into account that the stride may
> +         have been cast to a wider type, in which case this test won't
> +         fire.  Example:
>
>            short int _1;
>            _2 = (int) _1;
>            _3 = _2 * 10;
> -          _4 = x + _3;    ADD: x + (10 * _1) : int
> +          _4 = x + _3;    ADD: x + (10 * (int)_1) : int
>            _5 = _2 * 15;
> -          _6 = x + _3;    ADD: x + (15 * _1) : int
> +          _6 = x + _5;    ADD: x + (15 * (int)_1) : int
>
> -         Right now replacing _6 would cause insertion of an initializer
> -         of the form "short int T = _1 * 5;" followed by a cast to
> -         int, which could overflow incorrectly.  Had we recorded _2 or
> -         (int)_1 as the stride, this wouldn't happen.  However, doing
> -         this breaks other opportunities, so this will require some
> -         care.  */
> +         Although the stride was a short int initially, the stride
> +         used in the analysis has been widened to an int, and such
> +         widening will be done in the initializer as well.  */
>        else if (!incr_vec[i].initializer
>                && TREE_CODE (first_dep->stride) != INTEGER_CST
> -              && !legal_cast_p_1 (first_dep->stride,
> -                                  gimple_assign_lhs (first_dep->cand_stmt)))
> -
> +              && !legal_cast_p_1 (first_dep->stride_type,
> +                                  TREE_TYPE (gimple_assign_lhs
> +                                             (first_dep->cand_stmt))))
>         incr_vec[i].cost = COST_INFINITE;
>
>        /* If we need to add an initializer, make sure we don't introduce
>           a multiply by a pointer type, which can happen in certain cast
> -         scenarios.  FIXME: When cleaning up these cast issues, we can
> -         afford to introduce the multiply provided we cast out to an
> -         unsigned int of appropriate size.  */
> +         scenarios.  */
>        else if (!incr_vec[i].initializer
>                && TREE_CODE (first_dep->stride) != INTEGER_CST
> -              && POINTER_TYPE_P (TREE_TYPE (first_dep->stride)))
> -
> +              && POINTER_TYPE_P (first_dep->stride_type))
>         incr_vec[i].cost = COST_INFINITE;
>
>        /* For any other increment, if this is a multiply candidate, we
> @@ -3105,7 +3174,8 @@ insert_initializers (slsr_cand_t c)
>        basic_block bb;
>        slsr_cand_t where = NULL;
>        gassign *init_stmt;
> +      gassign *cast_stmt = NULL;
> -      tree stride_type, new_name, incr_tree;
> +      tree new_name, incr_tree, init_stride;
>        widest_int incr = incr_vec[i].incr;
>
>        if (!profitable_increment_p (i)
> @@ -3134,31 +3204,63 @@ insert_initializers (slsr_cand_t c)
>          that block, the earliest one will be returned in WHERE.  */
>        bb = nearest_common_dominator_for_cands (c, incr, &where);
>
> +      /* If the nominal stride has a different type than the recorded
> +         stride type, build a cast from the nominal stride to that type.  */
> +      if (!types_compatible_p (TREE_TYPE (c->stride), c->stride_type))
> +        {
> +          init_stride = make_temp_ssa_name (c->stride_type, NULL, "slsr");
> +          cast_stmt = gimple_build_assign (init_stride, NOP_EXPR, c->stride);
> +        }
> +      else
> +        init_stride = c->stride;
> +
>        /* Create a new SSA name to hold the initializer's value.  */
> -      stride_type = TREE_TYPE (c->stride);
> -      new_name = make_temp_ssa_name (stride_type, NULL, "slsr");
> +      new_name = make_temp_ssa_name (c->stride_type, NULL, "slsr");
>        incr_vec[i].initializer = new_name;
>
>        /* Create the initializer and insert it in the latest possible
>          dominating position.  */
> -      incr_tree = wide_int_to_tree (stride_type, incr);
> +      incr_tree = wide_int_to_tree (c->stride_type, incr);
>        init_stmt = gimple_build_assign (new_name, MULT_EXPR,
> -                                       c->stride, incr_tree);
> +                                       init_stride, incr_tree);
>        if (where)
>         {
>           gimple_stmt_iterator gsi = gsi_for_stmt (where->cand_stmt);
> +         location_t loc = gimple_location (where->cand_stmt);
> +
> +         if (cast_stmt)
> +           {
> +             gsi_insert_before (&gsi, cast_stmt, GSI_SAME_STMT);
> +             gimple_set_location (cast_stmt, loc);
> +           }
> +
>           gsi_insert_before (&gsi, init_stmt, GSI_SAME_STMT);
> -         gimple_set_location (init_stmt, gimple_location (where->cand_stmt));
> +         gimple_set_location (init_stmt, loc);
>         }
>        else
>         {
>           gimple_stmt_iterator gsi = gsi_last_bb (bb);
>           gimple *basis_stmt = lookup_cand (c->basis)->cand_stmt;
> +         location_t loc = gimple_location (basis_stmt);
>
>           if (!gsi_end_p (gsi) && is_ctrl_stmt (gsi_stmt (gsi)))
> -           gsi_insert_before (&gsi, init_stmt, GSI_SAME_STMT);
> +           {
> +             if (cast_stmt)
> +               {
> +                 gsi_insert_before (&gsi, cast_stmt, GSI_SAME_STMT);
> +                 gimple_set_location (cast_stmt, loc);
> +               }
> +             gsi_insert_before (&gsi, init_stmt, GSI_SAME_STMT);
> +           }
>           else
> -           gsi_insert_after (&gsi, init_stmt, GSI_SAME_STMT);
> +           {
> +             if (cast_stmt)
> +               {
> +                 gsi_insert_after (&gsi, cast_stmt, GSI_NEW_STMT);
> +                 gimple_set_location (cast_stmt, loc);
> +               }
> +             gsi_insert_after (&gsi, init_stmt, GSI_SAME_STMT);
> +           }
>
>           gimple_set_location (init_stmt, gimple_location (basis_stmt));
>         }
> @@ -3165,6 +3267,11 @@ insert_initializers (slsr_cand_t c)
>
>        if (dump_file && (dump_flags & TDF_DETAILS))
>         {
> +         if (cast_stmt)
> +           {
> +             fputs ("Inserting stride cast: ", dump_file);
> +             print_gimple_stmt (dump_file, cast_stmt, 0, 0);
> +           }
>           fputs ("Inserting initializer: ", dump_file);
>           print_gimple_stmt (dump_file, init_stmt, 0, 0);
>         }
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr54245.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr54245.c	(revision 241379)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr54245.c	(working copy)
> @@ -1,48 +0,0 @@
> -/* { dg-do compile } */
> -/* { dg-options "-O1 -fdump-tree-slsr-details" } */
> -
> -#include <stdio.h>
> -
> -#define W1 22725
> -#define W2 21407
> -#define W3 19266
> -#define W6 8867
> -
> -void idct_row(short *row, int *dst)
> -{
> -    int a0, a1, b0, b1;
> -
> -    a0 = W1 * row[0];
> -    a1 = a0;
> -
> -    a0 += W2 * row[2];
> -    a1 += W6 * row[2];
> -
> -    b0 = W1 * row[1];
> -    b1 = W3 * row[1];
> -
> -    dst[0] = a0 + b0;
> -    dst[1] = a0 - b0;
> -    dst[2] = a1 + b1;
> -    dst[3] = a1 - b1;
> -}
> -
> -static short block[8] = { 1, 2, 3, 4 };
> -
> -int main(void)
> -{
> -    int out[4];
> -    int i;
> -
> -    idct_row(block, out);
> -
> -    for (i = 0; i < 4; i++)
> -        printf("%d\n", out[i]);
> -
> -    return !(out[2] == 87858 && out[3] == 10794);
> -}
> -
> -/* For now, disable inserting an initializer when the multiplication will
> -   take place in a smaller type than originally.  This test may be deleted
> -   in future when this case is handled more precisely.  */
> -/* { dg-final { scan-tree-dump-times "Inserting initializer" 0 "slsr" { target { ! int16 } } } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c	(revision 241379)
> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c	(working copy)
> @@ -17,7 +17,12 @@ f (int s, int *c)
>    return x1 ? x2 : x3;
>  }
>
> +/* Note that since some branch prediction heuristics changed, the
> +   calculations of x2 and x3 are pushed downward into the legs
> +   of the conditional, changing the code presented to SLSR.
> +   However, this proves to be a useful test for introducing an
> +   initializer with a cast, so we'll keep it as is.  */
> +
>  /* There are 4 ' * ' instances in the decls (since "int * iftmp.0;" is
> -   added), 1 parm, 2 in the code.  The second one in the code can be
> -   a widening mult.  */
> -/* { dg-final { scan-tree-dump-times " w?\\* " 7 "optimized" } } */
> +   added), 2 parms, 3 in the code.  */
> +/* { dg-final { scan-tree-dump-times " \\* " 9 "optimized" } } */
>
>
On 24 October 2016 at 12:55, Richard Biener <richard.guenther@gmail.com> wrote:
> On Fri, Oct 21, 2016 at 11:46 PM, Bill Schmidt
> <wschmidt@linux.vnet.ibm.com> wrote:
>> Hi,
>>
>> I've been meaning for some time now to do a better job handling strength
>> reduction candidates where the stride of the candidate and its basis
>> involves a cast (usually widening) from another type.  The two PRs
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71490 and
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71915 show that we miss
>> strength reduction opportunities in such cases.  Essentially the way I
>> was handling this before required a cast to go back to the original
>> type, which was illegal when this was a narrowing operation or a cast
>> from a wrapping type to a non-wrapping type.
>>
>> This patch improves matters by tracking the effective type of the stride
>> so that we perform arithmetic in that type directly.  This allows us to
>> strength-reduce some cases missed before.  We still have to be careful
>> not to ever perform a narrowing or wrap-to-nonwrap conversion, but these
>> cases don't come up nearly so often in practice with this patch.
>>
>> The patch is pretty straightforward but rather large, so I'd like to
>> leave it up for review for a week or so before committing it -- extra
>> eyes welcome!
>>
>> There is an existing test case that checks for the narrowing problem,
>> and now fails with this patch as it should.  That is, it's now legal to
>> insert the initializer that we formerly knew to be bad for this test.
>> Since the test no longer serves any purpose, I'm deleting it.
>>
>> gcc.dg/tree-ssa/slsr-8.c generates quite different intermediate code now
>> than when I first added it to the test suite.  As a result of various
>> optimization changes, it's no longer maintained as a single block;
>> instead, the optimizable computations are sunk into two legs of a
>> conditional.  This exposed another similar case, leading to the bug
>> reports.  This test now passes, but I had to adjust it for the new code
>> generation we get.  I also added some commentary to indicate what we
>> expect to happen in that test, since it isn't too obvious.
>>
>> I've bootstrapped this and tested it on powerpc64le-unknown-linux-gnu
>> with no regressions.  To avoid the Wrath of Octoploid ;), I've also
>> tested it against ffmpeg using an x86_64-pc-linux-gnu cross with -O3
>> -march=amdfam10, also with no failures.  I've also verified that we hit
>> this code about 35 times in the test suite, so it looks like we have
>> some decent additional test coverage.
>>
>> Thanks in advance to anyone willing to take a look.
>
> LGTM.
>

Hi,

Since this was committed, I'm seeing a failure in
gcc.dg/tree-ssa/slsr-8.c scan-tree-dump-times optimized " \\* " 9
on aarch64 targets.

Christophe

> Richard.
sizetype >> + : TREE_TYPE (addend_in)); >> } >> >> c = alloc_cand_and_find_basis (CAND_ADD, gs, base, index, stride, >> - ctype, savings); >> + ctype, stype, savings); >> return c; >> } >> >> @@ -1329,6 +1351,7 @@ create_add_imm_cand (gimple *gs, tree base_in, con >> { >> enum cand_kind kind = CAND_ADD; >> tree base = NULL_TREE, stride = NULL_TREE, ctype = NULL_TREE; >> + tree stype = NULL_TREE; >> widest_int index, multiple; >> unsigned savings = 0; >> slsr_cand_t c; >> @@ -1356,6 +1379,7 @@ create_add_imm_cand (gimple *gs, tree base_in, con >> index = base_cand->index + multiple; >> stride = base_cand->stride; >> ctype = base_cand->cand_type; >> + stype = base_cand->stride_type; >> if (has_single_use (base_in)) >> savings = (base_cand->dead_savings >> + stmt_cost (base_cand->cand_stmt, speed)); >> @@ -1376,10 +1400,11 @@ create_add_imm_cand (gimple *gs, tree base_in, con >> index = index_in; >> stride = integer_one_node; >> ctype = TREE_TYPE (base_in); >> + stype = sizetype; >> } >> >> c = alloc_cand_and_find_basis (kind, gs, base, index, stride, >> - ctype, savings); >> + ctype, stype, savings); >> return c; >> } >> >> @@ -1456,14 +1481,11 @@ slsr_process_neg (gimple *gs, tree rhs1, bool spee >> for more details. 
*/ >> >> static bool >> -legal_cast_p_1 (tree lhs, tree rhs) >> +legal_cast_p_1 (tree lhs_type, tree rhs_type) >> { >> - tree lhs_type, rhs_type; >> unsigned lhs_size, rhs_size; >> bool lhs_wraps, rhs_wraps; >> >> - lhs_type = TREE_TYPE (lhs); >> - rhs_type = TREE_TYPE (rhs); >> lhs_size = TYPE_PRECISION (lhs_type); >> rhs_size = TYPE_PRECISION (rhs_type); >> lhs_wraps = ANY_INTEGRAL_TYPE_P (lhs_type) && TYPE_OVERFLOW_WRAPS (lhs_type); >> @@ -1521,7 +1543,7 @@ legal_cast_p (gimple *gs, tree rhs) >> || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (gs))) >> return false; >> >> - return legal_cast_p_1 (gimple_assign_lhs (gs), rhs); >> + return legal_cast_p_1 (TREE_TYPE (gimple_assign_lhs (gs)), TREE_TYPE (rhs)); >> } >> >> /* Given GS which is a cast to a scalar integer type, determine whether >> @@ -1556,7 +1578,8 @@ slsr_process_cast (gimple *gs, tree rhs1, bool spe >> c = alloc_cand_and_find_basis (base_cand->kind, gs, >> base_cand->base_expr, >> base_cand->index, base_cand->stride, >> - ctype, savings); >> + ctype, base_cand->stride_type, >> + savings); >> if (base_cand->next_interp) >> base_cand = lookup_cand (base_cand->next_interp); >> else >> @@ -1574,10 +1597,10 @@ slsr_process_cast (gimple *gs, tree rhs1, bool spe >> The first of these is somewhat arbitrary, but the choice of >> 1 for the stride simplifies the logic for propagating casts >> into their uses. 
*/ >> - c = alloc_cand_and_find_basis (CAND_ADD, gs, rhs1, >> - 0, integer_one_node, ctype, 0); >> - c2 = alloc_cand_and_find_basis (CAND_MULT, gs, rhs1, >> - 0, integer_one_node, ctype, 0); >> + c = alloc_cand_and_find_basis (CAND_ADD, gs, rhs1, 0, >> + integer_one_node, ctype, sizetype, 0); >> + c2 = alloc_cand_and_find_basis (CAND_MULT, gs, rhs1, 0, >> + integer_one_node, ctype, sizetype, 0); >> c->next_interp = c2->cand_num; >> } >> >> @@ -1613,7 +1636,8 @@ slsr_process_copy (gimple *gs, tree rhs1, bool spe >> c = alloc_cand_and_find_basis (base_cand->kind, gs, >> base_cand->base_expr, >> base_cand->index, base_cand->stride, >> - base_cand->cand_type, savings); >> + base_cand->cand_type, >> + base_cand->stride_type, savings); >> if (base_cand->next_interp) >> base_cand = lookup_cand (base_cand->next_interp); >> else >> @@ -1631,10 +1655,12 @@ slsr_process_copy (gimple *gs, tree rhs1, bool spe >> The first of these is somewhat arbitrary, but the choice of >> 1 for the stride simplifies the logic for propagating casts >> into their uses. 
*/ >> - c = alloc_cand_and_find_basis (CAND_ADD, gs, rhs1, >> - 0, integer_one_node, TREE_TYPE (rhs1), 0); >> - c2 = alloc_cand_and_find_basis (CAND_MULT, gs, rhs1, >> - 0, integer_one_node, TREE_TYPE (rhs1), 0); >> + c = alloc_cand_and_find_basis (CAND_ADD, gs, rhs1, 0, >> + integer_one_node, TREE_TYPE (rhs1), >> + sizetype, 0); >> + c2 = alloc_cand_and_find_basis (CAND_MULT, gs, rhs1, 0, >> + integer_one_node, TREE_TYPE (rhs1), >> + sizetype, 0); >> c->next_interp = c2->cand_num; >> } >> >> @@ -1755,6 +1781,13 @@ dump_candidate (slsr_cand_t c) >> fputs (" + ", dump_file); >> print_decs (c->index, dump_file); >> fputs (") * ", dump_file); >> + if (TREE_CODE (c->stride) != INTEGER_CST >> + && c->stride_type != TREE_TYPE (c->stride)) >> + { >> + fputs ("(", dump_file); >> + print_generic_expr (dump_file, c->stride_type, 0); >> + fputs (")", dump_file); >> + } >> print_generic_expr (dump_file, c->stride, 0); >> fputs (" : ", dump_file); >> break; >> @@ -1764,6 +1797,13 @@ dump_candidate (slsr_cand_t c) >> fputs (" + (", dump_file); >> print_decs (c->index, dump_file); >> fputs (" * ", dump_file); >> + if (TREE_CODE (c->stride) != INTEGER_CST >> + && c->stride_type != TREE_TYPE (c->stride)) >> + { >> + fputs ("(", dump_file); >> + print_generic_expr (dump_file, c->stride_type, 0); >> + fputs (")", dump_file); >> + } >> print_generic_expr (dump_file, c->stride, 0); >> fputs (") : ", dump_file); >> break; >> @@ -2143,7 +2183,7 @@ create_add_on_incoming_edge (slsr_cand_t c, tree b >> basic_block insert_bb; >> gimple_stmt_iterator gsi; >> tree lhs, basis_type; >> - gassign *new_stmt; >> + gassign *new_stmt, *cast_stmt = NULL; >> >> /* If the add candidate along this incoming edge has the same >> index as C's hidden basis, the hidden basis represents this >> @@ -2187,13 +2227,27 @@ create_add_on_incoming_edge (slsr_cand_t c, tree b >> new_stmt = gimple_build_assign (lhs, code, basis_name, >> incr_vec[i].initializer); >> } >> - else if (increment == 1) >> - new_stmt = 
gimple_build_assign (lhs, plus_code, basis_name, c->stride); >> - else if (increment == -1) >> - new_stmt = gimple_build_assign (lhs, MINUS_EXPR, basis_name, >> - c->stride); >> - else >> - gcc_unreachable (); >> + else { >> + tree stride; >> + >> + if (!types_compatible_p (TREE_TYPE (c->stride), c->stride_type)) >> + { >> + tree cast_stride = make_temp_ssa_name (c->stride_type, NULL, >> + "slsr"); >> + cast_stmt = gimple_build_assign (cast_stride, NOP_EXPR, >> + c->stride); >> + stride = cast_stride; >> + } >> + else >> + stride = c->stride; >> + >> + if (increment == 1) >> + new_stmt = gimple_build_assign (lhs, plus_code, basis_name, stride); >> + else if (increment == -1) >> + new_stmt = gimple_build_assign (lhs, MINUS_EXPR, basis_name, stride); >> + else >> + gcc_unreachable (); >> + } >> } >> >> insert_bb = single_succ_p (e->src) ? e->src : split_edge (e); >> @@ -2200,14 +2254,34 @@ create_add_on_incoming_edge (slsr_cand_t c, tree b >> gsi = gsi_last_bb (insert_bb); >> >> if (!gsi_end_p (gsi) && is_ctrl_stmt (gsi_stmt (gsi))) >> - gsi_insert_before (&gsi, new_stmt, GSI_NEW_STMT); >> + { >> + gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT); >> + if (cast_stmt) >> + { >> + gsi_insert_before (&gsi, cast_stmt, GSI_SAME_STMT); >> + gimple_set_location (cast_stmt, loc); >> + } >> + } >> else >> - gsi_insert_after (&gsi, new_stmt, GSI_NEW_STMT); >> + { >> + if (cast_stmt) >> + { >> + gsi_insert_after (&gsi, cast_stmt, GSI_NEW_STMT); >> + gimple_set_location (cast_stmt, loc); >> + } >> + gsi_insert_after (&gsi, new_stmt, GSI_NEW_STMT); >> + } >> >> gimple_set_location (new_stmt, loc); >> >> if (dump_file && (dump_flags & TDF_DETAILS)) >> { >> + if (cast_stmt) >> + { >> + fprintf (dump_file, "Inserting cast in block %d: ", >> + insert_bb->index); >> + print_gimple_stmt (dump_file, cast_stmt, 0, 0); >> + } >> fprintf (dump_file, "Inserting in block %d: ", insert_bb->index); >> print_gimple_stmt (dump_file, new_stmt, 0, 0); >> } >> @@ -2825,40 +2899,35 @@ 
analyze_increments (slsr_cand_t first_dep, machine >> && !POINTER_TYPE_P (first_dep->cand_type))) >> incr_vec[i].cost = COST_NEUTRAL; >> >> - /* FORNOW: If we need to add an initializer, give up if a cast from >> - the candidate's type to its stride's type can lose precision. >> - This could eventually be handled better by expressly retaining the >> - result of a cast to a wider type in the stride. Example: >> + /* If we need to add an initializer, give up if a cast from the >> + candidate's type to its stride's type can lose precision. >> + Note that this already takes into account that the stride may >> + have been cast to a wider type, in which case this test won't >> + fire. Example: >> >> short int _1; >> _2 = (int) _1; >> _3 = _2 * 10; >> - _4 = x + _3; ADD: x + (10 * _1) : int >> + _4 = x + _3; ADD: x + (10 * (int)_1) : int >> _5 = _2 * 15; >> - _6 = x + _3; ADD: x + (15 * _1) : int >> + _6 = x + _5; ADD: x + (15 * (int)_1) : int >> >> - Right now replacing _6 would cause insertion of an initializer >> - of the form "short int T = _1 * 5;" followed by a cast to >> - int, which could overflow incorrectly. Had we recorded _2 or >> - (int)_1 as the stride, this wouldn't happen. However, doing >> - this breaks other opportunities, so this will require some >> - care. */ >> + Although the stride was a short int initially, the stride >> + used in the analysis has been widened to an int, and such >> + widening will be done in the initializer as well. */ >> else if (!incr_vec[i].initializer >> && TREE_CODE (first_dep->stride) != INTEGER_CST >> - && !legal_cast_p_1 (first_dep->stride, >> - gimple_assign_lhs (first_dep->cand_stmt))) >> - >> + && !legal_cast_p_1 (first_dep->stride_type, >> + TREE_TYPE (gimple_assign_lhs >> + (first_dep->cand_stmt)))) >> incr_vec[i].cost = COST_INFINITE; >> >> /* If we need to add an initializer, make sure we don't introduce >> a multiply by a pointer type, which can happen in certain cast >> - scenarios. 
FIXME: When cleaning up these cast issues, we can >> - afford to introduce the multiply provided we cast out to an >> - unsigned int of appropriate size. */ >> + scenarios. */ >> else if (!incr_vec[i].initializer >> && TREE_CODE (first_dep->stride) != INTEGER_CST >> - && POINTER_TYPE_P (TREE_TYPE (first_dep->stride))) >> - >> + && POINTER_TYPE_P (first_dep->stride_type)) >> incr_vec[i].cost = COST_INFINITE; >> >> /* For any other increment, if this is a multiply candidate, we >> @@ -3105,7 +3174,8 @@ insert_initializers (slsr_cand_t c) >> basic_block bb; >> slsr_cand_t where = NULL; >> gassign *init_stmt; >> - tree stride_type, new_name, incr_tree; >> + gassign *cast_stmt = NULL; >> + tree new_name, incr_tree, init_stride; >> widest_int incr = incr_vec[i].incr; >> >> if (!profitable_increment_p (i) >> @@ -3134,31 +3204,63 @@ insert_initializers (slsr_cand_t c) >> that block, the earliest one will be returned in WHERE. */ >> bb = nearest_common_dominator_for_cands (c, incr, &where); >> >> + /* If the nominal stride has a different type than the recorded >> + stride type, build a cast from the nominal stride to that type. */ >> + if (!types_compatible_p (TREE_TYPE (c->stride), c->stride_type)) >> + { >> + init_stride = make_temp_ssa_name (c->stride_type, NULL, "slsr"); >> + cast_stmt = gimple_build_assign (init_stride, NOP_EXPR, c->stride); >> + } >> + else >> + init_stride = c->stride; >> + >> /* Create a new SSA name to hold the initializer's value. */ >> - stride_type = TREE_TYPE (c->stride); >> - new_name = make_temp_ssa_name (stride_type, NULL, "slsr"); >> + new_name = make_temp_ssa_name (c->stride_type, NULL, "slsr"); >> incr_vec[i].initializer = new_name; >> >> /* Create the initializer and insert it in the latest possible >> dominating position. 
*/ >> - incr_tree = wide_int_to_tree (stride_type, incr); >> + incr_tree = wide_int_to_tree (c->stride_type, incr); >> init_stmt = gimple_build_assign (new_name, MULT_EXPR, >> - c->stride, incr_tree); >> + init_stride, incr_tree); >> if (where) >> { >> gimple_stmt_iterator gsi = gsi_for_stmt (where->cand_stmt); >> + location_t loc = gimple_location (where->cand_stmt); >> + >> + if (cast_stmt) >> + { >> + gsi_insert_before (&gsi, cast_stmt, GSI_SAME_STMT); >> + gimple_set_location (cast_stmt, loc); >> + } >> + >> gsi_insert_before (&gsi, init_stmt, GSI_SAME_STMT); >> - gimple_set_location (init_stmt, gimple_location (where->cand_stmt)); >> + gimple_set_location (init_stmt, loc); >> } >> else >> { >> gimple_stmt_iterator gsi = gsi_last_bb (bb); >> gimple *basis_stmt = lookup_cand (c->basis)->cand_stmt; >> + location_t loc = gimple_location (basis_stmt); >> >> if (!gsi_end_p (gsi) && is_ctrl_stmt (gsi_stmt (gsi))) >> - gsi_insert_before (&gsi, init_stmt, GSI_SAME_STMT); >> + { >> + if (cast_stmt) >> + { >> + gsi_insert_before (&gsi, cast_stmt, GSI_SAME_STMT); >> + gimple_set_location (cast_stmt, loc); >> + } >> + gsi_insert_before (&gsi, init_stmt, GSI_SAME_STMT); >> + } >> else >> - gsi_insert_after (&gsi, init_stmt, GSI_SAME_STMT); >> + { >> + if (cast_stmt) >> + { >> + gsi_insert_after (&gsi, cast_stmt, GSI_NEW_STMT); >> + gimple_set_location (cast_stmt, loc); >> + } >> + gsi_insert_after (&gsi, init_stmt, GSI_SAME_STMT); >> + } >> >> gimple_set_location (init_stmt, gimple_location (basis_stmt)); >> } >> @@ -3165,6 +3267,11 @@ insert_initializers (slsr_cand_t c) >> >> if (dump_file && (dump_flags & TDF_DETAILS)) >> { >> + if (cast_stmt) >> + { >> + fputs ("Inserting stride cast: ", dump_file); >> + print_gimple_stmt (dump_file, cast_stmt, 0, 0); >> + } >> fputs ("Inserting initializer: ", dump_file); >> print_gimple_stmt (dump_file, init_stmt, 0, 0); >> } >> Index: gcc/testsuite/gcc.dg/tree-ssa/pr54245.c >> 
=================================================================== >> --- gcc/testsuite/gcc.dg/tree-ssa/pr54245.c (revision 241379) >> +++ gcc/testsuite/gcc.dg/tree-ssa/pr54245.c (working copy) >> @@ -1,48 +0,0 @@ >> -/* { dg-do compile } */ >> -/* { dg-options "-O1 -fdump-tree-slsr-details" } */ >> - >> -#include <stdio.h> >> - >> -#define W1 22725 >> -#define W2 21407 >> -#define W3 19266 >> -#define W6 8867 >> - >> -void idct_row(short *row, int *dst) >> -{ >> - int a0, a1, b0, b1; >> - >> - a0 = W1 * row[0]; >> - a1 = a0; >> - >> - a0 += W2 * row[2]; >> - a1 += W6 * row[2]; >> - >> - b0 = W1 * row[1]; >> - b1 = W3 * row[1]; >> - >> - dst[0] = a0 + b0; >> - dst[1] = a0 - b0; >> - dst[2] = a1 + b1; >> - dst[3] = a1 - b1; >> -} >> - >> -static short block[8] = { 1, 2, 3, 4 }; >> - >> -int main(void) >> -{ >> - int out[4]; >> - int i; >> - >> - idct_row(block, out); >> - >> - for (i = 0; i < 4; i++) >> - printf("%d\n", out[i]); >> - >> - return !(out[2] == 87858 && out[3] == 10794); >> -} >> - >> -/* For now, disable inserting an initializer when the multiplication will >> - take place in a smaller type than originally. This test may be deleted >> - in future when this case is handled more precisely. */ >> -/* { dg-final { scan-tree-dump-times "Inserting initializer" 0 "slsr" { target { ! int16 } } } } */ >> Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c >> =================================================================== >> --- gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c (revision 241379) >> +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c (working copy) >> @@ -17,7 +17,12 @@ f (int s, int *c) >> return x1 ? x2 : x3; >> } >> >> +/* Note that since some branch prediction heuristics changed, the >> + calculations of x2 and x3 are pushed downward into the legs >> + of the conditional, changing the code presented to SLSR. >> + However, this proves to be a useful test for introducing an >> + initializer with a cast, so we'll keep it as is. 
*/ >> + >> /* There are 4 ' * ' instances in the decls (since "int * iftmp.0;" is >> - added), 1 parm, 2 in the code. The second one in the code can be >> - a widening mult. */ >> -/* { dg-final { scan-tree-dump-times " w?\\* " 7 "optimized" } } */ >> + added), 2 parms, 3 in the code. */ >> +/* { dg-final { scan-tree-dump-times " \\* " 9 "optimized" } } */ >> >>
On Nov 4, 2016, at 9:10 AM, Christophe Lyon <christophe.lyon@linaro.org> wrote: > > > Hi, > > Since this was committed, I'm seeing a failure in > gcc.dg/tree-ssa/slsr-8.c scan-tree-dump-times optimized " \\* " 9 > on aarch64 targets. Hi Christophe, Best open a bug report or I will likely lose this in the crush. If you don't mind, please include the generated assembly file so I can see why the number of *'s is off, and provide the output from -fdump-tree-slsr-details. I imagine this is a minor issue. Please CC me as wschmidt@gcc.gnu.org. Thanks! Bill
On 4 November 2016 at 15:36, Bill Schmidt <wschmidt@linux.vnet.ibm.com> wrote: > On Nov 4, 2016, at 9:10 AM, Christophe Lyon <christophe.lyon@linaro.org> wrote: >> >> >> Hi, >> >> Since this was committed, I'm seeing a failure in >> gcc.dg/tree-ssa/slsr-8.c scan-tree-dump-times optimized " \\* " 9 >> on aarch64 targets. > > Hi Christophe, > > Best open a bug report or I will likely lose this in the crush. If you don't mind, > please include the generated assembly file so I can see why the number of *'s > is off, and provide the output from -fdump-tree-slsr-details. I imagine this is > a minor issue. Please CC me as wschmidt@gcc.gnu.org. > OK, I've created pr78210 for this. Christophe > Thanks! > Bill
*/ else if (!incr_vec[i].initializer && TREE_CODE (first_dep->stride) != INTEGER_CST - && POINTER_TYPE_P (TREE_TYPE (first_dep->stride))) - + && POINTER_TYPE_P (first_dep->stride_type)) incr_vec[i].cost = COST_INFINITE; /* For any other increment, if this is a multiply candidate, we @@ -3105,7 +3174,8 @@ insert_initializers (slsr_cand_t c) basic_block bb; slsr_cand_t where = NULL; gassign *init_stmt; - tree stride_type, new_name, incr_tree; + gassign *cast_stmt = NULL; + tree new_name, incr_tree, init_stride; widest_int incr = incr_vec[i].incr; if (!profitable_increment_p (i) @@ -3134,31 +3204,63 @@ insert_initializers (slsr_cand_t c) that block, the earliest one will be returned in WHERE. */ bb = nearest_common_dominator_for_cands (c, incr, &where); + /* If the nominal stride has a different type than the recorded + stride type, build a cast from the nominal stride to that type. */ + if (!types_compatible_p (TREE_TYPE (c->stride), c->stride_type)) + { + init_stride = make_temp_ssa_name (c->stride_type, NULL, "slsr"); + cast_stmt = gimple_build_assign (init_stride, NOP_EXPR, c->stride); + } + else + init_stride = c->stride; + /* Create a new SSA name to hold the initializer's value. */ - stride_type = TREE_TYPE (c->stride); - new_name = make_temp_ssa_name (stride_type, NULL, "slsr"); + new_name = make_temp_ssa_name (c->stride_type, NULL, "slsr"); incr_vec[i].initializer = new_name; /* Create the initializer and insert it in the latest possible dominating position. 
*/ - incr_tree = wide_int_to_tree (stride_type, incr); + incr_tree = wide_int_to_tree (c->stride_type, incr); init_stmt = gimple_build_assign (new_name, MULT_EXPR, - c->stride, incr_tree); + init_stride, incr_tree); if (where) { gimple_stmt_iterator gsi = gsi_for_stmt (where->cand_stmt); + location_t loc = gimple_location (where->cand_stmt); + + if (cast_stmt) + { + gsi_insert_before (&gsi, cast_stmt, GSI_SAME_STMT); + gimple_set_location (cast_stmt, loc); + } + gsi_insert_before (&gsi, init_stmt, GSI_SAME_STMT); - gimple_set_location (init_stmt, gimple_location (where->cand_stmt)); + gimple_set_location (init_stmt, loc); } else { gimple_stmt_iterator gsi = gsi_last_bb (bb); gimple *basis_stmt = lookup_cand (c->basis)->cand_stmt; + location_t loc = gimple_location (basis_stmt); if (!gsi_end_p (gsi) && is_ctrl_stmt (gsi_stmt (gsi))) - gsi_insert_before (&gsi, init_stmt, GSI_SAME_STMT); + { + if (cast_stmt) + { + gsi_insert_before (&gsi, cast_stmt, GSI_SAME_STMT); + gimple_set_location (cast_stmt, loc); + } + gsi_insert_before (&gsi, init_stmt, GSI_SAME_STMT); + } else - gsi_insert_after (&gsi, init_stmt, GSI_SAME_STMT); + { + if (cast_stmt) + { + gsi_insert_after (&gsi, cast_stmt, GSI_NEW_STMT); + gimple_set_location (cast_stmt, loc); + } + gsi_insert_after (&gsi, init_stmt, GSI_SAME_STMT); + } gimple_set_location (init_stmt, gimple_location (basis_stmt)); } @@ -3165,6 +3267,11 @@ insert_initializers (slsr_cand_t c) if (dump_file && (dump_flags & TDF_DETAILS)) { + if (cast_stmt) + { + fputs ("Inserting stride cast: ", dump_file); + print_gimple_stmt (dump_file, cast_stmt, 0, 0); + } fputs ("Inserting initializer: ", dump_file); print_gimple_stmt (dump_file, init_stmt, 0, 0); } Index: gcc/testsuite/gcc.dg/tree-ssa/pr54245.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/pr54245.c (revision 241379) +++ gcc/testsuite/gcc.dg/tree-ssa/pr54245.c (working copy) @@ -1,48 +0,0 @@ -/* { dg-do compile } */ -/* { dg-options 
"-O1 -fdump-tree-slsr-details" } */ - -#include <stdio.h> - -#define W1 22725 -#define W2 21407 -#define W3 19266 -#define W6 8867 - -void idct_row(short *row, int *dst) -{ - int a0, a1, b0, b1; - - a0 = W1 * row[0]; - a1 = a0; - - a0 += W2 * row[2]; - a1 += W6 * row[2]; - - b0 = W1 * row[1]; - b1 = W3 * row[1]; - - dst[0] = a0 + b0; - dst[1] = a0 - b0; - dst[2] = a1 + b1; - dst[3] = a1 - b1; -} - -static short block[8] = { 1, 2, 3, 4 }; - -int main(void) -{ - int out[4]; - int i; - - idct_row(block, out); - - for (i = 0; i < 4; i++) - printf("%d\n", out[i]); - - return !(out[2] == 87858 && out[3] == 10794); -} - -/* For now, disable inserting an initializer when the multiplication will - take place in a smaller type than originally. This test may be deleted - in future when this case is handled more precisely. */ -/* { dg-final { scan-tree-dump-times "Inserting initializer" 0 "slsr" { target { ! int16 } } } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c (revision 241379) +++ gcc/testsuite/gcc.dg/tree-ssa/slsr-8.c (working copy) @@ -17,7 +17,12 @@ f (int s, int *c) return x1 ? x2 : x3; } +/* Note that since some branch prediction heuristics changed, the + calculations of x2 and x3 are pushed downward into the legs + of the conditional, changing the code presented to SLSR. + However, this proves to be a useful test for introducing an + initializer with a cast, so we'll keep it as is. */ + /* There are 4 ' * ' instances in the decls (since "int * iftmp.0;" is - added), 1 parm, 2 in the code. The second one in the code can be - a widening mult. */ -/* { dg-final { scan-tree-dump-times " w?\\* " 7 "optimized" } } */ + added), 2 parms, 3 in the code. */ +/* { dg-final { scan-tree-dump-times " \\* " 9 "optimized" } } */