From patchwork Tue Oct 16 17:32:26 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Hubicka X-Patchwork-Id: 191831 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id 78B542C0098 for ; Wed, 17 Oct 2012 04:34:02 +1100 (EST) Comment: DKIM? See http://www.dkim.org DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=gcc.gnu.org; s=default; x=1351013642; h=Comment: DomainKey-Signature:Received:Received:Received:Received:Date: From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To:User-Agent: Mailing-List:Precedence:List-Id:List-Unsubscribe:List-Archive: List-Post:List-Help:Sender:Delivered-To; bh=FPe3XAsfxYkf8x7wCRqH HyV27Ak=; b=WsuNMwmOVtQVAEXWutY9q+RFdPmh+Hm0+n3z9ENY3/hsXKSSpwrk 3NVyJdfod//o5s+c959mQDr61pBQhGoK8omXgbEZZ3o5qsfvtkbOi2emy8ut+2lT qougxJoTyJiPag6fZ01KlqORfC23pq0mxsv/MF0ARhwBEWgZ08ZxKMc= Comment: DomainKeys? See http://antispam.yahoo.com/domainkeys DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=default; d=gcc.gnu.org; h=Received:Received:X-SWARE-Spam-Status:X-Spam-Check-By:Received:Received:Date:From:To:Cc:Subject:Message-ID:References:MIME-Version:Content-Type:Content-Disposition:In-Reply-To:User-Agent:Mailing-List:Precedence:List-Id:List-Unsubscribe:List-Archive:List-Post:List-Help:Sender:Delivered-To; b=uWBC6IBtlBrB7TNStJvdDu0EfRK9yUqMMAg7odVUQTs1jqZ+VfYEVrQXxrBwSc uCwmITh8LbOhiQ46PFjmgELP574lUwgl3nsTBhYh4cm1zdWkNSXIiKnIYRhm3BSK STDoFVoYFW6K2pLxo7Ql/RCYYrOD56hX/v10ohZYKrT6k=; Received: (qmail 31006 invoked by alias); 16 Oct 2012 17:33:16 -0000 Received: (qmail 30886 invoked by uid 22791); 16 Oct 2012 17:33:12 -0000 X-SWARE-Spam-Status: No, hits=-4.0 required=5.0 tests=AWL, BAYES_00, KHOP_RCVD_UNTRUST, RCVD_IN_DNSWL_LOW, RCVD_IN_HOSTKARMA_W, RCVD_IN_HOSTKARMA_WL, RP_MATCHES_RCVD, TW_CF, TW_TM X-Spam-Check-By: sourceware.org Received: from nikam.ms.mff.cuni.cz (HELO nikam.ms.mff.cuni.cz) (195.113.20.16) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Tue, 16 Oct 2012 17:32:29 +0000 Received: by nikam.ms.mff.cuni.cz (Postfix, from userid 16202) id C74AD5437CD; Tue, 16 Oct 2012 19:32:26 +0200 (CEST) Date: Tue, 16 Oct 2012 19:32:26 +0200 From: Jan Hubicka To: Jan Hubicka Cc: Richard Biener , gcc-patches@gcc.gnu.org Subject: Re: Make try_unroll_loop_completely to use loop bounds recorded Message-ID: <20121016173226.GA1373@kam.mff.cuni.cz> References: <20121011162109.GC32179@kam.mff.cuni.cz> <20121012132430.GA18634@kam.mff.cuni.cz> <20121013205922.GA2254@kam.mff.cuni.cz> <20121014194308.GA19314@kam.mff.cuni.cz> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20121014194308.GA19314@kam.mff.cuni.cz> User-Agent: Mutt/1.5.20 (2009-06-14) Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Hi, here is third revised version of the complette unroling changes. While working on the RTL variant I noticed PR54937 and the fact that I was overly aggressive on forcing single exit of the last iteration to be taken, because loop may terminate otherwise (by EH or by exitting the program). Same thinko is in loop-niter. This patch adds loop_edge_to_cancel that is more conservative: it looks for the exit conditional where the non-exitting edges leads to latch and verifies that latch contains no statement with side effect that may terminate the loop. This still actually matches quite few non-single-exit loops and works well in practice. Unlike previous revision it also enables complette unrolling when code size does not grow even for non-innermost loops (with update in tree_unroll_loops_completely to walk them). This is something we did on RTL land but missed in trees. This actually enables quite some optimizations when things can be propagated to the tiny inner loop body. I also fixed accounting in tree_estimate_loop_size for the cases where last iteration is not going to be updated. Finally I added code constructing __bulitin_unreachable as suggested by Ian. Bootstrapped/regtested x86_64-linux, also bootstrapped with -O3 and -Werror disabled and benchmarked. Among best benefits is about 7% improvement on Applu, and it causes up to 15% improvements on vectorized loops with small iteration counts (by completelly peeling the precondition code). There are no real performance regressions but there is some code size bloat. I plan to followup with strenghtening the heuristic to disable unrolling when benefits are absymal. Easy is to limit unrolling on loops with CFG and/or calls in them. We already have quite informed analysis in place. I also plan to move simple FDO guided loop peeling from RTL level to trees to enable more propagation into peeled sequences. The patch also triggers bug in niter and requires xfailing do_1.f90 testcase. I filled PR 54932 to track this. There are also confused array bound warnings I hope to track incrementally, too, by recording statements that are known to become unreachable in the last iteration and adding __buitin_unreachable in front of them. This is also important to avoid duplication leading to dead code: no other optimizers force paths leading to out of bound accesses to not happen. Honza * tree-ssa-loop-ivcanon.c (tree_estimate_loop_size): Add edge_to_cancel parameter and use it to estimate code optimized out in the final iteration. (loop_edge_to_cancel): New function. (try_unroll_loop_completely): New IRRED_IVALIDATED parameter; handle unrolling loops with bounds given via max_loop_iteratins; handle unrolling non-inner loops when code size shrinks; tidy dump output; when the last iteration loop still stays as loop in the CFG forcongly redirect the latch to __builtin_unreachable. (canonicalize_loop_induction_variables): Add irred_invlaidated parameter; record niter bound derrived; dump max_loop_iterations bounds; call try_unroll_loop_completely even if no niter bound is given. (canonicalize_induction_variables): Handle irred_invalidated. (tree_unroll_loops_completely): Handle non-innermost loops; handle irred_invalidated. * cfgloop.h (unlop): Declare. * cfgloopmanip.c (unloop): Export. * tree.c (build_common_builtin_nodes): Build BULTIN_UNREACHABLE. * gcc.target/i386/l_fma_float_?.c: Update. * gcc.target/i386/l_fma_double_?.c: Update. * gfortran.dg/do_1.f90: XFAIL * gcc.dg/tree-ssa/cunroll-1.c: New testcase. * gcc.dg/tree-ssa/cunroll-2.c: New testcase. * gcc.dg/tree-ssa/cunroll-3.c: New testcase. * gcc.dg/tree-ssa/cunroll-4.c: New testcase. * gcc.dg/tree-ssa/cunroll-5.c: New testcase. * gcc.dg/tree-ssa/ldist-17.c: Block cunroll to make testcase still valid. Index: tree-ssa-loop-ivcanon.c =================================================================== --- tree-ssa-loop-ivcanon.c (revision 192483) +++ tree-ssa-loop-ivcanon.c (working copy) @@ -192,7 +192,7 @@ constant_after_peeling (tree op, gimple Return results in SIZE, estimate benefits for complete unrolling exiting by EXIT. */ static void -tree_estimate_loop_size (struct loop *loop, edge exit, struct loop_size *size) +tree_estimate_loop_size (struct loop *loop, edge exit, edge edge_to_cancel, struct loop_size *size) { basic_block *body = get_loop_body (loop); gimple_stmt_iterator gsi; @@ -208,8 +208,8 @@ tree_estimate_loop_size (struct loop *lo fprintf (dump_file, "Estimating sizes for loop %i\n", loop->num); for (i = 0; i < loop->num_nodes; i++) { - if (exit && body[i] != exit->src - && dominated_by_p (CDI_DOMINATORS, body[i], exit->src)) + if (edge_to_cancel && body[i] != edge_to_cancel->src + && dominated_by_p (CDI_DOMINATORS, body[i], edge_to_cancel->src)) after_exit = true; else after_exit = false; @@ -231,7 +231,7 @@ tree_estimate_loop_size (struct loop *lo /* Look for reasons why we might optimize this stmt away. */ /* Exit conditional. */ - if (body[i] == exit->src && stmt == last_stmt (exit->src)) + if (exit && body[i] == exit->src && stmt == last_stmt (exit->src)) { if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, " Exit condition will be eliminated.\n"); @@ -314,36 +314,161 @@ estimated_unrolled_size (struct loop_siz return unr_insns; } +/* Loop LOOP is known to not loop. See if there is an edge in the loop + body that can be remove to make the loop to always exit and at + the same time it does not make any code potentially executed + during the last iteration dead. + + After complette unrolling we still may get rid of the conditional + on the exit in the last copy even if we have no idea what it does. + This is quite common case for loops of form + + int a[5]; + for (i=0;ilatch->preds) > 1) + return NULL; + + exits = get_loop_exit_edges (loop); + + FOR_EACH_VEC_ELT (edge, exits, i, edge_to_cancel) + { + /* Find the other edge than the loop exit + leaving the conditoinal. */ + if (EDGE_COUNT (edge_to_cancel->src->succs) != 2) + continue; + if (EDGE_SUCC (edge_to_cancel->src, 0) == edge_to_cancel) + edge_to_cancel = EDGE_SUCC (edge_to_cancel->src, 1); + else + edge_to_cancel = EDGE_SUCC (edge_to_cancel->src, 0); + + /* We should never have conditionals in the loop latch. */ + gcc_assert (edge_to_cancel->dest != loop->header); + + /* Check that it leads to loop latch. */ + if (edge_to_cancel->dest != loop->latch) + continue; + + VEC_free (edge, heap, exits); + + /* Verify that the code in loop latch does nothing that may end program + execution without really reaching the exit. This may include + non-pure/const function calls, EH statements, volatile ASMs etc. */ + for (gsi = gsi_start_bb (loop->latch); !gsi_end_p (gsi); gsi_next (&gsi)) + if (gimple_has_side_effects (gsi_stmt (gsi))) + return NULL; + return edge_to_cancel; + } + VEC_free (edge, heap, exits); + return NULL; +} + /* Tries to unroll LOOP completely, i.e. NITER times. UL determines which loops we are allowed to unroll. - EXIT is the exit of the loop that should be eliminated. */ + EXIT is the exit of the loop that should be eliminated. + IRRED_INVALIDATED is used to bookkeep if information about + irreducible regions may become invalid as a result + of the transformation. */ static bool try_unroll_loop_completely (struct loop *loop, edge exit, tree niter, - enum unroll_level ul) + enum unroll_level ul, + bool *irred_invalidated) { unsigned HOST_WIDE_INT n_unroll, ninsns, max_unroll, unr_insns; gimple cond; struct loop_size size; + bool n_unroll_found = false; + HOST_WIDE_INT maxiter; + basic_block latch; + edge latch_edge; + location_t locus; + int flags; + gimple stmt; + gimple_stmt_iterator gsi; + edge edge_to_cancel = NULL; + int num = loop->num; - if (loop->inner) - return false; + /* See if we proved number of iterations to be low constant. - if (!host_integerp (niter, 1)) + EXIT is an edge that will be removed in all but last iteration of + the loop. + + EDGE_TO_CACNEL is an edge that will be removed from the last iteration + of the unrolled sequence and is expected to make the final loop not + rolling. + + If the number of execution of loop is determined by standard induction + variable test, then EXIT and EDGE_TO_CANCEL are the two edges leaving + from the iv test. */ + if (host_integerp (niter, 1)) + { + n_unroll = tree_low_cst (niter, 1); + n_unroll_found = true; + edge_to_cancel = EDGE_SUCC (exit->src, 0); + if (edge_to_cancel == exit) + edge_to_cancel = EDGE_SUCC (exit->src, 1); + } + /* We do not know the number of iterations and thus we can not eliminate + the EXIT edge. */ + else + exit = NULL; + + /* See if we can improve our estimate by using recorded loop bounds. */ + maxiter = max_loop_iterations_int (loop); + if (maxiter >= 0 + && (!n_unroll_found || (unsigned HOST_WIDE_INT)maxiter < n_unroll)) + { + n_unroll = maxiter; + n_unroll_found = true; + /* Loop terminates before the IV variable test, so we can not + remove it in the last iteration. */ + edge_to_cancel = NULL; + } + + if (!n_unroll_found) return false; - n_unroll = tree_low_cst (niter, 1); max_unroll = PARAM_VALUE (PARAM_MAX_COMPLETELY_PEEL_TIMES); if (n_unroll > max_unroll) return false; + if (!edge_to_cancel) + edge_to_cancel = loop_edge_to_cancel (loop); + if (n_unroll) { + sbitmap wont_exit; + edge e; + unsigned i; + VEC (edge, heap) *to_remove = NULL; if (ul == UL_SINGLE_ITER) return false; - tree_estimate_loop_size (loop, exit, &size); + tree_estimate_loop_size (loop, exit, edge_to_cancel, &size); ninsns = size.overall; unr_insns = estimated_unrolled_size (&size, n_unroll); @@ -354,6 +479,18 @@ try_unroll_loop_completely (struct loop (int) unr_insns); } + /* We unroll only inner loops, because we do not consider it profitable + otheriwse. We still can cancel loopback edge of not rolling loop; + this is always a good idea. */ + if (loop->inner && unr_insns > ninsns) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "Not unrolling loop %d:" + "it is not innermost and code would grow.\n", + loop->num); + return false; + } + if (unr_insns > ninsns && (unr_insns > (unsigned) PARAM_VALUE (PARAM_MAX_COMPLETELY_PEELED_INSNS))) @@ -369,17 +506,10 @@ try_unroll_loop_completely (struct loop && unr_insns > ninsns) { if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, "Not unrolling loop %d.\n", loop->num); + fprintf (dump_file, "Not unrolling loop %d: size would grow.\n", + loop->num); return false; } - } - - if (n_unroll) - { - sbitmap wont_exit; - edge e; - unsigned i; - VEC (edge, heap) *to_remove = NULL; initialize_original_copy_tables (); wont_exit = sbitmap_alloc (n_unroll + 1); @@ -408,15 +538,67 @@ try_unroll_loop_completely (struct loop free_original_copy_tables (); } - cond = last_stmt (exit->src); - if (exit->flags & EDGE_TRUE_VALUE) - gimple_cond_make_true (cond); - else - gimple_cond_make_false (cond); - update_stmt (cond); + /* Remove the conditional from the last copy of the loop. */ + if (edge_to_cancel) + { + cond = last_stmt (edge_to_cancel->src); + if (edge_to_cancel->flags & EDGE_TRUE_VALUE) + gimple_cond_make_false (cond); + else + gimple_cond_make_true (cond); + update_stmt (cond); + /* Do not remove the path. Doing so may remove outer loop + and confuse bookkeeping code in tree_unroll_loops_completelly. */ + } + /* We did not manage to cancel the loop. + The loop latch remains reachable even if it will never be reached + at runtime. We must redirect it to somewhere, so create basic + block containg __builtin_unreachable call for this reason. */ + else + { + latch = loop->latch; + latch_edge = loop_latch_edge (loop); + flags = latch_edge->flags; + locus = latch_edge->goto_locus; + + /* Unloop destroys the latch edge. */ + unloop (loop, irred_invalidated); + + /* Create new basic block for the latch edge destination and wire + it in. */ + stmt = gimple_build_call (builtin_decl_implicit (BUILT_IN_UNREACHABLE), 0); + latch_edge = make_edge (latch, create_basic_block (NULL, NULL, latch), flags); + latch_edge->probability = 0; + latch_edge->count = 0; + latch_edge->flags |= flags; + latch_edge->goto_locus = locus; + + latch_edge->dest->loop_father = current_loops->tree_root; + latch_edge->dest->count = 0; + latch_edge->dest->frequency = 0; + set_immediate_dominator (CDI_DOMINATORS, latch_edge->dest, latch_edge->src); + + gsi = gsi_start_bb (latch_edge->dest); + gsi_insert_after (&gsi, stmt, GSI_NEW_STMT); + } if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, "Unrolled loop %d completely.\n", loop->num); + { + if (!n_unroll) + fprintf (dump_file, "Turned loop %d to non-loop; it never loops.\n", + num); + else + fprintf (dump_file, "Unrolled loop %d completely " + "(duplicated %i times).\n", num, (int)n_unroll); + if (exit) + fprintf (dump_file, "Exit condition of peeled iterations was " + "eliminated.\n"); + if (edge_to_cancel) + fprintf (dump_file, "Last iteration exit edge was proved true.\n"); + else + fprintf (dump_file, "Latch of last iteration was marked by " + "__builtin_unreachable ().\n"); + } return true; } @@ -425,12 +608,15 @@ try_unroll_loop_completely (struct loop CREATE_IV is true if we may create a new iv. UL determines which loops we are allowed to completely unroll. If TRY_EVAL is true, we try to determine the number of iterations of a loop by direct evaluation. - Returns true if cfg is changed. */ + Returns true if cfg is changed. + + IRRED_INVALIDATED is used to keep if irreducible reginos needs to be recomputed. */ static bool canonicalize_loop_induction_variables (struct loop *loop, bool create_iv, enum unroll_level ul, - bool try_eval) + bool try_eval, + bool *irred_invalidated) { edge exit = NULL; tree niter; @@ -455,22 +641,34 @@ canonicalize_loop_induction_variables (s || TREE_CODE (niter) != INTEGER_CST)) niter = find_loop_niter_by_eval (loop, &exit); - if (chrec_contains_undetermined (niter) - || TREE_CODE (niter) != INTEGER_CST) - return false; + if (TREE_CODE (niter) != INTEGER_CST) + exit = NULL; } - if (dump_file && (dump_flags & TDF_DETAILS)) + /* We work exceptionally hard here to estimate the bound + by find_loop_niter_by_eval. Be sure to keep it for future. */ + if (niter && TREE_CODE (niter) == INTEGER_CST) + record_niter_bound (loop, tree_to_double_int (niter), false, true); + + if (dump_file && (dump_flags & TDF_DETAILS) + && TREE_CODE (niter) == INTEGER_CST) { fprintf (dump_file, "Loop %d iterates ", loop->num); print_generic_expr (dump_file, niter, TDF_SLIM); fprintf (dump_file, " times.\n"); } + if (dump_file && (dump_flags & TDF_DETAILS) + && max_loop_iterations_int (loop) >= 0) + { + fprintf (dump_file, "Loop %d iterates at most %i times.\n", loop->num, + (int)max_loop_iterations_int (loop)); + } - if (try_unroll_loop_completely (loop, exit, niter, ul)) + if (try_unroll_loop_completely (loop, exit, niter, ul, irred_invalidated)) return true; - if (create_iv) + if (create_iv + && niter && !chrec_contains_undetermined (niter)) create_canonical_iv (loop, exit, niter); return false; @@ -485,15 +683,21 @@ canonicalize_induction_variables (void) loop_iterator li; struct loop *loop; bool changed = false; + bool irred_invalidated = false; FOR_EACH_LOOP (li, loop, 0) { changed |= canonicalize_loop_induction_variables (loop, true, UL_SINGLE_ITER, - true); + true, + &irred_invalidated); } gcc_assert (!need_ssa_update_p (cfun)); + if (irred_invalidated + && loops_state_satisfies_p (LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS)) + mark_irreducible_loops (); + /* Clean up the information about numbers of iterations, since brute force evaluation could reveal new information. */ scev_reset (); @@ -594,9 +798,10 @@ tree_unroll_loops_completely (bool may_i do { + bool irred_invalidated = false; changed = false; - FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST) + FOR_EACH_LOOP (li, loop, 0) { struct loop *loop_father = loop_outer (loop); @@ -609,7 +814,8 @@ tree_unroll_loops_completely (bool may_i ul = UL_NO_GROWTH; if (canonicalize_loop_induction_variables (loop, false, ul, - !flag_tree_loop_ivcanon)) + !flag_tree_loop_ivcanon, + &irred_invalidated)) { changed = true; /* If we'll continue unrolling, we need to propagate constants @@ -629,6 +835,10 @@ tree_unroll_loops_completely (bool may_i struct loop **iter; unsigned i; + if (irred_invalidated + && loops_state_satisfies_p (LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS)) + mark_irreducible_loops (); + update_ssa (TODO_update_ssa); /* Propagate the constants within the new basic blocks. */ Index: tree.c =================================================================== --- tree.c (revision 192483) +++ tree.c (working copy) @@ -9524,6 +9524,15 @@ build_common_builtin_nodes (void) tree tmp, ftype; int ecf_flags; + if (!builtin_decl_explicit_p (BUILT_IN_UNREACHABLE)) + { + ftype = build_function_type (void_type_node, void_list_node); + local_define_builtin ("__builtin_unreachable", ftype, BUILT_IN_UNREACHABLE, + "__builtin_unreachable", + ECF_NOTHROW | ECF_LEAF | ECF_NORETURN + | ECF_CONST | ECF_LEAF); + } + if (!builtin_decl_explicit_p (BUILT_IN_MEMCPY) || !builtin_decl_explicit_p (BUILT_IN_MEMMOVE)) { Index: cfgloop.h =================================================================== --- cfgloop.h (revision 192483) +++ cfgloop.h (working copy) @@ -320,7 +321,8 @@ extern struct loop *loopify (edge, edge, struct loop * loop_version (struct loop *, void *, basic_block *, unsigned, unsigned, unsigned, bool); extern bool remove_path (edge); -void scale_loop_frequencies (struct loop *, int, int); +extern void unloop (struct loop *, bool *); +extern void scale_loop_frequencies (struct loop *, int, int); /* Induction variable analysis. */ Index: cfgloopmanip.c =================================================================== --- cfgloopmanip.c (revision 192483) +++ cfgloopmanip.c (working copy) @@ -37,7 +37,6 @@ static int find_path (edge, basic_block static void fix_loop_placements (struct loop *, bool *); static bool fix_bb_placement (basic_block); static void fix_bb_placements (basic_block, bool *); -static void unloop (struct loop *, bool *); /* Checks whether basic block BB is dominated by DATA. */ static bool @@ -895,7 +894,7 @@ loopify (edge latch_edge, edge header_ed If this may cause the information about irreducible regions to become invalid, IRRED_INVALIDATED is set to true. */ -static void +void unloop (struct loop *loop, bool *irred_invalidated) { basic_block *body; Index: testsuite/gcc.target/i386/l_fma_float_5.c =================================================================== --- testsuite/gcc.target/i386/l_fma_float_5.c (revision 192483) +++ testsuite/gcc.target/i386/l_fma_float_5.c (working copy) @@ -12,7 +12,7 @@ /* { dg-final { scan-assembler-times "vfmsub132ps" 8 } } */ /* { dg-final { scan-assembler-times "vfnmadd132ps" 8 } } */ /* { dg-final { scan-assembler-times "vfnmsub132ps" 8 } } */ -/* { dg-final { scan-assembler-times "vfmadd132ss" 16 } } */ -/* { dg-final { scan-assembler-times "vfmsub132ss" 16 } } */ -/* { dg-final { scan-assembler-times "vfnmadd132ss" 16 } } */ -/* { dg-final { scan-assembler-times "vfnmsub132ss" 16 } } */ +/* { dg-final { scan-assembler-times "vfmadd132ss" 72 } } */ +/* { dg-final { scan-assembler-times "vfmsub132ss" 72 } } */ +/* { dg-final { scan-assembler-times "vfnmadd132ss" 72 } } */ +/* { dg-final { scan-assembler-times "vfnmsub132ss" 72 } } */ Index: testsuite/gcc.target/i386/l_fma_double_4.c =================================================================== --- testsuite/gcc.target/i386/l_fma_double_4.c (revision 192483) +++ testsuite/gcc.target/i386/l_fma_double_4.c (working copy) @@ -12,7 +12,7 @@ /* { dg-final { scan-assembler-times "vfmsub132pd" 8 } } */ /* { dg-final { scan-assembler-times "vfnmadd132pd" 8 } } */ /* { dg-final { scan-assembler-times "vfnmsub132pd" 8 } } */ -/* { dg-final { scan-assembler-times "vfmadd132sd" 16 } } */ -/* { dg-final { scan-assembler-times "vfmsub132sd" 16 } } */ -/* { dg-final { scan-assembler-times "vfnmadd132sd" 16 } } */ -/* { dg-final { scan-assembler-times "vfnmsub132sd" 16 } } */ +/* { dg-final { scan-assembler-times "vfmadd132sd" 40 } } */ +/* { dg-final { scan-assembler-times "vfmsub132sd" 40 } } */ +/* { dg-final { scan-assembler-times "vfnmadd132sd" 40 } } */ +/* { dg-final { scan-assembler-times "vfnmsub132sd" 40 } } */ Index: testsuite/gcc.target/i386/l_fma_float_2.c =================================================================== --- testsuite/gcc.target/i386/l_fma_float_2.c (revision 192483) +++ testsuite/gcc.target/i386/l_fma_float_2.c (working copy) @@ -12,7 +12,7 @@ /* { dg-final { scan-assembler-times "vfmsub132ps" 8 } } */ /* { dg-final { scan-assembler-times "vfnmadd132ps" 8 } } */ /* { dg-final { scan-assembler-times "vfnmsub132ps" 8 } } */ -/* { dg-final { scan-assembler-times "vfmadd132ss" 16 } } */ -/* { dg-final { scan-assembler-times "vfmsub132ss" 16 } } */ -/* { dg-final { scan-assembler-times "vfnmadd132ss" 16 } } */ -/* { dg-final { scan-assembler-times "vfnmsub132ss" 16 } } */ +/* { dg-final { scan-assembler-times "vfmadd132ss" 72 } } */ +/* { dg-final { scan-assembler-times "vfmsub132ss" 72 } } */ +/* { dg-final { scan-assembler-times "vfnmadd132ss" 72 } } */ +/* { dg-final { scan-assembler-times "vfnmsub132ss" 72 } } */ Index: testsuite/gcc.target/i386/l_fma_float_6.c =================================================================== --- testsuite/gcc.target/i386/l_fma_float_6.c (revision 192483) +++ testsuite/gcc.target/i386/l_fma_float_6.c (working copy) @@ -12,7 +12,7 @@ /* { dg-final { scan-assembler-times "vfmsub132ps" 8 } } */ /* { dg-final { scan-assembler-times "vfnmadd132ps" 8 } } */ /* { dg-final { scan-assembler-times "vfnmsub132ps" 8 } } */ -/* { dg-final { scan-assembler-times "vfmadd132ss" 16 } } */ -/* { dg-final { scan-assembler-times "vfmsub132ss" 16 } } */ -/* { dg-final { scan-assembler-times "vfnmadd132ss" 16 } } */ -/* { dg-final { scan-assembler-times "vfnmsub132ss" 16 } } */ +/* { dg-final { scan-assembler-times "vfmadd132ss" 72 } } */ +/* { dg-final { scan-assembler-times "vfmsub132ss" 72 } } */ +/* { dg-final { scan-assembler-times "vfnmadd132ss" 72 } } */ +/* { dg-final { scan-assembler-times "vfnmsub132ss" 72 } } */ Index: testsuite/gcc.target/i386/l_fma_double_1.c =================================================================== --- testsuite/gcc.target/i386/l_fma_double_1.c (revision 192483) +++ testsuite/gcc.target/i386/l_fma_double_1.c (working copy) @@ -16,11 +16,11 @@ /* { dg-final { scan-assembler-times "vfnmadd231pd" 4 } } */ /* { dg-final { scan-assembler-times "vfnmsub132pd" 4 } } */ /* { dg-final { scan-assembler-times "vfnmsub231pd" 4 } } */ -/* { dg-final { scan-assembler-times "vfmadd132sd" 8 } } */ -/* { dg-final { scan-assembler-times "vfmadd213sd" 8 } } */ -/* { dg-final { scan-assembler-times "vfmsub132sd" 8 } } */ -/* { dg-final { scan-assembler-times "vfmsub213sd" 8 } } */ -/* { dg-final { scan-assembler-times "vfnmadd132sd" 8 } } */ -/* { dg-final { scan-assembler-times "vfnmadd213sd" 8 } } */ -/* { dg-final { scan-assembler-times "vfnmsub132sd" 8 } } */ -/* { dg-final { scan-assembler-times "vfnmsub213sd" 8 } } */ +/* { dg-final { scan-assembler-times "vfmadd132sd" 20 } } */ +/* { dg-final { scan-assembler-times "vfmadd213sd" 20 } } */ +/* { dg-final { scan-assembler-times "vfmsub132sd" 20 } } */ +/* { dg-final { scan-assembler-times "vfmsub213sd" 20 } } */ +/* { dg-final { scan-assembler-times "vfnmadd132sd" 20 } } */ +/* { dg-final { scan-assembler-times "vfnmadd213sd" 20 } } */ +/* { dg-final { scan-assembler-times "vfnmsub132sd" 20 } } */ +/* { dg-final { scan-assembler-times "vfnmsub213sd" 20 } } */ Index: testsuite/gcc.target/i386/l_fma_double_5.c =================================================================== --- testsuite/gcc.target/i386/l_fma_double_5.c (revision 192483) +++ testsuite/gcc.target/i386/l_fma_double_5.c (working copy) @@ -12,7 +12,7 @@ /* { dg-final { scan-assembler-times "vfmsub132pd" 8 } } */ /* { dg-final { scan-assembler-times "vfnmadd132pd" 8 } } */ /* { dg-final { scan-assembler-times "vfnmsub132pd" 8 } } */ -/* { dg-final { scan-assembler-times "vfmadd132sd" 16 } } */ -/* { dg-final { scan-assembler-times "vfmsub132sd" 16 } } */ -/* { dg-final { scan-assembler-times "vfnmadd132sd" 16 } } */ -/* { dg-final { scan-assembler-times "vfnmsub132sd" 16 } } */ +/* { dg-final { scan-assembler-times "vfmadd132sd" 40 } } */ +/* { dg-final { scan-assembler-times "vfmsub132sd" 40 } } */ +/* { dg-final { scan-assembler-times "vfnmadd132sd" 40 } } */ +/* { dg-final { scan-assembler-times "vfnmsub132sd" 40 } } */ Index: testsuite/gcc.target/i386/l_fma_float_3.c =================================================================== --- testsuite/gcc.target/i386/l_fma_float_3.c (revision 192483) +++ testsuite/gcc.target/i386/l_fma_float_3.c (working copy) @@ -16,11 +16,11 @@ /* { dg-final { scan-assembler-times "vfnmadd231ps" 4 } } */ /* { dg-final { scan-assembler-times "vfnmsub132ps" 4 } } */ /* { dg-final { scan-assembler-times "vfnmsub231ps" 4 } } */ -/* { dg-final { scan-assembler-times "vfmadd132ss" 8 } } */ -/* { dg-final { scan-assembler-times "vfmadd213ss" 8 } } */ -/* { dg-final { scan-assembler-times "vfmsub132ss" 8 } } */ -/* { dg-final { scan-assembler-times "vfmsub213ss" 8 } } */ -/* { dg-final { scan-assembler-times "vfnmadd132ss" 8 } } */ -/* { dg-final { scan-assembler-times "vfnmadd213ss" 8 } } */ -/* { dg-final { scan-assembler-times "vfnmsub132ss" 8 } } */ -/* { dg-final { scan-assembler-times "vfnmsub213ss" 8 } } */ +/* { dg-final { scan-assembler-times "vfmadd132ss" 36 } } */ +/* { dg-final { scan-assembler-times "vfmadd213ss" 36 } } */ +/* { dg-final { scan-assembler-times "vfmsub132ss" 36 } } */ +/* { dg-final { scan-assembler-times "vfmsub213ss" 36 } } */ +/* { dg-final { scan-assembler-times "vfnmadd132ss" 36 } } */ +/* { dg-final { scan-assembler-times "vfnmadd213ss" 36 } } */ +/* { dg-final { scan-assembler-times "vfnmsub132ss" 36 } } */ +/* { dg-final { scan-assembler-times "vfnmsub213ss" 36 } } */ Index: testsuite/gcc.target/i386/l_fma_double_2.c =================================================================== --- testsuite/gcc.target/i386/l_fma_double_2.c (revision 192483) +++ testsuite/gcc.target/i386/l_fma_double_2.c (working copy) @@ -12,7 +12,7 @@ /* { dg-final { scan-assembler-times "vfmsub132pd" 8 } } */ /* { dg-final { scan-assembler-times "vfnmadd132pd" 8 } } */ /* { dg-final { scan-assembler-times "vfnmsub132pd" 8 } } */ -/* { dg-final { scan-assembler-times "vfmadd132sd" 16 } } */ -/* { dg-final { scan-assembler-times "vfmsub132sd" 16 } } */ -/* { dg-final { scan-assembler-times "vfnmadd132sd" 16 } } */ -/* { dg-final { scan-assembler-times "vfnmsub132sd" 16 } } */ +/* { dg-final { scan-assembler-times "vfmadd132sd" 40 } } */ +/* { dg-final { scan-assembler-times "vfmsub132sd" 40 } } */ +/* { dg-final { scan-assembler-times "vfnmadd132sd" 40 } } */ +/* { dg-final { scan-assembler-times "vfnmsub132sd" 40 } } */ Index: testsuite/gcc.target/i386/l_fma_double_6.c =================================================================== --- testsuite/gcc.target/i386/l_fma_double_6.c (revision 192483) +++ testsuite/gcc.target/i386/l_fma_double_6.c (working copy) @@ -12,7 +12,7 @@ /* { dg-final { scan-assembler-times "vfmsub132pd" 8 } } */ /* { dg-final { scan-assembler-times "vfnmadd132pd" 8 } } */ /* { dg-final { scan-assembler-times "vfnmsub132pd" 8 } } */ -/* { dg-final { scan-assembler-times "vfmadd132sd" 16 } } */ -/* { dg-final { scan-assembler-times "vfmsub132sd" 16 } } */ -/* { dg-final { scan-assembler-times "vfnmadd132sd" 16 } } */ -/* { dg-final { scan-assembler-times "vfnmsub132sd" 16 } } */ +/* { dg-final { scan-assembler-times "vfmadd132sd" 40 } } */ +/* { dg-final { scan-assembler-times "vfmsub132sd" 40 } } */ +/* { dg-final { scan-assembler-times "vfnmadd132sd" 40 } } */ +/* { dg-final { scan-assembler-times "vfnmsub132sd" 40 } } */ Index: testsuite/gcc.target/i386/l_fma_float_4.c =================================================================== --- testsuite/gcc.target/i386/l_fma_float_4.c (revision 192483) +++ testsuite/gcc.target/i386/l_fma_float_4.c (working copy) @@ -12,7 +12,7 @@ /* { dg-final { scan-assembler-times "vfmsub132ps" 8 } } */ /* { dg-final { scan-assembler-times "vfnmadd132ps" 8 } } */ /* { dg-final { scan-assembler-times "vfnmsub132ps" 8 } } */ -/* { dg-final { scan-assembler-times "vfmadd132ss" 16 } } */ -/* { dg-final { scan-assembler-times "vfmsub132ss" 16 } } */ -/* { dg-final { scan-assembler-times "vfnmadd132ss" 16 } } */ -/* { dg-final { scan-assembler-times "vfnmsub132ss" 16 } } */ +/* { dg-final { scan-assembler-times "vfmadd132ss" 72 } } */ +/* { dg-final { scan-assembler-times "vfmsub132ss" 72 } } */ +/* { dg-final { scan-assembler-times "vfnmadd132ss" 72 } } */ +/* { dg-final { scan-assembler-times "vfnmsub132ss" 72 } } */ Index: testsuite/gcc.target/i386/l_fma_double_3.c =================================================================== --- testsuite/gcc.target/i386/l_fma_double_3.c (revision 192483) +++ testsuite/gcc.target/i386/l_fma_double_3.c (working copy) @@ -16,11 +16,11 @@ /* { dg-final { scan-assembler-times "vfnmadd231pd" 4 } } */ /* { dg-final { scan-assembler-times "vfnmsub132pd" 4 } } */ /* { dg-final { scan-assembler-times "vfnmsub231pd" 4 } } */ -/* { dg-final { scan-assembler-times "vfmadd132sd" 8 } } */ -/* { dg-final { scan-assembler-times "vfmadd213sd" 8 } } */ -/* { dg-final { scan-assembler-times "vfmsub132sd" 8 } } */ -/* { dg-final { scan-assembler-times "vfmsub213sd" 8 } } */ -/* { dg-final { scan-assembler-times "vfnmadd132sd" 8 } } */ -/* { dg-final { scan-assembler-times "vfnmadd213sd" 8 } } */ -/* { dg-final { scan-assembler-times "vfnmsub132sd" 8 } } */ -/* { dg-final { scan-assembler-times "vfnmsub213sd" 8 } } */ +/* { dg-final { scan-assembler-times "vfmadd132sd" 20 } } */ +/* { dg-final { scan-assembler-times "vfmadd213sd" 20 } } */ +/* { dg-final { scan-assembler-times "vfmsub132sd" 20 } } */ +/* { dg-final { scan-assembler-times "vfmsub213sd" 20 } } */ +/* { dg-final { scan-assembler-times "vfnmadd132sd" 20 } } */ +/* { dg-final { scan-assembler-times "vfnmadd213sd" 20 } } */ +/* { dg-final { scan-assembler-times "vfnmsub132sd" 20 } } */ +/* { dg-final { scan-assembler-times "vfnmsub213sd" 20 } } */ Index: testsuite/gcc.target/i386/l_fma_float_1.c =================================================================== --- testsuite/gcc.target/i386/l_fma_float_1.c (revision 192483) +++ testsuite/gcc.target/i386/l_fma_float_1.c (working copy) @@ -16,11 +16,11 @@ /* { dg-final { scan-assembler-times "vfnmadd231ps" 4 } } */ /* { dg-final { scan-assembler-times "vfnmsub132ps" 4 } } */ /* { dg-final { scan-assembler-times "vfnmsub231ps" 4 } } */ -/* { dg-final { scan-assembler-times "vfmadd132ss" 8 } } */ -/* { dg-final { scan-assembler-times "vfmadd213ss" 8 } } */ -/* { dg-final { scan-assembler-times "vfmsub132ss" 8 } } */ -/* { dg-final { scan-assembler-times "vfmsub213ss" 8 } } */ -/* { dg-final { scan-assembler-times "vfnmadd132ss" 8 } } */ -/* { dg-final { scan-assembler-times "vfnmadd213ss" 8 } } */ -/* { dg-final { scan-assembler-times "vfnmsub132ss" 8 } } */ -/* { dg-final { scan-assembler-times "vfnmsub213ss" 8 } } */ +/* { dg-final { scan-assembler-times "vfmadd132ss" 36 } } */ +/* { dg-final { scan-assembler-times "vfmadd213ss" 36 } } */ +/* { dg-final { scan-assembler-times "vfmsub132ss" 36 } } */ +/* { dg-final { scan-assembler-times "vfmsub213ss" 36 } } */ +/* { dg-final { scan-assembler-times "vfnmadd132ss" 36 } } */ +/* { dg-final { scan-assembler-times "vfnmadd213ss" 36 } } */ +/* { dg-final { scan-assembler-times "vfnmsub132ss" 36 } } */ +/* { dg-final { scan-assembler-times "vfnmsub213ss" 36 } } */ Index: testsuite/gfortran.dg/do_1.f90 =================================================================== --- testsuite/gfortran.dg/do_1.f90 (revision 192483) +++ testsuite/gfortran.dg/do_1.f90 (working copy) @@ -1,4 +1,5 @@ -! { dg-do run } +! { dg-do run { xfail *-*-* } } +! XFAIL is tracked in PR 54932 ! Program to check corner cases for DO statements. program do_1 implicit none Index: testsuite/gcc.dg/tree-ssa/cunroll-1.c =================================================================== --- testsuite/gcc.dg/tree-ssa/cunroll-1.c (revision 0) +++ testsuite/gcc.dg/tree-ssa/cunroll-1.c (revision 0) @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */ +int a[2]; +test(int c) +{ + int i; + for (i=0;i