From patchwork Wed Nov 22 10:46:29 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Eric Botcazou
X-Patchwork-Id: 840378
From: Eric Botcazou
To: gcc-patches@gcc.gnu.org
Cc: gfortran
Subject: [patch] Add support for #pragma GCC unroll v2
Date: Wed, 22 Nov 2017 11:46:29 +0100
Message-ID: <2195708.5NdSF6Gi6B@polaris>

Hi,

this is a revised version of:
  https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01452.html
with the following changes:
  1. integration of Bernhard's patch for the Fortran front-end,
  2. Sandra's fix for the documentation,
  3. minor tweaks to the C and C++ front-ends,
  4. change at the GIMPLE level for the cunrolli pass,
  5. change at the RTL level with a fix for a thinko,
  6. more testcases for all languages.

This makes it so that the presence of a pragma GCC unroll doesn't have a
global effect on the function: at the GIMPLE level, the cunrolli pass is no
longer forced, so that the unrolling is done by the first activated pass
(cunroll at -O1, cunrolli at -O2 and above); at the RTL level, this was
already the case but the code no longer fiddles with the unrolling flag.

Tested on x86_64-suse-linux, OK for the mainline?


2017-11-22  Mike Stump
            Eric Botcazou
            Bernhard Reutner-Fischer

ChangeLog/
        * doc/extend.texi (Loop-Specific Pragmas): Document pragma GCC unroll.
        * doc/generic.texi (ANNOTATE_EXPR): Document 3rd operand.
        * cfgloop.h (struct loop): Add unroll field.
        * function.h (struct function): Add has_unroll bitfield.
        * gimplify.c (gimple_boolify) : Deal with unroll kind.
        (gimplify_expr) : Propagate 3rd operand.
        * loop-init.c (pass_loop2::gate): Return true if cfun->has_unroll.
        (pass_rtl_unroll_loops::gate): Likewise.
        * loop-unroll.c (decide_unrolling): Tweak note message.  Skip loops for
        which loop->unroll==1.
        (decide_unroll_constant_iterations): Use note for consistency and
        take loop->unroll into account.  Return early if loop->unroll is set.
        Fix thinko in existing test.
        (decide_unroll_runtime_iterations): Use note for consistency and
        take loop->unroll into account.
        (decide_unroll_stupid): Likewise.
        * lto-streamer-in.c (input_cfg): Read loop->unroll.
        * lto-streamer-out.c (output_cfg): Write loop->unroll.
        * tree-cfg.c (replace_loop_annotate_in_block) New.
        (replace_loop_annotate) : Likewise.
        (print_loop): Print loop->unroll if set.
        * tree-core.h (enum annot_expr_kind): Add annot_expr_unroll_kind.
        * tree-inline.c (copy_loops): Copy unroll and set cfun->has_unroll.
        * tree-pretty-print.c (dump_generic_node) : New.
        * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Bail out if
        loop->unroll is set and smaller than the trip count.
        Otherwise bypass entirely the heuristics if loop->unroll is set.  Remove
        dead note.  Fix off-by-one bug in other node.
        (try_peel_loop): Bail out if loop->unroll is set.  Fix formatting.
        (tree_unroll_loops_completely_1): Force unrolling if loop->unroll
        is greater than 1.
        (tree_unroll_loops_completely): Make static.
        (pass_complete_unroll::execute): Use correct type for variable.
        (pass_complete_unrolli::execute): Fix formatting.
        * tree.def (ANNOTATE_EXPR): Add 3rd operand.

ada/ChangeLog:
        * gcc-interface/trans.c (gnat_gimplify_stmt) : Add 3rd operand to
        ANNOTATE_EXPR and pass unrolling hints.

c-family/ChangeLog:
        * c-pragma.c (init_pragma): Register pragma GCC unroll.
        * c-pragma.h (enum pragma_kind): Add PRAGMA_UNROLL.

c/ChangeLog:
        * c-parser.c (c_parser_while_statement): Add unroll parameter and
        build ANNOTATE_EXPR if present.  Add 3rd operand to ANNOTATE_EXPR.
        (c_parser_do_statement): Likewise.
        (c_parser_for_statement): Likewise.
        (c_parser_statement_after_labels): Adjust calls to above.
        (c_parse_pragma_ivdep): New static function.
        (c_parser_pragma_unroll): Likewise.
        (c_parser_pragma) : Add support for pragma Unroll.
        : New case.

cp/ChangeLog:
        * constexpr.c (cxx_eval_constant_expression) : Remove assertion on
        2nd operand.
        (potential_constant_expression_1): Likewise.
        * cp-array-notation.c (create_an_loop): Adjust call to finish_for_cond.
        * cp-tree.h (cp_convert_range_for): Adjust prototype.
        (finish_while_stmt_cond): Likewise.
        (finish_do_stmt): Likewise.
        (finish_for_cond): Likewise.
        * init.c (build_vec_init): Adjust call to finish_for_cond.
        * parser.c (cp_parser_statement): Adjust call to
        cp_parser_iteration_statement.
        (cp_parser_for): Add unroll parameter and pass it in calls to
        cp_parser_range_for and cp_parser_c_for.
        (cp_parser_c_for): Add unroll parameter and pass it in call to
        finish_for_cond.
        (cp_parser_range_for): Add unroll parameter and pass it in call to
        cp_convert_range_for.
        (cp_convert_range_for): Add unroll parameter and pass it in call to
        finish_for_cond.
        (cp_parser_iteration_statement): Add unroll parameter and pass it in
        calls to finish_while_stmt_cond, finish_do_stmt and cp_parser_for.
        (cp_parser_pragma_ivdep): New static function.
        (cp_parser_pragma_unroll): Likewise.
        (cp_parser_pragma) : Add support for pragma Unroll.
        : New case.
        * pt.c (tsubst_expr): Adjust calls to finish_for_cond,
        cp_convert_range_for, finish_while_stmt_cond and finish_do_stmt.
        : Propagate 3rd operand.
        * semantics.c (finish_while_stmt_cond): Add unroll parameter and
        build ANNOTATE_EXPR if present.  Add 3rd operand to ANNOTATE_EXPR.
        (finish_do_stmt): Likewise.
        (finish_for_cond): Likewise.

fortran/ChangeLog:
        * array.c (gfc_copy_iterator): Copy unroll field.
        * decl.c (directive_unroll): New global variable.
        (gfc_match_gcc_unroll): New function.
        * gfortran.h (gfc_iterator): Add unroll field.
        (directive_unroll): Declare.
        * match.c (gfc_match_do): Use memset to initialize the iterator.
        * match.h (gfc_match_gcc_unroll): New prototype.
        * parse.c (decode_gcc_attribute): Match "unroll".
        (parse_do_block): Set iterator's unroll.
        (parse_executable): Diagnose misplaced unroll directive.
        * trans-stmt.c (gfc_trans_simple_do): Annotate loop condition with
        annot_expr_unroll_kind.
        (gfc_trans_do): Likewise.
        (gfc_trans_forall_loop): Add 3rd operand to ANNOTATE_EXPR.

testsuite/ChangeLog:
        * c-c++-common/unroll-1.c: New test.
        * c-c++-common/unroll-2.c: Likewise.
        * c-c++-common/unroll-3.c: Likewise.
        * c-c++-common/unroll-4.c: Likewise.
        * c-c++-common/unroll-5.c: Likewise.
        * gcc.dg/pr64277.c: Adjust scan.
        * gcc.dg/tree-prof/unroll-1.c: Use detailed dump and adjust scan.
        * gcc.dg/tree-ssa/cunroll-1.c: Adjust scan.
        * gcc.dg/tree-ssa/cunroll-12.c: Likewise.
        * gcc.dg/tree-ssa/cunroll-13.c: Likewise.
        * gcc.dg/tree-ssa/cunroll-14.c: Likewise.
        * gcc.dg/tree-ssa/cunroll-2.c: Likewise.
        * gcc.dg/tree-ssa/cunroll-3.c: Likewise.
        * gcc.dg/tree-ssa/cunroll-5.c: Likewise.
        * gcc.dg/tree-ssa/loop-1.c: Likewise.
        * gcc.dg/tree-ssa/loop-23.c: Likewise.
        * gcc.dg/tree-ssa/pr61743-1.c: Likewise.
        * gcc.dg/tree-ssa/pr61743-2.c: Likewise.
        * gcc.dg/unroll-2.c (foo): Adjust message.
        (foo2): Likewise.
        * gcc.dg/unroll-3.c: Adjust scan.
        * gcc.dg/unroll-4.c: Likewise.
        * gcc.dg/unroll-5.c: Likewise.
        * gcc.dg/unroll-7.c: Use detailed dump and adjust scan.
        * gfortran.dg/directive_unroll_1.f90: New test.
        * gfortran.dg/directive_unroll_2.f90: Likewise.
        * gfortran.dg/directive_unroll_3.f90: Likewise.
        * gfortran.dg/directive_unroll_4.f90: Likewise.
        * gfortran.dg/directive_unroll_5.f90: Likewise.
        * gnat.dg/unroll1.ad[sb]: New test.
        * gnat.dg/unroll2.ad[sb]: Likewise.
        * gnat.dg/unroll3.ad[sb]: Likewise.

 ada/gcc-interface/trans.c                    |   25 +-
 c-family/c-pragma.c                          |    4
 c-family/c-pragma.h                          |    1
 c/c-parser.c                                 |  160 ++++++++++++---
 cfgloop.h                                    |    5
 cp/constexpr.c                               |    2
 cp/cp-array-notation.c                       |    2
 cp/cp-tree.h                                 |    9
 cp/init.c                                    |    2
 cp/parser.c                                  |  129 ++++++++++--
 cp/pt.c                                      |   16 -
 cp/semantics.c                               |   42 +++-
 doc/extend.texi                              |   18 +
 doc/generic.texi                             |    2
 fortran/array.c                              |    1
 fortran/decl.c                               |   38 +++
 fortran/gfortran.h                           |    2
 fortran/match.c                              |    2
 fortran/match.h                              |    1
 fortran/parse.c                              |   13 +
 fortran/trans-stmt.c                         |   15 +
 function.h                                   |    5
 gimplify.c                                   |    4
 loop-init.c                                  |    6
 loop-unroll.c                                |  107 ++++++----
 lto-streamer-in.c                            |    1
 lto-streamer-out.c                           |    1
 testsuite/c-c++-common/unroll-1.c            |   41 +++
 testsuite/c-c++-common/unroll-2.c            |   41 +++
 testsuite/c-c++-common/unroll-3.c            |   41 +++
 testsuite/c-c++-common/unroll-4.c            |   20 +
 testsuite/c-c++-common/unroll-5.c            |   29 ++
 testsuite/gcc.dg/pr64277.c                   |    2
 testsuite/gcc.dg/tree-prof/unroll-1.c        |    4
 testsuite/gcc.dg/tree-ssa/cunroll-1.c        |    2
 testsuite/gcc.dg/tree-ssa/cunroll-12.c       |    2
 testsuite/gcc.dg/tree-ssa/cunroll-13.c       |    2
 testsuite/gcc.dg/tree-ssa/cunroll-14.c       |    2
 testsuite/gcc.dg/tree-ssa/cunroll-2.c        |    2
 testsuite/gcc.dg/tree-ssa/cunroll-3.c        |    2
 testsuite/gcc.dg/tree-ssa/cunroll-5.c        |    2
 testsuite/gcc.dg/tree-ssa/loop-1.c           |    2
 testsuite/gcc.dg/tree-ssa/loop-23.c          |    3
 testsuite/gcc.dg/tree-ssa/pr61743-1.c        |    4
 testsuite/gcc.dg/tree-ssa/pr61743-2.c        |    4
 testsuite/gcc.dg/unroll-2.c                  |    4
 testsuite/gcc.dg/unroll-3.c                  |    2
 testsuite/gcc.dg/unroll-4.c                  |    2
 testsuite/gcc.dg/unroll-5.c                  |    2
 testsuite/gcc.dg/unroll-7.c                  |    4
 testsuite/gfortran.dg/directive_unroll_1.f90 |   52 +++++
 testsuite/gfortran.dg/directive_unroll_2.f90 |   52 +++++
 testsuite/gfortran.dg/directive_unroll_3.f90 |   52 +++++
 testsuite/gfortran.dg/directive_unroll_4.f90 |   29 ++
 testsuite/gfortran.dg/directive_unroll_5.f90 |   38 +++
 testsuite/gnat.dg/unroll1.adb                |   27 ++
 testsuite/gnat.dg/unroll1.ads                |    9
 testsuite/gnat.dg/unroll2.adb                |   26 ++
 testsuite/gnat.dg/unroll2.ads                |    9
 testsuite/gnat.dg/unroll3.adb                |   26 ++
 testsuite/gnat.dg/unroll3.ads                |    9
 tree-cfg.c                                   |    8
 tree-core.h                                  |    1
 tree-inline.c                                |    5
 tree-pretty-print.c                          |    4
 tree-ssa-loop-ivcanon.c                      |  278 +++++++++++++------------
 tree.def                                     |    5
 67 files changed, 1165 insertions(+), 297 deletions(-)

Index: ada/gcc-interface/trans.c
===================================================================
--- ada/gcc-interface/trans.c (revision 255000)
+++ ada/gcc-interface/trans.c (working copy)
@@ -8506,17 +8506,30 @@ gnat_gimplify_stmt (tree *stmt_p)
             {
               /* Deal with the optimization hints.  */
               if (LOOP_STMT_IVDEP (stmt))
-                gnu_cond = build2 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
+                gnu_cond = build3 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
                                    build_int_cst (integer_type_node,
-                                                  annot_expr_ivdep_kind));
+                                                  annot_expr_ivdep_kind),
+                                   integer_zero_node);
+              if (LOOP_STMT_NO_UNROLL (stmt))
+                gnu_cond = build3 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
+                                   build_int_cst (integer_type_node,
+                                                  annot_expr_unroll_kind),
+                                   integer_one_node);
+              if (LOOP_STMT_UNROLL (stmt))
+                gnu_cond = build3 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
+                                   build_int_cst (integer_type_node,
+                                                  annot_expr_unroll_kind),
+                                   build_int_cst (NULL_TREE, USHRT_MAX));
               if (LOOP_STMT_NO_VECTOR (stmt))
-                gnu_cond = build2 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
+                gnu_cond = build3 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
                                    build_int_cst (integer_type_node,
-                                                  annot_expr_no_vector_kind));
+                                                  annot_expr_no_vector_kind),
+                                   integer_zero_node);
               if (LOOP_STMT_VECTOR (stmt))
-                gnu_cond = build2 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
+                gnu_cond = build3 (ANNOTATE_EXPR, TREE_TYPE (gnu_cond), gnu_cond,
                                    build_int_cst (integer_type_node,
-                                                  annot_expr_vector_kind));
+                                                  annot_expr_vector_kind),
+                                   integer_zero_node);
 
               gnu_cond
                 = build3 (COND_EXPR, void_type_node, gnu_cond, NULL_TREE,
Index: c/c-parser.c
===================================================================
--- c/c-parser.c (revision 255000)
+++ c/c-parser.c (working copy)
@@ -1410,9 +1410,9 @@ static tree c_parser_c99_block_statement
                                           location_t * = NULL);
 static void c_parser_if_statement (c_parser *, bool *, vec *);
 static void c_parser_switch_statement (c_parser *, bool *);
-static void c_parser_while_statement (c_parser *, bool, bool *);
-static void c_parser_do_statement (c_parser *, bool);
-static void c_parser_for_statement (c_parser *, bool, bool *);
+static void c_parser_while_statement (c_parser *, bool, unsigned short, bool *);
+static void c_parser_do_statement (c_parser *, bool, unsigned short);
+static void c_parser_for_statement (c_parser *, bool, unsigned short, bool *);
 static tree c_parser_asm_statement (c_parser *);
 static tree c_parser_asm_operands (c_parser *);
 static tree c_parser_asm_goto_operands (c_parser *);
@@ -5499,13 +5499,13 @@ c_parser_statement_after_labels (c_parse
           c_parser_switch_statement (parser, if_p);
           break;
         case RID_WHILE:
-          c_parser_while_statement (parser, false, if_p);
+          c_parser_while_statement (parser, false, 0, if_p);
           break;
         case RID_DO:
-          c_parser_do_statement (parser, false);
+          c_parser_do_statement (parser, false, 0);
           break;
         case RID_FOR:
-          c_parser_for_statement (parser, false, if_p);
+          c_parser_for_statement (parser, false, 0, if_p);
           break;
         case RID_CILK_FOR:
           if (!flag_cilkplus)
@@ -6039,7 +6039,8 @@ c_parser_switch_statement (c_parser *par
    implement -Wparentheses.  */
 
 static void
-c_parser_while_statement (c_parser *parser, bool ivdep, bool *if_p)
+c_parser_while_statement (c_parser *parser, bool ivdep, unsigned short unroll,
+                          bool *if_p)
 {
   tree block, cond, body, save_break, save_cont;
   location_t loc;
@@ -6055,9 +6056,15 @@ c_parser_while_statement (c_parser *pars
                  "%<_Cilk_spawn%> statement cannot be used as a condition for while statement"))
     cond = error_mark_node;
   if (ivdep && cond != error_mark_node)
-    cond = build2 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
                    build_int_cst (integer_type_node,
-                                  annot_expr_ivdep_kind));
+                                  annot_expr_ivdep_kind),
+                   integer_zero_node);
+  if (unroll && cond != error_mark_node)
+    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+                   build_int_cst (integer_type_node,
+                                  annot_expr_unroll_kind),
+                   build_int_cst (integer_type_node, unroll));
   save_break = c_break_label;
   c_break_label = NULL_TREE;
   save_cont = c_cont_label;
@@ -6092,7 +6099,7 @@ c_parser_while_statement (c_parser *pars
 */
 
 static void
-c_parser_do_statement (c_parser *parser, bool ivdep)
+c_parser_do_statement (c_parser *parser, bool ivdep, unsigned short unroll)
 {
   tree block, cond, body, save_break, save_cont, new_break, new_cont;
   location_t loc;
@@ -6120,9 +6127,16 @@ c_parser_do_statement (c_parser *parser,
                  "%<_Cilk_spawn%> statement cannot be used as a condition for a do-while statement"))
     cond = error_mark_node;
   if (ivdep && cond != error_mark_node)
-    cond = build2 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+                   build_int_cst (integer_type_node,
+                                  annot_expr_ivdep_kind),
+                   integer_zero_node);
+  if (unroll && cond != error_mark_node)
+    cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+                   build_int_cst (integer_type_node,
+                                  annot_expr_unroll_kind),
                    build_int_cst (integer_type_node,
-                                  annot_expr_ivdep_kind));
+                                  unroll));
   if (!c_parser_require (parser, CPP_SEMICOLON, "expected %<;%>"))
     c_parser_skip_to_end_of_block_or_statement (parser);
   c_finish_loop (loc, cond, NULL, body, new_break, new_cont, false);
@@ -6189,7 +6203,8 @@ c_parser_do_statement (c_parser *parser,
    implement -Wparentheses.  */
 
 static void
-c_parser_for_statement (c_parser *parser, bool ivdep, bool *if_p)
+c_parser_for_statement (c_parser *parser, bool ivdep, unsigned short unroll,
+                        bool *if_p)
 {
   tree block, cond, incr, save_break, save_cont, body;
   /* The following are only used when parsing an ObjC foreach statement.  */
@@ -6310,6 +6325,12 @@ c_parser_for_statement (c_parser *parser
                                   "% pragma");
                   cond = error_mark_node;
                 }
+              else if (unroll)
+                {
+                  c_parser_error (parser, "missing loop condition in loop with "
+                                  "% pragma");
+                  cond = error_mark_node;
+                }
               else
                 {
                   c_parser_consume_token (parser);
@@ -6327,9 +6348,15 @@ c_parser_for_statement (c_parser *parser
                                          "expected %<;%>");
             }
           if (ivdep && cond != error_mark_node)
-            cond = build2 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+            cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
+                           build_int_cst (integer_type_node,
+                                          annot_expr_ivdep_kind),
+                           integer_zero_node);
+          if (unroll && cond != error_mark_node)
+            cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
                            build_int_cst (integer_type_node,
-                                          annot_expr_ivdep_kind));
+                                          annot_expr_unroll_kind),
+                           build_int_cst (integer_type_node, unroll));
         }
       /* Parse the increment expression (the third expression in a
          for-statement).  In the case of a foreach-statement, this is
@@ -11039,6 +11066,49 @@ c_parser_objc_at_dynamic_declaration (c_
 }
 
 
+/* Parse a pragma GCC ivdep.  */
+
+static bool
+c_parse_pragma_ivdep (c_parser *parser)
+{
+  c_parser_consume_pragma (parser);
+  c_parser_skip_to_pragma_eol (parser);
+  return true;
+}
+
+/* Parse a pragma GCC unroll.  */
+
+static unsigned short
+c_parser_pragma_unroll (c_parser *parser)
+{
+  unsigned short unroll;
+  c_parser_consume_pragma (parser);
+  location_t location = c_parser_peek_token (parser)->location;
+  tree expr = c_parser_expr_no_commas (parser, NULL).value;
+  mark_exp_read (expr);
+  expr = c_fully_fold (expr, false, NULL);
+  HOST_WIDE_INT lunroll = 0;
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (expr))
+      || TREE_CODE (expr) != INTEGER_CST
+      || (lunroll = tree_to_shwi (expr)) < 0
+      || lunroll > USHRT_MAX)
+    {
+      error_at (location, "%<#pragma GCC unroll%> requires an"
+                " assignment-expression that evaluates to a non-negative"
+                " integral constant less than or equal to %u", USHRT_MAX);
+      unroll = 0;
+    }
+  else
+    {
+      unroll = (unsigned short)lunroll;
+      if (unroll == 0)
+        unroll = 1;
+    }
+
+  c_parser_skip_to_pragma_eol (parser);
+  return unroll;
+}
+
 /* Handle pragmas.  Some OpenMP pragmas are associated with, and therefore
    should be considered, statements.  ALLOW_STMT is true if we're within
    the context of a function and such pragmas are to be allowed.
    Returns
@@ -11181,21 +11251,51 @@ c_parser_pragma (c_parser *parser, enum
       return c_parser_omp_ordered (parser, context, if_p);
 
     case PRAGMA_IVDEP:
-      c_parser_consume_pragma (parser);
-      c_parser_skip_to_pragma_eol (parser);
-      if (!c_parser_next_token_is_keyword (parser, RID_FOR)
-          && !c_parser_next_token_is_keyword (parser, RID_WHILE)
-          && !c_parser_next_token_is_keyword (parser, RID_DO))
-        {
-          c_parser_error (parser, "for, while or do statement expected");
-          return false;
-        }
-      if (c_parser_next_token_is_keyword (parser, RID_FOR))
-        c_parser_for_statement (parser, true, if_p);
-      else if (c_parser_next_token_is_keyword (parser, RID_WHILE))
-        c_parser_while_statement (parser, true, if_p);
-      else
-        c_parser_do_statement (parser, true);
+      {
+        const bool ivdep = c_parse_pragma_ivdep (parser);
+        unsigned short unroll;
+        if (c_parser_peek_token (parser)->pragma_kind == PRAGMA_UNROLL)
+          unroll = c_parser_pragma_unroll (parser);
+        else
+          unroll = 0;
+        if (!c_parser_next_token_is_keyword (parser, RID_FOR)
+            && !c_parser_next_token_is_keyword (parser, RID_WHILE)
+            && !c_parser_next_token_is_keyword (parser, RID_DO))
+          {
+            c_parser_error (parser, "for, while or do statement expected");
+            return false;
+          }
+        if (c_parser_next_token_is_keyword (parser, RID_FOR))
+          c_parser_for_statement (parser, ivdep, unroll, if_p);
+        else if (c_parser_next_token_is_keyword (parser, RID_WHILE))
+          c_parser_while_statement (parser, ivdep, unroll, if_p);
+        else
+          c_parser_do_statement (parser, ivdep, unroll);
+      }
+      return false;
+
+    case PRAGMA_UNROLL:
+      {
+        unsigned short unroll = c_parser_pragma_unroll (parser);
+        bool ivdep;
+        if (c_parser_peek_token (parser)->pragma_kind == PRAGMA_IVDEP)
+          ivdep = c_parse_pragma_ivdep (parser);
+        else
+          ivdep = false;
+        if (!c_parser_next_token_is_keyword (parser, RID_FOR)
+            && !c_parser_next_token_is_keyword (parser, RID_WHILE)
+            && !c_parser_next_token_is_keyword (parser, RID_DO))
+          {
+            c_parser_error (parser, "for, while or do statement expected");
+            return false;
+          }
+        if (c_parser_next_token_is_keyword (parser, RID_FOR))
+          c_parser_for_statement (parser, ivdep, unroll, if_p);
+        else if (c_parser_next_token_is_keyword (parser, RID_WHILE))
+          c_parser_while_statement (parser, ivdep, unroll, if_p);
+        else
+          c_parser_do_statement (parser, ivdep, unroll);
+      }
       return false;
 
     case PRAGMA_GCC_PCH_PREPROCESS:
Index: c-family/c-pragma.c
===================================================================
--- c-family/c-pragma.c (revision 255000)
+++ c-family/c-pragma.c (working copy)
@@ -1544,6 +1544,10 @@ init_pragma (void)
     cpp_register_deferred_pragma (parse_in, "GCC", "ivdep", PRAGMA_IVDEP, false,
                                   false);
 
+  if (!flag_preprocess_only)
+    cpp_register_deferred_pragma (parse_in, "GCC", "unroll", PRAGMA_UNROLL,
+                                  false, false);
+
   if (flag_cilkplus)
     cpp_register_deferred_pragma (parse_in, "cilk", "grainsize",
                                   PRAGMA_CILK_GRAINSIZE, true, false);
Index: c-family/c-pragma.h
===================================================================
--- c-family/c-pragma.h (revision 255000)
+++ c-family/c-pragma.h (working copy)
@@ -75,6 +75,7 @@ enum pragma_kind {
 
   PRAGMA_GCC_PCH_PREPROCESS,
   PRAGMA_IVDEP,
+  PRAGMA_UNROLL,
 
   PRAGMA_FIRST_EXTERNAL
 };
Index: cfgloop.h
===================================================================
--- cfgloop.h (revision 255000)
+++ cfgloop.h (working copy)
@@ -221,6 +221,11 @@ struct GTY ((chain_next ("%h.next"))) lo
   /* True if the loop is part of an oacc kernels region.  */
   unsigned in_oacc_kernels_region : 1;
 
+  /* The number of times to unroll the loop.  0, means no information
+     given, just do what we always do.  A value of 1, means don't unroll
+     the loop.  */
+  unsigned short unroll;
+
   /* For SIMD loops, this is a unique identifier of the loop, referenced
      by IFN_GOMP_SIMD_VF, IFN_GOMP_SIMD_LANE and IFN_GOMP_SIMD_LAST_LANE
      builtins.  */
Index: cp/constexpr.c
===================================================================
--- cp/constexpr.c (revision 255000)
+++ cp/constexpr.c (working copy)
@@ -4672,7 +4672,6 @@ cxx_eval_constant_expression (const cons
       return t;
 
     case ANNOTATE_EXPR:
-      gcc_assert (tree_to_uhwi (TREE_OPERAND (t, 1)) == annot_expr_ivdep_kind);
       r = cxx_eval_constant_expression (ctx, TREE_OPERAND (t, 0),
                                         lval,
                                         non_constant_p, overflow_p,
@@ -5920,7 +5919,6 @@ potential_constant_expression_1 (tree t,
       }
 
     case ANNOTATE_EXPR:
-      gcc_assert (tree_to_uhwi (TREE_OPERAND (t, 1)) == annot_expr_ivdep_kind);
       return RECUR (TREE_OPERAND (t, 0), rval);
 
     default:
Index: cp/cp-array-notation.c
===================================================================
--- cp/cp-array-notation.c (revision 255000)
+++ cp/cp-array-notation.c (working copy)
@@ -67,7 +67,7 @@ create_an_loop (tree init, tree cond, tr
   finish_expr_stmt (init);
   for_stmt = begin_for_stmt (NULL_TREE, NULL_TREE);
   finish_init_stmt (for_stmt);
-  finish_for_cond (cond, for_stmt, false);
+  finish_for_cond (cond, for_stmt, false, 0);
   finish_for_expr (incr, for_stmt);
   finish_expr_stmt (body);
   finish_for_stmt (for_stmt);
Index: cp/cp-tree.h
===================================================================
--- cp/cp-tree.h (revision 255000)
+++ cp/cp-tree.h (working copy)
@@ -6409,7 +6409,8 @@ extern tree implicitly_declare_fn
 extern bool maybe_clone_body (tree);
 
 /* In parser.c */
-extern tree cp_convert_range_for (tree, tree, tree, tree, unsigned int, bool);
+extern tree cp_convert_range_for (tree, tree, tree, tree, unsigned int, bool,
+                                  unsigned short);
 extern bool parsing_nsdmi (void);
 extern bool parsing_default_capturing_generic_lambda_in_template (void);
 extern void inject_this_parameter (tree, cp_cv_quals);
@@ -6694,16 +6695,16 @@ extern void begin_else_clause (tree);
 extern void finish_else_clause (tree);
 extern void finish_if_stmt (tree);
 extern tree begin_while_stmt (void);
-extern void finish_while_stmt_cond (tree, tree, bool);
+extern void finish_while_stmt_cond (tree, tree, bool, unsigned short);
 extern void finish_while_stmt (tree);
 extern tree begin_do_stmt (void);
 extern void finish_do_body (tree);
-extern void finish_do_stmt (tree, tree, bool);
+extern void finish_do_stmt (tree, tree, bool, unsigned short);
 extern tree finish_return_stmt (tree);
 extern tree begin_for_scope (tree *);
 extern tree begin_for_stmt (tree, tree);
 extern void finish_init_stmt (tree);
-extern void finish_for_cond (tree, tree, bool);
+extern void finish_for_cond (tree, tree, bool, unsigned short);
 extern void finish_for_expr (tree, tree);
 extern void finish_for_stmt (tree);
 extern tree begin_range_for_stmt (tree, tree);
Index: cp/init.c
===================================================================
--- cp/init.c (revision 255000)
+++ cp/init.c (working copy)
@@ -4319,7 +4319,7 @@ build_vec_init (tree base, tree maxindex
       finish_init_stmt (for_stmt);
       finish_for_cond (build2 (GT_EXPR, boolean_type_node, iterator,
                                build_int_cst (TREE_TYPE (iterator), -1)),
-                       for_stmt, false);
+                       for_stmt, false, 0);
       elt_init = cp_build_unary_op (PREDECREMENT_EXPR, iterator, false,
                                     complain);
       if (elt_init == error_mark_node)
Index: cp/parser.c
===================================================================
--- cp/parser.c (revision 255000)
+++ cp/parser.c (working copy)
@@ -2121,15 +2121,15 @@ static tree cp_parser_selection_statemen
 static tree cp_parser_condition
   (cp_parser *);
 static tree cp_parser_iteration_statement
-  (cp_parser *, bool *, bool);
+  (cp_parser *, bool *, bool, unsigned short);
 static bool cp_parser_init_statement
   (cp_parser *, tree *decl);
 static tree cp_parser_for
-  (cp_parser *, bool);
+  (cp_parser *, bool, unsigned short);
 static tree cp_parser_c_for
-  (cp_parser *, tree, tree, bool);
+  (cp_parser *, tree, tree, bool, unsigned short);
 static tree cp_parser_range_for
-  (cp_parser *, tree, tree, tree, bool);
+  (cp_parser *, tree, tree, tree, bool, unsigned short);
 static void do_range_for_auto_deduction
   (tree, tree);
 static tree cp_parser_perform_range_for_lookup
@@ -10878,7 +10878,7 @@ cp_parser_statement (cp_parser* parser,
         case RID_WHILE:
         case RID_DO:
         case RID_FOR:
-          statement = cp_parser_iteration_statement (parser, if_p, false);
+          statement = cp_parser_iteration_statement (parser, if_p, false, 0);
           break;
 
         case RID_CILK_FOR:
@@ -11745,7 +11745,7 @@ cp_parser_condition (cp_parser* parser)
    not included. */
 
 static tree
-cp_parser_for (cp_parser *parser, bool ivdep)
+cp_parser_for (cp_parser *parser, bool ivdep, unsigned short unroll)
 {
   tree init, scope, decl;
   bool is_range_for;
@@ -11757,13 +11757,14 @@ cp_parser_for (cp_parser *parser, bool i
   is_range_for = cp_parser_init_statement (parser, &decl);
 
   if (is_range_for)
-    return cp_parser_range_for (parser, scope, init, decl, ivdep);
+    return cp_parser_range_for (parser, scope, init, decl, ivdep, unroll);
   else
-    return cp_parser_c_for (parser, scope, init, ivdep);
+    return cp_parser_c_for (parser, scope, init, ivdep, unroll);
 }
 
 static tree
-cp_parser_c_for (cp_parser *parser, tree scope, tree init, bool ivdep)
+cp_parser_c_for (cp_parser *parser, tree scope, tree init, bool ivdep,
+                 unsigned short unroll)
 {
   /* Normal for loop */
   tree condition = NULL_TREE;
@@ -11784,7 +11785,13 @@ cp_parser_c_for (cp_parser *parser, tree
                        "% pragma");
       condition = error_mark_node;
     }
-  finish_for_cond (condition, stmt, ivdep);
+  else if (unroll)
+    {
+      cp_parser_error (parser, "missing loop condition in loop with "
+                       "% pragma");
+      condition = error_mark_node;
+    }
+  finish_for_cond (condition, stmt, ivdep, unroll);
   /* Look for the `;'.  */
   cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON);
 
@@ -11808,7 +11815,7 @@ cp_parser_c_for (cp_parser *parser, tree
 
 static tree
 cp_parser_range_for (cp_parser *parser, tree scope, tree init, tree range_decl,
-                     bool ivdep)
+                     bool ivdep, unsigned short unroll)
 {
   tree stmt, range_expr;
   auto_vec bindings;
@@ -11877,6 +11884,8 @@ cp_parser_range_for (cp_parser *parser,
       stmt = begin_range_for_stmt (scope, init);
       if (ivdep)
         RANGE_FOR_IVDEP (stmt) = 1;
+      if (unroll)
+        /* TODO */(void)0;
       finish_range_for_decl (stmt, range_decl, range_expr);
       if (!type_dependent_expression_p (range_expr)
           /* do_auto_deduction doesn't mess with template init-lists.  */
@@ -11887,7 +11896,8 @@ cp_parser_range_for (cp_parser *parser,
     {
       stmt = begin_for_stmt (scope, init);
       stmt = cp_convert_range_for (stmt, range_decl, range_expr,
-                                   decomp_first_name, decomp_cnt, ivdep);
+                                   decomp_first_name, decomp_cnt, ivdep,
+                                   unroll);
     }
   return stmt;
 }
@@ -11981,7 +11991,7 @@ do_range_for_auto_deduction (tree decl,
 tree
 cp_convert_range_for (tree statement, tree range_decl, tree range_expr,
                       tree decomp_first_name, unsigned int decomp_cnt,
-                      bool ivdep)
+                      bool ivdep, unsigned short unroll)
 {
   tree begin, end;
   tree iter_type, begin_expr, end_expr;
@@ -12042,7 +12052,7 @@ cp_convert_range_for (tree statement, tr
                                  begin, ERROR_MARK,
                                  end, ERROR_MARK,
                                  NULL, tf_warning_or_error);
-  finish_for_cond (condition, statement, ivdep);
+  finish_for_cond (condition, statement, ivdep, unroll);
 
   /* The new increment expression.  */
   expression = finish_unary_op_expr (input_location,
@@ -12217,7 +12227,8 @@ cp_parser_range_for_member_function (tre
    Returns the new WHILE_STMT, DO_STMT, FOR_STMT or RANGE_FOR_STMT.  */
 
 static tree
-cp_parser_iteration_statement (cp_parser* parser, bool *if_p, bool ivdep)
+cp_parser_iteration_statement (cp_parser* parser, bool *if_p, bool ivdep,
+                               unsigned short unroll)
 {
   cp_token *token;
   enum rid keyword;
@@ -12251,7 +12262,7 @@ cp_parser_iteration_statement (cp_parser
         parens.require_open (parser);
         /* Parse the condition.  */
         condition = cp_parser_condition (parser);
-        finish_while_stmt_cond (condition, statement, ivdep);
+        finish_while_stmt_cond (condition, statement, ivdep, unroll);
         /* Look for the `)'.  */
         parens.require_close (parser);
         /* Parse the dependent statement.  */
@@ -12282,7 +12293,7 @@ cp_parser_iteration_statement (cp_parser
         /* Parse the expression.  */
         expression = cp_parser_expression (parser);
         /* We're done with the do-statement.  */
-        finish_do_stmt (expression, statement, ivdep);
+        finish_do_stmt (expression, statement, ivdep, unroll);
         /* Look for the `)'.  */
         parens.require_close (parser);
         /* Look for the `;'.  */
@@ -12296,7 +12307,7 @@ cp_parser_iteration_statement (cp_parser
         matching_parens parens;
         parens.require_open (parser);
 
-        statement = cp_parser_for (parser, ivdep);
+        statement = cp_parser_for (parser, ivdep, unroll);
 
         /* Look for the `)'.  */
         parens.require_close (parser);
@@ -38748,6 +38759,45 @@ cp_parser_cilk_grainsize (cp_parser *par
   cp_parser_skip_to_pragma_eol (parser, pragma_tok);
 }
 
+/* Parse a pragma GCC ivdep.  */
+
+static bool
+cp_parser_pragma_ivdep (cp_parser *parser, cp_token *pragma_tok)
+{
+  cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+  return true;
+}
+
+/* Parse a pragma GCC unroll.  */
+
+static unsigned short
+cp_parser_pragma_unroll (cp_parser *parser, cp_token *pragma_tok)
+{
+  location_t location = cp_lexer_peek_token (parser->lexer)->location;
+  tree expr = cp_parser_constant_expression (parser);
+  unsigned short unroll;
+  expr = maybe_constant_value (expr);
+  cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+  HOST_WIDE_INT lunroll = 0;
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (expr))
+      || TREE_CODE (expr) != INTEGER_CST
+      || (lunroll = tree_to_shwi (expr)) < 0
+      || lunroll > USHRT_MAX)
+    {
+      error_at (location, "%<#pragma GCC unroll%> requires an"
+                " assignment-expression that evaluates to a non-negative"
+                " integral constant less than or equal to %u", USHRT_MAX);
+      unroll = 0;
+    }
+  else
+    {
+      unroll = (unsigned short)lunroll;
+      if (unroll == 0)
+        unroll = 1;
+    }
+  return unroll;
+}
+
 /* Normal parsing of a pragma token.  Here we can (and must) use the
    regular lexer.  */
 
@@ -38984,15 +39034,42 @@ cp_parser_pragma (cp_parser *parser, enu
 
     case PRAGMA_IVDEP:
       {
-        if (context == pragma_external)
+        const bool ivdep = cp_parser_pragma_ivdep (parser, pragma_tok);
+        unsigned short unroll;
+        cp_token *tok = cp_lexer_peek_token (the_parser->lexer);
+        if (tok->type == CPP_PRAGMA
+            && cp_parser_pragma_kind (tok) == PRAGMA_UNROLL)
           {
-            error_at (pragma_tok->location,
-                      "%<#pragma GCC ivdep%> must be inside a function");
-            break;
+            unroll = cp_parser_pragma_unroll (parser, pragma_tok);
+            tok = cp_lexer_peek_token (the_parser->lexer);
           }
-        cp_parser_skip_to_pragma_eol (parser, pragma_tok);
-        cp_token *tok;
-        tok = cp_lexer_peek_token (the_parser->lexer);
+        else
+          unroll = 0;
+        if (tok->type != CPP_KEYWORD
+            || (tok->keyword != RID_FOR && tok->keyword != RID_WHILE
+                && tok->keyword != RID_DO))
+          {
+            cp_parser_error (parser, "for, while or do statement expected");
+            return false;
+          }
+        cp_parser_iteration_statement (parser, if_p, ivdep, unroll);
+        return true;
+      }
+
+    case PRAGMA_UNROLL:
+      {
+        const unsigned short unroll
+          = cp_parser_pragma_unroll (parser, pragma_tok);
+        bool ivdep;
+        cp_token *tok = cp_lexer_peek_token (the_parser->lexer);
+        if (tok->type == CPP_PRAGMA
+            && cp_parser_pragma_kind (tok) == PRAGMA_IVDEP)
+          {
+            ivdep = cp_parser_pragma_ivdep (parser, tok);
+            tok = cp_lexer_peek_token (the_parser->lexer);
+          }
+        else
+          ivdep = false;
         if (tok->type != CPP_KEYWORD
             || (tok->keyword != RID_FOR && tok->keyword != RID_WHILE
                 && tok->keyword != RID_DO))
@@ -39000,7 +39077,7 @@ cp_parser_pragma (cp_parser *parser, enu
             cp_parser_error (parser, "for, while or do statement expected");
             return false;
           }
-        cp_parser_iteration_statement (parser, if_p, true);
+        cp_parser_iteration_statement (parser, if_p, ivdep, unroll);
         return true;
       }
 
Index: cp/pt.c
===================================================================
--- cp/pt.c (revision 255000)
+++ cp/pt.c (working copy)
@@ -16119,7 +16119,7 @@ tsubst_expr (tree t, tree args, tsubst_f
       RECUR (FOR_INIT_STMT (t));
       finish_init_stmt (stmt);
       tmp = RECUR (FOR_COND (t));
-      finish_for_cond (tmp, stmt, false);
+      finish_for_cond (tmp, stmt, false, 0);
       tmp = RECUR (FOR_EXPR (t));
       finish_for_expr (tmp, stmt);
       RECUR (FOR_BODY (t));
@@ -16141,11 +16141,11 @@ tsubst_expr (tree t, tree args, tsubst_f
             decl = tsubst_decomp_names (decl, RANGE_FOR_DECL (t), args,
                                         complain, in_decl, &first, &cnt);
             stmt = cp_convert_range_for (stmt, decl, expr, first, cnt,
-                                         RANGE_FOR_IVDEP (t));
+                                         RANGE_FOR_IVDEP (t), 0);
           }
         else
           stmt = cp_convert_range_for (stmt, decl, expr, NULL_TREE, 0,
-                                       RANGE_FOR_IVDEP (t));
+                                       RANGE_FOR_IVDEP (t), 0);
         RECUR (RANGE_FOR_BODY (t));
         finish_for_stmt (stmt);
       }
@@ -16154,7 +16154,7 @@ tsubst_expr (tree t, tree args, tsubst_f
     case WHILE_STMT:
       stmt = begin_while_stmt ();
       tmp = RECUR (WHILE_COND (t));
-      finish_while_stmt_cond (tmp, stmt, false);
+      finish_while_stmt_cond (tmp, stmt, false, 0);
       RECUR (WHILE_BODY (t));
       finish_while_stmt (stmt);
       break;
@@ -16164,7 +16164,7 @@ tsubst_expr (tree t, tree args, tsubst_f
       RECUR (DO_BODY (t));
       finish_do_body (stmt);
       tmp = RECUR (DO_COND (t));
- finish_do_stmt (tmp, stmt, false); + finish_do_stmt (tmp, stmt, false, 0); break; case IF_STMT: @@ -16728,8 +16728,10 @@ tsubst_expr (tree t, tree args, tsubst_f case ANNOTATE_EXPR: tmp = RECUR (TREE_OPERAND (t, 0)); - RETURN (build2_loc (EXPR_LOCATION (t), ANNOTATE_EXPR, - TREE_TYPE (tmp), tmp, RECUR (TREE_OPERAND (t, 1)))); + RETURN (build3_loc (EXPR_LOCATION (t), ANNOTATE_EXPR, + TREE_TYPE (tmp), tmp, + RECUR (TREE_OPERAND (t, 1)), + RECUR (TREE_OPERAND (t, 2)))); default: gcc_assert (!STATEMENT_CODE_P (TREE_CODE (t))); Index: cp/semantics.c =================================================================== --- cp/semantics.c (revision 255000) +++ cp/semantics.c (working copy) @@ -802,7 +802,8 @@ begin_while_stmt (void) WHILE_STMT. */ void -finish_while_stmt_cond (tree cond, tree while_stmt, bool ivdep) +finish_while_stmt_cond (tree cond, tree while_stmt, bool ivdep, + unsigned short unroll) { if (check_no_cilk (cond, "Cilk array notation cannot be used as a condition for while statement", @@ -812,11 +813,20 @@ finish_while_stmt_cond (tree cond, tree finish_cond (&WHILE_COND (while_stmt), cond); begin_maybe_infinite_loop (cond); if (ivdep && cond != error_mark_node) - WHILE_COND (while_stmt) = build2 (ANNOTATE_EXPR, + WHILE_COND (while_stmt) = build3 (ANNOTATE_EXPR, TREE_TYPE (WHILE_COND (while_stmt)), WHILE_COND (while_stmt), build_int_cst (integer_type_node, - annot_expr_ivdep_kind)); + annot_expr_ivdep_kind), + integer_zero_node); + if (unroll && cond != error_mark_node) + WHILE_COND (while_stmt) = build3 (ANNOTATE_EXPR, + TREE_TYPE (WHILE_COND (while_stmt)), + WHILE_COND (while_stmt), + build_int_cst (integer_type_node, + annot_expr_unroll_kind), + build_int_cst (integer_type_node, + unroll)); simplify_loop_decl_cond (&WHILE_COND (while_stmt), WHILE_BODY (while_stmt)); } @@ -861,7 +871,7 @@ finish_do_body (tree do_stmt) COND is as indicated. 
*/ void -finish_do_stmt (tree cond, tree do_stmt, bool ivdep) +finish_do_stmt (tree cond, tree do_stmt, bool ivdep, unsigned short unroll) { if (check_no_cilk (cond, "Cilk array notation cannot be used as a condition for a do-while statement", @@ -870,8 +880,13 @@ finish_do_stmt (tree cond, tree do_stmt, cond = maybe_convert_cond (cond); end_maybe_infinite_loop (cond); if (ivdep && cond != error_mark_node) - cond = build2 (ANNOTATE_EXPR, TREE_TYPE (cond), cond, - build_int_cst (integer_type_node, annot_expr_ivdep_kind)); + cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond, + build_int_cst (integer_type_node, annot_expr_ivdep_kind), + integer_zero_node); + if (unroll && cond != error_mark_node) + cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond, + build_int_cst (integer_type_node, annot_expr_unroll_kind), + build_int_cst (integer_type_node, unroll)); DO_COND (do_stmt) = cond; } @@ -980,7 +995,7 @@ finish_init_stmt (tree for_stmt) FOR_STMT. */ void -finish_for_cond (tree cond, tree for_stmt, bool ivdep) +finish_for_cond (tree cond, tree for_stmt, bool ivdep, unsigned short unroll) { if (check_no_cilk (cond, "Cilk array notation cannot be used in a condition for a for-loop", @@ -990,11 +1005,20 @@ finish_for_cond (tree cond, tree for_stm finish_cond (&FOR_COND (for_stmt), cond); begin_maybe_infinite_loop (cond); if (ivdep && cond != error_mark_node) - FOR_COND (for_stmt) = build2 (ANNOTATE_EXPR, + FOR_COND (for_stmt) = build3 (ANNOTATE_EXPR, TREE_TYPE (FOR_COND (for_stmt)), FOR_COND (for_stmt), build_int_cst (integer_type_node, - annot_expr_ivdep_kind)); + annot_expr_ivdep_kind), + integer_zero_node); + if (unroll && cond != error_mark_node) + FOR_COND (for_stmt) = build3 (ANNOTATE_EXPR, + TREE_TYPE (FOR_COND (for_stmt)), + FOR_COND (for_stmt), + build_int_cst (integer_type_node, + annot_expr_unroll_kind), + build_int_cst (integer_type_node, + unroll)); simplify_loop_decl_cond (&FOR_COND (for_stmt), FOR_BODY (for_stmt)); } Index: doc/extend.texi 
=================================================================== --- doc/extend.texi (revision 255000) +++ doc/extend.texi (working copy) @@ -22332,9 +22332,7 @@ function. The parenthesis around the op The @code{#pragma GCC target} pragma is presently implemented for x86, ARM, AArch64, PowerPC, S/390, and Nios II targets only. -@end table -@table @code @item #pragma GCC optimize (@var{"string"}...) @cindex pragma GCC optimize @@ -22345,9 +22343,7 @@ if @code{attribute((optimize("STRING"))) function. The parenthesis around the options is optional. @xref{Function Attributes}, for more information about the @code{optimize} attribute and the attribute syntax. -@end table -@table @code @item #pragma GCC push_options @itemx #pragma GCC pop_options @cindex pragma GCC push_options @@ -22358,15 +22354,14 @@ options. It is intended for include fil to switch to using a different @samp{#pragma GCC target} or @samp{#pragma GCC optimize} and then to pop back to the previous options. -@end table -@table @code @item #pragma GCC reset_options @cindex pragma GCC reset_options This pragma clears the current @code{#pragma GCC target} and @code{#pragma GCC optimize} to use the default switches as specified on the command line. + @end table @node Loop-Specific Pragmas @@ -22375,7 +22370,6 @@ on the command line. @table @code @item #pragma GCC ivdep @cindex pragma GCC ivdep -@end table With this pragma, the programmer asserts that there are no loop-carried dependencies which would prevent consecutive iterations of @@ -22410,6 +22404,16 @@ void ignore_vec_dep (int *a, int k, int @} @end smallexample +@item #pragma GCC unroll @var{n} +@cindex pragma GCC unroll @var{n} + +You can use this pragma to control how many times a loop should be +unrolled. It must be placed immediately before a @code{for}, +@code{while} or @code{do} loop or a @samp{#pragma GCC ivdep}, and applies +only to the loop that follows.
@var{n} is an integer constant +expression; a value of 0 or 1 disables unrolling of the loop. + +@end table @node Unnamed Fields @section Unnamed Structure and Union Fields Index: doc/generic.texi =================================================================== --- doc/generic.texi (revision 255000) +++ doc/generic.texi (working copy) @@ -1686,7 +1686,7 @@ its sole argument yields the representat @item ANNOTATE_EXPR This node is used to attach markers to an expression. The first operand is the annotated expression, the second is an @code{INTEGER_CST} with -a value from @code{enum annot_expr_kind}. +a value from @code{enum annot_expr_kind}, the third is an @code{INTEGER_CST}. @end table Index: fortran/array.c =================================================================== --- fortran/array.c (revision 255000) +++ fortran/array.c (working copy) @@ -2123,6 +2123,7 @@ gfc_copy_iterator (gfc_iterator *src) dest->start = gfc_copy_expr (src->start); dest->end = gfc_copy_expr (src->end); dest->step = gfc_copy_expr (src->step); + dest->unroll = src->unroll; return dest; } Index: fortran/decl.c =================================================================== --- fortran/decl.c (revision 255000) +++ fortran/decl.c (working copy) @@ -95,6 +95,9 @@ gfc_symbol *gfc_new_block; bool gfc_matching_function; +/* Set upon parsing a !GCC$ unroll n directive for use in the next loop. */ +int directive_unroll = -1; + /* If a kind expression of a component of a parameterized derived type is parameterized, temporarily store the expression here. 
*/ static gfc_expr *saved_kind_expr = NULL; @@ -104,7 +107,6 @@ static gfc_expr *saved_kind_expr = NULL; static gfc_actual_arglist *decl_type_param_list; static gfc_actual_arglist *type_param_spec_list; - /********************* DATA statement subroutines *********************/ static bool in_match_data = false; @@ -10943,3 +10945,37 @@ syntax: gfc_error ("Syntax error in !GCC$ ATTRIBUTES statement at %C"); return MATCH_ERROR; } + + +/* Match a !GCC$ UNROLL statement of the form: + !GCC$ UNROLL n + + The parameter n is the number of times we are supposed to unroll. + + When we come here, we have already matched the !GCC$ UNROLL string. */ +match +gfc_match_gcc_unroll (void) +{ + int value; + + if (gfc_match_small_int (&value) == MATCH_YES) + { + if (value < 0 || value > USHRT_MAX) + { + gfc_error ("%<GCC unroll%> directive requires a" + " non-negative integral constant" + " less than or equal to %u at %C", + USHRT_MAX + ); + return MATCH_ERROR; + } + if (gfc_match_eos () == MATCH_YES) + { + directive_unroll = value == 0 ?
1 : value; + return MATCH_YES; + } + } + + gfc_error ("Syntax error in !GCC$ UNROLL directive at %C"); + return MATCH_ERROR; +} Index: fortran/gfortran.h =================================================================== --- fortran/gfortran.h (revision 255000) +++ fortran/gfortran.h (working copy) @@ -2350,6 +2350,7 @@ gfc_case; typedef struct { gfc_expr *var, *start, *end, *step; + unsigned short unroll; } gfc_iterator; @@ -2724,6 +2725,7 @@ gfc_finalizer; /* decl.c */ bool gfc_in_match_data (void); match gfc_match_char_spec (gfc_typespec *); +extern int directive_unroll; /* Handling Parameterized Derived Types */ bool gfc_insert_kind_parameter_exprs (gfc_expr *); Index: fortran/match.c =================================================================== --- fortran/match.c (revision 255000) +++ fortran/match.c (working copy) @@ -2539,8 +2539,8 @@ gfc_match_do (void) old_loc = gfc_current_locus; + memset (&iter, '\0', sizeof (gfc_iterator)); label = NULL; - iter.var = iter.start = iter.end = iter.step = NULL; m = gfc_match_label (); if (m == MATCH_ERROR) Index: fortran/match.h =================================================================== --- fortran/match.h (revision 255000) +++ fortran/match.h (working copy) @@ -241,6 +241,7 @@ match gfc_match_contiguous (void); match gfc_match_dimension (void); match gfc_match_external (void); match gfc_match_gcc_attributes (void); +match gfc_match_gcc_unroll (void); match gfc_match_import (void); match gfc_match_intent (void); match gfc_match_intrinsic (void); Index: fortran/parse.c =================================================================== --- fortran/parse.c (revision 255000) +++ fortran/parse.c (working copy) @@ -1063,6 +1063,7 @@ decode_gcc_attribute (void) old_locus = gfc_current_locus; match ("attributes", gfc_match_gcc_attributes, ST_ATTR_DECL); + match ("unroll", gfc_match_gcc_unroll, ST_NONE); /* All else has failed, so give up. See if any of the matchers has stored an error message of some sort. 
*/ @@ -4634,7 +4635,14 @@ parse_do_block (void) s.ext.end_do_label = new_st.label1; if (new_st.ext.iterator != NULL) - stree = new_st.ext.iterator->var->symtree; + { + stree = new_st.ext.iterator->var->symtree; + if (directive_unroll != -1) + { + new_st.ext.iterator->unroll = directive_unroll; + directive_unroll = -1; + } + } else stree = NULL; @@ -5392,6 +5400,9 @@ parse_executable (gfc_statement st) return st; } + if (directive_unroll != -1) + gfc_error ("%<GCC unroll%> directive does not commence a loop at %C"); + st = next_statement (); } } Index: fortran/trans-stmt.c =================================================================== --- fortran/trans-stmt.c (revision 255000) +++ fortran/trans-stmt.c (working copy) @@ -1980,6 +1980,11 @@ gfc_trans_simple_do (gfc_code * code, st fold_convert (type, to)); cond = gfc_evaluate_now_loc (loc, cond, &body); + if (code->ext.iterator->unroll && cond != error_mark_node) + cond + = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond, + build_int_cst (integer_type_node, annot_expr_unroll_kind), + build_int_cst (integer_type_node, code->ext.iterator->unroll)); /* The loop exit. */ tmp = fold_build1_loc (loc, GOTO_EXPR, void_type_node, exit_label); @@ -2306,6 +2311,11 @@ gfc_trans_do (gfc_code * code, tree exit /* End with the loop condition. Loop until countm1t == 0.
*/ cond = fold_build2_loc (loc, EQ_EXPR, logical_type_node, countm1t, build_int_cst (utype, 0)); + if (code->ext.iterator->unroll && cond != error_mark_node) + cond + = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond, + build_int_cst (integer_type_node, annot_expr_unroll_kind), + build_int_cst (integer_type_node, code->ext.iterator->unroll)); tmp = fold_build1_loc (loc, GOTO_EXPR, void_type_node, exit_label); tmp = fold_build3_loc (loc, COND_EXPR, void_type_node, cond, tmp, build_empty_stmt (loc)); @@ -3460,9 +3470,10 @@ gfc_trans_forall_loop (forall_info *fora cond = fold_build2_loc (input_location, LE_EXPR, logical_type_node, count, build_int_cst (TREE_TYPE (count), 0)); if (forall_tmp->do_concurrent) - cond = build2 (ANNOTATE_EXPR, TREE_TYPE (cond), cond, + cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond, build_int_cst (integer_type_node, - annot_expr_parallel_kind)); + annot_expr_parallel_kind), + integer_zero_node); tmp = build1_v (GOTO_EXPR, exit_label); tmp = fold_build3_loc (input_location, COND_EXPR, void_type_node, Index: function.h =================================================================== --- function.h (revision 255000) +++ function.h (working copy) @@ -385,8 +385,11 @@ struct GTY(()) function { nonzero value in loop->simduid. */ unsigned int has_simduid_loops : 1; - /* Set when the tail call has been identified. */ + /* Nonzero when the tail call has been identified. */ unsigned int tail_call_marked : 1; + + /* Nonzero if the current function contains a #pragma GCC unroll. */ + unsigned int has_unroll : 1; }; /* Add the decl D to the local_decls list of FUN. 
*/ Index: gimplify.c =================================================================== --- gimplify.c (revision 255000) +++ gimplify.c (working copy) @@ -3747,6 +3747,7 @@ gimple_boolify (tree expr) switch ((enum annot_expr_kind) TREE_INT_CST_LOW (TREE_OPERAND (expr, 1))) { case annot_expr_ivdep_kind: + case annot_expr_unroll_kind: case annot_expr_no_vector_kind: case annot_expr_vector_kind: case annot_expr_parallel_kind: @@ -11390,6 +11391,7 @@ gimplify_expr (tree *expr_p, gimple_seq { tree cond = TREE_OPERAND (*expr_p, 0); tree kind = TREE_OPERAND (*expr_p, 1); + tree data = TREE_OPERAND (*expr_p, 2); tree type = TREE_TYPE (cond); if (!INTEGRAL_TYPE_P (type)) { @@ -11400,7 +11402,7 @@ gimplify_expr (tree *expr_p, gimple_seq tree tmp = create_tmp_var (type); gimplify_arg (&cond, pre_p, EXPR_LOCATION (*expr_p)); gcall *call - = gimple_build_call_internal (IFN_ANNOTATE, 2, cond, kind); + = gimple_build_call_internal (IFN_ANNOTATE, 3, cond, kind, data); gimple_call_set_lhs (call, tmp); gimplify_seq_add_stmt (pre_p, call); *expr_p = tmp; Index: loop-init.c =================================================================== --- loop-init.c (revision 255000) +++ loop-init.c (working copy) @@ -361,8 +361,8 @@ pass_loop2::gate (function *fun) && (flag_move_loop_invariants || flag_unswitch_loops || flag_unroll_loops - || (flag_branch_on_count_reg - && targetm.have_doloop_end ()))) + || (flag_branch_on_count_reg && targetm.have_doloop_end ()) + || cfun->has_unroll)) return true; else { @@ -560,7 +560,7 @@ public: /* opt_pass methods: */ virtual bool gate (function *) { - return (flag_unroll_loops || flag_unroll_all_loops); + return (flag_unroll_loops || flag_unroll_all_loops || cfun->has_unroll); } virtual unsigned int execute (function *); Index: loop-unroll.c =================================================================== --- loop-unroll.c (revision 255000) +++ loop-unroll.c (working copy) @@ -224,9 +224,16 @@ decide_unrolling (int flags) if (dump_enabled_p ()) 
dump_printf_loc (MSG_NOTE, locus, - ";; *** Considering loop %d at BB %d for " - "unrolling ***\n", - loop->num, loop->header->index); + "considering unrolling loop %d at BB %d\n", + loop->num, loop->header->index); + + if (loop->unroll == 1) + { + if (dump_file) + fprintf (dump_file, + ";; Not unrolling loop, user didn't want it unrolled\n"); + continue; + } /* Do not peel cold areas. */ if (optimize_loop_for_size_p (loop)) @@ -256,9 +263,7 @@ decide_unrolling (int flags) loop->ninsns = num_loop_insns (loop); loop->av_ninsns = average_num_loop_insns (loop); - /* Try transformations one by one in decreasing order of - priority. */ - + /* Try transformations one by one in decreasing order of priority. */ decide_unroll_constant_iterations (loop, flags); if (loop->lpt_decision.decision == LPT_NONE) decide_unroll_runtime_iterations (loop, flags); @@ -347,19 +352,17 @@ decide_unroll_constant_iterations (struc struct niter_desc *desc; widest_int iterations; - if (!(flags & UAP_UNROLL)) - { - /* We were not asked to, just return back silently. */ - return; - } - - if (dump_file) - fprintf (dump_file, - "\n;; Considering unrolling loop with constant " - "number of iterations\n"); + /* If we were not asked to unroll this loop, just return back silently. */ + if (!(flags & UAP_UNROLL) && !loop->unroll) + return; + + if (dump_enabled_p ()) + dump_printf (MSG_NOTE, + "considering unrolling loop with constant " + "number of iterations\n"); /* nunroll = total number of copies of the original loop body in - unrolled loop (i.e. if it is 2, we have to duplicate loop body once. */ + unrolled loop (i.e. if it is 2, we have to duplicate loop body once). */ nunroll = PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS) / loop->ninsns; nunroll_by_av = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLED_INSNS) / loop->av_ninsns; @@ -391,6 +394,24 @@ decide_unroll_constant_iterations (struc return; } + /* Check for an explicit unrolling factor. 
*/ + if (loop->unroll) + { + /* However we cannot unroll completely at the RTL level a loop with + constant number of iterations; it should have been peeled instead. */ + if ((unsigned) loop->unroll - 1 > desc->niter - 2) + { + if (dump_file) + fprintf (dump_file, ";; Loop should have been peeled\n"); + } + else + { + loop->lpt_decision.decision = LPT_UNROLL_CONSTANT; + loop->lpt_decision.times = loop->unroll - 1; + } + return; + } + /* Check whether the loop rolls enough to consider. Consult also loop bounds and profile; in the case the loop has more than one exit it may well loop less than determined maximal number @@ -412,7 +433,7 @@ decide_unroll_constant_iterations (struc best_copies = 2 * nunroll + 10; i = 2 * nunroll + 2; - if (i - 1 >= desc->niter) + if (i > desc->niter - 2) i = desc->niter - 2; for (; i >= nunroll - 1; i--) @@ -651,16 +672,14 @@ decide_unroll_runtime_iterations (struct struct niter_desc *desc; widest_int iterations; - if (!(flags & UAP_UNROLL)) - { - /* We were not asked to, just return back silently. */ - return; - } - - if (dump_file) - fprintf (dump_file, - "\n;; Considering unrolling loop with runtime " - "computable number of iterations\n"); + /* If we were not asked to unroll this loop, just return back silently. */ + if (!(flags & UAP_UNROLL) && !loop->unroll) + return; + + if (dump_enabled_p ()) + dump_printf (MSG_NOTE, + "considering unrolling loop with runtime-" + "computable number of iterations\n"); /* nunroll = total number of copies of the original loop body in unrolled loop (i.e. if it is 2, we have to duplicate loop body once. */ @@ -674,6 +693,9 @@ decide_unroll_runtime_iterations (struct if (targetm.loop_unroll_adjust) nunroll = targetm.loop_unroll_adjust (nunroll, loop); + if (loop->unroll) + nunroll = loop->unroll; + /* Skip big loops. 
*/ if (nunroll <= 1) { @@ -712,8 +734,9 @@ decide_unroll_runtime_iterations (struct return; } - /* Success; now force nunroll to be power of 2, as we are unable to - cope with overflows in computation of number of iterations. */ + /* Success; now force nunroll to be power of 2, as code-gen + requires it, we are unable to cope with overflows in + computation of number of iterations. */ for (i = 1; 2 * i <= nunroll; i *= 2) continue; @@ -824,9 +847,10 @@ compare_and_jump_seq (rtx op0, rtx op1, return seq; } -/* Unroll LOOP for which we are able to count number of iterations in runtime - LOOP->LPT_DECISION.TIMES times. The transformation does this (with some - extra care for case n < 0): +/* Unroll LOOP for which we are able to count number of iterations in + runtime LOOP->LPT_DECISION.TIMES times. The times value must be a + power of two. The transformation does this (with some extra care + for case n < 0): for (i = 0; i < n; i++) body; @@ -1133,14 +1157,12 @@ decide_unroll_stupid (struct loop *loop, struct niter_desc *desc; widest_int iterations; - if (!(flags & UAP_UNROLL_ALL)) - { - /* We were not asked to, just return back silently. */ - return; - } + /* If we were not asked to unroll this loop, just return back silently. */ + if (!(flags & UAP_UNROLL_ALL) && !loop->unroll) + return; - if (dump_file) - fprintf (dump_file, "\n;; Considering unrolling loop stupidly\n"); + if (dump_enabled_p ()) + dump_printf (MSG_NOTE, "considering unrolling loop stupidly\n"); /* nunroll = total number of copies of the original loop body in unrolled loop (i.e. if it is 2, we have to duplicate loop body once. */ @@ -1155,6 +1177,9 @@ decide_unroll_stupid (struct loop *loop, if (targetm.loop_unroll_adjust) nunroll = targetm.loop_unroll_adjust (nunroll, loop); + if (loop->unroll) + nunroll = loop->unroll; + /* Skip big loops. 
*/ if (nunroll <= 1) { @@ -1170,7 +1195,7 @@ decide_unroll_stupid (struct loop *loop, if (desc->simple_p && !desc->assumptions) { if (dump_file) - fprintf (dump_file, ";; The loop is simple\n"); + fprintf (dump_file, ";; Loop is simple\n"); return; } Index: lto-streamer-in.c =================================================================== --- lto-streamer-in.c (revision 255000) +++ lto-streamer-in.c (working copy) @@ -825,6 +825,7 @@ input_cfg (struct lto_input_block *ib, s /* Read OMP SIMD related info. */ loop->safelen = streamer_read_hwi (ib); + loop->unroll = streamer_read_hwi (ib); loop->dont_vectorize = streamer_read_hwi (ib); loop->force_vectorize = streamer_read_hwi (ib); loop->simduid = stream_read_tree (ib, data_in); Index: lto-streamer-out.c =================================================================== --- lto-streamer-out.c (revision 255000) +++ lto-streamer-out.c (working copy) @@ -1929,6 +1929,7 @@ output_cfg (struct output_block *ob, str /* Write OMP SIMD related info. 
*/ streamer_write_hwi (ob, loop->safelen); + streamer_write_hwi (ob, loop->unroll); streamer_write_hwi (ob, loop->dont_vectorize); streamer_write_hwi (ob, loop->force_vectorize); stream_write_tree (ob, loop->simduid, true); Index: testsuite/c-c++-common/unroll-1.c =================================================================== --- testsuite/c-c++-common/unroll-1.c (revision 0) +++ testsuite/c-c++-common/unroll-1.c (working copy) @@ -0,0 +1,41 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-cunrolli-details -fdump-rtl-loop2_unroll-details" } */ + +extern void bar (int); + +int j; + +void test (void) +{ + #pragma GCC unroll 8 + for (unsigned long i = 1; i <= 8; ++i) + bar(i); + /* { dg-final { scan-tree-dump "11:.*: note: loop with 8 iterations completely unrolled" "cunrolli" } } */ + + #pragma GCC unroll 8 + for (unsigned long i = 1; i <= 7; ++i) + bar(i); + /* { dg-final { scan-tree-dump "16:.*: note: loop with 7 iterations completely unrolled" "cunrolli" } } */ + + #pragma GCC unroll 8 + for (unsigned long i = 1; i <= 15; ++i) + bar(i); + /* { dg-final { scan-rtl-dump "21:.*: note: loop unrolled 7 times" "loop2_unroll" } } */ + + #pragma GCC unroll 8 + for (unsigned long i = 1; i <= j; ++i) + bar(i); + /* { dg-final { scan-rtl-dump "26:.*: note: loop unrolled 7 times" "loop2_unroll" } } */ + + #pragma GCC unroll 7 + for (unsigned long i = 1; i <= j; ++i) + bar(i); + /* { dg-final { scan-rtl-dump "31:.*: note: loop unrolled 3 times" "loop2_unroll" } } */ + + unsigned long i = 0; + #pragma GCC unroll 3 + do { + bar(i); + } while (++i < 9); + /* { dg-final { scan-rtl-dump "3\[79\]:.*: note: loop unrolled 2 times" "loop2_unroll" } } */ +} Index: testsuite/c-c++-common/unroll-2.c =================================================================== --- testsuite/c-c++-common/unroll-2.c (revision 0) +++ testsuite/c-c++-common/unroll-2.c (working copy) @@ -0,0 +1,41 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fdump-tree-cunroll-details 
-fdump-rtl-loop2_unroll-details" } */ + +extern void bar (int); + +int j; + +void test (void) +{ + #pragma GCC unroll 8 + for (unsigned long i = 1; i <= 8; ++i) + bar(i); + /* { dg-final { scan-tree-dump "11:.*: note: loop with 7 iterations completely unrolled" "cunroll" } } */ + + #pragma GCC unroll 8 + for (unsigned long i = 1; i <= 7; ++i) + bar(i); + /* { dg-final { scan-tree-dump "16:.*: note: loop with 6 iterations completely unrolled" "cunroll" } } */ + + #pragma GCC unroll 8 + for (unsigned long i = 1; i <= 15; ++i) + bar(i); + /* { dg-final { scan-rtl-dump "21:.*: note: loop unrolled 7 times" "loop2_unroll" } } */ + + #pragma GCC unroll 8 + for (unsigned long i = 1; i <= j; ++i) + bar(i); + /* { dg-final { scan-rtl-dump "26:.*: note: loop unrolled 7 times" "loop2_unroll" } } */ + + #pragma GCC unroll 7 + for (unsigned long i = 1; i <= j; ++i) + bar(i); + /* { dg-final { scan-rtl-dump "31:.*: note: loop unrolled 3 times" "loop2_unroll" } } */ + + unsigned long i = 0; + #pragma GCC unroll 3 + do { + bar(i); + } while (++i < 9); + /* { dg-final { scan-rtl-dump "3\[79\]:.*: note: loop unrolled 2 times" "loop2_unroll" } } */ +} Index: testsuite/c-c++-common/unroll-3.c =================================================================== --- testsuite/c-c++-common/unroll-3.c (revision 0) +++ testsuite/c-c++-common/unroll-3.c (working copy) @@ -0,0 +1,41 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fdisable-tree-cunroll -fdump-rtl-loop2_unroll-details" } */ + +extern void bar (int); + +int j; + +void test (void) +{ + #pragma GCC unroll 8 + for (unsigned long i = 1; i <= 8; ++i) + bar(i); + /* { dg-final { scan-rtl-dump-not "11:.*: note: loop unrolled" "loop2_unroll" } } */ + + #pragma GCC unroll 8 + for (unsigned long i = 1; i <= 7; ++i) + bar(i); + /* { dg-final { scan-rtl-dump-not "16:.*: note: loop unrolled" "loop2_unroll" } } */ + + #pragma GCC unroll 8 + for (unsigned long i = 1; i <= 15; ++i) + bar(i); + /* { dg-final { scan-rtl-dump "21:.*: note: loop 
unrolled 7 times" "loop2_unroll" } } */ + + #pragma GCC unroll 8 + for (unsigned long i = 1; i <= j; ++i) + bar(i); + /* { dg-final { scan-rtl-dump "26:.*: note: loop unrolled 7 times" "loop2_unroll" } } */ + + #pragma GCC unroll 7 + for (unsigned long i = 1; i <= j; ++i) + bar(i); + /* { dg-final { scan-rtl-dump "31:.*: note: loop unrolled 3 times" "loop2_unroll" } } */ + + unsigned long i = 0; + #pragma GCC unroll 3 + do { + bar(i); + } while (++i < 9); + /* { dg-final { scan-rtl-dump "3\[79\]:.*: note: loop unrolled 2 times" "loop2_unroll" } } */ +} Index: testsuite/c-c++-common/unroll-4.c =================================================================== --- testsuite/c-c++-common/unroll-4.c (revision 0) +++ testsuite/c-c++-common/unroll-4.c (working copy) @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -funroll-all-loops -fdump-rtl-loop2_unroll-details -fdump-tree-cunrolli-details" } */ + +extern void bar (int); + +int j; + +void test (void) +{ + #pragma GCC unroll 0 + for (unsigned long i = 1; i <= 3; ++i) + bar(i); + + #pragma GCC unroll 0 + for (unsigned long i = 1; i <= j; ++i) + bar(i); + + /* { dg-final { scan-tree-dump "Not unrolling loop .: user didn't want it unrolled completely" "cunrolli" } } */ + /* { dg-final { scan-rtl-dump-times "Not unrolling loop, user didn't want it unrolled" 2 "loop2_unroll" } } */ +} Index: testsuite/c-c++-common/unroll-5.c =================================================================== --- testsuite/c-c++-common/unroll-5.c (revision 0) +++ testsuite/c-c++-common/unroll-5.c (working copy) @@ -0,0 +1,29 @@ +/* { dg-do compile } */ + +extern void bar (int); + +int j; + +void test (void) +{ + #pragma GCC unroll 4+4 + for (unsigned long i = 1; i <= 8; ++i) + bar(i); + + #pragma GCC unroll -1 /* { dg-error "requires an assignment-expression that evaluates to a non-negative integral constant less than or equal to" } */ + for (unsigned long i = 1; i <= 8; ++i) + bar(i); + + #pragma GCC unroll 20000000000 /* { 
dg-error "requires an assignment-expression that evaluates to a non-negative integral constant less than or equal to" } */ + for (unsigned long i = 1; i <= 8; ++i) + bar(i); + + #pragma GCC unroll j /* { dg-error "requires an assignment-expression that evaluates to a non-negative integral constant less than or equal to" } */ + /* { dg-error "cannot appear in a constant-expression|is not usable in a constant expression" "" { target c++ } 21 } */ + for (unsigned long i = 1; i <= 8; ++i) + bar(i); + + #pragma GCC unroll 4.2 /* { dg-error "requires an assignment-expression that evaluates to a non-negative integral constant less than or equal to" } */ + for (unsigned long i = 1; i <= 8; ++i) + bar(i); +} Index: testsuite/gcc.dg/pr64277.c =================================================================== --- testsuite/gcc.dg/pr64277.c (revision 255000) +++ testsuite/gcc.dg/pr64277.c (working copy) @@ -1,8 +1,8 @@ /* PR tree-optimization/64277 */ /* { dg-do compile } */ /* { dg-options "-O3 -Wall -Werror -fdump-tree-cunroll-details" } */ +/* { dg-final { scan-tree-dump "loop with 4 iterations completely unrolled" "cunroll" } } */ /* { dg-final { scan-tree-dump "loop with 5 iterations completely unrolled" "cunroll" } } */ -/* { dg-final { scan-tree-dump "loop with 6 iterations completely unrolled" "cunroll" } } */ #if __SIZEOF_INT__ < 4 __extension__ typedef __INT32_TYPE__ int32_t; Index: testsuite/gcc.dg/tree-prof/unroll-1.c =================================================================== --- testsuite/gcc.dg/tree-prof/unroll-1.c (revision 255000) +++ testsuite/gcc.dg/tree-prof/unroll-1.c (working copy) @@ -1,4 +1,4 @@ -/* { dg-options "-O3 -fdump-rtl-loop2_unroll -funroll-loops -fno-peel-loops" } */ +/* { dg-options "-O3 -fdump-rtl-loop2_unroll-details -funroll-loops -fno-peel-loops" } */ void abort (); int a[1000]; @@ -20,4 +20,4 @@ main() t(); return 0; } -/* { dg-final-use { scan-rtl-dump "Considering unrolling loop with constant number of iterations" 
"loop2_unroll" } } */ +/* { dg-final-use { scan-rtl-dump "considering unrolling loop with constant number of iterations" "loop2_unroll" } } */ Index: testsuite/gcc.dg/tree-ssa/cunroll-1.c =================================================================== --- testsuite/gcc.dg/tree-ssa/cunroll-1.c (revision 255000) +++ testsuite/gcc.dg/tree-ssa/cunroll-1.c (working copy) @@ -9,5 +9,5 @@ test(int c) a[i]=5; } /* Array bounds says the loop will not roll much. */ -/* { dg-final { scan-tree-dump "loop with 3 iterations completely unrolled" "cunrolli"} } */ +/* { dg-final { scan-tree-dump "loop with 2 iterations completely unrolled" "cunrolli"} } */ /* { dg-final { scan-tree-dump "Last iteration exit edge was proved true." "cunrolli"} } */ Index: testsuite/gcc.dg/tree-ssa/cunroll-12.c =================================================================== --- testsuite/gcc.dg/tree-ssa/cunroll-12.c (revision 255000) +++ testsuite/gcc.dg/tree-ssa/cunroll-12.c (working copy) @@ -7,5 +7,5 @@ t(struct a *a) for (int i=0;a->a[i];i++) a->a[i]++; } -/* { dg-final { scan-tree-dump-times "loop with 7 iterations completely unrolled" 1 "cunroll" } } */ +/* { dg-final { scan-tree-dump-times "loop with 6 iterations completely unrolled" 1 "cunroll" } } */ /* { dg-final { scan-tree-dump-not "Invalid sum" "cunroll" } } */ Index: testsuite/gcc.dg/tree-ssa/cunroll-13.c =================================================================== --- testsuite/gcc.dg/tree-ssa/cunroll-13.c (revision 255000) +++ testsuite/gcc.dg/tree-ssa/cunroll-13.c (working copy) @@ -19,5 +19,5 @@ t(struct a *a) /* { dg-final { scan-tree-dump-times "Loop 1 iterates 123454 times" 1 "cunroll" } } */ /* { dg-final { scan-tree-dump-times "Last iteration exit edge was proved true" 1 "cunroll" } } */ /* { dg-final { scan-tree-dump-times "Exit condition of peeled iterations was eliminated" 1 "cunroll" } } */ -/* { dg-final { scan-tree-dump-times "loop with 7 iterations completely unrolled" 1 "cunroll" } } */ +/* { dg-final { 
scan-tree-dump-times "loop with 6 iterations completely unrolled" 1 "cunroll" } } */ /* { dg-final { scan-tree-dump-not "Invalid sum" "cunroll" } } */ Index: testsuite/gcc.dg/tree-ssa/cunroll-14.c =================================================================== --- testsuite/gcc.dg/tree-ssa/cunroll-14.c (revision 255000) +++ testsuite/gcc.dg/tree-ssa/cunroll-14.c (working copy) @@ -7,7 +7,7 @@ t(struct a *a) for (int i=0;i<5 && a->a[i];i++) a->a[i]++; } -/* { dg-final { scan-tree-dump-times "loop with 5 iterations completely unrolled" 1 "cunroll" } } */ +/* { dg-final { scan-tree-dump-times "loop with 4 iterations completely unrolled" 1 "cunroll" } } */ /* { dg-final { scan-tree-dump-not "Invalid sum" "cunroll" } } */ /* { dg-final { scan-tree-dump-times "Loop 1 iterates 4 times" 1 "cunroll" } } */ /* { dg-final { scan-tree-dump-times "Last iteration exit edge was proved true" 1 "cunroll" } } */ Index: testsuite/gcc.dg/tree-ssa/cunroll-2.c =================================================================== --- testsuite/gcc.dg/tree-ssa/cunroll-2.c (revision 255000) +++ testsuite/gcc.dg/tree-ssa/cunroll-2.c (working copy) @@ -14,4 +14,4 @@ test(int c) } } /* We are not able to get rid of the final conditional because the loop has two exits. */ -/* { dg-final { scan-tree-dump "loop with 2 iterations completely unrolled" "cunroll"} } */ +/* { dg-final { scan-tree-dump "loop with 1 iterations completely unrolled" "cunroll"} } */ Index: testsuite/gcc.dg/tree-ssa/cunroll-3.c =================================================================== --- testsuite/gcc.dg/tree-ssa/cunroll-3.c (revision 255000) +++ testsuite/gcc.dg/tree-ssa/cunroll-3.c (working copy) @@ -12,4 +12,4 @@ test(int c) } /* If we start duplicating headers prior curoll, this loop will have 0 iterations. 
*/ -/* { dg-final { scan-tree-dump "loop with 2 iterations completely unrolled" "cunrolli"} } */ +/* { dg-final { scan-tree-dump "loop with 1 iterations completely unrolled" "cunrolli"} } */ Index: testsuite/gcc.dg/tree-ssa/cunroll-5.c =================================================================== --- testsuite/gcc.dg/tree-ssa/cunroll-5.c (revision 255000) +++ testsuite/gcc.dg/tree-ssa/cunroll-5.c (working copy) @@ -9,6 +9,6 @@ test(int c) a[i]=5; } /* Basic testcase for complette unrolling. */ -/* { dg-final { scan-tree-dump "loop with 6 iterations completely unrolled" "cunroll"} } */ +/* { dg-final { scan-tree-dump "loop with 5 iterations completely unrolled" "cunroll"} } */ /* { dg-final { scan-tree-dump "Exit condition of peeled iterations was eliminated." "cunroll"} } */ /* { dg-final { scan-tree-dump "Last iteration exit edge was proved true." "cunroll"} } */ Index: testsuite/gcc.dg/tree-ssa/loop-1.c =================================================================== --- testsuite/gcc.dg/tree-ssa/loop-1.c (revision 255000) +++ testsuite/gcc.dg/tree-ssa/loop-1.c (working copy) @@ -34,7 +34,7 @@ int xxx(void) /* We should be able to find out that the loop iterates four times and unroll it completely. 
*/ /* { dg-final { scan-tree-dump-times "Added canonical iv to loop 1, 4 iterations" 1 "ivcanon"} } */ -/* { dg-final { scan-tree-dump-times "loop with 5 iterations completely unrolled" 1 "cunroll"} } */ +/* { dg-final { scan-tree-dump-times "loop with 4 iterations completely unrolled" 1 "cunroll"} } */ /* { dg-final { scan-tree-dump-times "foo" 5 "optimized"} } */ /* Because hppa, ia64 and Windows targets include an external declaration Index: testsuite/gcc.dg/tree-ssa/loop-23.c =================================================================== --- testsuite/gcc.dg/tree-ssa/loop-23.c (revision 255000) +++ testsuite/gcc.dg/tree-ssa/loop-23.c (working copy) @@ -24,5 +24,4 @@ int foo(void) return sum; } -/* { dg-final { scan-tree-dump-times "loop with 4 iterations completely unrolled" 1 "cunroll" } } */ - +/* { dg-final { scan-tree-dump-times "loop with 3 iterations completely unrolled" 1 "cunroll" } } */ Index: testsuite/gcc.dg/tree-ssa/pr61743-1.c =================================================================== --- testsuite/gcc.dg/tree-ssa/pr61743-1.c (revision 255000) +++ testsuite/gcc.dg/tree-ssa/pr61743-1.c (working copy) @@ -48,5 +48,5 @@ int foo1 (e_u8 a[4][N], int b1, int b2, return 0; } -/* { dg-final { scan-tree-dump-times "loop with 4 iterations completely unrolled" 8 "cunroll" } } */ -/* { dg-final { scan-tree-dump-times "loop with 9 iterations completely unrolled" 2 "cunrolli" } } */ +/* { dg-final { scan-tree-dump-times "loop with 3 iterations completely unrolled" 8 "cunroll" } } */ +/* { dg-final { scan-tree-dump-times "loop with 8 iterations completely unrolled" 2 "cunrolli" } } */ Index: testsuite/gcc.dg/tree-ssa/pr61743-2.c =================================================================== --- testsuite/gcc.dg/tree-ssa/pr61743-2.c (revision 255000) +++ testsuite/gcc.dg/tree-ssa/pr61743-2.c (working copy) @@ -48,5 +48,5 @@ int foo1 (e_u8 a[4][N], int b1, int b2, return 0; } -/* { dg-final { scan-tree-dump-times "loop with 4 iterations 
completely unrolled" 2 "cunroll" } } */ -/* { dg-final { scan-tree-dump-times "loop with 8 iterations completely unrolled" 2 "cunroll" } } */ +/* { dg-final { scan-tree-dump-times "loop with 3 iterations completely unrolled" 2 "cunroll" } } */ +/* { dg-final { scan-tree-dump-times "loop with 7 iterations completely unrolled" 2 "cunroll" } } */ Index: testsuite/gcc.dg/unroll-2.c =================================================================== --- testsuite/gcc.dg/unroll-2.c (revision 255000) +++ testsuite/gcc.dg/unroll-2.c (working copy) @@ -15,7 +15,7 @@ int foo(void) { int i; bar(); - for (i = 0; i < 2; i++) /* { dg-message "note: loop with 3 iterations completely unrolled" } */ + for (i = 0; i < 2; i++) /* { dg-message "note: loop with 2 iterations completely unrolled" } */ { a[i]= b[i] + 1; } @@ -25,7 +25,7 @@ int foo(void) int foo2(void) { int i; - for (i = 0; i < 2; i++) /* { dg-message "note: loop with 3 iterations completely unrolled" } */ + for (i = 0; i < 2; i++) /* { dg-message "note: loop with 2 iterations completely unrolled" } */ { a[i]= b[i] + 1; } Index: testsuite/gcc.dg/unroll-3.c =================================================================== --- testsuite/gcc.dg/unroll-3.c (revision 255000) +++ testsuite/gcc.dg/unroll-3.c (working copy) @@ -28,4 +28,4 @@ int foo2(void) return 1; } -/* { dg-final { scan-tree-dump-times "loop with 3 iterations completely unrolled" 1 "cunrolli" } } */ +/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli" } } */ Index: testsuite/gcc.dg/unroll-4.c =================================================================== --- testsuite/gcc.dg/unroll-4.c (revision 255000) +++ testsuite/gcc.dg/unroll-4.c (working copy) @@ -28,4 +28,4 @@ int foo2(void) return 1; } -/* { dg-final { scan-tree-dump-times "loop with 3 iterations completely unrolled" 1 "cunrolli" } } */ +/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli" } } */ Index: 
testsuite/gcc.dg/unroll-5.c =================================================================== --- testsuite/gcc.dg/unroll-5.c (revision 255000) +++ testsuite/gcc.dg/unroll-5.c (working copy) @@ -28,4 +28,4 @@ int foo2(void) return 1; } -/* { dg-final { scan-tree-dump-times "loop with 3 iterations completely unrolled" 1 "cunrolli" } } */ +/* { dg-final { scan-tree-dump-times "loop with 2 iterations completely unrolled" 1 "cunrolli" } } */ Index: testsuite/gcc.dg/unroll-7.c =================================================================== --- testsuite/gcc.dg/unroll-7.c (revision 255000) +++ testsuite/gcc.dg/unroll-7.c (working copy) @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-rtl-loop2_unroll -funroll-loops" } */ +/* { dg-options "-O2 -fdump-rtl-loop2_unroll-details -funroll-loops" } */ /* { dg-require-effective-target int32plus } */ extern int *a; @@ -14,5 +14,5 @@ int t(void) /* { dg-final { scan-rtl-dump "number of iterations: .const_int 999999" "loop2_unroll" } } */ /* { dg-final { scan-rtl-dump "upper bound: 999999" "loop2_unroll" } } */ /* { dg-final { scan-rtl-dump "realistic bound: 999999" "loop2_unroll" } } */ -/* { dg-final { scan-rtl-dump "Considering unrolling loop with constant number of iterations" "loop2_unroll" } } */ +/* { dg-final { scan-rtl-dump "considering unrolling loop with constant number of iterations" "loop2_unroll" } } */ /* { dg-final { scan-rtl-dump-not "Invalid sum" "loop2_unroll" } } */ Index: testsuite/gfortran.dg/directive_unroll_1.f90 =================================================================== --- testsuite/gfortran.dg/directive_unroll_1.f90 (revision 0) +++ testsuite/gfortran.dg/directive_unroll_1.f90 (working copy) @@ -0,0 +1,52 @@ +! { dg-do compile } +! { dg-options "-O2 -fdump-tree-cunrolli-details -fdump-rtl-loop2_unroll-details" } +! Test that +! #pragma GCC unroll n +! 
works + +subroutine test1(a) + implicit NONE + integer :: a(8) + integer (kind=4) :: i +!GCC$ unroll 8 + DO i=1, 8, 1 + call dummy(a(i)) + ENDDO +! { dg-final { scan-tree-dump "12:.*: note: loop with 8 iterations completely unrolled" "cunrolli" } } */ +end subroutine test1 + +subroutine test2(a, n) + implicit NONE + integer :: a(n) + integer (kind=1), intent(in) :: n + integer (kind=4) :: i +!GCC$ unroll 8 + DO i=1, n, 1 + call dummy(a(i)) + ENDDO +! { dg-final { scan-rtl-dump "24:.: note: loop unrolled 7 times" "loop2_unroll" } } +end subroutine test2 + +subroutine test3(a, n) + implicit NONE + integer (kind=1), intent(in) :: n + integer :: a(n) + integer (kind=4) :: i +!GCC$ unroll 8 + DO i=n, 1, -1 + call dummy(a(i)) + ENDDO +! { dg-final { scan-rtl-dump "36:.: note: loop unrolled 7 times" "loop2_unroll" } } +end subroutine test3 + +subroutine test4(a, n) + implicit NONE + integer (kind=1), intent(in) :: n + integer :: a(n) + integer (kind=4) :: i +!GCC$ unroll 8 + DO i=1, n, 2 + call dummy(a(i)) + ENDDO +! { dg-final { scan-rtl-dump "48:.: note: loop unrolled 7 times" "loop2_unroll" } } +end subroutine test4 Index: testsuite/gfortran.dg/directive_unroll_2.f90 =================================================================== --- testsuite/gfortran.dg/directive_unroll_2.f90 (revision 0) +++ testsuite/gfortran.dg/directive_unroll_2.f90 (working copy) @@ -0,0 +1,52 @@ +! { dg-do compile } +! { dg-options "-O -fdump-tree-cunroll-details -fdump-rtl-loop2_unroll-details" } +! Test that +! #pragma GCC unroll n +! works + +subroutine test1(a) + implicit NONE + integer :: a(8) + integer (kind=4) :: i +!GCC$ unroll 8 + DO i=1, 8, 1 + call dummy(a(i)) + ENDDO +! { dg-final { scan-tree-dump "12:.*: note: loop with 7 iterations completely unrolled" "cunroll" } } */ +end subroutine test1 + +subroutine test2(a, n) + implicit NONE + integer :: a(n) + integer (kind=1), intent(in) :: n + integer (kind=4) :: i +!GCC$ unroll 8 + DO i=1, n, 1 + call dummy(a(i)) + ENDDO +! 
{ dg-final { scan-rtl-dump "24:.: note: loop unrolled 7 times" "loop2_unroll" } } +end subroutine test2 + +subroutine test3(a, n) + implicit NONE + integer (kind=1), intent(in) :: n + integer :: a(n) + integer (kind=4) :: i +!GCC$ unroll 8 + DO i=n, 1, -1 + call dummy(a(i)) + ENDDO +! { dg-final { scan-rtl-dump "36:.: note: loop unrolled 7 times" "loop2_unroll" } } +end subroutine test3 + +subroutine test4(a, n) + implicit NONE + integer (kind=1), intent(in) :: n + integer :: a(n) + integer (kind=4) :: i +!GCC$ unroll 8 + DO i=1, n, 2 + call dummy(a(i)) + ENDDO +! { dg-final { scan-rtl-dump "48:.: note: loop unrolled 7 times" "loop2_unroll" } } +end subroutine test4 Index: testsuite/gfortran.dg/directive_unroll_3.f90 =================================================================== --- testsuite/gfortran.dg/directive_unroll_3.f90 (revision 0) +++ testsuite/gfortran.dg/directive_unroll_3.f90 (working copy) @@ -0,0 +1,52 @@ +! { dg-do compile } +! { dg-options "-O -fdisable-tree-cunroll -fdump-rtl-loop2_unroll-details" } +! Test that +! #pragma GCC unroll n +! works + +subroutine test1(a) + implicit NONE + integer :: a(8) + integer (kind=4) :: i +!GCC$ unroll 8 + DO i=1, 8, 1 + call dummy(a(i)) + ENDDO +! { dg-final { scan-rtl-dump-not "12:.: note: loop unrolled" "loop2_unroll" } } +end subroutine test1 + +subroutine test2(a, n) + implicit NONE + integer :: a(n) + integer (kind=1), intent(in) :: n + integer (kind=4) :: i +!GCC$ unroll 8 + DO i=1, n, 1 + call dummy(a(i)) + ENDDO +! { dg-final { scan-rtl-dump "24:.: note: loop unrolled 7 times" "loop2_unroll" } } +end subroutine test2 + +subroutine test3(a, n) + implicit NONE + integer (kind=1), intent(in) :: n + integer :: a(n) + integer (kind=4) :: i +!GCC$ unroll 8 + DO i=n, 1, -1 + call dummy(a(i)) + ENDDO +! 
{ dg-final { scan-rtl-dump "36:.: note: loop unrolled 7 times" "loop2_unroll" } } +end subroutine test3 + +subroutine test4(a, n) + implicit NONE + integer (kind=1), intent(in) :: n + integer :: a(n) + integer (kind=4) :: i +!GCC$ unroll 8 + DO i=1, n, 2 + call dummy(a(i)) + ENDDO +! { dg-final { scan-rtl-dump "48:.: note: loop unrolled 7 times" "loop2_unroll" } } +end subroutine test4 Index: testsuite/gfortran.dg/directive_unroll_4.f90 =================================================================== --- testsuite/gfortran.dg/directive_unroll_4.f90 (revision 0) +++ testsuite/gfortran.dg/directive_unroll_4.f90 (working copy) @@ -0,0 +1,29 @@ +! { dg-do compile } +! { dg-options "-O2 -funroll-all-loops -fdump-rtl-loop2_unroll-details -fdump-tree-cunrolli-details" } +! Test that +! #pragma GCC unroll n +! works + +subroutine test1(a) + implicit NONE + integer :: a(8) + integer (kind=4) :: i +!GCC$ unroll 0 + DO i=1, 8, 1 + call dummy(a(i)) + ENDDO +end subroutine test1 + +subroutine test2(a, n) + implicit NONE + integer :: a(n) + integer (kind=1), intent(in) :: n + integer (kind=4) :: i +!GCC$ unroll 0 + DO i=1, n, 1 + call dummy(a(i)) + ENDDO +end subroutine test2 + +! { dg-final { scan-tree-dump "Not unrolling loop .: user didn't want it unrolled completely" "cunrolli" } } */ +! { dg-final { scan-rtl-dump-times "Not unrolling loop, user didn't want it unrolled" 2 "loop2_unroll" } } */ Index: testsuite/gfortran.dg/directive_unroll_5.f90 =================================================================== --- testsuite/gfortran.dg/directive_unroll_5.f90 (revision 0) +++ testsuite/gfortran.dg/directive_unroll_5.f90 (working copy) @@ -0,0 +1,38 @@ +! { dg-do compile } + +! Test that +! #pragma GCC unroll n +! rejects invalid n and improper use + +subroutine wrong1(n) + implicit NONE + integer (kind=1), intent(in) :: n + integer (kind=4) :: i +!GCC$ unroll 999999999 ! 
{ dg-error "non-negative integral constant less than" } + DO i=0, n, 1 + call dummy1(i) + ENDDO +end subroutine wrong1 + +subroutine wrong2(a, b, n) + implicit NONE + integer (kind=1), intent(in) :: n + integer :: a(n), b(n) + integer (kind=4) :: i +!GCC$ unroll -1 ! { dg-error "non-negative integral constant less than" } + DO i=1, n, 2 + call dummy2(a(i), b(i), i) + ENDDO +end subroutine wrong2 + +subroutine wrong3(a, b, n) + implicit NONE + integer (kind=1), intent(in) :: n + integer :: a(n), b(n) + integer (kind=4) :: i +!GCC$ unroll 8 + write (*,*) "wrong"! { dg-error "directive does not commence a loop" } + DO i=n, 1, -1 + call dummy2(a(i), b(i), i) + ENDDO +end subroutine wrong3 Index: testsuite/gnat.dg/unroll1.adb =================================================================== --- testsuite/gnat.dg/unroll1.adb (revision 0) +++ testsuite/gnat.dg/unroll1.adb (working copy) @@ -0,0 +1,27 @@ +-- { dg-do compile } +-- { dg-options "-O2 -funroll-all-loops -fdump-rtl-loop2_unroll-details -fdump-tree-cunrolli-details" } + +package body Unroll1 is + + function "+" (X, Y : Sarray) return Sarray is + R : Sarray; + begin + for I in Sarray'Range loop + pragma Loop_Optimize (No_Unroll); + R(I) := X(I) + Y(I); + end loop; + return R; + end; + + procedure Add (X, Y : Sarray; R : out Sarray) is + begin + for I in Sarray'Range loop + pragma Loop_Optimize (No_Unroll); + R(I) := X(I) + Y(I); + end loop; + end; + +end Unroll1; + +-- { dg-final { scan-tree-dump-times "Not unrolling loop .: user didn't want it unrolled completely" 2 "cunrolli" } } */ +-- { dg-final { scan-rtl-dump-times "Not unrolling loop, user didn't want it unrolled" 2 "loop2_unroll" } } */ Index: testsuite/gnat.dg/unroll1.ads =================================================================== --- testsuite/gnat.dg/unroll1.ads (revision 0) +++ testsuite/gnat.dg/unroll1.ads (working copy) @@ -0,0 +1,9 @@ +package Unroll1 is + + type Sarray is array (1 .. 
4) of Float; + for Sarray'Alignment use 16; + + function "+" (X, Y : Sarray) return Sarray; + procedure Add (X, Y : Sarray; R : out Sarray); + +end Unroll1; Index: testsuite/gnat.dg/unroll2.adb =================================================================== --- testsuite/gnat.dg/unroll2.adb (revision 0) +++ testsuite/gnat.dg/unroll2.adb (working copy) @@ -0,0 +1,26 @@ +-- { dg-do compile } +-- { dg-options "-O2 -fdump-tree-cunrolli-details" } + +package body Unroll2 is + + function "+" (X, Y : Sarray) return Sarray is + R : Sarray; + begin + for I in Sarray'Range loop + pragma Loop_Optimize (Unroll); + R(I) := X(I) + Y(I); + end loop; + return R; + end; + + procedure Add (X, Y : Sarray; R : out Sarray) is + begin + for I in Sarray'Range loop + pragma Loop_Optimize (Unroll); + R(I) := X(I) + Y(I); + end loop; + end; + +end Unroll2; + +-- { dg-final { scan-tree-dump-times "note: loop with 3 iterations completely unrolled" 2 "cunrolli" } } */ Index: testsuite/gnat.dg/unroll2.ads =================================================================== --- testsuite/gnat.dg/unroll2.ads (revision 0) +++ testsuite/gnat.dg/unroll2.ads (working copy) @@ -0,0 +1,9 @@ +package Unroll2 is + + type Sarray is array (1 .. 
4) of Float; + for Sarray'Alignment use 16; + + function "+" (X, Y : Sarray) return Sarray; + procedure Add (X, Y : Sarray; R : out Sarray); + +end Unroll2; Index: testsuite/gnat.dg/unroll3.adb =================================================================== --- testsuite/gnat.dg/unroll3.adb (revision 0) +++ testsuite/gnat.dg/unroll3.adb (working copy) @@ -0,0 +1,26 @@ +-- { dg-do compile } +-- { dg-options "-O -fdump-tree-cunroll-details" } + +package body Unroll3 is + + function "+" (X, Y : Sarray) return Sarray is + R : Sarray; + begin + for I in Sarray'Range loop + pragma Loop_Optimize (Unroll); + R(I) := X(I) + Y(I); + end loop; + return R; + end; + + procedure Add (X, Y : Sarray; R : out Sarray) is + begin + for I in Sarray'Range loop + pragma Loop_Optimize (Unroll); + R(I) := X(I) + Y(I); + end loop; + end; + +end Unroll3; + +-- { dg-final { scan-tree-dump-times "note: loop with 3 iterations completely unrolled" 2 "cunroll" } } */ Index: testsuite/gnat.dg/unroll3.ads =================================================================== --- testsuite/gnat.dg/unroll3.ads (revision 0) +++ testsuite/gnat.dg/unroll3.ads (working copy) @@ -0,0 +1,9 @@ +package Unroll3 is + + type Sarray is array (1 .. 
4) of Float; + for Sarray'Alignment use 16; + + function "+" (X, Y : Sarray) return Sarray; + procedure Add (X, Y : Sarray; R : out Sarray); + +end Unroll3; Index: tree-cfg.c =================================================================== --- tree-cfg.c (revision 255000) +++ tree-cfg.c (working copy) @@ -280,6 +280,11 @@ replace_loop_annotate_in_block (basic_bl case annot_expr_ivdep_kind: loop->safelen = INT_MAX; break; + case annot_expr_unroll_kind: + loop->unroll + = (unsigned short) tree_to_shwi (gimple_call_arg (stmt, 2)); + cfun->has_unroll = true; + break; case annot_expr_no_vector_kind: loop->dont_vectorize = true; break; @@ -338,6 +343,7 @@ replace_loop_annotate (void) switch ((annot_expr_kind) tree_to_shwi (gimple_call_arg (stmt, 1))) { case annot_expr_ivdep_kind: + case annot_expr_unroll_kind: case annot_expr_no_vector_kind: case annot_expr_vector_kind: break; @@ -7993,6 +7999,8 @@ print_loop (FILE *file, struct loop *loo fprintf (file, ", estimate = "); print_decu (loop->nb_iterations_estimate, file); } + if (loop->unroll) + fprintf (file, ", unroll = %d", loop->unroll); fprintf (file, ")\n"); /* Print loop's body. 
*/ Index: tree-core.h =================================================================== --- tree-core.h (revision 255000) +++ tree-core.h (working copy) @@ -851,6 +851,7 @@ enum tree_node_kind { enum annot_expr_kind { annot_expr_ivdep_kind, + annot_expr_unroll_kind, annot_expr_no_vector_kind, annot_expr_vector_kind, annot_expr_parallel_kind, Index: tree-inline.c =================================================================== --- tree-inline.c (revision 255000) +++ tree-inline.c (working copy) @@ -2597,6 +2597,11 @@ copy_loops (copy_body_data *id, flow_loop_tree_node_add (dest_parent, dest_loop); dest_loop->safelen = src_loop->safelen; + if (src_loop->unroll) + { + dest_loop->unroll = src_loop->unroll; + cfun->has_unroll = true; + } dest_loop->dont_vectorize = src_loop->dont_vectorize; if (src_loop->force_vectorize) { Index: tree-pretty-print.c =================================================================== --- tree-pretty-print.c (revision 255000) +++ tree-pretty-print.c (working copy) @@ -2632,6 +2632,10 @@ dump_generic_node (pretty_printer *pp, t case annot_expr_ivdep_kind: pp_string (pp, ", ivdep"); break; + case annot_expr_unroll_kind: + pp_printf (pp, ", unroll %d", + (int) TREE_INT_CST_LOW (TREE_OPERAND (node, 2))); + break; case annot_expr_no_vector_kind: pp_string (pp, ", no-vector"); break; Index: tree-ssa-loop-ivcanon.c =================================================================== --- tree-ssa-loop-ivcanon.c (revision 255000) +++ tree-ssa-loop-ivcanon.c (working copy) @@ -681,11 +681,9 @@ try_unroll_loop_completely (struct loop HOST_WIDE_INT maxiter, location_t locus) { - unsigned HOST_WIDE_INT n_unroll = 0, ninsns, unr_insns; - struct loop_size size; + unsigned HOST_WIDE_INT n_unroll = 0; bool n_unroll_found = false; edge edge_to_cancel = NULL; - dump_flags_t report_flags = MSG_OPTIMIZED_LOCATIONS | TDF_DETAILS; /* See if we proved number of iterations to be low constant. 
@@ -726,7 +724,8 @@ try_unroll_loop_completely (struct loop if (!n_unroll_found) return false; - if (n_unroll > (unsigned) PARAM_VALUE (PARAM_MAX_COMPLETELY_PEEL_TIMES)) + if (!loop->unroll + && n_unroll > (unsigned) PARAM_VALUE (PARAM_MAX_COMPLETELY_PEEL_TIMES)) { if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, "Not unrolling loop %d " @@ -740,121 +739,137 @@ try_unroll_loop_completely (struct loop if (n_unroll) { - bool large; if (ul == UL_SINGLE_ITER) return false; - /* EXIT can be removed only if we are sure it passes first N_UNROLL - iterations. */ - bool remove_exit = (exit && niter - && TREE_CODE (niter) == INTEGER_CST - && wi::leu_p (n_unroll, wi::to_widest (niter))); - - large = tree_estimate_loop_size - (loop, remove_exit ? exit : NULL, edge_to_cancel, &size, - PARAM_VALUE (PARAM_MAX_COMPLETELY_PEELED_INSNS)); - ninsns = size.overall; - if (large) + if (loop->unroll) { - if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, "Not unrolling loop %d: it is too large.\n", - loop->num); - return false; + /* If the unrolling factor is too large, bail out. */ + if (n_unroll > (unsigned)loop->unroll) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, + "Not unrolling loop %d: " + "user didn't want it unrolled completely.\n", + loop->num); + return false; + } } - - unr_insns = estimated_unrolled_size (&size, n_unroll); - if (dump_file && (dump_flags & TDF_DETAILS)) + else { - fprintf (dump_file, " Loop size: %d\n", (int) ninsns); - fprintf (dump_file, " Estimated size after unrolling: %d\n", - (int) unr_insns); - } + struct loop_size size; + /* EXIT can be removed only if we are sure it passes first N_UNROLL + iterations. */ + bool remove_exit = (exit && niter + && TREE_CODE (niter) == INTEGER_CST + && wi::leu_p (n_unroll, wi::to_widest (niter))); + bool large + = tree_estimate_loop_size + (loop, remove_exit ? 
exit : NULL, edge_to_cancel, &size, + PARAM_VALUE (PARAM_MAX_COMPLETELY_PEELED_INSNS)); + if (large) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "Not unrolling loop %d: it is too large.\n", + loop->num); + return false; + } - /* If the code is going to shrink, we don't need to be extra cautious - on guessing if the unrolling is going to be profitable. */ - if (unr_insns - /* If there is IV variable that will become constant, we save - one instruction in the loop prologue we do not account - otherwise. */ - <= ninsns + (size.constant_iv != false)) - ; - /* We unroll only inner loops, because we do not consider it profitable - otheriwse. We still can cancel loopback edge of not rolling loop; - this is always a good idea. */ - else if (ul == UL_NO_GROWTH) - { - if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, "Not unrolling loop %d: size would grow.\n", - loop->num); - return false; - } - /* Outer loops tend to be less interesting candidates for complete - unrolling unless we can do a lot of propagation into the inner loop - body. For now we disable outer loop unrolling when the code would - grow. */ - else if (loop->inner) - { - if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, "Not unrolling loop %d: " - "it is not innermost and code would grow.\n", - loop->num); - return false; - } - /* If there is call on a hot path through the loop, then - there is most probably not much to optimize. */ - else if (size.num_non_pure_calls_on_hot_path) - { - if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, "Not unrolling loop %d: " - "contains call and code would grow.\n", - loop->num); - return false; - } - /* If there is pure/const call in the function, then we - can still optimize the unrolled loop body if it contains - some other interesting code than the calls and code - storing or cumulating the return value. 
*/ - else if (size.num_pure_calls_on_hot_path - /* One IV increment, one test, one ivtmp store - and one useful stmt. That is about minimal loop - doing pure call. */ - && (size.non_call_stmts_on_hot_path - <= 3 + size.num_pure_calls_on_hot_path)) - { - if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, "Not unrolling loop %d: " - "contains just pure calls and code would grow.\n", - loop->num); - return false; - } - /* Complete unrolling is a major win when control flow is removed and - one big basic block is created. If the loop contains control flow - the optimization may still be a win because of eliminating the loop - overhead but it also may blow the branch predictor tables. - Limit number of branches on the hot path through the peeled - sequence. */ - else if (size.num_branches_on_hot_path * (int)n_unroll - > PARAM_VALUE (PARAM_MAX_PEEL_BRANCHES)) - { + unsigned HOST_WIDE_INT ninsns = size.overall; + unsigned HOST_WIDE_INT unr_insns + = estimated_unrolled_size (&size, n_unroll); if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, "Not unrolling loop %d: " - " number of branches on hot path in the unrolled sequence" - " reach --param max-peel-branches limit.\n", - loop->num); - return false; - } - else if (unr_insns - > (unsigned) PARAM_VALUE (PARAM_MAX_COMPLETELY_PEELED_INSNS)) - { - if (dump_file && (dump_flags & TDF_DETAILS)) - fprintf (dump_file, "Not unrolling loop %d: " - "(--param max-completely-peeled-insns limit reached).\n", - loop->num); - return false; + { + fprintf (dump_file, " Loop size: %d\n", (int) ninsns); + fprintf (dump_file, " Estimated size after unrolling: %d\n", + (int) unr_insns); + } + + /* If the code is going to shrink, we don't need to be extra + cautious on guessing if the unrolling is going to be + profitable. */ + if (unr_insns + /* If there is IV variable that will become constant, we + save one instruction in the loop prologue we do not + account otherwise. 
*/ + <= ninsns + (size.constant_iv != false)) + ; + /* We unroll only inner loops, because we do not consider it + profitable otherwise. We still can cancel loopback edge + of not rolling loop; this is always a good idea. */ + else if (ul == UL_NO_GROWTH) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "Not unrolling loop %d: size would grow.\n", + loop->num); + return false; + } + /* Outer loops tend to be less interesting candidates for + complete unrolling unless we can do a lot of propagation + into the inner loop body. For now we disable outer loop + unrolling when the code would grow. */ + else if (loop->inner) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "Not unrolling loop %d: " + "it is not innermost and code would grow.\n", + loop->num); + return false; + } + /* If there is call on a hot path through the loop, then + there is most probably not much to optimize. */ + else if (size.num_non_pure_calls_on_hot_path) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "Not unrolling loop %d: " + "contains call and code would grow.\n", + loop->num); + return false; + } + /* If there is pure/const call in the function, then we can + still optimize the unrolled loop body if it contains some + other interesting code than the calls and code storing or + cumulating the return value. */ + else if (size.num_pure_calls_on_hot_path + /* One IV increment, one test, one ivtmp store and + one useful stmt. That is about minimal loop + doing pure call. */ + && (size.non_call_stmts_on_hot_path + <= 3 + size.num_pure_calls_on_hot_path)) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "Not unrolling loop %d: " + "contains just pure calls and code would grow.\n", + loop->num); + return false; + } + /* Complete unrolling is a major win when control flow is + removed and one big basic block is created. 
If the loop + contains control flow the optimization may still be a win + because of eliminating the loop overhead but it also may + blow the branch predictor tables. Limit number of + branches on the hot path through the peeled sequence. */ + else if (size.num_branches_on_hot_path * (int)n_unroll + > PARAM_VALUE (PARAM_MAX_PEEL_BRANCHES)) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "Not unrolling loop %d: " + "number of branches on hot path in the unrolled " + "sequence reaches --param max-peel-branches limit.\n", + loop->num); + return false; + } + else if (unr_insns + > (unsigned) PARAM_VALUE (PARAM_MAX_COMPLETELY_PEELED_INSNS)) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "Not unrolling loop %d: " + "number of insns in the unrolled sequence reaches " + "--param max-completely-peeled-insns limit.\n", + loop->num); + return false; + } } - if (!n_unroll) - dump_printf_loc (report_flags, locus, - "loop turned into non-loop; it never loops.\n"); initialize_original_copy_tables (); auto_sbitmap wont_exit (n_unroll + 1); @@ -898,8 +913,8 @@ try_unroll_loop_completely (struct loop else gimple_cond_make_true (cond); update_stmt (cond); - /* Do not remove the path. Doing so may remove outer loop - and confuse bookkeeping code in tree_unroll_loops_completelly. */ + /* Do not remove the path, as doing so may remove outer loop and + confuse bookkeeping code in tree_unroll_loops_completely. */ } /* Store the loop for later unlooping and exit removal. 
*/ @@ -915,7 +930,7 @@ try_unroll_loop_completely (struct loop { dump_printf_loc (MSG_OPTIMIZED_LOCATIONS | TDF_DETAILS, locus, "loop with %d iterations completely unrolled", - (int) (n_unroll + 1)); + (int) n_unroll); if (loop->header->count.initialized_p ()) dump_printf (MSG_OPTIMIZED_LOCATIONS | TDF_DETAILS, " (header execution count %d)", @@ -963,7 +978,8 @@ try_peel_loop (struct loop *loop, struct loop_size size; int peeled_size; - if (!flag_peel_loops || PARAM_VALUE (PARAM_MAX_PEEL_TIMES) <= 0 + if (!flag_peel_loops + || PARAM_VALUE (PARAM_MAX_PEEL_TIMES) <= 0 || !peeled_loops) return false; @@ -974,20 +990,29 @@ try_peel_loop (struct loop *loop, return false; } + /* We don't peel loops that will be unrolled as this can duplicate a + loop more times than the user requested. */ + if (loop->unroll) + { + if (dump_file) + fprintf (dump_file, "Not peeling: user didn't want it peeled.\n"); + return false; + } + /* Peel only innermost loops. While the code is perfectly capable of peeling non-innermost loops, the heuristics would probably need some improvements. 
*/ if (loop->inner) { if (dump_file) - fprintf (dump_file, "Not peeling: outer loop\n"); + fprintf (dump_file, "Not peeling: outer loop\n"); return false; } if (!optimize_loop_for_speed_p (loop)) { if (dump_file) - fprintf (dump_file, "Not peeling: cold loop\n"); + fprintf (dump_file, "Not peeling: cold loop\n"); return false; } @@ -1005,7 +1030,7 @@ try_peel_loop (struct loop *loop, if (maxiter >= 0 && maxiter <= npeel) { if (dump_file) - fprintf (dump_file, "Not peeling: upper bound is known so can " + fprintf (dump_file, "Not peeling: upper bound is known so can " "unroll completely\n"); return false; } @@ -1016,7 +1041,7 @@ try_peel_loop (struct loop *loop, if (npeel > PARAM_VALUE (PARAM_MAX_PEEL_TIMES) - 1) { if (dump_file) - fprintf (dump_file, "Not peeling: rolls too much " + fprintf (dump_file, "Not peeling: rolls too much " "(%i + 1 > --param max-peel-times)\n", (int) npeel); return false; } @@ -1029,7 +1054,7 @@ try_peel_loop (struct loop *loop, > PARAM_VALUE (PARAM_MAX_PEELED_INSNS)) { if (dump_file) - fprintf (dump_file, "Not peeling: peeled sequence size is too large " + fprintf (dump_file, "Not peeling: peeled sequence size is too large " "(%i insns > --param max-peel-insns)", peeled_size); return false; } @@ -1317,7 +1342,9 @@ tree_unroll_loops_completely_1 (bool may if (!loop_father) return false; - if (may_increase_size && optimize_loop_nest_for_speed_p (loop) + if (loop->unroll > 1) + ul = UL_ALL; + else if (may_increase_size && optimize_loop_nest_for_speed_p (loop) /* Unroll outermost loops only if asked to do so or they do not cause code growth. */ && (unroll_outer || loop_outer (loop_father))) @@ -1345,7 +1372,7 @@ tree_unroll_loops_completely_1 (bool may MAY_INCREASE_SIZE is true, perform the unrolling only if the size of the code does not increase. 
*/ -unsigned int +static unsigned int tree_unroll_loops_completely (bool may_increase_size, bool unroll_outer) { bitmap father_bbs = BITMAP_ALLOC (NULL); @@ -1522,9 +1549,9 @@ pass_complete_unroll::execute (function re-peeling the same loop multiple times. */ if (flag_peel_loops) peeled_loops = BITMAP_ALLOC (NULL); - int val = tree_unroll_loops_completely (flag_unroll_loops - || flag_peel_loops - || optimize >= 3, true); + unsigned int val = tree_unroll_loops_completely (flag_unroll_loops + || flag_peel_loops + || optimize >= 3, true); if (peeled_loops) { BITMAP_FREE (peeled_loops); @@ -1576,8 +1603,7 @@ pass_complete_unrolli::execute (function { unsigned ret = 0; - loop_optimizer_init (LOOPS_NORMAL - | LOOPS_HAVE_RECORDED_EXITS); + loop_optimizer_init (LOOPS_NORMAL | LOOPS_HAVE_RECORDED_EXITS); if (number_of_loops (fun) > 1) { scev_initialize (); Index: tree.def =================================================================== --- tree.def (revision 255000) +++ tree.def (working copy) @@ -1410,8 +1410,9 @@ DEFTREECODE (TARGET_OPTION_NODE, "target /* ANNOTATE_EXPR. Operand 0 is the expression to be annotated. - Operand 1 is the annotation kind. */ -DEFTREECODE (ANNOTATE_EXPR, "annotate_expr", tcc_expression, 2) + Operand 1 is the annotation kind. + Operand 2 is additional data. */ +DEFTREECODE (ANNOTATE_EXPR, "annotate_expr", tcc_expression, 3) /* Cilk spawn statement Operand 0 is the CALL_EXPR. */