From: Bernd Schmidt
To: GCC Patches
Subject: [PATCH 4/6] Shrink-wrapping
Date: Wed, 23 Mar 2011 15:53:16 +0100
Message-ID: <4D8A095C.8050809@codesourcery.com>
In-Reply-To: <4D8A0703.9090306@codesourcery.com>

This adds the actual optimization, and reworks the JUMP_LABEL handling
for return blocks.  See the introduction mail or the new comment ahead
of thread_prologue_and_epilogue_insns for more notes.


Bernd

	* doc/tm.texi (RETURN_ADDR_REGNUM): Document.
	* doc/md.texi (simple_return): Document pattern.
	(return): Add a sentence to clarify.
	* doc/rtl.texi (simple_return): Document.
	* doc/invoke.texi (Optimize Options): Document -fshrink-wrap.
	* common.opt (fshrink-wrap): New.
	* opts.c (default_options_table): Enable it for -O1 and above.
	* gengenrtl.c (special_rtx): SIMPLE_RETURN is special.
	* rtl.h (ANY_RETURN_P): New macro.
	(global_rtl_index): Add GR_SIMPLE_RETURN.
	(simple_return_rtx): New macro.
	* genemit.c (gen_exp): SIMPLE_RETURN has a unique rtx.
	(gen_expand, gen_split): Use ANY_RETURN_P.
	* rtl.c (copy_rtx): SIMPLE_RETURN is shared.
	* emit-rtl.c (verify_rtx_sharing): Likewise.
	(skip_consecutive_labels): Return the argument if it is a return rtx.
	(classify_insn): Handle both kinds of return.
	(init_emit_regs): Create global rtl for ret_rtx and
	simple_return_rtx.
	* df-scan.c (df_uses_record): Handle SIMPLE_RETURN.
	* rtl.def (SIMPLE_RETURN): New.
	* rtlanal.c (tablejump_p): Check JUMP_LABEL for returns.
	* final.c (final_scan_insn): Recognize both kinds of return.
	* reorg.c (function_return_label, function_simple_return_label):
	New static variables.
	(end_of_function_label): Remove.
	(simplejump_or_return_p): New static function.
	(find_end_label): Add a new arg, KIND.  All callers changed.
	Depending on KIND, look for a label suitable for return or
	simple_return.
	(make_return_insns): Make corresponding changes.
	(get_jump_flags): Check JUMP_LABELs for returns.
	(follow_jumps): Likewise.
	(get_branch_condition): Check target for return patterns rather
	than NULL.
	(own_thread_p): Likewise for thread.
	(steal_delay_list_from_target): Check JUMP_LABELs for returns.
	Use simplejump_or_return_p.
	(fill_simple_delay_slots): Likewise.
	(optimize_skip): Likewise.
	(fill_slots_from_thread): Likewise.
	(relax_delay_slots): Likewise.
	(dbr_schedule): Adjust handling of end_of_function_label for the
	two new variables.
	* ifcvt.c (find_if_case_1): Take care when redirecting jumps to
	the exit block.
	(dead_or_predicable): Change NEW_DEST arg to DEST_EDGE.  All
	callers changed.  Ensure that the right label is passed to
	redirect_jump.
	* jump.c (condjump_p, condjump_in_parallel_p, any_condjump_p,
	returnjump_p): Handle SIMPLE_RETURNs.
	(delete_related_insns): Check JUMP_LABEL for returns.
	(redirect_target): New static function.
	(redirect_exp_1): Use it.  Handle any kind of return rtx as a
	label rather than interpreting NULL as a return.
	(redirect_jump_1): Assert that nlabel is not NULL.
	(redirect_jump): Likewise.
	(redirect_jump_2): Handle any kind of return rtx as a label
	rather than interpreting NULL as a return.
	* dwarf2out.c (compute_barrier_args_size_1): Check JUMP_LABEL
	for returns.
	* function.c (emit_return_into_block): Remove useless
	declaration.
	(record_hard_reg_sets, frame_required_for_rtx,
	gen_return_pattern, requires_stack_frame_p): New static
	functions.
	(emit_return_into_block): New arg SIMPLE_P.  All callers
	changed.  Generate either kind of return pattern and update the
	JUMP_LABEL.
	(thread_prologue_and_epilogue_insns): Implement a form of
	shrink-wrapping.  Ensure JUMP_LABELs for return insns are set.
	* print-rtl.c (print_rtx): Handle returns in JUMP_LABELs.
	* cfglayout.c (fixup_reorder_chain): Ensure JUMP_LABELs for
	returns remain correct.
	* resource.c (find_dead_or_set_registers): Check JUMP_LABELs for
	returns.
	(mark_target_live_regs): Don't pass a return rtx to
	next_active_insn.
	* basic-block.h (force_nonfallthru_and_redirect): Declare.
	* sched-vis.c (print_pattern): Add case for SIMPLE_RETURN.
	* cfgrtl.c (force_nonfallthru_and_redirect): No longer static.
	New arg JUMP_LABEL.  All callers changed.  Use the label when
	generating return insns.
	* config/i386/i386.md (returns, return_str, return_cond): New
	code_iterator and corresponding code_attrs.
	(<return_str>return): Renamed from return and adapted.
	(<return_str>return_internal): Likewise for return_internal.
	(<return_str>return_internal_long): Likewise for
	return_internal_long.
	(<return_str>return_pop_internal): Likewise for
	return_pop_internal.
	(<return_str>return_indirect_internal): Likewise for
	return_indirect_internal.
	* config/i386/i386.c (ix86_expand_epilogue): Expand a
	simple_return as the last insn.
	(ix86_pad_returns): Handle both kinds of return rtx.
	* config/arm/arm.c (use_simple_return_p): New function.
	(is_jump_table): Handle returns in JUMP_LABELs.
	(output_return_instruction): New arg SIMPLE.  All callers
	changed.  Use it to determine which kind of return to generate.
	(arm_final_prescan_insn): Handle both kinds of return.
	* config/arm/arm.md (returns, return_str, return_simple_p,
	return_cond): New code_iterator and corresponding code_attrs.
	(<return_str>return): Renamed from return and adapted.
	(arm_<return_str>return): Renamed from arm_return and adapted.
	(cond_<return_str>return): Renamed from cond_return and adapted.
	(cond_<return_str>return_inverted): Renamed from
	cond_return_inverted and adapted.
	* config/arm/thumb2.md (thumb2_<return_str>return): Renamed from
	thumb2_return and adapted.
	* config/arm/arm.h (RETURN_ADDR_REGNUM): Define.
	* config/arm/arm-protos.h (use_simple_return_p): Declare.
	(output_return_instruction): Adjust declaration.
	* config/mips/mips.c (mips_expand_epilogue): Generate a
	simple_return as final insn.
* config/mips/mips.md (simple_return): New expander. (*simple_return, simple_return_internal): New patterns. * config/sh/sh.c (barrier_align): Handle return in a JUMP_LABEL. (split_branches): Don't pass a null label to redirect_jump. Index: gcc/basic-block.h =================================================================== --- gcc.orig/basic-block.h +++ gcc/basic-block.h @@ -795,6 +795,7 @@ extern void flow_edge_list_print (const /* In cfgrtl.c */ extern basic_block force_nonfallthru (edge); +extern basic_block force_nonfallthru_and_redirect (edge, basic_block, rtx); extern rtx block_label (basic_block); extern bool purge_all_dead_edges (void); extern bool purge_dead_edges (basic_block); Index: gcc/cfglayout.c =================================================================== --- gcc.orig/cfglayout.c +++ gcc/cfglayout.c @@ -766,6 +766,7 @@ fixup_reorder_chain (void) { edge e_fall, e_taken, e; rtx bb_end_insn; + rtx ret_label = NULL_RTX; basic_block nb; edge_iterator ei; @@ -785,6 +786,7 @@ fixup_reorder_chain (void) bb_end_insn = BB_END (bb); if (JUMP_P (bb_end_insn)) { + ret_label = JUMP_LABEL (bb_end_insn); if (any_condjump_p (bb_end_insn)) { /* This might happen if the conditional jump has side @@ -895,7 +897,7 @@ fixup_reorder_chain (void) } /* We got here if we need to add a new jump insn. */ - nb = force_nonfallthru (e_fall); + nb = force_nonfallthru_and_redirect (e_fall, e_fall->dest, ret_label); if (nb) { nb->il.rtl->visited = 1; @@ -1129,24 +1131,30 @@ extern bool cfg_layout_can_duplicate_bb_ bool cfg_layout_can_duplicate_bb_p (const_basic_block bb) { + rtx insn; + /* Do not attempt to duplicate tablejumps, as we need to unshare the dispatch table. This is difficult to do, as the instructions computing jump destination may be hoisted outside the basic block. */ if (tablejump_p (BB_END (bb), NULL, NULL)) return false; - /* Do not duplicate blocks containing insns that can't be copied. */ - if (targetm.cannot_copy_insn_p) + insn = BB_HEAD (bb); + while (1) { - rtx insn = BB_HEAD (bb); - while (1) - { - if (INSN_P (insn) && targetm.cannot_copy_insn_p (insn)) - return false; - if (insn == BB_END (bb)) - break; - insn = NEXT_INSN (insn); - } + /* Do not duplicate blocks containing insns that can't be copied. */ + if (INSN_P (insn) && targetm.cannot_copy_insn_p + && targetm.cannot_copy_insn_p (insn)) + return false; + /* dwarf2out expects that these notes are always paired with a + returnjump or sibling call. */ + if (NOTE_P (insn) && NOTE_KIND (insn) == NOTE_INSN_EPILOGUE_BEG + && !returnjump_p (BB_END (bb)) + && (!CALL_P (BB_END (bb)) || !SIBLING_CALL_P (BB_END (bb)))) + return false; + if (insn == BB_END (bb)) + break; + insn = NEXT_INSN (insn); } return true; @@ -1191,6 +1199,9 @@ duplicate_insn_chain (rtx from, rtx to) break; } copy = emit_copy_of_insn_after (insn, get_last_insn ()); + if (JUMP_P (insn) && JUMP_LABEL (insn) != NULL_RTX + && ANY_RETURN_P (JUMP_LABEL (insn))) + JUMP_LABEL (copy) = JUMP_LABEL (insn); maybe_copy_prologue_epilogue_insn (insn, copy); break; Index: gcc/cfgrtl.c =================================================================== --- gcc.orig/cfgrtl.c +++ gcc/cfgrtl.c @@ -1114,10 +1114,13 @@ rtl_redirect_edge_and_branch (edge e, ba } /* Like force_nonfallthru below, but additionally performs redirection - Used by redirect_edge_and_branch_force. */ + Used by redirect_edge_and_branch_force. JUMP_LABEL is used only + when redirecting to the EXIT_BLOCK, it is either a return or a + simple_return rtx indicating which kind of returnjump to create. 
+ It should be NULL otherwise. */ -static basic_block -force_nonfallthru_and_redirect (edge e, basic_block target) +basic_block +force_nonfallthru_and_redirect (edge e, basic_block target, rtx jump_label) { basic_block jump_block, new_bb = NULL, src = e->src; rtx note; @@ -1249,11 +1252,25 @@ force_nonfallthru_and_redirect (edge e, e->flags &= ~EDGE_FALLTHRU; if (target == EXIT_BLOCK_PTR) { + if (jump_label == ret_rtx) + { #ifdef HAVE_return - emit_jump_insn_after_setloc (gen_return (), BB_END (jump_block), loc); + emit_jump_insn_after_setloc (gen_return (), BB_END (jump_block), + loc); #else - gcc_unreachable (); + gcc_unreachable (); #endif + } + else + { + gcc_assert (jump_label == simple_return_rtx); +#ifdef HAVE_simple_return + emit_jump_insn_after_setloc (gen_simple_return (), + BB_END (jump_block), loc); +#else + gcc_unreachable (); +#endif + } } else { @@ -1280,7 +1297,7 @@ force_nonfallthru_and_redirect (edge e, basic_block force_nonfallthru (edge e) { - return force_nonfallthru_and_redirect (e, e->dest); + return force_nonfallthru_and_redirect (e, e->dest, NULL_RTX); } /* Redirect edge even at the expense of creating new jump insn or @@ -1297,7 +1314,7 @@ rtl_redirect_edge_and_branch_force (edge /* In case the edge redirection failed, try to force it to be non-fallthru and redirect newly created simplejump. */ df_set_bb_dirty (e->src); - return force_nonfallthru_and_redirect (e, target); + return force_nonfallthru_and_redirect (e, target, NULL_RTX); } /* The given edge should potentially be a fallthru edge. If that is in Index: gcc/config/arm/arm-protos.h =================================================================== --- gcc.orig/config/arm/arm-protos.h +++ gcc/config/arm/arm-protos.h @@ -24,6 +24,7 @@ #define GCC_ARM_PROTOS_H extern int use_return_insn (int, rtx); +extern bool use_simple_return_p (void); extern enum reg_class arm_regno_class (int); extern void arm_load_pic_register (unsigned long); extern int arm_volatile_func (void); @@ -135,7 +136,7 @@ extern int arm_address_offset_is_imm (rt extern const char *output_add_immediate (rtx *); extern const char *arithmetic_instr (rtx, int); extern void output_ascii_pseudo_op (FILE *, const unsigned char *, int); -extern const char *output_return_instruction (rtx, int, int); +extern const char *output_return_instruction (rtx, bool, bool, bool); extern void arm_poke_function_name (FILE *, const char *); extern void arm_final_prescan_insn (rtx); extern int arm_debugger_arg_offset (int, rtx); Index: gcc/config/arm/arm.c =================================================================== --- gcc.orig/config/arm/arm.c +++ gcc/config/arm/arm.c @@ -2252,6 +2252,18 @@ arm_trampoline_adjust_address (rtx addr) return addr; } +/* Return true if we should try to use a simple_return insn, i.e. perform + shrink-wrapping if possible. This is the case if we need to emit a + prologue, which we can test by looking at the offsets. */ +bool +use_simple_return_p (void) +{ + arm_stack_offsets *offsets; + + offsets = arm_get_frame_offsets (); + return offsets->outgoing_args != 0; +} + /* Return 1 if it is possible to return using a single instruction. If SIBLING is non-null, this is a test for a return before a sibling call. SIBLING is the call insn, so we can examine its register usage. 
*/ @@ -11388,6 +11400,7 @@ is_jump_table (rtx insn) if (GET_CODE (insn) == JUMP_INSN && JUMP_LABEL (insn) != NULL + && !ANY_RETURN_P (JUMP_LABEL (insn)) && ((table = next_real_insn (JUMP_LABEL (insn))) == next_real_insn (insn)) && table != NULL @@ -14234,7 +14247,7 @@ arm_get_vfp_saved_size (void) /* Generate a function exit sequence. If REALLY_RETURN is false, then do everything bar the final return instruction. */ const char * -output_return_instruction (rtx operand, int really_return, int reverse) +output_return_instruction (rtx operand, bool really_return, bool reverse, bool simple) { char conditional[10]; char instr[100]; @@ -14272,10 +14285,15 @@ output_return_instruction (rtx operand, sprintf (conditional, "%%?%%%c0", reverse ? 'D' : 'd'); - cfun->machine->return_used_this_function = 1; + if (simple) + live_regs_mask = 0; + else + { + cfun->machine->return_used_this_function = 1; - offsets = arm_get_frame_offsets (); - live_regs_mask = offsets->saved_regs_mask; + offsets = arm_get_frame_offsets (); + live_regs_mask = offsets->saved_regs_mask; + } if (live_regs_mask) { @@ -17283,6 +17301,7 @@ arm_final_prescan_insn (rtx insn) /* If we start with a return insn, we only succeed if we find another one. */ int seeking_return = 0; + enum rtx_code return_code = UNKNOWN; /* START_INSN will hold the insn from where we start looking. This is the first insn after the following code_label if REVERSE is true. */ @@ -17321,7 +17340,7 @@ arm_final_prescan_insn (rtx insn) else return; } - else if (GET_CODE (body) == RETURN) + else if (ANY_RETURN_P (body)) { start_insn = next_nonnote_insn (start_insn); if (GET_CODE (start_insn) == BARRIER) @@ -17332,6 +17351,7 @@ arm_final_prescan_insn (rtx insn) { reverse = TRUE; seeking_return = 1; + return_code = GET_CODE (body); } else return; @@ -17372,11 +17392,15 @@ arm_final_prescan_insn (rtx insn) label = XEXP (XEXP (SET_SRC (body), 2), 0); then_not_else = FALSE; } - else if (GET_CODE (XEXP (SET_SRC (body), 1)) == RETURN) - seeking_return = 1; - else if (GET_CODE (XEXP (SET_SRC (body), 2)) == RETURN) + else if (ANY_RETURN_P (XEXP (SET_SRC (body), 1))) + { + seeking_return = 1; + return_code = GET_CODE (XEXP (SET_SRC (body), 1)); + } + else if (ANY_RETURN_P (XEXP (SET_SRC (body), 2))) { seeking_return = 1; + return_code = GET_CODE (XEXP (SET_SRC (body), 2)); then_not_else = FALSE; } else @@ -17477,8 +17501,7 @@ arm_final_prescan_insn (rtx insn) && !use_return_insn (TRUE, NULL) && !optimize_size) fail = TRUE; - else if (GET_CODE (scanbody) == RETURN - && seeking_return) + else if (GET_CODE (scanbody) == return_code) { arm_ccfsm_state = 2; succeed = TRUE; Index: gcc/config/arm/arm.h =================================================================== --- gcc.orig/config/arm/arm.h +++ gcc/config/arm/arm.h @@ -2255,6 +2255,8 @@ extern int making_const_table; #define RETURN_ADDR_RTX(COUNT, FRAME) \ arm_return_addr (COUNT, FRAME) +#define RETURN_ADDR_REGNUM LR_REGNUM + /* Mask of the bits in the PC that contain the real return address when running in 26-bit mode. 
*/ #define RETURN_ADDR_MASK26 (0x03fffffc) Index: gcc/config/arm/arm.md =================================================================== --- gcc.orig/config/arm/arm.md +++ gcc/config/arm/arm.md @@ -8116,66 +8116,65 @@ [(set_attr "type" "call")] ) -(define_expand "return" - [(return)] - "TARGET_32BIT && USE_RETURN_INSN (FALSE)" +(define_expand "return" + [(returns)] + "TARGET_32BIT" "") -;; Often the return insn will be the same as loading from memory, so set attr -(define_insn "*arm_return" - [(return)] - "TARGET_ARM && USE_RETURN_INSN (FALSE)" - "* - { - if (arm_ccfsm_state == 2) - { - arm_ccfsm_state += 2; - return \"\"; - } - return output_return_instruction (const_true_rtx, TRUE, FALSE); - }" +(define_insn "*arm_return" + [(returns)] + "TARGET_ARM" +{ + if (arm_ccfsm_state == 2) + { + arm_ccfsm_state += 2; + return ""; + } + return output_return_instruction (const_true_rtx, true, false, + ); +} [(set_attr "type" "load1") (set_attr "length" "12") (set_attr "predicable" "yes")] ) -(define_insn "*cond_return" +(define_insn "*cond_return" [(set (pc) (if_then_else (match_operator 0 "arm_comparison_operator" [(match_operand 1 "cc_register" "") (const_int 0)]) - (return) + (returns) (pc)))] - "TARGET_ARM && USE_RETURN_INSN (TRUE)" - "* - { - if (arm_ccfsm_state == 2) - { - arm_ccfsm_state += 2; - return \"\"; - } - return output_return_instruction (operands[0], TRUE, FALSE); - }" + "TARGET_ARM" +{ + if (arm_ccfsm_state == 2) + { + arm_ccfsm_state += 2; + return ""; + } + return output_return_instruction (operands[0], true, false, + ); +} [(set_attr "conds" "use") (set_attr "length" "12") (set_attr "type" "load1")] ) -(define_insn "*cond_return_inverted" +(define_insn "*cond_return_inverted" [(set (pc) (if_then_else (match_operator 0 "arm_comparison_operator" [(match_operand 1 "cc_register" "") (const_int 0)]) (pc) - (return)))] - "TARGET_ARM && USE_RETURN_INSN (TRUE)" - "* - { - if (arm_ccfsm_state == 2) - { - arm_ccfsm_state += 2; - return \"\"; - } - return output_return_instruction (operands[0], TRUE, TRUE); - }" + (returns)))] + "TARGET_ARM" +{ + if (arm_ccfsm_state == 2) + { + arm_ccfsm_state += 2; + return ""; + } + return output_return_instruction (operands[0], true, true, + ); +} [(set_attr "conds" "use") (set_attr "length" "12") (set_attr "type" "load1")] @@ -9970,7 +9969,8 @@ DONE; } emit_jump_insn (gen_rtx_UNSPEC_VOLATILE (VOIDmode, - gen_rtvec (1, ret_rtx), VUNSPEC_EPILOGUE)); + gen_rtvec (1, ret_rtx), + VUNSPEC_EPILOGUE)); DONE; " ) @@ -9986,7 +9986,7 @@ "TARGET_32BIT" "* if (use_return_insn (FALSE, next_nonnote_insn (insn))) - return output_return_instruction (const_true_rtx, FALSE, FALSE); + return output_return_instruction (const_true_rtx, false, false, false); return arm_output_epilogue (next_nonnote_insn (insn)); " ;; Length is absolute worst case Index: gcc/config/arm/iterators.md =================================================================== --- gcc.orig/config/arm/iterators.md +++ gcc/config/arm/iterators.md @@ -403,3 +403,11 @@ ;; Assembler mnemonics for signedness of widening operations. (define_code_attr US [(sign_extend "s") (zero_extend "u")]) + +;; Both kinds of return insn. 
+(define_code_iterator returns [return simple_return]) +(define_code_attr return_str [(return "") (simple_return "simple_")]) +(define_code_attr return_simple_p [(return "false") (simple_return "true")]) +(define_code_attr return_cond [(return " && USE_RETURN_INSN (FALSE)") + (simple_return " && use_simple_return_p ()")]) + Index: gcc/config/arm/thumb2.md =================================================================== --- gcc.orig/config/arm/thumb2.md +++ gcc/config/arm/thumb2.md @@ -635,16 +635,15 @@ ;; Note: this is not predicable, to avoid issues with linker-generated ;; interworking stubs. -(define_insn "*thumb2_return" - [(return)] - "TARGET_THUMB2 && USE_RETURN_INSN (FALSE)" - "* - { - return output_return_instruction (const_true_rtx, TRUE, FALSE); - }" +(define_insn "*thumb2_return" + [(returns)] + "TARGET_THUMB2" +{ + return output_return_instruction (const_true_rtx, true, false, + ); +} [(set_attr "type" "load1") - (set_attr "length" "12")] -) + (set_attr "length" "12")]) (define_insn_and_split "thumb2_eh_return" [(unspec_volatile [(match_operand:SI 0 "s_register_operand" "r")] Index: gcc/config/i386/i386.c =================================================================== --- gcc.orig/config/i386/i386.c +++ gcc/config/i386/i386.c @@ -11230,13 +11230,13 @@ ix86_expand_epilogue (int style) pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx, popc, -1, true); - emit_jump_insn (gen_return_indirect_internal (ecx)); + emit_jump_insn (gen_simple_return_indirect_internal (ecx)); } else - emit_jump_insn (gen_return_pop_internal (popc)); + emit_jump_insn (gen_simple_return_pop_internal (popc)); } else - emit_jump_insn (gen_return_internal ()); + emit_jump_insn (gen_simple_return_internal ()); /* Restore the state back to the state from the prologue, so that it's correct for the next epilogue. */ @@ -30176,7 +30176,7 @@ ix86_pad_returns (void) rtx prev; bool replace = false; - if (!JUMP_P (ret) || GET_CODE (PATTERN (ret)) != RETURN + if (!JUMP_P (ret) || !ANY_RETURN_P (PATTERN (ret)) || optimize_bb_for_size_p (bb)) continue; for (prev = PREV_INSN (ret); prev; prev = PREV_INSN (prev)) @@ -30206,7 +30206,10 @@ ix86_pad_returns (void) } if (replace) { - emit_jump_insn_before (gen_return_internal_long (), ret); + if (PATTERN (ret) == ret_rtx) + emit_jump_insn_before (gen_return_internal_long (), ret); + else + emit_jump_insn_before (gen_simple_return_internal_long (), ret); delete_insn (ret); } } @@ -30227,7 +30230,7 @@ ix86_count_insn_bb (basic_block bb) { /* Only happen in exit blocks. */ if (JUMP_P (insn) - && GET_CODE (PATTERN (insn)) == RETURN) + && ANY_RETURN_P (PATTERN (insn))) break; if (NONDEBUG_INSN_P (insn) @@ -30300,7 +30303,7 @@ ix86_pad_short_function (void) FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR->preds) { rtx ret = BB_END (e->src); - if (JUMP_P (ret) && GET_CODE (PATTERN (ret)) == RETURN) + if (JUMP_P (ret) && ANY_RETURN_P (PATTERN (ret))) { int insn_count = ix86_count_insn (e->src); Index: gcc/config/i386/i386.md =================================================================== --- gcc.orig/config/i386/i386.md +++ gcc/config/i386/i386.md @@ -11670,24 +11670,29 @@ "" [(set_attr "length" "0")]) +(define_code_iterator returns [return simple_return]) +(define_code_attr return_str [(return "") (simple_return "simple_")]) +(define_code_attr return_cond [(return "ix86_can_use_return_insn_p ()") + (simple_return "")]) + ;; Insn emitted into the body of a function to return from a function. ;; This is only done if the function's epilogue is known to be simple. 
;; See comments for ix86_can_use_return_insn_p in i386.c. -(define_expand "return" - [(return)] - "ix86_can_use_return_insn_p ()" +(define_expand "return" + [(returns)] + "" { if (crtl->args.pops_args) { rtx popc = GEN_INT (crtl->args.pops_args); - emit_jump_insn (gen_return_pop_internal (popc)); + emit_jump_insn (gen_return_pop_internal (popc)); DONE; } }) -(define_insn "return_internal" - [(return)] +(define_insn "return_internal" + [(returns)] "reload_completed" "ret" [(set_attr "length" "1") @@ -11698,8 +11703,8 @@ ;; Used by x86_machine_dependent_reorg to avoid penalty on single byte RET ;; instruction Athlon and K8 have. -(define_insn "return_internal_long" - [(return) +(define_insn "return_internal_long" + [(returns) (unspec [(const_int 0)] UNSPEC_REP)] "reload_completed" "rep\;ret" @@ -11709,8 +11714,8 @@ (set_attr "prefix_rep" "1") (set_attr "modrm" "0")]) -(define_insn "return_pop_internal" - [(return) +(define_insn "return_pop_internal" + [(returns) (use (match_operand:SI 0 "const_int_operand" ""))] "reload_completed" "ret\t%0" @@ -11719,8 +11724,8 @@ (set_attr "length_immediate" "2") (set_attr "modrm" "0")]) -(define_insn "return_indirect_internal" - [(return) +(define_insn "return_indirect_internal" + [(returns) (use (match_operand:SI 0 "register_operand" "r"))] "reload_completed" "jmp\t%A0" Index: gcc/config/mips/mips.c =================================================================== --- gcc.orig/config/mips/mips.c +++ gcc/config/mips/mips.c @@ -10543,7 +10543,8 @@ mips_expand_epilogue (bool sibcall_p) regno = GP_REG_FIRST + 7; else regno = RETURN_ADDR_REGNUM; - emit_jump_insn (gen_return_internal (gen_rtx_REG (Pmode, regno))); + emit_jump_insn (gen_simple_return_internal (gen_rtx_REG (Pmode, + regno))); } } Index: gcc/config/mips/mips.md =================================================================== --- gcc.orig/config/mips/mips.md +++ gcc/config/mips/mips.md @@ -5716,6 +5716,18 @@ [(set_attr "type" "jump") (set_attr "mode" "none")]) +(define_expand "simple_return" + [(simple_return)] + "!mips_can_use_return_insn ()" + { mips_expand_before_return (); }) + +(define_insn "*simple_return" + [(simple_return)] + "!mips_can_use_return_insn ()" + "%*j\t$31%/" + [(set_attr "type" "jump") + (set_attr "mode" "none")]) + ;; Normal return. (define_insn "return_internal" @@ -5726,6 +5738,14 @@ [(set_attr "type" "jump") (set_attr "mode" "none")]) +(define_insn "simple_return_internal" + [(simple_return) + (use (match_operand 0 "pmode_register_operand" ""))] + "" + "%*j\t%0%/" + [(set_attr "type" "jump") + (set_attr "mode" "none")]) + ;; Exception return. (define_insn "mips_eret" [(return) Index: gcc/config/sh/sh.c =================================================================== --- gcc.orig/config/sh/sh.c +++ gcc/config/sh/sh.c @@ -5455,7 +5455,8 @@ barrier_align (rtx barrier_or_label) } if (prev && JUMP_P (prev) - && JUMP_LABEL (prev)) + && JUMP_LABEL (prev) + && !ANY_RETURN_P (JUMP_LABEL (prev))) { rtx x; if (jump_to_next @@ -6154,7 +6155,7 @@ split_branches (rtx first) JUMP_LABEL (insn) = far_label; LABEL_NUSES (far_label)++; } - redirect_jump (insn, NULL_RTX, 1); + redirect_jump (insn, ret_rtx, 1); far_label = 0; } } Index: gcc/config/sparc/sparc.c =================================================================== --- gcc.orig/config/sparc/sparc.c +++ gcc/config/sparc/sparc.c @@ -6105,7 +6105,7 @@ sparc_struct_value_rtx (tree fndecl, int /* We must check and adjust the return address, as it is optional as to whether the return object is really provided. 
*/ - rtx ret_rtx = gen_rtx_REG (Pmode, 31); + rtx ret_reg = gen_rtx_REG (Pmode, 31); rtx scratch = gen_reg_rtx (SImode); rtx endlab = gen_label_rtx (); @@ -6122,12 +6122,12 @@ sparc_struct_value_rtx (tree fndecl, int it's an unimp instruction (the most significant 10 bits will be zero). */ emit_move_insn (scratch, gen_rtx_MEM (SImode, - plus_constant (ret_rtx, 8))); + plus_constant (ret_reg, 8))); /* Assume the size is valid and pre-adjust */ - emit_insn (gen_add3_insn (ret_rtx, ret_rtx, GEN_INT (4))); + emit_insn (gen_add3_insn (ret_reg, ret_reg, GEN_INT (4))); emit_cmp_and_jump_insns (scratch, size_rtx, EQ, const0_rtx, SImode, 0, endlab); - emit_insn (gen_sub3_insn (ret_rtx, ret_rtx, GEN_INT (4))); + emit_insn (gen_sub3_insn (ret_reg, ret_reg, GEN_INT (4))); /* Write the address of the memory pointed to by temp_val into the memory pointed to by mem */ emit_move_insn (mem, XEXP (temp_val, 0)); Index: gcc/df-scan.c =================================================================== --- gcc.orig/df-scan.c +++ gcc/df-scan.c @@ -3181,6 +3181,7 @@ df_uses_record (struct df_collection_rec } case RETURN: + case SIMPLE_RETURN: break; case ASM_OPERANDS: Index: gcc/doc/md.texi =================================================================== --- gcc.orig/doc/md.texi +++ gcc/doc/md.texi @@ -4947,7 +4947,19 @@ RTL generation phase. In this case it i multiple instructions are usually needed to return from a function, but some class of functions only requires one instruction to implement a return. Normally, the applicable functions are those which do not need -to save any registers or allocate stack space. +to save any registers or allocate stack space, although some targets +have instructions that can perform both the epilogue and function return +in one instruction. + +@cindex @code{simple_return} instruction pattern +@item @samp{simple_return} +Subroutine return instruction. This instruction pattern name should be +defined only if a single instruction can do all the work of returning +from a function on a path where no epilogue is required. This pattern +is very similar to the @code{return} instruction pattern, but it is emitted +only by the shrink-wrapping optimization on paths where the function +prologue has not been executed, and a function return should occur without +any of the effects of the epilogue. @findex reload_completed @findex leaf_function_p Index: gcc/doc/rtl.texi =================================================================== --- gcc.orig/doc/rtl.texi +++ gcc/doc/rtl.texi @@ -2895,6 +2895,13 @@ placed in @code{pc} to return to the cal Note that an insn pattern of @code{(return)} is logically equivalent to @code{(set (pc) (return))}, but the latter form is never used. +@findex simple_return +@item (simple_return) +Like @code{(return)}, but truly represents only a function return, while +@code{(return)} may represent an insn that also performs other functions +of the function epilogue. Like @code{(return)}, this may also occur in +conditional jumps. + @findex call @item (call @var{function} @var{nargs}) Represents a function call. @var{function} is a @code{mem} expression @@ -3024,7 +3031,7 @@ Represents several side effects performe brackets stand for a vector; the operand of @code{parallel} is a vector of expressions. @var{x0}, @var{x1} and so on are individual side effect expressions---expressions of code @code{set}, @code{call}, -@code{return}, @code{clobber} or @code{use}. +@code{return}, @code{simple_return}, @code{clobber} or @code{use}. 
``In parallel'' means that first all the values used in the individual side-effects are computed, and second all the actual side-effects are @@ -3663,14 +3670,16 @@ and @code{call_insn} insns: @table @code @findex PATTERN @item PATTERN (@var{i}) -An expression for the side effect performed by this insn. This must be -one of the following codes: @code{set}, @code{call}, @code{use}, -@code{clobber}, @code{return}, @code{asm_input}, @code{asm_output}, -@code{addr_vec}, @code{addr_diff_vec}, @code{trap_if}, @code{unspec}, -@code{unspec_volatile}, @code{parallel}, @code{cond_exec}, or @code{sequence}. If it is a @code{parallel}, -each element of the @code{parallel} must be one these codes, except that -@code{parallel} expressions cannot be nested and @code{addr_vec} and -@code{addr_diff_vec} are not permitted inside a @code{parallel} expression. +An expression for the side effect performed by this insn. This must +be one of the following codes: @code{set}, @code{call}, @code{use}, +@code{clobber}, @code{return}, @code{simple_return}, @code{asm_input}, +@code{asm_output}, @code{addr_vec}, @code{addr_diff_vec}, +@code{trap_if}, @code{unspec}, @code{unspec_volatile}, +@code{parallel}, @code{cond_exec}, or @code{sequence}. If it is a +@code{parallel}, each element of the @code{parallel} must be one these +codes, except that @code{parallel} expressions cannot be nested and +@code{addr_vec} and @code{addr_diff_vec} are not permitted inside a +@code{parallel} expression. @findex INSN_CODE @item INSN_CODE (@var{i}) Index: gcc/doc/tm.texi =================================================================== --- gcc.orig/doc/tm.texi +++ gcc/doc/tm.texi @@ -3210,6 +3210,12 @@ Define this if the return address of a p from the frame pointer of the previous stack frame. @end defmac +@defmac RETURN_ADDR_REGNUM +If defined, a C expression whose value is the register number of the return +address for the current function. Targets that pass the return address on +the stack should not define this macro. +@end defmac + @defmac INCOMING_RETURN_ADDR_RTX A C expression whose value is RTL representing the location of the incoming return address at the beginning of any function, before the Index: gcc/doc/tm.texi.in =================================================================== --- gcc.orig/doc/tm.texi.in +++ gcc/doc/tm.texi.in @@ -3198,6 +3198,12 @@ Define this if the return address of a p from the frame pointer of the previous stack frame. @end defmac +@defmac RETURN_ADDR_REGNUM +If defined, a C expression whose value is the register number of the return +address for the current function. Targets that pass the return address on +the stack should not define this macro. +@end defmac + @defmac INCOMING_RETURN_ADDR_RTX A C expression whose value is RTL representing the location of the incoming return address at the beginning of any function, before the Index: gcc/dwarf2out.c =================================================================== --- gcc.orig/dwarf2out.c +++ gcc/dwarf2out.c @@ -1490,7 +1490,7 @@ compute_barrier_args_size_1 (rtx insn, H { rtx dest = JUMP_LABEL (insn); - if (dest) + if (dest && !ANY_RETURN_P (dest)) { if (barrier_args_size [INSN_UID (dest)] < 0) { Index: gcc/emit-rtl.c =================================================================== --- gcc.orig/emit-rtl.c +++ gcc/emit-rtl.c @@ -2448,6 +2448,7 @@ verify_rtx_sharing (rtx orig, rtx insn) case PC: case CC0: case RETURN: + case SIMPLE_RETURN: case SCRATCH: return; /* SCRATCH must be shared because they represent distinct values. 
*/ @@ -3252,14 +3253,17 @@ prev_label (rtx insn) return insn; } -/* Return the last label to mark the same position as LABEL. Return null - if LABEL itself is null. */ +/* Return the last label to mark the same position as LABEL. Return LABEL + itself if it is null or any return rtx. */ rtx skip_consecutive_labels (rtx label) { rtx insn; + if (label && ANY_RETURN_P (label)) + return label; + for (insn = label; insn != 0 && !INSN_P (insn); insn = NEXT_INSN (insn)) if (LABEL_P (insn)) label = insn; @@ -5145,7 +5149,7 @@ classify_insn (rtx x) return CODE_LABEL; if (GET_CODE (x) == CALL) return CALL_INSN; - if (GET_CODE (x) == RETURN) + if (GET_CODE (x) == RETURN || GET_CODE (x) == SIMPLE_RETURN) return JUMP_INSN; if (GET_CODE (x) == SET) { @@ -5654,6 +5658,7 @@ init_emit_regs (void) /* Assign register numbers to the globally defined register rtx. */ pc_rtx = gen_rtx_fmt_ (PC, VOIDmode); ret_rtx = gen_rtx_fmt_ (RETURN, VOIDmode); + simple_return_rtx = gen_rtx_fmt_ (SIMPLE_RETURN, VOIDmode); cc0_rtx = gen_rtx_fmt_ (CC0, VOIDmode); stack_pointer_rtx = gen_raw_REG (Pmode, STACK_POINTER_REGNUM); frame_pointer_rtx = gen_raw_REG (Pmode, FRAME_POINTER_REGNUM); Index: gcc/final.c =================================================================== --- gcc.orig/final.c +++ gcc/final.c @@ -2425,7 +2425,8 @@ final_scan_insn (rtx insn, FILE *file, i delete_insn (insn); break; } - else if (GET_CODE (SET_SRC (body)) == RETURN) + else if (GET_CODE (SET_SRC (body)) == RETURN + || GET_CODE (SET_SRC (body)) == SIMPLE_RETURN) /* Replace (set (pc) (return)) with (return). */ PATTERN (insn) = body = SET_SRC (body); Index: gcc/function.c =================================================================== --- gcc.orig/function.c +++ gcc/function.c @@ -146,9 +146,6 @@ extern tree debug_find_var_in_block_tree can always export `prologue_epilogue_contains'. */ static void record_insns (rtx, rtx, htab_t *) ATTRIBUTE_UNUSED; static bool contains (const_rtx, htab_t); -#ifdef HAVE_return -static void emit_return_into_block (basic_block); -#endif static void prepare_function_start (void); static void do_clobber_return_reg (rtx, void *); static void do_use_return_reg (rtx, void *); @@ -5262,42 +5259,181 @@ prologue_epilogue_contains (const_rtx in return 0; } +#ifdef HAVE_simple_return +/* A subroutine of requires_stack_frame_p, called via for_each_rtx. + If any change is made, set CHANGED + to true. */ + +static int +frame_required_for_rtx (rtx *loc, void *data ATTRIBUTE_UNUSED) +{ + rtx x = *loc; + if (x == stack_pointer_rtx || x == hard_frame_pointer_rtx + || x == arg_pointer_rtx || x == pic_offset_table_rtx +#ifdef RETURN_ADDR_REGNUM + || (REG_P (x) && REGNO (x) == RETURN_ADDR_REGNUM) +#endif + ) + return 1; + return 0; +} + +/* Return true if INSN requires the stack frame to be set up. */ +static bool +requires_stack_frame_p (rtx insn) +{ + HARD_REG_SET hardregs; + unsigned regno; + + if (!INSN_P (insn) || DEBUG_INSN_P (insn)) + return false; + if (CALL_P (insn)) + return !SIBLING_CALL_P (insn); + if (for_each_rtx (&PATTERN (insn), frame_required_for_rtx, NULL)) + return true; + CLEAR_HARD_REG_SET (hardregs); + note_stores (PATTERN (insn), record_hard_reg_sets, &hardregs); + AND_COMPL_HARD_REG_SET (hardregs, call_used_reg_set); + for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++) + if (TEST_HARD_REG_BIT (hardregs, regno) + && df_regs_ever_live_p (regno)) + return true; + return false; +} +#endif + #ifdef HAVE_return -/* Insert gen_return at the end of block BB. This also means updating - block_for_insn appropriately. 
*/ + +static rtx +gen_return_pattern (bool simple_p) +{ +#ifdef HAVE_simple_return + return simple_p ? gen_simple_return () : gen_return (); +#else + gcc_assert (!simple_p); + return gen_return (); +#endif +} + +/* Insert an appropriate return pattern at the end of block BB. This + also means updating block_for_insn appropriately. */ static void -emit_return_into_block (basic_block bb) +emit_return_into_block (bool simple_p, basic_block bb) { - emit_jump_insn_after (gen_return (), BB_END (bb)); + rtx jump; + jump = emit_jump_insn_after (gen_return_pattern (simple_p), BB_END (bb)); + JUMP_LABEL (jump) = simple_p ? simple_return_rtx : ret_rtx; } -#endif /* HAVE_return */ +#endif /* Generate the prologue and epilogue RTL if the machine supports it. Thread this into place with notes indicating where the prologue ends and where - the epilogue begins. Update the basic block information when possible. */ + the epilogue begins. Update the basic block information when possible. + + Notes on epilogue placement: + There are several kinds of edges to the exit block: + * a single fallthru edge from LAST_BB + * possibly, edges from blocks containing sibcalls + * possibly, fake edges from infinite loops + + The epilogue is always emitted on the fallthru edge from the last basic + block in the function, LAST_BB, into the exit block. + + If LAST_BB is empty except for a label, it is the target of every + other basic block in the function that ends in a return. If a + target has a return or simple_return pattern (possibly with + conditional variants), these basic blocks can be changed so that a + return insn is emitted into them, and their target is adjusted to + the real exit block. + + Notes on shrink wrapping: We implement a fairly conservative + version of shrink-wrapping rather than the textbook one. We only + generate a single prologue and a single epilogue. This is + sufficient to catch a number of interesting cases involving early + exits. + + First, we identify the blocks that require the prologue to occur before + them. These are the ones that modify a call-saved register, or reference + any of the stack or frame pointer registers. To simplify things, we then + mark everything reachable from these blocks as also requiring a prologue. + This takes care of loops automatically, and avoids the need to examine + whether MEMs reference the frame, since it is sufficient to check for + occurrences of the stack or frame pointer. + + We then compute the set of blocks for which the need for a prologue + is anticipatable (borrowing terminology from the shrink-wrapping + description in Muchnick's book). These are the blocks which either + require a prologue themselves, or those that have only successors + where the prologue is anticipatable. The prologue needs to be + inserted on all edges from BB1->BB2 where BB2 is in ANTIC and BB1 + is not. For the moment, we ensure that only one such edge exists. + + The epilogue is placed as described above, but we make a + distinction between inserting return and simple_return patterns + when modifying other blocks that end in a return. Blocks that end + in a sibcall omit the sibcall_epilogue if the block is not in + ANTIC. 
*/ static void thread_prologue_and_epilogue_insns (void) { bool inserted; + basic_block last_bb; + bool last_bb_active; +#ifdef HAVE_simple_return + bool unconverted_simple_returns = false; + basic_block simple_return_block = NULL; +#endif + rtx returnjump ATTRIBUTE_UNUSED; rtx seq ATTRIBUTE_UNUSED, epilogue_end ATTRIBUTE_UNUSED; - edge entry_edge ATTRIBUTE_UNUSED; + rtx prologue_seq ATTRIBUTE_UNUSED, split_prologue_seq ATTRIBUTE_UNUSED; + edge entry_edge, orig_entry_edge, exit_fallthru_edge; edge e; edge_iterator ei; + bitmap_head bb_flags; + + df_analyze (); rtl_profile_for_bb (ENTRY_BLOCK_PTR); inserted = false; seq = NULL_RTX; + prologue_seq = NULL_RTX; epilogue_end = NULL_RTX; + returnjump = NULL_RTX; /* Can't deal with multiple successors of the entry block at the moment. Function should always have at least one entry point. */ gcc_assert (single_succ_p (ENTRY_BLOCK_PTR)); entry_edge = single_succ_edge (ENTRY_BLOCK_PTR); + orig_entry_edge = entry_edge; + exit_fallthru_edge = find_fallthru_edge (EXIT_BLOCK_PTR->preds); + if (exit_fallthru_edge != NULL) + { + rtx label; + + last_bb = exit_fallthru_edge->src; + /* Test whether there are active instructions in the last block. */ + label = BB_END (last_bb); + while (label && !LABEL_P (label)) + { + if (active_insn_p (label)) + break; + label = PREV_INSN (label); + } + + last_bb_active = BB_HEAD (last_bb) != label || !LABEL_P (label); + } + else + { + last_bb = NULL; + last_bb_active = false; + } + + split_prologue_seq = NULL_RTX; if (flag_split_stack && (lookup_attribute ("no_split_stack", DECL_ATTRIBUTES (cfun->decl)) == NULL)) @@ -5309,21 +5445,15 @@ thread_prologue_and_epilogue_insns (void start_sequence (); emit_insn (gen_split_stack_prologue ()); - seq = get_insns (); + split_prologue_seq = get_insns (); end_sequence (); - record_insns (seq, NULL, &prologue_insn_hash); - set_insn_locators (seq, prologue_locator); - - /* This relies on the fact that committing the edge insertion - will look for basic blocks within the inserted instructions, - which in turn relies on the fact that we are not in CFG - layout mode here. */ - insert_insn_on_edge (seq, entry_edge); - inserted = true; + record_insns (split_prologue_seq, NULL, &prologue_insn_hash); + set_insn_locators (split_prologue_seq, prologue_locator); #endif } + prologue_seq = NULL_RTX; #ifdef HAVE_prologue if (HAVE_prologue) { @@ -5346,15 +5476,182 @@ thread_prologue_and_epilogue_insns (void if (!targetm.profile_before_prologue () && crtl->profile) emit_insn (gen_blockage ()); - seq = get_insns (); + prologue_seq = get_insns (); end_sequence (); - set_insn_locators (seq, prologue_locator); + set_insn_locators (prologue_seq, prologue_locator); + } +#endif - insert_insn_on_edge (seq, entry_edge); - inserted = true; + bitmap_initialize (&bb_flags, &bitmap_default_obstack); + +#ifdef HAVE_simple_return + /* Try to perform a kind of shrink-wrapping, making sure the + prologue/epilogue is emitted only around those parts of the + function that require it. 
*/ + + if (flag_shrink_wrap && HAVE_simple_return && !flag_non_call_exceptions + && HAVE_prologue && !crtl->calls_eh_return) + { + HARD_REG_SET prologue_clobbered, live_on_edge; + rtx p_insn; + + VEC(basic_block, heap) *vec; + basic_block bb; + bitmap_head bb_antic_flags; + bitmap_head bb_on_list; + + bitmap_initialize (&bb_antic_flags, &bitmap_default_obstack); + bitmap_initialize (&bb_on_list, &bitmap_default_obstack); + + vec = VEC_alloc (basic_block, heap, n_basic_blocks); + + FOR_EACH_BB (bb) + { + rtx insn; + FOR_BB_INSNS (bb, insn) + { + if (requires_stack_frame_p (insn)) + { + bitmap_set_bit (&bb_flags, bb->index); + VEC_quick_push (basic_block, vec, bb); + break; + } + } + } + + /* For every basic block that needs a prologue, mark all blocks + reachable from it, so as to ensure they are also seen as + requiring a prologue. */ + while (!VEC_empty (basic_block, vec)) + { + basic_block tmp_bb = VEC_pop (basic_block, vec); + edge e; + edge_iterator ei; + FOR_EACH_EDGE (e, ei, tmp_bb->succs) + { + if (e->dest == EXIT_BLOCK_PTR + || bitmap_bit_p (&bb_flags, e->dest->index)) + continue; + bitmap_set_bit (&bb_flags, e->dest->index); + VEC_quick_push (basic_block, vec, e->dest); + } + } + /* If the last basic block contains only a label, we'll be able + to convert jumps to it to (potentially conditional) return + insns later. This means we don't necessarily need a prologue + for paths reaching it. */ + if (last_bb) + { + if (!last_bb_active) + bitmap_clear_bit (&bb_flags, last_bb->index); + else if (!bitmap_bit_p (&bb_flags, last_bb->index)) + goto fail_shrinkwrap; + } + + /* Now walk backwards from every block that is marked as needing + a prologue to compute the bb_antic_flags bitmap. */ + bitmap_copy (&bb_antic_flags, &bb_flags); + FOR_EACH_BB (bb) + { + edge e; + edge_iterator ei; + if (!bitmap_bit_p (&bb_flags, bb->index)) + continue; + FOR_EACH_EDGE (e, ei, bb->preds) + if (!bitmap_bit_p (&bb_antic_flags, e->src->index)) + { + VEC_quick_push (basic_block, vec, e->src); + bitmap_set_bit (&bb_on_list, e->src->index); + } + } + while (!VEC_empty (basic_block, vec)) + { + basic_block tmp_bb = VEC_pop (basic_block, vec); + edge e; + edge_iterator ei; + bool all_set = true; + + bitmap_clear_bit (&bb_on_list, tmp_bb->index); + FOR_EACH_EDGE (e, ei, tmp_bb->succs) + { + if (!bitmap_bit_p (&bb_antic_flags, e->dest->index)) + { + all_set = false; + break; + } + } + if (all_set) + { + bitmap_set_bit (&bb_antic_flags, tmp_bb->index); + FOR_EACH_EDGE (e, ei, tmp_bb->preds) + if (!bitmap_bit_p (&bb_antic_flags, e->src->index)) + { + VEC_quick_push (basic_block, vec, e->src); + bitmap_set_bit (&bb_on_list, e->src->index); + } + } + } + /* Find exactly one edge that leads to a block in ANTIC from + a block that isn't. */ + if (!bitmap_bit_p (&bb_antic_flags, entry_edge->dest->index)) + FOR_EACH_BB (bb) + { + if (!bitmap_bit_p (&bb_antic_flags, bb->index)) + continue; + FOR_EACH_EDGE (e, ei, bb->preds) + if (!bitmap_bit_p (&bb_antic_flags, e->src->index)) + { + if (entry_edge != orig_entry_edge) + { + entry_edge = orig_entry_edge; + goto fail_shrinkwrap; + } + entry_edge = e; + } + } + + /* Test whether the prologue is known to clobber any register + (other than FP or SP) which are live on the edge. 
*/ + CLEAR_HARD_REG_SET (prologue_clobbered); + for (p_insn = prologue_seq; p_insn; p_insn = NEXT_INSN (p_insn)) + if (NONDEBUG_INSN_P (p_insn)) + note_stores (PATTERN (p_insn), record_hard_reg_sets, + &prologue_clobbered); + for (p_insn = split_prologue_seq; p_insn; p_insn = NEXT_INSN (p_insn)) + if (NONDEBUG_INSN_P (p_insn)) + note_stores (PATTERN (p_insn), record_hard_reg_sets, + &prologue_clobbered); + CLEAR_HARD_REG_BIT (prologue_clobbered, STACK_POINTER_REGNUM); + if (frame_pointer_needed) + CLEAR_HARD_REG_BIT (prologue_clobbered, HARD_FRAME_POINTER_REGNUM); + CLEAR_HARD_REG_SET (live_on_edge); + reg_set_to_hard_reg_set (&live_on_edge, + df_get_live_in (entry_edge->dest)); + if (hard_reg_set_intersect_p (live_on_edge, prologue_clobbered)) + entry_edge = orig_entry_edge; + + fail_shrinkwrap: + bitmap_clear (&bb_antic_flags); + bitmap_clear (&bb_on_list); + VEC_free (basic_block, heap, vec); } #endif + if (split_prologue_seq != NULL_RTX) + { + /* This relies on the fact that committing the edge insertion + will look for basic blocks within the inserted instructions, + which in turn relies on the fact that we are not in CFG + layout mode here. */ + insert_insn_on_edge (split_prologue_seq, entry_edge); + inserted = true; + } + if (prologue_seq != NULL_RTX) + { + insert_insn_on_edge (prologue_seq, entry_edge); + inserted = true; + } + /* If the exit block has no non-fake predecessors, we don't need an epilogue. */ FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR->preds) @@ -5364,98 +5661,130 @@ thread_prologue_and_epilogue_insns (void goto epilogue_done; rtl_profile_for_bb (EXIT_BLOCK_PTR); + #ifdef HAVE_return - if (optimize && HAVE_return) + /* If we're allowed to generate a simple return instruction, then by + definition we don't need a full epilogue. If the last basic + block before the exit block does not contain active instructions, + examine its predecessors and try to emit (conditional) return + instructions. */ + if (optimize && !last_bb_active + && (HAVE_return || entry_edge != orig_entry_edge)) { - /* If we're allowed to generate a simple return instruction, - then by definition we don't need a full epilogue. Examine - the block that falls through to EXIT. If it does not - contain any code, examine its predecessors and try to - emit (conditional) return instructions. */ - - basic_block last; + edge_iterator ei2; + int i; + basic_block bb; rtx label; + VEC(basic_block,heap) *src_bbs; - e = find_fallthru_edge (EXIT_BLOCK_PTR->preds); - if (e == NULL) + if (exit_fallthru_edge == NULL) goto epilogue_done; - last = e->src; + label = BB_HEAD (last_bb); - /* Verify that there are no active instructions in the last block. */ - label = BB_END (last); - while (label && !LABEL_P (label)) - { - if (active_insn_p (label)) - break; - label = PREV_INSN (label); - } + src_bbs = VEC_alloc (basic_block, heap, EDGE_COUNT (last_bb->preds)); + FOR_EACH_EDGE (e, ei2, last_bb->preds) + if (e->src != ENTRY_BLOCK_PTR) + VEC_quick_push (basic_block, src_bbs, e->src); - if (BB_HEAD (last) == label && LABEL_P (label)) + FOR_EACH_VEC_ELT (basic_block, src_bbs, i, bb) { - edge_iterator ei2; + bool simple_p; + rtx jump; + e = find_edge (bb, last_bb); - for (ei2 = ei_start (last->preds); (e = ei_safe_edge (ei2)); ) - { - basic_block bb = e->src; - rtx jump; + jump = BB_END (bb); - if (bb == ENTRY_BLOCK_PTR) - { - ei_next (&ei2); - continue; - } +#ifdef HAVE_simple_return + simple_p = (entry_edge != orig_entry_edge + ? 
!bitmap_bit_p (&bb_flags, bb->index) : false); +#else + simple_p = false; +#endif - jump = BB_END (bb); - if (!JUMP_P (jump) || JUMP_LABEL (jump) != label) - { - ei_next (&ei2); - continue; - } + if (!simple_p + && (!HAVE_return || !JUMP_P (jump) + || JUMP_LABEL (jump) != label)) + continue; - /* If we have an unconditional jump, we can replace that - with a simple return instruction. */ - if (simplejump_p (jump)) - { - emit_return_into_block (bb); - delete_insn (jump); - } + /* If we have an unconditional jump, we can replace that + with a simple return instruction. */ + if (!JUMP_P (jump)) + { + emit_barrier_after (BB_END (bb)); + emit_return_into_block (simple_p, bb); + } + else if (simplejump_p (jump)) + { + emit_return_into_block (simple_p, bb); + delete_insn (jump); + } + else if (condjump_p (jump) && JUMP_LABEL (jump) != label) + { + basic_block new_bb; + edge new_e; - /* If we have a conditional jump, we can try to replace - that with a conditional return instruction. */ - else if (condjump_p (jump)) - { - if (! redirect_jump (jump, 0, 0)) - { - ei_next (&ei2); - continue; - } + gcc_assert (simple_p); + new_bb = split_edge (e); + emit_barrier_after (BB_END (new_bb)); + emit_return_into_block (simple_p, new_bb); +#ifdef HAVE_simple_return + simple_return_block = new_bb; +#endif + new_e = single_succ_edge (new_bb); + redirect_edge_succ (new_e, EXIT_BLOCK_PTR); - /* If this block has only one successor, it both jumps - and falls through to the fallthru block, so we can't - delete the edge. */ - if (single_succ_p (bb)) - { - ei_next (&ei2); - continue; - } - } + continue; + } + /* If we have a conditional jump branching to the last + block, we can try to replace that with a conditional + return instruction. */ + else if (condjump_p (jump)) + { + rtx dest; + if (simple_p) + dest = simple_return_rtx; else + dest = ret_rtx; + if (! redirect_jump (jump, dest, 0)) { - ei_next (&ei2); +#ifdef HAVE_simple_return + if (simple_p) + unconverted_simple_returns = true; +#endif continue; } - /* Fix up the CFG for the successful change we just made. */ - redirect_edge_succ (e, EXIT_BLOCK_PTR); + /* If this block has only one successor, it both jumps + and falls through to the fallthru block, so we can't + delete the edge. */ + if (single_succ_p (bb)) + continue; } + else + { +#ifdef HAVE_simple_return + if (simple_p) + unconverted_simple_returns = true; +#endif + continue; + } + + /* Fix up the CFG for the successful change we just made. */ + redirect_edge_succ (e, EXIT_BLOCK_PTR); + } + VEC_free (basic_block, heap, src_bbs); + if (HAVE_return) + { /* Emit a return insn for the exit fallthru block. Whether this is still reachable will be determined later. */ - emit_barrier_after (BB_END (last)); - emit_return_into_block (last); - epilogue_end = BB_END (last); - single_succ_edge (last)->flags &= ~EDGE_FALLTHRU; + emit_barrier_after (BB_END (last_bb)); + emit_return_into_block (false, last_bb); + epilogue_end = BB_END (last_bb); + if (JUMP_P (epilogue_end)) + JUMP_LABEL (epilogue_end) = ret_rtx; + single_succ_edge (last_bb)->flags &= ~EDGE_FALLTHRU; goto epilogue_done; } } @@ -5492,13 +5821,10 @@ thread_prologue_and_epilogue_insns (void } #endif - /* Find the edge that falls through to EXIT. Other edges may exist - due to RETURN instructions, but those don't need epilogues. - There really shouldn't be a mixture -- either all should have - been converted or none, however... */ + /* If nothing falls through into the exit block, we don't need an + epilogue. 
*/ - e = find_fallthru_edge (EXIT_BLOCK_PTR->preds); - if (e == NULL) + if (exit_fallthru_edge == NULL) goto epilogue_done; #ifdef HAVE_epilogue @@ -5515,25 +5841,38 @@ thread_prologue_and_epilogue_insns (void set_insn_locators (seq, epilogue_locator); seq = get_insns (); + returnjump = get_last_insn (); end_sequence (); - insert_insn_on_edge (seq, e); + insert_insn_on_edge (seq, exit_fallthru_edge); inserted = true; + if (JUMP_P (returnjump)) + { + rtx pat = PATTERN (returnjump); + if (GET_CODE (pat) == PARALLEL) + pat = XVECEXP (pat, 0, 0); + if (ANY_RETURN_P (pat)) + JUMP_LABEL (returnjump) = pat; + else + JUMP_LABEL (returnjump) = ret_rtx; + } + else + returnjump = NULL_RTX; } else #endif { basic_block cur_bb; - if (! next_active_insn (BB_END (e->src))) + if (! next_active_insn (BB_END (exit_fallthru_edge->src))) goto epilogue_done; /* We have a fall-through edge to the exit block, the source is not at the end of the function, and there will be an assembler epilogue at the end of the function. We can't use force_nonfallthru here, because that would try to - use return. Inserting a jump 'by hand' is extremely messy, so + use return. Inserting a jump 'by hand' is extremely messy, so we take advantage of cfg_layout_finalize using - fixup_fallthru_exit_predecessor. */ + fixup_fallthru_exit_predecessor. */ cfg_layout_initialize (0); FOR_EACH_BB (cur_bb) if (cur_bb->index >= NUM_FIXED_BLOCKS @@ -5542,6 +5881,7 @@ thread_prologue_and_epilogue_insns (void cfg_layout_finalize (); } epilogue_done: + default_rtl_profile (); if (inserted) @@ -5558,33 +5898,93 @@ epilogue_done: } } +#ifdef HAVE_simple_return + /* If there were branches to an empty LAST_BB which we tried to + convert to conditional simple_returns, but couldn't for some + reason, create a block to hold a simple_return insn and redirect + those remaining edges. */ + if (unconverted_simple_returns) + { + edge_iterator ei2; + basic_block exit_pred = EXIT_BLOCK_PTR->prev_bb; + + gcc_assert (entry_edge != orig_entry_edge); + +#ifdef HAVE_epilogue + if (simple_return_block == NULL && returnjump != NULL_RTX + && JUMP_LABEL (returnjump) == simple_return_rtx) + { + edge e = split_block (exit_fallthru_edge->src, + PREV_INSN (returnjump)); + simple_return_block = e->dest; + } +#endif + if (simple_return_block == NULL) + { + basic_block bb; + rtx start; + + bb = create_basic_block (NULL, NULL, exit_pred); + start = emit_jump_insn_after (gen_simple_return (), + BB_END (bb)); + JUMP_LABEL (start) = simple_return_rtx; + emit_barrier_after (start); + + simple_return_block = bb; + make_edge (bb, EXIT_BLOCK_PTR, 0); + } + + restart_scan: + for (ei2 = ei_start (last_bb->preds); (e = ei_safe_edge (ei2)); ) + { + basic_block bb = e->src; + + if (bb != ENTRY_BLOCK_PTR + && !bitmap_bit_p (&bb_flags, bb->index)) + { + redirect_edge_and_branch_force (e, simple_return_block); + goto restart_scan; + } + ei_next (&ei2); + + } + } +#endif + #ifdef HAVE_sibcall_epilogue /* Emit sibling epilogues before any sibling call sites. */ for (ei = ei_start (EXIT_BLOCK_PTR->preds); (e = ei_safe_edge (ei)); ) { basic_block bb = e->src; rtx insn = BB_END (bb); + rtx ep_seq; if (!CALL_P (insn) - || ! SIBLING_CALL_P (insn)) + || ! 
SIBLING_CALL_P (insn) + || (entry_edge != orig_entry_edge + && !bitmap_bit_p (&bb_flags, bb->index))) { ei_next (&ei); continue; } - start_sequence (); - emit_note (NOTE_INSN_EPILOGUE_BEG); - emit_insn (gen_sibcall_epilogue ()); - seq = get_insns (); - end_sequence (); + ep_seq = gen_sibcall_epilogue (); + if (ep_seq) + { + start_sequence (); + emit_note (NOTE_INSN_EPILOGUE_BEG); + emit_insn (ep_seq); + seq = get_insns (); + end_sequence (); - /* Retain a map of the epilogue insns. Used in life analysis to - avoid getting rid of sibcall epilogue insns. Do this before we - actually emit the sequence. */ - record_insns (seq, NULL, &epilogue_insn_hash); - set_insn_locators (seq, epilogue_locator); + /* Retain a map of the epilogue insns. Used in life analysis to + avoid getting rid of sibcall epilogue insns. Do this before we + actually emit the sequence. */ + record_insns (seq, NULL, &epilogue_insn_hash); + set_insn_locators (seq, epilogue_locator); - emit_insn_before (seq, insn); + emit_insn_before (seq, insn); + } ei_next (&ei); } #endif @@ -5609,6 +6009,8 @@ epilogue_done: } #endif + bitmap_clear (&bb_flags); + /* Threading the prologue and epilogue changes the artificial refs in the entry and exit blocks. */ epilogue_completed = 1; Index: gcc/genemit.c =================================================================== --- gcc.orig/genemit.c +++ gcc/genemit.c @@ -226,6 +226,9 @@ gen_exp (rtx x, enum rtx_code subroutine case RETURN: printf ("ret_rtx"); return; + case SIMPLE_RETURN: + printf ("simple_return_rtx"); + return; case CLOBBER: if (REG_P (XEXP (x, 0))) { @@ -549,8 +552,8 @@ gen_expand (rtx expand) || (GET_CODE (next) == PARALLEL && ((GET_CODE (XVECEXP (next, 0, 0)) == SET && GET_CODE (SET_DEST (XVECEXP (next, 0, 0))) == PC) - || GET_CODE (XVECEXP (next, 0, 0)) == RETURN)) - || GET_CODE (next) == RETURN) + || ANY_RETURN_P (XVECEXP (next, 0, 0)))) + || ANY_RETURN_P (next)) printf (" emit_jump_insn ("); else if ((GET_CODE (next) == SET && GET_CODE (SET_SRC (next)) == CALL) || GET_CODE (next) == CALL @@ -668,7 +671,7 @@ gen_split (rtx split) || (GET_CODE (next) == PARALLEL && GET_CODE (XVECEXP (next, 0, 0)) == SET && GET_CODE (SET_DEST (XVECEXP (next, 0, 0))) == PC) - || GET_CODE (next) == RETURN) + || ANY_RETURN_P (next)) printf (" emit_jump_insn ("); else if ((GET_CODE (next) == SET && GET_CODE (SET_SRC (next)) == CALL) || GET_CODE (next) == CALL Index: gcc/haifa-sched.c =================================================================== --- gcc.orig/haifa-sched.c +++ gcc/haifa-sched.c @@ -5310,6 +5310,11 @@ check_cfg (rtx head, rtx tail) gcc_assert (/* Usual case. */ (EDGE_COUNT (bb->succs) > 1 && !BARRIER_P (NEXT_INSN (head))) + /* Special cases, see cfglayout.c: + fixup_reorder_chain. */ + || (EDGE_COUNT (bb->succs) == 1 + && (!onlyjump_p (head) + || returnjump_p (head))) /* Or jump to the next instruction. 
*/ || (EDGE_COUNT (bb->succs) == 1 && (BB_HEAD (EDGE_I (bb->succs, 0)->dest) Index: gcc/ifcvt.c =================================================================== --- gcc.orig/ifcvt.c +++ gcc/ifcvt.c @@ -103,7 +103,7 @@ static int cond_exec_find_if_block (ce_i static int find_if_case_1 (basic_block, edge, edge); static int find_if_case_2 (basic_block, edge, edge); static int dead_or_predicable (basic_block, basic_block, basic_block, - basic_block, int); + edge, int); static void noce_emit_move_insn (rtx, rtx); static rtx block_has_only_trap (basic_block); @@ -3793,6 +3793,7 @@ find_if_case_1 (basic_block test_bb, edg basic_block then_bb = then_edge->dest; basic_block else_bb = else_edge->dest; basic_block new_bb; + rtx else_target = NULL_RTX; int then_bb_index; /* If we are partitioning hot/cold basic blocks, we don't want to @@ -3842,9 +3843,16 @@ find_if_case_1 (basic_block test_bb, edg predictable_edge_p (then_edge))))) return FALSE; + if (else_bb == EXIT_BLOCK_PTR) + { + rtx jump = BB_END (else_edge->src); + gcc_assert (JUMP_P (jump)); + else_target = JUMP_LABEL (jump); + } + /* Registers set are dead, or are predicable. */ if (! dead_or_predicable (test_bb, then_bb, else_bb, - single_succ (then_bb), 1)) + single_succ_edge (then_bb), 1)) return FALSE; /* Conversion went ok, including moving the insns and fixing up the @@ -3861,6 +3869,9 @@ find_if_case_1 (basic_block test_bb, edg redirect_edge_succ (FALLTHRU_EDGE (test_bb), else_bb); new_bb = 0; } + else if (else_bb == EXIT_BLOCK_PTR) + new_bb = force_nonfallthru_and_redirect (FALLTHRU_EDGE (test_bb), + else_bb, else_target); else new_bb = redirect_edge_and_branch_force (FALLTHRU_EDGE (test_bb), else_bb); @@ -3959,7 +3970,7 @@ find_if_case_2 (basic_block test_bb, edg return FALSE; /* Registers set are dead, or are predicable. */ - if (! dead_or_predicable (test_bb, else_bb, then_bb, else_succ->dest, 0)) + if (! dead_or_predicable (test_bb, else_bb, then_bb, else_succ, 0)) return FALSE; /* Conversion went ok, including moving the insns and fixing up the @@ -3988,12 +3999,34 @@ find_if_case_2 (basic_block test_bb, edg static int dead_or_predicable (basic_block test_bb, basic_block merge_bb, - basic_block other_bb, basic_block new_dest, int reversep) + basic_block other_bb, edge dest_edge, int reversep) { - rtx head, end, jump, earliest = NULL_RTX, old_dest, new_label = NULL_RTX; + basic_block new_dest = dest_edge->dest; + rtx head, end, jump, earliest = NULL_RTX, old_dest; bitmap merge_set = NULL; /* Number of pending changes. */ int n_validated_changes = 0; + rtx new_dest_label; + + jump = BB_END (dest_edge->src); + if (JUMP_P (jump)) + { + new_dest_label = JUMP_LABEL (jump); + if (new_dest_label == NULL_RTX) + { + new_dest_label = PATTERN (jump); + gcc_assert (ANY_RETURN_P (new_dest_label)); + } + } + else if (other_bb != new_dest) + { + if (new_dest == EXIT_BLOCK_PTR) + new_dest_label = ret_rtx; + else + new_dest_label = block_label (new_dest); + } + else + new_dest_label = NULL_RTX; jump = BB_END (test_bb); @@ -4131,10 +4164,9 @@ dead_or_predicable (basic_block test_bb, old_dest = JUMP_LABEL (jump); if (other_bb != new_dest) { - new_label = block_label (new_dest); if (reversep - ? ! invert_jump_1 (jump, new_label) - : ! redirect_jump_1 (jump, new_label)) + ? ! invert_jump_1 (jump, new_dest_label) + : ! 
redirect_jump_1 (jump, new_dest_label)) goto cancel; } @@ -4145,7 +4177,7 @@ dead_or_predicable (basic_block test_bb, if (other_bb != new_dest) { - redirect_jump_2 (jump, old_dest, new_label, 0, reversep); + redirect_jump_2 (jump, old_dest, new_dest_label, 0, reversep); redirect_edge_succ (BRANCH_EDGE (test_bb), new_dest); if (reversep) Index: gcc/jump.c =================================================================== --- gcc.orig/jump.c +++ gcc/jump.c @@ -29,7 +29,8 @@ along with GCC; see the file COPYING3. JUMP_LABEL internal field. With this we can detect labels that become unused because of the deletion of all the jumps that formerly used them. The JUMP_LABEL info is sometimes looked - at by later passes. + at by later passes. For return insns, it contains either a + RETURN or a SIMPLE_RETURN rtx. The subroutines redirect_jump and invert_jump are used from other passes as well. */ @@ -741,10 +742,10 @@ condjump_p (const_rtx insn) return (GET_CODE (x) == IF_THEN_ELSE && ((GET_CODE (XEXP (x, 2)) == PC && (GET_CODE (XEXP (x, 1)) == LABEL_REF - || GET_CODE (XEXP (x, 1)) == RETURN)) + || ANY_RETURN_P (XEXP (x, 1)))) || (GET_CODE (XEXP (x, 1)) == PC && (GET_CODE (XEXP (x, 2)) == LABEL_REF - || GET_CODE (XEXP (x, 2)) == RETURN)))); + || ANY_RETURN_P (XEXP (x, 2)))))); } /* Return nonzero if INSN is a (possibly) conditional jump inside a @@ -773,11 +774,11 @@ condjump_in_parallel_p (const_rtx insn) return 0; if (XEXP (SET_SRC (x), 2) == pc_rtx && (GET_CODE (XEXP (SET_SRC (x), 1)) == LABEL_REF - || GET_CODE (XEXP (SET_SRC (x), 1)) == RETURN)) + || ANY_RETURN_P (XEXP (SET_SRC (x), 1)))) return 1; if (XEXP (SET_SRC (x), 1) == pc_rtx && (GET_CODE (XEXP (SET_SRC (x), 2)) == LABEL_REF - || GET_CODE (XEXP (SET_SRC (x), 2)) == RETURN)) + || ANY_RETURN_P (XEXP (SET_SRC (x), 2)))) return 1; return 0; } @@ -839,8 +840,9 @@ any_condjump_p (const_rtx insn) a = GET_CODE (XEXP (SET_SRC (x), 1)); b = GET_CODE (XEXP (SET_SRC (x), 2)); - return ((b == PC && (a == LABEL_REF || a == RETURN)) - || (a == PC && (b == LABEL_REF || b == RETURN))); + return ((b == PC && (a == LABEL_REF || a == RETURN || a == SIMPLE_RETURN)) + || (a == PC + && (b == LABEL_REF || b == RETURN || b == SIMPLE_RETURN))); } /* Return the label of a conditional jump. */ @@ -877,6 +879,7 @@ returnjump_p_1 (rtx *loc, void *data ATT switch (GET_CODE (x)) { case RETURN: + case SIMPLE_RETURN: case EH_RETURN: return true; @@ -1199,7 +1202,7 @@ delete_related_insns (rtx insn) /* If deleting a jump, decrement the count of the label, and delete the label if it is now unused. */ - if (JUMP_P (insn) && JUMP_LABEL (insn)) + if (JUMP_P (insn) && JUMP_LABEL (insn) && !ANY_RETURN_P (JUMP_LABEL (insn))) { rtx lab = JUMP_LABEL (insn), lab_next; @@ -1330,6 +1333,18 @@ delete_for_peephole (rtx from, rtx to) is also an unconditional jump in that case. */ } +/* A helper function for redirect_exp_1; examines its input X and returns + either a LABEL_REF around a label, or a RETURN if X was NULL. */ +static rtx +redirect_target (rtx x) +{ + if (x == NULL_RTX) + return ret_rtx; + if (!ANY_RETURN_P (x)) + return gen_rtx_LABEL_REF (Pmode, x); + return x; +} + /* Throughout LOC, redirect OLABEL to NLABEL. Treat null OLABEL or NLABEL as a return. Accrue modifications into the change group. 
*/ @@ -1341,37 +1356,19 @@ redirect_exp_1 (rtx *loc, rtx olabel, rt int i; const char *fmt; - if (code == LABEL_REF) - { - if (XEXP (x, 0) == olabel) - { - rtx n; - if (nlabel) - n = gen_rtx_LABEL_REF (Pmode, nlabel); - else - n = ret_rtx; - - validate_change (insn, loc, n, 1); - return; - } - } - else if (code == RETURN && olabel == 0) + if ((code == LABEL_REF && XEXP (x, 0) == olabel) + || x == olabel) { - if (nlabel) - x = gen_rtx_LABEL_REF (Pmode, nlabel); - else - x = ret_rtx; - if (loc == &PATTERN (insn)) - x = gen_rtx_SET (VOIDmode, pc_rtx, x); - validate_change (insn, loc, x, 1); + validate_change (insn, loc, redirect_target (nlabel), 1); return; } - if (code == SET && nlabel == 0 && SET_DEST (x) == pc_rtx + if (code == SET && SET_DEST (x) == pc_rtx + && ANY_RETURN_P (nlabel) && GET_CODE (SET_SRC (x)) == LABEL_REF && XEXP (SET_SRC (x), 0) == olabel) { - validate_change (insn, loc, ret_rtx, 1); + validate_change (insn, loc, nlabel, 1); return; } @@ -1408,6 +1405,7 @@ redirect_jump_1 (rtx jump, rtx nlabel) int ochanges = num_validated_changes (); rtx *loc, asmop; + gcc_assert (nlabel); asmop = extract_asm_operands (PATTERN (jump)); if (asmop) { @@ -1429,17 +1427,20 @@ redirect_jump_1 (rtx jump, rtx nlabel) jump target label is unused as a result, it and the code following it may be deleted. - If NLABEL is zero, we are to turn the jump into a (possibly conditional) - RETURN insn. + Normally, NLABEL will be a label, but it may also be a RETURN or + SIMPLE_RETURN rtx; in that case we are to turn the jump into a + (possibly conditional) return insn. The return value will be 1 if the change was made, 0 if it wasn't - (this can only occur for NLABEL == 0). */ + (this can only occur when trying to produce return insns). */ int redirect_jump (rtx jump, rtx nlabel, int delete_unused) { rtx olabel = JUMP_LABEL (jump); + gcc_assert (nlabel != NULL_RTX); + if (nlabel == olabel) return 1; @@ -1451,7 +1452,7 @@ redirect_jump (rtx jump, rtx nlabel, int } /* Fix up JUMP_LABEL and label ref counts after OLABEL has been replaced with - NLABEL in JUMP. + NEW_DEST in JUMP. If DELETE_UNUSED is positive, delete related insn to OLABEL if its ref count has dropped to zero. */ void @@ -1467,13 +1468,14 @@ redirect_jump_2 (rtx jump, rtx olabel, r about this. */ gcc_assert (delete_unused >= 0); JUMP_LABEL (jump) = nlabel; - if (nlabel) + if (nlabel && !ANY_RETURN_P (nlabel)) ++LABEL_NUSES (nlabel); /* Update labels in any REG_EQUAL note. */ if ((note = find_reg_note (jump, REG_EQUAL, NULL_RTX)) != NULL_RTX) { - if (!nlabel || (invert && !invert_exp_1 (XEXP (note, 0), jump))) + if (ANY_RETURN_P (nlabel) + || (invert && !invert_exp_1 (XEXP (note, 0), jump))) remove_note (jump, note); else { @@ -1482,7 +1484,8 @@ redirect_jump_2 (rtx jump, rtx olabel, r } } - if (olabel && --LABEL_NUSES (olabel) == 0 && delete_unused > 0 + if (olabel && !ANY_RETURN_P (olabel) + && --LABEL_NUSES (olabel) == 0 && delete_unused > 0 /* Undefined labels will remain outside the insn stream. */ && INSN_UID (olabel)) delete_related_insns (olabel); Index: gcc/print-rtl.c =================================================================== --- gcc.orig/print-rtl.c +++ gcc/print-rtl.c @@ -314,9 +314,16 @@ print_rtx (const_rtx in_rtx) } } else if (i == 8 && JUMP_P (in_rtx) && JUMP_LABEL (in_rtx) != NULL) - /* Output the JUMP_LABEL reference. */ - fprintf (outfile, "\n%s%*s -> %d", print_rtx_head, indent * 2, "", - INSN_UID (JUMP_LABEL (in_rtx))); + { + /* Output the JUMP_LABEL reference. 
*/ + fprintf (outfile, "\n%s%*s -> ", print_rtx_head, indent * 2, ""); + if (GET_CODE (JUMP_LABEL (in_rtx)) == RETURN) + fprintf (outfile, "return"); + else if (GET_CODE (JUMP_LABEL (in_rtx)) == SIMPLE_RETURN) + fprintf (outfile, "simple_return"); + else + fprintf (outfile, "%d", INSN_UID (JUMP_LABEL (in_rtx))); + } else if (i == 0 && GET_CODE (in_rtx) == VALUE) { #ifndef GENERATOR_FILE Index: gcc/reorg.c =================================================================== --- gcc.orig/reorg.c +++ gcc/reorg.c @@ -161,8 +161,11 @@ static rtx *unfilled_firstobj; #define unfilled_slots_next \ ((rtx *) obstack_next_free (&unfilled_slots_obstack)) -/* Points to the label before the end of the function. */ -static rtx end_of_function_label; +/* Points to the label before the end of the function, or before a + return insn. */ +static rtx function_return_label; +/* Likewise for a simple_return. */ +static rtx function_simple_return_label; /* Mapping between INSN_UID's and position in the code since INSN_UID's do not always monotonically increase. */ @@ -175,7 +178,7 @@ static int stop_search_p (rtx, int); static int resource_conflicts_p (struct resources *, struct resources *); static int insn_references_resource_p (rtx, struct resources *, bool); static int insn_sets_resource_p (rtx, struct resources *, bool); -static rtx find_end_label (void); +static rtx find_end_label (rtx); static rtx emit_delay_sequence (rtx, rtx, int); static rtx add_to_delay_list (rtx, rtx); static rtx delete_from_delay_slot (rtx); @@ -220,6 +223,15 @@ static void relax_delay_slots (rtx); static void make_return_insns (rtx); #endif +/* Return true iff INSN is a simplejump, or any kind of return insn. */ + +static bool +simplejump_or_return_p (rtx insn) +{ + return (JUMP_P (insn) + && (simplejump_p (insn) || ANY_RETURN_P (PATTERN (insn)))); +} + /* Return TRUE if this insn should stop the search for insn to fill delay slots. LABELS_P indicates that labels should terminate the search. In all cases, jumps terminate the search. */ @@ -335,23 +347,29 @@ insn_sets_resource_p (rtx insn, struct r ??? There may be a problem with the current implementation. Suppose we start with a bare RETURN insn and call find_end_label. It may set - end_of_function_label just before the RETURN. Suppose the machinery + function_return_label just before the RETURN. Suppose the machinery is able to fill the delay slot of the RETURN insn afterwards. Then - end_of_function_label is no longer valid according to the property + function_return_label is no longer valid according to the property described above and find_end_label will still return it unmodified. Note that this is probably mitigated by the following observation: - once end_of_function_label is made, it is very likely the target of + once function_return_label is made, it is very likely the target of a jump, so filling the delay slot of the RETURN will be much more difficult. */ static rtx -find_end_label (void) +find_end_label (rtx kind) { rtx insn; + rtx *plabel; + + if (kind == ret_rtx) + plabel = &function_return_label; + else + plabel = &function_simple_return_label; /* If we found one previously, return it. */ - if (end_of_function_label) - return end_of_function_label; + if (*plabel) + return *plabel; /* Otherwise, see if there is a label at the end of the function. If there is, it must be that RETURN insns aren't needed, so that is our return @@ -366,44 +384,44 @@ find_end_label (void) /* When a target threads its epilogue we might already have a suitable return insn. 
If so put a label before it for the - end_of_function_label. */ + function_return_label. */ if (BARRIER_P (insn) && JUMP_P (PREV_INSN (insn)) - && GET_CODE (PATTERN (PREV_INSN (insn))) == RETURN) + && PATTERN (PREV_INSN (insn)) == kind) { rtx temp = PREV_INSN (PREV_INSN (insn)); - end_of_function_label = gen_label_rtx (); - LABEL_NUSES (end_of_function_label) = 0; + rtx label = gen_label_rtx (); + LABEL_NUSES (label) = 0; /* Put the label before an USE insns that may precede the RETURN insn. */ while (GET_CODE (temp) == USE) temp = PREV_INSN (temp); - emit_label_after (end_of_function_label, temp); + emit_label_after (label, temp); + *plabel = label; } else if (LABEL_P (insn)) - end_of_function_label = insn; + *plabel = insn; else { - end_of_function_label = gen_label_rtx (); - LABEL_NUSES (end_of_function_label) = 0; + rtx label = gen_label_rtx (); + LABEL_NUSES (label) = 0; /* If the basic block reorder pass moves the return insn to some other place try to locate it again and put our - end_of_function_label there. */ - while (insn && ! (JUMP_P (insn) - && (GET_CODE (PATTERN (insn)) == RETURN))) + function_return_label there. */ + while (insn && ! (JUMP_P (insn) && (PATTERN (insn) == kind))) insn = PREV_INSN (insn); if (insn) { insn = PREV_INSN (insn); - /* Put the label before an USE insns that may proceed the + /* Put the label before an USE insns that may precede the RETURN insn. */ while (GET_CODE (insn) == USE) insn = PREV_INSN (insn); - emit_label_after (end_of_function_label, insn); + emit_label_after (label, insn); } else { @@ -413,19 +431,16 @@ find_end_label (void) && ! HAVE_return #endif ) - { - /* The RETURN insn has its delay slot filled so we cannot - emit the label just before it. Since we already have - an epilogue and cannot emit a new RETURN, we cannot - emit the label at all. */ - end_of_function_label = NULL_RTX; - return end_of_function_label; - } + /* The RETURN insn has its delay slot filled so we cannot + emit the label just before it. Since we already have + an epilogue and cannot emit a new RETURN, we cannot + emit the label at all. */ + return NULL_RTX; #endif /* HAVE_epilogue */ /* Otherwise, make a new label and emit a RETURN and BARRIER, if needed. */ - emit_label (end_of_function_label); + emit_label (label); #ifdef HAVE_return /* We don't bother trying to create a return insn if the epilogue has filled delay-slots; we would have to try and @@ -437,19 +452,21 @@ find_end_label (void) /* The return we make may have delay slots too. */ rtx insn = gen_return (); insn = emit_jump_insn (insn); + JUMP_LABEL (insn) = ret_rtx; emit_barrier (); if (num_delay_slots (insn) > 0) obstack_ptr_grow (&unfilled_slots_obstack, insn); } #endif } + *plabel = label; } /* Show one additional use for this label so it won't go away until we are done. */ - ++LABEL_NUSES (end_of_function_label); + ++LABEL_NUSES (*plabel); - return end_of_function_label; + return *plabel; } /* Put INSN and LIST together in a SEQUENCE rtx of LENGTH, and replace @@ -797,10 +814,8 @@ optimize_skip (rtx insn) if ((next_trial == next_active_insn (JUMP_LABEL (insn)) && ! 
(next_trial == 0 && crtl->epilogue_delay_list != 0)) || (next_trial != 0 - && JUMP_P (next_trial) - && JUMP_LABEL (insn) == JUMP_LABEL (next_trial) - && (simplejump_p (next_trial) - || GET_CODE (PATTERN (next_trial)) == RETURN))) + && simplejump_or_return_p (next_trial) + && JUMP_LABEL (insn) == JUMP_LABEL (next_trial))) { if (eligible_for_annul_false (insn, 0, trial, flags)) { @@ -819,13 +834,11 @@ optimize_skip (rtx insn) branch, thread our jump to the target of that branch. Don't change this into a RETURN here, because it may not accept what we have in the delay slot. We'll fix this up later. */ - if (next_trial && JUMP_P (next_trial) - && (simplejump_p (next_trial) - || GET_CODE (PATTERN (next_trial)) == RETURN)) + if (next_trial && simplejump_or_return_p (next_trial)) { rtx target_label = JUMP_LABEL (next_trial); - if (target_label == 0) - target_label = find_end_label (); + if (ANY_RETURN_P (target_label)) + target_label = find_end_label (target_label); if (target_label) { @@ -866,7 +879,7 @@ get_jump_flags (rtx insn, rtx label) if (JUMP_P (insn) && (condjump_p (insn) || condjump_in_parallel_p (insn)) && INSN_UID (insn) <= max_uid - && label != 0 + && label != 0 && !ANY_RETURN_P (label) && INSN_UID (label) <= max_uid) flags = (uid_to_ruid[INSN_UID (label)] > uid_to_ruid[INSN_UID (insn)]) @@ -1038,7 +1051,7 @@ get_branch_condition (rtx insn, rtx targ pat = XVECEXP (pat, 0, 0); if (GET_CODE (pat) == RETURN) - return target == 0 ? const_true_rtx : 0; + return ANY_RETURN_P (target) ? const_true_rtx : 0; else if (GET_CODE (pat) != SET || SET_DEST (pat) != pc_rtx) return 0; @@ -1358,8 +1371,7 @@ steal_delay_list_from_fallthrough (rtx i /* We can't do anything if SEQ's delay insn isn't an unconditional branch. */ - if (! simplejump_p (XVECEXP (seq, 0, 0)) - && GET_CODE (PATTERN (XVECEXP (seq, 0, 0))) != RETURN) + if (! simplejump_or_return_p (XVECEXP (seq, 0, 0))) return delay_list; for (i = 1; i < XVECLEN (seq, 0); i++) @@ -2245,7 +2257,8 @@ fill_simple_delay_slots (int non_jumps_p && (!JUMP_P (insn) || ((condjump_p (insn) || condjump_in_parallel_p (insn)) && ! simplejump_p (insn) - && JUMP_LABEL (insn) != 0))) + && JUMP_LABEL (insn) != 0 + && !ANY_RETURN_P (JUMP_LABEL (insn))))) { /* Invariant: If insn is a JUMP_INSN, the insn's jump label. Otherwise, zero. */ @@ -2270,7 +2283,7 @@ fill_simple_delay_slots (int non_jumps_p target = JUMP_LABEL (insn); } - if (target == 0) + if (target == 0 || ANY_RETURN_P (target)) for (trial = next_nonnote_insn (insn); trial; trial = next_trial) { next_trial = next_nonnote_insn (trial); @@ -2349,6 +2362,7 @@ fill_simple_delay_slots (int non_jumps_p && JUMP_P (trial) && simplejump_p (trial) && (target == 0 || JUMP_LABEL (trial) == target) + && !ANY_RETURN_P (JUMP_LABEL (trial)) && (next_trial = next_active_insn (JUMP_LABEL (trial))) != 0 && ! (NONJUMP_INSN_P (next_trial) && GET_CODE (PATTERN (next_trial)) == SEQUENCE) @@ -2371,7 +2385,7 @@ fill_simple_delay_slots (int non_jumps_p if (new_label != 0) new_label = get_label_before (new_label); else - new_label = find_end_label (); + new_label = find_end_label (simple_return_rtx); if (new_label) { @@ -2503,7 +2517,8 @@ fill_simple_delay_slots (int non_jumps_p /* Follow any unconditional jump at LABEL; return the ultimate label reached by any such chain of jumps. - Return null if the chain ultimately leads to a return instruction. + Return a suitable return rtx if the chain ultimately leads to a + return instruction. If LABEL is not followed by a jump, return LABEL. 
If the chain loops or we can't find end, return LABEL, since that tells caller to avoid changing the insn. */ @@ -2518,6 +2533,7 @@ follow_jumps (rtx label) for (depth = 0; (depth < 10 + && !ANY_RETURN_P (value) && (insn = next_active_insn (value)) != 0 && JUMP_P (insn) && ((JUMP_LABEL (insn) != 0 && any_uncondjump_p (insn) @@ -2527,18 +2543,22 @@ follow_jumps (rtx label) && BARRIER_P (next)); depth++) { - rtx tem; + rtx this_label = JUMP_LABEL (insn); /* If we have found a cycle, make the insn jump to itself. */ - if (JUMP_LABEL (insn) == label) + if (this_label == label) return label; - tem = next_active_insn (JUMP_LABEL (insn)); - if (tem && (GET_CODE (PATTERN (tem)) == ADDR_VEC + if (!ANY_RETURN_P (this_label)) + { + rtx tem = next_active_insn (this_label); + if (tem + && (GET_CODE (PATTERN (tem)) == ADDR_VEC || GET_CODE (PATTERN (tem)) == ADDR_DIFF_VEC)) - break; + break; + } - value = JUMP_LABEL (insn); + value = this_label; } if (depth == 10) return label; @@ -2985,16 +3005,14 @@ fill_slots_from_thread (rtx insn, rtx co gcc_assert (thread_if_true); - if (new_thread && JUMP_P (new_thread) - && (simplejump_p (new_thread) - || GET_CODE (PATTERN (new_thread)) == RETURN) + if (new_thread && simplejump_or_return_p (new_thread) && redirect_with_delay_list_safe_p (insn, JUMP_LABEL (new_thread), delay_list)) new_thread = follow_jumps (JUMP_LABEL (new_thread)); - if (new_thread == 0) - label = find_end_label (); + if (ANY_RETURN_P (new_thread)) + label = find_end_label (new_thread); else if (LABEL_P (new_thread)) label = new_thread; else @@ -3340,11 +3358,12 @@ relax_delay_slots (rtx first) group of consecutive labels. */ if (JUMP_P (insn) && (condjump_p (insn) || condjump_in_parallel_p (insn)) - && (target_label = JUMP_LABEL (insn)) != 0) + && (target_label = JUMP_LABEL (insn)) != 0 + && !ANY_RETURN_P (target_label)) { target_label = skip_consecutive_labels (follow_jumps (target_label)); - if (target_label == 0) - target_label = find_end_label (); + if (ANY_RETURN_P (target_label)) + target_label = find_end_label (target_label); if (target_label && next_active_insn (target_label) == next && ! condjump_in_parallel_p (insn)) @@ -3359,9 +3378,8 @@ relax_delay_slots (rtx first) /* See if this jump conditionally branches around an unconditional jump. If so, invert this jump and point it to the target of the second jump. */ - if (next && JUMP_P (next) + if (next && simplejump_or_return_p (next) && any_condjump_p (insn) - && (simplejump_p (next) || GET_CODE (PATTERN (next)) == RETURN) && target_label && next_active_insn (target_label) == next_active_insn (next) && no_labels_between_p (insn, next)) @@ -3376,7 +3394,7 @@ relax_delay_slots (rtx first) invert_jump fails. */ ++LABEL_NUSES (target_label); - if (label) + if (label && LABEL_P (label)) ++LABEL_NUSES (label); if (invert_jump (insn, label, 1)) @@ -3385,7 +3403,7 @@ relax_delay_slots (rtx first) next = insn; } - if (label) + if (label && LABEL_P (label)) --LABEL_NUSES (label); if (--LABEL_NUSES (target_label) == 0) @@ -3403,8 +3421,7 @@ relax_delay_slots (rtx first) Don't do this if we expect the conditional branch to be true, because we would then be making the more common case longer. 
*/ - if (JUMP_P (insn) - && (simplejump_p (insn) || GET_CODE (PATTERN (insn)) == RETURN) + if (simplejump_or_return_p (insn) && (other = prev_active_insn (insn)) != 0 && any_condjump_p (other) && no_labels_between_p (other, insn) @@ -3445,10 +3462,10 @@ relax_delay_slots (rtx first) Only do so if optimizing for size since this results in slower, but smaller code. */ if (optimize_function_for_size_p (cfun) - && GET_CODE (PATTERN (delay_insn)) == RETURN + && ANY_RETURN_P (PATTERN (delay_insn)) && next && JUMP_P (next) - && GET_CODE (PATTERN (next)) == RETURN) + && PATTERN (next) == PATTERN (delay_insn)) { rtx after; int i; @@ -3487,14 +3504,16 @@ relax_delay_slots (rtx first) continue; target_label = JUMP_LABEL (delay_insn); + if (target_label && ANY_RETURN_P (target_label)) + continue; if (target_label) { /* If this jump goes to another unconditional jump, thread it, but don't convert a jump into a RETURN here. */ trial = skip_consecutive_labels (follow_jumps (target_label)); - if (trial == 0) - trial = find_end_label (); + if (ANY_RETURN_P (trial)) + trial = find_end_label (trial); if (trial && trial != target_label && redirect_with_delay_slots_safe_p (delay_insn, trial, insn)) @@ -3517,7 +3536,7 @@ relax_delay_slots (rtx first) later incorrectly compute register live/death info. */ rtx tmp = next_active_insn (trial); if (tmp == 0) - tmp = find_end_label (); + tmp = find_end_label (simple_return_rtx); if (tmp) { @@ -3537,14 +3556,12 @@ relax_delay_slots (rtx first) delay list and that insn is redundant, thread the jump. */ if (trial && GET_CODE (PATTERN (trial)) == SEQUENCE && XVECLEN (PATTERN (trial), 0) == 2 - && JUMP_P (XVECEXP (PATTERN (trial), 0, 0)) - && (simplejump_p (XVECEXP (PATTERN (trial), 0, 0)) - || GET_CODE (PATTERN (XVECEXP (PATTERN (trial), 0, 0))) == RETURN) + && simplejump_or_return_p (XVECEXP (PATTERN (trial), 0, 0)) && redundant_insn (XVECEXP (PATTERN (trial), 0, 1), insn, 0)) { target_label = JUMP_LABEL (XVECEXP (PATTERN (trial), 0, 0)); - if (target_label == 0) - target_label = find_end_label (); + if (ANY_RETURN_P (target_label)) + target_label = find_end_label (target_label); if (target_label && redirect_with_delay_slots_safe_p (delay_insn, target_label, @@ -3622,16 +3639,15 @@ relax_delay_slots (rtx first) a RETURN here. */ if (! INSN_ANNULLED_BRANCH_P (delay_insn) && any_condjump_p (delay_insn) - && next && JUMP_P (next) - && (simplejump_p (next) || GET_CODE (PATTERN (next)) == RETURN) + && next && simplejump_or_return_p (next) && next_active_insn (target_label) == next_active_insn (next) && no_labels_between_p (insn, next)) { rtx label = JUMP_LABEL (next); rtx old_label = JUMP_LABEL (delay_insn); - if (label == 0) - label = find_end_label (); + if (ANY_RETURN_P (label)) + label = find_end_label (label); /* find_end_label can generate a new label. Check this first. */ if (label @@ -3692,7 +3708,8 @@ static void make_return_insns (rtx first) { rtx insn, jump_insn, pat; - rtx real_return_label = end_of_function_label; + rtx real_return_label = function_return_label; + rtx real_simple_return_label = function_simple_return_label; int slots, i; #ifdef DELAY_SLOTS_FOR_EPILOGUE @@ -3710,15 +3727,22 @@ make_return_insns (rtx first) made for END_OF_FUNCTION_LABEL. If so, set up anything we can't change into a RETURN to jump to it. 
*/ for (insn = first; insn; insn = NEXT_INSN (insn)) - if (JUMP_P (insn) && GET_CODE (PATTERN (insn)) == RETURN) + if (JUMP_P (insn) && ANY_RETURN_P (PATTERN (insn))) { - real_return_label = get_label_before (insn); + rtx t = get_label_before (insn); + if (PATTERN (insn) == ret_rtx) + real_return_label = t; + else + real_simple_return_label = t; break; } /* Show an extra usage of REAL_RETURN_LABEL so it won't go away if it was equal to END_OF_FUNCTION_LABEL. */ - LABEL_NUSES (real_return_label)++; + if (real_return_label) + LABEL_NUSES (real_return_label)++; + if (real_simple_return_label) + LABEL_NUSES (real_simple_return_label)++; /* Clear the list of insns to fill so we can use it. */ obstack_free (&unfilled_slots_obstack, unfilled_firstobj); @@ -3726,13 +3750,27 @@ make_return_insns (rtx first) for (insn = first; insn; insn = NEXT_INSN (insn)) { int flags; + rtx kind, real_label; /* Only look at filled JUMP_INSNs that go to the end of function label. */ if (!NONJUMP_INSN_P (insn) || GET_CODE (PATTERN (insn)) != SEQUENCE - || !JUMP_P (XVECEXP (PATTERN (insn), 0, 0)) - || JUMP_LABEL (XVECEXP (PATTERN (insn), 0, 0)) != end_of_function_label) + || !JUMP_P (XVECEXP (PATTERN (insn), 0, 0))) + continue; + + if (JUMP_LABEL (XVECEXP (PATTERN (insn), 0, 0)) == function_return_label) + { + kind = ret_rtx; + real_label = real_return_label; + } + else if (JUMP_LABEL (XVECEXP (PATTERN (insn), 0, 0)) + == function_simple_return_label) + { + kind = simple_return_rtx; + real_label = real_simple_return_label; + } + else continue; pat = PATTERN (insn); @@ -3740,14 +3778,12 @@ make_return_insns (rtx first) /* If we can't make the jump into a RETURN, try to redirect it to the best RETURN and go on to the next insn. */ - if (! reorg_redirect_jump (jump_insn, NULL_RTX)) + if (! reorg_redirect_jump (jump_insn, kind)) { /* Make sure redirecting the jump will not invalidate the delay slot insns. */ - if (redirect_with_delay_slots_safe_p (jump_insn, - real_return_label, - insn)) - reorg_redirect_jump (jump_insn, real_return_label); + if (redirect_with_delay_slots_safe_p (jump_insn, real_label, insn)) + reorg_redirect_jump (jump_insn, real_label); continue; } @@ -3787,7 +3823,7 @@ make_return_insns (rtx first) RETURN, delete the SEQUENCE and output the individual insns, followed by the RETURN. Then set things up so we try to find insns for its delay slots, if it needs some. */ - if (GET_CODE (PATTERN (jump_insn)) == RETURN) + if (ANY_RETURN_P (PATTERN (jump_insn))) { rtx prev = PREV_INSN (insn); @@ -3804,13 +3840,16 @@ make_return_insns (rtx first) else /* It is probably more efficient to keep this with its current delay slot as a branch to a RETURN. */ - reorg_redirect_jump (jump_insn, real_return_label); + reorg_redirect_jump (jump_insn, real_label); } /* Now delete REAL_RETURN_LABEL if we never used it. Then try to fill any new delay slots we have created. */ - if (--LABEL_NUSES (real_return_label) == 0) + if (real_return_label != NULL_RTX && --LABEL_NUSES (real_return_label) == 0) delete_related_insns (real_return_label); + if (real_simple_return_label != NULL_RTX + && --LABEL_NUSES (real_simple_return_label) == 0) + delete_related_insns (real_simple_return_label); fill_simple_delay_slots (1); fill_simple_delay_slots (0); @@ -3878,7 +3917,7 @@ dbr_schedule (rtx first) init_resource_info (epilogue_insn); /* Show we haven't computed an end-of-function label yet. */ - end_of_function_label = 0; + function_return_label = function_simple_return_label = NULL_RTX; /* Initialize the statistics for this function. 
*/ memset (num_insns_needing_delays, 0, sizeof num_insns_needing_delays); @@ -3900,11 +3939,23 @@ dbr_schedule (rtx first) /* If we made an end of function label, indicate that it is now safe to delete it by undoing our prior adjustment to LABEL_NUSES. If it is now unused, delete it. */ - if (end_of_function_label && --LABEL_NUSES (end_of_function_label) == 0) - delete_related_insns (end_of_function_label); + if (function_return_label && --LABEL_NUSES (function_return_label) == 0) + delete_related_insns (function_return_label); + if (function_simple_return_label + && --LABEL_NUSES (function_simple_return_label) == 0) + delete_related_insns (function_simple_return_label); +#if defined HAVE_return || defined HAVE_simple_return + if ( #ifdef HAVE_return - if (HAVE_return && end_of_function_label != 0) + (HAVE_return && function_return_label != 0) +#else + 0 +#endif +#ifdef HAVE_simple_return + || (HAVE_simple_return && function_simple_return_label != 0) +#endif + ) make_return_insns (first); #endif Index: gcc/resource.c =================================================================== --- gcc.orig/resource.c +++ gcc/resource.c @@ -495,6 +495,8 @@ find_dead_or_set_registers (rtx target, || GET_CODE (PATTERN (this_jump_insn)) == RETURN) { next = JUMP_LABEL (this_jump_insn); + if (next && ANY_RETURN_P (next)) + next = NULL_RTX; if (jump_insn == 0) { jump_insn = insn; @@ -562,9 +564,10 @@ find_dead_or_set_registers (rtx target, AND_COMPL_HARD_REG_SET (scratch, needed.regs); AND_COMPL_HARD_REG_SET (fallthrough_res.regs, scratch); - find_dead_or_set_registers (JUMP_LABEL (this_jump_insn), - &target_res, 0, jump_count, - target_set, needed); + if (!ANY_RETURN_P (JUMP_LABEL (this_jump_insn))) + find_dead_or_set_registers (JUMP_LABEL (this_jump_insn), - &target_res, 0, jump_count, - target_set, needed); find_dead_or_set_registers (next, &fallthrough_res, 0, jump_count, set, needed); @@ -1097,6 +1100,8 @@ mark_target_live_regs (rtx insns, rtx ta struct resources new_resources; rtx stop_insn = next_active_insn (jump_insn); + if (jump_target && ANY_RETURN_P (jump_target)) + jump_target = NULL_RTX; mark_target_live_regs (insns, next_active_insn (jump_target), &new_resources); CLEAR_RESOURCE (&set); Index: gcc/rtl.c =================================================================== --- gcc.orig/rtl.c +++ gcc/rtl.c @@ -256,6 +256,7 @@ copy_rtx (rtx orig) case PC: case CC0: case RETURN: + case SIMPLE_RETURN: case SCRATCH: /* SCRATCH must be shared because they represent distinct values. */ return orig; Index: gcc/rtl.def =================================================================== --- gcc.orig/rtl.def +++ gcc/rtl.def @@ -296,6 +296,10 @@ DEF_RTL_EXPR(CALL, "call", "ee", RTX_EXT DEF_RTL_EXPR(RETURN, "return", "", RTX_EXTRA) +/* A plain return, to be used on paths that are reached without going + through the function prologue. */ +DEF_RTL_EXPR(SIMPLE_RETURN, "simple_return", "", RTX_EXTRA) + /* Special for EH return from subroutine. */ DEF_RTL_EXPR(EH_RETURN, "eh_return", "", RTX_EXTRA) Index: gcc/rtl.h =================================================================== --- gcc.orig/rtl.h +++ gcc/rtl.h @@ -412,6 +412,10 @@ struct GTY((variable_size)) rtvec_def { (JUMP_P (INSN) && (GET_CODE (PATTERN (INSN)) == ADDR_VEC || \ GET_CODE (PATTERN (INSN)) == ADDR_DIFF_VEC)) +/* Predicate yielding nonzero iff X is a return or simple_return. */ +#define ANY_RETURN_P(X) \ + (GET_CODE (X) == RETURN || GET_CODE (X) == SIMPLE_RETURN) + /* 1 if X is a unary operator. 
*/ #define UNARY_P(X) \ @@ -2046,6 +2050,7 @@ enum global_rtl_index GR_PC, GR_CC0, GR_RETURN, + GR_SIMPLE_RETURN, GR_STACK_POINTER, GR_FRAME_POINTER, /* For register elimination to work properly these hard_frame_pointer_rtx, @@ -2136,6 +2141,7 @@ extern struct target_rtl *this_target_rt /* Standard pieces of rtx, to be substituted directly into things. */ #define pc_rtx (global_rtl[GR_PC]) #define ret_rtx (global_rtl[GR_RETURN]) +#define simple_return_rtx (global_rtl[GR_SIMPLE_RETURN]) #define cc0_rtx (global_rtl[GR_CC0]) /* All references to certain hard regs, except those created Index: gcc/rtlanal.c =================================================================== --- gcc.orig/rtlanal.c +++ gcc/rtlanal.c @@ -2662,6 +2662,7 @@ tablejump_p (const_rtx insn, rtx *labelp if (JUMP_P (insn) && (label = JUMP_LABEL (insn)) != NULL_RTX + && !ANY_RETURN_P (label) && (table = next_active_insn (label)) != NULL_RTX && JUMP_TABLE_DATA_P (table)) { Index: gcc/sched-vis.c =================================================================== --- gcc.orig/sched-vis.c +++ gcc/sched-vis.c @@ -554,6 +554,9 @@ print_pattern (char *buf, const_rtx x, i case RETURN: sprintf (buf, "return"); break; + case SIMPLE_RETURN: + sprintf (buf, "simple_return"); + break; case CALL: print_exp (buf, x, verbose); break; Index: gcc/gengenrtl.c =================================================================== --- gcc.orig/gengenrtl.c +++ gcc/gengenrtl.c @@ -131,6 +131,7 @@ special_rtx (int idx) || strcmp (defs[idx].enumname, "PC") == 0 || strcmp (defs[idx].enumname, "CC0") == 0 || strcmp (defs[idx].enumname, "RETURN") == 0 + || strcmp (defs[idx].enumname, "SIMPLE_RETURN") == 0 || strcmp (defs[idx].enumname, "CONST_VECTOR") == 0); } Index: gcc/common.opt =================================================================== --- gcc.orig/common.opt +++ gcc/common.opt @@ -1718,6 +1718,11 @@ fshow-column Common Report Var(flag_show_column) Init(1) Show column numbers in diagnostics, when available. Default on +fshrink-wrap +Common Report Var(flag_shrink_wrap) Optimization +Emit function prologues only before parts of the function that need it, +rather than at the top of the function. + fsignaling-nans Common Report Var(flag_signaling_nans) Optimization SetByCombined Disable optimizations observable by IEEE signaling NaNs Index: gcc/opts.c =================================================================== --- gcc.orig/opts.c +++ gcc/opts.c @@ -442,6 +442,7 @@ static const struct default_options defa { OPT_LEVELS_1_PLUS, OPT_fipa_reference, NULL, 1 }, { OPT_LEVELS_1_PLUS, OPT_fipa_profile, NULL, 1 }, { OPT_LEVELS_1_PLUS, OPT_fmerge_constants, NULL, 1 }, + { OPT_LEVELS_1_PLUS, OPT_fshrink_wrap, NULL, 1 }, { OPT_LEVELS_1_PLUS, OPT_fsplit_wide_types, NULL, 1 }, { OPT_LEVELS_1_PLUS, OPT_ftree_ccp, NULL, 1 }, { OPT_LEVELS_1_PLUS, OPT_ftree_bit_ccp, NULL, 1 }, Index: gcc/doc/invoke.texi =================================================================== --- gcc.orig/doc/invoke.texi +++ gcc/doc/invoke.texi @@ -384,10 +384,10 @@ Objective-C and Objective-C++ Dialects}. 
-fschedule-insns -fschedule-insns2 -fsection-anchors @gol -fselective-scheduling -fselective-scheduling2 @gol -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops @gol --fsignaling-nans -fsingle-precision-constant -fsplit-ivs-in-unroller @gol --fsplit-wide-types -fstack-protector -fstack-protector-all @gol --fstrict-aliasing -fstrict-overflow -fthread-jumps -ftracer @gol --ftree-bit-ccp @gol +-fshrink-wrap -fsignaling-nans -fsingle-precision-constant @gol +-fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector @gol +-fstack-protector-all -fstrict-aliasing -fstrict-overflow @gol +-fthread-jumps -ftracer -ftree-bit-ccp @gol -ftree-builtin-call-dce -ftree-ccp -ftree-ch -ftree-copy-prop @gol -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol -ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol @@ -6708,6 +6708,12 @@ This option has no effect until one of @ When pipelining loops during selective scheduling, also pipeline outer loops. This option has no effect until @option{-fsel-sched-pipelining} is turned on. +@item -fshrink-wrap +@opindex fshrink-wrap +Emit function prologues only before parts of the function that need it, +rather than at the top of the function. This flag is enabled by default at +@option{-O} and higher. + @item -fcaller-saves @opindex fcaller-saves Enable values to be allocated in registers that will be clobbered by
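
For illustration only (this is not part of the patch, and the function names below are made up): the kind of code the new -fshrink-wrap option is aimed at. Without shrink-wrapping, the prologue that saves call-saved registers for the slow path is emitted at the very top of the function and runs even when the early return is taken. The intent of the pass above is to sink that prologue onto the branch that actually needs it, so that the fast path executes no prologue at all and leaves through a simple_return, which per the rtl.def comment is the form of return used on paths that never went through the prologue.

extern int process (int *p);

int
lookup (int *p)
{
  /* Fast path: touches no call-saved registers, so with -fshrink-wrap
     it can run before any prologue has been emitted and return
     directly.  */
  if (p == 0)
    return -1;

  /* Slow path: the call requires the usual register saves, so the
     prologue is emitted only on this branch.  */
  return process (p);
}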