Message ID | 00ef01d9bcbb$d50aeeb0$7f20cc10$@nextmovesoftware.com |
---|---|
State | New |
Headers | show |
Series | Replace lra-spill.cc's return_regno_p with return_reg_p. | expand |
On Sat, Jul 22, 2023 at 6:45 PM Roger Sayle <roger@nextmovesoftware.com> wrote: > > > This patch is my attempt to address the compile-time hog issue > in PR rtl-optimization/110587. Richard Biener's analysis shows that > compilation of pr28071.c with -O0 currently spends ~70% in timer > "LRA non-specific" due to return_regno_p failing to filter a large > number of calls to regno_in_use_p, resulting in quadratic behaviour. > > For this pathological test case, things can be improved significantly. > Although the return register (%rax) is indeed mentioned a large > number of times in this function, due to inlining, the inlined functions > access the returned register in TImode, whereas the current function > returns a DImode. Hence the check to see if we're the last SET of the > return register, which should be followed by a USE, can be improved > by also testing the mode. Implementation-wise, rather than pass an > additional mode parameter to LRA's local return_regno_p function, which > only has a single caller, it's more convenient to pass the rtx REG_P, > and from this extract both the REGNO and the mode in the callee, and > rename this function to return_reg_p. > > The good news is that with this change "LRA non-specific" drops from > 70% to 13%. > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, with no new failures. Ok for mainline? I think this is a good heuristic improvement, but I don't feel competent on assessing the constraints this implies on the structure of the return value setting instructions. Previously we'd preserve noop copies mentioning the return hardreg but now just if that noop copy has the same mode as the return_rtx. I don't even understand why we need to preserve this noop copy in the first place. Looking at the history of return_regno_p and the surrounding of the single caller tells this is a place of "interestingness" ... Can somebody summarize why we need to preserve a noop-move between the return_rtx (hard?)regs? The comment /* IRA can generate move insns involving pseudos. It is better remove them earlier to speed up compiler a bit. It is also better to do it here as they might not pass final RTL check in LRA, (e.g. insn moving a control register into itself). So remove an useless move insn unless next insn is USE marking the return reg (we should save this as some subsequent optimizations assume that such original insns are saved). */ says this is about removing noop copies of _pseudos_ for correctness reasons. So, can't we simply scrap the return_regno_p and regno_in_use_p checks and always preserve noop moves between hardregs here, leaving that to other passes? I'm going to bootstrap & test that for the fun of it. Thanks, Richard. > > > 2023-07-22 Roger Sayle <roger@nextmovesoftware.com> > > gcc/ChangeLog > PR middle-end/28071 > PR rtl-optimization/110587 > * lra-spills.cc (return_regno_p): Change argument and rename to... > (return_reg_p): Check if the given register RTX has the same > REGNO and machine mode as the function's return value. > (lra_final_code_change): Update call to return_reg_p. > > > Thanks in advance, > Roger > -- >
diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.cc index 3a7bb7e..ae147ad 100644 --- a/gcc/lra-spills.cc +++ b/gcc/lra-spills.cc @@ -705,10 +705,10 @@ alter_subregs (rtx *loc, bool final_p) return res; } -/* Return true if REGNO is used for return in the current - function. */ +/* Return true if register REG, known to be REG_P, is used for return + in the current function. */ static bool -return_regno_p (unsigned int regno) +return_reg_p (rtx reg) { rtx outgoing = crtl->return_rtx; @@ -716,7 +716,8 @@ return_regno_p (unsigned int regno) return false; if (REG_P (outgoing)) - return REGNO (outgoing) == regno; + return REGNO (outgoing) == REGNO (reg) + && GET_MODE (outgoing) == GET_MODE (reg); else if (GET_CODE (outgoing) == PARALLEL) { int i; @@ -725,7 +726,9 @@ return_regno_p (unsigned int regno) { rtx x = XEXP (XVECEXP (outgoing, 0, i), 0); - if (REG_P (x) && REGNO (x) == regno) + if (REG_P (x) + && REGNO (x) == REGNO (reg) + && GET_MODE (x) == GET_MODE (reg)) return true; } } @@ -821,7 +824,7 @@ lra_final_code_change (void) if (NONJUMP_INSN_P (insn) && GET_CODE (pat) == SET && REG_P (SET_SRC (pat)) && REG_P (SET_DEST (pat)) && REGNO (SET_SRC (pat)) == REGNO (SET_DEST (pat)) - && (! return_regno_p (REGNO (SET_SRC (pat))) + && (! return_reg_p (SET_SRC (pat)) || ! regno_in_use_p (insn, REGNO (SET_SRC (pat))))) { lra_invalidate_insn_data (insn);