diff mbox series

Replace lra-spill.cc's return_regno_p with return_reg_p.

Message ID 00ef01d9bcbb$d50aeeb0$7f20cc10$@nextmovesoftware.com
State New
Headers show
Series Replace lra-spill.cc's return_regno_p with return_reg_p. | expand

Commit Message

Roger Sayle July 22, 2023, 4:44 p.m. UTC
This patch is my attempt to address the compile-time hog issue
in PR rtl-optimization/110587.  Richard Biener's analysis shows that
compilation of pr28071.c with -O0 currently spends ~70% in timer
"LRA non-specific" due to return_regno_p failing to filter a large
number of calls to regno_in_use_p, resulting in quadratic behaviour.

For this pathological test case, things can be improved significantly.
Although the return register (%rax) is indeed mentioned a large
number of times in this function, due to inlining, the inlined functions
access the returned register in TImode, whereas the current function
returns a DImode.  Hence the check to see if we're the last SET of the
return register, which should be followed by a USE, can be improved
by also testing the mode.  Implementation-wise, rather than pass an
additional mode parameter to LRA's local return_regno_p function, which
only has a single caller, it's more convenient to pass the rtx REG_P,
and from this extract both the REGNO and the mode in the callee, and
rename this function to return_reg_p.

The good news is that with this change "LRA non-specific" drops from
70% to 13%.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, with no new failures.  Ok for mainline?


2023-07-22  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        PR middle-end/28071
        PR rtl-optimization/110587
        * lra-spills.cc (return_regno_p): Change argument and rename to...
        (return_reg_p): Check if the given register RTX has the same
        REGNO and machine mode as the function's return value.
        (lra_final_code_change): Update call to return_reg_p.


Thanks in advance,
Roger
--

Comments

Richard Biener July 25, 2023, 11:18 a.m. UTC | #1
On Sat, Jul 22, 2023 at 6:45 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch is my attempt to address the compile-time hog issue
> in PR rtl-optimization/110587.  Richard Biener's analysis shows that
> compilation of pr28071.c with -O0 currently spends ~70% in timer
> "LRA non-specific" due to return_regno_p failing to filter a large
> number of calls to regno_in_use_p, resulting in quadratic behaviour.
>
> For this pathological test case, things can be improved significantly.
> Although the return register (%rax) is indeed mentioned a large
> number of times in this function, due to inlining, the inlined functions
> access the returned register in TImode, whereas the current function
> returns a DImode.  Hence the check to see if we're the last SET of the
> return register, which should be followed by a USE, can be improved
> by also testing the mode.  Implementation-wise, rather than pass an
> additional mode parameter to LRA's local return_regno_p function, which
> only has a single caller, it's more convenient to pass the rtx REG_P,
> and from this extract both the REGNO and the mode in the callee, and
> rename this function to return_reg_p.
>
> The good news is that with this change "LRA non-specific" drops from
> 70% to 13%.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, with no new failures.  Ok for mainline?

I think this is a good heuristic improvement, but I don't feel competent
on assessing the constraints this implies on the structure of the
return value setting instructions.  Previously we'd preserve
noop copies mentioning the return hardreg but now just if that noop
copy has the same mode as the return_rtx.  I don't even understand
why we need to preserve this noop copy in the first place.

Looking at the history of return_regno_p and the surrounding of the
single caller tells this is a place of "interestingness" ...

Can somebody summarize why we need to preserve a noop-move
between the return_rtx (hard?)regs?

The comment

          /* IRA can generate move insns involving pseudos.  It is
             better remove them earlier to speed up compiler a bit.
             It is also better to do it here as they might not pass
             final RTL check in LRA, (e.g. insn moving a control
             register into itself).  So remove an useless move insn
             unless next insn is USE marking the return reg (we should
             save this as some subsequent optimizations assume that
             such original insns are saved).  */

says this is about removing noop copies of _pseudos_ for correctness
reasons.  So, can't we simply scrap the return_regno_p and regno_in_use_p
checks and always preserve noop moves between hardregs here, leaving
that to other passes?

I'm going to bootstrap & test that for the fun of it.

Thanks,
Richard.

>
>
> 2023-07-22  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR middle-end/28071
>         PR rtl-optimization/110587
>         * lra-spills.cc (return_regno_p): Change argument and rename to...
>         (return_reg_p): Check if the given register RTX has the same
>         REGNO and machine mode as the function's return value.
>         (lra_final_code_change): Update call to return_reg_p.
>
>
> Thanks in advance,
> Roger
> --
>
diff mbox series

Patch

diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.cc
index 3a7bb7e..ae147ad 100644
--- a/gcc/lra-spills.cc
+++ b/gcc/lra-spills.cc
@@ -705,10 +705,10 @@  alter_subregs (rtx *loc, bool final_p)
   return res;
 }
 
-/* Return true if REGNO is used for return in the current
-   function.  */
+/* Return true if register REG, known to be REG_P, is used for return
+   in the current function.  */
 static bool
-return_regno_p (unsigned int regno)
+return_reg_p (rtx reg)
 {
   rtx outgoing = crtl->return_rtx;
 
@@ -716,7 +716,8 @@  return_regno_p (unsigned int regno)
     return false;
 
   if (REG_P (outgoing))
-    return REGNO (outgoing) == regno;
+    return REGNO (outgoing) == REGNO (reg)
+	   && GET_MODE (outgoing) == GET_MODE (reg);
   else if (GET_CODE (outgoing) == PARALLEL)
     {
       int i;
@@ -725,7 +726,9 @@  return_regno_p (unsigned int regno)
 	{
 	  rtx x = XEXP (XVECEXP (outgoing, 0, i), 0);
 
-	  if (REG_P (x) && REGNO (x) == regno)
+	  if (REG_P (x)
+	      && REGNO (x) == REGNO (reg)
+	      && GET_MODE (x) == GET_MODE (reg))
 	    return true;
 	}
     }
@@ -821,7 +824,7 @@  lra_final_code_change (void)
 	  if (NONJUMP_INSN_P (insn) && GET_CODE (pat) == SET
 	      && REG_P (SET_SRC (pat)) && REG_P (SET_DEST (pat))
 	      && REGNO (SET_SRC (pat)) == REGNO (SET_DEST (pat))
-	      && (! return_regno_p (REGNO (SET_SRC (pat)))
+	      && (! return_reg_p (SET_SRC (pat))
 		  || ! regno_in_use_p (insn, REGNO (SET_SRC (pat)))))
 	    {
 	      lra_invalidate_insn_data (insn);