[v6,1/8] Improve must tail in RTL backend

Message ID	20240521143203.2893096-2-ak@linux.intel.com
State	New
Headers	show Return-Path: <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org> DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org CD03E3858D33 From: Andi Kleen <ak@linux.intel.com> To: gcc-patches@gcc.gnu.org Cc: richard.guenther@gmail.com, nathan@acm.org, josmyers@redhat.com, richard.sandiford@arm.com, jason@redhat.com, Andi Kleen <ak@linux.intel.com> Subject: [PATCH v6 1/8] Improve must tail in RTL backend Date: Tue, 21 May 2024 07:28:34 -0700 Message-ID: <20240521143203.2893096-2-ak@linux.intel.com> In-Reply-To: <20240521143203.2893096-1-ak@linux.intel.com> References: <20240521143203.2893096-1-ak@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: list Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org
Series	[v6,1/8] Improve must tail in RTL backend \| expand [v6,1/8] Improve must tail in RTL backend [v6,2/8] Add a musttail generic attribute to the c-attribs table [v6,3/8] C++: Support clang compatible [[musttail]] (PR83324) [v6,4/8] C: Implement musttail attribute for returns [v6,5/8] Add tests for C/C++ musttail attributes [v6,6/8] Enable musttail tail conversion even when not optimizing [v6,7/8] Give better error messages for musttail [v6,8/8] Add documentation for musttail attribute

Andi Kleen May 21, 2024, 2:28 p.m. UTC

- Give error messages for all causes of non sibling call generation
- When giving error messages clear the musttail flag to avoid ICEs
- Error out when tree-tailcall failed to mark a must-tail call
sibcall. In this case it doesn't know the true reason and only gives
a vague message.

	PR83324

gcc/ChangeLog:

	* calls.cc (expand_call): Fix mustcall implementation.
	(maybe_complain_about_tail_call): Clear must tail flag on error.
---
 gcc/calls.cc | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

Michael Matz May 29, 2024, 1:39 p.m. UTC | #1

On Tue, 21 May 2024, Andi Kleen wrote:

> - Give error messages for all causes of non sibling call generation
> - When giving error messages clear the musttail flag to avoid ICEs
> - Error out when tree-tailcall failed to mark a must-tail call
> sibcall. In this case it doesn't know the true reason and only gives
> a vague message.

Sorry for jumping in late, Richi triggered me :)  But some general 
remarks:

I think the ultimate knowledge if a call can or cannot be implemented as 
tail-call lies within calls.cc/expand_call: It is inherently 
target and ABI specific how arguments and returns are layed out, how the 
stack frame is generated, if arguments are or aren't removed by callers 
or callees and so on; all of that being knowledge that tree-tailcall 
doesn't have and doesn't want to have.  As such tree-tailcall should 
not be regarded as ultimate truth, and failures of tree-tailcall to 
recognize something as tail-callable shouldn't matter.

It then follows that tree-tailcall needn't be run at -O0 merely for 
setting the flag.  Instead calls.cc simply should try expanding a 
tail-call when it sees the must-tail flag (as it right now would do), i.e. 
trust the user.  If that fails for some reasons then that means that the 
checks within calls.cc aren't complete enough (and that tree-tailcall 
papered over that problem).  That would be (IMHO) an independend bug to be 
solved.  But _when_ those bugs are fixed then what you merely need to do 
for the musttail attribute is to set that flag on the gimple_call, 
possibly make sure that nothing (tree-tailcall!) removes the flag, and be 
done.

(For avoidance of doubt: with tree-tailcall I mean the tree sibcall call 
pass, "tailc", not the tail-recursion pass).

IOW: I don't see why the tree pass needs to be run at -O0 for musttail.  
If something doesn't work currently then that points to other 
deficiencies.

Ciao,
Michael.

> 
> 	PR83324
> 
> gcc/ChangeLog:
> 
> 	* calls.cc (expand_call): Fix mustcall implementation.
> 	(maybe_complain_about_tail_call): Clear must tail flag on error.
> ---
>  gcc/calls.cc | 30 ++++++++++++++++++++++++------
>  1 file changed, 24 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/calls.cc b/gcc/calls.cc
> index 21d78f9779fe..161e36839654 100644
> --- a/gcc/calls.cc
> +++ b/gcc/calls.cc
> @@ -1249,6 +1249,7 @@ maybe_complain_about_tail_call (tree call_expr, const char *reason)
>      return;
>  
>    error_at (EXPR_LOCATION (call_expr), "cannot tail-call: %s", reason);
> +  CALL_EXPR_MUST_TAIL_CALL (call_expr) = 0;
>  }
>  
>  /* Fill in ARGS_SIZE and ARGS array based on the parameters found in
> @@ -2650,7 +2651,11 @@ expand_call (tree exp, rtx target, int ignore)
>    /* The type of the function being called.  */
>    tree fntype;
>    bool try_tail_call = CALL_EXPR_TAILCALL (exp);
> -  bool must_tail_call = CALL_EXPR_MUST_TAIL_CALL (exp);
> +  /* tree-tailcall decided not to do tail calls. Error for the musttail case,
> +     unfortunately we don't know the reason so it's fairly vague.
> +     When tree-tailcall reported an error it already cleared the flag.  */
> +  if (!try_tail_call)
> +      maybe_complain_about_tail_call (exp, "other reasons");
>    int pass;
>  
>    /* Register in which non-BLKmode value will be returned,
> @@ -3022,10 +3027,21 @@ expand_call (tree exp, rtx target, int ignore)
>       pushed these optimizations into -O2.  Don't try if we're already
>       expanding a call, as that means we're an argument.  Don't try if
>       there's cleanups, as we know there's code to follow the call.  */
> -  if (currently_expanding_call++ != 0
> -      || (!flag_optimize_sibling_calls && !CALL_FROM_THUNK_P (exp))
> -      || args_size.var
> -      || dbg_cnt (tail_call) == false)
> +  if (currently_expanding_call++ != 0)
> +    {
> +      maybe_complain_about_tail_call (exp, "inside another call");
> +      try_tail_call = 0;
> +    }
> +  if (!flag_optimize_sibling_calls
> +	&& !CALL_FROM_THUNK_P (exp)
> +	&& !CALL_EXPR_MUST_TAIL_CALL (exp))
> +    try_tail_call = 0;
> +  if (args_size.var)
> +    {
> +      maybe_complain_about_tail_call (exp, "variable size arguments");
> +      try_tail_call = 0;
> +    }
> +  if (dbg_cnt (tail_call) == false)
>      try_tail_call = 0;
>  
>    /* Workaround buggy C/C++ wrappers around Fortran routines with
> @@ -3046,13 +3062,15 @@ expand_call (tree exp, rtx target, int ignore)
>  	    if (MEM_P (*iter))
>  	      {
>  		try_tail_call = 0;
> +		maybe_complain_about_tail_call (exp,
> +				"hidden string length argument passed on stack");
>  		break;
>  	      }
>  	}
>  
>    /* If the user has marked the function as requiring tail-call
>       optimization, attempt it.  */
> -  if (must_tail_call)
> +  if (CALL_EXPR_MUST_TAIL_CALL (exp))
>      try_tail_call = 1;
>  
>    /*  Rest of purposes for tail call optimizations to fail.  */
>

Andi Kleen May 31, 2024, 6 p.m. UTC | #2

> I think the ultimate knowledge if a call can or cannot be implemented as 
> tail-call lies within calls.cc/expand_call: It is inherently 
> target and ABI specific how arguments and returns are layed out, how the 
> stack frame is generated, if arguments are or aren't removed by callers 
> or callees and so on; all of that being knowledge that tree-tailcall 
> doesn't have and doesn't want to have.  As such tree-tailcall should 
> not be regarded as ultimate truth, and failures of tree-tailcall to 
> recognize something as tail-callable shouldn't matter.

It's not the ultimate truth, but some of the checks it does are not
duplicated at expand time nor the backend. So it's one necessary pre condition
with the current code base.

Yes maybe the checks could be all moved, but that's a much larger
project.

-Andi

Michael Matz June 3, 2024, 5:02 p.m. UTC | #3

Hello,

On Fri, 31 May 2024, Andi Kleen wrote:

> > I think the ultimate knowledge if a call can or cannot be implemented as 
> > tail-call lies within calls.cc/expand_call: It is inherently 
> > target and ABI specific how arguments and returns are layed out, how the 
> > stack frame is generated, if arguments are or aren't removed by callers 
> > or callees and so on; all of that being knowledge that tree-tailcall 
> > doesn't have and doesn't want to have.  As such tree-tailcall should 
> > not be regarded as ultimate truth, and failures of tree-tailcall to 
> > recognize something as tail-callable shouldn't matter.
> 
> It's not the ultimate truth, but some of the checks it does are not 
> duplicated at expand time nor the backend. So it's one necessary pre 
> condition with the current code base.
> 
> Yes maybe the checks could be all moved, but that's a much larger 
> project.

Hmm.  I count six tests in about 25 lines of code in 
tree-tailcall.cc:suitable_for_tail_opt_p and suitable_for_tail_call_opt_p.

Are you perhaps worrying about the sibcall discovery itself (i.e. much of 
find_tail_calls)?  Why would that be needed for musttail?  Is that 
attribute sometimes applied to calls that aren't in fact sibcall-able?

One thing I'm worried about is the need for a new sibcall pass at O0 just 
for sibcall discovery.  find_tail_calls isn't cheap, because it computes 
live local variables for the whole function, potentially being quadratic.

Ciao,
Michael.

Jakub Jelinek June 3, 2024, 5:17 p.m. UTC | #4

On Mon, Jun 03, 2024 at 07:02:00PM +0200, Michael Matz wrote:
> Hello,
> 
> On Fri, 31 May 2024, Andi Kleen wrote:
> 
> > > I think the ultimate knowledge if a call can or cannot be implemented as 
> > > tail-call lies within calls.cc/expand_call: It is inherently 
> > > target and ABI specific how arguments and returns are layed out, how the 
> > > stack frame is generated, if arguments are or aren't removed by callers 
> > > or callees and so on; all of that being knowledge that tree-tailcall 
> > > doesn't have and doesn't want to have.  As such tree-tailcall should 
> > > not be regarded as ultimate truth, and failures of tree-tailcall to 
> > > recognize something as tail-callable shouldn't matter.
> > 
> > It's not the ultimate truth, but some of the checks it does are not 
> > duplicated at expand time nor the backend. So it's one necessary pre 
> > condition with the current code base.
> > 
> > Yes maybe the checks could be all moved, but that's a much larger 
> > project.
> 
> Hmm.  I count six tests in about 25 lines of code in 
> tree-tailcall.cc:suitable_for_tail_opt_p and suitable_for_tail_call_opt_p.
> 
> Are you perhaps worrying about the sibcall discovery itself (i.e. much of 
> find_tail_calls)?  Why would that be needed for musttail?  Is that 
> attribute sometimes applied to calls that aren't in fact sibcall-able?
> 
> One thing I'm worried about is the need for a new sibcall pass at O0 just 
> for sibcall discovery.  find_tail_calls isn't cheap, because it computes 
> live local variables for the whole function, potentially being quadratic.

But the pass could be done only if there is at least one musttail call
in a function (remembered in some cfun flag).  If people use that attribute,
guess they are willing to pay for it.

	Jakub

Andi Kleen June 3, 2024, 5:31 p.m. UTC | #5

> > Yes maybe the checks could be all moved, but that's a much larger 
> > project.
> 
> Hmm.  I count six tests in about 25 lines of code in 
> tree-tailcall.cc:suitable_for_tail_opt_p and suitable_for_tail_call_opt_p.

There are more checks in find_tail_calls. The logic is fairly spread
out. Some of it is needed to determine if it is valid.

> 
> Are you perhaps worrying about the sibcall discovery itself (i.e. much of 
> find_tail_calls)? Why would that be needed for musttail?  Is that 
> attribute sometimes applied to calls that aren't in fact sibcall-able?

The rules the compilers use for this are hard to understand for
programmers.  So that's the whole point of the attribute. If they miss some subtle 
requirement they get a compile time error instead of a stack overflow at
runtime.

So yes it has to do all the checks.

> 
> One thing I'm worried about is the need for a new sibcall pass at O0 just 
> for sibcall discovery.  find_tail_calls isn't cheap, because it computes 
> live local variables for the whole function, potentially being quadratic.

The live local variables computation is only done when there are actual
suitable tail calls. And the new -O0 variant only does it for musttail, nothing
else. So by default it is just a BB backwards walk until it sees a BB with 
enough edges to give up.


-Andi

Michael Matz June 4, 2024, 1:49 p.m. UTC | #6

Hello,

On Mon, 3 Jun 2024, Jakub Jelinek wrote:

> > Hmm.  I count six tests in about 25 lines of code in 
> > tree-tailcall.cc:suitable_for_tail_opt_p and suitable_for_tail_call_opt_p.
> > 
> > Are you perhaps worrying about the sibcall discovery itself (i.e. much of 
> > find_tail_calls)?  Why would that be needed for musttail?  Is that 
> > attribute sometimes applied to calls that aren't in fact sibcall-able?
> > 
> > One thing I'm worried about is the need for a new sibcall pass at O0 just 
> > for sibcall discovery.  find_tail_calls isn't cheap, because it computes 
> > live local variables for the whole function, potentially being quadratic.
> 
> But the pass could be done only if there is at least one musttail call 
> in a function (remembered in some cfun flag).  If people use that 
> attribute, guess they are willing to pay for it.

Yeah, but I think the way the current proposal is doing it is mostly 
equivalent and fine enough, as Andi mentioned (in my worry I haven't 
considered that overall the backward walk stops fairly soon and then only 
does something when a musttail is there).

I still think that the tree pass being necessary for correctness is bad 
design, in the grand scheme of things, especially for those tests that are 
done for the call statement in isolation (i.e. tests about arguments like 
address-taken and suchlike, and return value, flags on the callee, and 
facts about the current function).  Those should all move to calls.cc or 
cfgexpand IMHO.

But I will yield on the discovery part that tree-tailcall is doing (i.e. 
those pieces that need to look at multiple statements, e.g. how the call 
result is used later); those are a bit harder to do in expand and how it's 
structured, and without getting rid of that part in tree-tailcall we have 
to run it at O0 anyway for musttail.  And moving only parts of the tests 
to calls.cc doesn't seem so worthwhile to hold up the patch.

So, I have no objections on the patch design anymore.

Ciao,
Michael.

[v6,1/8] Improve must tail in RTL backend

Commit Message

Comments

Patch