Message ID | 20240521143203.2893096-2-ak@linux.intel.com |
---|---|
State | New |
Headers | show |
Series | [v6,1/8] Improve must tail in RTL backend | expand |
On Tue, 21 May 2024, Andi Kleen wrote: > - Give error messages for all causes of non sibling call generation > - When giving error messages clear the musttail flag to avoid ICEs > - Error out when tree-tailcall failed to mark a must-tail call > sibcall. In this case it doesn't know the true reason and only gives > a vague message. Sorry for jumping in late, Richi triggered me :) But some general remarks: I think the ultimate knowledge if a call can or cannot be implemented as tail-call lies within calls.cc/expand_call: It is inherently target and ABI specific how arguments and returns are layed out, how the stack frame is generated, if arguments are or aren't removed by callers or callees and so on; all of that being knowledge that tree-tailcall doesn't have and doesn't want to have. As such tree-tailcall should not be regarded as ultimate truth, and failures of tree-tailcall to recognize something as tail-callable shouldn't matter. It then follows that tree-tailcall needn't be run at -O0 merely for setting the flag. Instead calls.cc simply should try expanding a tail-call when it sees the must-tail flag (as it right now would do), i.e. trust the user. If that fails for some reasons then that means that the checks within calls.cc aren't complete enough (and that tree-tailcall papered over that problem). That would be (IMHO) an independend bug to be solved. But _when_ those bugs are fixed then what you merely need to do for the musttail attribute is to set that flag on the gimple_call, possibly make sure that nothing (tree-tailcall!) removes the flag, and be done. (For avoidance of doubt: with tree-tailcall I mean the tree sibcall call pass, "tailc", not the tail-recursion pass). IOW: I don't see why the tree pass needs to be run at -O0 for musttail. If something doesn't work currently then that points to other deficiencies. Ciao, Michael. > > PR83324 > > gcc/ChangeLog: > > * calls.cc (expand_call): Fix mustcall implementation. > (maybe_complain_about_tail_call): Clear must tail flag on error. > --- > gcc/calls.cc | 30 ++++++++++++++++++++++++------ > 1 file changed, 24 insertions(+), 6 deletions(-) > > diff --git a/gcc/calls.cc b/gcc/calls.cc > index 21d78f9779fe..161e36839654 100644 > --- a/gcc/calls.cc > +++ b/gcc/calls.cc > @@ -1249,6 +1249,7 @@ maybe_complain_about_tail_call (tree call_expr, const char *reason) > return; > > error_at (EXPR_LOCATION (call_expr), "cannot tail-call: %s", reason); > + CALL_EXPR_MUST_TAIL_CALL (call_expr) = 0; > } > > /* Fill in ARGS_SIZE and ARGS array based on the parameters found in > @@ -2650,7 +2651,11 @@ expand_call (tree exp, rtx target, int ignore) > /* The type of the function being called. */ > tree fntype; > bool try_tail_call = CALL_EXPR_TAILCALL (exp); > - bool must_tail_call = CALL_EXPR_MUST_TAIL_CALL (exp); > + /* tree-tailcall decided not to do tail calls. Error for the musttail case, > + unfortunately we don't know the reason so it's fairly vague. > + When tree-tailcall reported an error it already cleared the flag. */ > + if (!try_tail_call) > + maybe_complain_about_tail_call (exp, "other reasons"); > int pass; > > /* Register in which non-BLKmode value will be returned, > @@ -3022,10 +3027,21 @@ expand_call (tree exp, rtx target, int ignore) > pushed these optimizations into -O2. Don't try if we're already > expanding a call, as that means we're an argument. Don't try if > there's cleanups, as we know there's code to follow the call. */ > - if (currently_expanding_call++ != 0 > - || (!flag_optimize_sibling_calls && !CALL_FROM_THUNK_P (exp)) > - || args_size.var > - || dbg_cnt (tail_call) == false) > + if (currently_expanding_call++ != 0) > + { > + maybe_complain_about_tail_call (exp, "inside another call"); > + try_tail_call = 0; > + } > + if (!flag_optimize_sibling_calls > + && !CALL_FROM_THUNK_P (exp) > + && !CALL_EXPR_MUST_TAIL_CALL (exp)) > + try_tail_call = 0; > + if (args_size.var) > + { > + maybe_complain_about_tail_call (exp, "variable size arguments"); > + try_tail_call = 0; > + } > + if (dbg_cnt (tail_call) == false) > try_tail_call = 0; > > /* Workaround buggy C/C++ wrappers around Fortran routines with > @@ -3046,13 +3062,15 @@ expand_call (tree exp, rtx target, int ignore) > if (MEM_P (*iter)) > { > try_tail_call = 0; > + maybe_complain_about_tail_call (exp, > + "hidden string length argument passed on stack"); > break; > } > } > > /* If the user has marked the function as requiring tail-call > optimization, attempt it. */ > - if (must_tail_call) > + if (CALL_EXPR_MUST_TAIL_CALL (exp)) > try_tail_call = 1; > > /* Rest of purposes for tail call optimizations to fail. */ >
> I think the ultimate knowledge if a call can or cannot be implemented as > tail-call lies within calls.cc/expand_call: It is inherently > target and ABI specific how arguments and returns are layed out, how the > stack frame is generated, if arguments are or aren't removed by callers > or callees and so on; all of that being knowledge that tree-tailcall > doesn't have and doesn't want to have. As such tree-tailcall should > not be regarded as ultimate truth, and failures of tree-tailcall to > recognize something as tail-callable shouldn't matter. It's not the ultimate truth, but some of the checks it does are not duplicated at expand time nor the backend. So it's one necessary pre condition with the current code base. Yes maybe the checks could be all moved, but that's a much larger project. -Andi
Hello, On Fri, 31 May 2024, Andi Kleen wrote: > > I think the ultimate knowledge if a call can or cannot be implemented as > > tail-call lies within calls.cc/expand_call: It is inherently > > target and ABI specific how arguments and returns are layed out, how the > > stack frame is generated, if arguments are or aren't removed by callers > > or callees and so on; all of that being knowledge that tree-tailcall > > doesn't have and doesn't want to have. As such tree-tailcall should > > not be regarded as ultimate truth, and failures of tree-tailcall to > > recognize something as tail-callable shouldn't matter. > > It's not the ultimate truth, but some of the checks it does are not > duplicated at expand time nor the backend. So it's one necessary pre > condition with the current code base. > > Yes maybe the checks could be all moved, but that's a much larger > project. Hmm. I count six tests in about 25 lines of code in tree-tailcall.cc:suitable_for_tail_opt_p and suitable_for_tail_call_opt_p. Are you perhaps worrying about the sibcall discovery itself (i.e. much of find_tail_calls)? Why would that be needed for musttail? Is that attribute sometimes applied to calls that aren't in fact sibcall-able? One thing I'm worried about is the need for a new sibcall pass at O0 just for sibcall discovery. find_tail_calls isn't cheap, because it computes live local variables for the whole function, potentially being quadratic. Ciao, Michael.
On Mon, Jun 03, 2024 at 07:02:00PM +0200, Michael Matz wrote: > Hello, > > On Fri, 31 May 2024, Andi Kleen wrote: > > > > I think the ultimate knowledge if a call can or cannot be implemented as > > > tail-call lies within calls.cc/expand_call: It is inherently > > > target and ABI specific how arguments and returns are layed out, how the > > > stack frame is generated, if arguments are or aren't removed by callers > > > or callees and so on; all of that being knowledge that tree-tailcall > > > doesn't have and doesn't want to have. As such tree-tailcall should > > > not be regarded as ultimate truth, and failures of tree-tailcall to > > > recognize something as tail-callable shouldn't matter. > > > > It's not the ultimate truth, but some of the checks it does are not > > duplicated at expand time nor the backend. So it's one necessary pre > > condition with the current code base. > > > > Yes maybe the checks could be all moved, but that's a much larger > > project. > > Hmm. I count six tests in about 25 lines of code in > tree-tailcall.cc:suitable_for_tail_opt_p and suitable_for_tail_call_opt_p. > > Are you perhaps worrying about the sibcall discovery itself (i.e. much of > find_tail_calls)? Why would that be needed for musttail? Is that > attribute sometimes applied to calls that aren't in fact sibcall-able? > > One thing I'm worried about is the need for a new sibcall pass at O0 just > for sibcall discovery. find_tail_calls isn't cheap, because it computes > live local variables for the whole function, potentially being quadratic. But the pass could be done only if there is at least one musttail call in a function (remembered in some cfun flag). If people use that attribute, guess they are willing to pay for it. Jakub
> > Yes maybe the checks could be all moved, but that's a much larger > > project. > > Hmm. I count six tests in about 25 lines of code in > tree-tailcall.cc:suitable_for_tail_opt_p and suitable_for_tail_call_opt_p. There are more checks in find_tail_calls. The logic is fairly spread out. Some of it is needed to determine if it is valid. > > Are you perhaps worrying about the sibcall discovery itself (i.e. much of > find_tail_calls)? Why would that be needed for musttail? Is that > attribute sometimes applied to calls that aren't in fact sibcall-able? The rules the compilers use for this are hard to understand for programmers. So that's the whole point of the attribute. If they miss some subtle requirement they get a compile time error instead of a stack overflow at runtime. So yes it has to do all the checks. > > One thing I'm worried about is the need for a new sibcall pass at O0 just > for sibcall discovery. find_tail_calls isn't cheap, because it computes > live local variables for the whole function, potentially being quadratic. The live local variables computation is only done when there are actual suitable tail calls. And the new -O0 variant only does it for musttail, nothing else. So by default it is just a BB backwards walk until it sees a BB with enough edges to give up. -Andi
Hello, On Mon, 3 Jun 2024, Jakub Jelinek wrote: > > Hmm. I count six tests in about 25 lines of code in > > tree-tailcall.cc:suitable_for_tail_opt_p and suitable_for_tail_call_opt_p. > > > > Are you perhaps worrying about the sibcall discovery itself (i.e. much of > > find_tail_calls)? Why would that be needed for musttail? Is that > > attribute sometimes applied to calls that aren't in fact sibcall-able? > > > > One thing I'm worried about is the need for a new sibcall pass at O0 just > > for sibcall discovery. find_tail_calls isn't cheap, because it computes > > live local variables for the whole function, potentially being quadratic. > > But the pass could be done only if there is at least one musttail call > in a function (remembered in some cfun flag). If people use that > attribute, guess they are willing to pay for it. Yeah, but I think the way the current proposal is doing it is mostly equivalent and fine enough, as Andi mentioned (in my worry I haven't considered that overall the backward walk stops fairly soon and then only does something when a musttail is there). I still think that the tree pass being necessary for correctness is bad design, in the grand scheme of things, especially for those tests that are done for the call statement in isolation (i.e. tests about arguments like address-taken and suchlike, and return value, flags on the callee, and facts about the current function). Those should all move to calls.cc or cfgexpand IMHO. But I will yield on the discovery part that tree-tailcall is doing (i.e. those pieces that need to look at multiple statements, e.g. how the call result is used later); those are a bit harder to do in expand and how it's structured, and without getting rid of that part in tree-tailcall we have to run it at O0 anyway for musttail. And moving only parts of the tests to calls.cc doesn't seem so worthwhile to hold up the patch. So, I have no objections on the patch design anymore. Ciao, Michael.
diff --git a/gcc/calls.cc b/gcc/calls.cc index 21d78f9779fe..161e36839654 100644 --- a/gcc/calls.cc +++ b/gcc/calls.cc @@ -1249,6 +1249,7 @@ maybe_complain_about_tail_call (tree call_expr, const char *reason) return; error_at (EXPR_LOCATION (call_expr), "cannot tail-call: %s", reason); + CALL_EXPR_MUST_TAIL_CALL (call_expr) = 0; } /* Fill in ARGS_SIZE and ARGS array based on the parameters found in @@ -2650,7 +2651,11 @@ expand_call (tree exp, rtx target, int ignore) /* The type of the function being called. */ tree fntype; bool try_tail_call = CALL_EXPR_TAILCALL (exp); - bool must_tail_call = CALL_EXPR_MUST_TAIL_CALL (exp); + /* tree-tailcall decided not to do tail calls. Error for the musttail case, + unfortunately we don't know the reason so it's fairly vague. + When tree-tailcall reported an error it already cleared the flag. */ + if (!try_tail_call) + maybe_complain_about_tail_call (exp, "other reasons"); int pass; /* Register in which non-BLKmode value will be returned, @@ -3022,10 +3027,21 @@ expand_call (tree exp, rtx target, int ignore) pushed these optimizations into -O2. Don't try if we're already expanding a call, as that means we're an argument. Don't try if there's cleanups, as we know there's code to follow the call. */ - if (currently_expanding_call++ != 0 - || (!flag_optimize_sibling_calls && !CALL_FROM_THUNK_P (exp)) - || args_size.var - || dbg_cnt (tail_call) == false) + if (currently_expanding_call++ != 0) + { + maybe_complain_about_tail_call (exp, "inside another call"); + try_tail_call = 0; + } + if (!flag_optimize_sibling_calls + && !CALL_FROM_THUNK_P (exp) + && !CALL_EXPR_MUST_TAIL_CALL (exp)) + try_tail_call = 0; + if (args_size.var) + { + maybe_complain_about_tail_call (exp, "variable size arguments"); + try_tail_call = 0; + } + if (dbg_cnt (tail_call) == false) try_tail_call = 0; /* Workaround buggy C/C++ wrappers around Fortran routines with @@ -3046,13 +3062,15 @@ expand_call (tree exp, rtx target, int ignore) if (MEM_P (*iter)) { try_tail_call = 0; + maybe_complain_about_tail_call (exp, + "hidden string length argument passed on stack"); break; } } /* If the user has marked the function as requiring tail-call optimization, attempt it. */ - if (must_tail_call) + if (CALL_EXPR_MUST_TAIL_CALL (exp)) try_tail_call = 1; /* Rest of purposes for tail call optimizations to fail. */