[v5,1/5] Improve must tail in RTL backend

Message ID	20240505181458.2903045-1-ak@linux.intel.com
State	New
Headers	show Return-Path: <gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org> DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 1C1E23858D28 From: Andi Kleen <ak@linux.intel.com> To: gcc-patches@gcc.gnu.org Cc: nathan@acm.org, josmyers@redhat.com, richard.sandiford@arm.com, jason@redhat.com, Andi Kleen <ak@linux.intel.com> Subject: [PATCH v5 1/5] Improve must tail in RTL backend Date: Sun, 5 May 2024 11:14:54 -0700 Message-ID: <20240505181458.2903045-1-ak@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: list Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org
Series	[v5,1/5] Improve must tail in RTL backend \| expand [v5,1/5] Improve must tail in RTL backend [v5,2/5] C++: Support clang compatible [[musttail]] (PR83324) [v5,3/5] C: Implement musttail attribute for returns [v5,4/5] Add tests for C/C++ musttail attributes [v5,5/5] Add documentation for musttail attribute

Andi Kleen May 5, 2024, 6:14 p.m. UTC

- Give error messages for all causes of non sibling call generation
- Don't override choices of other non sibling call checks with
must tail. This causes ICEs. The must tail attribute now only
overrides flag_optimize_sibling_calls locally.
- Error out when tree-tailcall failed to mark a must-tail call
sibcall. In this case it doesn't know the true reason and only gives
a vague message (this could be improved, but it's already useful without
that) tree-tailcall usually fails without optimization, so must
adjust the existing must-tail plugin test to specify -O2.

	PR83324

gcc/ChangeLog:

	* calls.cc (expand_call): Fix mustcall implementation.

gcc/testsuite/ChangeLog:

	* gcc.dg/plugin/must-tail-call-1.c: Adjust.
---
 gcc/calls.cc                                  | 30 ++++++++++++-------
 .../gcc.dg/plugin/must-tail-call-1.c          |  1 +
 2 files changed, 21 insertions(+), 10 deletions(-)

Richard Biener May 14, 2024, 2:15 p.m. UTC | #1

On Sun, May 5, 2024 at 8:16 PM Andi Kleen <ak@linux.intel.com> wrote:
>
> - Give error messages for all causes of non sibling call generation
> - Don't override choices of other non sibling call checks with
> must tail. This causes ICEs. The must tail attribute now only
> overrides flag_optimize_sibling_calls locally.
> - Error out when tree-tailcall failed to mark a must-tail call
> sibcall. In this case it doesn't know the true reason and only gives
> a vague message (this could be improved, but it's already useful without
> that) tree-tailcall usually fails without optimization, so must
> adjust the existing must-tail plugin test to specify -O2.
>
>         PR83324
>
> gcc/ChangeLog:
>
>         * calls.cc (expand_call): Fix mustcall implementation.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/plugin/must-tail-call-1.c: Adjust.
> ---
>  gcc/calls.cc                                  | 30 ++++++++++++-------
>  .../gcc.dg/plugin/must-tail-call-1.c          |  1 +
>  2 files changed, 21 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/calls.cc b/gcc/calls.cc
> index 21d78f9779fe..a6b8ee44cc29 100644
> --- a/gcc/calls.cc
> +++ b/gcc/calls.cc
> @@ -2650,7 +2650,9 @@ expand_call (tree exp, rtx target, int ignore)
>    /* The type of the function being called.  */
>    tree fntype;
>    bool try_tail_call = CALL_EXPR_TAILCALL (exp);
> -  bool must_tail_call = CALL_EXPR_MUST_TAIL_CALL (exp);
> +  /* tree-tailcall decided not to do tail calls. Error for the musttail case.  */
> +  if (!try_tail_call)
> +      maybe_complain_about_tail_call (exp, "other reasons");
>    int pass;
>
>    /* Register in which non-BLKmode value will be returned,
> @@ -3022,10 +3024,22 @@ expand_call (tree exp, rtx target, int ignore)
>       pushed these optimizations into -O2.  Don't try if we're already
>       expanding a call, as that means we're an argument.  Don't try if
>       there's cleanups, as we know there's code to follow the call.  */
> -  if (currently_expanding_call++ != 0
> -      || (!flag_optimize_sibling_calls && !CALL_FROM_THUNK_P (exp))
> -      || args_size.var
> -      || dbg_cnt (tail_call) == false)
> +  if (currently_expanding_call++ != 0)
> +    {
> +      maybe_complain_about_tail_call (exp, "inside another call");
> +      try_tail_call = 0;
> +    }
> +  if (!flag_optimize_sibling_calls
> +       && !CALL_FROM_THUNK_P (exp)
> +       && !CALL_EXPR_MUST_TAIL_CALL (exp))
> +    try_tail_call = 0;
> +  if (args_size.var)

If we are both inside another call and run into this we give two errors,
but I guess that's OK ...

> +    {
> +      /* ??? correct message?  */
> +      maybe_complain_about_tail_call (exp, "stack space needed");

args_size.var != NULL_TREE means the argument size is not constant.
I'm quite sure this is an overly conservative check.

> +      try_tail_call = 0;
> +    }
> +  if (dbg_cnt (tail_call) == false)
>      try_tail_call = 0;
>
>    /* Workaround buggy C/C++ wrappers around Fortran routines with
> @@ -3046,15 +3060,11 @@ expand_call (tree exp, rtx target, int ignore)
>             if (MEM_P (*iter))
>               {
>                 try_tail_call = 0;
> +               maybe_complain_about_tail_call (exp, "hidden string length argument");

"hidden string length argument passed on stack"

from what I read the code.

>                 break;
>               }
>         }
>
> -  /* If the user has marked the function as requiring tail-call
> -     optimization, attempt it.  */
> -  if (must_tail_call)
> -    try_tail_call = 1;
> -
>    /*  Rest of purposes for tail call optimizations to fail.  */
>    if (try_tail_call)
>      try_tail_call = can_implement_as_sibling_call_p (exp,
> diff --git a/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c b/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c
> index 3a6d4cceaba7..44af361e2925 100644
> --- a/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c
> +++ b/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c
> @@ -1,4 +1,5 @@
>  /* { dg-do compile { target tail_call } } */
> +/* { dg-options "-O2" } */

So I think this is unfortunate - I think when there's a must-tail attribute
we should either run the tailcall pass to check the call even at -O0 or
trust the user with correctness  (hoping no optimization interfered with
the ability to tail-call).

What were the ICEs you ran into?

I would guess it's for example problematic to duplicate must-tail calls?

Thanks,
Richard.

>  /* { dg-options "-fdelayed-branch" { target sparc*-*-* } } */
>
>  extern void abort (void);
> --
> 2.44.0
>

Andi Kleen May 14, 2024, 5:19 p.m. UTC | #2

> > diff --git a/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c b/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c
> > index 3a6d4cceaba7..44af361e2925 100644
> > --- a/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c
> > +++ b/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c
> > @@ -1,4 +1,5 @@
> >  /* { dg-do compile { target tail_call } } */
> > +/* { dg-options "-O2" } */
> 
> So I think this is unfortunate - I think when there's a must-tail attribute
> we should either run the tailcall pass to check the call even at -O0 or
> trust the user with correctness  (hoping no optimization interfered with
> the ability to tail-call).
> 
> What were the ICEs you ran into?

I just tried and I don't see ICEs with the current test cases anymore.
Unfortunately I didn't save the earlier ones and I'm not fully sure
what test case i used then. I'll undo that change and watch it.

Maybe together with moving the tree pass that's enough to fix -O0.

-Andi

Andi Kleen May 20, 2024, 4:53 a.m. UTC | #3

On Tue, May 14, 2024 at 04:15:08PM +0200, Richard Biener wrote:
> On Sun, May 5, 2024 at 8:16 PM Andi Kleen <ak@linux.intel.com> wrote:
> >
> > - Give error messages for all causes of non sibling call generation
> > - Don't override choices of other non sibling call checks with
> > must tail. This causes ICEs. The must tail attribute now only
> > overrides flag_optimize_sibling_calls locally.
> > - Error out when tree-tailcall failed to mark a must-tail call
> > sibcall. In this case it doesn't know the true reason and only gives
> > a vague message (this could be improved, but it's already useful without
> > that) tree-tailcall usually fails without optimization, so must
> > adjust the existing must-tail plugin test to specify -O2.
> >
> >         PR83324
> >
> > gcc/ChangeLog:
> >
> >         * calls.cc (expand_call): Fix mustcall implementation.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.dg/plugin/must-tail-call-1.c: Adjust.
> > ---
> >  gcc/calls.cc                                  | 30 ++++++++++++-------
> >  .../gcc.dg/plugin/must-tail-call-1.c          |  1 +
> >  2 files changed, 21 insertions(+), 10 deletions(-)
> >
> > diff --git a/gcc/calls.cc b/gcc/calls.cc
> > index 21d78f9779fe..a6b8ee44cc29 100644
> > --- a/gcc/calls.cc
> > +++ b/gcc/calls.cc
> > @@ -2650,7 +2650,9 @@ expand_call (tree exp, rtx target, int ignore)
> >    /* The type of the function being called.  */
> >    tree fntype;
> >    bool try_tail_call = CALL_EXPR_TAILCALL (exp);
> > -  bool must_tail_call = CALL_EXPR_MUST_TAIL_CALL (exp);
> > +  /* tree-tailcall decided not to do tail calls. Error for the musttail case.  */
> > +  if (!try_tail_call)
> > +      maybe_complain_about_tail_call (exp, "other reasons");
> >    int pass;
> >
> >    /* Register in which non-BLKmode value will be returned,
> > @@ -3022,10 +3024,22 @@ expand_call (tree exp, rtx target, int ignore)
> >       pushed these optimizations into -O2.  Don't try if we're already
> >       expanding a call, as that means we're an argument.  Don't try if
> >       there's cleanups, as we know there's code to follow the call.  */
> > -  if (currently_expanding_call++ != 0
> > -      || (!flag_optimize_sibling_calls && !CALL_FROM_THUNK_P (exp))
> > -      || args_size.var
> > -      || dbg_cnt (tail_call) == false)
> > +  if (currently_expanding_call++ != 0)
> > +    {
> > +      maybe_complain_about_tail_call (exp, "inside another call");
> > +      try_tail_call = 0;
> > +    }
> > +  if (!flag_optimize_sibling_calls
> > +       && !CALL_FROM_THUNK_P (exp)
> > +       && !CALL_EXPR_MUST_TAIL_CALL (exp))
> > +    try_tail_call = 0;
> > +  if (args_size.var)
> 
> If we are both inside another call and run into this we give two errors,
> but I guess that's OK ...
> 
> > +    {
> > +      /* ??? correct message?  */
> > +      maybe_complain_about_tail_call (exp, "stack space needed");
> 
> args_size.var != NULL_TREE means the argument size is not constant.
> I'm quite sure this is an overly conservative check.
> 
> > +      try_tail_call = 0;
> > +    }
> > +  if (dbg_cnt (tail_call) == false)
> >      try_tail_call = 0;
> >
> >    /* Workaround buggy C/C++ wrappers around Fortran routines with
> > @@ -3046,15 +3060,11 @@ expand_call (tree exp, rtx target, int ignore)
> >             if (MEM_P (*iter))
> >               {
> >                 try_tail_call = 0;
> > +               maybe_complain_about_tail_call (exp, "hidden string length argument");
> 
> "hidden string length argument passed on stack"
> 
> from what I read the code.
> 
> >                 break;
> >               }
> >         }
> >
> > -  /* If the user has marked the function as requiring tail-call
> > -     optimization, attempt it.  */
> > -  if (must_tail_call)
> > -    try_tail_call = 1;
> > -
> >    /*  Rest of purposes for tail call optimizations to fail.  */
> >    if (try_tail_call)
> >      try_tail_call = can_implement_as_sibling_call_p (exp,
> > diff --git a/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c b/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c
> > index 3a6d4cceaba7..44af361e2925 100644
> > --- a/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c
> > +++ b/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c
> > @@ -1,4 +1,5 @@
> >  /* { dg-do compile { target tail_call } } */
> > +/* { dg-options "-O2" } */
> 
> So I think this is unfortunate - I think when there's a must-tail attribute
> we should either run the tailcall pass to check the call even at -O0 or
> trust the user with correctness  (hoping no optimization interfered with
> the ability to tail-call).
> 
> What were the ICEs you ran into?
> 
> I would guess it's for example problematic to duplicate must-tail calls?

I experimented more with this, the problem I have currently is that in
-O0 when returning a struct I get something like 

  D.2846 = caller3<Bar> (D.2836); [must tail call]

  <bb 3> :
  D.2836 ={v} {CLOBBER(eos)};
  return D.2846;

And this always fails this check in tree-tailcall:

      /* If the statement references memory or volatile operands, fail.  */
      if (gimple_references_memory_p (stmt)
          || gimple_has_volatile_ops (stmt))
        return;

In a optimized build this passes, but with -O0 it always fails
when the pass is placed before pass_optimizations_g. I assume
it's some problem with mem ssa form.

Any ideas how to fix that? Otherwise I can restrict musttail to non
structs.

-Andi

Richard Biener May 21, 2024, 8:31 a.m. UTC | #4

On Mon, May 20, 2024 at 6:53 AM Andi Kleen <ak@linux.intel.com> wrote:
>
> On Tue, May 14, 2024 at 04:15:08PM +0200, Richard Biener wrote:
> > On Sun, May 5, 2024 at 8:16 PM Andi Kleen <ak@linux.intel.com> wrote:
> > >
> > > - Give error messages for all causes of non sibling call generation
> > > - Don't override choices of other non sibling call checks with
> > > must tail. This causes ICEs. The must tail attribute now only
> > > overrides flag_optimize_sibling_calls locally.
> > > - Error out when tree-tailcall failed to mark a must-tail call
> > > sibcall. In this case it doesn't know the true reason and only gives
> > > a vague message (this could be improved, but it's already useful without
> > > that) tree-tailcall usually fails without optimization, so must
> > > adjust the existing must-tail plugin test to specify -O2.
> > >
> > >         PR83324
> > >
> > > gcc/ChangeLog:
> > >
> > >         * calls.cc (expand_call): Fix mustcall implementation.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >         * gcc.dg/plugin/must-tail-call-1.c: Adjust.
> > > ---
> > >  gcc/calls.cc                                  | 30 ++++++++++++-------
> > >  .../gcc.dg/plugin/must-tail-call-1.c          |  1 +
> > >  2 files changed, 21 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/gcc/calls.cc b/gcc/calls.cc
> > > index 21d78f9779fe..a6b8ee44cc29 100644
> > > --- a/gcc/calls.cc
> > > +++ b/gcc/calls.cc
> > > @@ -2650,7 +2650,9 @@ expand_call (tree exp, rtx target, int ignore)
> > >    /* The type of the function being called.  */
> > >    tree fntype;
> > >    bool try_tail_call = CALL_EXPR_TAILCALL (exp);
> > > -  bool must_tail_call = CALL_EXPR_MUST_TAIL_CALL (exp);
> > > +  /* tree-tailcall decided not to do tail calls. Error for the musttail case.  */
> > > +  if (!try_tail_call)
> > > +      maybe_complain_about_tail_call (exp, "other reasons");
> > >    int pass;
> > >
> > >    /* Register in which non-BLKmode value will be returned,
> > > @@ -3022,10 +3024,22 @@ expand_call (tree exp, rtx target, int ignore)
> > >       pushed these optimizations into -O2.  Don't try if we're already
> > >       expanding a call, as that means we're an argument.  Don't try if
> > >       there's cleanups, as we know there's code to follow the call.  */
> > > -  if (currently_expanding_call++ != 0
> > > -      || (!flag_optimize_sibling_calls && !CALL_FROM_THUNK_P (exp))
> > > -      || args_size.var
> > > -      || dbg_cnt (tail_call) == false)
> > > +  if (currently_expanding_call++ != 0)
> > > +    {
> > > +      maybe_complain_about_tail_call (exp, "inside another call");
> > > +      try_tail_call = 0;
> > > +    }
> > > +  if (!flag_optimize_sibling_calls
> > > +       && !CALL_FROM_THUNK_P (exp)
> > > +       && !CALL_EXPR_MUST_TAIL_CALL (exp))
> > > +    try_tail_call = 0;
> > > +  if (args_size.var)
> >
> > If we are both inside another call and run into this we give two errors,
> > but I guess that's OK ...
> >
> > > +    {
> > > +      /* ??? correct message?  */
> > > +      maybe_complain_about_tail_call (exp, "stack space needed");
> >
> > args_size.var != NULL_TREE means the argument size is not constant.
> > I'm quite sure this is an overly conservative check.
> >
> > > +      try_tail_call = 0;
> > > +    }
> > > +  if (dbg_cnt (tail_call) == false)
> > >      try_tail_call = 0;
> > >
> > >    /* Workaround buggy C/C++ wrappers around Fortran routines with
> > > @@ -3046,15 +3060,11 @@ expand_call (tree exp, rtx target, int ignore)
> > >             if (MEM_P (*iter))
> > >               {
> > >                 try_tail_call = 0;
> > > +               maybe_complain_about_tail_call (exp, "hidden string length argument");
> >
> > "hidden string length argument passed on stack"
> >
> > from what I read the code.
> >
> > >                 break;
> > >               }
> > >         }
> > >
> > > -  /* If the user has marked the function as requiring tail-call
> > > -     optimization, attempt it.  */
> > > -  if (must_tail_call)
> > > -    try_tail_call = 1;
> > > -
> > >    /*  Rest of purposes for tail call optimizations to fail.  */
> > >    if (try_tail_call)
> > >      try_tail_call = can_implement_as_sibling_call_p (exp,
> > > diff --git a/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c b/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c
> > > index 3a6d4cceaba7..44af361e2925 100644
> > > --- a/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c
> > > +++ b/gcc/testsuite/gcc.dg/plugin/must-tail-call-1.c
> > > @@ -1,4 +1,5 @@
> > >  /* { dg-do compile { target tail_call } } */
> > > +/* { dg-options "-O2" } */
> >
> > So I think this is unfortunate - I think when there's a must-tail attribute
> > we should either run the tailcall pass to check the call even at -O0 or
> > trust the user with correctness  (hoping no optimization interfered with
> > the ability to tail-call).
> >
> > What were the ICEs you ran into?
> >
> > I would guess it's for example problematic to duplicate must-tail calls?
>
> I experimented more with this, the problem I have currently is that in
> -O0 when returning a struct I get something like
>
>   D.2846 = caller3<Bar> (D.2836); [must tail call]
>
>   <bb 3> :
>   D.2836 ={v} {CLOBBER(eos)};
>   return D.2846;
>
> And this always fails this check in tree-tailcall:
>
>       /* If the statement references memory or volatile operands, fail.  */
>       if (gimple_references_memory_p (stmt)
>           || gimple_has_volatile_ops (stmt))
>         return;

I can't see how this triggers on the IL above, the loop should have
ignored both the return and the clobber and when recursing to
the predecessor stop before the above check when runnig into the
call?

> In a optimized build this passes, but with -O0 it always fails
> when the pass is placed before pass_optimizations_g. I assume
> it's some problem with mem ssa form.
>
> Any ideas how to fix that? Otherwise I can restrict musttail to non
> structs.

I wonder how this works when optimizing?

>
> -Andi

Andi Kleen May 21, 2024, 1:35 p.m. UTC | #5

> I can't see how this triggers on the IL above, the loop should have
> ignored both the return and the clobber and when recursing to
> the predecessor stop before the above check when runnig into the
> call?

Yes, I tracked that down later. The problem was that there
were multiple successors to the BB due to exception handling,
which makes the find_tail_calls walker give up.

Putting the new pass after ehcleanup fixed that, but there
are still cases when ehcleanup cannot get rid of them and
then it gives up. musttail checking at expand time still
works, but can only give a vague error message.

> 
> > In a optimized build this passes, but with -O0 it always fails
> > when the pass is placed before pass_optimizations_g. I assume
> > it's some problem with mem ssa form.
> >
> > Any ideas how to fix that? Otherwise I can restrict musttail to non
> > structs.
> 
> I wonder how this works when optimizing?

It just doesn't. You need optimization to do tail calls with
structs. The only alternative would be to detect the situation
and pull in some extra passes.

Also even with optimization it only works for structs that
fit into registers. This could be maybe fixed, but is out of scope
for this patch kit.

-Andi

Richard Biener May 21, 2024, 4:41 p.m. UTC | #6

On Tue, May 21, 2024 at 3:35 PM Andi Kleen <ak@linux.intel.com> wrote:
>
> > I can't see how this triggers on the IL above, the loop should have
> > ignored both the return and the clobber and when recursing to
> > the predecessor stop before the above check when runnig into the
> > call?
>
> Yes, I tracked that down later. The problem was that there
> were multiple successors to the BB due to exception handling,
> which makes the find_tail_calls walker give up.
>
> Putting the new pass after ehcleanup fixed that, but there
> are still cases when ehcleanup cannot get rid of them and
> then it gives up. musttail checking at expand time still
> works, but can only give a vague error message.
>
> >
> > > In a optimized build this passes, but with -O0 it always fails
> > > when the pass is placed before pass_optimizations_g. I assume
> > > it's some problem with mem ssa form.
> > >
> > > Any ideas how to fix that? Otherwise I can restrict musttail to non
> > > structs.
> >
> > I wonder how this works when optimizing?
>
> It just doesn't. You need optimization to do tail calls with
> structs. The only alternative would be to detect the situation
> and pull in some extra passes.
>
> Also even with optimization it only works for structs that
> fit into registers. This could be maybe fixed, but is out of scope
> for this patch kit.

I see.  I do wonder how we should deal with the inherent
dependence on optimization for [[musttail]] to work then?  "Solve"
the problem with good documentation?  Offer a -fignore-musttail
option to allow a -O0 build to at least succeed?  But then [[musttail]]
would rather be [[shouldtail]] and can no longer be for correctness?

How does clang solve this?

Richard.

>
> -Andi

Andi Kleen May 21, 2024, 9:45 p.m. UTC | #7

> I see.  I do wonder how we should deal with the inherent
> dependence on optimization for [[musttail]] to work then?  "Solve"
> the problem with good documentation?

For now that's good enough I think. If it's a significant problem
it can always be improved later.

> Offer a -fignore-musttail
> option to allow a -O0 build to at least succeed?  But then [[musttail]]
> would rather be [[shouldtail]] and can no longer be for correctness?

I don't think ignore-musttail is a good idea because an interpreter
relying on this would almost certainly die very quickly after overflowing
its stack. The feature is only useful when it can be relied on.

> How does clang solve this?

clang++ 17 seems to handle both cases (large struct in memory or struct
fitting into registers) without optimization.
I haven't checked the LLVM source on the exact details.

-Andi

[v5,1/5] Improve must tail in RTL backend

Commit Message

Comments

Patch