diff mbox series

rs6000: Support ELFv2 sibcall for indirect calls [PR96787]

Message ID 20200827132134.32958-1-wschmidt@linux.ibm.com
State New
Headers show
Series rs6000: Support ELFv2 sibcall for indirect calls [PR96787] | expand

Commit Message

Bill Schmidt Aug. 27, 2020, 1:21 p.m. UTC
Prior to P10, ELFv2 hasn't implemented nonlocal sibcalls.  Now that we do,
we need to be sure that r12 is set up prior to such a call.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions.  Is this okay for trunk?

Thanks,
Bill


2020-08-27  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/
	PR target/96787
	* config/rs6000/rs6000-logue.c (rs6000_sibcall_aix): Support
	indirect call for ELFv2.

gcc/testsuite/

	PR target/96787
	* gcc.target/powerpc/pr96787-1.c: New.
	* gcc.target/powerpc/pr96787-2.c: New.
---
 gcc/config/rs6000/rs6000.c                   | 19 +++++++++-
 gcc/testsuite/gcc.target/powerpc/pr96787-1.c | 38 ++++++++++++++++++++
 gcc/testsuite/gcc.target/powerpc/pr96787-2.c | 35 ++++++++++++++++++
 3 files changed, 91 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96787-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96787-2.c

Comments

Alan Modra Aug. 27, 2020, 1:38 p.m. UTC | #1
On Thu, Aug 27, 2020 at 08:21:34AM -0500, Bill Schmidt wrote:
> Prior to P10, ELFv2 hasn't implemented nonlocal sibcalls.  Now that we do,
> we need to be sure that r12 is set up prior to such a call.
> 
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions.  Is this okay for trunk?

FWIW, looks fine to me.
David Edelsohn Aug. 27, 2020, 1:47 p.m. UTC | #2
On Thu, Aug 27, 2020 at 9:21 AM Bill Schmidt <wschmidt@linux.ibm.com> wrote:
>
> Prior to P10, ELFv2 hasn't implemented nonlocal sibcalls.  Now that we do,
> we need to be sure that r12 is set up prior to such a call.
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions.  Is this okay for trunk?
>
> Thanks,
> Bill
>
>
> 2020-08-27  Bill Schmidt  <wschmidt@linux.ibm.com>
>
> gcc/
>         PR target/96787
>         * config/rs6000/rs6000-logue.c (rs6000_sibcall_aix): Support
>         indirect call for ELFv2.
>
> gcc/testsuite/
>
>         PR target/96787
>         * gcc.target/powerpc/pr96787-1.c: New.
>         * gcc.target/powerpc/pr96787-2.c: New.

Okay.

Thanks, David
Segher Boessenkool Aug. 27, 2020, 6:41 p.m. UTC | #3
Hi!

On Thu, Aug 27, 2020 at 08:21:34AM -0500, Bill Schmidt wrote:
> +  /* For ELFv2, r12 and CTR need to hold the function address
> +     for an indirect call.  */
> +  if (GET_CODE (func_desc) != SYMBOL_REF && DEFAULT_ABI == ABI_ELFv2)
> +    {
> +      r12 = gen_rtx_REG (Pmode, 12);
> +      if (!rtx_equal_p (r12, func_desc))
> +	emit_move_insn (r12, func_desc);

These last two lines aren't needed?  A move from r12 to r12 is harmless,
and should be optimised away just fine?

>    /* Create the call.  */
> -  call[0] = gen_rtx_CALL (VOIDmode, gen_rtx_MEM (SImode, func_desc), tlsarg);
> +  call[0] = gen_rtx_CALL (VOIDmode, gen_rtx_MEM (SImode, func_addr), tlsarg);

I don't understand this change?  (Maybe I'm not looking well enough.)

Looks fine otherwise, yes :-)


Segher
Li, Pan2 via Gcc-patches Aug. 27, 2020, 6:51 p.m. UTC | #4
Hi!

On 8/27/20 1:41 PM, Segher Boessenkool wrote:
> Hi!
>
> On Thu, Aug 27, 2020 at 08:21:34AM -0500, Bill Schmidt wrote:
>> +  /* For ELFv2, r12 and CTR need to hold the function address
>> +     for an indirect call.  */
>> +  if (GET_CODE (func_desc) != SYMBOL_REF && DEFAULT_ABI == ABI_ELFv2)
>> +    {
>> +      r12 = gen_rtx_REG (Pmode, 12);
>> +      if (!rtx_equal_p (r12, func_desc))
>> +	emit_move_insn (r12, func_desc);
> These last two lines aren't needed?  A move from r12 to r12 is harmless,
> and should be optimised away just fine?


On entry to this function, func_desc has the function address, and the 
problem is for a sibcall it is generally not in r12.  So I'm forcing it 
into r12 here.  I guess we could remove the !rtx_equal_p test and 
generate an unnecessary copy, but I don't see why we should do that.  
This is the same thing we do elsewhere (such as rs6000_aix_call).

>
>>     /* Create the call.  */
>> -  call[0] = gen_rtx_CALL (VOIDmode, gen_rtx_MEM (SImode, func_desc), tlsarg);
>> +  call[0] = gen_rtx_CALL (VOIDmode, gen_rtx_MEM (SImode, func_addr), tlsarg);
> I don't understand this change?  (Maybe I'm not looking well enough.)

Prior to this change, func_desc comes in as a parameter and is never 
changed.  Now it's either that case, or it's the new case, so this just 
is the join point of that decision.

Thanks,

Bill

>
> Looks fine otherwise, yes :-)
>
>
> Segher
Segher Boessenkool Aug. 27, 2020, 8:17 p.m. UTC | #5
On Thu, Aug 27, 2020 at 01:51:25PM -0500, Bill Schmidt wrote:
> >>+  /* For ELFv2, r12 and CTR need to hold the function address
> >>+     for an indirect call.  */
> >>+  if (GET_CODE (func_desc) != SYMBOL_REF && DEFAULT_ABI == ABI_ELFv2)
> >>+    {
> >>+      r12 = gen_rtx_REG (Pmode, 12);
> >>+      if (!rtx_equal_p (r12, func_desc))
> >>+	emit_move_insn (r12, func_desc);
> >These last two lines aren't needed?  A move from r12 to r12 is harmless,
> >and should be optimised away just fine?
> 
> On entry to this function, func_desc has the function address, and the 
> problem is for a sibcall it is generally not in r12.  So I'm forcing it 
> into r12 here.  I guess we could remove the !rtx_equal_p test and 

Yes, that is what I meant, sorry.

> generate an unnecessary copy, but I don't see why we should do that.  

Because it is simpler code.  Micro-optimisations are often not
optimisations at all, just obfuscations, and those hurt in various
ways (they make bigger binary code than necessary, but perhaps more
importantly they make bigger source code, which hurts in various ways).

It not the copy that is unnecessary: the preventing it *here*, manually,
is what is unnecessary.

> This is the same thing we do elsewhere (such as rs6000_aix_call).

That does not mean we have to do the same harmful (or just not useful)
thing everywhere else as well ;-)

> >>    /* Create the call.  */
> >>-  call[0] = gen_rtx_CALL (VOIDmode, gen_rtx_MEM (SImode, func_desc), 
> >>tlsarg);
> >>+  call[0] = gen_rtx_CALL (VOIDmode, gen_rtx_MEM (SImode, func_addr), 
> >>tlsarg);
> >I don't understand this change?  (Maybe I'm not looking well enough.)
> 
> Prior to this change, func_desc comes in as a parameter and is never 
> changed.  Now it's either that case, or it's the new case, so this just 
> is the join point of that decision.

So I'm not looking well enough, okay :-)


Segher
Alan Modra Aug. 28, 2020, 1:18 a.m. UTC | #6
On Thu, Aug 27, 2020 at 03:17:45PM -0500, Segher Boessenkool wrote:
> On Thu, Aug 27, 2020 at 01:51:25PM -0500, Bill Schmidt wrote:
> It not the copy that is unnecessary: the preventing it *here*, manually,
> is what is unnecessary.

Blame me for the original !rtx_equal_p in rs6000_call_aix that Bill
copied.  So does emit_move_insn prevent the copy?  I can't spot where,
maybe I haven't looked hard enough.

If emit_move_insn doesn't prevent it, then why create useless RTL that
is only going to make work for optimisation passes that remove such
nops?
Segher Boessenkool Aug. 28, 2020, 6:17 a.m. UTC | #7
On Fri, Aug 28, 2020 at 10:48:43AM +0930, Alan Modra wrote:
> On Thu, Aug 27, 2020 at 03:17:45PM -0500, Segher Boessenkool wrote:
> > On Thu, Aug 27, 2020 at 01:51:25PM -0500, Bill Schmidt wrote:
> > It not the copy that is unnecessary: the preventing it *here*, manually,
> > is what is unnecessary.
> 
> Blame me for the original !rtx_equal_p in rs6000_call_aix that Bill
> copied.  So does emit_move_insn prevent the copy?  I can't spot where,
> maybe I haven't looked hard enough.

It doesn't.  But this is generated during expand already, many later
passes can optimise it away.

> If emit_move_insn doesn't prevent it, then why create useless RTL that
> is only going to make work for optimisation passes that remove such
> nops?

1) Very many unnecessary moves are generated during expand *anyway*, a
few more will not hurt;
2) In practice we always generate a move from a pseudo into r12 here,
never a copy from r12 into r12;
3) Why generate dead code optimising cases that do not happen, making
the compiler bigger and slower, but more importantly, making the
compiler code harder to read?

Such optimisations can be useful for code we generate very late, but
not for stuff run at expand.  It is distracting to the reader, who then
can think a copy from r12 to r12 can happen here, and even is frequent
enough to warrant a special optimisation.

Older GCC was chock-full of such questionable micro-optimisations.  But
imo we should try to make the code base more modern now.


Segher
Alan Modra Aug. 28, 2020, 12:25 p.m. UTC | #8
On Fri, Aug 28, 2020 at 01:17:27AM -0500, Segher Boessenkool wrote:
> 1) Very many unnecessary moves are generated during expand *anyway*, a
> few more will not hurt;
> 2) In practice we always generate a move from a pseudo into r12 here,
> never a copy from r12 into r12;
> 3) Why generate dead code optimising cases that do not happen, making
> the compiler bigger and slower, but more importantly, making the
> compiler code harder to read?

Uh, OK point 2 and followup 3 is persuasive.  I should have thought of
that myself, thanks.
Li, Pan2 via Gcc-patches Aug. 28, 2020, 1:20 p.m. UTC | #9
On 8/28/20 7:25 AM, Alan Modra wrote:
> On Fri, Aug 28, 2020 at 01:17:27AM -0500, Segher Boessenkool wrote:
>> 1) Very many unnecessary moves are generated during expand *anyway*, a
>> few more will not hurt;
>> 2) In practice we always generate a move from a pseudo into r12 here,
>> never a copy from r12 into r12;
>> 3) Why generate dead code optimising cases that do not happen, making
>> the compiler bigger and slower, but more importantly, making the
>> compiler code harder to read?
> Uh, OK point 2 and followup 3 is persuasive.  I should have thought of
> that myself, thanks.
>
OK.  I had already upstreamed the patch on David's ok, but I can go back 
in and clear out those two unnecessary if's in _call and _sibcall fns.

Bill
diff mbox series

Patch

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 1c1caa90ede..09545278dcf 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -24833,14 +24833,27 @@  rs6000_sibcall_aix (rtx value, rtx func_desc, rtx tlsarg, rtx cookie)
 {
   rtx call[2];
   rtx insn;
+  rtx r12 = NULL_RTX;
+  rtx func_addr = func_desc;
 
   gcc_assert (INTVAL (cookie) == 0);
 
   if (global_tlsarg)
     tlsarg = global_tlsarg;
 
+  /* For ELFv2, r12 and CTR need to hold the function address
+     for an indirect call.  */
+  if (GET_CODE (func_desc) != SYMBOL_REF && DEFAULT_ABI == ABI_ELFv2)
+    {
+      r12 = gen_rtx_REG (Pmode, 12);
+      if (!rtx_equal_p (r12, func_desc))
+	emit_move_insn (r12, func_desc);
+      func_addr = gen_rtx_REG (Pmode, CTR_REGNO);
+      emit_move_insn (func_addr, r12);
+    }
+
   /* Create the call.  */
-  call[0] = gen_rtx_CALL (VOIDmode, gen_rtx_MEM (SImode, func_desc), tlsarg);
+  call[0] = gen_rtx_CALL (VOIDmode, gen_rtx_MEM (SImode, func_addr), tlsarg);
   if (value != NULL_RTX)
     call[0] = gen_rtx_SET (value, call[0]);
 
@@ -24853,6 +24866,10 @@  rs6000_sibcall_aix (rtx value, rtx func_desc, rtx tlsarg, rtx cookie)
   if (!rs6000_pcrel_p (cfun))
     use_reg (&CALL_INSN_FUNCTION_USAGE (insn),
 	     gen_rtx_REG (Pmode, TOC_REGNUM));
+
+  /* Note use of r12.  */
+  if (r12)
+    use_reg (&CALL_INSN_FUNCTION_USAGE (insn), r12);
 }
 
 /* Expand code to perform a call under the SYSV4 ABI.  */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr96787-1.c b/gcc/testsuite/gcc.target/powerpc/pr96787-1.c
new file mode 100644
index 00000000000..3c58e63797f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr96787-1.c
@@ -0,0 +1,38 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_elfv2 } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+/* Verify that we generate an indirect sibcall for ELFv2 on P10 and
+   later, with r12 and CTR containing the function address.  PR96787.  */
+
+extern int f (int);
+
+int main ()
+{
+  if (f (3) != 6)
+    return 1;
+  return 0;
+}
+
+
+int g (int a)
+{
+  return a * 2;
+}
+
+
+int h (int a)
+{
+  return a + 2;
+}
+
+int __attribute__((__noinline__)) f (int a)
+{
+  int (*x) (int) = a % 2 ? &g : &h;
+  (*x) (a);
+}
+
+/* { dg-final { scan-assembler {\mmtctr 12\M} } } */
+/* { dg-final { scan-assembler {\mbctr\M} } } */
+/* { dg-final { scan-assembler-not {\mbctrl\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr96787-2.c b/gcc/testsuite/gcc.target/powerpc/pr96787-2.c
new file mode 100644
index 00000000000..b10ab7a8ce8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr96787-2.c
@@ -0,0 +1,35 @@ 
+/* { dg-do run } */
+/* { dg-require-effective-target power10_hw } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+/* Verify that we generate an indirect sibcall for ELFv2 on P10 and
+   later, with r12 and CTR containing the function address.  PR96787.  */
+
+extern void abort (void);
+extern int f (int);
+
+int main ()
+{
+  if (f (3) != 6)
+    abort ();
+  return 0;
+}
+
+
+int g (int a)
+{
+  return a * 2;
+}
+
+
+int h (int a)
+{
+  return a + 2;
+}
+
+int __attribute__((__noinline__)) f (int a)
+{
+  int (*x) (int) = a % 2 ? &g : &h;
+  (*x) (a);
+}
+