diff mbox

[expr.c] PR 65358 Avoid clobbering partial argument during sibcall

Message ID 55378B54.2040904@arm.com
State New
Headers show

Commit Message

Kyrylo Tkachov April 22, 2015, 11:51 a.m. UTC
On 21/04/15 18:33, Kyrill Tkachov wrote:
> On 21/04/15 15:09, Jeff Law wrote:
>> On 04/21/2015 02:30 AM, Kyrill Tkachov wrote:
>>>    From reading config/stormy16/stormy-abi it seems to me that we don't
>>> pass arguments partially in stormy16, so this code would never be called
>>> there. That leaves pa as the potential problematic target.
>>> I don't suppose there's an easy way to test on pa? My checkout of binutils
>>> doesn't seem to include a sim target for it.
>> No simulator, no machines in the testfarm, the box I had access to via
>> parisc-linux.org seems dead and my ancient PA overheats well before a
>> bootstrap could complete.  I often regret knowing about the backwards
>> way many things were done on the PA because it makes me think about
>> cases that only matter on dead architectures.
> So what should be the action plan here? I can't add an assert on
> positive result as a negative result is valid.
>
> We want to catch the case where this would cause trouble on
> pa, or change the patch until we're confident that it's fine
> for pa.
>
> That being said, reading the documentation of STACK_GROWS_UPWARD
> and ARGS_GROW_DOWNWARD I'm having a hard time visualising a case
> where this would cause trouble on pa.
>
> Is the problem that in the function:
>
> +/* Add SIZE to X and check whether it's greater than Y.
> +   If it is, return the constant amount by which it's greater or smaller.
> +   If the two are not statically comparable (for example, X and Y contain
> +   different registers) return -1.  This is used in expand_push_insn to
> +   figure out if reading SIZE bytes from location X will end up reading from
> +   location Y.  */
> +static int
> +memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
> +{
> +  rtx tmp = plus_constant (Pmode, x, size);
> +  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
> +
> +  if (!CONST_INT_P (sub))
> +    return -1;
> +
> +  return INTVAL (sub);
> +}
>
> for ARGS_GROW_DOWNWARD we would be reading 'backwards' from x,
> so the function should something like the following?
>
> static int
> memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
> {
> #ifdef ARGS_GROW_DOWNWARD
>     rtx tmp = plus_constant (Pmode, x, -size);
> #else
>     rtx tmp = plus_constant (Pmode, x, size);
> #endif
>     rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
>
>     if (!CONST_INT_P (sub))
>       return -1;
>
> #ifdef ARGS_GROW_DOWNWARD
>     return INTVAL (-sub);
> #else
>     return INTVAL (sub);
> #endif
> }
>
> now, say for x == sp + 4,  y == sp + 8, size == 16:
> This would be a problematic case for arm, so this code on arm
> (where ARGS_GROW_DOWNWARD is *not* defined) would return
> 12, which is the number of bytes that overlap.
>
> On a target where ARGS_GROW_DOWNWARD is defined this would return
> -20, meaning that no overlap occurs (because we read in the descending
> direction from x, IIUC).
Hi Jeff,

Here's an attempt to make this more concrete.
Only the memory_load_overlap function has changed.
This time I tried to take into account the case when
ARGS_GROW_DOWNWARD.

Take the case where x == sp, y == sp + 8, size == 16.
For arm, this would return 8 as that is the number of bytes
that overlap. On pa, since ARGS_GROW_DOWNWARD is defined it
would return -1 as we're reading down from x rather than up
towards y.

In the case when x == sp + 8, y == sp, size == 16
This would return -1 on arm since we're reading upwards from x
and thefore no overlap would happen.

On pa, this would return 8, which I think is the right thing.
But again, I don't have access to any pa means of testing.

What do you think of this approach?

Thanks,
Kyrill

P.S. I've included the testcase from Honggyu in the patch.

2015-04-22  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR target/65358
     * expr.c (memory_load_overlap): New function.
     (emit_push_insn): When pushing partial args to the stack would
     clobber the register part load the overlapping part into a pseudo
     and put it into the hard reg after pushing.

2015-04-22  Honggyu Kim  <hong.gyu.kim@lge.com>

     PR target/65358
     * gcc.dg/pr65358.c: New test.

>
>
> Thanks,
> Kyrill
>
>>
>> Jeff
>>

Comments

Kyrylo Tkachov April 27, 2015, 10:12 a.m. UTC | #1
On 22/04/15 12:51, Kyrill Tkachov wrote:
> On 21/04/15 18:33, Kyrill Tkachov wrote:
>> On 21/04/15 15:09, Jeff Law wrote:
>>> On 04/21/2015 02:30 AM, Kyrill Tkachov wrote:
>>>>     From reading config/stormy16/stormy-abi it seems to me that we don't
>>>> pass arguments partially in stormy16, so this code would never be called
>>>> there. That leaves pa as the potential problematic target.
>>>> I don't suppose there's an easy way to test on pa? My checkout of binutils
>>>> doesn't seem to include a sim target for it.
>>> No simulator, no machines in the testfarm, the box I had access to via
>>> parisc-linux.org seems dead and my ancient PA overheats well before a
>>> bootstrap could complete.  I often regret knowing about the backwards
>>> way many things were done on the PA because it makes me think about
>>> cases that only matter on dead architectures.
>> So what should be the action plan here? I can't add an assert on
>> positive result as a negative result is valid.
>>
>> We want to catch the case where this would cause trouble on
>> pa, or change the patch until we're confident that it's fine
>> for pa.
>>
>> That being said, reading the documentation of STACK_GROWS_UPWARD
>> and ARGS_GROW_DOWNWARD I'm having a hard time visualising a case
>> where this would cause trouble on pa.
>>
>> Is the problem that in the function:
>>
>> +/* Add SIZE to X and check whether it's greater than Y.
>> +   If it is, return the constant amount by which it's greater or smaller.
>> +   If the two are not statically comparable (for example, X and Y contain
>> +   different registers) return -1.  This is used in expand_push_insn to
>> +   figure out if reading SIZE bytes from location X will end up reading from
>> +   location Y.  */
>> +static int
>> +memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
>> +{
>> +  rtx tmp = plus_constant (Pmode, x, size);
>> +  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
>> +
>> +  if (!CONST_INT_P (sub))
>> +    return -1;
>> +
>> +  return INTVAL (sub);
>> +}
>>
>> for ARGS_GROW_DOWNWARD we would be reading 'backwards' from x,
>> so the function should something like the following?
>>
>> static int
>> memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
>> {
>> #ifdef ARGS_GROW_DOWNWARD
>>      rtx tmp = plus_constant (Pmode, x, -size);
>> #else
>>      rtx tmp = plus_constant (Pmode, x, size);
>> #endif
>>      rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
>>
>>      if (!CONST_INT_P (sub))
>>        return -1;
>>
>> #ifdef ARGS_GROW_DOWNWARD
>>      return INTVAL (-sub);
>> #else
>>      return INTVAL (sub);
>> #endif
>> }
>>
>> now, say for x == sp + 4,  y == sp + 8, size == 16:
>> This would be a problematic case for arm, so this code on arm
>> (where ARGS_GROW_DOWNWARD is *not* defined) would return
>> 12, which is the number of bytes that overlap.
>>
>> On a target where ARGS_GROW_DOWNWARD is defined this would return
>> -20, meaning that no overlap occurs (because we read in the descending
>> direction from x, IIUC).
> Hi Jeff,
>
> Here's an attempt to make this more concrete.
> Only the memory_load_overlap function has changed.
> This time I tried to take into account the case when
> ARGS_GROW_DOWNWARD.
>
> Take the case where x == sp, y == sp + 8, size == 16.
> For arm, this would return 8 as that is the number of bytes
> that overlap. On pa, since ARGS_GROW_DOWNWARD is defined it
> would return -1 as we're reading down from x rather than up
> towards y.
>
> In the case when x == sp + 8, y == sp, size == 16
> This would return -1 on arm since we're reading upwards from x
> and thefore no overlap would happen.
>
> On pa, this would return 8, which I think is the right thing.
> But again, I don't have access to any pa means of testing.
>
> What do you think of this approach?

Hi Dave,

Would it be possible for you to test this patch on a 64-bit hppa
or at least bootstrap it?
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01288.html

There is a concern that it may potentially affect the passing of
complex arguments partially on the stack and partially in regs
on pa because of the way the args and stack grow on that target.

Unfortunately I don't have access to any hardware or simulators.
It would help a lot with getting this patch in.

Thanks,
Kyrill


>
> Thanks,
> Kyrill
>
> P.S. I've included the testcase from Honggyu in the patch.
>
> 2015-04-22  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>       PR target/65358
>       * expr.c (memory_load_overlap): New function.
>       (emit_push_insn): When pushing partial args to the stack would
>       clobber the register part load the overlapping part into a pseudo
>       and put it into the hard reg after pushing.
>
> 2015-04-22  Honggyu Kim  <hong.gyu.kim@lge.com>
>
>       PR target/65358
>       * gcc.dg/pr65358.c: New test.
>
>>
>> Thanks,
>> Kyrill
>>
>>> Jeff
>>>
John David Anglin April 27, 2015, 1:16 p.m. UTC | #2
On 2015-04-27 6:12 AM, Kyrill Tkachov wrote:
>
> On 22/04/15 12:51, Kyrill Tkachov wrote:
>> On 21/04/15 18:33, Kyrill Tkachov wrote:
>>> On 21/04/15 15:09, Jeff Law wrote:
>>>> On 04/21/2015 02:30 AM, Kyrill Tkachov wrote:
>>>>>     From reading config/stormy16/stormy-abi it seems to me that we 
>>>>> don't
>>>>> pass arguments partially in stormy16, so this code would never be 
>>>>> called
>>>>> there. That leaves pa as the potential problematic target.
>>>>> I don't suppose there's an easy way to test on pa? My checkout of 
>>>>> binutils
>>>>> doesn't seem to include a sim target for it.
>>>> No simulator, no machines in the testfarm, the box I had access to via
>>>> parisc-linux.org seems dead and my ancient PA overheats well before a
>>>> bootstrap could complete.  I often regret knowing about the backwards
>>>> way many things were done on the PA because it makes me think about
>>>> cases that only matter on dead architectures.
>>> So what should be the action plan here? I can't add an assert on
>>> positive result as a negative result is valid.
>>>
>>> We want to catch the case where this would cause trouble on
>>> pa, or change the patch until we're confident that it's fine
>>> for pa.
>>>
>>> That being said, reading the documentation of STACK_GROWS_UPWARD
>>> and ARGS_GROW_DOWNWARD I'm having a hard time visualising a case
>>> where this would cause trouble on pa.
>>>
>>> Is the problem that in the function:
>>>
>>> +/* Add SIZE to X and check whether it's greater than Y.
>>> +   If it is, return the constant amount by which it's greater or 
>>> smaller.
>>> +   If the two are not statically comparable (for example, X and Y 
>>> contain
>>> +   different registers) return -1.  This is used in 
>>> expand_push_insn to
>>> +   figure out if reading SIZE bytes from location X will end up 
>>> reading from
>>> +   location Y.  */
>>> +static int
>>> +memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
>>> +{
>>> +  rtx tmp = plus_constant (Pmode, x, size);
>>> +  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
>>> +
>>> +  if (!CONST_INT_P (sub))
>>> +    return -1;
>>> +
>>> +  return INTVAL (sub);
>>> +}
>>>
>>> for ARGS_GROW_DOWNWARD we would be reading 'backwards' from x,
>>> so the function should something like the following?
>>>
>>> static int
>>> memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
>>> {
>>> #ifdef ARGS_GROW_DOWNWARD
>>>      rtx tmp = plus_constant (Pmode, x, -size);
>>> #else
>>>      rtx tmp = plus_constant (Pmode, x, size);
>>> #endif
>>>      rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
>>>
>>>      if (!CONST_INT_P (sub))
>>>        return -1;
>>>
>>> #ifdef ARGS_GROW_DOWNWARD
>>>      return INTVAL (-sub);
>>> #else
>>>      return INTVAL (sub);
>>> #endif
>>> }
>>>
>>> now, say for x == sp + 4,  y == sp + 8, size == 16:
>>> This would be a problematic case for arm, so this code on arm
>>> (where ARGS_GROW_DOWNWARD is *not* defined) would return
>>> 12, which is the number of bytes that overlap.
>>>
>>> On a target where ARGS_GROW_DOWNWARD is defined this would return
>>> -20, meaning that no overlap occurs (because we read in the descending
>>> direction from x, IIUC).
>> Hi Jeff,
>>
>> Here's an attempt to make this more concrete.
>> Only the memory_load_overlap function has changed.
>> This time I tried to take into account the case when
>> ARGS_GROW_DOWNWARD.
>>
>> Take the case where x == sp, y == sp + 8, size == 16.
>> For arm, this would return 8 as that is the number of bytes
>> that overlap. On pa, since ARGS_GROW_DOWNWARD is defined it
>> would return -1 as we're reading down from x rather than up
>> towards y.
>>
>> In the case when x == sp + 8, y == sp, size == 16
>> This would return -1 on arm since we're reading upwards from x
>> and thefore no overlap would happen.
>>
>> On pa, this would return 8, which I think is the right thing.
>> But again, I don't have access to any pa means of testing.
>>
>> What do you think of this approach?
>
> Hi Dave,
>
> Would it be possible for you to test this patch on a 64-bit hppa
> or at least bootstrap it?
> https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01288.html
I started a build and test with your patch on hppa64-hp-hpux11.11 this 
morning.

>
> There is a concern that it may potentially affect the passing of
> complex arguments partially on the stack and partially in regs
> on pa because of the way the args and stack grow on that target.
>
> Unfortunately I don't have access to any hardware or simulators.
> It would help a lot with getting this patch in.
If you write to linux-parisc@vger.kernel.org, arrangements can be made for
an account on a Debian parisc linux machine for development testing.

Helge Deller has arranged for some new machines since we took over the
Debian buildd infrastructure  for parisc.  More info is here:
https://parisc.wiki.kernel.org/index.php/Main_Page

>
> Thanks,
> Kyrill
>
>
>>
>> Thanks,
>> Kyrill
>>
>> P.S. I've included the testcase from Honggyu in the patch.
>>
>> 2015-04-22  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>       PR target/65358
>>       * expr.c (memory_load_overlap): New function.
>>       (emit_push_insn): When pushing partial args to the stack would
>>       clobber the register part load the overlapping part into a pseudo
>>       and put it into the hard reg after pushing.
>>
>> 2015-04-22  Honggyu Kim  <hong.gyu.kim@lge.com>
>>
>>       PR target/65358
>>       * gcc.dg/pr65358.c: New test.
>>
>>>
>>> Thanks,
>>> Kyrill
>>>
>>>> Jeff
>>>>
>
>
>

Regards,
Dave
John David Anglin May 6, 2015, 6:57 p.m. UTC | #3
On 2015-04-27 9:16 AM, John David Anglin wrote:
>> Hi Dave,
>>
>> Would it be possible for you to test this patch on a 64-bit hppa
>> or at least bootstrap it?
>> https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01288.html
> I started a build and test with your patch on hppa64-hp-hpux11.11 this 
> morning.

The patch didn't cause any regressions on hppa2.0w-hp-hpux11.11 and 
hppa64-hp-hpux11.11.

Dave
diff mbox

Patch

commit 39c9cb0bff1088cffaf31208b9a735d2bc0c505a
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Wed Mar 18 13:42:37 2015 +0000

    [expr.c] PR 65358 Avoid clobbering partial argument during sibcall

diff --git a/gcc/expr.c b/gcc/expr.c
index 530a944..ccf94e0 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -4121,6 +4121,36 @@  emit_single_push_insn (machine_mode mode, rtx x, tree type)
 }
 #endif
 
+/* If reading SIZE bytes from X will end up reading from
+   Y return the number of bytes that overlap.  Return -1
+   in all other cases.  */
+
+static int
+memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
+{
+  rtx tmp = plus_constant (Pmode, x,
+#ifdef ARGS_GROW_DOWNWARD
+			   -size);
+#else
+			   size);
+#endif
+
+
+  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
+
+  if (!CONST_INT_P (sub))
+    return -1;
+
+  HOST_WIDE_INT val =
+#ifdef ARGS_GROW_DOWNWARD
+    -INTVAL (sub);
+#else
+     INTVAL (sub);
+#endif
+
+  return IN_RANGE (val, 1, size) ? val : -1;
+}
+
 /* Generate code to push X onto the stack, assuming it has mode MODE and
    type TYPE.
    MODE is redundant except when X is a CONST_INT (since they don't
@@ -4179,6 +4209,10 @@  emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 
   xinner = x;
 
+  int nregs = partial / UNITS_PER_WORD;
+  rtx *tmp_regs = NULL;
+  int overlapping = 0;
+
   if (mode == BLKmode
       || (STRICT_ALIGNMENT && align < GET_MODE_ALIGNMENT (mode)))
     {
@@ -4309,6 +4343,35 @@  emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 	     PARM_BOUNDARY.  Assume the caller isn't lying.  */
 	  set_mem_align (target, align);
 
+	  /* If part should go in registers and pushing to that part would
+	     overwrite some of the values that need to go into regs, load the
+	     overlapping values into temporary pseudos to be moved into the hard
+	     regs at the end after the stack pushing has completed.
+	     We cannot load them directly into the hard regs here because
+	     they can be clobbered by the block move expansions.
+	     See PR 65358.  */
+
+	  if (partial > 0 && reg != 0 && mode == BLKmode
+	      && GET_CODE (reg) != PARALLEL)
+	    {
+	      overlapping = memory_load_overlap (XEXP (x, 0), temp, partial);
+	      if (overlapping > 0)
+	        {
+		  gcc_assert (overlapping % UNITS_PER_WORD == 0);
+		  overlapping /= UNITS_PER_WORD;
+
+		  tmp_regs = XALLOCAVEC (rtx, overlapping);
+
+		  for (int i = 0; i < overlapping; i++)
+		    tmp_regs[i] = gen_reg_rtx (word_mode);
+
+		  for (int i = 0; i < overlapping; i++)
+		    emit_move_insn (tmp_regs[i],
+				    operand_subword_force (target, i, mode));
+	        }
+	      else
+		overlapping = 0;
+	    }
 	  emit_block_move (target, xinner, size, BLOCK_OP_CALL_PARM);
 	}
     }
@@ -4411,9 +4474,8 @@  emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
 	}
     }
 
-  /* If part should go in registers, copy that part
-     into the appropriate registers.  Do this now, at the end,
-     since mem-to-mem copies above may do function calls.  */
+  /* Move the partial arguments into the registers and any overlapping
+     values that we moved into the pseudos in tmp_regs.  */
   if (partial > 0 && reg != 0)
     {
       /* Handle calls that pass values in multiple non-contiguous locations.
@@ -4421,9 +4483,15 @@  emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
       if (GET_CODE (reg) == PARALLEL)
 	emit_group_load (reg, x, type, -1);
       else
-	{
+        {
 	  gcc_assert (partial % UNITS_PER_WORD == 0);
-	  move_block_to_reg (REGNO (reg), x, partial / UNITS_PER_WORD, mode);
+	  move_block_to_reg (REGNO (reg), x, nregs - overlapping, mode);
+
+	  for (int i = 0; i < overlapping; i++)
+	    emit_move_insn (gen_rtx_REG (word_mode, REGNO (reg)
+						    + nregs - overlapping + i),
+			    tmp_regs[i]);
+
 	}
     }
 
diff --git a/gcc/testsuite/gcc.dg/pr65358.c b/gcc/testsuite/gcc.dg/pr65358.c
new file mode 100644
index 0000000..ba89fd4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr65358.c
@@ -0,0 +1,33 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+struct pack
+{
+  int fine;
+  int victim;
+  int killer;
+};
+
+int __attribute__ ((__noinline__, __noclone__))
+bar (int a, int b, struct pack p)
+{
+  if (a != 20 || b != 30)
+    __builtin_abort ();
+  if (p.fine != 40 || p.victim != 50 || p.killer != 60)
+    __builtin_abort ();
+  return 0;
+}
+
+int __attribute__ ((__noinline__, __noclone__))
+foo (int arg1, int arg2, int arg3, struct pack p)
+{
+  return bar (arg2, arg3, p);
+}
+
+int main (void)
+{
+  struct pack p = { 40, 50, 60 };
+
+  (void) foo (10, 20, 30, p);
+  return 0;
+}