Message ID | 55378B54.2040904@arm.com |
---|---|
State | New |
Headers | show |
On 22/04/15 12:51, Kyrill Tkachov wrote: > On 21/04/15 18:33, Kyrill Tkachov wrote: >> On 21/04/15 15:09, Jeff Law wrote: >>> On 04/21/2015 02:30 AM, Kyrill Tkachov wrote: >>>> From reading config/stormy16/stormy-abi it seems to me that we don't >>>> pass arguments partially in stormy16, so this code would never be called >>>> there. That leaves pa as the potential problematic target. >>>> I don't suppose there's an easy way to test on pa? My checkout of binutils >>>> doesn't seem to include a sim target for it. >>> No simulator, no machines in the testfarm, the box I had access to via >>> parisc-linux.org seems dead and my ancient PA overheats well before a >>> bootstrap could complete. I often regret knowing about the backwards >>> way many things were done on the PA because it makes me think about >>> cases that only matter on dead architectures. >> So what should be the action plan here? I can't add an assert on >> positive result as a negative result is valid. >> >> We want to catch the case where this would cause trouble on >> pa, or change the patch until we're confident that it's fine >> for pa. >> >> That being said, reading the documentation of STACK_GROWS_UPWARD >> and ARGS_GROW_DOWNWARD I'm having a hard time visualising a case >> where this would cause trouble on pa. >> >> Is the problem that in the function: >> >> +/* Add SIZE to X and check whether it's greater than Y. >> + If it is, return the constant amount by which it's greater or smaller. >> + If the two are not statically comparable (for example, X and Y contain >> + different registers) return -1. This is used in expand_push_insn to >> + figure out if reading SIZE bytes from location X will end up reading from >> + location Y. */ >> +static int >> +memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size) >> +{ >> + rtx tmp = plus_constant (Pmode, x, size); >> + rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y); >> + >> + if (!CONST_INT_P (sub)) >> + return -1; >> + >> + return INTVAL (sub); >> +} >> >> for ARGS_GROW_DOWNWARD we would be reading 'backwards' from x, >> so the function should something like the following? >> >> static int >> memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size) >> { >> #ifdef ARGS_GROW_DOWNWARD >> rtx tmp = plus_constant (Pmode, x, -size); >> #else >> rtx tmp = plus_constant (Pmode, x, size); >> #endif >> rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y); >> >> if (!CONST_INT_P (sub)) >> return -1; >> >> #ifdef ARGS_GROW_DOWNWARD >> return INTVAL (-sub); >> #else >> return INTVAL (sub); >> #endif >> } >> >> now, say for x == sp + 4, y == sp + 8, size == 16: >> This would be a problematic case for arm, so this code on arm >> (where ARGS_GROW_DOWNWARD is *not* defined) would return >> 12, which is the number of bytes that overlap. >> >> On a target where ARGS_GROW_DOWNWARD is defined this would return >> -20, meaning that no overlap occurs (because we read in the descending >> direction from x, IIUC). > Hi Jeff, > > Here's an attempt to make this more concrete. > Only the memory_load_overlap function has changed. > This time I tried to take into account the case when > ARGS_GROW_DOWNWARD. > > Take the case where x == sp, y == sp + 8, size == 16. > For arm, this would return 8 as that is the number of bytes > that overlap. On pa, since ARGS_GROW_DOWNWARD is defined it > would return -1 as we're reading down from x rather than up > towards y. > > In the case when x == sp + 8, y == sp, size == 16 > This would return -1 on arm since we're reading upwards from x > and thefore no overlap would happen. > > On pa, this would return 8, which I think is the right thing. > But again, I don't have access to any pa means of testing. > > What do you think of this approach? Hi Dave, Would it be possible for you to test this patch on a 64-bit hppa or at least bootstrap it? https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01288.html There is a concern that it may potentially affect the passing of complex arguments partially on the stack and partially in regs on pa because of the way the args and stack grow on that target. Unfortunately I don't have access to any hardware or simulators. It would help a lot with getting this patch in. Thanks, Kyrill > > Thanks, > Kyrill > > P.S. I've included the testcase from Honggyu in the patch. > > 2015-04-22 Kyrylo Tkachov <kyrylo.tkachov@arm.com> > > PR target/65358 > * expr.c (memory_load_overlap): New function. > (emit_push_insn): When pushing partial args to the stack would > clobber the register part load the overlapping part into a pseudo > and put it into the hard reg after pushing. > > 2015-04-22 Honggyu Kim <hong.gyu.kim@lge.com> > > PR target/65358 > * gcc.dg/pr65358.c: New test. > >> >> Thanks, >> Kyrill >> >>> Jeff >>>
On 2015-04-27 6:12 AM, Kyrill Tkachov wrote: > > On 22/04/15 12:51, Kyrill Tkachov wrote: >> On 21/04/15 18:33, Kyrill Tkachov wrote: >>> On 21/04/15 15:09, Jeff Law wrote: >>>> On 04/21/2015 02:30 AM, Kyrill Tkachov wrote: >>>>> From reading config/stormy16/stormy-abi it seems to me that we >>>>> don't >>>>> pass arguments partially in stormy16, so this code would never be >>>>> called >>>>> there. That leaves pa as the potential problematic target. >>>>> I don't suppose there's an easy way to test on pa? My checkout of >>>>> binutils >>>>> doesn't seem to include a sim target for it. >>>> No simulator, no machines in the testfarm, the box I had access to via >>>> parisc-linux.org seems dead and my ancient PA overheats well before a >>>> bootstrap could complete. I often regret knowing about the backwards >>>> way many things were done on the PA because it makes me think about >>>> cases that only matter on dead architectures. >>> So what should be the action plan here? I can't add an assert on >>> positive result as a negative result is valid. >>> >>> We want to catch the case where this would cause trouble on >>> pa, or change the patch until we're confident that it's fine >>> for pa. >>> >>> That being said, reading the documentation of STACK_GROWS_UPWARD >>> and ARGS_GROW_DOWNWARD I'm having a hard time visualising a case >>> where this would cause trouble on pa. >>> >>> Is the problem that in the function: >>> >>> +/* Add SIZE to X and check whether it's greater than Y. >>> + If it is, return the constant amount by which it's greater or >>> smaller. >>> + If the two are not statically comparable (for example, X and Y >>> contain >>> + different registers) return -1. This is used in >>> expand_push_insn to >>> + figure out if reading SIZE bytes from location X will end up >>> reading from >>> + location Y. */ >>> +static int >>> +memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size) >>> +{ >>> + rtx tmp = plus_constant (Pmode, x, size); >>> + rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y); >>> + >>> + if (!CONST_INT_P (sub)) >>> + return -1; >>> + >>> + return INTVAL (sub); >>> +} >>> >>> for ARGS_GROW_DOWNWARD we would be reading 'backwards' from x, >>> so the function should something like the following? >>> >>> static int >>> memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size) >>> { >>> #ifdef ARGS_GROW_DOWNWARD >>> rtx tmp = plus_constant (Pmode, x, -size); >>> #else >>> rtx tmp = plus_constant (Pmode, x, size); >>> #endif >>> rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y); >>> >>> if (!CONST_INT_P (sub)) >>> return -1; >>> >>> #ifdef ARGS_GROW_DOWNWARD >>> return INTVAL (-sub); >>> #else >>> return INTVAL (sub); >>> #endif >>> } >>> >>> now, say for x == sp + 4, y == sp + 8, size == 16: >>> This would be a problematic case for arm, so this code on arm >>> (where ARGS_GROW_DOWNWARD is *not* defined) would return >>> 12, which is the number of bytes that overlap. >>> >>> On a target where ARGS_GROW_DOWNWARD is defined this would return >>> -20, meaning that no overlap occurs (because we read in the descending >>> direction from x, IIUC). >> Hi Jeff, >> >> Here's an attempt to make this more concrete. >> Only the memory_load_overlap function has changed. >> This time I tried to take into account the case when >> ARGS_GROW_DOWNWARD. >> >> Take the case where x == sp, y == sp + 8, size == 16. >> For arm, this would return 8 as that is the number of bytes >> that overlap. On pa, since ARGS_GROW_DOWNWARD is defined it >> would return -1 as we're reading down from x rather than up >> towards y. >> >> In the case when x == sp + 8, y == sp, size == 16 >> This would return -1 on arm since we're reading upwards from x >> and thefore no overlap would happen. >> >> On pa, this would return 8, which I think is the right thing. >> But again, I don't have access to any pa means of testing. >> >> What do you think of this approach? > > Hi Dave, > > Would it be possible for you to test this patch on a 64-bit hppa > or at least bootstrap it? > https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01288.html I started a build and test with your patch on hppa64-hp-hpux11.11 this morning. > > There is a concern that it may potentially affect the passing of > complex arguments partially on the stack and partially in regs > on pa because of the way the args and stack grow on that target. > > Unfortunately I don't have access to any hardware or simulators. > It would help a lot with getting this patch in. If you write to linux-parisc@vger.kernel.org, arrangements can be made for an account on a Debian parisc linux machine for development testing. Helge Deller has arranged for some new machines since we took over the Debian buildd infrastructure for parisc. More info is here: https://parisc.wiki.kernel.org/index.php/Main_Page > > Thanks, > Kyrill > > >> >> Thanks, >> Kyrill >> >> P.S. I've included the testcase from Honggyu in the patch. >> >> 2015-04-22 Kyrylo Tkachov <kyrylo.tkachov@arm.com> >> >> PR target/65358 >> * expr.c (memory_load_overlap): New function. >> (emit_push_insn): When pushing partial args to the stack would >> clobber the register part load the overlapping part into a pseudo >> and put it into the hard reg after pushing. >> >> 2015-04-22 Honggyu Kim <hong.gyu.kim@lge.com> >> >> PR target/65358 >> * gcc.dg/pr65358.c: New test. >> >>> >>> Thanks, >>> Kyrill >>> >>>> Jeff >>>> > > > Regards, Dave
On 2015-04-27 9:16 AM, John David Anglin wrote: >> Hi Dave, >> >> Would it be possible for you to test this patch on a 64-bit hppa >> or at least bootstrap it? >> https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01288.html > I started a build and test with your patch on hppa64-hp-hpux11.11 this > morning. The patch didn't cause any regressions on hppa2.0w-hp-hpux11.11 and hppa64-hp-hpux11.11. Dave
commit 39c9cb0bff1088cffaf31208b9a735d2bc0c505a Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com> Date: Wed Mar 18 13:42:37 2015 +0000 [expr.c] PR 65358 Avoid clobbering partial argument during sibcall diff --git a/gcc/expr.c b/gcc/expr.c index 530a944..ccf94e0 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -4121,6 +4121,36 @@ emit_single_push_insn (machine_mode mode, rtx x, tree type) } #endif +/* If reading SIZE bytes from X will end up reading from + Y return the number of bytes that overlap. Return -1 + in all other cases. */ + +static int +memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size) +{ + rtx tmp = plus_constant (Pmode, x, +#ifdef ARGS_GROW_DOWNWARD + -size); +#else + size); +#endif + + + rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y); + + if (!CONST_INT_P (sub)) + return -1; + + HOST_WIDE_INT val = +#ifdef ARGS_GROW_DOWNWARD + -INTVAL (sub); +#else + INTVAL (sub); +#endif + + return IN_RANGE (val, 1, size) ? val : -1; +} + /* Generate code to push X onto the stack, assuming it has mode MODE and type TYPE. MODE is redundant except when X is a CONST_INT (since they don't @@ -4179,6 +4209,10 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size, xinner = x; + int nregs = partial / UNITS_PER_WORD; + rtx *tmp_regs = NULL; + int overlapping = 0; + if (mode == BLKmode || (STRICT_ALIGNMENT && align < GET_MODE_ALIGNMENT (mode))) { @@ -4309,6 +4343,35 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size, PARM_BOUNDARY. Assume the caller isn't lying. */ set_mem_align (target, align); + /* If part should go in registers and pushing to that part would + overwrite some of the values that need to go into regs, load the + overlapping values into temporary pseudos to be moved into the hard + regs at the end after the stack pushing has completed. + We cannot load them directly into the hard regs here because + they can be clobbered by the block move expansions. + See PR 65358. */ + + if (partial > 0 && reg != 0 && mode == BLKmode + && GET_CODE (reg) != PARALLEL) + { + overlapping = memory_load_overlap (XEXP (x, 0), temp, partial); + if (overlapping > 0) + { + gcc_assert (overlapping % UNITS_PER_WORD == 0); + overlapping /= UNITS_PER_WORD; + + tmp_regs = XALLOCAVEC (rtx, overlapping); + + for (int i = 0; i < overlapping; i++) + tmp_regs[i] = gen_reg_rtx (word_mode); + + for (int i = 0; i < overlapping; i++) + emit_move_insn (tmp_regs[i], + operand_subword_force (target, i, mode)); + } + else + overlapping = 0; + } emit_block_move (target, xinner, size, BLOCK_OP_CALL_PARM); } } @@ -4411,9 +4474,8 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size, } } - /* If part should go in registers, copy that part - into the appropriate registers. Do this now, at the end, - since mem-to-mem copies above may do function calls. */ + /* Move the partial arguments into the registers and any overlapping + values that we moved into the pseudos in tmp_regs. */ if (partial > 0 && reg != 0) { /* Handle calls that pass values in multiple non-contiguous locations. @@ -4421,9 +4483,15 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size, if (GET_CODE (reg) == PARALLEL) emit_group_load (reg, x, type, -1); else - { + { gcc_assert (partial % UNITS_PER_WORD == 0); - move_block_to_reg (REGNO (reg), x, partial / UNITS_PER_WORD, mode); + move_block_to_reg (REGNO (reg), x, nregs - overlapping, mode); + + for (int i = 0; i < overlapping; i++) + emit_move_insn (gen_rtx_REG (word_mode, REGNO (reg) + + nregs - overlapping + i), + tmp_regs[i]); + } } diff --git a/gcc/testsuite/gcc.dg/pr65358.c b/gcc/testsuite/gcc.dg/pr65358.c new file mode 100644 index 0000000..ba89fd4 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr65358.c @@ -0,0 +1,33 @@ +/* { dg-do run } */ +/* { dg-options "-O2" } */ + +struct pack +{ + int fine; + int victim; + int killer; +}; + +int __attribute__ ((__noinline__, __noclone__)) +bar (int a, int b, struct pack p) +{ + if (a != 20 || b != 30) + __builtin_abort (); + if (p.fine != 40 || p.victim != 50 || p.killer != 60) + __builtin_abort (); + return 0; +} + +int __attribute__ ((__noinline__, __noclone__)) +foo (int arg1, int arg2, int arg3, struct pack p) +{ + return bar (arg2, arg3, p); +} + +int main (void) +{ + struct pack p = { 40, 50, 60 }; + + (void) foo (10, 20, 30, p); + return 0; +}