Message ID | 550ADF8F.7030300@arm.com |
---|---|
State | New |
Headers | show |
Ping. https://gcc.gnu.org/ml/gcc-patches/2015-03/msg01014.html Thanks, Kyrill On 19/03/15 14:39, Kyrill Tkachov wrote: > Hi all, > > This patch fixes PR 65358. For details look at the excellent write-up > by Honggyu in bugzilla. The problem is that we're trying to pass a struct > partially on the stack and partially in regs during a tail-call optimisation > but the struct we're passing is also a partial incoming arg though the split > between stack and regs is different from its outgoing usage. > > The emit_push_insn code ends up doing a block move for the on-stack part but > ends up overwriting the part that needs to be loaded into regs. > My first thought was to just load the regs part first and then do the stack > part but that doesn't work as multiple comments in that function indicate > (the block move being expanded to movmem or other functions being one of the > reasons). > > My proposed solution is to detect when the overlap happens, find the > overlapping region and load it before the stack pushing into pseudos and > after the stack pushing is done move the overlapping values from the pseudos > into the hard argument regs that they're supposed to go. > > That way this new functionality should only ever be triggered when there's > the overlap in this PR (causing wrong-code) and shouldn't affect codegen > anywhere else. > > Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu > and x86_64-linux-gnu. > > According to the PR this appears at least as far back 4.6 so this isn't a > regression on the release branches, but it is a wrong-code bug. > > I'll let Honggyu upstream the testcase separately > (https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00984.html) After guidance from Jeff, I'll take this testcase in as well if this is approved. > > I'll be testing this on the 4.8 and 4.9 branches. > Thoughts on this approach? > > Thanks, > Kyrill > > 2015-03-19 Kyrylo Tkachov <kyrylo.tkachov@arm.com> > > PR middle-end/65358 > * expr.c (memory_load_overlap): New function. > (emit_push_insn): When pushing partial args to the stack would > clobber the register part load the overlapping part into a pseudo > and put it into the hard reg after pushing.
On Thu, Mar 19, 2015 at 02:39:11PM +0000, Kyrill Tkachov wrote: > Hi all, > > This patch fixes PR 65358. For details look at the excellent write-up > by Honggyu in bugzilla. The problem is that we're trying to pass a struct > partially on the stack and partially in regs during a tail-call optimisation > but the struct we're passing is also a partial incoming arg though the split > between stack and regs is different from its outgoing usage. > > The emit_push_insn code ends up doing a block move for the on-stack part but > ends up overwriting the part that needs to be loaded into regs. > My first thought was to just load the regs part first and then do the stack > part but that doesn't work as multiple comments in that function indicate > (the block move being expanded to movmem or other functions being one of the > reasons). > > My proposed solution is to detect when the overlap happens, find the > overlapping region and load it before the stack pushing into pseudos and > after the stack pushing is done move the overlapping values from the pseudos > into the hard argument regs that they're supposed to go. > > That way this new functionality should only ever be triggered when there's > the overlap in this PR (causing wrong-code) and shouldn't affect codegen > anywhere else. > > Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu > and x86_64-linux-gnu. > > According to the PR this appears at least as far back 4.6 so this isn't a > regression on the release branches, but it is a wrong-code bug. > > I'll let Honggyu upstream the testcase separately > (https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00984.html) > > I'll be testing this on the 4.8 and 4.9 branches. > Thoughts on this approach? > > Thanks, > Kyrill > Hi, Kyrill I have verified the generated assembly code and tested on the target board. PR 65358 testcase works fine now with your patch. https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00984.html Here is the generated assembly code of PR 65358: foo: (without this patch) | (with this patch) sub sp, sp, #8 | sub sp, sp, #8 mov r0, r1 | mov r0, r1 str lr, [sp, #-4]! | str lr, [sp, #-4]! add ip, sp, #8 | mov r1, r2 (1) ldr lr, [sp, #16] | ldr lr, [sp, #16] mov r1, r2 | mov r2, r3 str r3, [sp, #8] | ldr ip, [sp, #12] (2) str lr, [sp, #12] | str r3, [sp, #8] ldr lr, [sp], #4 | str lr, [sp, #12] ldmia ip, {r2, r3} | ldr lr, [sp], #4 add sp, sp, #8 | mov r3, ip b bar | add sp, sp, #8 | b bar One the left side(previous code), (1) loads "p.killer", then (2) overwrites "p.victim" value. Until this point, "p.victim" is never copied anyway, which makes the value disappear. But this bug is clearly fixed with this patch as shown on the right side. I have tested on x86_64 and got the working code regardless of having this patch. I appreciate your patch, Kyrill. Honggyu
Ping. https://gcc.gnu.org/ml/gcc-patches/2015-03/msg01014.html Jeff, could you help review this patch? Or could you point me to someone who can review this? I can't figure out from MAINTAINERS who should be in charge of this part of the compiler. Thanks, Kyrill On 19/03/15 14:39, Kyrill Tkachov wrote: > Hi all, > > This patch fixes PR 65358. For details look at the excellent write-up > by Honggyu in bugzilla. The problem is that we're trying to pass a struct > partially on the stack and partially in regs during a tail-call optimisation > but the struct we're passing is also a partial incoming arg though the split > between stack and regs is different from its outgoing usage. > > The emit_push_insn code ends up doing a block move for the on-stack part but > ends up overwriting the part that needs to be loaded into regs. > My first thought was to just load the regs part first and then do the stack > part but that doesn't work as multiple comments in that function indicate > (the block move being expanded to movmem or other functions being one of the > reasons). > > My proposed solution is to detect when the overlap happens, find the > overlapping region and load it before the stack pushing into pseudos and > after the stack pushing is done move the overlapping values from the pseudos > into the hard argument regs that they're supposed to go. > > That way this new functionality should only ever be triggered when there's > the overlap in this PR (causing wrong-code) and shouldn't affect codegen > anywhere else. > > Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu > and x86_64-linux-gnu. > > According to the PR this appears at least as far back 4.6 so this isn't a > regression on the release branches, but it is a wrong-code bug. > > I'll let Honggyu upstream the testcase separately > (https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00984.html) > > I'll be testing this on the 4.8 and 4.9 branches. > Thoughts on this approach? > > Thanks, > Kyrill > > 2015-03-19 Kyrylo Tkachov <kyrylo.tkachov@arm.com> > > PR middle-end/65358 > * expr.c (memory_load_overlap): New function. > (emit_push_insn): When pushing partial args to the stack would > clobber the register part load the overlapping part into a pseudo > and put it into the hard reg after pushing.
On 04/13/2015 08:01 AM, Kyrill Tkachov wrote: > Ping. > https://gcc.gnu.org/ml/gcc-patches/2015-03/msg01014.html > > Jeff, could you help review this patch? > Or could you point me to someone who can review this? > I can't figure out from MAINTAINERS who should be in charge of this part > of the compiler. It's in the queue of things to look at. It wasn't marked as a regression and thus I deferred it. With stage1 opening up we'll be working through the queue of deferred stuff shortly. jeff
On 03/19/2015 08:39 AM, Kyrill Tkachov wrote: > Hi all, > > This patch fixes PR 65358. For details look at the excellent write-up > by Honggyu in bugzilla. The problem is that we're trying to pass a struct > partially on the stack and partially in regs during a tail-call > optimisation > but the struct we're passing is also a partial incoming arg though the > split > between stack and regs is different from its outgoing usage. > > The emit_push_insn code ends up doing a block move for the on-stack part > but > ends up overwriting the part that needs to be loaded into regs. > My first thought was to just load the regs part first and then do the stack > part but that doesn't work as multiple comments in that function indicate > (the block move being expanded to movmem or other functions being one of > the > reasons). > > My proposed solution is to detect when the overlap happens, find the > overlapping region and load it before the stack pushing into pseudos and > after the stack pushing is done move the overlapping values from the > pseudos > into the hard argument regs that they're supposed to go. > > That way this new functionality should only ever be triggered when there's > the overlap in this PR (causing wrong-code) and shouldn't affect codegen > anywhere else. > > Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu > and x86_64-linux-gnu. > > According to the PR this appears at least as far back 4.6 so this isn't a > regression on the release branches, but it is a wrong-code bug. > > I'll let Honggyu upstream the testcase separately > (https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00984.html) > > I'll be testing this on the 4.8 and 4.9 branches. > Thoughts on this approach? > > Thanks, > Kyrill > > 2015-03-19 Kyrylo Tkachov <kyrylo.tkachov@arm.com> > > PR middle-end/65358 > * expr.c (memory_load_overlap): New function. > (emit_push_insn): When pushing partial args to the stack would > clobber the register part load the overlapping part into a pseudo > and put it into the hard reg after pushing. > > expr.patch > > > commit 490c5f2074d76a2927afaea99e4dd0bacccb413c > Author: Kyrylo Tkachov<kyrylo.tkachov@arm.com> > Date: Wed Mar 18 13:42:37 2015 +0000 > > [expr.c] PR 65358 Avoid clobbering partial argument during sibcall > > diff --git a/gcc/expr.c b/gcc/expr.c > index dc13a14..d3b9156 100644 > --- a/gcc/expr.c > +++ b/gcc/expr.c > @@ -4121,6 +4121,25 @@ emit_single_push_insn (machine_mode mode, rtx x, tree type) > } > #endif > > +/* Add SIZE to X and check whether it's greater than Y. > + If it is, return the constant amount by which it's greater or smaller. > + If the two are not statically comparable (for example, X and Y contain > + different registers) return -1. This is used in expand_push_insn to > + figure out if reading SIZE bytes from location X will end up reading from > + location Y. */ > + > +static int > +memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size) > +{ > + rtx tmp = plus_constant (Pmode, x, size); > + rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y); > + > + if (!CONST_INT_P (sub)) > + return -1; > + > + return INTVAL (sub); > +} Hmmm, so what happens if the difference is < 0? I'd be a bit worried about that case for the PA (for example). So how about asserting that the INTVAL is >= 0 prior to returning so that we catch that case if it ever occurs? OK for the trunk with the added assert. Please commit the testcase from Honggyu at the same time you commit the patch. Let's let it simmer for a while on the trunk before considering it to be backported. jeff
Hi Jeff, On 17/04/15 18:26, Jeff Law wrote: > On 03/19/2015 08:39 AM, Kyrill Tkachov wrote: >> Hi all, >> >> This patch fixes PR 65358. For details look at the excellent write-up >> by Honggyu in bugzilla. The problem is that we're trying to pass a struct >> partially on the stack and partially in regs during a tail-call >> optimisation >> but the struct we're passing is also a partial incoming arg though the >> split >> between stack and regs is different from its outgoing usage. >> >> The emit_push_insn code ends up doing a block move for the on-stack part >> but >> ends up overwriting the part that needs to be loaded into regs. >> My first thought was to just load the regs part first and then do the stack >> part but that doesn't work as multiple comments in that function indicate >> (the block move being expanded to movmem or other functions being one of >> the >> reasons). >> >> My proposed solution is to detect when the overlap happens, find the >> overlapping region and load it before the stack pushing into pseudos and >> after the stack pushing is done move the overlapping values from the >> pseudos >> into the hard argument regs that they're supposed to go. >> >> That way this new functionality should only ever be triggered when there's >> the overlap in this PR (causing wrong-code) and shouldn't affect codegen >> anywhere else. >> >> Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu >> and x86_64-linux-gnu. >> >> According to the PR this appears at least as far back 4.6 so this isn't a >> regression on the release branches, but it is a wrong-code bug. >> >> I'll let Honggyu upstream the testcase separately >> (https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00984.html) >> >> I'll be testing this on the 4.8 and 4.9 branches. >> Thoughts on this approach? >> >> Thanks, >> Kyrill >> >> 2015-03-19 Kyrylo Tkachov <kyrylo.tkachov@arm.com> >> >> PR middle-end/65358 >> * expr.c (memory_load_overlap): New function. >> (emit_push_insn): When pushing partial args to the stack would >> clobber the register part load the overlapping part into a pseudo >> and put it into the hard reg after pushing. >> >> expr.patch >> >> >> commit 490c5f2074d76a2927afaea99e4dd0bacccb413c >> Author: Kyrylo Tkachov<kyrylo.tkachov@arm.com> >> Date: Wed Mar 18 13:42:37 2015 +0000 >> >> [expr.c] PR 65358 Avoid clobbering partial argument during sibcall >> >> diff --git a/gcc/expr.c b/gcc/expr.c >> index dc13a14..d3b9156 100644 >> --- a/gcc/expr.c >> +++ b/gcc/expr.c >> @@ -4121,6 +4121,25 @@ emit_single_push_insn (machine_mode mode, rtx x, tree type) >> } >> #endif >> >> +/* Add SIZE to X and check whether it's greater than Y. >> + If it is, return the constant amount by which it's greater or smaller. >> + If the two are not statically comparable (for example, X and Y contain >> + different registers) return -1. This is used in expand_push_insn to >> + figure out if reading SIZE bytes from location X will end up reading from >> + location Y. */ >> + >> +static int >> +memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size) >> +{ >> + rtx tmp = plus_constant (Pmode, x, size); >> + rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y); >> + >> + if (!CONST_INT_P (sub)) >> + return -1; >> + >> + return INTVAL (sub); >> +} > Hmmm, so what happens if the difference is < 0? I'd be a bit worried > about that case for the PA (for example). > > So how about asserting that the INTVAL is >= 0 prior to returning so > that we catch that case if it ever occurs? INTVAL being >= 0 is the case that I want to catch with this function. INTVAL <0 is the usual case on leaf call optimisation. On arm, at least, it means that x and y use the same base register (i.e. same stack frame) but the offsets are such that reading SIZE bytes from X will not overlap with Y, thus not requiring the workaround in this patch. Thus, asserting that the result is positive is not right here. What characteristic on pa makes this problematic? Is it the STACK_GROWS_UPWARD? Should I then extend this function to do something like: HOST_WIDE_INT res = INTVAL (sub); #ifndef STACK_GROWS_DOWNWARD res = -res; #endif return res? > > OK for the trunk with the added assert. Please commit the testcase from > Honggyu at the same time you commit the patch. Thanks, will do after the above is resolved. Kyrill > > Let's let it simmer for a while on the trunk before considering it to be > backported. > > jeff >
On 04/20/2015 02:25 AM, Kyrill Tkachov wrote: > Hi Jeff, >> Hmmm, so what happens if the difference is < 0? I'd be a bit worried >> about that case for the PA (for example). >> >> So how about asserting that the INTVAL is >= 0 prior to returning so >> that we catch that case if it ever occurs? > > INTVAL being >= 0 is the case that I want to catch with this function. > INTVAL <0 is the usual case on leaf call optimisation. On arm, at least, > it means that x and y use the same base register (i.e. same stack frame) > but the offsets are such that reading SIZE bytes from X will not overlap > with Y, thus not requiring the workaround in this patch. > Thus, asserting that the result is positive is not right here. > > What characteristic on pa makes this problematic? Is it the > STACK_GROWS_UPWARD? Yea or more correctly that {STACK,FRAME}_GROWS_UPWARD and ARGS_GROW_DOWNWARD. I think the stormy16 may have downward growing args too. > Should I then extend this function to do something like: > > HOST_WIDE_INT res = INTVAL (sub); > #ifndef STACK_GROWS_DOWNWARD > res = -res; > #endif > > return res? It certainly feels like something is needed for targets where growth is in the opposite direction -- but my guess is that without a concrete case that triggers on those targets (just the PA in 64 bit mode and stormy?) we'll probably get it wrong in one way or another. Hence my suggestion that we assert rather than try to handle it and silently generate incorrect code in the process. Jeff
On 20/04/15 19:02, Jeff Law wrote: > On 04/20/2015 02:25 AM, Kyrill Tkachov wrote: >> Hi Jeff, >>> Hmmm, so what happens if the difference is < 0? I'd be a bit worried >>> about that case for the PA (for example). >>> >>> So how about asserting that the INTVAL is >= 0 prior to returning so >>> that we catch that case if it ever occurs? >> INTVAL being >= 0 is the case that I want to catch with this function. >> INTVAL <0 is the usual case on leaf call optimisation. On arm, at least, >> it means that x and y use the same base register (i.e. same stack frame) >> but the offsets are such that reading SIZE bytes from X will not overlap >> with Y, thus not requiring the workaround in this patch. >> Thus, asserting that the result is positive is not right here. >> >> What characteristic on pa makes this problematic? Is it the >> STACK_GROWS_UPWARD? > Yea or more correctly that {STACK,FRAME}_GROWS_UPWARD and > ARGS_GROW_DOWNWARD. I think the stormy16 may have downward growing args > too. > > >> Should I then extend this function to do something like: >> >> HOST_WIDE_INT res = INTVAL (sub); >> #ifndef STACK_GROWS_DOWNWARD >> res = -res; >> #endif >> >> return res? > It certainly feels like something is needed for targets where growth is > in the opposite direction -- but my guess is that without a concrete > case that triggers on those targets (just the PA in 64 bit mode and > stormy?) we'll probably get it wrong in one way or another. Hence my > suggestion that we assert rather than try to handle it and silently > generate incorrect code in the process. However, this function is expected to return negative numbers when there is no overlap i.e. in the vast majority of cases when this bug doesn't manifest. So asserting that it's positive is just going to ICE at -O2 in almost any code. From reading config/stormy16/stormy-abi it seems to me that we don't pass arguments partially in stormy16, so this code would never be called there. That leaves pa as the potential problematic target. I don't suppose there's an easy way to test on pa? My checkout of binutils doesn't seem to include a sim target for it. Kyrill > > > Jeff >
On 04/21/2015 02:30 AM, Kyrill Tkachov wrote: > > From reading config/stormy16/stormy-abi it seems to me that we don't > pass arguments partially in stormy16, so this code would never be called > there. That leaves pa as the potential problematic target. > I don't suppose there's an easy way to test on pa? My checkout of binutils > doesn't seem to include a sim target for it. No simulator, no machines in the testfarm, the box I had access to via parisc-linux.org seems dead and my ancient PA overheats well before a bootstrap could complete. I often regret knowing about the backwards way many things were done on the PA because it makes me think about cases that only matter on dead architectures. Jeff
On 21/04/15 15:09, Jeff Law wrote: > On 04/21/2015 02:30 AM, Kyrill Tkachov wrote: >> From reading config/stormy16/stormy-abi it seems to me that we don't >> pass arguments partially in stormy16, so this code would never be called >> there. That leaves pa as the potential problematic target. >> I don't suppose there's an easy way to test on pa? My checkout of binutils >> doesn't seem to include a sim target for it. > No simulator, no machines in the testfarm, the box I had access to via > parisc-linux.org seems dead and my ancient PA overheats well before a > bootstrap could complete. I often regret knowing about the backwards > way many things were done on the PA because it makes me think about > cases that only matter on dead architectures. So what should be the action plan here? I can't add an assert on positive result as a negative result is valid. We want to catch the case where this would cause trouble on pa, or change the patch until we're confident that it's fine for pa. That being said, reading the documentation of STACK_GROWS_UPWARD and ARGS_GROW_DOWNWARD I'm having a hard time visualising a case where this would cause trouble on pa. Is the problem that in the function: +/* Add SIZE to X and check whether it's greater than Y. + If it is, return the constant amount by which it's greater or smaller. + If the two are not statically comparable (for example, X and Y contain + different registers) return -1. This is used in expand_push_insn to + figure out if reading SIZE bytes from location X will end up reading from + location Y. */ +static int +memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size) +{ + rtx tmp = plus_constant (Pmode, x, size); + rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y); + + if (!CONST_INT_P (sub)) + return -1; + + return INTVAL (sub); +} for ARGS_GROW_DOWNWARD we would be reading 'backwards' from x, so the function should something like the following? static int memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size) { #ifdef ARGS_GROW_DOWNWARD rtx tmp = plus_constant (Pmode, x, -size); #else rtx tmp = plus_constant (Pmode, x, size); #endif rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y); if (!CONST_INT_P (sub)) return -1; #ifdef ARGS_GROW_DOWNWARD return INTVAL (-sub); #else return INTVAL (sub); #endif } now, say for x == sp + 4, y == sp + 8, size == 16: This would be a problematic case for arm, so this code on arm (where ARGS_GROW_DOWNWARD is *not* defined) would return 12, which is the number of bytes that overlap. On a target where ARGS_GROW_DOWNWARD is defined this would return -20, meaning that no overlap occurs (because we read in the descending direction from x, IIUC). Thanks, Kyrill > > > Jeff >
On 04/21/2015 11:33 AM, Kyrill Tkachov wrote: > > On 21/04/15 15:09, Jeff Law wrote: >> On 04/21/2015 02:30 AM, Kyrill Tkachov wrote: >>> From reading config/stormy16/stormy-abi it seems to me that we don't >>> pass arguments partially in stormy16, so this code would never be called >>> there. That leaves pa as the potential problematic target. >>> I don't suppose there's an easy way to test on pa? My checkout of >>> binutils >>> doesn't seem to include a sim target for it. >> No simulator, no machines in the testfarm, the box I had access to via >> parisc-linux.org seems dead and my ancient PA overheats well before a >> bootstrap could complete. I often regret knowing about the backwards >> way many things were done on the PA because it makes me think about >> cases that only matter on dead architectures. > > So what should be the action plan here? I can't add an assert on > positive result as a negative result is valid. > > We want to catch the case where this would cause trouble on > pa, or change the patch until we're confident that it's fine > for pa. > > That being said, reading the documentation of STACK_GROWS_UPWARD > and ARGS_GROW_DOWNWARD I'm having a hard time visualising a case > where this would cause trouble on pa. > > Is the problem that in the function: > > +/* Add SIZE to X and check whether it's greater than Y. > + If it is, return the constant amount by which it's greater or smaller. > + If the two are not statically comparable (for example, X and Y contain > + different registers) return -1. This is used in expand_push_insn to > + figure out if reading SIZE bytes from location X will end up reading > from > + location Y. */ > +static int > +memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size) > +{ > + rtx tmp = plus_constant (Pmode, x, size); > + rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y); > + > + if (!CONST_INT_P (sub)) > + return -1; > + > + return INTVAL (sub); > +} > > for ARGS_GROW_DOWNWARD we would be reading 'backwards' from x, > so the function should something like the following? So I had to go back and compile some simple examples. References to outgoing arguments will be SP relative. References to the incoming arguments will be ARGP relative. And that brings me to the another issue. Isn't X in this context the incoming argument slot and the destination an outgoing argument slot? If so, the approach of memory_load_overlap simply won't work on a target with calling conventions like the PA. And you might really want to consider punting for these kind of calling conventions If you hadn't already done the work, I'd suggest punting for any case where we have args partially in regs and partially in memory :-) More thoughts when I can get an hour or two to remind myself how all this stuff works on the PA. I will note that testing on the PA is unlikely to show anything simply because it uses 8 parameter passing registers. So it's rare to pass anything in memory at all. Even rarer to have something partially in memory and partially in registers. Jeff
commit 490c5f2074d76a2927afaea99e4dd0bacccb413c Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com> Date: Wed Mar 18 13:42:37 2015 +0000 [expr.c] PR 65358 Avoid clobbering partial argument during sibcall diff --git a/gcc/expr.c b/gcc/expr.c index dc13a14..d3b9156 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -4121,6 +4121,25 @@ emit_single_push_insn (machine_mode mode, rtx x, tree type) } #endif +/* Add SIZE to X and check whether it's greater than Y. + If it is, return the constant amount by which it's greater or smaller. + If the two are not statically comparable (for example, X and Y contain + different registers) return -1. This is used in expand_push_insn to + figure out if reading SIZE bytes from location X will end up reading from + location Y. */ + +static int +memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size) +{ + rtx tmp = plus_constant (Pmode, x, size); + rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y); + + if (!CONST_INT_P (sub)) + return -1; + + return INTVAL (sub); +} + /* Generate code to push X onto the stack, assuming it has mode MODE and type TYPE. MODE is redundant except when X is a CONST_INT (since they don't @@ -4179,6 +4198,10 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size, xinner = x; + int nregs = partial / UNITS_PER_WORD; + rtx *tmp_regs = NULL; + int overlapping = 0; + if (mode == BLKmode || (STRICT_ALIGNMENT && align < GET_MODE_ALIGNMENT (mode))) { @@ -4309,6 +4332,35 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size, PARM_BOUNDARY. Assume the caller isn't lying. */ set_mem_align (target, align); + /* If part should go in registers and pushing to that part would + overwrite some of the values that need to go into regs, load the + overlapping values into temporary pseudos to be moved into the hard + regs at the end after the stack pushing has completed. + We cannot load them directly into the hard regs here because + they can be clobbered by the block move expansions. + See PR 65358. */ + + if (partial > 0 && reg != 0 && mode == BLKmode + && GET_CODE (reg) != PARALLEL) + { + overlapping = memory_load_overlap (XEXP (x, 0), temp, partial); + if (overlapping > 0) + { + gcc_assert (overlapping % UNITS_PER_WORD == 0); + overlapping /= UNITS_PER_WORD; + + tmp_regs = XALLOCAVEC (rtx, overlapping); + + for (int i = 0; i < overlapping; i++) + tmp_regs[i] = gen_reg_rtx (word_mode); + + for (int i = 0; i < overlapping; i++) + emit_move_insn (tmp_regs[i], + operand_subword_force (target, i, mode)); + } + else + overlapping = 0; + } emit_block_move (target, xinner, size, BLOCK_OP_CALL_PARM); } } @@ -4411,9 +4463,8 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size, } } - /* If part should go in registers, copy that part - into the appropriate registers. Do this now, at the end, - since mem-to-mem copies above may do function calls. */ + /* Move the partial arguments into the registers and any overlapping + values that we moved into the pseudos in tmp_regs. */ if (partial > 0 && reg != 0) { /* Handle calls that pass values in multiple non-contiguous locations. @@ -4421,9 +4472,15 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size, if (GET_CODE (reg) == PARALLEL) emit_group_load (reg, x, type, -1); else - { + { gcc_assert (partial % UNITS_PER_WORD == 0); - move_block_to_reg (REGNO (reg), x, partial / UNITS_PER_WORD, mode); + move_block_to_reg (REGNO (reg), x, nregs - overlapping, mode); + + for (int i = 0; i < overlapping; i++) + emit_move_insn (gen_rtx_REG (word_mode, REGNO (reg) + + nregs - overlapping + i), + tmp_regs[i]); + } }