From patchwork Tue Apr 28 09:54:03 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kyrill Tkachov X-Patchwork-Id: 465454 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id B80DE14016A for ; Tue, 28 Apr 2015 19:54:18 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass reason="1024-bit key; unprotected key" header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=Lv++gmvw; dkim-adsp=none (unprotected policy); dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:cc:subject:references :in-reply-to:content-type; q=dns; s=default; b=KZGdp+/u0bqfKA0J0 5eL1uRgllDnNb98Vht44auCt2PuivJpVQlWGZfesqY0lq92bwtaIKDSbwqgDbBzL RoHCH5M1gmID47ufO06TsMUIuLiCdApbOLmIHm1UNuRbaxC7lF7/+Sk2oulATtDL 1KftuiuD9WncYeJqa32M0m5sSs= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:cc:subject:references :in-reply-to:content-type; s=default; bh=D7p5POwINX0ciZvsntTOFBk wBrI=; b=Lv++gmvwkm8GSbvcerxz+x9Yh1jbAGuBySFjke6V2wryUc1dUAYq9cy mTuq3Jq56UHI1d7XJNqdJN3nML1IH2Aw8TxtfHHnWejsP2bhAbW+f0nEyX1Z//HW Ht7NSbkhkN1GR3Qx5xnx8fyf7mFVtEjVpQu2DlK74rv9YU8j2zGw= Received: (qmail 49893 invoked by alias); 28 Apr 2015 09:54:09 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 49880 invoked by uid 89); 28 Apr 2015 09:54:08 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.9 required=5.0 tests=BAYES_00, KAM_LAZY_DOMAIN_SECURITY, T_RP_MATCHES_RCVD autolearn=no version=3.3.2 X-HELO: foss.arm.com Received: from foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 28 Apr 2015 09:54:07 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 0EA3729; Tue, 28 Apr 2015 02:53:41 -0700 (PDT) Received: from [10.2.207.50] (e100706-lin.cambridge.arm.com [10.2.207.50]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id DDD983F21A; Tue, 28 Apr 2015 02:54:04 -0700 (PDT) Message-ID: <553F58BB.4070508@foss.arm.com> Date: Tue, 28 Apr 2015 10:54:03 +0100 From: Kyrill Tkachov User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: Jeff Law , GCC Patches CC: Honggyu Kim Subject: Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall References: <550ADF8F.7030300@arm.com> <55314236.2000806@redhat.com> <5534B811.2020107@arm.com> <55353F25.9010906@redhat.com> <55360AA0.9070106@arm.com> <55365A06.7010408@redhat.com> <553689F0.5090606@arm.com> <553E9869.30502@redhat.com> In-Reply-To: <553E9869.30502@redhat.com> On 27/04/15 21:13, Jeff Law wrote: > On 04/21/2015 11:33 AM, Kyrill Tkachov wrote: >> On 21/04/15 15:09, Jeff Law wrote: >>> On 04/21/2015 02:30 AM, Kyrill Tkachov wrote: >>>> From reading config/stormy16/stormy-abi it seems to me that we don't >>>> pass arguments partially in stormy16, so this code would never be called >>>> there. That leaves pa as the potential problematic target. >>>> I don't suppose there's an easy way to test on pa? My checkout of >>>> binutils >>>> doesn't seem to include a sim target for it. >>> No simulator, no machines in the testfarm, the box I had access to via >>> parisc-linux.org seems dead and my ancient PA overheats well before a >>> bootstrap could complete. I often regret knowing about the backwards >>> way many things were done on the PA because it makes me think about >>> cases that only matter on dead architectures. >> So what should be the action plan here? I can't add an assert on >> positive result as a negative result is valid. >> >> We want to catch the case where this would cause trouble on >> pa, or change the patch until we're confident that it's fine >> for pa. >> >> That being said, reading the documentation of STACK_GROWS_UPWARD >> and ARGS_GROW_DOWNWARD I'm having a hard time visualising a case >> where this would cause trouble on pa. >> >> Is the problem that in the function: >> >> +/* Add SIZE to X and check whether it's greater than Y. >> + If it is, return the constant amount by which it's greater or smaller. >> + If the two are not statically comparable (for example, X and Y contain >> + different registers) return -1. This is used in expand_push_insn to >> + figure out if reading SIZE bytes from location X will end up reading >> from >> + location Y. */ >> +static int >> +memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size) >> +{ >> + rtx tmp = plus_constant (Pmode, x, size); >> + rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y); >> + >> + if (!CONST_INT_P (sub)) >> + return -1; >> + >> + return INTVAL (sub); >> +} >> >> for ARGS_GROW_DOWNWARD we would be reading 'backwards' from x, >> so the function should something like the following? > So I had to go back and compile some simple examples. > > References to outgoing arguments will be SP relative. References to the > incoming arguments will be ARGP relative. And that brings me to the > another issue. Isn't X in this context the incoming argument slot and > the destination an outgoing argument slot? > > If so, the approach of memory_load_overlap simply won't work on a target > with calling conventions like the PA. And you might really want to > consider punting for these kind of calling conventions Ok, thanks for the guidance. How about this? This patch disables sibcall optimisation when encountering a partial argument when ARGS_GROW_DOWNWARD && !STACK_GROWS_DOWNWARD. Hopefully this shouldn't harm codegen on parisc if, as you say, it's rare to have partial arguments anyway on PA due to the large number of argument regs. I tested this on arm and bootstrapped on x86_64. I am now going through the process of getting access to a Debian PA machine to give it a test there (thanks Dave!) Ok if testing comes clean? Thanks, Kyrill 2015-04-28 Kyrylo Tkachov PR target/65358 * calls.c (expand_call): Cancel sibcall optimisation when encountering partial argument on targets with ARGS_GROW_DOWNWARD and !STACK_GROWS_DOWNWARD. * expr.c (memory_load_overlap): New function. (emit_push_insn): When pushing partial args to the stack would clobber the register part load the overlapping part into a pseudo and put it into the hard reg after pushing. 2015-04-28 Honggyu Kim PR target/65358 * gcc.dg/pr65358.c: New test. > > If you hadn't already done the work, I'd suggest punting for any case > where we have args partially in regs and partially in memory :-) > > More thoughts when I can get an hour or two to remind myself how all > this stuff works on the PA. > > I will note that testing on the PA is unlikely to show anything simply > because it uses 8 parameter passing registers. So it's rare to pass > anything in memory at all. Even rarer to have something partially in > memory and partially in registers. > > > > Jeff > > commit 61beb521f63db2f7178e2c41e3747982ddcc8869 Author: Kyrylo Tkachov Date: Wed Mar 18 13:42:37 2015 +0000 [expr.c] PR 65358 Avoid clobbering partial argument during sibcall diff --git a/gcc/calls.c b/gcc/calls.c index 3be7ca5..8159232 100644 --- a/gcc/calls.c +++ b/gcc/calls.c @@ -3229,6 +3229,13 @@ expand_call (tree exp, rtx target, int ignore) { rtx_insn *before_arg = get_last_insn (); + /* On targets with weird calling conventions (e.g. PA) it's + hard to ensure that all cases of argument overlap between + stack and registers work. Play it safe and bail out. */ +#if defined (ARGS_GROW_DOWNWARD) && !defined (STACK_GROWS_DOWNWARD) + sibcall_failure = 1; + break; +#endif if (store_one_arg (&args[i], argblock, flags, adjusted_args_size.var != 0, reg_parm_stack_space) diff --git a/gcc/expr.c b/gcc/expr.c index 530a944..071a335 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -4121,6 +4121,24 @@ emit_single_push_insn (machine_mode mode, rtx x, tree type) } #endif +/* If reading SIZE bytes from X will end up reading from + Y return the number of bytes that overlap. Return -1 + in all other cases. */ + +static int +memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size) +{ + rtx tmp = plus_constant (Pmode, x, size); + rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y); + + if (!CONST_INT_P (sub)) + return -1; + + HOST_WIDE_INT val = INTVAL (sub); + + return IN_RANGE (val, 1, size) ? val : -1; +} + /* Generate code to push X onto the stack, assuming it has mode MODE and type TYPE. MODE is redundant except when X is a CONST_INT (since they don't @@ -4179,6 +4197,10 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size, xinner = x; + int nregs = partial / UNITS_PER_WORD; + rtx *tmp_regs = NULL; + int overlapping = 0; + if (mode == BLKmode || (STRICT_ALIGNMENT && align < GET_MODE_ALIGNMENT (mode))) { @@ -4309,6 +4331,35 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size, PARM_BOUNDARY. Assume the caller isn't lying. */ set_mem_align (target, align); + /* If part should go in registers and pushing to that part would + overwrite some of the values that need to go into regs, load the + overlapping values into temporary pseudos to be moved into the hard + regs at the end after the stack pushing has completed. + We cannot load them directly into the hard regs here because + they can be clobbered by the block move expansions. + See PR 65358. */ + + if (partial > 0 && reg != 0 && mode == BLKmode + && GET_CODE (reg) != PARALLEL) + { + overlapping = memory_load_overlap (XEXP (x, 0), temp, partial); + if (overlapping > 0) + { + gcc_assert (overlapping % UNITS_PER_WORD == 0); + overlapping /= UNITS_PER_WORD; + + tmp_regs = XALLOCAVEC (rtx, overlapping); + + for (int i = 0; i < overlapping; i++) + tmp_regs[i] = gen_reg_rtx (word_mode); + + for (int i = 0; i < overlapping; i++) + emit_move_insn (tmp_regs[i], + operand_subword_force (target, i, mode)); + } + else + overlapping = 0; + } emit_block_move (target, xinner, size, BLOCK_OP_CALL_PARM); } } @@ -4411,9 +4462,8 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size, } } - /* If part should go in registers, copy that part - into the appropriate registers. Do this now, at the end, - since mem-to-mem copies above may do function calls. */ + /* Move the partial arguments into the registers and any overlapping + values that we moved into the pseudos in tmp_regs. */ if (partial > 0 && reg != 0) { /* Handle calls that pass values in multiple non-contiguous locations. @@ -4421,9 +4471,15 @@ emit_push_insn (rtx x, machine_mode mode, tree type, rtx size, if (GET_CODE (reg) == PARALLEL) emit_group_load (reg, x, type, -1); else - { + { gcc_assert (partial % UNITS_PER_WORD == 0); - move_block_to_reg (REGNO (reg), x, partial / UNITS_PER_WORD, mode); + move_block_to_reg (REGNO (reg), x, nregs - overlapping, mode); + + for (int i = 0; i < overlapping; i++) + emit_move_insn (gen_rtx_REG (word_mode, REGNO (reg) + + nregs - overlapping + i), + tmp_regs[i]); + } } diff --git a/gcc/testsuite/gcc.dg/pr65358.c b/gcc/testsuite/gcc.dg/pr65358.c new file mode 100644 index 0000000..ba89fd4 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr65358.c @@ -0,0 +1,33 @@ +/* { dg-do run } */ +/* { dg-options "-O2" } */ + +struct pack +{ + int fine; + int victim; + int killer; +}; + +int __attribute__ ((__noinline__, __noclone__)) +bar (int a, int b, struct pack p) +{ + if (a != 20 || b != 30) + __builtin_abort (); + if (p.fine != 40 || p.victim != 50 || p.killer != 60) + __builtin_abort (); + return 0; +} + +int __attribute__ ((__noinline__, __noclone__)) +foo (int arg1, int arg2, int arg3, struct pack p) +{ + return bar (arg2, arg3, p); +} + +int main (void) +{ + struct pack p = { 40, 50, 60 }; + + (void) foo (10, 20, 30, p); + return 0; +}