From patchwork Wed May 25 13:32:41 2016
X-Patchwork-Submitter: Dominik Vogt
X-Patchwork-Id: 626188
Date: Wed, 25 May 2016 14:32:41 +0100
From: Dominik Vogt <vogt@linux.vnet.ibm.com>
To: Jeff Law, gcc-patches@gcc.gnu.org, Andreas Krebbel
Subject: Re: [PATCH 2/2][v3] Drop excess size used for run time allocated stack variables.
Message-ID: <20160525133241.GB6938@linux.vnet.ibm.com>
Reply-To: vogt@linux.vnet.ibm.com
References: <20160429221242.GA2205@linux.vnet.ibm.com> <20160503141753.GA17351@linux.vnet.ibm.com> <20160525133054.GA6938@linux.vnet.ibm.com>
In-Reply-To: <20160525133054.GA6938@linux.vnet.ibm.com>

On Wed, May 25, 2016 at 02:30:54PM +0100, Dominik Vogt wrote:
> On Tue, May 03, 2016 at 03:17:53PM +0100, Dominik Vogt wrote:
> > Version two of the patch including a test case.
> >
> > On Mon, May 02, 2016 at 09:10:25AM -0600, Jeff Law wrote:
> > > On 04/29/2016 04:12 PM, Dominik Vogt wrote:
> > > > The attached patch removes excess stack space allocation with
> > > > alloca in some situations.  Please check the commit message in the
> > > > patch for details.
> > >
> > > However, I would strongly recommend some tests, even if they are
> > > target specific.  You can always copy pr36728-1 into the s390x
> > > directory and look at the size of the generated stack.  Similarly
> > > for pr50938 for x86.
> >
> > However, x86 uses the "else" branch in round_push, i.e. it uses
> > "virtual_preferred_stack_boundary_rtx" to calculate the number of
> > bytes to add for stack alignment.  That value is unknown at the
> > time round_push is called, so the test case fails on such targets,
> > and I've no idea how to fix this properly.
>
> Third version of the patch with the suggested cleanup in the first
> patch and the functional stuff in the second one.  The first patch
> is based on Jeff's draft with the change suggested by Eric and
> more cleanup added by me.

This is the updated functional patch.  Re-tested with limited effort,
i.e. tested and bootstrapped on s390x biarch (but did not look for
performance regressions compared to version 2 of the patch).
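For a concrete picture of the situation being discussed, here is a minimal
standalone sketch of the kind of run-time sized stack allocation the test
cases (pr36728-1.c style) exercise.  It is not taken from the testsuite or
the patch; the names f and use_buffer are invented for this illustration:

  /* Prevent the allocation from being optimized away.  */
  __attribute__ ((noinline))
  static void
  use_buffer (char *p)
  {
    p[0] = 1;
  }

  /* A run-time sized stack allocation; with the old rounding code the
     compiler could reserve more stack for X than alignment requires.  */
  __attribute__ ((noinline))
  static void
  f (unsigned int n)
  {
    char *x = __builtin_alloca (n);
    use_buffer (x);
  }

  int
  main (void)
  {
    f (2);
    return 0;
  }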
Ciao

Dominik ^_^  ^_^

From 4296d353e1d153b5b5ee435a44cae6117bf2fff0 Mon Sep 17 00:00:00 2001
From: Dominik Vogt <vogt@linux.vnet.ibm.com>
Date: Fri, 29 Apr 2016 08:36:59 +0100
Subject: [PATCH 2/2] Drop excess size used for run time allocated stack
 variables.

The present calculation sometimes led to more stack memory being used than
necessary with alloca.  First, (STACK_BOUNDARY - 1) would be added to the
allocated size:

  size = plus_constant (Pmode, size, extra);
  size = force_operand (size, NULL_RTX);

Then round_push was called and added another (STACK_BOUNDARY - 1) before
rounding down to a multiple of STACK_BOUNDARY.  On s390x this resulted in
adding 14 before rounding down for "x" in the test case pr36728-1.c.

round_push() now takes an argument to inform it about what has already been
added to size.
---
 gcc/explow.c                   | 45 +++++++++++++++++++++---------------
 gcc/testsuite/gcc.dg/pr50938.c | 52 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr50938.c

diff --git a/gcc/explow.c b/gcc/explow.c
index 09a0330..85596e2 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -949,24 +949,30 @@ anti_adjust_stack (rtx adjust)
 }
 
 /* Round the size of a block to be pushed up to the boundary required
-   by this machine.  SIZE is the desired size, which need not be constant.  */
+   by this machine.  SIZE is the desired size, which need not be constant.
+   ALREADY_ADDED is the number of units that have already been added to SIZE
+   for other alignment reasons.
+*/
 
 static rtx
-round_push (rtx size)
+round_push (rtx size, int already_added)
 {
-  rtx align_rtx, alignm1_rtx;
+  rtx align_rtx, add_rtx;
 
   if (!SUPPORTS_STACK_ALIGNMENT
       || crtl->preferred_stack_boundary == MAX_SUPPORTED_STACK_ALIGNMENT)
     {
       int align = crtl->preferred_stack_boundary / BITS_PER_UNIT;
+      int add;
 
       if (align == 1)
         return size;
 
+      add = (align > already_added) ? align - already_added - 1 : 0;
+
       if (CONST_INT_P (size))
         {
-          HOST_WIDE_INT new_size = (INTVAL (size) + align - 1) / align * align;
+          HOST_WIDE_INT new_size = (INTVAL (size) + add) / align * align;
 
           if (INTVAL (size) != new_size)
             size = GEN_INT (new_size);
@@ -974,7 +980,7 @@ round_push (rtx size)
         }
 
       align_rtx = GEN_INT (align);
-      alignm1_rtx = GEN_INT (align - 1);
+      add_rtx = (add > 0) ? GEN_INT (add) : const0_rtx;
     }
   else
     {
@@ -983,15 +989,15 @@ round_push (rtx size)
          substituted by the right value in vregs pass and optimized
          during combine.  */
       align_rtx = virtual_preferred_stack_boundary_rtx;
-      alignm1_rtx = force_operand (plus_constant (Pmode, align_rtx, -1),
-                                   NULL_RTX);
+      add_rtx = force_operand (plus_constant (Pmode, align_rtx, -1), NULL_RTX);
     }
 
   /* CEIL_DIV_EXPR needs to worry about the addition overflowing,
      but we know it can't.  So add ourselves and then do
      TRUNC_DIV_EXPR.  */
-  size = expand_binop (Pmode, add_optab, size, alignm1_rtx,
-                       NULL_RTX, 1, OPTAB_LIB_WIDEN);
+  if (add_rtx != const0_rtx)
+    size = expand_binop (Pmode, add_optab, size, add_rtx,
+                         NULL_RTX, 1, OPTAB_LIB_WIDEN);
   size = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, size, align_rtx,
                         NULL_RTX, 1);
   size = expand_mult (Pmode, size, align_rtx, NULL_RTX, 1);
@@ -1174,7 +1180,7 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
   HOST_WIDE_INT stack_usage_size = -1;
   rtx_code_label *final_label;
   rtx final_target, target;
-  unsigned extra;
+  unsigned extra = 0;
 
   /* If we're asking for zero bytes, it doesn't matter what we point
      to since we can't dereference it.  But return a reasonable
@@ -1251,15 +1257,18 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
      example), so we must preventively align the value.  We leave space
      in SIZE for the hole that might result from the alignment operation.  */
 
-  extra = (required_align - BITS_PER_UNIT) / BITS_PER_UNIT;
-  size = plus_constant (Pmode, size, extra);
-  size = force_operand (size, NULL_RTX);
+  if (required_align > BITS_PER_UNIT)
+    {
+      extra = (required_align - BITS_PER_UNIT) / BITS_PER_UNIT;
+      size = plus_constant (Pmode, size, extra);
+      size = force_operand (size, NULL_RTX);
 
-  if (flag_stack_usage_info)
-    stack_usage_size += extra;
+      if (flag_stack_usage_info)
+        stack_usage_size += extra;
 
-  if (extra && size_align > BITS_PER_UNIT)
-    size_align = BITS_PER_UNIT;
+      if (size_align > BITS_PER_UNIT)
+        size_align = BITS_PER_UNIT;
+    }
 
   /* Round the size to a multiple of the required stack alignment.
      Since the stack if presumed to be rounded before this allocation,
@@ -1276,7 +1285,7 @@ allocate_dynamic_stack_space (rtx size, unsigned size_align,
      momentarily mis-aligning the stack.  */
   if (size_align % MAX_SUPPORTED_STACK_ALIGNMENT != 0)
     {
-      size = round_push (size);
+      size = round_push (size, extra);
 
       if (flag_stack_usage_info)
         {
diff --git a/gcc/testsuite/gcc.dg/pr50938.c b/gcc/testsuite/gcc.dg/pr50938.c
new file mode 100644
index 0000000..14b7540
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr50938.c
@@ -0,0 +1,52 @@
+/* PR/50938: Check that alloca () reserves the correct amount of stack space.
+ */
+
+/* { dg-do run } */
+/* { dg-options "-O0" } */
+/* { dg-require-effective-target alloca } */
+
+__attribute__ ((noinline))
+static void
+dummy (void)
+{
+}
+
+__attribute__ ((noinline))
+static char *
+get_frame_addr (void *p)
+{
+  void *faddr = __builtin_frame_address (0);
+  /* Call a function to make sure that a stack frame exists.  */
+  dummy ();
+  return faddr;
+}
+
+__attribute__ ((noinline))
+static void *
+stack_var (void)
+{
+  /* 1024 bytes on the stack.  */
+  char s[1024];
+  /* One stack slot.  */
+  void *p = (void *)s;
+  /* Keep stack memory alive by passing it to the function.  */
+  return get_frame_addr (p);
+}
+
+__attribute__ ((noinline))
+static void *
+alloca_var (void)
+{
+  /* 1024 bytes on the stack plus one stack slot for a.  */
+  void *a = __builtin_alloca (1024);
+  return get_frame_addr (a);
+}
+
+int main (void)
+{
+  if (stack_var () != alloca_var ())
+    /* The functions used a different amount of stack space.  */
+    __builtin_abort ();
+
+  return 0;
+}
-- 
2.3.0
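To put numbers on the commit message, here is a small standalone sketch of
the constant-size path of round_push before and after the change.  It assumes
an 8-byte preferred stack boundary and the 7 bytes of padding that
allocate_dynamic_stack_space adds up front, as on s390x; round_old and
round_new are names invented only for this comparison and are not part of
the patch:

  #include <stdio.h>

  /* Old behaviour: pad by another align - 1 even though SIZE already
     contains alignment padding, then round down.  */
  static long
  round_old (long size, int align)
  {
    return (size + align - 1) / align * align;
  }

  /* New behaviour: only pad by what is still missing.  */
  static long
  round_new (long size, int align, int already_added)
  {
    int add = (align > already_added) ? align - already_added - 1 : 0;

    return (size + add) / align * align;
  }

  int
  main (void)
  {
    /* A 2-byte alloca request; the caller has already added 7 bytes.  */
    long size = 2 + 7;

    printf ("old: %ld bytes\n", round_old (size, 8));    /* prints 16 */
    printf ("new: %ld bytes\n", round_new (size, 8, 7)); /* prints 8 */
    return 0;
  }

In other words, with the patch the 2-byte request ends up in a single 8-byte
slot instead of two.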