From patchwork Fri Sep 17 21:31:35 2010 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "H.J. Lu" X-Patchwork-Id: 65106 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id 791D71007D2 for ; Sat, 18 Sep 2010 07:31:47 +1000 (EST) Received: (qmail 11530 invoked by alias); 17 Sep 2010 21:31:45 -0000 Received: (qmail 11518 invoked by uid 22791); 17 Sep 2010 21:31:43 -0000 X-SWARE-Spam-Status: No, hits=-1.8 required=5.0 tests=AWL, BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, FREEMAIL_FROM, RCVD_IN_DNSWL_NONE X-Spam-Check-By: sourceware.org Received: from mail-vw0-f47.google.com (HELO mail-vw0-f47.google.com) (209.85.212.47) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 17 Sep 2010 21:31:37 +0000 Received: by vws9 with SMTP id 9so1913681vws.20 for ; Fri, 17 Sep 2010 14:31:35 -0700 (PDT) MIME-Version: 1.0 Received: by 10.220.123.42 with SMTP id n42mr2885983vcr.48.1284759095449; Fri, 17 Sep 2010 14:31:35 -0700 (PDT) Received: by 10.220.202.9 with HTTP; Fri, 17 Sep 2010 14:31:35 -0700 (PDT) In-Reply-To: References: <20100917165833.GA17163@intel.com> <4C93A958.6000502@redhat.com> <20100917181519.GE1269@tyan-ft48-01.lab.bos.redhat.com> Date: Fri, 17 Sep 2010 14:31:35 -0700 Message-ID: Subject: Re: PATCH: PR middle-end/45678: [4.4/4.5/4.6 Regression] crash on vector code with -m32 -msse From: "H.J. Lu" To: Jakub Jelinek Cc: Richard Henderson , gcc-patches@gcc.gnu.org X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org On Fri, Sep 17, 2010 at 1:03 PM, H.J. Lu wrote: > On Fri, Sep 17, 2010 at 12:18 PM, H.J. Lu wrote: >> On Fri, Sep 17, 2010 at 11:32 AM, H.J. Lu wrote: >>> On Fri, Sep 17, 2010 at 11:15 AM, Jakub Jelinek wrote: >>>> On Fri, Sep 17, 2010 at 10:46:00AM -0700, Richard Henderson wrote: >>>>> On 09/17/2010 09:58 AM, H.J. Lu wrote: >>>>> >     PR middle-end/45678 >>>>> >     * cfgexpand.c (update_stack_alignment): New. >>>>> >     (get_decl_align_unit): Use it. >>>>> >     (expand_one_stack_var_at): Call update_stack_alignment. >>>>> >>>>> Ok. >>>> >>>> I don't deny this fixes the problem and probably is a good idea in any case, >>>> the question I have is just if this won't pessimize i?86 code too much, >>>> especially with non-default options where incoming stack boundary is smaller >>>> than PREFERRED_STACK_BOUNDARY. >>>> The thing is that this expand_one_stack_var_at is just an optimization, >>>> but we don't know if anything will actually make use of the increased >>>> alignment.  If there will be just variables that need 32-bit alignment >>>> and say incoming stack alignment is also 32-bit, but preferred stack >>>> boundary is 128-bit, as soon as some variable will be placed in slot 16 >>>> bytes from the base, we'll force realignment even when nothing actually >>>> needs it. >>>> >>>> So, I wonder if >>>>      max_align = MAX (crtl->max_used_stack_slot_alignment, >>>>                       PREFERRED_STACK_BOUNDARY); >>>> shouldn't be replaced by something like: >>>> 1) >>>>      max_align = MAX (crtl->max_used_stack_slot_alignment, >>>>                       SUPPORTS_STACK_ALIGNMENT >>>>                       ? STACK_BOUNDARY : PREFERRED_STACK_BOUNDARY); >>>> or: >>>> 2) >>>>      max_align = MAX (crtl->max_used_stack_slot_alignment, >>>>                       SUPPORTS_STACK_ALIGNMENT >>>>                       ? INCOMING_STACK_BOUNDARY : PREFERRED_STACK_BOUNDARY); >>>> or: >>>> 3) >>>>      max_align = SUPPORTS_STACK_ALIGNMENT >>>>                  ? MAX (crtl->max_used_stack_slot_alignment, STACK_BOUNDARY) >>>>                  : PREFERRED_STACK_BOUNDARY; >>>> or: >>>> 4) >>>>      max_align = SUPPORTS_STACK_ALIGNMENT >>>>                  ? MAX (crtl->max_used_stack_slot_alignment, >>>>                         INCOMING_STACK_BOUNDARY) >>>>                  : PREFERRED_STACK_BOUNDARY; >>>> For 2) or 4) we'd need to do >>>>  if (targetm.calls.update_stack_boundary) >>>>    targetm.calls.update_stack_boundary (); >>>> earlier (before all the expand_one_stack_var_at calls). >>>> 1) and 3) are conservative choices for i?86, seems without H.J's patch gcc >>>> would often happily use just 32-bit alignment of the stack base and even if >>>> incoming stack boundary was 128-bit, it would tell other routines they don't >>>> need to bother with that alignment, but e.g. on x86-64 it would still assume >>>> max_align of 8 bytes initially, even when e.g. x86-64 doesn't allow setting >>>> PREFERRED_STACK_BOUNDARY below 128 bits.  I think usually it doesn't buy us >>>> too much to align below INCOMING_STACK_BOUNDARY.  So perhaps 2)/4), where >>>> needlessly bumping alignment in expand_one_stack_var_at too much would be >>>> far more costly (would cause stack realignment). >>>> The 1)/2) vs. 3)/4) difference is for !SUPPORTS_STACK_ALIGNMENT targets, >>>> I wonder as they are never realigning the stack whether we should ever >>>> use there higher alignment. >>>> >>> >>> I think it is related to >>> >>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44542 >>> >>> One of my attempts may work. >>> >>> FWIW, I am fine with any solution as long as when we do >>> >>> DECL_ALIGN (decl) = align; >>> >>> we guarantee that it will be aligned at that boundary at run-time. >> >> I don't think we should use offset as alignment unless the variable >> should be aligned at offset bytes.  My patch >> >> http://gcc.gnu.org/bugzilla/attachment.cgi?id=20922&action=view >> >> may align the local variable at boundary specified for the variable. >> >> >> >> -- >> H.J. >> > > Using stack offset for local variable alignment is a bad idea. My patch > caused: > > FAIL: gcc.target/i386/incoming-9.c scan-assembler-not andl[\\t > ]*\\$-16,[\\t ]*%esp Here is a patch to align local variable with specified alignment. There is no need to call update_stack_alignment in expand_one_stack_var_at since we don't use its offset to increase is alignment. OK for trunk if there are regressions? Thanks. diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c index 1e67e77..684032c 100644 --- a/gcc/cfgexpand.c +++ b/gcc/cfgexpand.c @@ -715,13 +715,14 @@ dump_stack_var_partition (void) } } -/* Assign rtl to DECL at frame offset OFFSET. */ +/* Assign rtl to DECL at frame offset OFFSET with alignment ALIGN. */ static void -expand_one_stack_var_at (tree decl, HOST_WIDE_INT offset) +expand_one_stack_var_at (tree decl, HOST_WIDE_INT offset, + unsigned HOST_WIDE_INT align) { /* Alignment is unsigned. */ - unsigned HOST_WIDE_INT align, max_align; + unsigned HOST_WIDE_INT max_align; rtx x; /* If this fails, we've overflowed the stack frame. Error nicely? */ @@ -735,15 +736,11 @@ expand_one_stack_var_at (tree decl, HOST_WIDE_INT offset) /* Set alignment we actually gave this decl if it isn't an SSA name. If it is we generate stack slots only accidentally so it isn't as important, we'll simply use the alignment that is already set. */ - offset -= frame_phase; - align = offset & -offset; - align *= BITS_PER_UNIT; max_align = MAX (crtl->max_used_stack_slot_alignment, PREFERRED_STACK_BOUNDARY); if (align == 0 || align > max_align) align = max_align; - update_stack_alignment (align); DECL_ALIGN (decl) = align; DECL_USER_ALIGN (decl) = 0; } @@ -792,7 +789,8 @@ expand_stack_vars (bool (*pred) (tree)) { gcc_assert (stack_vars[j].offset <= stack_vars[i].size); expand_one_stack_var_at (stack_vars[j].decl, - stack_vars[j].offset + offset); + stack_vars[j].offset + offset, + stack_vars[j].alignb * BITS_PER_UNIT); } } } @@ -825,13 +823,15 @@ account_stack_vars (void) static void expand_one_stack_var (tree var) { - HOST_WIDE_INT size, offset, align; + HOST_WIDE_INT size, offset; + /* Alignment is unsigned. */ + unsigned HOST_WIDE_INT align; size = tree_low_cst (DECL_SIZE_UNIT (SSAVAR (var)), 1); align = get_decl_align_unit (SSAVAR (var)); offset = alloc_stack_frame_space (size, align); - expand_one_stack_var_at (var, offset); + expand_one_stack_var_at (var, offset, align * BITS_PER_UNIT); } /* A subroutine of expand_one_var. Called to assign rtl to a VAR_DECL --- /dev/null 2010-09-09 09:16:30.485584932 -0700 +++ gcc/gcc/testsuite/gcc.dg/torture/pr45678-3.c 2010-09-17 14:16:34.655001231 -0700 @@ -0,0 +1,19 @@ +typedef float v4sf __attribute__ ((vector_size (16))); + +extern void abort (void); + +int main(void) +{ + float m4[4]; + v4sf m3; + float m2[4] = {4, 3, 2, 1}; + v4sf one2 = {5, 15, 25, 35}; + + __builtin_memcpy (&m3, &m2, sizeof(m2)); + m3 = m3 * one2; + __builtin_memcpy (&m4, &m3, sizeof(m4)); + if (m4[0] != 20) + abort (); + + return 0; +}