Message ID | 20100917165833.GA17163@intel.com |
---|---|
State | New |
Headers | show |
On 09/17/2010 09:58 AM, H.J. Lu wrote: > PR middle-end/45678 > * cfgexpand.c (update_stack_alignment): New. > (get_decl_align_unit): Use it. > (expand_one_stack_var_at): Call update_stack_alignment. Ok. r~
On Fri, Sep 17, 2010 at 10:46:00AM -0700, Richard Henderson wrote: > On 09/17/2010 09:58 AM, H.J. Lu wrote: > > PR middle-end/45678 > > * cfgexpand.c (update_stack_alignment): New. > > (get_decl_align_unit): Use it. > > (expand_one_stack_var_at): Call update_stack_alignment. > > Ok. I don't deny this fixes the problem and probably is a good idea in any case, the question I have is just if this won't pessimize i?86 code too much, especially with non-default options where incoming stack boundary is smaller than PREFERRED_STACK_BOUNDARY. The thing is that this expand_one_stack_var_at is just an optimization, but we don't know if anything will actually make use of the increased alignment. If there will be just variables that need 32-bit alignment and say incoming stack alignment is also 32-bit, but preferred stack boundary is 128-bit, as soon as some variable will be placed in slot 16 bytes from the base, we'll force realignment even when nothing actually needs it. So, I wonder if max_align = MAX (crtl->max_used_stack_slot_alignment, PREFERRED_STACK_BOUNDARY); shouldn't be replaced by something like: 1) max_align = MAX (crtl->max_used_stack_slot_alignment, SUPPORTS_STACK_ALIGNMENT ? STACK_BOUNDARY : PREFERRED_STACK_BOUNDARY); or: 2) max_align = MAX (crtl->max_used_stack_slot_alignment, SUPPORTS_STACK_ALIGNMENT ? INCOMING_STACK_BOUNDARY : PREFERRED_STACK_BOUNDARY); or: 3) max_align = SUPPORTS_STACK_ALIGNMENT ? MAX (crtl->max_used_stack_slot_alignment, STACK_BOUNDARY) : PREFERRED_STACK_BOUNDARY; or: 4) max_align = SUPPORTS_STACK_ALIGNMENT ? MAX (crtl->max_used_stack_slot_alignment, INCOMING_STACK_BOUNDARY) : PREFERRED_STACK_BOUNDARY; For 2) or 4) we'd need to do if (targetm.calls.update_stack_boundary) targetm.calls.update_stack_boundary (); earlier (before all the expand_one_stack_var_at calls). 1) and 3) are conservative choices for i?86, seems without H.J's patch gcc would often happily use just 32-bit alignment of the stack base and even if incoming stack boundary was 128-bit, it would tell other routines they don't need to bother with that alignment, but e.g. on x86-64 it would still assume max_align of 8 bytes initially, even when e.g. x86-64 doesn't allow setting PREFERRED_STACK_BOUNDARY below 128 bits. I think usually it doesn't buy us too much to align below INCOMING_STACK_BOUNDARY. So perhaps 2)/4), where needlessly bumping alignment in expand_one_stack_var_at too much would be far more costly (would cause stack realignment). The 1)/2) vs. 3)/4) difference is for !SUPPORTS_STACK_ALIGNMENT targets, I wonder as they are never realigning the stack whether we should ever use there higher alignment. Jakub
On Fri, Sep 17, 2010 at 11:15 AM, Jakub Jelinek <jakub@redhat.com> wrote: > On Fri, Sep 17, 2010 at 10:46:00AM -0700, Richard Henderson wrote: >> On 09/17/2010 09:58 AM, H.J. Lu wrote: >> > PR middle-end/45678 >> > * cfgexpand.c (update_stack_alignment): New. >> > (get_decl_align_unit): Use it. >> > (expand_one_stack_var_at): Call update_stack_alignment. >> >> Ok. > > I don't deny this fixes the problem and probably is a good idea in any case, > the question I have is just if this won't pessimize i?86 code too much, > especially with non-default options where incoming stack boundary is smaller > than PREFERRED_STACK_BOUNDARY. > The thing is that this expand_one_stack_var_at is just an optimization, > but we don't know if anything will actually make use of the increased > alignment. If there will be just variables that need 32-bit alignment > and say incoming stack alignment is also 32-bit, but preferred stack > boundary is 128-bit, as soon as some variable will be placed in slot 16 > bytes from the base, we'll force realignment even when nothing actually > needs it. > > So, I wonder if > max_align = MAX (crtl->max_used_stack_slot_alignment, > PREFERRED_STACK_BOUNDARY); > shouldn't be replaced by something like: > 1) > max_align = MAX (crtl->max_used_stack_slot_alignment, > SUPPORTS_STACK_ALIGNMENT > ? STACK_BOUNDARY : PREFERRED_STACK_BOUNDARY); > or: > 2) > max_align = MAX (crtl->max_used_stack_slot_alignment, > SUPPORTS_STACK_ALIGNMENT > ? INCOMING_STACK_BOUNDARY : PREFERRED_STACK_BOUNDARY); > or: > 3) > max_align = SUPPORTS_STACK_ALIGNMENT > ? MAX (crtl->max_used_stack_slot_alignment, STACK_BOUNDARY) > : PREFERRED_STACK_BOUNDARY; > or: > 4) > max_align = SUPPORTS_STACK_ALIGNMENT > ? MAX (crtl->max_used_stack_slot_alignment, > INCOMING_STACK_BOUNDARY) > : PREFERRED_STACK_BOUNDARY; > For 2) or 4) we'd need to do > if (targetm.calls.update_stack_boundary) > targetm.calls.update_stack_boundary (); > earlier (before all the expand_one_stack_var_at calls). > 1) and 3) are conservative choices for i?86, seems without H.J's patch gcc > would often happily use just 32-bit alignment of the stack base and even if > incoming stack boundary was 128-bit, it would tell other routines they don't > need to bother with that alignment, but e.g. on x86-64 it would still assume > max_align of 8 bytes initially, even when e.g. x86-64 doesn't allow setting > PREFERRED_STACK_BOUNDARY below 128 bits. I think usually it doesn't buy us > too much to align below INCOMING_STACK_BOUNDARY. So perhaps 2)/4), where > needlessly bumping alignment in expand_one_stack_var_at too much would be > far more costly (would cause stack realignment). > The 1)/2) vs. 3)/4) difference is for !SUPPORTS_STACK_ALIGNMENT targets, > I wonder as they are never realigning the stack whether we should ever > use there higher alignment. > I think it is related to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44542 One of my attempts may work. FWIW, I am fine with any solution as long as when we do DECL_ALIGN (decl) = align; we guarantee that it will be aligned at that boundary at run-time.
On Fri, Sep 17, 2010 at 11:32 AM, H.J. Lu <hjl.tools@gmail.com> wrote: > On Fri, Sep 17, 2010 at 11:15 AM, Jakub Jelinek <jakub@redhat.com> wrote: >> On Fri, Sep 17, 2010 at 10:46:00AM -0700, Richard Henderson wrote: >>> On 09/17/2010 09:58 AM, H.J. Lu wrote: >>> > PR middle-end/45678 >>> > * cfgexpand.c (update_stack_alignment): New. >>> > (get_decl_align_unit): Use it. >>> > (expand_one_stack_var_at): Call update_stack_alignment. >>> >>> Ok. >> >> I don't deny this fixes the problem and probably is a good idea in any case, >> the question I have is just if this won't pessimize i?86 code too much, >> especially with non-default options where incoming stack boundary is smaller >> than PREFERRED_STACK_BOUNDARY. >> The thing is that this expand_one_stack_var_at is just an optimization, >> but we don't know if anything will actually make use of the increased >> alignment. If there will be just variables that need 32-bit alignment >> and say incoming stack alignment is also 32-bit, but preferred stack >> boundary is 128-bit, as soon as some variable will be placed in slot 16 >> bytes from the base, we'll force realignment even when nothing actually >> needs it. >> >> So, I wonder if >> max_align = MAX (crtl->max_used_stack_slot_alignment, >> PREFERRED_STACK_BOUNDARY); >> shouldn't be replaced by something like: >> 1) >> max_align = MAX (crtl->max_used_stack_slot_alignment, >> SUPPORTS_STACK_ALIGNMENT >> ? STACK_BOUNDARY : PREFERRED_STACK_BOUNDARY); >> or: >> 2) >> max_align = MAX (crtl->max_used_stack_slot_alignment, >> SUPPORTS_STACK_ALIGNMENT >> ? INCOMING_STACK_BOUNDARY : PREFERRED_STACK_BOUNDARY); >> or: >> 3) >> max_align = SUPPORTS_STACK_ALIGNMENT >> ? MAX (crtl->max_used_stack_slot_alignment, STACK_BOUNDARY) >> : PREFERRED_STACK_BOUNDARY; >> or: >> 4) >> max_align = SUPPORTS_STACK_ALIGNMENT >> ? MAX (crtl->max_used_stack_slot_alignment, >> INCOMING_STACK_BOUNDARY) >> : PREFERRED_STACK_BOUNDARY; >> For 2) or 4) we'd need to do >> if (targetm.calls.update_stack_boundary) >> targetm.calls.update_stack_boundary (); >> earlier (before all the expand_one_stack_var_at calls). >> 1) and 3) are conservative choices for i?86, seems without H.J's patch gcc >> would often happily use just 32-bit alignment of the stack base and even if >> incoming stack boundary was 128-bit, it would tell other routines they don't >> need to bother with that alignment, but e.g. on x86-64 it would still assume >> max_align of 8 bytes initially, even when e.g. x86-64 doesn't allow setting >> PREFERRED_STACK_BOUNDARY below 128 bits. I think usually it doesn't buy us >> too much to align below INCOMING_STACK_BOUNDARY. So perhaps 2)/4), where >> needlessly bumping alignment in expand_one_stack_var_at too much would be >> far more costly (would cause stack realignment). >> The 1)/2) vs. 3)/4) difference is for !SUPPORTS_STACK_ALIGNMENT targets, >> I wonder as they are never realigning the stack whether we should ever >> use there higher alignment. >> > > I think it is related to > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44542 > > One of my attempts may work. > > FWIW, I am fine with any solution as long as when we do > > DECL_ALIGN (decl) = align; > > we guarantee that it will be aligned at that boundary at run-time. I don't think we should use offset as alignment unless the variable should be aligned at offset bytes. My patch http://gcc.gnu.org/bugzilla/attachment.cgi?id=20922&action=view may align the local variable at boundary specified for the variable.
On Fri, Sep 17, 2010 at 12:18 PM, H.J. Lu <hjl.tools@gmail.com> wrote: > On Fri, Sep 17, 2010 at 11:32 AM, H.J. Lu <hjl.tools@gmail.com> wrote: >> On Fri, Sep 17, 2010 at 11:15 AM, Jakub Jelinek <jakub@redhat.com> wrote: >>> On Fri, Sep 17, 2010 at 10:46:00AM -0700, Richard Henderson wrote: >>>> On 09/17/2010 09:58 AM, H.J. Lu wrote: >>>> > PR middle-end/45678 >>>> > * cfgexpand.c (update_stack_alignment): New. >>>> > (get_decl_align_unit): Use it. >>>> > (expand_one_stack_var_at): Call update_stack_alignment. >>>> >>>> Ok. >>> >>> I don't deny this fixes the problem and probably is a good idea in any case, >>> the question I have is just if this won't pessimize i?86 code too much, >>> especially with non-default options where incoming stack boundary is smaller >>> than PREFERRED_STACK_BOUNDARY. >>> The thing is that this expand_one_stack_var_at is just an optimization, >>> but we don't know if anything will actually make use of the increased >>> alignment. If there will be just variables that need 32-bit alignment >>> and say incoming stack alignment is also 32-bit, but preferred stack >>> boundary is 128-bit, as soon as some variable will be placed in slot 16 >>> bytes from the base, we'll force realignment even when nothing actually >>> needs it. >>> >>> So, I wonder if >>> max_align = MAX (crtl->max_used_stack_slot_alignment, >>> PREFERRED_STACK_BOUNDARY); >>> shouldn't be replaced by something like: >>> 1) >>> max_align = MAX (crtl->max_used_stack_slot_alignment, >>> SUPPORTS_STACK_ALIGNMENT >>> ? STACK_BOUNDARY : PREFERRED_STACK_BOUNDARY); >>> or: >>> 2) >>> max_align = MAX (crtl->max_used_stack_slot_alignment, >>> SUPPORTS_STACK_ALIGNMENT >>> ? INCOMING_STACK_BOUNDARY : PREFERRED_STACK_BOUNDARY); >>> or: >>> 3) >>> max_align = SUPPORTS_STACK_ALIGNMENT >>> ? MAX (crtl->max_used_stack_slot_alignment, STACK_BOUNDARY) >>> : PREFERRED_STACK_BOUNDARY; >>> or: >>> 4) >>> max_align = SUPPORTS_STACK_ALIGNMENT >>> ? MAX (crtl->max_used_stack_slot_alignment, >>> INCOMING_STACK_BOUNDARY) >>> : PREFERRED_STACK_BOUNDARY; >>> For 2) or 4) we'd need to do >>> if (targetm.calls.update_stack_boundary) >>> targetm.calls.update_stack_boundary (); >>> earlier (before all the expand_one_stack_var_at calls). >>> 1) and 3) are conservative choices for i?86, seems without H.J's patch gcc >>> would often happily use just 32-bit alignment of the stack base and even if >>> incoming stack boundary was 128-bit, it would tell other routines they don't >>> need to bother with that alignment, but e.g. on x86-64 it would still assume >>> max_align of 8 bytes initially, even when e.g. x86-64 doesn't allow setting >>> PREFERRED_STACK_BOUNDARY below 128 bits. I think usually it doesn't buy us >>> too much to align below INCOMING_STACK_BOUNDARY. So perhaps 2)/4), where >>> needlessly bumping alignment in expand_one_stack_var_at too much would be >>> far more costly (would cause stack realignment). >>> The 1)/2) vs. 3)/4) difference is for !SUPPORTS_STACK_ALIGNMENT targets, >>> I wonder as they are never realigning the stack whether we should ever >>> use there higher alignment. >>> >> >> I think it is related to >> >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44542 >> >> One of my attempts may work. >> >> FWIW, I am fine with any solution as long as when we do >> >> DECL_ALIGN (decl) = align; >> >> we guarantee that it will be aligned at that boundary at run-time. > > I don't think we should use offset as alignment unless the variable > should be aligned at offset bytes. My patch > > http://gcc.gnu.org/bugzilla/attachment.cgi?id=20922&action=view > > may align the local variable at boundary specified for the variable. > > > > -- > H.J. > Using stack offset for local variable alignment is a bad idea. My patch caused: FAIL: gcc.target/i386/incoming-9.c scan-assembler-not andl[\\t ]*\\$-16,[\\t ]*%esp
diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c index 4237276..1e67e77 100644 --- a/gcc/cfgexpand.c +++ b/gcc/cfgexpand.c @@ -205,19 +205,11 @@ static bool has_protected_decls; smaller than our cutoff threshold. Used for -Wstack-protector. */ static bool has_short_buffer; -/* Discover the byte alignment to use for DECL. Ignore alignment - we can't do with expected alignment of the stack boundary. */ +/* Update stack alignment requirement. */ -static unsigned int -get_decl_align_unit (tree decl) +static void +update_stack_alignment (unsigned int align) { - unsigned int align; - - align = LOCAL_DECL_ALIGNMENT (decl); - - if (align > MAX_SUPPORTED_STACK_ALIGNMENT) - align = MAX_SUPPORTED_STACK_ALIGNMENT; - if (SUPPORTS_STACK_ALIGNMENT) { if (crtl->stack_alignment_estimated < align) @@ -233,6 +225,22 @@ get_decl_align_unit (tree decl) crtl->stack_alignment_needed = align; if (crtl->max_used_stack_slot_alignment < align) crtl->max_used_stack_slot_alignment = align; +} + +/* Discover the byte alignment to use for DECL. Ignore alignment + we can't do with expected alignment of the stack boundary. */ + +static unsigned int +get_decl_align_unit (tree decl) +{ + unsigned int align; + + align = LOCAL_DECL_ALIGNMENT (decl); + + if (align > MAX_SUPPORTED_STACK_ALIGNMENT) + align = MAX_SUPPORTED_STACK_ALIGNMENT; + + update_stack_alignment (align); return align / BITS_PER_UNIT; } @@ -735,6 +743,7 @@ expand_one_stack_var_at (tree decl, HOST_WIDE_INT offset) if (align == 0 || align > max_align) align = max_align; + update_stack_alignment (align); DECL_ALIGN (decl) = align; DECL_USER_ALIGN (decl) = 0; } --- /dev/null 2010-09-09 09:16:30.485584932 -0700 +++ gcc/gcc/testsuite/gcc.dg/torture/pr45678-2.c 2010-09-17 09:53:20.510888017 -0700 @@ -0,0 +1,16 @@ +/* { dg-do run } */ + +typedef float V __attribute__ ((vector_size (16))); +V g; + +int +main () +{ + float d[4] = { 4, 3, 2, 1 }; + V e; + __builtin_memcpy (&e, &d, sizeof (d)); + V f = { 5, 15, 25, 35 }; + e = e * f; + g = e; + return 0; +}