Message ID | AANLkTinhpMJjRWZG9+r210ebSQ7A+UpvnfC36SoKgEEK@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Wed, Sep 22, 2010 at 3:01 PM, H.J. Lu <hjl.tools@gmail.com> wrote: > On Wed, Sep 22, 2010 at 9:47 AM, Jakub Jelinek <jakub@redhat.com> wrote: >> On Tue, Sep 21, 2010 at 09:30:16PM +0200, Eric Botcazou wrote: >>> > > OK to backport it to 4.4/4.5 branches? >>> > >>> > Yeah, sure. >>> >>> This badly breaks on i586 though. On the 4.5 branch: >> >> Here is a shorter testcase: >> >> /* { dg-do compile } */ >> /* { dg-options "-march=i586" { target ilp32 } } */ >> >> struct S { union { double b[4]; } a[18]; } s, a[5]; >> void foo (struct S); >> struct S bar (struct S, struct S *, struct S); >> >> void >> foo (struct S arg) >> { >> } >> >> void >> baz (void) >> { >> foo (bar (s, &a[1], a[2])); >> } >> >> Unless this is resolved soon, I think the 4.5/4.4 backports of this patch >> should be reverted. > > I am testing this patch on trunk and 4.5. > > -- > H.J. > --- > gcc/ > > 2010-09-22 H.J. Lu <hongjiu.lu@intel.com> > > PR middle-end/45234 > * calls.c (expand_call): Don't call check all variable sized > adjustments are multiple of preferred stack boundary after > stack alignment when calling builtin functions. > > gcc/testsuite/ > > 2010-09-22 Jakub Jelinek <jakub@redhat.com> > > PR middle-end/45234 > * gcc.target/i386/pr45234.c: New. > There are no regressions on trunk and 4.5. I configured --with-arch=i586 for 32bit gcc. I am testing it on 4.4.
On Wed, Sep 22, 2010 at 03:01:49PM -0700, H.J. Lu wrote: > 2010-09-22 H.J. Lu <hongjiu.lu@intel.com> > > PR middle-end/45234 > * calls.c (expand_call): Don't call check all variable sized > adjustments are multiple of preferred stack boundary after > stack alignment when calling builtin functions. > --- a/gcc/calls.c > +++ b/gcc/calls.c > @@ -2369,7 +2369,7 @@ expand_call (tree exp, rtx target, int ignore) > > preferred_unit_stack_boundary = preferred_stack_boundary / BITS_PER_UNIT; > > - if (SUPPORTS_STACK_ALIGNMENT) > + if (SUPPORTS_STACK_ALIGNMENT && fndecl && !DECL_IS_BUILTIN (fndecl)) > { > /* All variable sized adjustments must be multiple of preferred > stack boundary. Stack alignment may change preferred stack I'd like to hear explanation on why you think this is the right approach. What is so special about builtins in this case? If they are to be expanded as calls instead of expanded inline, I'd say they behave like any other call. Isn't the problem instead with nested expand_calls (which is happening in the testcase that is now failing)? Not sure if we still ever TER const/pure calls into other call arguments or not[*], so if that is possible for arbitrary calls, or the problem is just memcpy emitted through emit_block_move_hints with method BLOCK_OP_CALL_PARM. I admit I haven't debugged the original PR45234 testcase, so I don't know why exactly is your patch needed and thus why it isn't needed when expanding the call param move calls. [*] In long long foo (int, int) __attribute__((const)); int bar (int, int, long long, int, int); int baz (int x) { return bar (2, 3, foo (4, 5), 6, 7) + 1; } foo is now evaluated before starting the pushes with -m32 -O2 -march=i586. I vaguely remember const/pure calls used to be TERed at some point, but perhaps I'm just using a wrong testcase. Jakub
On Wed, Sep 22, 2010 at 11:56 PM, Jakub Jelinek <jakub@redhat.com> wrote: > On Wed, Sep 22, 2010 at 03:01:49PM -0700, H.J. Lu wrote: >> 2010-09-22 H.J. Lu <hongjiu.lu@intel.com> >> >> PR middle-end/45234 >> * calls.c (expand_call): Don't call check all variable sized >> adjustments are multiple of preferred stack boundary after >> stack alignment when calling builtin functions. > >> --- a/gcc/calls.c >> +++ b/gcc/calls.c >> @@ -2369,7 +2369,7 @@ expand_call (tree exp, rtx target, int ignore) >> >> preferred_unit_stack_boundary = preferred_stack_boundary / BITS_PER_UNIT; >> >> - if (SUPPORTS_STACK_ALIGNMENT) >> + if (SUPPORTS_STACK_ALIGNMENT && fndecl && !DECL_IS_BUILTIN (fndecl)) >> { >> /* All variable sized adjustments must be multiple of preferred >> stack boundary. Stack alignment may change preferred stack > > I'd like to hear explanation on why you think this is the right approach. > What is so special about builtins in this case? If they are to be expanded > as calls instead of expanded inline, I'd say they behave like any other > call. > > Isn't the problem instead with nested expand_calls (which is happening > in the testcase that is now failing)? Not sure if we still ever TER const/pure > calls into other call arguments or not[*], so if that is possible for arbitrary > calls, or the problem is just memcpy emitted through > emit_block_move_hints with method BLOCK_OP_CALL_PARM. > I admit I haven't debugged the original PR45234 testcase, so I don't know > why exactly is your patch needed and thus why it isn't needed when expanding > the call param move calls. > > [*] In > long long foo (int, int) __attribute__((const)); > int bar (int, int, long long, int, int); > int baz (int x) > { > return bar (2, 3, foo (4, 5), 6, 7) + 1; > } > foo is now evaluated before starting the pushes with -m32 -O2 -march=i586. > I vaguely remember const/pure calls used to be TERed at some point, but > perhaps I'm just using a wrong testcase. > I didn't find exactly why. From what I can see, builtin functions are treated differently from the normal ones.
On Thu, Sep 23, 2010 at 01:25:03AM -0700, H.J. Lu wrote: > > I'd like to hear explanation on why you think this is the right approach. > > What is so special about builtins in this case? If they are to be expanded > > as calls instead of expanded inline, I'd say they behave like any other > > call. > > > > Isn't the problem instead with nested expand_calls (which is happening > > in the testcase that is now failing)? Not sure if we still ever TER const/pure > > calls into other call arguments or not[*], so if that is possible for arbitrary > > calls, or the problem is just memcpy emitted through > > emit_block_move_hints with method BLOCK_OP_CALL_PARM. > > I admit I haven't debugged the original PR45234 testcase, so I don't know > > why exactly is your patch needed and thus why it isn't needed when expanding > > the call param move calls. > > > > [*] In > > long long foo (int, int) __attribute__((const)); > > int bar (int, int, long long, int, int); > > int baz (int x) > > { > > return bar (2, 3, foo (4, 5), 6, 7) + 1; > > } > > foo is now evaluated before starting the pushes with -m32 -O2 -march=i586. > > I vaguely remember const/pure calls used to be TERed at some point, but > > perhaps I'm just using a wrong testcase. > > > > I didn't find exactly why. From what I can see, builtin functions are > treated differently from the normal ones. I've briefly looked at the original PR45234 testcase, and the specifics of the call are that it increased crtl->preferred_stack_boundary in between the start of the expand_call function and the if (SUPPORTS_STACK_ALIGNMENT) block you've added, I'd quite understand we need to align the stack sufficiently in that case somewhere. The question is where exactly and by what exact value. On the currently failing testcase obviously no increase in crtl->preferred_stack_boundary for the nested memcpys happens, so if that could be used as a condition, that testcase would work. I guess what should be investigated is where the nested memcpy calls on that testcase actually do align the stack properly (and how), because it looks like they all have %esp % 16 == 0 on the call insn. Jakub
On Thu, Sep 23, 2010 at 01:25:03AM -0700, H.J. Lu wrote: > I didn't find exactly why. From what I can see, builtin functions are > treated differently from the normal ones. I have looked into it some more and I believe the PR45234 "fix" is just completely wrong, the problem is elsewhere (allocate_dynamic_stack_space) and there are actually many issues in there. So, IMNSHO that calls.c change should be reverted. The problems I see in allocate_dynamic_stack_space for SUPPORTS_STACK_ALIGNMENT targets are: 1) In /* We can't attempt to minimize alignment necessary, because we don't know the final value of preferred_stack_boundary yet while executing this code. */ crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY; I'm afraid this can actually decrease crtl->preferred_stack_boundary if it has been increased above PREFERRED_STACK_BOUNDARY already earlier. IMNSHO, if it needs to be done, it should be if (crtl->preferred_stack_boundary < PREFERRED_STACK_BOUNDARY) crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY; 2) I'd say MUST_ALIGN macro could use crtl->preferred_stack_boundary instead of PREFERRED_STACK_BOUNDARY, if we already know we want to align stack more than PREFERRED_STACK_BOUNDARY, we wouldn't need to try to align the result. 3) Most importantly, in if (!known_align_valid || known_align % PREFERRED_STACK_BOUNDARY != 0) { size = round_push (size); if (flag_stack_usage) { int align = PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT; stack_usage_size = (stack_usage_size + align - 1) / align * align; } } I think the first % PREFERRED_STACK_BOUNDARY should be % MAX_SUPPORTED_STACK_ALIGNMENT, because crtl->preferred_stack_boundary can always grow afterwards, and round_push if crtl->preferred_stack_boundary is smaller than MAX_SUPPORTED_STACK_ALIGNMENT should as align in bytes use a new virtual register which would be replaced by constant equal to the actual final crtl->preferred_stack_boundary (and let cse/combiner optimize all the calculations up). For flag_stack_usage, I'm afraid it can't be computed reliably. 4) When this function uses anti_adjust_stack_and_probe or anti_adjust_stack, it should IMNSHO ensure that stack_pointer_delta isn't modified even when size is constant (e.g. by saving/restoring the value around it). The constant allocas should have the same properties as dynamic allocas. For i?86/x86_64 with the change in 3) constants won't happen any longer, but for other targets. For the future, it would be nice if we had some REG note for al the alloca sp adjustments, because e.g. the unwinding code is confused by the constant size allocas e.g. when computing DW_OP_GNU_args_size. 5) == CONST_INT and == REG should be replaced with CONST_INT_P and REG_P The alternative to this would be to prescan the trees right before actual expansion and (conservatively) guess what the stack alignment will actually need to be (e.g. take into account call argument alignments, etc.), and don't allow crtl->preferred_stack_boundary etc. actually grow during the expansion. I'm attaching a testcase that shows that non-constant alloca has exactly the same issue as the original PR testcase, but the "fix" obviously can't do anything about it. Jakub /* PR middle-end/45234 */ /* { dg-do run { target { { i?86-*-* x86_64-*-* } && ilp32 } } } */ /* { dg-options "-mincoming-stack-boundary=2 -mpreferred-stack-boundary=2" } */ #include "check.h" void __attribute__ ((noinline)) bar (__float128 f) { check (&f, __alignof__(f)); } volatile int z = 6; int main (void) { char *p = __builtin_alloca (z); bar (0); __builtin_strncpy (p, "good", 5); if (__builtin_strncmp (p, "good", 5) != 0) { #ifdef DEBUG p[z - 1] = '\0'; printf ("Failed: %s != good\n", p); #endif abort (); } return 0; }
diff --git a/gcc/calls.c b/gcc/calls.c index aef823f..0c7588a 100644 --- a/gcc/calls.c +++ b/gcc/calls.c @@ -2369,7 +2369,7 @@ expand_call (tree exp, rtx target, int ignore) preferred_unit_stack_boundary = preferred_stack_boundary / BITS_PER_UNIT; - if (SUPPORTS_STACK_ALIGNMENT) + if (SUPPORTS_STACK_ALIGNMENT && fndecl && !DECL_IS_BUILTIN (fndecl)) { /* All variable sized adjustments must be multiple of preferred stack boundary. Stack alignment may change preferred stack --- /dev/null 2010-09-09 09:16:30.485584932 -0700 +++ gcc-release/gcc/testsuite/gcc.target/i386/pr45234.c 2010-09-22 13:34:53.415871265 -0700 @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-march=i586" { target ilp32 } } */ + +struct S { union { double b[4]; } a[18]; } s, a[5]; +void foo (struct S); +struct S bar (struct S, struct S *, struct S); + +void +foo (struct S arg) +{ +} + +void +baz (void) +{ + foo (bar (s, &a[1], a[2])); +}