Message ID | alpine.DEB.1.10.1408291552410.2958@tp.orcam.me.uk |
---|---|
State | Accepted |
On Fri, Aug 29, 2014 at 10:46 PM, Maciej W. Rozycki <macro@codesourcery.com> wrote:
> Hi,
>
>  The loop-19.c test case has regressed from 4.8 to 4.9 and trunk on
> classic FPU Power targets; these failures are now seen:
>
> FAIL: gcc.dg/tree-ssa/loop-19.c scan-tree-dump-times optimized "MEM.(base: &|symbol: )a," 2
> FAIL: gcc.dg/tree-ssa/loop-19.c scan-tree-dump-times optimized "MEM.(base: &|symbol: )c," 2
>
>  However, upon inspection of the generated code it is obvious that its
> quality has improved: the autoincrement rather than indexed addressing
> mode is now used in the loop produced, reducing the number of instructions
> in the loop from 4 to 3 and also removing another instruction from outside
> the loop, i.e. (new code):
>
>         .globl  tuned_STREAM_Copy
>         .type   tuned_STREAM_Copy, @function
> tuned_STREAM_Copy:
>         lis 8,0x1e
>         lis 10,a-8@ha
>         ori 8,8,33920
>         lis 9,c-8@ha
>         mtctr 8
>         la 10,a-8@l(10)
>         la 9,c-8@l(9)
> .L2:
>         lfdu 0,8(10)
>         stfdu 0,8(9)
>         bdnz .L2
>         blr
>         .size   tuned_STREAM_Copy, .-tuned_STREAM_Copy
>
> vs (old code):
>
>         .globl  tuned_STREAM_Copy
>         .type   tuned_STREAM_Copy, @function
> tuned_STREAM_Copy:
>         lis 7,0x1e
>         ori 7,7,33920
>         mtctr 7
>         lis 8,c@ha
>         lis 10,a@ha
>         li 9,0
>         la 8,c@l(8)
>         la 10,a@l(10)
> .L3:
>         lfdx 0,10,9
>         stfdx 0,8,9
>         addi 9,9,8
>         bdnz .L3
>         blr
>         .size   tuned_STREAM_Copy,.-tuned_STREAM_Copy
>
>  The only Power targets that still pass this test are e500v2 ones such as
> `-mcpu=8548 -mfloat-gprs=double -mspe=yes -mabi=spe' that use the SPE unit
> for FP operations, because the indexed mode is still used (there's no
> autoincrement addressing mode available for the memory access instructions
> concerned):
>
>         .globl  tuned_STREAM_Copy
>         .type   tuned_STREAM_Copy, @function
> tuned_STREAM_Copy:
>         lis 10,0x1e
>         lis 7,c@ha
>         lis 8,a@ha
>         ori 10,10,0x8480
>         li 9,0
>         la 7,c@l(7)
>         la 8,a@l(8)
>         mtctr 10
> .L2:
>         evlddx 10,8,9
>         evstddx 10,7,9
>         addi 9,9,8
>         bdnz .L2
>         blr
>         .size   tuned_STREAM_Copy,.-tuned_STREAM_Copy
>
> [I have removed "-fno-common" from the current test flags for the purpose
> of this consideration to compare apples to apples; 4.8 didn't have it.
> The presence or absence of this flag does not appear to make a difference
> for this test case for Power targets.]
>
>  The obvious reason for the failure is the offset of -8 now seen in the new
> classic FP code for preinitialising the pointers before entering the loop.
> The initial offset is needed so that it is cancelled by the offset of 8
> used in the loop itself to autoincrement these pointers.  So the new code
> is not only better, but it actually has to use these offsets as well or
> autoincrementation would not work.
>
>  Therefore I think at this point the test case is invalid for classic FP
> Power, so I propose that we exclude it from testing here, only leaving SPE
> FP Power for whatever value the test case may have for it, and especially
> x86 variants where there's an actual code size penalty for using an immediate
> offset (displacement) in addition to a base register.
>
>  For the record here are the optimization dumps examined by the test case,
> for the old generated code that passes:
>
> ;; Function tuned_STREAM_Copy (tuned_STREAM_Copy, funcdef_no=0, decl_uid=1382, cgraph_uid=0)
>
> tuned_STREAM_Copy ()
> {
>   sizetype ivtmp.10;
>   double _4;
>
>   <bb 2>:
>
>   <bb 3>:
>   # ivtmp.10_8 = PHI <ivtmp.10_2(4), 0(2)>
>   _4 = MEM[symbol: a, index: ivtmp.10_8, offset: 0B];
>   MEM[symbol: c, index: ivtmp.10_8, offset: 0B] = _4;
>   ivtmp.10_2 = ivtmp.10_8 + 8;
>   if (ivtmp.10_2 != 16000000)
>     goto <bb 4>;
>   else
>     goto <bb 5>;
>
>   <bb 4>:
>   goto <bb 3>;
>
>   <bb 5>:
>   return;
>
> }
>
> and for the new code that fails:
>
> ;; Function tuned_STREAM_Copy (tuned_STREAM_Copy, funcdef_no=0, decl_uid=2191, symbol_order=2)
>
> Removing basic block 5
> tuned_STREAM_Copy ()
> {
>   unsigned int ivtmp.13;
>   unsigned int ivtmp.9;
>   double _4;
>   void * _15;
>   void * _16;
>   unsigned int _17;
>
>   <bb 2>:
>   ivtmp.9_11 = (unsigned int) &MEM[(void *)&a + 4294967288B];
>   ivtmp.13_14 = (unsigned int) &MEM[(void *)&c + 4294967288B];
>   _17 = (unsigned int) &MEM[(void *)&a + 15999992B];
>
>   <bb 3>:
>   # ivtmp.9_8 = PHI <ivtmp.9_2(3), ivtmp.9_11(2)>
>   # ivtmp.13_12 = PHI <ivtmp.13_13(3), ivtmp.13_14(2)>
>   ivtmp.9_2 = ivtmp.9_8 + 8;
>   _15 = (void *) ivtmp.9_2;
>   _4 = MEM[base: _15, offset: 0B];
>   ivtmp.13_13 = ivtmp.13_12 + 8;
>   _16 = (void *) ivtmp.13_13;
>   MEM[base: _16, offset: 0B] = _4;
>   if (ivtmp.9_2 != _17)
>     goto <bb 3>;
>   else
>     goto <bb 4>;
>
>   <bb 4>:
>   return;
>
> }
>
>  Tested with the following powerpc-gnu-linux multilibs with the respective
> results noted on the right:
>
> -mcpu=603e                                            UNSUPPORTED
> -mcpu=603e -msoft-float                               UNSUPPORTED
> -mcpu=8540 -mfloat-gprs=single -mspe=yes -mabi=spe    UNSUPPORTED
> -mcpu=8548 -mfloat-gprs=double -mspe=yes -mabi=spe    PASS
> -mcpu=7400 -maltivec -mabi=altivec                    UNSUPPORTED
> -mcpu=e6500 -maltivec -mabi=altivec                   UNSUPPORTED
> -mcpu=e5500 -m64                                      UNSUPPORTED
> -mcpu=e6500 -m64 -maltivec -mabi=altivec              UNSUPPORTED
>
> Original results:
>
> -mcpu=603e                                            FAIL
> -mcpu=603e -msoft-float                               UNSUPPORTED
> -mcpu=8540 -mfloat-gprs=single -mspe=yes -mabi=spe    UNSUPPORTED
> -mcpu=8548 -mfloat-gprs=double -mspe=yes -mabi=spe    PASS
> -mcpu=7400 -maltivec -mabi=altivec                    FAIL
> -mcpu=e6500 -maltivec -mabi=altivec                   FAIL
> -mcpu=e5500 -m64                                      FAIL
> -mcpu=e6500 -m64 -maltivec -mabi=altivec              FAIL
>
>  OK to apply (for trunk and 4.9)?
>
> 2014-08-30  Maciej W. Rozycki  <macro@codesourcery.com>
>
>         * gcc.dg/tree-ssa/loop-19.c: Exclude classic FPU Power targets.
>
>   Maciej
>
> gcc-test-power-loop-19.diff
> Index: gcc-fsf-trunk-quilt/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c
> ===================================================================
> --- gcc-fsf-trunk-quilt.orig/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c 2014-08-29 16:45:27.748122597 +0100
> +++ gcc-fsf-trunk-quilt/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c 2014-08-30 02:53:03.658955978 +0100
> @@ -4,7 +4,7 @@
>
>     The testcase comes from PR 29256 (and originally, the stream benchmark). */
>
> -/* { dg-do compile { target { i?86-*-* || { x86_64-*-* || powerpc_hard_double } } } } */
> +/* { dg-do compile { target { i?86-*-* || { x86_64-*-* || { powerpc_hard_double && { ! powerpc_fprs } } } } } } */
>  /* { dg-require-effective-target nonpic } */
>  /* { dg-options "-O3 -fno-tree-loop-distribute-patterns -fno-prefetch-loop-arrays -fdump-tree-optimized -fno-common" } */
>

Okay.

Thanks, David
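For anyone reading along, the two loop shapes discussed above can be sketched in C roughly as follows. This is only an illustration of the addressing pattern, not the test case source: the array size `N` and the function names `copy_indexed` and `copy_preinc` are made up for the sketch, and the pre-increment variant does its pointer arithmetic on `uintptr_t` (mirroring the `unsigned int` induction variables in the new dump), which assumes a flat address space where integer/pointer conversions round-trip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N 16  /* hypothetical small size; the real test copies 2000000 doubles */

static double a[N], c[N];

/* Old shape: one shared byte offset indexes both arrays
   (cf. lfdx/stfdx plus a separate addi in the old code).  */
static void copy_indexed(void)
{
    for (size_t off = 0; off != N * sizeof(double); off += sizeof(double))
        *(double *)((char *)c + off) = *(const double *)((const char *)a + off);
}

/* New shape: each address starts 8 bytes before its array and is bumped
   by 8 before every access (cf. lfdu/stfdu), so the initial -8 offset is
   cancelled by the first increment.  Integer arithmetic is used because
   forming a pointer before the start of an array is not valid C.  */
static void copy_preinc(void)
{
    uintptr_t pa = (uintptr_t)a - sizeof(double);
    uintptr_t pc = (uintptr_t)c - sizeof(double);
    uintptr_t end = (uintptr_t)a + (N - 1) * sizeof(double);

    do {
        pa += sizeof(double);
        pc += sizeof(double);
        *(double *)pc = *(const double *)pa;
    } while (pa != end);
}
```

Both functions copy the same `N` elements. Only the second needs the start addresses biased by one element, which is why the `a-8` and `c-8` relocations appear in the new assembly and why the dump no longer contains the `MEM[symbol: a, ...]` form the test scans for.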
On Sat, 30 Aug 2014, David Edelsohn wrote:

> > 2014-08-30  Maciej W. Rozycki  <macro@codesourcery.com>
> >
> >         * gcc.dg/tree-ssa/loop-19.c: Exclude classic FPU Power targets.
> >
> >   Maciej
> >
> > gcc-test-power-loop-19.diff
> > Index: gcc-fsf-trunk-quilt/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c
> > ===================================================================
> > --- gcc-fsf-trunk-quilt.orig/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c 2014-08-29 16:45:27.748122597 +0100
> > +++ gcc-fsf-trunk-quilt/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c 2014-08-30 02:53:03.658955978 +0100
> > @@ -4,7 +4,7 @@
> >
> >     The testcase comes from PR 29256 (and originally, the stream benchmark). */
> >
> > -/* { dg-do compile { target { i?86-*-* || { x86_64-*-* || powerpc_hard_double } } } } */
> > +/* { dg-do compile { target { i?86-*-* || { x86_64-*-* || { powerpc_hard_double && { ! powerpc_fprs } } } } } } */
> >  /* { dg-require-effective-target nonpic } */
> >  /* { dg-options "-O3 -fno-tree-loop-distribute-patterns -fno-prefetch-loop-arrays -fdump-tree-optimized -fno-common" } */
> >
>
> Okay.

 Applied to trunk now and backported to 4.9.  Thanks.

  Maciej
Index: gcc-fsf-trunk-quilt/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c
===================================================================
--- gcc-fsf-trunk-quilt.orig/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c 2014-08-29 16:45:27.748122597 +0100
+++ gcc-fsf-trunk-quilt/gcc/testsuite/gcc.dg/tree-ssa/loop-19.c 2014-08-30 02:53:03.658955978 +0100
@@ -4,7 +4,7 @@
 
    The testcase comes from PR 29256 (and originally, the stream benchmark). */
 
-/* { dg-do compile { target { i?86-*-* || { x86_64-*-* || powerpc_hard_double } } } } */
+/* { dg-do compile { target { i?86-*-* || { x86_64-*-* || { powerpc_hard_double && { ! powerpc_fprs } } } } } } */
 /* { dg-require-effective-target nonpic } */
 /* { dg-options "-O3 -fno-tree-loop-distribute-patterns -fno-prefetch-loop-arrays -fdump-tree-optimized -fno-common" } */
 
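As a reading aid for the selector change, the old and new `dg-do` target conditions can be modelled as plain boolean functions. This is a sketch only: `struct target`, `old_selector` and `new_selector` are hypothetical names, and how each multilib maps to the `powerpc_hard_double` and `powerpc_fprs` effective targets is an assumption inferred from the results reported in the thread, not something taken from the testsuite sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the properties the selector looks at.  */
struct target {
    bool x86;                  /* matches i?86-*-* or x86_64-*-* */
    bool powerpc_hard_double;  /* hardware double-precision FP available */
    bool powerpc_fprs;         /* classic floating-point registers present */
};

/* Condition before the patch.  */
static bool old_selector(struct target t)
{
    return t.x86 || t.powerpc_hard_double;
}

/* Condition after the patch: Power targets with FPRs are excluded.  */
static bool new_selector(struct target t)
{
    return t.x86 || (t.powerpc_hard_double && !t.powerpc_fprs);
}
```

Under these assumptions a classic FPU multilib (`powerpc_hard_double` and `powerpc_fprs` both true) is selected by the old condition but not the new one, so its result goes from FAIL to UNSUPPORTED, while an e500v2 SPE multilib (hard double, no FPRs) is selected by both and keeps its PASS.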