Message ID | 4CDC3159.8040705@redhat.com |
---|---|
State | New |
Headers | show |
On Thu, Nov 11, 2010 at 10:09:29AM -0800, Richard Henderson wrote: > According to Meissner via IRC, there are a number of regressions on the > powerpc port. One example is cc.target/powerpc/ppc-fmadd-3.c, a testcase > that wants to see FMA generated even when the exact operation isn't > available. I think there is some sense to this, as indicated in the > comment I added to the code. > > Ok? To repeat what I said on IRC, I compared three compilers. Rth (latest patches from Richard including this one), Mrm (the patches I sent out yesterday that don't remove the old combiner insns), and Trunk (svn trunk from yesterday). Rth Mrm Trunk gcc.dg/var-expand3.c fail pass pass gcc.target/powerpc/ppc-fma-1.c pass pass fail gcc.target/powerpc/ppc-fma-2.c fail pass fail gcc.target/powerpc/ppc-fma-3.c pass pass fail gcc.target/powerpc/ppc-fma-4.c fail pass fail gfortran.fortran-torture/execute/in-pack.f90 fail pass pass This is for both 32-bit and 64-bit compilations. gcc.dg/var-expand3.c is a function using Altivec fused multiply/add instructions, and the loop unroller would unroll loops with (plus (mult (..))) but not (fma ...). The code is analyze_insn_to_expand_var in loop-unroll.c where it doesn't realize fma is a loop accumulator. I imagine it is probably simple to fix this function. gcc.target/powerpc/ppc-fma-{2,4}.c are failing because they use -mno-fused-madd, and the compiler gives a new warning that isn't accounted for. Obviously we just need to switch to use -ffp-contract=off for these two tests (or add the warning message). gfortran.fortran-torture/execute/in-pack.f90 is getting an internal compiler error when -maltivec is used: /home/meissner/fsf-src/rth/gcc/testsuite/gfortran.fortran-torture/execute/in-pack.f90: In function 'csub4': /home/meissner/fsf-src/rth/gcc/testsuite/gfortran.fortran-torture/execute/in-pack.f90:59:0: error: unrecognizable insn: (insn 395 394 396 45 (set (reg:V4SF 592) (fma:V4SF (reg:V4SF 363 [ vect_var_.731 ]) (reg:V4SF 593) (reg:V4SI 594))) /home/meissner/fsf-src/rth/gcc/testsuite/gfortran.fortran-torture/execute/in-pack.f90:55 -1 (expr_list:REG_EQUAL (mult:V4SF (reg:V4SF 363 [ vect_var_.731 ]) (reg:V4SF 593)) (nil))) The third argument should not be V4SI. Perhaps a test for all three arguments being the same mode, or doing a conversion? BTW, ever since I started using GCC 4.5 and newer, the calculix has failed with fma's are generated because of a -0.0 issue, and I needed to add a wrapper around atan2 to prevent it from return -0.0 when given (-0.0, 0.0) args. My patches don't address this issue, but Richard's does, and I no longer need the wrapper with his patches. I built calculix with both the trunk and with Richard's patches, and the run was about 0.8% slower than the trunk, which is not enough to worry about. As I mentioned, I will be away from the mailing lists until Monday afternoon. Here are the instruction differences between trunk and Richard's patches: Instruction Trunk Rth =========== ===== === add 19,420 19,436 addi 67,585 67,911 addic 106 107 addic. 108 108 addis 10,942 10,990 addze 114 114 and 231 231 and. 1 1 andi. 146 146 bctr 26 26 bdnz 1,582 1,582 bdz 144 144 beq 23,038 23,178 beqlr 25 25 bge 1,552 1,562 bgt 2,000 2,000 bgtlr 3 3 bl 24,666 24,673 ble 5,872 5,863 blelr 17 17 blr 1,698 1,696 blt 1,541 1,543 bltlr 1 1 bne 6,737 6,739 bng 1 1 cmpd 2,065 2,085 cmpdi 8,471 8,489 cmpld 14 14 cmplw 1,233 1,245 cmplwi 653 642 cmpw 9,006 9,012 cmpwi 17,834 17,928 cntlzd 7 7 cntlzw 45 45 crnot 6 6 divw 170 170 divwu 5 5 extsb 3 3 extsw 21,797 21,859 extsw. 180 180 fadds 19 19 fcfids 40 40 fcmpu 44 44 fctiwz 114 114 fdivs 14 14 fmadds 3 3 fmr 81 90 fmuls 34 34 fneg 1 1 frsp 15 14 fsel 12 12 fsubs 9 9 lbz 515 515 lbzu 31 31 lbzx 141 141 ld 39,720 39,718 ldu 396 396 ldux 80 80 ldx 4,317 4,319 lfd 15,136 15,403 lfdu 1,311 1,421 lfdux 456 473 lfiwax 282 284 lfs 119 119 lhz 25 25 li 33,356 33,388 lis 464 464 lvsr 54 54 lwa 4,199 4,202 lwaux 4 4 lwax 2,437 2,445 lwz 15,532 15,562 lwzu 2,556 2,553 lwzux 24 24 lwzx 6,816 6,822 lxsdx 11,407 11,568 lxvd2x 4,626 5,034 lxvdsx 1 1 lxvw4x 863 863 mcrf 20 20 mfcr 233 233 mflr 1,256 1,256 mfvrsave 66 67 mr 29,098 29,122 mr. 1,022 1,022 mtcrf 254 254 mtctr 1,755 1,755 mtlr 1,616 1,614 mtvrsave 139 141 mulhw 54 54 mulhwu 9 9 mulld 321 321 mulli 2,996 2,996 mullw 723 723 mullw. 13 13 neg 791 791 nop 24,085 24,092 nor 536 539 or 142 142 ori 1,379 1,379 oris 69 70 rldic 174 176 rldicl 5,558 5,614 rldicl. 2,769 2,770 rldicr 684 684 rlwinm 1,519 1,523 rlwinm. 15 15 sld 7 7 sldi 20,039 20,081 slwi 2,639 2,656 sradi 93 93 srawi 258 258 srdi 1,217 1,229 srwi 451 451 stb 604 604 stbu 17 17 stbx 184 184 std 28,783 28,804 stdu 2,014 2,014 stdux 65 65 stdx 4,479 4,477 stfd 7,412 7,537 stfdu 180 180 stfdux 17 17 stfiwx 114 114 stfs 4 4 stfsx 14 14 sth 65 65 stvewx 31 31 stw 16,098 16,116 stwu 741 741 stwx 5,295 5,295 stxsdx 4,778 4,898 stxvd2x 3,113 3,169 stxvw4x 761 761 subf 2,244 2,241 subfc 284 284 subfe 292 292 subfic 109 109 vadduwm 478 478 vmaxsw 23 23 vperm 627 627 vspltisw 12 12 vsubuwm 152 152 xor 60 60 xori 15 15 xsabsdp 1,074 1,077 xsadddp 3,842 4,363 xscmpudp 1,590 1,593 xscvdpsxds 1 1 xscvsxddp 241 243 xsdivdp 508 511 xsmaddadp 9,097 8,818 xsmaddmdp 361 323 xsmaxdp 293 293 xsmindp 35 35 xsmsubadp 1,238 1,348 xsmsubmdp 35 40 xsmuldp 7,806 8,470 xsnegdp 481 475 xsnmaddadp 13 11 xsnmaddmdp 1 1 xsnmsubadp 1,578 1,897 xsnmsubmdp 81 78 xsrdpip 1 1 xsrsqrtedp 27 27 xssqrtdp 161 163 xssubdp 2,884 2,819 xvabsdp 196 196 xvadddp 441 441 xvcmpgtdp 9 9 xvcvsxwdp 44 44 xvdivdp 78 78 xvmaddadp 1,075 1,482 xvmaddmdp 2 81 xvmaxdp 106 106 xvmsubadp 37 37 xvmsubmdp 16 16 xvmuldp 885 799 xvnegdp 122 118 xvnmsubadp 337 354 xvnmsubmdp 1 4 xvsqrtdp 11 11 xvsubdp 316 356 xxlor 2,105 2,142 xxlxor 671 711 xxmrghw 22 22 xxmrglw 22 22 xxpermdi 419 419 xxsel 9 9 xxsldwi 199 199
On 11/11/2010 12:06 PM, Michael Meissner wrote: > gcc.dg/var-expand3.c is a function using Altivec fused multiply/add > instructions, and the loop unroller would unroll loops with (plus (mult (..))) > but not (fma ...). The code is analyze_insn_to_expand_var in loop-unroll.c > where it doesn't realize fma is a loop accumulator. I imagine it is probably > simple to fix this function. Fixed by http://gcc.gnu.org/ml/gcc-patches/2010-11/msg01248.html as yet unreviewed. > gcc.target/powerpc/ppc-fma-{2,4}.c are failing because they use > -mno-fused-madd, and the compiler gives a new warning that isn't accounted > for. Obviously we just need to switch to use -ffp-contract=off for these two > tests (or add the warning message). Fixed here. > gfortran.fortran-torture/execute/in-pack.f90 is getting an internal compiler > error when -maltivec is used: > > /home/meissner/fsf-src/rth/gcc/testsuite/gfortran.fortran-torture/execute/in-pack.f90: In function 'csub4': > /home/meissner/fsf-src/rth/gcc/testsuite/gfortran.fortran-torture/execute/in-pack.f90:59:0: error: unrecognizable insn: > (insn 395 394 396 45 (set (reg:V4SF 592) > (fma:V4SF (reg:V4SF 363 [ vect_var_.731 ]) > (reg:V4SF 593) > (reg:V4SI 594))) /home/meissner/fsf-src/rth/gcc/testsuite/gfortran.fortran-torture/execute/in-pack.f90:55 -1 > (expr_list:REG_EQUAL (mult:V4SF (reg:V4SF 363 [ vect_var_.731 ]) > (reg:V4SF 593)) > (nil))) Fixed here. A silly mistake converting the altivec mulv4sf pattern. r~
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index 2840150..a2dfac6 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -1531,7 +1531,6 @@ convert_mult_to_fma (gimple mul_stmt) enum tree_code use_code; tree result = mul_result; bool negate_p = false; - optab opt; use_stmt = USE_STMT (use_p); @@ -1574,32 +1573,31 @@ convert_mult_to_fma (gimple mul_stmt) negate_p = true; } - /* Determine if the target supports the exact form we found. */ switch (use_code) { case MINUS_EXPR: - if (gimple_assign_rhs1 (use_stmt) == result) - { - opt = negate_p ? fnms_optab : fms_optab; - break; - } - negate_p = !negate_p; - /* FALLTHRU */ - + if (gimple_assign_rhs2 (use_stmt) == result) + negate_p = !negate_p; + break; case PLUS_EXPR: - opt = negate_p ? fnma_optab : fma_optab; break; - default: /* FMA can only be formed from PLUS and MINUS. */ return false; } - if (optab_handler (opt, TYPE_MODE (type)) == CODE_FOR_nothing) - return false; /* We can't handle a * b + a * b. */ if (gimple_assign_rhs1 (use_stmt) == gimple_assign_rhs2 (use_stmt)) return false; + + /* While it is possible to validate whether or not the exact form + that we've recognized is available in the backend, the assumption + is that the transformation is never a loss. For instance, suppose + the target only has the plain FMA pattern available. Consider + a*b-c -> fma(a,b,-c): we've exchanged MUL+SUB for FMA+NEG, which + is still two operations. Consider -(a*b)-c -> fma(-a,b,-c): we + still have 3 operations, but in the FMA form the two NEGs are + independant and could be run in parallel. */ } FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, mul_result)