[4/N,rs6000,cft] -mfused-add cleanup

Message ID	4CDC3159.8040705@redhat.com
State	New
Headers	show Return-Path: <gcc-patches-return-278099-incoming=patchwork.ozlabs.org@gcc.gnu.org> Message-ID: <4CDC3159.8040705@redhat.com> Date: Thu, 11 Nov 2010 10:09:29 -0800 From: Richard Henderson <rth@redhat.com> User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.12) Gecko/20101103 Fedora/1.0-0.33.b2pre.fc14 Thunderbird/3.1.6 MIME-Version: 1.0 To: GCC Patches <gcc-patches@gcc.gnu.org> CC: meissner@linux.vnet.ibm.com, dje.gcc@gmail.com, richard.guenther@gmail.com Subject: Re: [patch 4/N][rs6000, cft] -mfused-add cleanup References: <4CDB3BA6.1010500@redhat.com> In-Reply-To: <4CDB3BA6.1010500@redhat.com> Content-Type: multipart/mixed; boundary="------------020709090600010001090402" Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org

Message ID

4CDC3159.8040705@redhat.com

State

New

Headers

Message-ID: <4CDC3159.8040705@redhat.com>
Date: Thu, 11 Nov 2010 10:09:29 -0800
From: Richard Henderson <rth@redhat.com>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US;
	rv:1.9.2.12) Gecko/20101103 Fedora/1.0-0.33.b2pre.fc14
	Thunderbird/3.1.6
MIME-Version: 1.0
To: GCC Patches <gcc-patches@gcc.gnu.org>
CC: meissner@linux.vnet.ibm.com, dje.gcc@gmail.com,
	richard.guenther@gmail.com
Subject: Re: [patch 4/N][rs6000, cft] -mfused-add cleanup
References: <4CDB3BA6.1010500@redhat.com>
In-Reply-To: <4CDB3BA6.1010500@redhat.com>
Content-Type: multipart/mixed;
	boundary="------------020709090600010001090402"
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-patches-owner@gcc.gnu.org

Commit Message

Richard Henderson Nov. 11, 2010, 6:09 p.m. UTC

According to Meissner via IRC, there are a number of regressions on the
powerpc port.  One example is cc.target/powerpc/ppc-fmadd-3.c, a testcase
that wants to see FMA generated even when the exact operation isn't 
available.  I think there is some sense to this, as indicated in the 
comment I added to the code.

Ok?


r~
* tree-ssa-math-opts.c (convert_mult_to_fma): Do not verify
	that the target has the exact fma operation that we matched.

Comments

Michael Meissner Nov. 11, 2010, 8:06 p.m. UTC | #1

On Thu, Nov 11, 2010 at 10:09:29AM -0800, Richard Henderson wrote:
> According to Meissner via IRC, there are a number of regressions on the
> powerpc port.  One example is cc.target/powerpc/ppc-fmadd-3.c, a testcase
> that wants to see FMA generated even when the exact operation isn't 
> available.  I think there is some sense to this, as indicated in the 
> comment I added to the code.
> 
> Ok?

To repeat what I said on IRC, I compared three compilers.  Rth (latest patches
from Richard including this one), Mrm (the patches I sent out yesterday that
don't remove the old combiner insns), and Trunk (svn trunk from yesterday).

						Rth	Mrm	Trunk
gcc.dg/var-expand3.c				fail	pass	pass
gcc.target/powerpc/ppc-fma-1.c			pass	pass	fail
gcc.target/powerpc/ppc-fma-2.c			fail	pass	fail
gcc.target/powerpc/ppc-fma-3.c			pass	pass	fail
gcc.target/powerpc/ppc-fma-4.c			fail	pass	fail
gfortran.fortran-torture/execute/in-pack.f90	fail	pass	pass

This is for both 32-bit and 64-bit compilations.

gcc.dg/var-expand3.c is a function using Altivec fused multiply/add
instructions, and the loop unroller would unroll loops with (plus (mult (..)))
but not (fma ...).  The code is analyze_insn_to_expand_var in loop-unroll.c
where it doesn't realize fma is a loop accumulator.  I imagine it is probably
simple to fix this function.

gcc.target/powerpc/ppc-fma-{2,4}.c are failing because they use
-mno-fused-madd, and the compiler gives a new warning that isn't accounted
for.  Obviously we just need to switch to use -ffp-contract=off for these two
tests (or add the warning message).

gfortran.fortran-torture/execute/in-pack.f90 is getting an internal compiler
error when -maltivec is used:

/home/meissner/fsf-src/rth/gcc/testsuite/gfortran.fortran-torture/execute/in-pack.f90: In function 'csub4':
/home/meissner/fsf-src/rth/gcc/testsuite/gfortran.fortran-torture/execute/in-pack.f90:59:0: error: unrecognizable insn:
(insn 395 394 396 45 (set (reg:V4SF 592)
        (fma:V4SF (reg:V4SF 363 [ vect_var_.731 ])
            (reg:V4SF 593)
            (reg:V4SI 594))) /home/meissner/fsf-src/rth/gcc/testsuite/gfortran.fortran-torture/execute/in-pack.f90:55 -1
     (expr_list:REG_EQUAL (mult:V4SF (reg:V4SF 363 [ vect_var_.731 ])
            (reg:V4SF 593))
        (nil)))

The third argument should not be V4SI.  Perhaps a test for all three arguments
being the same mode, or doing a conversion?

BTW, ever since I started using GCC 4.5 and newer, the calculix has failed with
fma's are generated because of a -0.0 issue, and I needed to add a wrapper
around atan2 to prevent it from return -0.0 when given (-0.0, 0.0) args.  My
patches don't address this issue, but Richard's does, and I no longer need the
wrapper with his patches.

I built calculix with both the trunk and with Richard's patches, and the run
was about 0.8% slower than the trunk, which is not enough to worry about.

As I mentioned, I will be away from the mailing lists until Monday afternoon.

Here are the instruction differences between trunk and Richard's patches:


	Instruction      Trunk			       Rth
	===========      =====  		       ===
	add             19,420  		    19,436
	addi            67,585  		    67,911
	addic              106  		       107
	addic.             108  		       108
	addis           10,942  		    10,990
	addze              114  		       114
	and                231  		       231
	and.                 1  		         1
	andi.              146  		       146
	bctr                26  		        26
	bdnz             1,582  		     1,582
	bdz                144  		       144
	beq             23,038  		    23,178
	beqlr               25  		        25
	bge              1,552  		     1,562
	bgt              2,000  		     2,000
	bgtlr                3  		         3
	bl              24,666  		    24,673
	ble              5,872  		     5,863
	blelr               17  		        17
	blr              1,698  		     1,696
	blt              1,541  		     1,543
	bltlr                1  		         1
	bne              6,737  		     6,739
	bng                  1  		         1
	cmpd             2,065  		     2,085
	cmpdi            8,471  		     8,489
	cmpld               14  		        14
	cmplw            1,233  		     1,245
	cmplwi             653  		       642
	cmpw             9,006  		     9,012
	cmpwi           17,834  		    17,928
	cntlzd               7  		         7
	cntlzw              45  		        45
	crnot                6  		         6
	divw               170  		       170
	divwu                5  		         5
	extsb                3  		         3
	extsw           21,797  		    21,859
	extsw.             180  		       180
	fadds               19  		        19
	fcfids              40  		        40
	fcmpu               44  		        44
	fctiwz             114  		       114
	fdivs               14  		        14
	fmadds               3  		         3
	fmr                 81  		        90
	fmuls               34  		        34
	fneg                 1  		         1
	frsp                15  		        14
	fsel                12  		        12
	fsubs                9  		         9
	lbz                515  		       515
	lbzu                31  		        31
	lbzx               141  		       141
	ld              39,720  		    39,718
	ldu                396  		       396
	ldux                80  		        80
	ldx              4,317  		     4,319
	lfd             15,136  		    15,403
	lfdu             1,311  		     1,421
	lfdux              456  		       473
	lfiwax             282  		       284
	lfs                119  		       119
	lhz                 25  		        25
	li              33,356  		    33,388
	lis                464  		       464
	lvsr                54  		        54
	lwa              4,199  		     4,202
	lwaux                4  		         4
	lwax             2,437  		     2,445
	lwz             15,532  		    15,562
	lwzu             2,556  		     2,553
	lwzux               24  		        24
	lwzx             6,816  		     6,822
	lxsdx           11,407  		    11,568
	lxvd2x           4,626  		     5,034
	lxvdsx               1  		         1
	lxvw4x             863  		       863
	mcrf                20  		        20
	mfcr               233  		       233
	mflr             1,256  		     1,256
	mfvrsave            66  		        67
	mr              29,098  		    29,122
	mr.              1,022  		     1,022
	mtcrf              254  		       254
	mtctr            1,755  		     1,755
	mtlr             1,616  		     1,614
	mtvrsave           139  		       141
	mulhw               54  		        54
	mulhwu               9  		         9
	mulld              321  		       321
	mulli            2,996  		     2,996
	mullw              723  		       723
	mullw.              13  		        13
	neg                791  		       791
	nop             24,085  		    24,092
	nor                536  		       539
	or                 142  		       142
	ori              1,379  		     1,379
	oris                69  		        70
	rldic              174  		       176
	rldicl           5,558  		     5,614
	rldicl.          2,769  		     2,770
	rldicr             684  		       684
	rlwinm           1,519  		     1,523
	rlwinm.             15  		        15
	sld                  7  		         7
	sldi            20,039  		    20,081
	slwi             2,639  		     2,656
	sradi               93  		        93
	srawi              258  		       258
	srdi             1,217  		     1,229
	srwi               451  		       451
	stb                604  		       604
	stbu                17  		        17
	stbx               184  		       184
	std             28,783  		    28,804
	stdu             2,014  		     2,014
	stdux               65  		        65
	stdx             4,479  		     4,477
	stfd             7,412  		     7,537
	stfdu              180  		       180
	stfdux              17  		        17
	stfiwx             114  		       114
	stfs                 4  		         4
	stfsx               14  		        14
	sth                 65  		        65
	stvewx              31  		        31
	stw             16,098  		    16,116
	stwu               741  		       741
	stwx             5,295  		     5,295
	stxsdx           4,778  		     4,898
	stxvd2x          3,113  		     3,169
	stxvw4x            761  		       761
	subf             2,244  		     2,241
	subfc              284  		       284
	subfe              292  		       292
	subfic             109  		       109
	vadduwm            478  		       478
	vmaxsw              23  		        23
	vperm              627  		       627
	vspltisw            12  		        12
	vsubuwm            152  		       152
	xor                 60  		        60
	xori                15  		        15
	xsabsdp          1,074  		     1,077
	xsadddp          3,842  		     4,363
	xscmpudp         1,590  		     1,593
	xscvdpsxds           1  		         1
	xscvsxddp          241  		       243
	xsdivdp            508  		       511
	xsmaddadp        9,097  		     8,818
	xsmaddmdp          361  		       323
	xsmaxdp            293  		       293
	xsmindp             35  		        35
	xsmsubadp        1,238  		     1,348
	xsmsubmdp           35  		        40
	xsmuldp          7,806  		     8,470
	xsnegdp            481  		       475
	xsnmaddadp          13  		        11
	xsnmaddmdp           1  		         1
	xsnmsubadp       1,578  		     1,897
	xsnmsubmdp          81  		        78
	xsrdpip              1  		         1
	xsrsqrtedp          27  		        27
	xssqrtdp           161  		       163
	xssubdp          2,884  		     2,819
	xvabsdp            196  		       196
	xvadddp            441  		       441
	xvcmpgtdp            9  		         9
	xvcvsxwdp           44  		        44
	xvdivdp             78  		        78
	xvmaddadp        1,075  		     1,482
	xvmaddmdp            2  		        81
	xvmaxdp            106  		       106
	xvmsubadp           37  		        37
	xvmsubmdp           16  		        16
	xvmuldp            885  		       799
	xvnegdp            122  		       118
	xvnmsubadp         337  		       354
	xvnmsubmdp           1  		         4
	xvsqrtdp            11  		        11
	xvsubdp            316  		       356
	xxlor            2,105  		     2,142
	xxlxor             671  		       711
	xxmrghw             22  		        22
	xxmrglw             22  		        22
	xxpermdi           419  		       419
	xxsel                9  		         9
	xxsldwi            199  		       199

Richard Henderson Nov. 14, 2010, 12:14 a.m. UTC | #2

On 11/11/2010 12:06 PM, Michael Meissner wrote:
> gcc.dg/var-expand3.c is a function using Altivec fused multiply/add
> instructions, and the loop unroller would unroll loops with (plus (mult (..)))
> but not (fma ...).  The code is analyze_insn_to_expand_var in loop-unroll.c
> where it doesn't realize fma is a loop accumulator.  I imagine it is probably
> simple to fix this function.

Fixed by 
  http://gcc.gnu.org/ml/gcc-patches/2010-11/msg01248.html
as yet unreviewed.

> gcc.target/powerpc/ppc-fma-{2,4}.c are failing because they use
> -mno-fused-madd, and the compiler gives a new warning that isn't accounted
> for.  Obviously we just need to switch to use -ffp-contract=off for these two
> tests (or add the warning message).

Fixed here.

> gfortran.fortran-torture/execute/in-pack.f90 is getting an internal compiler
> error when -maltivec is used:
> 
> /home/meissner/fsf-src/rth/gcc/testsuite/gfortran.fortran-torture/execute/in-pack.f90: In function 'csub4':
> /home/meissner/fsf-src/rth/gcc/testsuite/gfortran.fortran-torture/execute/in-pack.f90:59:0: error: unrecognizable insn:
> (insn 395 394 396 45 (set (reg:V4SF 592)
>         (fma:V4SF (reg:V4SF 363 [ vect_var_.731 ])
>             (reg:V4SF 593)
>             (reg:V4SI 594))) /home/meissner/fsf-src/rth/gcc/testsuite/gfortran.fortran-torture/execute/in-pack.f90:55 -1
>      (expr_list:REG_EQUAL (mult:V4SF (reg:V4SF 363 [ vect_var_.731 ])
>             (reg:V4SF 593))
>         (nil)))

Fixed here.  A silly mistake converting the altivec mulv4sf pattern.


r~

diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 2840150..a2dfac6 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1531,7 +1531,6 @@  convert_mult_to_fma (gimple mul_stmt)
       enum tree_code use_code;
       tree result = mul_result;
       bool negate_p = false;
-      optab opt;
 
       use_stmt = USE_STMT (use_p);
 
@@ -1574,32 +1573,31 @@  convert_mult_to_fma (gimple mul_stmt)
 	  negate_p = true;
 	}
 
-      /* Determine if the target supports the exact form we found.  */
       switch (use_code)
 	{
 	case MINUS_EXPR:
-	  if (gimple_assign_rhs1 (use_stmt) == result)
-	    {
-	      opt = negate_p ? fnms_optab : fms_optab;
-	      break;
-	    }
-	  negate_p = !negate_p;
-	  /* FALLTHRU */
-
+	  if (gimple_assign_rhs2 (use_stmt) == result)
+	    negate_p = !negate_p;
+	  break;
 	case PLUS_EXPR:
-	  opt = negate_p ? fnma_optab : fma_optab;
 	  break;
-
 	default:
 	  /* FMA can only be formed from PLUS and MINUS.  */
 	  return false;
 	}
-      if (optab_handler (opt, TYPE_MODE (type)) == CODE_FOR_nothing)
-	return false;
 
       /* We can't handle a * b + a * b.  */
       if (gimple_assign_rhs1 (use_stmt) == gimple_assign_rhs2 (use_stmt))
 	return false;
+
+      /* While it is possible to validate whether or not the exact form
+	 that we've recognized is available in the backend, the assumption
+	 is that the transformation is never a loss.  For instance, suppose
+	 the target only has the plain FMA pattern available.  Consider
+	 a*b-c -> fma(a,b,-c): we've exchanged MUL+SUB for FMA+NEG, which
+	 is still two operations.  Consider -(a*b)-c -> fma(-a,b,-c): we
+	 still have 3 operations, but in the FMA form the two NEGs are
+	 independant and could be run in parallel.  */
     }
 
   FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, mul_result)