From patchwork Wed Feb 20 23:50:31 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Maciej W. Rozycki" <macro@codesourcery.com>
X-Patchwork-Id: 222162
Date: Wed, 20 Feb 2013 23:50:31 +0000
From: "Maciej W. Rozycki" <macro@codesourcery.com>
To: Richard Sandiford <rdsandiford@googlemail.com>
CC: Steve Ellcey <Steve.Ellcey@imgtec.com>, <gcc-patches@gcc.gnu.org>
Subject: [PATCH] MIPS: MIPS32r2 FP MADD instruction set support
Message-ID: <alpine.DEB.1.10.1302200035560.6762@tp.orcam.me.uk>
User-Agent: Alpine 1.10 (DEB 962 2008-03-14)

Hi,

 This issue was originally raised here:

http://gcc.gnu.org/ml/gcc-patches/2012-12/msg00863.html

 We have a shortcoming in GCC in that we only allow the use of half of the FP 
MADD instruction subset (MADD.fmt and MSUB.fmt) in the 64-bit/32-register 
mode (CP0.Status.FR == 1) on MIPS32r2 processors.  Furthermore we never 
enable the other half (NMADD.fmt and NMSUB.fmt) on those processors.  
However this whole instruction subset is always available on MIPS32r2 FPUs 
regardless of the mode selected, just as it always has been on FPUs of the 
64-bit ISA line from MIPS IV up.
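
 For reference, the four operations in this subset correspond to the
source-level shapes sketched below.  The function names are made up for
illustration, and this is only a sketch of the arithmetic: whether GCC
actually selects MADD.D, MSUB.D, NMADD.D or NMSUB.D for them depends on
the target and on options such as -mfused-madd, and the negated forms are
additionally guarded by the NaN and signed-zero conditions visible in the
mips.md hunks further down.

```c
#include <assert.h>

/* d = (a * b) + c: the shape MADD.fmt computes.  */
double fp_madd(double a, double b, double c)  { return a * b + c; }

/* d = (a * b) - c: the shape MSUB.fmt computes.  */
double fp_msub(double a, double b, double c)  { return a * b - c; }

/* d = -((a * b) + c): the shape NMADD.fmt computes.  */
double fp_nmadd(double a, double b, double c) { return -(a * b + c); }

/* d = -((a * b) - c): the shape NMSUB.fmt computes.  */
double fp_nmsub(double a, double b, double c) { return -(a * b - c); }

/* Host-side check of the arithmetic only; it says nothing about which
   instructions a MIPS compiler would emit for the functions above.  */
void fp_madd_family_check(void)
{
    assert(fp_madd(2.0, 3.0, 4.0) == 10.0);
    assert(fp_msub(2.0, 3.0, 4.0) == 2.0);
    assert(fp_nmadd(2.0, 3.0, 4.0) == -10.0);
    assert(fp_nmsub(2.0, 3.0, 4.0) == -2.0);
}
```

 Building such a file for a 32-bit-FPR MIPS32r2 configuration and reading
the generated assembly is one way to see whether the instructions are
being used.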

 The paired-single format, however, is indeed only available in the 
64-bit/32-register mode, from the MIPS V ISA up.  For some (or no) reason 
we nevertheless explicitly allow it for NMADD.PS and NMSUB.PS on MIPS32r2 
processors in the 32-bit/16-register FPU mode (this is probably globally 
overridden elsewhere).

 I'm not sure where this GCC limitation came from, but there were typos in 
the formats listed for the MSUB.S, MSUB.D, NMADD.S, NMADD.D, NMSUB.S and 
NMSUB.D instructions up to and including rev. 2.50 of vol. II of the 
MIPS32r2 architecture documentation set (MIPS doc #MD00086).  This may or 
may not have contributed to this problem as these instructions were listed 
as available from "MIPS64" up rather than from "MIPS64, MIPS32 Release 2" 
up, so no mention of the FPU mode there.

 The change below lifts the relevant restrictions, removing a lot of 
clutter that is no longer needed now that the data mode does not have to 
be checked.

 Also, according to MIPS IV ISA documentation these operations are only 
fused (i.e. don't match original IEEE 754-1985 accuracy requirements) on 
the original MIPS IV R8000 CPU, and MIPS architecture specs don't mention 
any limitations of these instructions either, so I have updated the GCC 
manual to document that on non-R8000 CPUs (which are the ones we really care 
about) they are numerically equivalent to computations made with 
corresponding individual operations.
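
 To illustrate what "fused" means numerically, here is a host-side sketch
in portable C; it does not use any MIPS instruction.  The function names
are hypothetical.  The single-rounding variant relies on the assumption
that the product of two floats is exact in double, which the assertion
checks, and for the inputs given in the note below the addition is exact
as well, so only the final conversion rounds.

```c
#include <assert.h>
#include <float.h>

/* Separate multiply and add: the product is rounded to float before
   the addition.  The volatile store forces that intermediate rounding.  */
float madd_separate(float a, float b, float c)
{
    volatile float product = a * b;
    return product + c;
}

/* Emulation of a multiply-add whose intermediate product is not
   rounded: the product is formed exactly in double, c is added, and
   the result is converted to float at the end.  */
float madd_single_rounding(float a, float b, float c)
{
    /* Assumption: two float significands multiplied together fit in a
       double significand, so the product below is exact.  */
    assert(2 * FLT_MANT_DIG <= DBL_MANT_DIG);
    double exact_product = (double)a * (double)b;
    return (float)(exact_product + (double)c);
}
```

 With a = b = 1 + 2^-12 and c = -(1 + 2^-11), the exact product is
1 + 2^-11 + 2^-24.  Rounding it to float first drops the 2^-24 term, so
the separate computation gives 0, while the single-rounding one gives
2^-24.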

 Finally, while at it, I found it interesting that we have separate 
conditions to cover MADD.fmt/MSUB.fmt (ISA_HAS_FP_MADD4_MSUB4) and 
NMADD.fmt/NMSUB.fmt (ISA_HAS_NMADD4_NMSUB4), while all four instructions 
need to be implemented as a whole group per data format supported and 
cannot be separated (the MIPS architecture specification explicitly 
forbids subsetting).  The difference between the two conditions is that 
the former expands to ISA_HAS_FP4, that is, it enables the subset for any 
MIPS IV and up FPU, while the latter has an extra "&& (!TARGET_MIPS5400 
|| TARGET_MAD)" qualifier.
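
 The discrepancy can be modelled in a few lines of plain C.  This is only
a sketch: struct target_model and the function names are hypothetical
stand-ins for the GCC globals the real macros read, and the mode-specific
restriction on MIPS32r2 is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the target state tested by the macros.  */
struct target_model {
    bool isa_has_fp4;   /* models ISA_HAS_FP4 */
    bool mips5400;      /* models TARGET_MIPS5400 */
    bool mad;           /* models TARGET_MAD */
};

/* Models ISA_HAS_FP_MADD4_MSUB4, which expands to ISA_HAS_FP4.  */
bool has_fp_madd4_msub4(const struct target_model *t)
{
    return t->isa_has_fp4;
}

/* Models ISA_HAS_NMADD4_NMSUB4 with its extra qualifier.  */
bool has_nmadd4_nmsub4(const struct target_model *t)
{
    return t->isa_has_fp4 && (!t->mips5400 || t->mad);
}

/* True when only half of the instruction group would be enabled.  */
bool conditions_disagree(const struct target_model *t)
{
    assert(t != 0);
    return has_fp_madd4_msub4(t) != has_nmadd4_nmsub4(t);
}
```

 In this model the two conditions disagree only for a VR54xx target with
TARGET_MAD clear, where MADD.fmt/MSUB.fmt are enabled but
NMADD.fmt/NMSUB.fmt are not.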

 I went ahead and checked available NEC VR54xx documentation and here's 
what I came up with:

1. "VR5400 MIPS RISC Microprocessor Family" datasheet (NEC doc #13362) 
   says:

   "The VR5400 processor family complies with the MIPS IV instruction set 
   and IEEE-754 floating-point and IEEE-1149.1/1149.1a JTAG specification, 
   [...]"

2. "VR5432 MIPS RISC Microprocessor User's Manual, Volume 2" (NEC doc 
   #13751) lists all the individual MADD.fmt, MSUB.fmt, NMADD.fmt and
   NMSUB.fmt instructions in Chapter 18 "Floating-Point Unit Instruction 
   Set" with no restrictions as to their availability (the only other 
   member of the VR54xx family I know of is the VR5464 that is a 
   high-performance version of the VR5432 and is fully software 
   compatible).

 Further to that TARGET_MAD controls whether to "Use PMC-style 'mad' 
instructions" that are all CPU rather than FPU instructions.  The VR5432 
indeed supports extra integer multiply-accumulate instructions, as 
documented in #2 above; these are the MACC/MACCHI/MACCHIU/MACCU and 
MSAC/MSACHI/MSACHIU/MSACU instructions as roughly covered by our 
ISA_HAS_MACC, ISA_HAS_MSAC and ISA_HAS_MACCHI knobs (the last of these is not 
implied for TARGET_MIPS5400, perhaps because the family does not support 
the doubleword variants).

 All in all it looks to me like a misplaced hunk.  It was introduced in 
rev. 56471 (you were named as one of the contributors on that commit, so 
you may be able to remember and/or correct me if I am wrong anywhere here), 
and it looks to me like it should have been applied instead to the 
ISA_HAS_MADD_MSUB macro, which is still just a few lines above 
ISA_HAS_NMADD4_NMSUB4 (and was even closer to ISA_HAS_NMADD_NMSUB, as the 
latter was then called; the bodies were close enough back then for a hunk 
to apply cleanly to either).

 These days we handle ISA_HAS_MADD_MSUB indirectly through 
GENERATE_MADD_MSUB and in many more places than back at rev. 56471.  We 
also handle TARGET_MAD and ISA_HAS_MACC/ISA_HAS_MSAC/ISA_HAS_MACCHI 
explicitly throughout mips.md, so I think we should simply discard this 
incorrect condition, and then, as ISA_HAS_FP_MADD4_MSUB4 and 
ISA_HAS_NMADD4_NMSUB4 will have become identical, fold the two macros into 
one, perhaps ISA_HAS_FP_MADD4.  And likewise ISA_HAS_FP_MADD3.  Thoughts?

 Back to the change considered here, it was successfully regression-tested 
with the gcc, g++ and libstdc++ testsuites, with the mips-linux-gnu target 
and the o32/mips32r2 and n64/mips64r2 multilibs, both endiannesses.  I 
examined some test cases executed to verify the instructions concerned 
have been emitted as appropriate where previously they were not.  I hope 
this change is OK to apply as soon as 4.9 has opened.

 BTW, do you happen to know a way to reliably force all our testsuites NOT 
to delete executables after run?  Personally I think it's missing the 
point to have them deleted -- how can one debug any regressions then?  I 
have a rather gross hack patching up most of the TCL scripts throughout to 
remove the various instances of file deletion commands and stick the 
-keep-output option onto dg-test calls, but there always seems to be 
something that escapes.  I find it very frustrating.

 What do other people do?  I can't believe all the GCC developers are 
happy to accept this pain.

2013-02-20  Maciej W. Rozycki  <macro@codesourcery.com>

	gcc/
	* config/mips/mips.h (ISA_HAS_FP4): Don't restrict ISA_MIPS32R2
	to TARGET_FLOAT64.
	(ISA_HAS_NMADD4_NMSUB4): Remove the MODE argument; don't restrict 
	ISA_MIPS32R2.
	(ISA_HAS_NMADD3_NMSUB3): Remove the MODE argument.
	* config/mips/mips.c (mips_rtx_costs): Update according to
	changes to ISA_HAS_NMADD4_NMSUB4 and ISA_HAS_NMADD3_NMSUB3.
	* config/mips/mips.md (nmadd4<mode>, nmadd3<mode>): Likewise.
	(nmadd4<mode>_fastmath, nmadd3<mode>_fastmath): Likewise.
	(nmsub4<mode>, nmsub3<mode>): Likewise.
	(nmsub4<mode>_fastmath, nmsub3<mode>_fastmath): Likewise.
	* doc/invoke.texi (MIPS Options): Update documentation of the
	floating-point multiply-accumulate instruction restrictions.

  Maciej

gcc-mips32r2-madd.patch

Index: gcc-fsf-trunk-quilt/gcc/config/mips/mips.c
===================================================================
--- gcc-fsf-trunk-quilt.orig/gcc/config/mips/mips.c	2013-02-07 02:59:05.465114046 +0000
+++ gcc-fsf-trunk-quilt/gcc/config/mips/mips.c	2013-02-07 02:59:48.575511623 +0000
@@ -3798,7 +3798,7 @@ mips_rtx_costs (rtx x, int code, int out
 
     case MINUS:
       if (float_mode_p
-	  && (ISA_HAS_NMADD4_NMSUB4 (mode) || ISA_HAS_NMADD3_NMSUB3 (mode))
+	  && (ISA_HAS_NMADD4_NMSUB4 || ISA_HAS_NMADD3_NMSUB3)
 	  && TARGET_FUSED_MADD
 	  && !HONOR_NANS (mode)
 	  && !HONOR_SIGNED_ZEROS (mode))
@@ -3850,7 +3850,7 @@ mips_rtx_costs (rtx x, int code, int out
 
     case NEG:
       if (float_mode_p
-	  && (ISA_HAS_NMADD4_NMSUB4 (mode) || ISA_HAS_NMADD3_NMSUB3 (mode))
+	  && (ISA_HAS_NMADD4_NMSUB4 || ISA_HAS_NMADD3_NMSUB3)
 	  && TARGET_FUSED_MADD
 	  && !HONOR_NANS (mode)
 	  && HONOR_SIGNED_ZEROS (mode))
Index: gcc-fsf-trunk-quilt/gcc/config/mips/mips.h
===================================================================
--- gcc-fsf-trunk-quilt.orig/gcc/config/mips/mips.h	2013-02-07 02:35:34.024073830 +0000
+++ gcc-fsf-trunk-quilt/gcc/config/mips/mips.h	2013-02-07 02:59:48.575511623 +0000
@@ -855,7 +855,7 @@ struct mips_cpu_info {
    FP madd and msub instructions, and the FP recip and recip sqrt
    instructions.  */
 #define ISA_HAS_FP4		((ISA_MIPS4				\
-				  || (ISA_MIPS32R2 && TARGET_FLOAT64)   \
+				  || ISA_MIPS32R2			\
 				  || ISA_MIPS64				\
 				  || ISA_MIPS64R2)			\
 				 && !TARGET_MIPS16)
@@ -885,18 +885,12 @@ struct mips_cpu_info {
 
 /* ISA has floating-point nmadd and nmsub instructions
    'd = -((a * b) [+-] c)'.  */
-#define ISA_HAS_NMADD4_NMSUB4(MODE)					\
-				((ISA_MIPS4				\
-				  || (ISA_MIPS32R2 && (MODE) == V2SFmode) \
-				  || ISA_MIPS64				\
-				  || ISA_MIPS64R2)			\
-				 && (!TARGET_MIPS5400 || TARGET_MAD)	\
-				 && !TARGET_MIPS16)
+#define ISA_HAS_NMADD4_NMSUB4	(ISA_HAS_FP4				\
+				 && (!TARGET_MIPS5400 || TARGET_MAD))
 
 /* ISA has floating-point nmadd and nmsub instructions
    'c = -((a * b) [+-] c)'.  */
-#define ISA_HAS_NMADD3_NMSUB3(MODE)					\
-                                TARGET_LOONGSON_2EF
+#define ISA_HAS_NMADD3_NMSUB3	TARGET_LOONGSON_2EF
 
 /* ISA has count leading zeroes/ones instruction (not implemented).  */
 #define ISA_HAS_CLZ_CLO		((ISA_MIPS32				\
Index: gcc-fsf-trunk-quilt/gcc/config/mips/mips.md
===================================================================
--- gcc-fsf-trunk-quilt.orig/gcc/config/mips/mips.md	2013-02-07 02:35:34.004034605 +0000
+++ gcc-fsf-trunk-quilt/gcc/config/mips/mips.md	2013-02-07 02:59:48.585476315 +0000
@@ -2344,7 +2344,7 @@
 		   (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
 			      (match_operand:ANYF 2 "register_operand" "f"))
 		   (match_operand:ANYF 3 "register_operand" "f"))))]
-  "ISA_HAS_NMADD4_NMSUB4 (<MODE>mode)
+  "ISA_HAS_NMADD4_NMSUB4
    && TARGET_FUSED_MADD
    && HONOR_SIGNED_ZEROS (<MODE>mode)
    && !HONOR_NANS (<MODE>mode)"
@@ -2359,7 +2359,7 @@
 		   (mult:ANYF (match_operand:ANYF 1 "register_operand" "f")
 			      (match_operand:ANYF 2 "register_operand" "f"))
 		   (match_operand:ANYF 3 "register_operand" "0"))))]
-  "ISA_HAS_NMADD3_NMSUB3 (<MODE>mode)
+  "ISA_HAS_NMADD3_NMSUB3
    && TARGET_FUSED_MADD
    && HONOR_SIGNED_ZEROS (<MODE>mode)
    && !HONOR_NANS (<MODE>mode)"
@@ -2374,7 +2374,7 @@
 	 (mult:ANYF (neg:ANYF (match_operand:ANYF 1 "register_operand" "f"))
 		    (match_operand:ANYF 2 "register_operand" "f"))
 	 (match_operand:ANYF 3 "register_operand" "f")))]
-  "ISA_HAS_NMADD4_NMSUB4 (<MODE>mode)
+  "ISA_HAS_NMADD4_NMSUB4
    && TARGET_FUSED_MADD
    && !HONOR_SIGNED_ZEROS (<MODE>mode)
    && !HONOR_NANS (<MODE>mode)"
@@ -2389,7 +2389,7 @@
 	 (mult:ANYF (neg:ANYF (match_operand:ANYF 1 "register_operand" "f"))
 		    (match_operand:ANYF 2 "register_operand" "f"))
 	 (match_operand:ANYF 3 "register_operand" "0")))]
-  "ISA_HAS_NMADD3_NMSUB3 (<MODE>mode)
+  "ISA_HAS_NMADD3_NMSUB3
    && TARGET_FUSED_MADD
    && !HONOR_SIGNED_ZEROS (<MODE>mode)
    && !HONOR_NANS (<MODE>mode)"
@@ -2404,7 +2404,7 @@
 		   (mult:ANYF (match_operand:ANYF 2 "register_operand" "f")
 			      (match_operand:ANYF 3 "register_operand" "f"))
 		   (match_operand:ANYF 1 "register_operand" "f"))))]
-  "ISA_HAS_NMADD4_NMSUB4 (<MODE>mode)
+  "ISA_HAS_NMADD4_NMSUB4
    && TARGET_FUSED_MADD
    && HONOR_SIGNED_ZEROS (<MODE>mode)
    && !HONOR_NANS (<MODE>mode)"
@@ -2419,7 +2419,7 @@
 		   (mult:ANYF (match_operand:ANYF 2 "register_operand" "f")
 			      (match_operand:ANYF 3 "register_operand" "f"))
 		   (match_operand:ANYF 1 "register_operand" "0"))))]
-  "ISA_HAS_NMADD3_NMSUB3 (<MODE>mode)
+  "ISA_HAS_NMADD3_NMSUB3
    && TARGET_FUSED_MADD
    && HONOR_SIGNED_ZEROS (<MODE>mode)
    && !HONOR_NANS (<MODE>mode)"
@@ -2434,7 +2434,7 @@
 	 (match_operand:ANYF 1 "register_operand" "f")
 	 (mult:ANYF (match_operand:ANYF 2 "register_operand" "f")
 		    (match_operand:ANYF 3 "register_operand" "f"))))]
-  "ISA_HAS_NMADD4_NMSUB4 (<MODE>mode)
+  "ISA_HAS_NMADD4_NMSUB4
    && TARGET_FUSED_MADD
    && !HONOR_SIGNED_ZEROS (<MODE>mode)
    && !HONOR_NANS (<MODE>mode)"
@@ -2449,7 +2449,7 @@
 	 (match_operand:ANYF 1 "register_operand" "f")
 	 (mult:ANYF (match_operand:ANYF 2 "register_operand" "f")
 		    (match_operand:ANYF 3 "register_operand" "0"))))]
-  "ISA_HAS_NMADD3_NMSUB3 (<MODE>mode)
+  "ISA_HAS_NMADD3_NMSUB3
    && TARGET_FUSED_MADD
    && !HONOR_SIGNED_ZEROS (<MODE>mode)
    && !HONOR_NANS (<MODE>mode)"
Index: gcc-fsf-trunk-quilt/gcc/doc/invoke.texi
===================================================================
--- gcc-fsf-trunk-quilt.orig/gcc/doc/invoke.texi	2013-02-07 02:21:07.574131467 +0000
+++ gcc-fsf-trunk-quilt/gcc/doc/invoke.texi	2013-02-07 02:59:48.585476315 +0000
@@ -16440,10 +16440,12 @@ Enable (disable) use of the floating-poi
 instructions, when they are available.  The default is
 @option{-mfused-madd}.
 
-When multiply-accumulate instructions are used, the intermediate
-product is calculated to infinite precision and is not subject to
-the FCSR Flush to Zero bit.  This may be undesirable in some
-circumstances.
+On the R8000 CPU when multiply-accumulate instructions are used,
+the intermediate product is calculated to infinite precision
+and is not subject to the FCSR Flush to Zero bit.  This may be
+undesirable in some circumstances.  On other processors the result
+is numerically identical to the equivalent computation using
+separate multiply, add, subtract and negate instructions.
 
 @item -nocpp
 @opindex nocpp