From patchwork Fri Oct 24 11:54:51 2014
X-Patchwork-Submitter: Alan Lawrence
X-Patchwork-Id: 402826
Message-ID: <544A3E0B.2000803@arm.com>
Date: Fri, 24 Oct 2014 12:54:51 +0100
From: Alan Lawrence
To: "gcc-patches@gcc.gnu.org"
CC: Richard Biener, David Edelsohn
Subject: [PATCH v2 0-6/11] Fix PR/61114, make direct vector reductions endianness-neutral

This is the first half of my previous patch series
(https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01456.html): the part making
the REDUC_..._EXPR tree codes endian-neutral, and adding a new
reduce-to-scalar optab in place of the endianness-dependent
reduc_[us](plus|min|max)_optab.

I'm leaving the vec_shr portion out of this patch series, as the link between
the two halves is only the end goal of removing an "if (BYTES_BIG_ENDIAN)"
from tree-vect-loop.c; this series removes that from one code path, so it can
stand alone.
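(To make the endianness point concrete, here is a stand-alone C sketch -- not
GCC code, with made-up helper names -- of the difference between the two optab
contracts.  The old reduc_[us](plus|min|max)_optab leaves the result in
"element 0" of a vector, and which array element that lane corresponds to is
an endianness question; the new reduc_..._scal_optab returns a scalar, so
there is no lane numbering left to disagree about.)

/* Illustration only; NOT GCC source.  NLANES and both helpers are invented
   for this example.  */
#include <stdio.h>

#define NLANES 4

/* Old-style contract: the reduction result is written into one lane of a
   vector; which array index that lane is depends on how the target numbers
   lanes, so big- and little-endian targets can disagree.  */
static void
reduc_plus_vector (const int *in, int *out, int result_lane)
{
  int sum = 0;
  for (int i = 0; i < NLANES; i++)
    sum += in[i];
  for (int i = 0; i < NLANES; i++)
    out[i] = 0;
  out[result_lane] = sum;	/* caller must know which lane is "lane 0" */
}

/* New-style contract: the result is simply a scalar.  */
static int
reduc_plus_scal (const int *in)
{
  int sum = 0;
  for (int i = 0; i < NLANES; i++)
    sum += in[i];
  return sum;
}

int
main (void)
{
  int v[NLANES] = { 1, 2, 3, 4 };
  int out[NLANES];

  reduc_plus_vector (v, out, 0);		/* one reading of "lane 0" */
  printf ("lane 0 at index 0:   out[0] = %d\n", out[0]);
  reduc_plus_vector (v, out, NLANES - 1);	/* the other reading */
  printf ("lane 0 at index N-1: out[0] = %d\n", out[0]);
  printf ("scalar result:       %d\n", reduc_plus_scal (v));
  return 0;
}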
Patches 1-6 are as previously posted, apart from rebasing and removing the
old/poisoned AArch64 patterns as per the maintainer's request.

Patches 1, 2, 4, 5 and 6 have already been approved; patch 3 was discussed
somewhat, but I think we decided against most of the ideas raised, and I have
added a comment to scalar_reduc_to_vector.  Rereading Richie's "Otherwise the
patch looks good to me" now, I wonder whether I should have taken that as an
approval, but I didn't read it that way at the time.

Patches 7-11 migrate ARM, x86, IA64 (I think), and mostly PowerPC, to the new
reduc_(plus|[us](min|max))_scal_optab.  I have not managed to work out how to
do the same for MIPS (specifically, what I need to add to
mips_expand_vec_reduc), and have had no response from the maintainers, so I am
leaving that for now.  I also haven't migrated (or worked out how to target)
rs6000/paired.md; help would be most welcome.

The suggestion was then to "complete" the migration by removing the old
optabs.  There are a few options here, and I'll follow up with appropriate
patches according to feedback received.  I see these options:

(1) Just delete the old optabs (and the migration code).  This would
    performance-regress the MIPS backend, but should not break it, although
    one should really do *something* with the then-unused
    reduc_[us](plus|min|max)_optab patterns in config/mips/loongson.md.

(2) Also rename reduc_..._scal_optab back to reduc_..._optab; this would
    break the MIPS backend if nothing were done with its existing patterns.

(2a) Alternatively, just use a different new name, e.g. reduce_...,
     reduct_..., vec_reduc_..., anything that's less of a mouthful than
     reduc_..._scal.  A name only very slightly different from the current
     reduc_... might be confusing, but so might changing the meaning (and
     signature) of the optab while keeping the existing name, so I am open
     to suggestions.

Cheers, Alan

commit 846d5932041e04bbf386efbc739aee9749051bc7
Author: Alan Lawrence
Date:   Wed Aug 13 17:25:13 2014 +0100

    AArch64: Reintroduce gimple_fold for min+max+plus

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index a49da89..283469b 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -1188,9 +1188,6 @@ aarch64_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED, tree *args,
   return NULL_TREE;
 }
 
-/* Handling of reduction operations temporarily removed so as to decouple
-   changes to tree codes from AArch64 NEON Intrinsics.  */
-#if 0
 bool
 aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 {
@@ -1200,19 +1197,6 @@ aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   tree fndecl;
   gimple new_stmt = NULL;
 
-  /* The operations folded below are reduction operations.  These are
-     defined to leave their result in the 0'th element (from the perspective
-     of GCC).  The architectural instruction we are folding will leave the
-     result in the 0'th element (from the perspective of the architecture).
-     For big-endian systems, these perspectives are not aligned.
-
-     It is therefore wrong to perform this fold on big-endian.  There
-     are some tricks we could play with shuffling, but the mid-end is
-     inconsistent in the way it treats reduction operations, so we will
-     end up in difficulty.  Until we fix the ambiguity - just bail out.  */
-  if (BYTES_BIG_ENDIAN)
-    return false;
-
   if (call)
     {
       fndecl = gimple_call_fndecl (stmt);
@@ -1224,23 +1208,28 @@ aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi)
			? gimple_call_arg_ptr (stmt, 0)
			: &error_mark_node);
 
+	  /* We use gimple's REDUC_(PLUS|MIN|MAX)_EXPRs for float, signed int
+	     and unsigned int; it will distinguish according to the types of
+	     the arguments to the __builtin.  */
 	  switch (fcode)
 	    {
-	      BUILTIN_VALL (UNOP, reduc_splus_, 10)
-		new_stmt = gimple_build_assign_with_ops (
+	      BUILTIN_VALL (UNOP, reduc_plus_scal_, 10)
+		new_stmt = gimple_build_assign_with_ops (
						REDUC_PLUS_EXPR,
						gimple_call_lhs (stmt),
						args[0],
						NULL_TREE);
		break;
-	      BUILTIN_VDQIF (UNOP, reduc_smax_, 10)
+	      BUILTIN_VDQIF (UNOP, reduc_smax_scal_, 10)
+	      BUILTIN_VDQ_BHSI (UNOPU, reduc_umax_scal_, 10)
		new_stmt = gimple_build_assign_with_ops (
						REDUC_MAX_EXPR,
						gimple_call_lhs (stmt),
						args[0],
						NULL_TREE);
		break;
-	      BUILTIN_VDQIF (UNOP, reduc_smin_, 10)
+	      BUILTIN_VDQIF (UNOP, reduc_smin_scal_, 10)
+	      BUILTIN_VDQ_BHSI (UNOPU, reduc_umin_scal_, 10)
		new_stmt = gimple_build_assign_with_ops (
						REDUC_MIN_EXPR,
						gimple_call_lhs (stmt),
@@ -1262,7 +1251,6 @@ aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   return changed;
 }
 
-#endif
 
 void
 aarch64_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 27d82f3..db5ff59 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10015,8 +10015,8 @@ aarch64_asan_shadow_offset (void)
 #undef TARGET_FRAME_POINTER_REQUIRED
 #define TARGET_FRAME_POINTER_REQUIRED aarch64_frame_pointer_required
 
-//#undef TARGET_GIMPLE_FOLD_BUILTIN
-//#define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin
+#undef TARGET_GIMPLE_FOLD_BUILTIN
+#define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin
 
 #undef TARGET_GIMPLIFY_VA_ARG_EXPR
 #define TARGET_GIMPLIFY_VA_ARG_EXPR aarch64_gimplify_va_arg_expr
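(For reference, the kind of source-level code this fold affects.  This is an
illustration assuming an AArch64 target with arm_neon.h; the mapping of these
ACLE intrinsics onto the reduc_..._scal_ builtins is my reading of the series
rather than something quoted above.  With the #if 0 and the BYTES_BIG_ENDIAN
bail-out gone, the builtins behind such intrinsics are folded to
REDUC_PLUS_EXPR / REDUC_MAX_EXPR gimple on both endiannesses.)

/* Illustration only: across-lanes reduction intrinsics of the sort whose
   underlying __builtin_aarch64_reduc_*_scal_* calls the re-enabled
   aarch64_gimple_fold_builtin rewrites into REDUC_*_EXPR assignments.  */
#include <arm_neon.h>

int32_t
sum_across (int32x4_t v)
{
  return vaddvq_s32 (v);	/* add across lanes -> REDUC_PLUS_EXPR */
}

int32_t
max_across (int32x4_t v)
{
  return vmaxvq_s32 (v);	/* max across lanes -> REDUC_MAX_EXPR */
}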