From patchwork Fri Oct 24 11:54:51 2014
X-Patchwork-Submitter: Alan Lawrence
X-Patchwork-Id: 402826
Message-ID: <544A3E0B.2000803@arm.com>
Date: Fri, 24 Oct 2014 12:54:51 +0100
From: Alan Lawrence
To: "gcc-patches@gcc.gnu.org"
CC: Richard Biener, David Edelsohn
Subject: [PATCH v2 0-6/11] Fix PR/61114, make direct vector reductions endianness-neutral

This is the first half of my previous patch series
(https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01456.html): the part making
the REDUC_..._EXPR tree codes endian-neutral, and adding a new
reduce-to-scalar optab in place of the endianness-dependent
reduc_[us](plus|min|max)_optab.

I'm leaving the vec_shr portion out of this patch series, as the link between
the two halves is only the end goal of removing an "if (BYTES_BIG_ENDIAN)"
from tree-vect-loop.c; this series removes that from one code path, so it can
stand alone.
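(To make the endianness point concrete, here is a stand-alone C sketch -- not
GCC code, with made-up helper names -- of the difference between the two optab
contracts.  The old reduc_[us](plus|min|max)_optab leaves the result in
"element 0" of a vector, and which array element that lane corresponds to is
an endianness question; the new reduc_..._scal_optab returns a scalar, so
there is no lane numbering left to disagree about.)

/* Illustration only; NOT GCC source.  NLANES and both helpers are invented
   for this example.  */
#include <stdio.h>

#define NLANES 4

/* Old-style contract: the reduction result is written into one lane of a
   vector; which array index that lane is depends on how the target numbers
   lanes, so big- and little-endian targets can disagree.  */
static void
reduc_plus_vector (const int *in, int *out, int result_lane)
{
  int sum = 0;
  for (int i = 0; i < NLANES; i++)
    sum += in[i];
  for (int i = 0; i < NLANES; i++)
    out[i] = 0;
  out[result_lane] = sum;	/* caller must know which lane is "lane 0" */
}

/* New-style contract: the result is simply a scalar.  */
static int
reduc_plus_scal (const int *in)
{
  int sum = 0;
  for (int i = 0; i < NLANES; i++)
    sum += in[i];
  return sum;
}

int
main (void)
{
  int v[NLANES] = { 1, 2, 3, 4 };
  int out[NLANES];

  reduc_plus_vector (v, out, 0);		/* one reading of "lane 0" */
  printf ("lane 0 at index 0:   out[0] = %d\n", out[0]);
  reduc_plus_vector (v, out, NLANES - 1);	/* the other reading */
  printf ("lane 0 at index N-1: out[0] = %d\n", out[0]);
  printf ("scalar result:       %d\n", reduc_plus_scal (v));
  return 0;
}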
Patches 1-6 are as previously posted, apart from rebasing and removing the
old/poisoned AArch64 patterns as per the maintainer's request.

Patches 1, 2, 4, 5 and 6 have already been approved; patch 3 was discussed
somewhat, but I think we decided against most of the ideas raised, and I have
added a comment to scalar_reduc_to_vector.  Rereading Richie's "Otherwise the
patch looks good to me" now, I wonder whether I should have taken that as an
approval, but I didn't read it that way at the time.

Patches 7-11 migrate ARM, x86, IA64 (I think), and mostly PowerPC, to the new
reduc_(plus|[us](min|max))_scal_optab.  I have not managed to work out how to
do the same for MIPS (specifically, what I need to add to
mips_expand_vec_reduc), and have had no response from the maintainers, so I am
leaving that for now.  I also haven't migrated (or worked out how to target)
rs6000/paired.md; help would be most welcome.

The suggestion was then to "complete" the migration by removing the old
optabs.  There are a few options here, and I'll follow up with appropriate
patches according to feedback received.  I see these options:

(1) Just delete the old optabs (and the migration code).  This would
    performance-regress the MIPS backend, but should not break it, although
    one should really do *something* with the then-unused
    reduc_[us](plus|min|max)_optab patterns in config/mips/loongson.md.

(2) Also rename reduc_..._scal_optab back to reduc_..._optab; this would
    break the MIPS backend if nothing were done with its existing patterns.

(2a) Alternatively, just use a different new name, e.g. reduce_...,
     reduct_..., vec_reduc_..., anything that's less of a mouthful than
     reduc_..._scal.  A name only very slightly different from the current
     reduc_... might be confusing, but so might changing the meaning (and
     signature) of the optab while keeping the existing name, so I am open
     to suggestions.

Cheers, Alan

commit 846d5932041e04bbf386efbc739aee9749051bc7
Author: Alan Lawrence
Date:   Wed Aug 13 17:25:13 2014 +0100

    AArch64: Reintroduce gimple_fold for min+max+plus

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index a49da89..283469b 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -1188,9 +1188,6 @@ aarch64_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED, tree *args,
   return NULL_TREE;
 }
 
-/* Handling of reduction operations temporarily removed so as to decouple
-   changes to tree codes from AArch64 NEON Intrinsics.  */
-#if 0
 bool
 aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 {
@@ -1200,19 +1197,6 @@ aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   tree fndecl;
   gimple new_stmt = NULL;
 
-  /* The operations folded below are reduction operations.  These are
-     defined to leave their result in the 0'th element (from the perspective
-     of GCC).  The architectural instruction we are folding will leave the
-     result in the 0'th element (from the perspective of the architecture).
-     For big-endian systems, these perspectives are not aligned.
-
-     It is therefore wrong to perform this fold on big-endian.  There
-     are some tricks we could play with shuffling, but the mid-end is
-     inconsistent in the way it treats reduction operations, so we will
-     end up in difficulty.  Until we fix the ambiguity - just bail out.  */
-  if (BYTES_BIG_ENDIAN)
-    return false;
-
   if (call)
     {
       fndecl = gimple_call_fndecl (stmt);
@@ -1224,23 +1208,28 @@ aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi)
			? gimple_call_arg_ptr (stmt, 0)
			: &error_mark_node);
 
+	  /* We use gimple's REDUC_(PLUS|MIN|MAX)_EXPRs for float, signed int
+	     and unsigned int; it will distinguish according to the types of
+	     the arguments to the __builtin.  */
 	  switch (fcode)
 	    {
-	      BUILTIN_VALL (UNOP, reduc_splus_, 10)
-		new_stmt = gimple_build_assign_with_ops (
+	      BUILTIN_VALL (UNOP, reduc_plus_scal_, 10)
+		new_stmt = gimple_build_assign_with_ops (
						REDUC_PLUS_EXPR,
						gimple_call_lhs (stmt),
						args[0],
						NULL_TREE);
		break;
-	      BUILTIN_VDQIF (UNOP, reduc_smax_, 10)
+	      BUILTIN_VDQIF (UNOP, reduc_smax_scal_, 10)
+	      BUILTIN_VDQ_BHSI (UNOPU, reduc_umax_scal_, 10)
		new_stmt = gimple_build_assign_with_ops (
						REDUC_MAX_EXPR,
						gimple_call_lhs (stmt),
						args[0],
						NULL_TREE);
		break;
-	      BUILTIN_VDQIF (UNOP, reduc_smin_, 10)
+	      BUILTIN_VDQIF (UNOP, reduc_smin_scal_, 10)
+	      BUILTIN_VDQ_BHSI (UNOPU, reduc_umin_scal_, 10)
		new_stmt = gimple_build_assign_with_ops (
						REDUC_MIN_EXPR,
						gimple_call_lhs (stmt),
@@ -1262,7 +1251,6 @@ aarch64_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   return changed;
 }
 
-#endif
 
 void
 aarch64_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 27d82f3..db5ff59 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10015,8 +10015,8 @@ aarch64_asan_shadow_offset (void)
 #undef TARGET_FRAME_POINTER_REQUIRED
 #define TARGET_FRAME_POINTER_REQUIRED aarch64_frame_pointer_required
 
-//#undef TARGET_GIMPLE_FOLD_BUILTIN
-//#define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin
+#undef TARGET_GIMPLE_FOLD_BUILTIN
+#define TARGET_GIMPLE_FOLD_BUILTIN aarch64_gimple_fold_builtin
 
 #undef TARGET_GIMPLIFY_VA_ARG_EXPR
 #define TARGET_GIMPLIFY_VA_ARG_EXPR aarch64_gimplify_va_arg_expr
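(For reference, the kind of source-level code this fold affects.  This is an
illustration assuming an AArch64 target with arm_neon.h; the mapping of these
ACLE intrinsics onto the reduc_..._scal_ builtins is my reading of the series
rather than something quoted above.  With the #if 0 and the BYTES_BIG_ENDIAN
bail-out gone, the builtins behind such intrinsics are folded to
REDUC_PLUS_EXPR / REDUC_MAX_EXPR gimple on both endiannesses.)

/* Illustration only: across-lanes reduction intrinsics of the sort whose
   underlying __builtin_aarch64_reduc_*_scal_* calls the re-enabled
   aarch64_gimple_fold_builtin rewrites into REDUC_*_EXPR assignments.  */
#include <arm_neon.h>

int32_t
sum_across (int32x4_t v)
{
  return vaddvq_s32 (v);	/* add across lanes -> REDUC_PLUS_EXPR */
}

int32_t
max_across (int32x4_t v)
{
  return vmaxvq_s32 (v);	/* max across lanes -> REDUC_MAX_EXPR */
}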