From patchwork Thu Sep 18 11:54:40 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alan Lawrence X-Patchwork-Id: 390772 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 6059514012C for ; Thu, 18 Sep 2014 21:55:05 +1000 (EST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:subject:references :in-reply-to:content-type; q=dns; s=default; b=VPuFtxTd5wqB5yhxW LAiu4zU+FZKUx6Vt4aXn21voB3WDb2eq07swQCSI1N9O+Q29DVs3ge3SyJF5SYJt zcsu6Qj8TzF+Z+47lZl/QfIJQb8D6VSko5blp9BCiAH6kCO8Y8rDAXonCxucKHZy tKiH1ZyBC8jsIAGpfpnmdW0W/U= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:subject:references :in-reply-to:content-type; s=default; bh=akQtzY7wrvJEe/bYmLli/Gx cB8c=; b=x6mdh2tnmGJtQoDRxbxiPoQPu+oXydBpY4sqPeCOQdjKaiu0cTfv/Mf rwNqz05Q07Wf6SA42knsf0L//+fxSseH8FaI9l+l5HfDX4d18E7o0uam+pDapzpn edlEWOmoKjCM9nNvqCPaLuyuKjBKWEbbxzBfmw/Jtaop2jX+4tIE= Received: (qmail 27230 invoked by alias); 18 Sep 2014 11:54:47 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 27154 invoked by uid 89); 18 Sep 2014 11:54:47 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=AWL, BAYES_00, SPF_PASS autolearn=ham version=3.3.2 X-HELO: service87.mimecast.com Received: from service87.mimecast.com (HELO service87.mimecast.com) (91.220.42.44) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 18 Sep 2014 11:54:43 +0000 Received: from cam-owa1.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.21]) by service87.mimecast.com; Thu, 18 Sep 2014 12:54:40 +0100 Received: from [10.1.209.51] ([10.1.255.212]) by cam-owa1.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959); Thu, 18 Sep 2014 12:54:40 +0100 Message-ID: <541AC800.2060907@arm.com> Date: Thu, 18 Sep 2014 12:54:40 +0100 From: Alan Lawrence User-Agent: Thunderbird 2.0.0.24 (X11/20101213) MIME-Version: 1.0 To: "gcc-patches@gcc.gnu.org" Subject: [PATCH 3/14] Add new optabs for reducing vectors to scalars References: <541AC4D2.9040901@arm.com> In-Reply-To: <541AC4D2.9040901@arm.com> X-MC-Unique: 114091812544010101 X-IsSubscribed: yes These match their corresponding tree codes, by taking a vector and returning a scalar; this is more architecturally neutral than the (somewhat loosely defined) previous optab that took a vector and returned a vector with the result in the least significant bits (i.e. element 0 for little-endian or N-1 for bigendian). However, the old optabs are preserved so as not to break existing backends, so clients check for both old + new optabs. Bootstrap, check-gcc and check-g++ on x86_64-none-linux-gnu. aarch64.exp + vect.exp on aarch64{,_be}-none-elf. (of course at this point in the series all these are using the old optab + migration path.) gcc/ChangeLog: * doc/md.texi (Standard Names): Add reduc_(plus,[us](min|max))|scal optabs, and note in reduc_[us](plus|min|max) to prefer the former. * expr.c (expand_expr_real_2): Use reduc_..._scal if available, fall back to old reduc_... + BIT_FIELD_REF only if not. * optabs.c (optab_for_tree_code): for REDUC_(MAX,MIN,PLUS)_EXPR, return the reduce-to-scalar (reduc_..._scal) optab. (scalar_reduc_to_vector): New. * optabs.def (reduc_smax_scal_optab, reduc_smin_scal_optab, reduc_plus_scal_optab, reduc_umax_scal_optab, reduc_umin_scal_optab): New. * optabs.h (scalar_reduc_to_vector): Declare. * tree-vect-loop.c (vectorizable_reduction): Look for optabs reducing to either scalar or vector. diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index dd7861188afb8afd01971f9f75f0e32da9f9f826..3f5fd6f0e3ac3fcc30f6c961e3e2709a35f4d413 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -4811,29 +4811,48 @@ it is unspecified which of the two operands is returned as the result. @cindex @code{reduc_smax_@var{m}} instruction pattern @item @samp{reduc_smin_@var{m}}, @samp{reduc_smax_@var{m}} Find the signed minimum/maximum of the elements of a vector. The vector is -operand 1, and the scalar result is stored in the least significant bits of +operand 1, and the result is stored in the least significant bits of operand 0 (also a vector). The output and input vector should have the same -modes. +modes. These are legacy optabs, and platforms should prefer to implement +@samp{reduc_smin_scal_@var{m}} and @samp{reduc_smax_scal_@var{m}}. @cindex @code{reduc_umin_@var{m}} instruction pattern @cindex @code{reduc_umax_@var{m}} instruction pattern @item @samp{reduc_umin_@var{m}}, @samp{reduc_umax_@var{m}} Find the unsigned minimum/maximum of the elements of a vector. The vector is -operand 1, and the scalar result is stored in the least significant bits of +operand 1, and the result is stored in the least significant bits of operand 0 (also a vector). The output and input vector should have the same -modes. +modes. These are legacy optabs, and platforms should prefer to implement +@samp{reduc_umin_scal_@var{m}} and @samp{reduc_umax_scal_@var{m}}. @cindex @code{reduc_splus_@var{m}} instruction pattern -@item @samp{reduc_splus_@var{m}} -Compute the sum of the signed elements of a vector. The vector is operand 1, -and the scalar result is stored in the least significant bits of operand 0 -(also a vector). The output and input vector should have the same modes. - @cindex @code{reduc_uplus_@var{m}} instruction pattern -@item @samp{reduc_uplus_@var{m}} -Compute the sum of the unsigned elements of a vector. The vector is operand 1, -and the scalar result is stored in the least significant bits of operand 0 +@item @samp{reduc_splus_@var{m}}, @samp{reduc_uplus_@var{m}} +Compute the sum of the signed/unsigned elements of a vector. The vector is +operand 1, and the result is stored in the least significant bits of operand 0 (also a vector). The output and input vector should have the same modes. +These are legacy optabs, and platforms should prefer to implement +@samp{reduc_plus_scal_@var{m}}. + +@cindex @code{reduc_smin_scal_@var{m}} instruction pattern +@cindex @code{reduc_smax_scal_@var{m}} instruction pattern +@item @samp{reduc_smin_scal_@var{m}}, @samp{reduc_smax_scal_@var{m}} +Find the signed minimum/maximum of the elements of a vector. The vector is +operand 1, and operand 0 is the scalar result, with mode equal to the mode of +the elements of the input vector. + +@cindex @code{reduc_umin_scal_@var{m}} instruction pattern +@cindex @code{reduc_umax_scal_@var{m}} instruction pattern +@item @samp{reduc_umin_scal_@var{m}}, @samp{reduc_umax_scal_@var{m}} +Find the unsigned minimum/maximum of the elements of a vector. The vector is +operand 1, and operand 0 is the scalar result, with mode equal to the mode of +the elements of the input vector. + +@cindex @code{reduc_plus_scal_@var{m}} instruction pattern +@item @samp{reduc_plus_scal_@var{m}} +Compute the sum of the elements of a vector. The vector is operand 1, and +operand 0 is the scalar result, with mode equal to the mode of the elements of +the input vector. @cindex @code{sdot_prod@var{m}} instruction pattern @item @samp{sdot_prod@var{m}} diff --git a/gcc/expr.c b/gcc/expr.c index a293c06489f09586ed56dff1381467401687be45..11930ca121e4e1f3807261a2e5b0ca4f6723176d 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -9018,21 +9018,39 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode, case REDUC_MIN_EXPR: case REDUC_PLUS_EXPR: { - op0 = expand_normal (treeop0); - this_optab = optab_for_tree_code (code, type, optab_default); - enum machine_mode vec_mode = TYPE_MODE (TREE_TYPE (treeop0)); - temp = expand_unop (vec_mode, this_optab, op0, NULL_RTX, unsignedp); - gcc_assert (temp); - /* The tree code produces a scalar result, but (somewhat by convention) - the optab produces a vector with the result in element 0 if - little-endian, or element N-1 if big-endian. So pull the scalar - result out of that element. */ - int index = BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (vec_mode) - 1 : 0; - int bitsize = GET_MODE_BITSIZE (GET_MODE_INNER (vec_mode)); - temp = extract_bit_field (temp, bitsize, bitsize * index, unsignedp, - target, mode, mode); - gcc_assert (temp); - return temp; + op0 = expand_normal (treeop0); + enum machine_mode vec_mode = GET_MODE (op0); + this_optab = optab_for_tree_code (code, type, optab_default); + + if (optab_handler (this_optab, vec_mode) != CODE_FOR_nothing) + { + struct expand_operand ops[2]; + enum insn_code icode = optab_handler (this_optab, vec_mode); + + create_output_operand (&ops[0], target, mode); + create_input_operand (&ops[1], op0, vec_mode); + if (maybe_expand_insn (icode, 2, ops)) + { + target = ops[0].value; + if (GET_MODE (target) != mode) + return gen_lowpart (tmode, target); + return target; + } + } + /* Fall back to optab with vector result, and then extract scalar. */ + this_optab = scalar_reduc_to_vector (this_optab, type); + temp = expand_unop (vec_mode, this_optab, op0, NULL_RTX, unsignedp); + gcc_assert (temp); + /* The tree code produces a scalar result, but (somewhat by convention) + the optab produces a vector with the result in element 0 if + little-endian, or element N-1 if big-endian. So pull the scalar + result out of that element. */ + int index = BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (vec_mode) - 1 : 0; + int bitsize = GET_MODE_BITSIZE (GET_MODE_INNER (vec_mode)); + temp = extract_bit_field (temp, bitsize, bitsize * index, unsignedp, + target, mode, mode); + gcc_assert (temp); + return temp; } case VEC_LSHIFT_EXPR: diff --git a/gcc/optabs.c b/gcc/optabs.c index d6412ec42d7c4908a74d098e80d7038e068ca557..e422bcce18d06a39b26547b510c35858efc2303e 100644 --- a/gcc/optabs.c +++ b/gcc/optabs.c @@ -505,13 +505,15 @@ optab_for_tree_code (enum tree_code code, const_tree type, return fma_optab; case REDUC_MAX_EXPR: - return TYPE_UNSIGNED (type) ? reduc_umax_optab : reduc_smax_optab; + return TYPE_UNSIGNED (type) + ? reduc_umax_scal_optab : reduc_smax_scal_optab; case REDUC_MIN_EXPR: - return TYPE_UNSIGNED (type) ? reduc_umin_optab : reduc_smin_optab; + return TYPE_UNSIGNED (type) + ? reduc_umin_scal_optab : reduc_smin_scal_optab; case REDUC_PLUS_EXPR: - return TYPE_UNSIGNED (type) ? reduc_uplus_optab : reduc_splus_optab; + return reduc_plus_scal_optab; case VEC_LSHIFT_EXPR: return vec_shl_optab; @@ -607,7 +609,22 @@ optab_for_tree_code (enum tree_code code, const_tree type, return unknown_optab; } } - + +optab +scalar_reduc_to_vector (optab unoptab, const_tree type) +{ + switch (unoptab) + { + case reduc_plus_scal_optab: + return TYPE_UNSIGNED (type) ? reduc_uplus_optab : reduc_splus_optab; + + case reduc_smin_scal_optab: return reduc_smin_optab; + case reduc_umin_scal_optab: return reduc_umin_optab; + case reduc_smax_scal_optab: return reduc_smax_optab; + case reduc_umax_scal_optab: return reduc_umax_optab; + default: return unknown_optab; + } +} /* Expand vector widening operations. diff --git a/gcc/optabs.def b/gcc/optabs.def index b75547006585267d9f5b4f17ba972ba388852cf5..131ea048b012b073345be3b426d4ac8f33061809 100644 --- a/gcc/optabs.def +++ b/gcc/optabs.def @@ -243,12 +243,20 @@ OPTAB_D (sin_optab, "sin$a2") OPTAB_D (sincos_optab, "sincos$a3") OPTAB_D (tan_optab, "tan$a2") +/* Vector reduction to a scalar. */ +OPTAB_D (reduc_smax_scal_optab, "reduc_smax_scal_$a") +OPTAB_D (reduc_smin_scal_optab, "reduc_smin_scal_$a") +OPTAB_D (reduc_plus_scal_optab, "reduc_plus_scal_$a") +OPTAB_D (reduc_umax_scal_optab, "reduc_umax_scal_$a") +OPTAB_D (reduc_umin_scal_optab, "reduc_umin_scal_$a") +/* (Old) Vector reduction, returning a vector with the result in one lane. */ OPTAB_D (reduc_smax_optab, "reduc_smax_$a") OPTAB_D (reduc_smin_optab, "reduc_smin_$a") OPTAB_D (reduc_splus_optab, "reduc_splus_$a") OPTAB_D (reduc_umax_optab, "reduc_umax_$a") OPTAB_D (reduc_umin_optab, "reduc_umin_$a") OPTAB_D (reduc_uplus_optab, "reduc_uplus_$a") + OPTAB_D (sdot_prod_optab, "sdot_prod$I$a") OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3") OPTAB_D (udot_prod_optab, "udot_prod$I$a") diff --git a/gcc/optabs.h b/gcc/optabs.h index 089b15a6fcd261bb15c898f185a157f1257284ba..d9f4900620a13d74fc3dfb1bac9bcb34416012de 100644 --- a/gcc/optabs.h +++ b/gcc/optabs.h @@ -162,6 +162,11 @@ enum optab_subtype vector shifts and rotates */ extern optab optab_for_tree_code (enum tree_code, const_tree, enum optab_subtype); +/* Given optab UNOPTAB that reduces a vector to a scalar, find instead the old + optab that produces a vector with the reduction result in one element, + for a tree with type TYPE. */ +extern optab scalar_reduc_to_vector (optab unoptab, const_tree type); + /* The various uses that a comparison can have; used by can_compare_p: jumps, conditional moves, store flag operations. */ enum can_compare_purpose diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index 36f51977845bf5ce451564ccd1eb8ad5f39ac8de..d0a29d312bfd9a7eb552d937e3c64cf9b30d558a 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -5101,15 +5101,17 @@ vectorizable_reduction (gimple stmt, gimple_stmt_iterator *gsi, epilog_reduc_code = ERROR_MARK; } - - if (reduc_optab - && optab_handler (reduc_optab, vec_mode) == CODE_FOR_nothing) + else if (optab_handler (reduc_optab, vec_mode) == CODE_FOR_nothing) { - if (dump_enabled_p ()) - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, - "reduc op not supported by target.\n"); + optab = scalar_reduc_to_vector (reduc_optab, vectype_out); + if (optab_handler (optab, vec_mode) == CODE_FOR_nothing) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "reduc op not supported by target.\n"); - epilog_reduc_code = ERROR_MARK; + epilog_reduc_code = ERROR_MARK; + } } } else