From patchwork Thu Sep 25 14:32:49 2014
X-Patchwork-Submitter: Alan Lawrence
X-Patchwork-Id: 393356
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Message-ID: <54242791.7010805@arm.com>
Date: Thu, 25 Sep 2014 15:32:49 +0100
From: Alan Lawrence
To: Richard Biener
CC: "gcc-patches@gcc.gnu.org"
Subject: [PATCH/RFC v2 3/14] Add new optabs for reducing vectors to scalars
References: <541AC4D2.9040901@arm.com> <541AC800.2060907@arm.com> <54202379.8010809@arm.com>

Ok, so, I've tried making the reduc_plus optab take two modes: that of the
vector to reduce, and that of the result; this allows platforms to provide a
widening reduction. However, I'm keeping reduc_[us](min|max)_optab with only a
single mode, as widening makes no sense there.

I've not gone as far as making the vectorizer use any such widening reduction,
however: as previously stated, I'm not really sure what the input source code
for that even looks like (maybe in a language other than C?). If we wanted to
do a non-widening reduction using such an instruction (by discarding the extra
bits), it strikes me that the platform can/should provide a non-widening optab
for that case...

Testing: bootstrapped on x86_64 linux + check-gcc; cross-tested
aarch64-none-elf check-gcc; cross-tested aarch64_be-none-elf aarch64.exp +
vect.exp.

So, my feeling is that the extra complexity here doesn't really buy us
anything; and that if we do want to support / use widening reductions in the
future, we should do so with a separate, reduc_plus_widen... optab, and stick
with the original patch/formulation for now.
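
As an aside, to make the "discarding the extra bits" point above concrete:
the following is a plain-C scalar model, not GCC code and not part of the
patch. The function names model_reduc_plus_narrow and model_reduc_plus_widen
are made up for illustration, and unsigned 32-bit elements are assumed so that
wraparound is well defined. Under those assumptions, truncating the widened
sum should give the same value as the non-widening sum:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a non-widening sum reduction: the result has the
   same width as the vector elements, so the sum wraps modulo 2^32.  */
static uint32_t
model_reduc_plus_narrow (const uint32_t *v, size_t n)
{
  uint32_t sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += v[i];
  return sum;
}

/* Hypothetical model of a widening sum reduction (think v4si -> di): each
   element is zero-extended to 64 bits before being accumulated.  */
static uint64_t
model_reduc_plus_widen (const uint32_t *v, size_t n)
{
  uint64_t sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += v[i];
  return sum;
}
```

For the four elements { 0xffffffff, 2, 3, 4 } the widening model gives
0x100000008 and the non-widening model gives 8, which is the widened value
with the upper 32 bits discarded.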
(In other words: this patch is a guide to how I think a dual-mode
reduc_plus_optab looks, but I don't honestly like it!). If you agree, I shall
transplant the comments on scalar_reduc_to_vector from this patch into the
original, and then post that revised version?

Cheers,
Alan

Richard Biener wrote:
> On Mon, Sep 22, 2014 at 3:26 PM, Alan Lawrence wrote:
>> Richard Biener wrote:
>>>
>>> scalar_reduc_to_vector misses a comment.
>>
>> Ok to reuse the comment in optabs.h in optabs.c also?
>
> Sure.
>
>>> I wonder if at the end we wouldn't transition all backends and then
>>> renaming reduc_*_scal_optab back to reduc_*_optab makes sense.
>>
>> Yes, that sounds like a plan, the _scal is a bit of a mouthful.
>>
>>> The optabs have only one mode - I wouldn't be surprised if an ISA
>>> invents for example v4si -> di reduction? So do we want to make
>>> reduc_plus_scal_optab a little bit more future proof (maybe there
>>> is already an ISA that supports this kind of reduction?).
>>
>> That sounds like a plausible thing for an ISA to do, indeed. However given
>> these names are only used by the autovectorizer rather than directly, the
>> question is what the corresponding source code looks like, and/or what
>> changes to the autovectorizer we might have to make to (look for code to)
>> exploit such an instruction.
>
> Ah, indeed. Would be sth like a REDUC_WIDEN_SUM_EXPR or so.
>
>> At this point I could go for a
>> reduc_{plus,min_max}_scal_ which reduces from the first vector
>> mode to the second scalar mode, and then make the vectorizer look only for
>> cases where the second mode was the element type of the first; but I'm not
>> sure I want to do anything more complicated than that at this stage.
>> (However, indeed it would leave the possibility open for the future.)
>
> Yeah, agreed. For the min/max case a widen variant isn't useful anyway.
>
> Thanks,
> Richard.
>
>> --Alan
>>
>

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 80e8bd6a079b8bf77ef396643aaba512cf83b317..0a9381fc3a26cdaad02e6f837b94c7738daa3a7f 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4783,29 +4783,49 @@ it is unspecified which of the two operands is returned as the result.
 @cindex @code{reduc_smax_@var{m}} instruction pattern
 @item @samp{reduc_smin_@var{m}}, @samp{reduc_smax_@var{m}}
 Find the signed minimum/maximum of the elements of a vector. The vector is
-operand 1, and the scalar result is stored in the least significant bits of
+operand 1, and the result is stored in the least significant bits of
 operand 0 (also a vector). The output and input vector should have the same
-modes.
+modes. These are legacy optabs, and platforms should prefer to implement
+@samp{reduc_smin_scal_@var{m}} and @samp{reduc_smax_scal_@var{m}}.
 
 @cindex @code{reduc_umin_@var{m}} instruction pattern
 @cindex @code{reduc_umax_@var{m}} instruction pattern
 @item @samp{reduc_umin_@var{m}}, @samp{reduc_umax_@var{m}}
 Find the unsigned minimum/maximum of the elements of a vector. The vector is
-operand 1, and the scalar result is stored in the least significant bits of
+operand 1, and the result is stored in the least significant bits of
 operand 0 (also a vector). The output and input vector should have the same
-modes.
+modes. These are legacy optabs, and platforms should prefer to implement
+@samp{reduc_umin_scal_@var{m}} and @samp{reduc_umax_scal_@var{m}}.
 
 @cindex @code{reduc_splus_@var{m}} instruction pattern
-@item @samp{reduc_splus_@var{m}}
-Compute the sum of the signed elements of a vector. The vector is operand 1,
-and the scalar result is stored in the least significant bits of operand 0
-(also a vector). The output and input vector should have the same modes.
-
 @cindex @code{reduc_uplus_@var{m}} instruction pattern
-@item @samp{reduc_uplus_@var{m}}
-Compute the sum of the unsigned elements of a vector. The vector is operand 1,
-and the scalar result is stored in the least significant bits of operand 0
+@item @samp{reduc_splus_@var{m}}, @samp{reduc_uplus_@var{m}}
+Compute the sum of the signed/unsigned elements of a vector. The vector is
+operand 1, and the result is stored in the least significant bits of operand 0
 (also a vector). The output and input vector should have the same modes.
+These are legacy optabs, and platforms should prefer to implement
+@samp{reduc_plus_scal_@var{m}@var{n}}.
+
+@cindex @code{reduc_smin_scal_@var{m}} instruction pattern
+@cindex @code{reduc_smax_scal_@var{m}} instruction pattern
+@item @samp{reduc_smin_scal_@var{m}}, @samp{reduc_smax_scal_@var{m}}
+Find the signed minimum/maximum of the elements of a vector. The vector is
+operand 1, and operand 0 is the scalar result, with mode equal to the mode of
+the elements of the input vector.
+
+@cindex @code{reduc_umin_scal_@var{m}} instruction pattern
+@cindex @code{reduc_umax_scal_@var{m}} instruction pattern
+@item @samp{reduc_umin_scal_@var{m}}, @samp{reduc_umax_scal_@var{m}}
+Find the unsigned minimum/maximum of the elements of a vector. The vector is
+operand 1, and operand 0 is the scalar result, with mode equal to the mode of
+the elements of the input vector.
+
+@cindex @code{reduc_plus_scal_@var{m}@var{n}} instruction pattern
+@item @samp{reduc_plus_scal_@var{m}@var{n}}
+Compute the sum of the elements of a vector. The vector, of mode @var{m}, is
+operand 1, and operand 0 is the scalar result, of mode @var{n}. Note that at
+present the vectorizer only looks for patterns where @var{n} is the mode of the
+elements of @var{m}.
 
 @cindex @code{sdot_prod@var{m}} instruction pattern
 @item @samp{sdot_prod@var{m}}
diff --git a/gcc/expr.c b/gcc/expr.c
index c7920282416747ab41afcd47179d4ed92d8fbc23..4bd5a3f248c7de487586abbae677770359098ecb 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9045,6 +9045,23 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode,
         op0 = expand_normal (treeop0);
         this_optab = optab_for_tree_code (code, type, optab_default);
         enum machine_mode vec_mode = TYPE_MODE (TREE_TYPE (treeop0));
+        enum insn_code icode = reduction_optab_handler (this_optab, vec_mode);
+        if (icode != CODE_FOR_nothing)
+          {
+            struct expand_operand ops[2];
+
+            create_output_operand (&ops[0], target, mode);
+            create_input_operand (&ops[1], op0, vec_mode);
+            if (maybe_expand_insn (icode, 2, ops))
+              {
+                target = ops[0].value;
+                if (GET_MODE (target) != mode)
+                  return gen_lowpart (tmode, target);
+                return target;
+              }
+          }
+        /* Fall back to optab with vector result, and then extract scalar. */
+        this_optab = scalar_reduc_to_vector (this_optab, type);
         temp = expand_unop (vec_mode, this_optab, op0, NULL_RTX, unsignedp);
         gcc_assert (temp);
         /* The tree code produces a scalar result, but (somewhat by convention)
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 605615d7458e794995dfd27d7fdf39e37baa910a..722fc1230b119fd78b1cb2074f96f56d24982fbb 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -506,13 +506,15 @@ optab_for_tree_code (enum tree_code code, const_tree type,
       return fma_optab;
 
     case REDUC_MAX_EXPR:
-      return TYPE_UNSIGNED (type) ? reduc_umax_optab : reduc_smax_optab;
+      return TYPE_UNSIGNED (type)
+             ? reduc_umax_scal_optab : reduc_smax_scal_optab;
 
     case REDUC_MIN_EXPR:
-      return TYPE_UNSIGNED (type) ? reduc_umin_optab : reduc_smin_optab;
+      return TYPE_UNSIGNED (type)
+             ? reduc_umin_scal_optab : reduc_smin_scal_optab;
 
     case REDUC_PLUS_EXPR:
-      return TYPE_UNSIGNED (type) ? reduc_uplus_optab : reduc_splus_optab;
+      return reduc_plus_scal_optab;
 
     case VEC_LSHIFT_EXPR:
       return vec_shl_optab;
@@ -608,7 +610,49 @@ optab_for_tree_code (enum tree_code code, const_tree type,
       return unknown_optab;
     }
 }
-
+
+/* Given optab UNOPTAB that reduces a vector to a scalar, find instead the old
+   optab that produces a vector with the reduction result in one element,
+   for a tree with type TYPE. */
+
+optab
+scalar_reduc_to_vector (optab unoptab, const_tree type)
+{
+  switch (unoptab)
+    {
+    case reduc_plus_scal_optab:
+      return TYPE_UNSIGNED (type) ? reduc_uplus_optab : reduc_splus_optab;
+
+    case reduc_smin_scal_optab: return reduc_smin_optab;
+    case reduc_umin_scal_optab: return reduc_umin_optab;
+    case reduc_smax_scal_optab: return reduc_smax_optab;
+    case reduc_umax_scal_optab: return reduc_umax_optab;
+    default: return unknown_optab;
+    }
+}
+
+/* Given reduction optab OPTAB, find the handler that reduces a vector of mode
+   VEC_MODE to a scalar of mode the same as the vector elements. */
+
+insn_code
+reduction_optab_handler (optab optab, enum machine_mode vec_mode)
+{
+  gcc_assert (VECTOR_MODE_P (vec_mode));
+  switch (optab)
+    {
+    case reduc_plus_scal_optab:
+      /* Optab allows for the scalar result to be different/wider than the
+         mode of the vector elements. However we don't yet exploit this. */
+      return convert_optab_handler (optab, vec_mode, GET_MODE_INNER (vec_mode));
+    case reduc_smin_scal_optab:
+    case reduc_umin_scal_optab:
+    case reduc_smax_scal_optab:
+    case reduc_umax_scal_optab:
+      return optab_handler (optab, vec_mode);
+    default:
+      return CODE_FOR_nothing;
+    }
+}
 
 /* Expand vector widening operations.
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index b75547006585267d9f5b4f17ba972ba388852cf5..26eea26df73f416319afe1c7f9ac74f5c8ef48df 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -61,6 +61,9 @@ OPTAB_CD(vec_load_lanes_optab, "vec_load_lanes$a$b")
 OPTAB_CD(vec_store_lanes_optab, "vec_store_lanes$a$b")
 OPTAB_CD(vcond_optab, "vcond$a$b")
 OPTAB_CD(vcondu_optab, "vcondu$a$b")
+/* Vector reduction to a scalar, possibly widening. The second mode is for the
+   result, usually (but possibly wider than) the elements of the mode input. */
+OPTAB_CD (reduc_plus_scal_optab, "reduc_plus_scal_$a$b")
 
 OPTAB_NL(add_optab, "add$P$a3", PLUS, "add", '3', gen_int_fp_fixed_libfunc)
 OPTAB_NX(add_optab, "add$F$a3")
@@ -243,12 +246,19 @@ OPTAB_D (sin_optab, "sin$a2")
 OPTAB_D (sincos_optab, "sincos$a3")
 OPTAB_D (tan_optab, "tan$a2")
 
+/* Vector reduction to a scalar. */
+OPTAB_D (reduc_smax_scal_optab, "reduc_smax_scal_$a")
+OPTAB_D (reduc_smin_scal_optab, "reduc_smin_scal_$a")
+OPTAB_D (reduc_umax_scal_optab, "reduc_umax_scal_$a")
+OPTAB_D (reduc_umin_scal_optab, "reduc_umin_scal_$a")
+/* (Old) Vector reduction, returning a vector with the result in one lane. */
 OPTAB_D (reduc_smax_optab, "reduc_smax_$a")
 OPTAB_D (reduc_smin_optab, "reduc_smin_$a")
 OPTAB_D (reduc_splus_optab, "reduc_splus_$a")
 OPTAB_D (reduc_umax_optab, "reduc_umax_$a")
 OPTAB_D (reduc_umin_optab, "reduc_umin_$a")
 OPTAB_D (reduc_uplus_optab, "reduc_uplus_$a")
+
 OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
 OPTAB_D (udot_prod_optab, "udot_prod$I$a")
diff --git a/gcc/optabs.h b/gcc/optabs.h
index 089b15a6fcd261bb15c898f185a157f1257284ba..10d080ef9347fc6e2b7d92d099a7b51a6b7eb1a0 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -162,6 +162,15 @@ enum optab_subtype
    vector shifts and rotates */
 extern optab optab_for_tree_code (enum tree_code, const_tree, enum optab_subtype);
 
+/* Given an optab that reduces a vector to a scalar, find instead the old
+   optab that produces a vector with the reduction result in one element,
+   for a tree with the specified type. */
+extern optab scalar_reduc_to_vector (optab, const_tree type);
+
+/* Given an optab that reduces a vector to a scalar, find the handler for the
+   specified vector mode. */
+extern insn_code reduction_optab_handler (optab, enum machine_mode);
+
 /* The various uses that a comparison can have; used by can_compare_p:
    jumps, conditional moves, store flag operations. */
 enum can_compare_purpose
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 8d97e176446f0f6963ecc443f1db0a84ebf2b169..89036e76ae2835bb22f2f3a51d20f1288e26f6db 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -5102,16 +5102,21 @@ vectorizable_reduction (gimple stmt, gimple_stmt_iterator *gsi,
 
               epilog_reduc_code = ERROR_MARK;
             }
-
-          if (reduc_optab
-              && optab_handler (reduc_optab, vec_mode) == CODE_FOR_nothing)
+          else
             {
-              if (dump_enabled_p ())
-                dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                                 "reduc op not supported by target.\n");
+              if (!reduction_optab_handler (reduc_optab, vec_mode))
+                {
+                  optab = scalar_reduc_to_vector (reduc_optab, vectype_out);
+                  if (optab_handler (optab, vec_mode) == CODE_FOR_nothing)
+                    {
+                      if (dump_enabled_p ())
+                        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                                         "reduc op not supported by target.\n");
 
-              epilog_reduc_code = ERROR_MARK;
-            }
+                      epilog_reduc_code = ERROR_MARK;
+                    }
+                }
+            }
         }
       else
         {