Vector Comparison patch

Message ID	CABYV9SWByyvdWo00G2jSROehUeJcux=3rTJVez2XHFL15g+Ewg@mail.gmail.com
State	New
Headers	show Return-Path: <gcc-patches-return-300228-incoming=patchwork.ozlabs.org@gcc.gnu.org> X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id 97882B6F71 for <incoming@patchwork.ozlabs.org>; Mon, 22 Aug 2011 08:54:41 +1000 (EST) Received: (qmail 21728 invoked by alias); 21 Aug 2011 22:54:39 -0000 Received: (qmail 21717 invoked by uid 22791); 21 Aug 2011 22:54:35 -0000 X-SWARE-Spam-Status: No, hits=-2.2 required=5.0 tests=AWL, BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, FREEMAIL_FROM, RCVD_IN_DNSWL_LOW, TW_FG, TW_TM X-Spam-Check-By: sourceware.org Received: from mail-qy0-f182.google.com (HELO mail-qy0-f182.google.com) (209.85.216.182) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Sun, 21 Aug 2011 22:54:15 +0000 Received: by qyk9 with SMTP id 9so1987075qyk.20 for <gcc-patches@gcc.gnu.org>; Sun, 21 Aug 2011 15:54:12 -0700 (PDT) Received: by 10.229.87.137 with SMTP id w9mr885510qcl.284.1313967252256; Sun, 21 Aug 2011 15:54:12 -0700 (PDT) MIME-Version: 1.0 Received: by 10.229.120.82 with HTTP; Sun, 21 Aug 2011 15:53:52 -0700 (PDT) In-Reply-To: <CAFiYyc3b1tKoXvCbzW+1mpd1-bqB-4NnSJLodUXGuCAJMeET8A@mail.gmail.com> References: <CABYV9SVekz5BkOpm42_ABnnPLNp3omCjWCt=1ebH=N=aBkmTMQ@mail.gmail.com> <CAFiYyc3ZEka=HNO63yzkP0wdHLTLbchHpFDAur2SoFGGCHjOEQ@mail.gmail.com> <CABYV9SVLJVvEeuK-bQ4t6ipBmNLtns5wjQiQAVqYL5t9Zfir7Q@mail.gmail.com> <CAFiYyc3RuMr4LrFY8XrPoetinSBXQgo63892kuq+LxUFSJtLqQ@mail.gmail.com> <CABYV9SW5QixOdKC+=WKfRrCp9e=jGBmNcw+9fuYNjOowOm0=3g@mail.gmail.com> <CABYV9SXu444+qJmXJDbR-nh-EpK-Ahu568U5f5sBa9B7C+z+bQ@mail.gmail.com> <CAFiYyc1y64uPKc7K3B0=5RW85Fmzgu3_NMOHWxzubedTL2oJBA@mail.gmail.com> <CABYV9SW-VK5XHuLFvXNYsFk1FhJFqRBtiz6ztSSNpNyhvxeECQ@mail.gmail.com> <CAFiYyc2k1BxC4=0O=fDxYCxLw8DDBuFqZOxhTk==DCQTzyqfEw@mail.gmail.com> <CABYV9SXJT6qx7Ln6dgDNmFWdZOVp_ueVU=SH0mhCsa9xMovKYA@mail.gmail.com> <CABYV9SUgzQNs78tsHAdumSacgi+9=1M=wLnFcdE319YvnuH71w@mail.gmail.com> <CAFiYyc2zWd7WikQ8X+5jdB9GUignVfk-N7BqbD4ggyo486S8ww@mail.gmail.com> <4E4D224F.1020206@redhat.com> <CABYV9SVn9OBiY-S2K0peSVmEOi_ZVNXn091yUcnj6K8q4homQA@mail.gmail.com> <CAFiYyc3rFjQPD69gUZynykYZvAEWjQ5ucz+6pkJYtC3jr9Mp-w@mail.gmail.com> <CABYV9SXotm+_4eOQTgZtAAx9JC_6EH=Pd_5tFYesKZNuuU+haw@mail.gmail.com> <CAFiYyc3b1tKoXvCbzW+1mpd1-bqB-4NnSJLodUXGuCAJMeET8A@mail.gmail.com> From: Artem Shinkarov <artyom.shinkaroff@gmail.com> Date: Sun, 21 Aug 2011 23:53:52 +0100 Message-ID: <CABYV9SWByyvdWo00G2jSROehUeJcux=3rTJVez2XHFL15g+Ewg@mail.gmail.com> Subject: Re: Vector Comparison patch To: Richard Guenther <richard.guenther@gmail.com> Cc: Richard Henderson <rth@redhat.com>, gcc-patches@gcc.gnu.org, "Joseph S. Myers" <joseph@codesourcery.com> Content-Type: multipart/mixed; boundary=0016368324ec51378e04ab0bd774 X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: <gcc-patches.gcc.gnu.org> List-Unsubscribe: <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org> List-Archive: <http://gcc.gnu.org/ml/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-help@gcc.gnu.org> Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org

On Mon, Aug 22, 2011 at 10:49 PM, Artem Shinkarov <artyom.shinkaroff@gmail.com> wrote: > On Mon, Aug 22, 2011 at 9:42 PM, Richard Guenther > <richard.guenther@gmail.com> wrote: >> On Mon, Aug 22, 2011 at 5:58 PM, Artem Shinkarov >> <artyom.shinkaroff@gmail.com> wrote: >>> On Mon, Aug 22, 2011 at 4:50 PM, Richard Guenther >>> <richard.guenther@gmail.com> wrote: >>>> On Mon, Aug 22, 2011 at 5:43 PM, Artem Shinkarov >>>> <artyom.shinkaroff@gmail.com> wrote: >>>>> On Mon, Aug 22, 2011 at 4:34 PM, Richard Guenther >>>>> <richard.guenther@gmail.com> wrote: >>>>>> On Mon, Aug 22, 2011 at 5:21 PM, Artem Shinkarov >>>>>> <artyom.shinkaroff@gmail.com> wrote: >>>>>>> On Mon, Aug 22, 2011 at 4:01 PM, Richard Guenther >>>>>>> <richard.guenther@gmail.com> wrote: >>>>>>>> On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov >>>>>>>> <artyom.shinkaroff@gmail.com> wrote: >>>>>>>>> On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther >>>>>>>>> <richard.guenther@gmail.com> wrote: >>>>>>>>>> On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov >>>>>>>>>> <artyom.shinkaroff@gmail.com> wrote: >>>>>>>>>>> Richard >>>>>>>>>>> >>>>>>>>>>> I formalized an approach a little-bit, now it works without target >>>>>>>>>>> hooks, but some polishing is still required. I want you to comment on >>>>>>>>>>> the several important approaches that I use in the patch. >>>>>>>>>>> >>>>>>>>>>> So how does it work. >>>>>>>>>>> 1) All the vector comparisons at the level of type-checker are >>>>>>>>>>> introduced using VEC_COND_EXPR with constant selection operands being >>>>>>>>>>> {-1} and {0}. For example v0 > v1 is transformed into VEC_COND_EXPR<v0 >>>>>>>>>>>> v1, {-1}, {0}>. >>>>>>>>>>> >>>>>>>>>>> 2) When optabs expand VEC_COND_EXPR, two cases are considered: >>>>>>>>>>> 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing changes. >>>>>>>>>>> 2.b) first operand is something else, in that case, we specially mark >>>>>>>>>>> this case, recognize it in the backend, and do not create a >>>>>>>>>>> comparison, but use the mask as it was a result of some comparison. >>>>>>>>>>> >>>>>>>>>>> 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a >>>>>>>>>>> vector comparison we use is_vector_comparison function, if it returns >>>>>>>>>>> false, then we replace mask with mask != {0}. >>>>>>>>>>> >>>>>>>>>>> So we end-up with the following functionality: >>>>>>>>>>> VEC_COND_EXPR<mask, v0,v1> -- if we know that mask is a result of >>>>>>>>>>> comparison of two vectors, we leave it as it is, otherwise change with >>>>>>>>>>> mask != {0}. >>>>>>>>>>> >>>>>>>>>>> Plain vector comparison a <op> b is represented with VEC_COND_EXPR, >>>>>>>>>>> which correctly expands, without creating useless masking. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Basically for me there are two questions: >>>>>>>>>>> 1) Can we perform information passing in optabs in a nicer way? >>>>>>>>>>> 2) How is_vector_comparison could be improved? I have several ideas, >>>>>>>>>>> like checking if constant vector all consists of 0 and -1, and so on. >>>>>>>>>>> But first is it conceptually fine. >>>>>>>>>>> >>>>>>>>>>> P.S. I tired to put the functionality of is_vector_comparison in >>>>>>>>>>> tree-ssa-forwprop, but the thing is that it is called only with -On, >>>>>>>>>>> which I find inappropriate, and the functionality gets more >>>>>>>>>>> complicated. >>>>>>>>>> >>>>>>>>>> Why is it inappropriate to not optimize it at -O0? If the user >>>>>>>>>> separates comparison and ?: expression it's his own fault. >>>>>>>>> >>>>>>>>> Well, because all the information is there, and I perfectly envision >>>>>>>>> the case when user expressed comparison separately, just to avoid code >>>>>>>>> duplication. >>>>>>>>> >>>>>>>>> Like: >>>>>>>>> mask = a > b; >>>>>>>>> res1 = mask ? v0 : v1; >>>>>>>>> res2 = mask ? v2 : v3; >>>>>>>>> >>>>>>>>> Which in this case would be different from >>>>>>>>> res1 = a > b ? v0 : v1; >>>>>>>>> res2 = a > b ? v2 : v3; >>>>>>>>> >>>>>>>>>> Btw, the new hook is still in the patch. >>>>>>>>>> >>>>>>>>>> I would simply always create != 0 if it isn't and let optimizers >>>>>>>>>> (tree-ssa-forwprop.c) optimize this. You still have to deal with >>>>>>>>>> non-comparison operands during expansion though, but if >>>>>>>>>> you always forced a != 0 from the start you can then simply >>>>>>>>>> interpret it as a proper comparison result (in which case I'd >>>>>>>>>> modify the backends to have an alternate pattern or directly >>>>>>>>>> expand to masking operations - using the fake comparison >>>>>>>>>> RTX is too much of a hack). >>>>>>>>> >>>>>>>>> Richard, I think you didn't get the problem. >>>>>>>>> I really need the way, to pass the information, that the expression >>>>>>>>> that is in the first operand of vcond is an appropriate mask. I though >>>>>>>>> for quite a while and this hack is the only answer I found, is there a >>>>>>>>> better way to do that. I could for example introduce another >>>>>>>>> tree-node, but it would be overkill as well. >>>>>>>>> >>>>>>>>> Now why do I need it so much: >>>>>>>>> I want to implement the comparison in a way that {1, 5, 0, -1} is >>>>>>>>> actually {-1,-1,-1,-1}. So whenever I am not sure that mask of >>>>>>>>> VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not >>>>>>>>> always). To check the stuff, I use is_vector_comparison in >>>>>>>>> tree-vect-generic. >>>>>>>>> >>>>>>>>> So I really have the difference between mask ? x : y and mask != {0} ? >>>>>>>>> x : y, otherwise I could treat mask != {0} in the backend as just >>>>>>>>> mask. >>>>>>>>> >>>>>>>>> If this link between optabs and backend breaks, then the patch falls >>>>>>>>> apart. Because every time the comparison is taken out VEC_COND_EXPR, I >>>>>>>>> will have to put != {0}. Keep in mind, that I cannot always put the >>>>>>>>> comparison inside the VEC_COND_EXPR, what if it is defined in a >>>>>>>>> function-comparison, or somehow else? >>>>>>>>> >>>>>>>>> So what would be an appropriate way to connect optabs and the backend? >>>>>>>> >>>>>>>> Well, there is no problem in having the only valid mask operand for >>>>>>>> VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,....} mask. >>>>>>>> Just the C parser has to transform mask ? vec1 : vec2 to mask != 0 ? >>>>>>>> vec1 : vec2. >>>>>>> >>>>>>> This happens already in the new version of patch (not submitted yet). >>>>>>> >>>>>>>> This comparison can be eliminated by optimization passes >>>>>>>> that >>>>>>>> either replace it by the real comparison computing the mask or just >>>>>>>> propagating the information this mask is already {-1,...} / {0,....} by simply >>>>>>>> dropping the comparison against zero. >>>>>>> >>>>>>> This is not a problem, because the backend recognizes these patterns, >>>>>>> so no optimization is needed in this part. >>>>>> >>>>>> I mean for >>>>>> >>>>>> mask = v1 < v2 ? {-1,...} : {0,...}; >>>>>> val = VCOND_EXPR <mask != 0, v3, v4>; >>>>>> >>>>>> optimizers can see how mask is defined and drop the != 0 test or replace >>>>>> it by v1 < v2. >>>>> >>>>> Yes, sure. >>>>> >>>>>>>> For the backends I'd have vcond patterns for both an embedded comparison >>>>>>>> and for a mask. (Now we can rewind the discussion a bit and allow >>>>>>>> arbitrary masks and define a vcond with a mask operand to do bitwise >>>>>>>> selection - what matters is the C frontend semantics which we need to >>>>>>>> translate to what the middle-end thinks of a VEC_COND_EXPR, they >>>>>>>> do not have to agree). >>>>>>> >>>>>>> But it seems like another combinatorial explosion here. Considering >>>>>>> what Richard said in his e-mail, in order to support "generic" vcond, >>>>>>> we just need to enumerate all the possible cases. Or I didn't >>>>>>> understand right? >>>>>> >>>>>> Well, the question is still what VCOND_EXPR and thus the vcond pattern >>>>>> semantically does for a non-comparison operand. I'd argue that using >>>>>> the bitwise selection semantic gives us maximum flexibility and a native >>>>>> instruction with AMD XOP. This non-comparison VCOND_EXPR is >>>>>> also easy to implement in the middle-end expansion code if there is >>>>>> no native instruction for it - by simply emitting the bitwise operations. >>>>>> >>>>>> But I have the feeling we are talking past each other ...? >>>>> >>>>> I am all for the bitwise behaviour in the backend pattern, that is >>>>> something that I rely on at the moment. What I don't want to have is >>>>> the same behaviour in the frontend. So If we can guarantee, that we >>>>> add != 0, when we don't know the "nature" of the mask, then I am >>>>> perfectly fine with the back-end having bitwise-selection behaviour. >>>> >>>> Well, the C frontend would simply always add that != 0 (because it >>>> doesn't know). >>>> >>>>>>> I mean, I don't mind of course, but it seems to me that it would be >>>>>>> cleaner to have one generic enough pattern. >>>>>>> >>>>>>> Is there seriously no way to pass something from optab into the backend?? >>>>>> >>>>>> You can pass operands. And information is implicitly encoded in the name. >>>>> >>>>> I didn't quite get that, could you give an example? >>>> >>>> It was a larger variant of "no, apart from what is obvious". >>> >>> Ha, joking again. :) >>> >>>>>>>> If the mask is computed by a function you are of course out of luck, >>>>>>>> but I don't see how you'd manage to infer knowledge from nowhere either. >>>>>>> >>>>>>> Well, take simpler example >>>>>>> >>>>>>> a = {0}; >>>>>>> for ( ; *p; p += 16) >>>>>>> a &= pattern > (vec)*p; >>>>>>> >>>>>>> res = a ? v0 : v1; >>>>>>> >>>>>>> In this case it is simple to analyse that a is a comparison, but you >>>>>>> cannot embed the operations of a into VEC_COND_EXPR. >>>>>> >>>>>> Sure, but if the above is C source the frontend would generate >>>>>> res = a != 0 ? v0 : v1; initially. An optimization pass could still >>>>>> track this information and replace VEC_COND_EXPR <a != 0, v0, v1> >>>>>> with VEC_COND_EXPR <a, v0, v1> (no existing one would track >>>>>> vector contents though). >>>>> >>>>> Yeah, sure. My point is, that we must be able to pass this information >>>>> in the backend, that we checked everything, and we are sure that a is >>>>> a corerct mask, please don't add any != 0 to it. >>>> >>>> But all masks are correct as soon as they appear in a VEC_COND_EXPR. >>>> That's the whole point of the bitwise semantics. It's only the C frontend >>>> that needs to be careful to impose its stricter semantics. >>>> >>>> Richard. >>>> >>> >>> Ok, I see the last difference in the approaches we envision. >>> I am assuming, that frontend does not put != 0, but the later >>> optimisations (veclower in my case) check every mask in VEC_COND_EXPR >>> and does the same functionality as you describe. So the philosophical >>> question why it is better to first add and then remove, rather than >>> just add if needed? >> >> Well, it's "better be right than sorry". Thus, default to the >> conservatively correct >> way and let optimizers "optimize" it. > > How can we get sorry, it is impossible to skip the vcond during the > optimisation, but whatever, it is not really so important when to add. > Currently I have a bigger problem, see below. >> >>> In all the rest I think we agreed. >> >> Fine. >> >> Thanks, >> Richard. >> >>> >>> Artem. >>> >> > > I found out that I cannot really gimplify correctly the vcond<a >b , > c, d> expression when a > b is vcond<a > b, -1, 0>. The problem is > that gimplifier pulls a > b always as a separate expression during the > gimplification, and I don't think that we can avoid it. So what > happens is: > > vcond <a > b , c , d> > transformed to > x = b > c; > x1 = vcond <x , -1, 0> > vcond <x1, c, d> > > and so on, infinitely long. Sounds like a bug that is possible to fix. > In order to fix the problem we need whether to introduce a new code > like VEC_COMP_LT, VEC_COMP_GT, and so on > whether a builtin function which we would lower > whether stick back to the idea of hook. > > Anyway, representing a >b using vcond does not work. Well, sure it will work, it just needs some work appearantly. > What would be your thinking here? Do you have a patch that exposes this problem? I can have a look tomorrow. Richard. > > Thanks, > Artem. >

Index: gcc/doc/tm.texi =================================================================== --- gcc/doc/tm.texi (revision 177665) +++ gcc/doc/tm.texi (working copy) @@ -5738,6 +5738,10 @@ misalignment value (@var{misalign}). Return true if vector alignment is reachable (by peeling N iterations) for the given type. @end deftypefn +@deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VEC_COMPARE (gimple_stmt_iterator *@var{gsi}, tree @var{type}, tree @var{v0}, tree @var{v1}, enum tree_code @var{code}) +This hook should check whether it is possible to express vectorcomparison using the hardware-specific instructions and return resulttree. Hook should return NULL_TREE if expansion is impossible. +@end deftypefn + @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VEC_PERM (tree @var{type}, tree *@var{mask_element_type}) Target builtin that implements vector permute. @end deftypefn Index: gcc/doc/tm.texi.in =================================================================== --- gcc/doc/tm.texi.in (revision 177665) +++ gcc/doc/tm.texi.in (working copy) @@ -5676,6 +5676,8 @@ misalignment value (@var{misalign}). Return true if vector alignment is reachable (by peeling N iterations) for the given type. @end deftypefn +@hook TARGET_VECTORIZE_BUILTIN_VEC_COMPARE + @hook TARGET_VECTORIZE_BUILTIN_VEC_PERM Target builtin that implements vector permute. @end deftypefn Index: gcc/targhooks.c =================================================================== --- gcc/targhooks.c (revision 177665) +++ gcc/targhooks.c (working copy) @@ -969,6 +969,18 @@ default_builtin_vector_alignment_reachab return true; } +/* Replaces vector comparison with the target-specific instructions + and returns the resulting variable or NULL_TREE otherwise. */ +tree +default_builtin_vec_compare (gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, + tree type ATTRIBUTE_UNUSED, + tree v0 ATTRIBUTE_UNUSED, + tree v1 ATTRIBUTE_UNUSED, + enum tree_code code ATTRIBUTE_UNUSED) +{ + return NULL_TREE; +} + /* By default, assume that a target supports any factor of misalignment memory access if it supports movmisalign patten. is_packed is true if the memory access is defined in a packed struct. */ Index: gcc/targhooks.h =================================================================== --- gcc/targhooks.h (revision 177665) +++ gcc/targhooks.h (working copy) @@ -86,6 +86,11 @@ extern int default_builtin_vectorization extern tree default_builtin_reciprocal (unsigned int, bool, bool); extern bool default_builtin_vector_alignment_reachable (const_tree, bool); + +extern tree default_builtin_vec_compare (gimple_stmt_iterator *gsi, + tree type, tree v0, tree v1, + enum tree_code code); + extern bool default_builtin_support_vector_misalignment (enum machine_mode mode, const_tree, Index: gcc/target.def =================================================================== --- gcc/target.def (revision 177665) +++ gcc/target.def (working copy) @@ -988,6 +988,15 @@ DEFHOOK bool, (tree vec_type, tree mask), hook_bool_tree_tree_true) +/* Implement hardware vector comparison or return false. */ +DEFHOOK +(builtin_vec_compare, + "This hook should check whether it is possible to express vector\ +comparison using the hardware-specific instructions and return result\ +tree. Hook should return NULL_TREE if expansion is impossible.", + tree, (gimple_stmt_iterator *gsi, tree type, tree v0, tree v1, enum tree_code code), + default_builtin_vec_compare) + /* Return true if the target supports misaligned store/load of a specific factor denoted in the third parameter. The last parameter is true if the access is defined in a packed struct. */ Index: gcc/optabs.c =================================================================== --- gcc/optabs.c (revision 177665) +++ gcc/optabs.c (working copy) @@ -6572,16 +6572,37 @@ expand_vec_cond_expr (tree vec_cond_type if (icode == CODE_FOR_nothing) return 0; - comparison = vector_compare_rtx (op0, unsignedp, icode); rtx_op1 = expand_normal (op1); rtx_op2 = expand_normal (op2); + + if (COMPARISON_CLASS_P (op0)) + { + comparison = vector_compare_rtx (op0, unsignedp, icode); + create_output_operand (&ops[0], target, mode); + create_input_operand (&ops[1], rtx_op1, mode); + create_input_operand (&ops[2], rtx_op2, mode); + create_fixed_operand (&ops[3], comparison); + create_fixed_operand (&ops[4], XEXP (comparison, 0)); + create_fixed_operand (&ops[5], XEXP (comparison, 1)); + + } + else + { + rtx rtx_op0; + rtx vec; + + rtx_op0 = expand_normal (op0); + comparison = gen_rtx_NE (mode, NULL_RTX, NULL_RTX); + vec = CONST0_RTX (mode); + + create_output_operand (&ops[0], target, mode); + create_input_operand (&ops[1], rtx_op1, mode); + create_input_operand (&ops[2], rtx_op2, mode); + create_input_operand (&ops[3], comparison, mode); + create_input_operand (&ops[4], rtx_op0, mode); + create_input_operand (&ops[5], vec, mode); + } - create_output_operand (&ops[0], target, mode); - create_input_operand (&ops[1], rtx_op1, mode); - create_input_operand (&ops[2], rtx_op2, mode); - create_fixed_operand (&ops[3], comparison); - create_fixed_operand (&ops[4], XEXP (comparison, 0)); - create_fixed_operand (&ops[5], XEXP (comparison, 1)); expand_insn (icode, 6, ops); return ops[0].value; } Index: gcc/target.h =================================================================== --- gcc/target.h (revision 177665) +++ gcc/target.h (working copy) @@ -51,6 +51,7 @@ #define GCC_TARGET_H #include "insn-modes.h" +#include "gimple.h" #ifdef ENABLE_CHECKING Index: gcc/fold-const.c =================================================================== --- gcc/fold-const.c (revision 177665) +++ gcc/fold-const.c (working copy) @@ -5930,12 +5930,21 @@ extract_muldiv_1 (tree t, tree c, enum t } /* Return a node which has the indicated constant VALUE (either 0 or - 1), and is of the indicated TYPE. */ + 1 for scalars and is either {-1,-1,..} or {0,0,...} for vectors), + and is of the indicated TYPE. */ tree constant_boolean_node (int value, tree type) { - if (type == integer_type_node) + if (TREE_CODE (type) == VECTOR_TYPE) + { + tree tval; + + gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE); + tval = build_int_cst (TREE_TYPE (type), value); + return build_vector_from_val (type, tval); + } + else if (type == integer_type_node) return value ? integer_one_node : integer_zero_node; else if (type == boolean_type_node) return value ? boolean_true_node : boolean_false_node; @@ -9073,26 +9082,29 @@ fold_comparison (location_t loc, enum tr floating-point, we can only do some of these simplifications.) */ if (operand_equal_p (arg0, arg1, 0)) { + int true_val = TREE_CODE (type) == VECTOR_TYPE ? -1 : 0; + tree arg0_type = TREE_TYPE (arg0); + switch (code) { case EQ_EXPR: - if (! FLOAT_TYPE_P (TREE_TYPE (arg0)) - || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0)))) - return constant_boolean_node (1, type); + if (! FLOAT_TYPE_P (arg0_type) + || ! HONOR_NANS (TYPE_MODE (arg0_type))) + return constant_boolean_node (true_val, type); break; case GE_EXPR: case LE_EXPR: - if (! FLOAT_TYPE_P (TREE_TYPE (arg0)) - || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0)))) - return constant_boolean_node (1, type); + if (! FLOAT_TYPE_P (arg0_type) + || ! HONOR_NANS (TYPE_MODE (arg0_type))) + return constant_boolean_node (true_val, type); return fold_build2_loc (loc, EQ_EXPR, type, arg0, arg1); case NE_EXPR: /* For NE, we can only do this simplification if integer or we don't honor IEEE floating point NaNs. */ - if (FLOAT_TYPE_P (TREE_TYPE (arg0)) - && HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0)))) + if (FLOAT_TYPE_P (arg0_type) + && HONOR_NANS (TYPE_MODE (arg0_type))) break; /* ... fall through ... */ case GT_EXPR: Index: gcc/c-typeck.c =================================================================== --- gcc/c-typeck.c (revision 177665) +++ gcc/c-typeck.c (working copy) @@ -4058,6 +4058,94 @@ build_conditional_expr (location_t colon type2 = TREE_TYPE (op2); code2 = TREE_CODE (type2); + if (TREE_CODE (TREE_TYPE (ifexp)) == VECTOR_TYPE) + { + bool maybe_const = true; + tree sc; + + if (TREE_CODE (type1) != VECTOR_TYPE + || TREE_CODE (type2) != VECTOR_TYPE) + { + error_at (colon_loc, "vector comparison arguments must be of " + "type vector"); + return error_mark_node; + } + + if (TREE_CODE (TREE_TYPE (TREE_TYPE (ifexp))) != INTEGER_TYPE) + { + error_at (colon_loc, "non-integer type in vector condition"); + return error_mark_node; + } + + if (TYPE_VECTOR_SUBPARTS (type1) != TYPE_VECTOR_SUBPARTS (type2) + || TYPE_VECTOR_SUBPARTS (TREE_TYPE (ifexp)) + != TYPE_VECTOR_SUBPARTS (type1)) + { + error_at (colon_loc, "vectors of different length found in " + "vector comparison"); + return error_mark_node; + } + + if (TREE_TYPE (type1) != TREE_TYPE (type2)) + { + error_at (colon_loc, "vectors of different types involved in " + "vector comparison"); + return error_mark_node; + } + + if (TYPE_SIZE (TREE_TYPE (TREE_TYPE (ifexp))) + != TYPE_SIZE (TREE_TYPE (type1))) + { + error_at (colon_loc, "vector-condition element type must be " + "the same as result vector element type"); + return error_mark_node; + } + + /* Avoid C_MAYBE_CONST in VEC_COND_EXPR. */ + sc = c_fully_fold (ifexp, false, &maybe_const); + sc = save_expr (sc); + if (!maybe_const) + ifexp = c_wrap_maybe_const (sc, true); + else + ifexp = sc; + + sc = c_fully_fold (op1, false, &maybe_const); + sc = save_expr (sc); + if (!maybe_const) + op1 = c_wrap_maybe_const (sc, true); + else + op1 = sc; + + sc = c_fully_fold (op2, false, &maybe_const); + sc = save_expr (sc); + if (!maybe_const) + op2 = c_wrap_maybe_const (sc, true); + else + op2 = sc; + + /* Currently the expansion of VEC_COND_EXPR does not allow + expessions where the type of vectors you compare differs + form the type of vectors you select from. For the time + being we insert implicit conversions. */ + if ((COMPARISON_CLASS_P (ifexp) + && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != type1) + || TREE_TYPE (ifexp) != type1) + { + tree comp_type = COMPARISON_CLASS_P (ifexp) + ? TREE_TYPE (TREE_OPERAND (ifexp, 0)) + : TREE_TYPE (ifexp); + tree vcond; + + op1 = convert (comp_type, op1); + op2 = convert (comp_type, op2); + vcond = build3 (VEC_COND_EXPR, comp_type, ifexp, op1, op2); + vcond = convert (type1, vcond); + return vcond; + } + else + return build3 (VEC_COND_EXPR, type1, ifexp, op1, op2); + } + /* C90 does not permit non-lvalue arrays in conditional expressions. In C99 they will be pointers by now. */ if (code1 == ARRAY_TYPE || code2 == ARRAY_TYPE) @@ -9906,6 +9995,37 @@ build_binary_op (location_t location, en case EQ_EXPR: case NE_EXPR: + if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE) + { + tree intt; + if (TREE_TYPE (type0) != TREE_TYPE (type1)) + { + error_at (location, "comparing vectors with different " + "element types"); + return error_mark_node; + } + + if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1)) + { + error_at (location, "comparing vectors with different " + "number of elements"); + return error_mark_node; + } + + /* Always construct signed integer vector type. */ + intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)),0); + result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0)); + converted = 1; + /*break; */ + + ret = build3 (VEC_COND_EXPR, result_type, + build2 (code, result_type, op0, op1), + build_vector_from_val (result_type, + build_int_cst (intt, -1)), + build_vector_from_val (result_type, + build_int_cst (intt, 0))); + goto return_build_binary_op; + } if (FLOAT_TYPE_P (type0) || FLOAT_TYPE_P (type1)) warning_at (location, OPT_Wfloat_equal, @@ -10018,6 +10138,37 @@ build_binary_op (location_t location, en case GE_EXPR: case LT_EXPR: case GT_EXPR: + if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE) + { + tree intt; + if (TREE_TYPE (type0) != TREE_TYPE (type1)) + { + error_at (location, "comparing vectors with different " + "element types"); + return error_mark_node; + } + + if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1)) + { + error_at (location, "comparing vectors with different " + "number of elements"); + return error_mark_node; + } + + /* Always construct signed integer vector type. */ + intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)),0); + result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0)); + converted = 1; + /* break; */ + ret = build3 (VEC_COND_EXPR, result_type, + build2 (code, result_type, op0, op1), + build_vector_from_val (result_type, + build_int_cst (intt, -1)), + build_vector_from_val (result_type, + build_int_cst (intt, 0))); + goto return_build_binary_op; + + } build_type = integer_type_node; if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE || code0 == FIXED_POINT_TYPE) @@ -10425,6 +10576,10 @@ c_objc_common_truthvalue_conversion (loc case FUNCTION_TYPE: gcc_unreachable (); + case VECTOR_TYPE: + error_at (location, "used vector type where scalar is required"); + return error_mark_node; + default: break; } Index: gcc/gimplify.c =================================================================== --- gcc/gimplify.c (revision 177665) +++ gcc/gimplify.c (working copy) @@ -7064,6 +7064,22 @@ gimplify_expr (tree *expr_p, gimple_seq } break; + case VEC_COND_EXPR: + { + enum gimplify_status r0, r1, r2; + + r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, + post_p, is_gimple_condexpr, fb_rvalue); + r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p, + post_p, is_gimple_val, fb_rvalue); + r2 = gimplify_expr (&TREE_OPERAND (*expr_p, 2), pre_p, + post_p, is_gimple_val, fb_rvalue); + recalculate_side_effects (*expr_p); + + ret = MIN (r0, MIN (r1, r2)); + } + break; + case TARGET_MEM_REF: { enum gimplify_status r0 = GS_ALL_DONE, r1 = GS_ALL_DONE; @@ -7348,6 +7364,11 @@ gimplify_expr (tree *expr_p, gimple_seq { tree type = TREE_TYPE (TREE_OPERAND (*expr_p, 1)); + /* Vector comparisons is a valid gimple expression + which could be lowered down later. */ + if (TREE_CODE (type) == VECTOR_TYPE) + goto expr_2; + if (!AGGREGATE_TYPE_P (type)) { tree org_type = TREE_TYPE (*expr_p); Index: gcc/tree.def =================================================================== --- gcc/tree.def (revision 177665) +++ gcc/tree.def (working copy) @@ -704,7 +704,10 @@ DEFTREECODE (TRUTH_NOT_EXPR, "truth_not_ The others are allowed only for integer (or pointer or enumeral) or real types. In all cases the operands will have the same type, - and the value is always the type used by the language for booleans. */ + and the value is either the type used by the language for booleans + or an integer vector type of the same size and with the same number + of elements as the comparison operands. True for a vector of + comparison results has all bits set while false is equal to zero. */ DEFTREECODE (LT_EXPR, "lt_expr", tcc_comparison, 2) DEFTREECODE (LE_EXPR, "le_expr", tcc_comparison, 2) DEFTREECODE (GT_EXPR, "gt_expr", tcc_comparison, 2) Index: gcc/emit-rtl.c =================================================================== --- gcc/emit-rtl.c (revision 177665) +++ gcc/emit-rtl.c (working copy) @@ -5474,6 +5474,11 @@ gen_const_vector (enum machine_mode mode return tem; } +rtx +gen_const_vector1 (enum machine_mode mode, int constant) +{ + return gen_const_vector (mode, constant); +} /* Generate a vector like gen_rtx_raw_CONST_VEC, but use the zero vector when all elements are zero, and the one vector when all elements are one. */ rtx Index: gcc/tree-vect-generic.c =================================================================== --- gcc/tree-vect-generic.c (revision 177665) +++ gcc/tree-vect-generic.c (working copy) @@ -30,11 +30,16 @@ along with GCC; see the file COPYING3. #include "tree-pass.h" #include "flags.h" #include "ggc.h" +#include "target.h" /* Need to include rtl.h, expr.h, etc. for optabs. */ #include "expr.h" #include "optabs.h" + +static void expand_vector_operations_1 (gimple_stmt_iterator *); + + /* Build a constant of type TYPE, made of VALUE's bits replicated every TYPE_SIZE (INNER_TYPE) bits to fit TYPE's precision. */ static tree @@ -125,6 +130,21 @@ do_binop (gimple_stmt_iterator *gsi, tre return gimplify_build2 (gsi, code, inner_type, a, b); } + +/* Construct expression (A[BITPOS] code B[BITPOS]) ? -1 : 0; */ +static tree +do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b, + tree bitpos, tree bitsize, enum tree_code code) +{ + tree cond; + a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos); + b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos); + cond = gimplify_build2 (gsi, code, inner_type, a, b); + return gimplify_build3 (gsi, COND_EXPR, inner_type, cond, + build_int_cst (inner_type, -1), + build_int_cst (inner_type, 0)); +} + /* Expand vector addition to scalars. This does bit twiddling in order to increase parallelism: @@ -333,6 +353,24 @@ uniform_vector_p (tree vec) return NULL_TREE; } +/* Try to expand vector comparison expression OP0 CODE OP1 using + builtin_vec_compare hardware hook, in case target does not + support comparison of type TYPE, extract comparison piecewise. + GSI is used inside the target hook to create the code needed + for the given comparison. */ +static tree +expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0, + tree op1, enum tree_code code) +{ + tree t = targetm.vectorize.builtin_vec_compare (gsi, type, op0, op1, code); + + if (t == NULL_TREE) + t = expand_vector_piecewise (gsi, do_compare, type, + TREE_TYPE (TREE_TYPE (op0)), op0, op1, code); + return t; + +} + static tree expand_vector_operation (gimple_stmt_iterator *gsi, tree type, tree compute_type, gimple assign, enum tree_code code) @@ -375,8 +413,24 @@ expand_vector_operation (gimple_stmt_ite case BIT_NOT_EXPR: return expand_vector_parallel (gsi, do_unop, type, gimple_assign_rhs1 (assign), - NULL_TREE, code); - + NULL_TREE, code); + case EQ_EXPR: + case NE_EXPR: + case GT_EXPR: + case LT_EXPR: + case GE_EXPR: + case LE_EXPR: + case UNEQ_EXPR: + case UNGT_EXPR: + case UNLT_EXPR: + case UNGE_EXPR: + case UNLE_EXPR: + case LTGT_EXPR: + case ORDERED_EXPR: + case UNORDERED_EXPR: + return expand_vector_comparison (gsi, type, + gimple_assign_rhs1 (assign), + gimple_assign_rhs2 (assign), code); default: break; } @@ -432,6 +486,122 @@ type_for_widest_vector_mode (enum machin } } + + +/* Expand vector condition EXP which should have the form + VEC_COND_EXPR<cond, vec0, vec1> into the following + vector: + {cond[i] != 0 ? vec0[i] : vec1[i], ... } + i changes from 0 to TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec0)). */ +static tree +expand_vec_cond_expr_piecewise (gimple_stmt_iterator *gsi, tree exp) +{ + tree cond = TREE_OPERAND (exp, 0); + tree vec0 = TREE_OPERAND (exp, 1); + tree vec1 = TREE_OPERAND (exp, 2); + tree type = TREE_TYPE (vec0); + tree lhs, rhs, notmask; + tree var, new_rhs; + optab op = NULL; + gimple new_stmt; + gimple_stmt_iterator gsi_tmp; + tree t; + + if (!COMPARISON_CLASS_P (cond)) + cond = build2 (EQ_EXPR, TREE_TYPE (cond), cond, + build_vector_from_val (TREE_TYPE (cond), + build_int_cst (TREE_TYPE (TREE_TYPE (cond)), -1))); + + /* Expand vector condition inside of VEC_COND_EXPR. */ + op = optab_for_tree_code (TREE_CODE (cond), type, optab_default); + if (!op || optab_handler (op, TYPE_MODE (type)) == CODE_FOR_nothing) + { + var = create_tmp_reg (TREE_TYPE (cond), "cond"); + new_rhs = expand_vector_comparison (gsi, TREE_TYPE (cond), + TREE_OPERAND (cond, 0), + TREE_OPERAND (cond, 1), + TREE_CODE (cond)); + new_stmt = gimple_build_assign (var, new_rhs); + gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT); + update_stmt (gsi_stmt (*gsi)); + } + else + var = cond; + + gsi_tmp = *gsi; + gsi_prev (&gsi_tmp); + + /* Expand VCOND<mask, v0, v1> to ((v0 & mask) | (v1 & ~mask)) */ + lhs = gimplify_build2 (gsi, BIT_AND_EXPR, type, var, vec0); + notmask = gimplify_build1 (gsi, BIT_NOT_EXPR, type, var); + rhs = gimplify_build2 (gsi, BIT_AND_EXPR, type, notmask, vec1); + t = gimplify_build2 (gsi, BIT_IOR_EXPR, type, lhs, rhs); + + /* Run vecower on the expresisons we have introduced. */ + for (; gsi_tmp.ptr != gsi->ptr; gsi_next (&gsi_tmp)) + expand_vector_operations_1 (&gsi_tmp); + + return t; +} + +#define pp(x) fprintf (stderr, "-- %s\n", x) + +static bool +is_vector_comparison (gimple_stmt_iterator *gsi, tree expr) +{ + tree type = TREE_TYPE (expr); + + if (TREE_CODE (expr) == VEC_COND_EXPR) + return true; + + if (COMPARISON_CLASS_P (expr) && TREE_CODE (type) == VECTOR_TYPE) + return true; + + if (TREE_CODE (expr) == BIT_IOR_EXPR || TREE_CODE (expr) == BIT_AND_EXPR + || TREE_CODE (expr) == BIT_XOR_EXPR) + return is_vector_comparison (gsi, TREE_OPERAND (expr, 0)) + & is_vector_comparison (gsi, TREE_OPERAND (expr, 1)); + + if (TREE_CODE (expr) == VAR_DECL) + { + gimple_stmt_iterator gsi_tmp; + gsi_tmp = *gsi; + tree name = DECL_NAME (expr); + tree var = NULL_TREE; + + for (; gsi_tmp.ptr; gsi_prev (&gsi_tmp)) + { + gimple stmt = gsi_stmt (gsi_tmp); + + if (gimple_code (stmt) != GIMPLE_ASSIGN) + continue; + + if (TREE_CODE (gimple_assign_lhs (stmt)) == VAR_DECL + && DECL_NAME (gimple_assign_lhs (stmt)) == name) + return is_vector_comparison (&gsi_tmp, + gimple_assign_rhs_to_tree (stmt)); + } + } + + if (TREE_CODE (expr) == SSA_NAME) + { + enum tree_code code; + gimple exprdef = SSA_NAME_DEF_STMT (expr); + + if (gimple_code (exprdef) != GIMPLE_ASSIGN) + return false; + + if (TREE_CODE (gimple_expr_type (exprdef)) != VECTOR_TYPE) + return false; + + + return is_vector_comparison (gsi, + gimple_assign_rhs_to_tree (exprdef)); + } + + return false; +} + /* Process one statement. If we identify a vector operation, expand it. */ static void @@ -450,11 +620,34 @@ expand_vector_operations_1 (gimple_stmt_ code = gimple_assign_rhs_code (stmt); rhs_class = get_gimple_rhs_class (code); + lhs = gimple_assign_lhs (stmt); + + if (code == VEC_COND_EXPR) + { + tree exp = gimple_assign_rhs1 (stmt); + tree cond = TREE_OPERAND (exp, 0); + + if (!is_vector_comparison (gsi, cond)) + TREE_OPERAND (exp, 0) = + build2 (NE_EXPR, TREE_TYPE (cond), cond, + build_vector_from_val (TREE_TYPE (cond), + build_int_cst (TREE_TYPE (TREE_TYPE (cond)), 0))); + + if (expand_vec_cond_expr_p (TREE_TYPE (exp), + TYPE_MODE (TREE_TYPE (exp)))) + { + update_stmt (gsi_stmt (*gsi)); + return; + } + + new_rhs = expand_vec_cond_expr_piecewise (gsi, exp); + gimple_assign_set_rhs_from_tree (gsi, new_rhs); + update_stmt (gsi_stmt (*gsi)); + } if (rhs_class != GIMPLE_UNARY_RHS && rhs_class != GIMPLE_BINARY_RHS) return; - lhs = gimple_assign_lhs (stmt); rhs1 = gimple_assign_rhs1 (stmt); type = gimple_expr_type (stmt); if (rhs_class == GIMPLE_BINARY_RHS) Index: gcc/Makefile.in =================================================================== --- gcc/Makefile.in (revision 177665) +++ gcc/Makefile.in (working copy) @@ -888,7 +888,7 @@ EXCEPT_H = except.h $(HASHTAB_H) vecprim TARGET_DEF = target.def target-hooks-macros.h C_TARGET_DEF = c-family/c-target.def target-hooks-macros.h COMMON_TARGET_DEF = common/common-target.def target-hooks-macros.h -TARGET_H = $(TM_H) target.h $(TARGET_DEF) insn-modes.h +TGT = $(TM_H) target.h $(TARGET_DEF) insn-modes.h C_TARGET_H = c-family/c-target.h $(C_TARGET_DEF) COMMON_TARGET_H = common/common-target.h $(INPUT_H) $(COMMON_TARGET_DEF) MACHMODE_H = machmode.h mode-classes.def insn-modes.h @@ -919,8 +919,9 @@ TREE_H = tree.h all-tree.def tree.def c- REGSET_H = regset.h $(BITMAP_H) hard-reg-set.h BASIC_BLOCK_H = basic-block.h $(PREDICT_H) $(VEC_H) $(FUNCTION_H) cfghooks.h GIMPLE_H = gimple.h gimple.def gsstruct.def pointer-set.h $(VEC_H) \ - vecir.h $(GGC_H) $(BASIC_BLOCK_H) $(TARGET_H) tree-ssa-operands.h \ + vecir.h $(GGC_H) $(BASIC_BLOCK_H) $(TGT) tree-ssa-operands.h \ tree-ssa-alias.h $(INTERNAL_FN_H) +TARGET_H = $(TGT) gimple.h GCOV_IO_H = gcov-io.h gcov-iov.h auto-host.h COVERAGE_H = coverage.h $(GCOV_IO_H) DEMANGLE_H = $(srcdir)/../include/demangle.h @@ -3185,7 +3186,7 @@ tree-vect-generic.o : tree-vect-generic. $(TM_H) $(TREE_FLOW_H) $(GIMPLE_H) tree-iterator.h $(TREE_PASS_H) \ $(FLAGS_H) $(OPTABS_H) $(MACHMODE_H) $(EXPR_H) \ langhooks.h $(FLAGS_H) $(DIAGNOSTIC_H) gt-tree-vect-generic.h $(GGC_H) \ - coretypes.h insn-codes.h + coretypes.h insn-codes.h target.h df-core.o : df-core.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \ insn-config.h $(RECOG_H) $(FUNCTION_H) $(REGS_H) alloc-pool.h \ hard-reg-set.h $(BASIC_BLOCK_H) $(DF_H) $(BITMAP_H) sbitmap.h $(TIMEVAR_H) \ Index: gcc/tree-cfg.c =================================================================== --- gcc/tree-cfg.c (revision 177665) +++ gcc/tree-cfg.c (working copy) @@ -3191,6 +3191,38 @@ verify_gimple_comparison (tree type, tre return true; } + if (TREE_CODE (type) == VECTOR_TYPE) + { + if (TREE_CODE (op0_type) != VECTOR_TYPE + || TREE_CODE (op1_type) != VECTOR_TYPE) + { + error ("non-vector operands in vector comparison"); + debug_generic_expr (op0_type); + debug_generic_expr (op1_type); + return true; + } + + if (!useless_type_conversion_p (op0_type, op1_type) + && !useless_type_conversion_p (op1_type, op0_type)) + { + error ("type mismatch in vector comparison"); + debug_generic_expr (op0_type); + debug_generic_expr (op1_type); + return true; + } + + if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type) + && TYPE_PRECISION (TREE_TYPE (op0_type)) + != TYPE_PRECISION (TREE_TYPE (type))) + { + error ("invalid vector comparison resulting type"); + debug_generic_expr (type); + return true; + } + + return false; + } + /* For comparisons we do not have the operations type as the effective type the comparison is carried out in. Instead we require that either the first operand is trivially Index: gcc/c-parser.c =================================================================== --- gcc/c-parser.c (revision 177665) +++ gcc/c-parser.c (working copy) @@ -5339,6 +5339,15 @@ c_parser_conditional_expression (c_parse tree eptype = NULL_TREE; middle_loc = c_parser_peek_token (parser)->location; + + if (TREE_CODE (TREE_TYPE (cond.value)) == VECTOR_TYPE) + { + error_at (middle_loc, "cannot ommit middle operator in " + "vector comparison"); + ret.value = error_mark_node; + return ret; + } + pedwarn (middle_loc, OPT_pedantic, "ISO C forbids omitting the middle term of a ?: expression"); warn_for_omitted_condop (middle_loc, cond.value); @@ -5357,9 +5366,12 @@ c_parser_conditional_expression (c_parse } else { - cond.value - = c_objc_common_truthvalue_conversion - (cond_loc, default_conversion (cond.value)); + if (TREE_CODE (TREE_TYPE (cond.value)) != VECTOR_TYPE) + { + cond.value + = c_objc_common_truthvalue_conversion + (cond_loc, default_conversion (cond.value)); + } c_inhibit_evaluation_warnings += cond.value == truthvalue_false_node; exp1 = c_parser_expression_conv (parser); mark_exp_read (exp1.value); Index: gcc/config/i386/i386.c =================================================================== --- gcc/config/i386/i386.c (revision 177665) +++ gcc/config/i386/i386.c (working copy) @@ -25,6 +25,7 @@ along with GCC; see the file COPYING3. #include "tm.h" #include "rtl.h" #include "tree.h" +#include "tree-flow.h" #include "tm_p.h" #include "regs.h" #include "hard-reg-set.h" @@ -18402,27 +18403,55 @@ ix86_expand_sse_fp_minmax (rtx dest, enu return true; } +rtx rtx_build_vector_from_val (enum machine_mode, HOST_WIDE_INT); + +/* Returns a vector of mode MODE where all the elements are ARG. */ +rtx +rtx_build_vector_from_val (enum machine_mode mode, HOST_WIDE_INT arg) +{ + rtvec v; + int units, i; + enum machine_mode inner; + + units = GET_MODE_NUNITS (mode); + inner = GET_MODE_INNER (mode); + v = rtvec_alloc (units); + for (i = 0; i < units; ++i) + RTVEC_ELT (v, i) = gen_rtx_CONST_INT (inner, arg); + + return gen_rtx_raw_CONST_VECTOR (mode, v); +} + /* Expand an sse vector comparison. Return the register with the result. */ static rtx ix86_expand_sse_cmp (rtx dest, enum rtx_code code, rtx cmp_op0, rtx cmp_op1, - rtx op_true, rtx op_false) + rtx op_true, rtx op_false, bool no_comparison) { enum machine_mode mode = GET_MODE (dest); rtx x; - cmp_op0 = force_reg (mode, cmp_op0); - if (!nonimmediate_operand (cmp_op1, mode)) - cmp_op1 = force_reg (mode, cmp_op1); + /* Avoid useless comparison. */ + if (no_comparison) + { + cmp_op0 = force_reg (mode, cmp_op0); + x = cmp_op0; + } + else + { + cmp_op0 = force_reg (mode, cmp_op0); + if (!nonimmediate_operand (cmp_op1, mode)) + cmp_op1 = force_reg (mode, cmp_op1); + + x = gen_rtx_fmt_ee (code, mode, cmp_op0, cmp_op1); + } if (optimize || reg_overlap_mentioned_p (dest, op_true) || reg_overlap_mentioned_p (dest, op_false)) dest = gen_reg_rtx (mode); - x = gen_rtx_fmt_ee (code, mode, cmp_op0, cmp_op1); emit_insn (gen_rtx_SET (VOIDmode, dest, x)); - return dest; } @@ -18434,8 +18463,14 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp { enum machine_mode mode = GET_MODE (dest); rtx t2, t3, x; - - if (op_false == CONST0_RTX (mode)) + rtx mask_true; + + if (rtx_equal_p (op_true, rtx_build_vector_from_val (mode, -1)) + && rtx_equal_p (op_false, CONST0_RTX (mode))) + { + emit_insn (gen_rtx_SET (VOIDmode, dest, cmp)); + } + else if (op_false == CONST0_RTX (mode)) { op_true = force_reg (mode, op_true); x = gen_rtx_AND (mode, cmp, op_true); @@ -18512,7 +18547,7 @@ ix86_expand_fp_movcc (rtx operands[]) return true; tmp = ix86_expand_sse_cmp (operands[0], code, op0, op1, - operands[2], operands[3]); + operands[2], operands[3], false); ix86_expand_sse_movcc (operands[0], tmp, operands[2], operands[3]); return true; } @@ -18555,7 +18590,7 @@ ix86_expand_fp_vcond (rtx operands[]) return true; cmp = ix86_expand_sse_cmp (operands[0], code, operands[4], operands[5], - operands[1], operands[2]); + operands[1], operands[2], false); ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]); return true; } @@ -18569,7 +18604,9 @@ ix86_expand_int_vcond (rtx operands[]) enum rtx_code code = GET_CODE (operands[3]); bool negate = false; rtx x, cop0, cop1; + rtx comp; + comp = operands[3]; cop0 = operands[4]; cop1 = operands[5]; @@ -18681,8 +18718,18 @@ ix86_expand_int_vcond (rtx operands[]) } } - x = ix86_expand_sse_cmp (operands[0], code, cop0, cop1, - operands[1+negate], operands[2-negate]); + if (GET_CODE (comp) == NE && XEXP (comp, 0) == NULL_RTX + && XEXP (comp, 1) == NULL_RTX) + { + rtx vec = CONST0_RTX (mode); + x = ix86_expand_sse_cmp (operands[0], code, cop0, vec, + operands[1+negate], operands[2-negate], true); + } + else + { + x = ix86_expand_sse_cmp (operands[0], code, cop0, cop1, + operands[1+negate], operands[2-negate], false); + } ix86_expand_sse_movcc (operands[0], x, operands[1+negate], operands[2-negate]); @@ -18774,7 +18821,7 @@ ix86_expand_sse_unpack (rtx operands[2], tmp = force_reg (imode, CONST0_RTX (imode)); else tmp = ix86_expand_sse_cmp (gen_reg_rtx (imode), GT, CONST0_RTX (imode), - operands[1], pc_rtx, pc_rtx); + operands[1], pc_rtx, pc_rtx, false); emit_insn (unpack (dest, operands[1], tmp)); } @@ -32827,6 +32874,276 @@ ix86_vectorize_builtin_vec_perm (tree ve return ix86_builtins[(int) fcode]; } +/* Find target specific sequence for vector comparison of + real-type vectors V0 and V1. Returns variable containing + result of the comparison or NULL_TREE in other case. */ +static tree +vector_fp_compare (gimple_stmt_iterator *gsi, tree rettype, + enum machine_mode mode, tree v0, tree v1, + enum tree_code code) +{ + enum ix86_builtins fcode; + int arg = -1; + tree fdef, frtype, tmp, var, t; + gimple new_stmt; + bool reverse = false; + +#define SWITCH_MODE(mode, fcode, code, value) \ +switch (mode) \ + { \ + case V2DFmode: \ + if (!TARGET_SSE2) return NULL_TREE; \ + fcode = IX86_BUILTIN_CMP ## code ## PD; \ + break; \ + case V4DFmode: \ + if (!TARGET_AVX) return NULL_TREE; \ + fcode = IX86_BUILTIN_CMPPD256; \ + arg = value; \ + break; \ + case V4SFmode: \ + if (!TARGET_SSE) return NULL_TREE; \ + fcode = IX86_BUILTIN_CMP ## code ## PS; \ + break; \ + case V8SFmode: \ + if (!TARGET_AVX) return NULL_TREE; \ + fcode = IX86_BUILTIN_CMPPS256; \ + arg = value; \ + break; \ + default: \ + return NULL_TREE; \ + /* FIXME: Similar instructions for MMX. */ \ + } + + switch (code) + { + case EQ_EXPR: + SWITCH_MODE (mode, fcode, EQ, 0); + break; + + case NE_EXPR: + SWITCH_MODE (mode, fcode, NEQ, 4); + break; + + case GT_EXPR: + SWITCH_MODE (mode, fcode, LT, 1); + reverse = true; + break; + + case LT_EXPR: + SWITCH_MODE (mode, fcode, LT, 1); + break; + + case LE_EXPR: + SWITCH_MODE (mode, fcode, LE, 2); + break; + + case GE_EXPR: + SWITCH_MODE (mode, fcode, LE, 2); + reverse = true; + break; + + default: + return NULL_TREE; + } +#undef SWITCH_MODE + + fdef = ix86_builtins[(int)fcode]; + frtype = TREE_TYPE (TREE_TYPE (fdef)); + + tmp = create_tmp_var (frtype, "tmp"); + var = create_tmp_var (rettype, "tmp"); + + if (arg == -1) + if (reverse) + new_stmt = gimple_build_call (fdef, 2, v1, v0); + else + new_stmt = gimple_build_call (fdef, 2, v0, v1); + else + if (reverse) + new_stmt = gimple_build_call (fdef, 3, v0, v1, + build_int_cst (char_type_node, arg)); + else + new_stmt = gimple_build_call (fdef, 3, v1, v0, + build_int_cst (char_type_node, arg)); + + gimple_call_set_lhs (new_stmt, tmp); + gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT); + t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, rettype, tmp); + new_stmt = gimple_build_assign (var, t); + gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT); + + return var; +} + +/* Find target specific sequence for vector comparison of + integer-type vectors V0 and V1. Returns variable containing + result of the comparison or NULL_TREE in other case. */ +static tree +vector_int_compare (gimple_stmt_iterator *gsi, tree rettype, + enum machine_mode mode, tree v0, tree v1, + enum tree_code code) +{ + enum ix86_builtins feq, fgt; + tree var, t, tmp, tmp1, tmp2, defeq, defgt, gtrtype, eqrtype; + gimple new_stmt; + + switch (mode) + { + /* SSE integer-type vectors. */ + case V2DImode: + if (!TARGET_SSE4_2) return NULL_TREE; + feq = IX86_BUILTIN_PCMPEQQ; + fgt = IX86_BUILTIN_PCMPGTQ; + break; + + case V4SImode: + if (!TARGET_SSE2) return NULL_TREE; + feq = IX86_BUILTIN_PCMPEQD128; + fgt = IX86_BUILTIN_PCMPGTD128; + break; + + case V8HImode: + if (!TARGET_SSE2) return NULL_TREE; + feq = IX86_BUILTIN_PCMPEQW128; + fgt = IX86_BUILTIN_PCMPGTW128; + break; + + case V16QImode: + if (!TARGET_SSE2) return NULL_TREE; + feq = IX86_BUILTIN_PCMPEQB128; + fgt = IX86_BUILTIN_PCMPGTB128; + break; + + /* MMX integer-type vectors. */ + case V2SImode: + if (!TARGET_MMX) return NULL_TREE; + feq = IX86_BUILTIN_PCMPEQD; + fgt = IX86_BUILTIN_PCMPGTD; + break; + + case V4HImode: + if (!TARGET_MMX) return NULL_TREE; + feq = IX86_BUILTIN_PCMPEQW; + fgt = IX86_BUILTIN_PCMPGTW; + break; + + case V8QImode: + if (!TARGET_MMX) return NULL_TREE; + feq = IX86_BUILTIN_PCMPEQB; + fgt = IX86_BUILTIN_PCMPGTB; + break; + + /* FIXME: Similar instructions for AVX. */ + default: + return NULL_TREE; + } + + + var = create_tmp_var (rettype, "ret"); + defeq = ix86_builtins[(int)feq]; + defgt = ix86_builtins[(int)fgt]; + eqrtype = TREE_TYPE (TREE_TYPE (defeq)); + gtrtype = TREE_TYPE (TREE_TYPE (defgt)); + +#define EQGT_CALL(gsi, stmt, var, op0, op1, gteq) \ +do { \ + var = create_tmp_var (gteq ## rtype, "tmp"); \ + stmt = gimple_build_call (def ## gteq, 2, op0, op1); \ + gimple_call_set_lhs (stmt, var); \ + gsi_insert_before (gsi, stmt, GSI_SAME_STMT); \ +} while (0) + + switch (code) + { + case EQ_EXPR: + EQGT_CALL (gsi, new_stmt, tmp, v0, v1, eq); + break; + + case NE_EXPR: + tmp = create_tmp_var (eqrtype, "tmp"); + + EQGT_CALL (gsi, new_stmt, tmp1, v0, v1, eq); + EQGT_CALL (gsi, new_stmt, tmp2, v0, v0, eq); + + /* t = tmp1 ^ {-1, -1,...} */ + t = gimplify_build2 (gsi, BIT_XOR_EXPR, eqrtype, tmp1, tmp2); + new_stmt = gimple_build_assign (tmp, t); + gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT); + break; + + case GT_EXPR: + EQGT_CALL (gsi, new_stmt, tmp, v0, v1, gt); + break; + + case LT_EXPR: + EQGT_CALL (gsi, new_stmt, tmp, v1, v0, gt); + break; + + case GE_EXPR: + if (eqrtype != gtrtype) + return NULL_TREE; + tmp = create_tmp_var (eqrtype, "tmp"); + EQGT_CALL (gsi, new_stmt, tmp1, v0, v1, gt); + EQGT_CALL (gsi, new_stmt, tmp2, v0, v1, eq); + t = gimplify_build2 (gsi, BIT_IOR_EXPR, eqrtype, tmp1, tmp2); + new_stmt = gimple_build_assign (tmp, t); + gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT); + break; + + case LE_EXPR: + if (eqrtype != gtrtype) + return NULL_TREE; + tmp = create_tmp_var (eqrtype, "tmp"); + EQGT_CALL (gsi, new_stmt, tmp1, v1, v0, gt); + EQGT_CALL (gsi, new_stmt, tmp2, v0, v1, eq); + t = gimplify_build2 (gsi, BIT_IOR_EXPR, eqrtype, tmp1, tmp2); + new_stmt = gimple_build_assign (tmp, t); + gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT); + break; + + default: + return NULL_TREE; + } +#undef EQGT_CALL + + t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, rettype, tmp); + new_stmt = gimple_build_assign (var, t); + gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT); + return var; +} + +/* Lower a comparison of two vectors V0 and V1, returning a + variable with the result of comparison. Returns NULL_TREE + when it is impossible to find a target specific sequence. */ +static tree +ix86_vectorize_builtin_vec_compare (gimple_stmt_iterator *gsi, tree rettype, + tree v0, tree v1, enum tree_code code) +{ + tree type; + + /* Make sure we are comparing the same types. */ + if (TREE_TYPE (v0) != TREE_TYPE (v1) + || TREE_TYPE (TREE_TYPE (v0)) != TREE_TYPE (TREE_TYPE (v1))) + return NULL_TREE; + + type = TREE_TYPE (v0); + + /* Cannot compare packed unsigned integers + unless it is EQ or NEQ operations. */ + if (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE + && TYPE_UNSIGNED (TREE_TYPE (type))) + if (code != EQ_EXPR && code != NE_EXPR) + return NULL_TREE; + + + if (TREE_CODE (TREE_TYPE (type)) == REAL_TYPE) + return vector_fp_compare (gsi, rettype, TYPE_MODE (type), v0, v1, code); + else if (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE) + return vector_int_compare (gsi, rettype, TYPE_MODE (type), v0, v1, code); + else + return NULL_TREE; +} + /* Return a vector mode with twice as many elements as VMODE. */ /* ??? Consider moving this to a table generated by genmodes.c. */ @@ -35270,6 +35587,11 @@ ix86_autovectorize_vector_sizes (void) #define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \ ix86_autovectorize_vector_sizes +#undef TARGET_VECTORIZE_BUILTIN_VEC_COMPARE +#define TARGET_VECTORIZE_BUILTIN_VEC_COMPARE \ + ix86_vectorize_builtin_vec_compare + + #undef TARGET_SET_CURRENT_FUNCTION #define TARGET_SET_CURRENT_FUNCTION ix86_set_current_function

Vector Comparison patch

Commit Message

Comments

Patch