Message ID | CABYV9SWByyvdWo00G2jSROehUeJcux=3rTJVez2XHFL15g+Ewg@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov <artyom.shinkaroff@gmail.com> wrote: > Richard > > I formalized an approach a little-bit, now it works without target > hooks, but some polishing is still required. I want you to comment on > the several important approaches that I use in the patch. > > So how does it work. > 1) All the vector comparisons at the level of type-checker are > introduced using VEC_COND_EXPR with constant selection operands being > {-1} and {0}. For example v0 > v1 is transformed into VEC_COND_EXPR<v0 >> v1, {-1}, {0}>. > > 2) When optabs expand VEC_COND_EXPR, two cases are considered: > 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing changes. > 2.b) first operand is something else, in that case, we specially mark > this case, recognize it in the backend, and do not create a > comparison, but use the mask as it was a result of some comparison. > > 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a > vector comparison we use is_vector_comparison function, if it returns > false, then we replace mask with mask != {0}. > > So we end-up with the following functionality: > VEC_COND_EXPR<mask, v0,v1> -- if we know that mask is a result of > comparison of two vectors, we leave it as it is, otherwise change with > mask != {0}. > > Plain vector comparison a <op> b is represented with VEC_COND_EXPR, > which correctly expands, without creating useless masking. > > > Basically for me there are two questions: > 1) Can we perform information passing in optabs in a nicer way? > 2) How is_vector_comparison could be improved? I have several ideas, > like checking if constant vector all consists of 0 and -1, and so on. > But first is it conceptually fine. > > P.S. I tired to put the functionality of is_vector_comparison in > tree-ssa-forwprop, but the thing is that it is called only with -On, > which I find inappropriate, and the functionality gets more > complicated. Why is it inappropriate to not optimize it at -O0? If the user separates comparison and ?: expression it's his own fault. Btw, the new hook is still in the patch. I would simply always create != 0 if it isn't and let optimizers (tree-ssa-forwprop.c) optimize this. You still have to deal with non-comparison operands during expansion though, but if you always forced a != 0 from the start you can then simply interpret it as a proper comparison result (in which case I'd modify the backends to have an alternate pattern or directly expand to masking operations - using the fake comparison RTX is too much of a hack). tree constant_boolean_node (int value, tree type) { - if (type == integer_type_node) + if (TREE_CODE (type) == VECTOR_TYPE) + { + tree tval; + + gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE); + tval = build_int_cst (TREE_TYPE (type), value); + return build_vector_from_val (type, tval); as value is either 0 or 1 that won't work. Oh, I see you pass -1 for true in the callers. But I think we should simply decide that true (1) means -1 for a vector boolean node (and the value parameter should be a bool instead). Thus, + if (TREE_CODE (type) == VECTOR_TYPE) + { + tree tval; + + gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE); + tval = build_int_cst (TREE_TYPE (type), value ? -1 : 0); + return build_vector_from_val (type, tval); instead. @@ -9073,26 +9082,29 @@ fold_comparison (location_t loc, enum tr floating-point, we can only do some of these simplifications.) */ if (operand_equal_p (arg0, arg1, 0)) { + int true_val = TREE_CODE (type) == VECTOR_TYPE ? -1 : 0; + tree arg0_type = TREE_TYPE (arg0); + as I said this is not necessary - the FLOAT_TYPE_P and HONOR_NANS macros work perfectly fine on vector types. Richard. > > Thanks, > Artem. >
On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther <richard.guenther@gmail.com> wrote: > On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov > <artyom.shinkaroff@gmail.com> wrote: >> Richard >> >> I formalized an approach a little-bit, now it works without target >> hooks, but some polishing is still required. I want you to comment on >> the several important approaches that I use in the patch. >> >> So how does it work. >> 1) All the vector comparisons at the level of type-checker are >> introduced using VEC_COND_EXPR with constant selection operands being >> {-1} and {0}. For example v0 > v1 is transformed into VEC_COND_EXPR<v0 >>> v1, {-1}, {0}>. >> >> 2) When optabs expand VEC_COND_EXPR, two cases are considered: >> 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing changes. >> 2.b) first operand is something else, in that case, we specially mark >> this case, recognize it in the backend, and do not create a >> comparison, but use the mask as it was a result of some comparison. >> >> 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a >> vector comparison we use is_vector_comparison function, if it returns >> false, then we replace mask with mask != {0}. >> >> So we end-up with the following functionality: >> VEC_COND_EXPR<mask, v0,v1> -- if we know that mask is a result of >> comparison of two vectors, we leave it as it is, otherwise change with >> mask != {0}. >> >> Plain vector comparison a <op> b is represented with VEC_COND_EXPR, >> which correctly expands, without creating useless masking. >> >> >> Basically for me there are two questions: >> 1) Can we perform information passing in optabs in a nicer way? >> 2) How is_vector_comparison could be improved? I have several ideas, >> like checking if constant vector all consists of 0 and -1, and so on. >> But first is it conceptually fine. >> >> P.S. I tired to put the functionality of is_vector_comparison in >> tree-ssa-forwprop, but the thing is that it is called only with -On, >> which I find inappropriate, and the functionality gets more >> complicated. > > Why is it inappropriate to not optimize it at -O0? If the user > separates comparison and ?: expression it's his own fault. Well, because all the information is there, and I perfectly envision the case when user expressed comparison separately, just to avoid code duplication. Like: mask = a > b; res1 = mask ? v0 : v1; res2 = mask ? v2 : v3; Which in this case would be different from res1 = a > b ? v0 : v1; res2 = a > b ? v2 : v3; > Btw, the new hook is still in the patch. > > I would simply always create != 0 if it isn't and let optimizers > (tree-ssa-forwprop.c) optimize this. You still have to deal with > non-comparison operands during expansion though, but if > you always forced a != 0 from the start you can then simply > interpret it as a proper comparison result (in which case I'd > modify the backends to have an alternate pattern or directly > expand to masking operations - using the fake comparison > RTX is too much of a hack). Richard, I think you didn't get the problem. I really need the way, to pass the information, that the expression that is in the first operand of vcond is an appropriate mask. I though for quite a while and this hack is the only answer I found, is there a better way to do that. I could for example introduce another tree-node, but it would be overkill as well. Now why do I need it so much: I want to implement the comparison in a way that {1, 5, 0, -1} is actually {-1,-1,-1,-1}. So whenever I am not sure that mask of VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not always). To check the stuff, I use is_vector_comparison in tree-vect-generic. So I really have the difference between mask ? x : y and mask != {0} ? x : y, otherwise I could treat mask != {0} in the backend as just mask. If this link between optabs and backend breaks, then the patch falls apart. Because every time the comparison is taken out VEC_COND_EXPR, I will have to put != {0}. Keep in mind, that I cannot always put the comparison inside the VEC_COND_EXPR, what if it is defined in a function-comparison, or somehow else? So what would be an appropriate way to connect optabs and the backend? Thanks, Artem. All the rest would be adjusted later. > tree > constant_boolean_node (int value, tree type) > { > - if (type == integer_type_node) > + if (TREE_CODE (type) == VECTOR_TYPE) > + { > + tree tval; > + > + gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE); > + tval = build_int_cst (TREE_TYPE (type), value); > + return build_vector_from_val (type, tval); > > as value is either 0 or 1 that won't work. Oh, I see you pass -1 > for true in the callers. But I think we should simply decide that true (1) > means -1 for a vector boolean node (and the value parameter should > be a bool instead). Thus, > > + if (TREE_CODE (type) == VECTOR_TYPE) > + { > + tree tval; > + > + gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE); > + tval = build_int_cst (TREE_TYPE (type), value ? -1 : 0); > + return build_vector_from_val (type, tval); > > instead. > > @@ -9073,26 +9082,29 @@ fold_comparison (location_t loc, enum tr > floating-point, we can only do some of these simplifications.) */ > if (operand_equal_p (arg0, arg1, 0)) > { > + int true_val = TREE_CODE (type) == VECTOR_TYPE ? -1 : 0; > + tree arg0_type = TREE_TYPE (arg0); > + > > as I said this is not necessary - the FLOAT_TYPE_P and HONOR_NANS > macros work perfectly fine on vector types. > > Richard. > >> >> Thanks, >> Artem. >> >
On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov <artyom.shinkaroff@gmail.com> wrote: > On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther > <richard.guenther@gmail.com> wrote: >> On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov >> <artyom.shinkaroff@gmail.com> wrote: >>> Richard >>> >>> I formalized an approach a little-bit, now it works without target >>> hooks, but some polishing is still required. I want you to comment on >>> the several important approaches that I use in the patch. >>> >>> So how does it work. >>> 1) All the vector comparisons at the level of type-checker are >>> introduced using VEC_COND_EXPR with constant selection operands being >>> {-1} and {0}. For example v0 > v1 is transformed into VEC_COND_EXPR<v0 >>>> v1, {-1}, {0}>. >>> >>> 2) When optabs expand VEC_COND_EXPR, two cases are considered: >>> 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing changes. >>> 2.b) first operand is something else, in that case, we specially mark >>> this case, recognize it in the backend, and do not create a >>> comparison, but use the mask as it was a result of some comparison. >>> >>> 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a >>> vector comparison we use is_vector_comparison function, if it returns >>> false, then we replace mask with mask != {0}. >>> >>> So we end-up with the following functionality: >>> VEC_COND_EXPR<mask, v0,v1> -- if we know that mask is a result of >>> comparison of two vectors, we leave it as it is, otherwise change with >>> mask != {0}. >>> >>> Plain vector comparison a <op> b is represented with VEC_COND_EXPR, >>> which correctly expands, without creating useless masking. >>> >>> >>> Basically for me there are two questions: >>> 1) Can we perform information passing in optabs in a nicer way? >>> 2) How is_vector_comparison could be improved? I have several ideas, >>> like checking if constant vector all consists of 0 and -1, and so on. >>> But first is it conceptually fine. >>> >>> P.S. I tired to put the functionality of is_vector_comparison in >>> tree-ssa-forwprop, but the thing is that it is called only with -On, >>> which I find inappropriate, and the functionality gets more >>> complicated. >> >> Why is it inappropriate to not optimize it at -O0? If the user >> separates comparison and ?: expression it's his own fault. > > Well, because all the information is there, and I perfectly envision > the case when user expressed comparison separately, just to avoid code > duplication. > > Like: > mask = a > b; > res1 = mask ? v0 : v1; > res2 = mask ? v2 : v3; > > Which in this case would be different from > res1 = a > b ? v0 : v1; > res2 = a > b ? v2 : v3; > >> Btw, the new hook is still in the patch. >> >> I would simply always create != 0 if it isn't and let optimizers >> (tree-ssa-forwprop.c) optimize this. You still have to deal with >> non-comparison operands during expansion though, but if >> you always forced a != 0 from the start you can then simply >> interpret it as a proper comparison result (in which case I'd >> modify the backends to have an alternate pattern or directly >> expand to masking operations - using the fake comparison >> RTX is too much of a hack). > > Richard, I think you didn't get the problem. > I really need the way, to pass the information, that the expression > that is in the first operand of vcond is an appropriate mask. I though > for quite a while and this hack is the only answer I found, is there a > better way to do that. I could for example introduce another > tree-node, but it would be overkill as well. > > Now why do I need it so much: > I want to implement the comparison in a way that {1, 5, 0, -1} is > actually {-1,-1,-1,-1}. So whenever I am not sure that mask of > VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not > always). To check the stuff, I use is_vector_comparison in > tree-vect-generic. > > So I really have the difference between mask ? x : y and mask != {0} ? > x : y, otherwise I could treat mask != {0} in the backend as just > mask. > > If this link between optabs and backend breaks, then the patch falls > apart. Because every time the comparison is taken out VEC_COND_EXPR, I > will have to put != {0}. Keep in mind, that I cannot always put the > comparison inside the VEC_COND_EXPR, what if it is defined in a > function-comparison, or somehow else? > > So what would be an appropriate way to connect optabs and the backend? Well, there is no problem in having the only valid mask operand for VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,....} mask. Just the C parser has to transform mask ? vec1 : vec2 to mask != 0 ? vec1 : vec2. This comparison can be eliminated by optimization passes that either replace it by the real comparison computing the mask or just propagating the information this mask is already {-1,...} / {0,....} by simply dropping the comparison against zero. For the backends I'd have vcond patterns for both an embedded comparison and for a mask. (Now we can rewind the discussion a bit and allow arbitrary masks and define a vcond with a mask operand to do bitwise selection - what matters is the C frontend semantics which we need to translate to what the middle-end thinks of a VEC_COND_EXPR, they do not have to agree). If the mask is computed by a function you are of course out of luck, but I don't see how you'd manage to infer knowledge from nowhere either. Richard. > > Thanks, > Artem. > > All the rest would be adjusted later. > >> tree >> constant_boolean_node (int value, tree type) >> { >> - if (type == integer_type_node) >> + if (TREE_CODE (type) == VECTOR_TYPE) >> + { >> + tree tval; >> + >> + gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE); >> + tval = build_int_cst (TREE_TYPE (type), value); >> + return build_vector_from_val (type, tval); >> >> as value is either 0 or 1 that won't work. Oh, I see you pass -1 >> for true in the callers. But I think we should simply decide that true (1) >> means -1 for a vector boolean node (and the value parameter should >> be a bool instead). Thus, >> >> + if (TREE_CODE (type) == VECTOR_TYPE) >> + { >> + tree tval; >> + >> + gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE); >> + tval = build_int_cst (TREE_TYPE (type), value ? -1 : 0); >> + return build_vector_from_val (type, tval); >> >> instead. >> >> @@ -9073,26 +9082,29 @@ fold_comparison (location_t loc, enum tr >> floating-point, we can only do some of these simplifications.) */ >> if (operand_equal_p (arg0, arg1, 0)) >> { >> + int true_val = TREE_CODE (type) == VECTOR_TYPE ? -1 : 0; >> + tree arg0_type = TREE_TYPE (arg0); >> + >> >> as I said this is not necessary - the FLOAT_TYPE_P and HONOR_NANS >> macros work perfectly fine on vector types. >> >> Richard. >> >>> >>> Thanks, >>> Artem. >>> >> >
On Mon, Aug 22, 2011 at 4:01 PM, Richard Guenther <richard.guenther@gmail.com> wrote: > On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov > <artyom.shinkaroff@gmail.com> wrote: >> On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther >> <richard.guenther@gmail.com> wrote: >>> On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov >>> <artyom.shinkaroff@gmail.com> wrote: >>>> Richard >>>> >>>> I formalized an approach a little-bit, now it works without target >>>> hooks, but some polishing is still required. I want you to comment on >>>> the several important approaches that I use in the patch. >>>> >>>> So how does it work. >>>> 1) All the vector comparisons at the level of type-checker are >>>> introduced using VEC_COND_EXPR with constant selection operands being >>>> {-1} and {0}. For example v0 > v1 is transformed into VEC_COND_EXPR<v0 >>>>> v1, {-1}, {0}>. >>>> >>>> 2) When optabs expand VEC_COND_EXPR, two cases are considered: >>>> 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing changes. >>>> 2.b) first operand is something else, in that case, we specially mark >>>> this case, recognize it in the backend, and do not create a >>>> comparison, but use the mask as it was a result of some comparison. >>>> >>>> 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a >>>> vector comparison we use is_vector_comparison function, if it returns >>>> false, then we replace mask with mask != {0}. >>>> >>>> So we end-up with the following functionality: >>>> VEC_COND_EXPR<mask, v0,v1> -- if we know that mask is a result of >>>> comparison of two vectors, we leave it as it is, otherwise change with >>>> mask != {0}. >>>> >>>> Plain vector comparison a <op> b is represented with VEC_COND_EXPR, >>>> which correctly expands, without creating useless masking. >>>> >>>> >>>> Basically for me there are two questions: >>>> 1) Can we perform information passing in optabs in a nicer way? >>>> 2) How is_vector_comparison could be improved? I have several ideas, >>>> like checking if constant vector all consists of 0 and -1, and so on. >>>> But first is it conceptually fine. >>>> >>>> P.S. I tired to put the functionality of is_vector_comparison in >>>> tree-ssa-forwprop, but the thing is that it is called only with -On, >>>> which I find inappropriate, and the functionality gets more >>>> complicated. >>> >>> Why is it inappropriate to not optimize it at -O0? If the user >>> separates comparison and ?: expression it's his own fault. >> >> Well, because all the information is there, and I perfectly envision >> the case when user expressed comparison separately, just to avoid code >> duplication. >> >> Like: >> mask = a > b; >> res1 = mask ? v0 : v1; >> res2 = mask ? v2 : v3; >> >> Which in this case would be different from >> res1 = a > b ? v0 : v1; >> res2 = a > b ? v2 : v3; >> >>> Btw, the new hook is still in the patch. >>> >>> I would simply always create != 0 if it isn't and let optimizers >>> (tree-ssa-forwprop.c) optimize this. You still have to deal with >>> non-comparison operands during expansion though, but if >>> you always forced a != 0 from the start you can then simply >>> interpret it as a proper comparison result (in which case I'd >>> modify the backends to have an alternate pattern or directly >>> expand to masking operations - using the fake comparison >>> RTX is too much of a hack). >> >> Richard, I think you didn't get the problem. >> I really need the way, to pass the information, that the expression >> that is in the first operand of vcond is an appropriate mask. I though >> for quite a while and this hack is the only answer I found, is there a >> better way to do that. I could for example introduce another >> tree-node, but it would be overkill as well. >> >> Now why do I need it so much: >> I want to implement the comparison in a way that {1, 5, 0, -1} is >> actually {-1,-1,-1,-1}. So whenever I am not sure that mask of >> VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not >> always). To check the stuff, I use is_vector_comparison in >> tree-vect-generic. >> >> So I really have the difference between mask ? x : y and mask != {0} ? >> x : y, otherwise I could treat mask != {0} in the backend as just >> mask. >> >> If this link between optabs and backend breaks, then the patch falls >> apart. Because every time the comparison is taken out VEC_COND_EXPR, I >> will have to put != {0}. Keep in mind, that I cannot always put the >> comparison inside the VEC_COND_EXPR, what if it is defined in a >> function-comparison, or somehow else? >> >> So what would be an appropriate way to connect optabs and the backend? > > Well, there is no problem in having the only valid mask operand for > VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,....} mask. > Just the C parser has to transform mask ? vec1 : vec2 to mask != 0 ? > vec1 : vec2. This happens already in the new version of patch (not submitted yet). > This comparison can be eliminated by optimization passes > that > either replace it by the real comparison computing the mask or just > propagating the information this mask is already {-1,...} / {0,....} by simply > dropping the comparison against zero. This is not a problem, because the backend recognizes these patterns, so no optimization is needed in this part. > For the backends I'd have vcond patterns for both an embedded comparison > and for a mask. (Now we can rewind the discussion a bit and allow > arbitrary masks and define a vcond with a mask operand to do bitwise > selection - what matters is the C frontend semantics which we need to > translate to what the middle-end thinks of a VEC_COND_EXPR, they > do not have to agree). But it seems like another combinatorial explosion here. Considering what Richard said in his e-mail, in order to support "generic" vcond, we just need to enumerate all the possible cases. Or I didn't understand right? I mean, I don't mind of course, but it seems to me that it would be cleaner to have one generic enough pattern. Is there seriously no way to pass something from optab into the backend?? > If the mask is computed by a function you are of course out of luck, > but I don't see how you'd manage to infer knowledge from nowhere either. Well, take simpler example a = {0}; for ( ; *p; p += 16) a &= pattern > (vec)*p; res = a ? v0 : v1; In this case it is simple to analyse that a is a comparison, but you cannot embed the operations of a into VEC_COND_EXPR. Ok, I am testing the patch that removes hooks. Could you push a little bit the backend-patterns business? Thanks, Artem. > Richard. > >> >> Thanks, >> Artem. >> >> All the rest would be adjusted later. >> >>> tree >>> constant_boolean_node (int value, tree type) >>> { >>> - if (type == integer_type_node) >>> + if (TREE_CODE (type) == VECTOR_TYPE) >>> + { >>> + tree tval; >>> + >>> + gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE); >>> + tval = build_int_cst (TREE_TYPE (type), value); >>> + return build_vector_from_val (type, tval); >>> >>> as value is either 0 or 1 that won't work. Oh, I see you pass -1 >>> for true in the callers. But I think we should simply decide that true (1) >>> means -1 for a vector boolean node (and the value parameter should >>> be a bool instead). Thus, >>> >>> + if (TREE_CODE (type) == VECTOR_TYPE) >>> + { >>> + tree tval; >>> + >>> + gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE); >>> + tval = build_int_cst (TREE_TYPE (type), value ? -1 : 0); >>> + return build_vector_from_val (type, tval); >>> >>> instead. >>> >>> @@ -9073,26 +9082,29 @@ fold_comparison (location_t loc, enum tr >>> floating-point, we can only do some of these simplifications.) */ >>> if (operand_equal_p (arg0, arg1, 0)) >>> { >>> + int true_val = TREE_CODE (type) == VECTOR_TYPE ? -1 : 0; >>> + tree arg0_type = TREE_TYPE (arg0); >>> + >>> >>> as I said this is not necessary - the FLOAT_TYPE_P and HONOR_NANS >>> macros work perfectly fine on vector types. >>> >>> Richard. >>> >>>> >>>> Thanks, >>>> Artem. >>>> >>> >> >
On Mon, Aug 22, 2011 at 5:21 PM, Artem Shinkarov <artyom.shinkaroff@gmail.com> wrote: > On Mon, Aug 22, 2011 at 4:01 PM, Richard Guenther > <richard.guenther@gmail.com> wrote: >> On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov >> <artyom.shinkaroff@gmail.com> wrote: >>> On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther >>> <richard.guenther@gmail.com> wrote: >>>> On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov >>>> <artyom.shinkaroff@gmail.com> wrote: >>>>> Richard >>>>> >>>>> I formalized an approach a little-bit, now it works without target >>>>> hooks, but some polishing is still required. I want you to comment on >>>>> the several important approaches that I use in the patch. >>>>> >>>>> So how does it work. >>>>> 1) All the vector comparisons at the level of type-checker are >>>>> introduced using VEC_COND_EXPR with constant selection operands being >>>>> {-1} and {0}. For example v0 > v1 is transformed into VEC_COND_EXPR<v0 >>>>>> v1, {-1}, {0}>. >>>>> >>>>> 2) When optabs expand VEC_COND_EXPR, two cases are considered: >>>>> 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing changes. >>>>> 2.b) first operand is something else, in that case, we specially mark >>>>> this case, recognize it in the backend, and do not create a >>>>> comparison, but use the mask as it was a result of some comparison. >>>>> >>>>> 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a >>>>> vector comparison we use is_vector_comparison function, if it returns >>>>> false, then we replace mask with mask != {0}. >>>>> >>>>> So we end-up with the following functionality: >>>>> VEC_COND_EXPR<mask, v0,v1> -- if we know that mask is a result of >>>>> comparison of two vectors, we leave it as it is, otherwise change with >>>>> mask != {0}. >>>>> >>>>> Plain vector comparison a <op> b is represented with VEC_COND_EXPR, >>>>> which correctly expands, without creating useless masking. >>>>> >>>>> >>>>> Basically for me there are two questions: >>>>> 1) Can we perform information passing in optabs in a nicer way? >>>>> 2) How is_vector_comparison could be improved? I have several ideas, >>>>> like checking if constant vector all consists of 0 and -1, and so on. >>>>> But first is it conceptually fine. >>>>> >>>>> P.S. I tired to put the functionality of is_vector_comparison in >>>>> tree-ssa-forwprop, but the thing is that it is called only with -On, >>>>> which I find inappropriate, and the functionality gets more >>>>> complicated. >>>> >>>> Why is it inappropriate to not optimize it at -O0? If the user >>>> separates comparison and ?: expression it's his own fault. >>> >>> Well, because all the information is there, and I perfectly envision >>> the case when user expressed comparison separately, just to avoid code >>> duplication. >>> >>> Like: >>> mask = a > b; >>> res1 = mask ? v0 : v1; >>> res2 = mask ? v2 : v3; >>> >>> Which in this case would be different from >>> res1 = a > b ? v0 : v1; >>> res2 = a > b ? v2 : v3; >>> >>>> Btw, the new hook is still in the patch. >>>> >>>> I would simply always create != 0 if it isn't and let optimizers >>>> (tree-ssa-forwprop.c) optimize this. You still have to deal with >>>> non-comparison operands during expansion though, but if >>>> you always forced a != 0 from the start you can then simply >>>> interpret it as a proper comparison result (in which case I'd >>>> modify the backends to have an alternate pattern or directly >>>> expand to masking operations - using the fake comparison >>>> RTX is too much of a hack). >>> >>> Richard, I think you didn't get the problem. >>> I really need the way, to pass the information, that the expression >>> that is in the first operand of vcond is an appropriate mask. I though >>> for quite a while and this hack is the only answer I found, is there a >>> better way to do that. I could for example introduce another >>> tree-node, but it would be overkill as well. >>> >>> Now why do I need it so much: >>> I want to implement the comparison in a way that {1, 5, 0, -1} is >>> actually {-1,-1,-1,-1}. So whenever I am not sure that mask of >>> VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not >>> always). To check the stuff, I use is_vector_comparison in >>> tree-vect-generic. >>> >>> So I really have the difference between mask ? x : y and mask != {0} ? >>> x : y, otherwise I could treat mask != {0} in the backend as just >>> mask. >>> >>> If this link between optabs and backend breaks, then the patch falls >>> apart. Because every time the comparison is taken out VEC_COND_EXPR, I >>> will have to put != {0}. Keep in mind, that I cannot always put the >>> comparison inside the VEC_COND_EXPR, what if it is defined in a >>> function-comparison, or somehow else? >>> >>> So what would be an appropriate way to connect optabs and the backend? >> >> Well, there is no problem in having the only valid mask operand for >> VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,....} mask. >> Just the C parser has to transform mask ? vec1 : vec2 to mask != 0 ? >> vec1 : vec2. > > This happens already in the new version of patch (not submitted yet). > >> This comparison can be eliminated by optimization passes >> that >> either replace it by the real comparison computing the mask or just >> propagating the information this mask is already {-1,...} / {0,....} by simply >> dropping the comparison against zero. > > This is not a problem, because the backend recognizes these patterns, > so no optimization is needed in this part. I mean for mask = v1 < v2 ? {-1,...} : {0,...}; val = VCOND_EXPR <mask != 0, v3, v4>; optimizers can see how mask is defined and drop the != 0 test or replace it by v1 < v2. >> For the backends I'd have vcond patterns for both an embedded comparison >> and for a mask. (Now we can rewind the discussion a bit and allow >> arbitrary masks and define a vcond with a mask operand to do bitwise >> selection - what matters is the C frontend semantics which we need to >> translate to what the middle-end thinks of a VEC_COND_EXPR, they >> do not have to agree). > > But it seems like another combinatorial explosion here. Considering > what Richard said in his e-mail, in order to support "generic" vcond, > we just need to enumerate all the possible cases. Or I didn't > understand right? Well, the question is still what VCOND_EXPR and thus the vcond pattern semantically does for a non-comparison operand. I'd argue that using the bitwise selection semantic gives us maximum flexibility and a native instruction with AMD XOP. This non-comparison VCOND_EXPR is also easy to implement in the middle-end expansion code if there is no native instruction for it - by simply emitting the bitwise operations. But I have the feeling we are talking past each other ...? > I mean, I don't mind of course, but it seems to me that it would be > cleaner to have one generic enough pattern. > > Is there seriously no way to pass something from optab into the backend?? You can pass operands. And information is implicitly encoded in the name. >> If the mask is computed by a function you are of course out of luck, >> but I don't see how you'd manage to infer knowledge from nowhere either. > > Well, take simpler example > > a = {0}; > for ( ; *p; p += 16) > a &= pattern > (vec)*p; > > res = a ? v0 : v1; > > In this case it is simple to analyse that a is a comparison, but you > cannot embed the operations of a into VEC_COND_EXPR. Sure, but if the above is C source the frontend would generate res = a != 0 ? v0 : v1; initially. An optimization pass could still track this information and replace VEC_COND_EXPR <a != 0, v0, v1> with VEC_COND_EXPR <a, v0, v1> (no existing one would track vector contents though). > Ok, I am testing the patch that removes hooks. Could you push a little > bit the backend-patterns business? Well, I suppose we're waiting for Uros here. I hadn't much luck with fiddling with the mode-iterator stuff myself. Richard.
On Mon, Aug 22, 2011 at 4:34 PM, Richard Guenther <richard.guenther@gmail.com> wrote: > On Mon, Aug 22, 2011 at 5:21 PM, Artem Shinkarov > <artyom.shinkaroff@gmail.com> wrote: >> On Mon, Aug 22, 2011 at 4:01 PM, Richard Guenther >> <richard.guenther@gmail.com> wrote: >>> On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov >>> <artyom.shinkaroff@gmail.com> wrote: >>>> On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther >>>> <richard.guenther@gmail.com> wrote: >>>>> On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov >>>>> <artyom.shinkaroff@gmail.com> wrote: >>>>>> Richard >>>>>> >>>>>> I formalized an approach a little-bit, now it works without target >>>>>> hooks, but some polishing is still required. I want you to comment on >>>>>> the several important approaches that I use in the patch. >>>>>> >>>>>> So how does it work. >>>>>> 1) All the vector comparisons at the level of type-checker are >>>>>> introduced using VEC_COND_EXPR with constant selection operands being >>>>>> {-1} and {0}. For example v0 > v1 is transformed into VEC_COND_EXPR<v0 >>>>>>> v1, {-1}, {0}>. >>>>>> >>>>>> 2) When optabs expand VEC_COND_EXPR, two cases are considered: >>>>>> 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing changes. >>>>>> 2.b) first operand is something else, in that case, we specially mark >>>>>> this case, recognize it in the backend, and do not create a >>>>>> comparison, but use the mask as it was a result of some comparison. >>>>>> >>>>>> 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a >>>>>> vector comparison we use is_vector_comparison function, if it returns >>>>>> false, then we replace mask with mask != {0}. >>>>>> >>>>>> So we end-up with the following functionality: >>>>>> VEC_COND_EXPR<mask, v0,v1> -- if we know that mask is a result of >>>>>> comparison of two vectors, we leave it as it is, otherwise change with >>>>>> mask != {0}. >>>>>> >>>>>> Plain vector comparison a <op> b is represented with VEC_COND_EXPR, >>>>>> which correctly expands, without creating useless masking. >>>>>> >>>>>> >>>>>> Basically for me there are two questions: >>>>>> 1) Can we perform information passing in optabs in a nicer way? >>>>>> 2) How is_vector_comparison could be improved? I have several ideas, >>>>>> like checking if constant vector all consists of 0 and -1, and so on. >>>>>> But first is it conceptually fine. >>>>>> >>>>>> P.S. I tired to put the functionality of is_vector_comparison in >>>>>> tree-ssa-forwprop, but the thing is that it is called only with -On, >>>>>> which I find inappropriate, and the functionality gets more >>>>>> complicated. >>>>> >>>>> Why is it inappropriate to not optimize it at -O0? If the user >>>>> separates comparison and ?: expression it's his own fault. >>>> >>>> Well, because all the information is there, and I perfectly envision >>>> the case when user expressed comparison separately, just to avoid code >>>> duplication. >>>> >>>> Like: >>>> mask = a > b; >>>> res1 = mask ? v0 : v1; >>>> res2 = mask ? v2 : v3; >>>> >>>> Which in this case would be different from >>>> res1 = a > b ? v0 : v1; >>>> res2 = a > b ? v2 : v3; >>>> >>>>> Btw, the new hook is still in the patch. >>>>> >>>>> I would simply always create != 0 if it isn't and let optimizers >>>>> (tree-ssa-forwprop.c) optimize this. You still have to deal with >>>>> non-comparison operands during expansion though, but if >>>>> you always forced a != 0 from the start you can then simply >>>>> interpret it as a proper comparison result (in which case I'd >>>>> modify the backends to have an alternate pattern or directly >>>>> expand to masking operations - using the fake comparison >>>>> RTX is too much of a hack). >>>> >>>> Richard, I think you didn't get the problem. >>>> I really need the way, to pass the information, that the expression >>>> that is in the first operand of vcond is an appropriate mask. I though >>>> for quite a while and this hack is the only answer I found, is there a >>>> better way to do that. I could for example introduce another >>>> tree-node, but it would be overkill as well. >>>> >>>> Now why do I need it so much: >>>> I want to implement the comparison in a way that {1, 5, 0, -1} is >>>> actually {-1,-1,-1,-1}. So whenever I am not sure that mask of >>>> VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not >>>> always). To check the stuff, I use is_vector_comparison in >>>> tree-vect-generic. >>>> >>>> So I really have the difference between mask ? x : y and mask != {0} ? >>>> x : y, otherwise I could treat mask != {0} in the backend as just >>>> mask. >>>> >>>> If this link between optabs and backend breaks, then the patch falls >>>> apart. Because every time the comparison is taken out VEC_COND_EXPR, I >>>> will have to put != {0}. Keep in mind, that I cannot always put the >>>> comparison inside the VEC_COND_EXPR, what if it is defined in a >>>> function-comparison, or somehow else? >>>> >>>> So what would be an appropriate way to connect optabs and the backend? >>> >>> Well, there is no problem in having the only valid mask operand for >>> VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,....} mask. >>> Just the C parser has to transform mask ? vec1 : vec2 to mask != 0 ? >>> vec1 : vec2. >> >> This happens already in the new version of patch (not submitted yet). >> >>> This comparison can be eliminated by optimization passes >>> that >>> either replace it by the real comparison computing the mask or just >>> propagating the information this mask is already {-1,...} / {0,....} by simply >>> dropping the comparison against zero. >> >> This is not a problem, because the backend recognizes these patterns, >> so no optimization is needed in this part. > > I mean for > > mask = v1 < v2 ? {-1,...} : {0,...}; > val = VCOND_EXPR <mask != 0, v3, v4>; > > optimizers can see how mask is defined and drop the != 0 test or replace > it by v1 < v2. Yes, sure. >>> For the backends I'd have vcond patterns for both an embedded comparison >>> and for a mask. (Now we can rewind the discussion a bit and allow >>> arbitrary masks and define a vcond with a mask operand to do bitwise >>> selection - what matters is the C frontend semantics which we need to >>> translate to what the middle-end thinks of a VEC_COND_EXPR, they >>> do not have to agree). >> >> But it seems like another combinatorial explosion here. Considering >> what Richard said in his e-mail, in order to support "generic" vcond, >> we just need to enumerate all the possible cases. Or I didn't >> understand right? > > Well, the question is still what VCOND_EXPR and thus the vcond pattern > semantically does for a non-comparison operand. I'd argue that using > the bitwise selection semantic gives us maximum flexibility and a native > instruction with AMD XOP. This non-comparison VCOND_EXPR is > also easy to implement in the middle-end expansion code if there is > no native instruction for it - by simply emitting the bitwise operations. > > But I have the feeling we are talking past each other ...? I am all for the bitwise behaviour in the backend pattern, that is something that I rely on at the moment. What I don't want to have is the same behaviour in the frontend. So If we can guarantee, that we add != 0, when we don't know the "nature" of the mask, then I am perfectly fine with the back-end having bitwise-selection behaviour. >> I mean, I don't mind of course, but it seems to me that it would be >> cleaner to have one generic enough pattern. >> >> Is there seriously no way to pass something from optab into the backend?? > > You can pass operands. And information is implicitly encoded in the name. I didn't quite get that, could you give an example? >>> If the mask is computed by a function you are of course out of luck, >>> but I don't see how you'd manage to infer knowledge from nowhere either. >> >> Well, take simpler example >> >> a = {0}; >> for ( ; *p; p += 16) >> a &= pattern > (vec)*p; >> >> res = a ? v0 : v1; >> >> In this case it is simple to analyse that a is a comparison, but you >> cannot embed the operations of a into VEC_COND_EXPR. > > Sure, but if the above is C source the frontend would generate > res = a != 0 ? v0 : v1; initially. An optimization pass could still > track this information and replace VEC_COND_EXPR <a != 0, v0, v1> > with VEC_COND_EXPR <a, v0, v1> (no existing one would track > vector contents though). Yeah, sure. My point is, that we must be able to pass this information in the backend, that we checked everything, and we are sure that a is a corerct mask, please don't add any != 0 to it. >> Ok, I am testing the patch that removes hooks. Could you push a little >> bit the backend-patterns business? > > Well, I suppose we're waiting for Uros here. I hadn't much luck with > fiddling with the mode-iterator stuff myself. > > Richard. > Ok, fine. The patch is coming soon. Artem.
On Mon, Aug 22, 2011 at 5:43 PM, Artem Shinkarov <artyom.shinkaroff@gmail.com> wrote: > On Mon, Aug 22, 2011 at 4:34 PM, Richard Guenther > <richard.guenther@gmail.com> wrote: >> On Mon, Aug 22, 2011 at 5:21 PM, Artem Shinkarov >> <artyom.shinkaroff@gmail.com> wrote: >>> On Mon, Aug 22, 2011 at 4:01 PM, Richard Guenther >>> <richard.guenther@gmail.com> wrote: >>>> On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov >>>> <artyom.shinkaroff@gmail.com> wrote: >>>>> On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther >>>>> <richard.guenther@gmail.com> wrote: >>>>>> On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov >>>>>> <artyom.shinkaroff@gmail.com> wrote: >>>>>>> Richard >>>>>>> >>>>>>> I formalized an approach a little-bit, now it works without target >>>>>>> hooks, but some polishing is still required. I want you to comment on >>>>>>> the several important approaches that I use in the patch. >>>>>>> >>>>>>> So how does it work. >>>>>>> 1) All the vector comparisons at the level of type-checker are >>>>>>> introduced using VEC_COND_EXPR with constant selection operands being >>>>>>> {-1} and {0}. For example v0 > v1 is transformed into VEC_COND_EXPR<v0 >>>>>>>> v1, {-1}, {0}>. >>>>>>> >>>>>>> 2) When optabs expand VEC_COND_EXPR, two cases are considered: >>>>>>> 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing changes. >>>>>>> 2.b) first operand is something else, in that case, we specially mark >>>>>>> this case, recognize it in the backend, and do not create a >>>>>>> comparison, but use the mask as it was a result of some comparison. >>>>>>> >>>>>>> 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a >>>>>>> vector comparison we use is_vector_comparison function, if it returns >>>>>>> false, then we replace mask with mask != {0}. >>>>>>> >>>>>>> So we end-up with the following functionality: >>>>>>> VEC_COND_EXPR<mask, v0,v1> -- if we know that mask is a result of >>>>>>> comparison of two vectors, we leave it as it is, otherwise change with >>>>>>> mask != {0}. >>>>>>> >>>>>>> Plain vector comparison a <op> b is represented with VEC_COND_EXPR, >>>>>>> which correctly expands, without creating useless masking. >>>>>>> >>>>>>> >>>>>>> Basically for me there are two questions: >>>>>>> 1) Can we perform information passing in optabs in a nicer way? >>>>>>> 2) How is_vector_comparison could be improved? I have several ideas, >>>>>>> like checking if constant vector all consists of 0 and -1, and so on. >>>>>>> But first is it conceptually fine. >>>>>>> >>>>>>> P.S. I tired to put the functionality of is_vector_comparison in >>>>>>> tree-ssa-forwprop, but the thing is that it is called only with -On, >>>>>>> which I find inappropriate, and the functionality gets more >>>>>>> complicated. >>>>>> >>>>>> Why is it inappropriate to not optimize it at -O0? If the user >>>>>> separates comparison and ?: expression it's his own fault. >>>>> >>>>> Well, because all the information is there, and I perfectly envision >>>>> the case when user expressed comparison separately, just to avoid code >>>>> duplication. >>>>> >>>>> Like: >>>>> mask = a > b; >>>>> res1 = mask ? v0 : v1; >>>>> res2 = mask ? v2 : v3; >>>>> >>>>> Which in this case would be different from >>>>> res1 = a > b ? v0 : v1; >>>>> res2 = a > b ? v2 : v3; >>>>> >>>>>> Btw, the new hook is still in the patch. >>>>>> >>>>>> I would simply always create != 0 if it isn't and let optimizers >>>>>> (tree-ssa-forwprop.c) optimize this. You still have to deal with >>>>>> non-comparison operands during expansion though, but if >>>>>> you always forced a != 0 from the start you can then simply >>>>>> interpret it as a proper comparison result (in which case I'd >>>>>> modify the backends to have an alternate pattern or directly >>>>>> expand to masking operations - using the fake comparison >>>>>> RTX is too much of a hack). >>>>> >>>>> Richard, I think you didn't get the problem. >>>>> I really need the way, to pass the information, that the expression >>>>> that is in the first operand of vcond is an appropriate mask. I though >>>>> for quite a while and this hack is the only answer I found, is there a >>>>> better way to do that. I could for example introduce another >>>>> tree-node, but it would be overkill as well. >>>>> >>>>> Now why do I need it so much: >>>>> I want to implement the comparison in a way that {1, 5, 0, -1} is >>>>> actually {-1,-1,-1,-1}. So whenever I am not sure that mask of >>>>> VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not >>>>> always). To check the stuff, I use is_vector_comparison in >>>>> tree-vect-generic. >>>>> >>>>> So I really have the difference between mask ? x : y and mask != {0} ? >>>>> x : y, otherwise I could treat mask != {0} in the backend as just >>>>> mask. >>>>> >>>>> If this link between optabs and backend breaks, then the patch falls >>>>> apart. Because every time the comparison is taken out VEC_COND_EXPR, I >>>>> will have to put != {0}. Keep in mind, that I cannot always put the >>>>> comparison inside the VEC_COND_EXPR, what if it is defined in a >>>>> function-comparison, or somehow else? >>>>> >>>>> So what would be an appropriate way to connect optabs and the backend? >>>> >>>> Well, there is no problem in having the only valid mask operand for >>>> VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,....} mask. >>>> Just the C parser has to transform mask ? vec1 : vec2 to mask != 0 ? >>>> vec1 : vec2. >>> >>> This happens already in the new version of patch (not submitted yet). >>> >>>> This comparison can be eliminated by optimization passes >>>> that >>>> either replace it by the real comparison computing the mask or just >>>> propagating the information this mask is already {-1,...} / {0,....} by simply >>>> dropping the comparison against zero. >>> >>> This is not a problem, because the backend recognizes these patterns, >>> so no optimization is needed in this part. >> >> I mean for >> >> mask = v1 < v2 ? {-1,...} : {0,...}; >> val = VCOND_EXPR <mask != 0, v3, v4>; >> >> optimizers can see how mask is defined and drop the != 0 test or replace >> it by v1 < v2. > > Yes, sure. > >>>> For the backends I'd have vcond patterns for both an embedded comparison >>>> and for a mask. (Now we can rewind the discussion a bit and allow >>>> arbitrary masks and define a vcond with a mask operand to do bitwise >>>> selection - what matters is the C frontend semantics which we need to >>>> translate to what the middle-end thinks of a VEC_COND_EXPR, they >>>> do not have to agree). >>> >>> But it seems like another combinatorial explosion here. Considering >>> what Richard said in his e-mail, in order to support "generic" vcond, >>> we just need to enumerate all the possible cases. Or I didn't >>> understand right? >> >> Well, the question is still what VCOND_EXPR and thus the vcond pattern >> semantically does for a non-comparison operand. I'd argue that using >> the bitwise selection semantic gives us maximum flexibility and a native >> instruction with AMD XOP. This non-comparison VCOND_EXPR is >> also easy to implement in the middle-end expansion code if there is >> no native instruction for it - by simply emitting the bitwise operations. >> >> But I have the feeling we are talking past each other ...? > > I am all for the bitwise behaviour in the backend pattern, that is > something that I rely on at the moment. What I don't want to have is > the same behaviour in the frontend. So If we can guarantee, that we > add != 0, when we don't know the "nature" of the mask, then I am > perfectly fine with the back-end having bitwise-selection behaviour. Well, the C frontend would simply always add that != 0 (because it doesn't know). >>> I mean, I don't mind of course, but it seems to me that it would be >>> cleaner to have one generic enough pattern. >>> >>> Is there seriously no way to pass something from optab into the backend?? >> >> You can pass operands. And information is implicitly encoded in the name. > > I didn't quite get that, could you give an example? It was a larger variant of "no, apart from what is obvious". >>>> If the mask is computed by a function you are of course out of luck, >>>> but I don't see how you'd manage to infer knowledge from nowhere either. >>> >>> Well, take simpler example >>> >>> a = {0}; >>> for ( ; *p; p += 16) >>> a &= pattern > (vec)*p; >>> >>> res = a ? v0 : v1; >>> >>> In this case it is simple to analyse that a is a comparison, but you >>> cannot embed the operations of a into VEC_COND_EXPR. >> >> Sure, but if the above is C source the frontend would generate >> res = a != 0 ? v0 : v1; initially. An optimization pass could still >> track this information and replace VEC_COND_EXPR <a != 0, v0, v1> >> with VEC_COND_EXPR <a, v0, v1> (no existing one would track >> vector contents though). > > Yeah, sure. My point is, that we must be able to pass this information > in the backend, that we checked everything, and we are sure that a is > a corerct mask, please don't add any != 0 to it. But all masks are correct as soon as they appear in a VEC_COND_EXPR. That's the whole point of the bitwise semantics. It's only the C frontend that needs to be careful to impose its stricter semantics. Richard.
On Mon, Aug 22, 2011 at 4:50 PM, Richard Guenther <richard.guenther@gmail.com> wrote: > On Mon, Aug 22, 2011 at 5:43 PM, Artem Shinkarov > <artyom.shinkaroff@gmail.com> wrote: >> On Mon, Aug 22, 2011 at 4:34 PM, Richard Guenther >> <richard.guenther@gmail.com> wrote: >>> On Mon, Aug 22, 2011 at 5:21 PM, Artem Shinkarov >>> <artyom.shinkaroff@gmail.com> wrote: >>>> On Mon, Aug 22, 2011 at 4:01 PM, Richard Guenther >>>> <richard.guenther@gmail.com> wrote: >>>>> On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov >>>>> <artyom.shinkaroff@gmail.com> wrote: >>>>>> On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther >>>>>> <richard.guenther@gmail.com> wrote: >>>>>>> On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov >>>>>>> <artyom.shinkaroff@gmail.com> wrote: >>>>>>>> Richard >>>>>>>> >>>>>>>> I formalized an approach a little-bit, now it works without target >>>>>>>> hooks, but some polishing is still required. I want you to comment on >>>>>>>> the several important approaches that I use in the patch. >>>>>>>> >>>>>>>> So how does it work. >>>>>>>> 1) All the vector comparisons at the level of type-checker are >>>>>>>> introduced using VEC_COND_EXPR with constant selection operands being >>>>>>>> {-1} and {0}. For example v0 > v1 is transformed into VEC_COND_EXPR<v0 >>>>>>>>> v1, {-1}, {0}>. >>>>>>>> >>>>>>>> 2) When optabs expand VEC_COND_EXPR, two cases are considered: >>>>>>>> 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing changes. >>>>>>>> 2.b) first operand is something else, in that case, we specially mark >>>>>>>> this case, recognize it in the backend, and do not create a >>>>>>>> comparison, but use the mask as it was a result of some comparison. >>>>>>>> >>>>>>>> 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a >>>>>>>> vector comparison we use is_vector_comparison function, if it returns >>>>>>>> false, then we replace mask with mask != {0}. >>>>>>>> >>>>>>>> So we end-up with the following functionality: >>>>>>>> VEC_COND_EXPR<mask, v0,v1> -- if we know that mask is a result of >>>>>>>> comparison of two vectors, we leave it as it is, otherwise change with >>>>>>>> mask != {0}. >>>>>>>> >>>>>>>> Plain vector comparison a <op> b is represented with VEC_COND_EXPR, >>>>>>>> which correctly expands, without creating useless masking. >>>>>>>> >>>>>>>> >>>>>>>> Basically for me there are two questions: >>>>>>>> 1) Can we perform information passing in optabs in a nicer way? >>>>>>>> 2) How is_vector_comparison could be improved? I have several ideas, >>>>>>>> like checking if constant vector all consists of 0 and -1, and so on. >>>>>>>> But first is it conceptually fine. >>>>>>>> >>>>>>>> P.S. I tired to put the functionality of is_vector_comparison in >>>>>>>> tree-ssa-forwprop, but the thing is that it is called only with -On, >>>>>>>> which I find inappropriate, and the functionality gets more >>>>>>>> complicated. >>>>>>> >>>>>>> Why is it inappropriate to not optimize it at -O0? If the user >>>>>>> separates comparison and ?: expression it's his own fault. >>>>>> >>>>>> Well, because all the information is there, and I perfectly envision >>>>>> the case when user expressed comparison separately, just to avoid code >>>>>> duplication. >>>>>> >>>>>> Like: >>>>>> mask = a > b; >>>>>> res1 = mask ? v0 : v1; >>>>>> res2 = mask ? v2 : v3; >>>>>> >>>>>> Which in this case would be different from >>>>>> res1 = a > b ? v0 : v1; >>>>>> res2 = a > b ? v2 : v3; >>>>>> >>>>>>> Btw, the new hook is still in the patch. >>>>>>> >>>>>>> I would simply always create != 0 if it isn't and let optimizers >>>>>>> (tree-ssa-forwprop.c) optimize this. You still have to deal with >>>>>>> non-comparison operands during expansion though, but if >>>>>>> you always forced a != 0 from the start you can then simply >>>>>>> interpret it as a proper comparison result (in which case I'd >>>>>>> modify the backends to have an alternate pattern or directly >>>>>>> expand to masking operations - using the fake comparison >>>>>>> RTX is too much of a hack). >>>>>> >>>>>> Richard, I think you didn't get the problem. >>>>>> I really need the way, to pass the information, that the expression >>>>>> that is in the first operand of vcond is an appropriate mask. I though >>>>>> for quite a while and this hack is the only answer I found, is there a >>>>>> better way to do that. I could for example introduce another >>>>>> tree-node, but it would be overkill as well. >>>>>> >>>>>> Now why do I need it so much: >>>>>> I want to implement the comparison in a way that {1, 5, 0, -1} is >>>>>> actually {-1,-1,-1,-1}. So whenever I am not sure that mask of >>>>>> VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not >>>>>> always). To check the stuff, I use is_vector_comparison in >>>>>> tree-vect-generic. >>>>>> >>>>>> So I really have the difference between mask ? x : y and mask != {0} ? >>>>>> x : y, otherwise I could treat mask != {0} in the backend as just >>>>>> mask. >>>>>> >>>>>> If this link between optabs and backend breaks, then the patch falls >>>>>> apart. Because every time the comparison is taken out VEC_COND_EXPR, I >>>>>> will have to put != {0}. Keep in mind, that I cannot always put the >>>>>> comparison inside the VEC_COND_EXPR, what if it is defined in a >>>>>> function-comparison, or somehow else? >>>>>> >>>>>> So what would be an appropriate way to connect optabs and the backend? >>>>> >>>>> Well, there is no problem in having the only valid mask operand for >>>>> VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,....} mask. >>>>> Just the C parser has to transform mask ? vec1 : vec2 to mask != 0 ? >>>>> vec1 : vec2. >>>> >>>> This happens already in the new version of patch (not submitted yet). >>>> >>>>> This comparison can be eliminated by optimization passes >>>>> that >>>>> either replace it by the real comparison computing the mask or just >>>>> propagating the information this mask is already {-1,...} / {0,....} by simply >>>>> dropping the comparison against zero. >>>> >>>> This is not a problem, because the backend recognizes these patterns, >>>> so no optimization is needed in this part. >>> >>> I mean for >>> >>> mask = v1 < v2 ? {-1,...} : {0,...}; >>> val = VCOND_EXPR <mask != 0, v3, v4>; >>> >>> optimizers can see how mask is defined and drop the != 0 test or replace >>> it by v1 < v2. >> >> Yes, sure. >> >>>>> For the backends I'd have vcond patterns for both an embedded comparison >>>>> and for a mask. (Now we can rewind the discussion a bit and allow >>>>> arbitrary masks and define a vcond with a mask operand to do bitwise >>>>> selection - what matters is the C frontend semantics which we need to >>>>> translate to what the middle-end thinks of a VEC_COND_EXPR, they >>>>> do not have to agree). >>>> >>>> But it seems like another combinatorial explosion here. Considering >>>> what Richard said in his e-mail, in order to support "generic" vcond, >>>> we just need to enumerate all the possible cases. Or I didn't >>>> understand right? >>> >>> Well, the question is still what VCOND_EXPR and thus the vcond pattern >>> semantically does for a non-comparison operand. I'd argue that using >>> the bitwise selection semantic gives us maximum flexibility and a native >>> instruction with AMD XOP. This non-comparison VCOND_EXPR is >>> also easy to implement in the middle-end expansion code if there is >>> no native instruction for it - by simply emitting the bitwise operations. >>> >>> But I have the feeling we are talking past each other ...? >> >> I am all for the bitwise behaviour in the backend pattern, that is >> something that I rely on at the moment. What I don't want to have is >> the same behaviour in the frontend. So If we can guarantee, that we >> add != 0, when we don't know the "nature" of the mask, then I am >> perfectly fine with the back-end having bitwise-selection behaviour. > > Well, the C frontend would simply always add that != 0 (because it > doesn't know). > >>>> I mean, I don't mind of course, but it seems to me that it would be >>>> cleaner to have one generic enough pattern. >>>> >>>> Is there seriously no way to pass something from optab into the backend?? >>> >>> You can pass operands. And information is implicitly encoded in the name. >> >> I didn't quite get that, could you give an example? > > It was a larger variant of "no, apart from what is obvious". Ha, joking again. :) >>>>> If the mask is computed by a function you are of course out of luck, >>>>> but I don't see how you'd manage to infer knowledge from nowhere either. >>>> >>>> Well, take simpler example >>>> >>>> a = {0}; >>>> for ( ; *p; p += 16) >>>> a &= pattern > (vec)*p; >>>> >>>> res = a ? v0 : v1; >>>> >>>> In this case it is simple to analyse that a is a comparison, but you >>>> cannot embed the operations of a into VEC_COND_EXPR. >>> >>> Sure, but if the above is C source the frontend would generate >>> res = a != 0 ? v0 : v1; initially. An optimization pass could still >>> track this information and replace VEC_COND_EXPR <a != 0, v0, v1> >>> with VEC_COND_EXPR <a, v0, v1> (no existing one would track >>> vector contents though). >> >> Yeah, sure. My point is, that we must be able to pass this information >> in the backend, that we checked everything, and we are sure that a is >> a corerct mask, please don't add any != 0 to it. > > But all masks are correct as soon as they appear in a VEC_COND_EXPR. > That's the whole point of the bitwise semantics. It's only the C frontend > that needs to be careful to impose its stricter semantics. > > Richard. > Ok, I see the last difference in the approaches we envision. I am assuming, that frontend does not put != 0, but the later optimisations (veclower in my case) check every mask in VEC_COND_EXPR and does the same functionality as you describe. So the philosophical question why it is better to first add and then remove, rather than just add if needed? In all the rest I think we agreed. Artem.
On Mon, Aug 22, 2011 at 5:34 PM, Richard Guenther <richard.guenther@gmail.com> wrote: >> In this case it is simple to analyse that a is a comparison, but you >> cannot embed the operations of a into VEC_COND_EXPR. > > Sure, but if the above is C source the frontend would generate > res = a != 0 ? v0 : v1; initially. An optimization pass could still > track this information and replace VEC_COND_EXPR <a != 0, v0, v1> > with VEC_COND_EXPR <a, v0, v1> (no existing one would track > vector contents though). > >> Ok, I am testing the patch that removes hooks. Could you push a little >> bit the backend-patterns business? > > Well, I suppose we're waiting for Uros here. I hadn't much luck with > fiddling with the mode-iterator stuff myself. It is not _that_ trivial change, since we have ix86_expand_fp_vcond and ix86_expand_int_vcond to merge. ATM, FP version deals with FP operands and vice versa. We have to merge them somehow and split out comparison part that handles FP as well as integer operands. I also don't know why vcond is not allowed to FAIL... probably middle-end should be enhanced for a fallback if some comparison isn't supported by optab. Uros.
On Mon, Aug 22, 2011 at 9:50 PM, Uros Bizjak <ubizjak@gmail.com> wrote: > On Mon, Aug 22, 2011 at 5:34 PM, Richard Guenther > <richard.guenther@gmail.com> wrote: > >>> In this case it is simple to analyse that a is a comparison, but you >>> cannot embed the operations of a into VEC_COND_EXPR. >> >> Sure, but if the above is C source the frontend would generate >> res = a != 0 ? v0 : v1; initially. An optimization pass could still >> track this information and replace VEC_COND_EXPR <a != 0, v0, v1> >> with VEC_COND_EXPR <a, v0, v1> (no existing one would track >> vector contents though). >> >>> Ok, I am testing the patch that removes hooks. Could you push a little >>> bit the backend-patterns business? >> >> Well, I suppose we're waiting for Uros here. I hadn't much luck with >> fiddling with the mode-iterator stuff myself. > > It is not _that_ trivial change, since we have ix86_expand_fp_vcond > and ix86_expand_int_vcond to merge. ATM, FP version deals with FP > operands and vice versa. We have to merge them somehow and split out > comparison part that handles FP as well as integer operands. Yeah, I tried to keep it split in fp and int compare variants but allow the result mode and the 2nd and 3rd operand modes to vary differently but failed to build a mode iterator that would do this... OTOH I'm not sure how merging fp and int compare modes would simplify things here. > I also don't know why vcond is not allowed to FAIL... probably > middle-end should be enhanced for a fallback if some comparison isn't > supported by optab. The vectorizer currently uses the optab presence to verify if the target supports a given vec-cond-expr. I'm not sure if we can make its test more finegrained easily. Richard. > > Uros. >
On Mon, Aug 22, 2011 at 5:58 PM, Artem Shinkarov <artyom.shinkaroff@gmail.com> wrote: > On Mon, Aug 22, 2011 at 4:50 PM, Richard Guenther > <richard.guenther@gmail.com> wrote: >> On Mon, Aug 22, 2011 at 5:43 PM, Artem Shinkarov >> <artyom.shinkaroff@gmail.com> wrote: >>> On Mon, Aug 22, 2011 at 4:34 PM, Richard Guenther >>> <richard.guenther@gmail.com> wrote: >>>> On Mon, Aug 22, 2011 at 5:21 PM, Artem Shinkarov >>>> <artyom.shinkaroff@gmail.com> wrote: >>>>> On Mon, Aug 22, 2011 at 4:01 PM, Richard Guenther >>>>> <richard.guenther@gmail.com> wrote: >>>>>> On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov >>>>>> <artyom.shinkaroff@gmail.com> wrote: >>>>>>> On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther >>>>>>> <richard.guenther@gmail.com> wrote: >>>>>>>> On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov >>>>>>>> <artyom.shinkaroff@gmail.com> wrote: >>>>>>>>> Richard >>>>>>>>> >>>>>>>>> I formalized an approach a little-bit, now it works without target >>>>>>>>> hooks, but some polishing is still required. I want you to comment on >>>>>>>>> the several important approaches that I use in the patch. >>>>>>>>> >>>>>>>>> So how does it work. >>>>>>>>> 1) All the vector comparisons at the level of type-checker are >>>>>>>>> introduced using VEC_COND_EXPR with constant selection operands being >>>>>>>>> {-1} and {0}. For example v0 > v1 is transformed into VEC_COND_EXPR<v0 >>>>>>>>>> v1, {-1}, {0}>. >>>>>>>>> >>>>>>>>> 2) When optabs expand VEC_COND_EXPR, two cases are considered: >>>>>>>>> 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing changes. >>>>>>>>> 2.b) first operand is something else, in that case, we specially mark >>>>>>>>> this case, recognize it in the backend, and do not create a >>>>>>>>> comparison, but use the mask as it was a result of some comparison. >>>>>>>>> >>>>>>>>> 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a >>>>>>>>> vector comparison we use is_vector_comparison function, if it returns >>>>>>>>> false, then we replace mask with mask != {0}. >>>>>>>>> >>>>>>>>> So we end-up with the following functionality: >>>>>>>>> VEC_COND_EXPR<mask, v0,v1> -- if we know that mask is a result of >>>>>>>>> comparison of two vectors, we leave it as it is, otherwise change with >>>>>>>>> mask != {0}. >>>>>>>>> >>>>>>>>> Plain vector comparison a <op> b is represented with VEC_COND_EXPR, >>>>>>>>> which correctly expands, without creating useless masking. >>>>>>>>> >>>>>>>>> >>>>>>>>> Basically for me there are two questions: >>>>>>>>> 1) Can we perform information passing in optabs in a nicer way? >>>>>>>>> 2) How is_vector_comparison could be improved? I have several ideas, >>>>>>>>> like checking if constant vector all consists of 0 and -1, and so on. >>>>>>>>> But first is it conceptually fine. >>>>>>>>> >>>>>>>>> P.S. I tired to put the functionality of is_vector_comparison in >>>>>>>>> tree-ssa-forwprop, but the thing is that it is called only with -On, >>>>>>>>> which I find inappropriate, and the functionality gets more >>>>>>>>> complicated. >>>>>>>> >>>>>>>> Why is it inappropriate to not optimize it at -O0? If the user >>>>>>>> separates comparison and ?: expression it's his own fault. >>>>>>> >>>>>>> Well, because all the information is there, and I perfectly envision >>>>>>> the case when user expressed comparison separately, just to avoid code >>>>>>> duplication. >>>>>>> >>>>>>> Like: >>>>>>> mask = a > b; >>>>>>> res1 = mask ? v0 : v1; >>>>>>> res2 = mask ? v2 : v3; >>>>>>> >>>>>>> Which in this case would be different from >>>>>>> res1 = a > b ? v0 : v1; >>>>>>> res2 = a > b ? v2 : v3; >>>>>>> >>>>>>>> Btw, the new hook is still in the patch. >>>>>>>> >>>>>>>> I would simply always create != 0 if it isn't and let optimizers >>>>>>>> (tree-ssa-forwprop.c) optimize this. You still have to deal with >>>>>>>> non-comparison operands during expansion though, but if >>>>>>>> you always forced a != 0 from the start you can then simply >>>>>>>> interpret it as a proper comparison result (in which case I'd >>>>>>>> modify the backends to have an alternate pattern or directly >>>>>>>> expand to masking operations - using the fake comparison >>>>>>>> RTX is too much of a hack). >>>>>>> >>>>>>> Richard, I think you didn't get the problem. >>>>>>> I really need the way, to pass the information, that the expression >>>>>>> that is in the first operand of vcond is an appropriate mask. I though >>>>>>> for quite a while and this hack is the only answer I found, is there a >>>>>>> better way to do that. I could for example introduce another >>>>>>> tree-node, but it would be overkill as well. >>>>>>> >>>>>>> Now why do I need it so much: >>>>>>> I want to implement the comparison in a way that {1, 5, 0, -1} is >>>>>>> actually {-1,-1,-1,-1}. So whenever I am not sure that mask of >>>>>>> VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not >>>>>>> always). To check the stuff, I use is_vector_comparison in >>>>>>> tree-vect-generic. >>>>>>> >>>>>>> So I really have the difference between mask ? x : y and mask != {0} ? >>>>>>> x : y, otherwise I could treat mask != {0} in the backend as just >>>>>>> mask. >>>>>>> >>>>>>> If this link between optabs and backend breaks, then the patch falls >>>>>>> apart. Because every time the comparison is taken out VEC_COND_EXPR, I >>>>>>> will have to put != {0}. Keep in mind, that I cannot always put the >>>>>>> comparison inside the VEC_COND_EXPR, what if it is defined in a >>>>>>> function-comparison, or somehow else? >>>>>>> >>>>>>> So what would be an appropriate way to connect optabs and the backend? >>>>>> >>>>>> Well, there is no problem in having the only valid mask operand for >>>>>> VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,....} mask. >>>>>> Just the C parser has to transform mask ? vec1 : vec2 to mask != 0 ? >>>>>> vec1 : vec2. >>>>> >>>>> This happens already in the new version of patch (not submitted yet). >>>>> >>>>>> This comparison can be eliminated by optimization passes >>>>>> that >>>>>> either replace it by the real comparison computing the mask or just >>>>>> propagating the information this mask is already {-1,...} / {0,....} by simply >>>>>> dropping the comparison against zero. >>>>> >>>>> This is not a problem, because the backend recognizes these patterns, >>>>> so no optimization is needed in this part. >>>> >>>> I mean for >>>> >>>> mask = v1 < v2 ? {-1,...} : {0,...}; >>>> val = VCOND_EXPR <mask != 0, v3, v4>; >>>> >>>> optimizers can see how mask is defined and drop the != 0 test or replace >>>> it by v1 < v2. >>> >>> Yes, sure. >>> >>>>>> For the backends I'd have vcond patterns for both an embedded comparison >>>>>> and for a mask. (Now we can rewind the discussion a bit and allow >>>>>> arbitrary masks and define a vcond with a mask operand to do bitwise >>>>>> selection - what matters is the C frontend semantics which we need to >>>>>> translate to what the middle-end thinks of a VEC_COND_EXPR, they >>>>>> do not have to agree). >>>>> >>>>> But it seems like another combinatorial explosion here. Considering >>>>> what Richard said in his e-mail, in order to support "generic" vcond, >>>>> we just need to enumerate all the possible cases. Or I didn't >>>>> understand right? >>>> >>>> Well, the question is still what VCOND_EXPR and thus the vcond pattern >>>> semantically does for a non-comparison operand. I'd argue that using >>>> the bitwise selection semantic gives us maximum flexibility and a native >>>> instruction with AMD XOP. This non-comparison VCOND_EXPR is >>>> also easy to implement in the middle-end expansion code if there is >>>> no native instruction for it - by simply emitting the bitwise operations. >>>> >>>> But I have the feeling we are talking past each other ...? >>> >>> I am all for the bitwise behaviour in the backend pattern, that is >>> something that I rely on at the moment. What I don't want to have is >>> the same behaviour in the frontend. So If we can guarantee, that we >>> add != 0, when we don't know the "nature" of the mask, then I am >>> perfectly fine with the back-end having bitwise-selection behaviour. >> >> Well, the C frontend would simply always add that != 0 (because it >> doesn't know). >> >>>>> I mean, I don't mind of course, but it seems to me that it would be >>>>> cleaner to have one generic enough pattern. >>>>> >>>>> Is there seriously no way to pass something from optab into the backend?? >>>> >>>> You can pass operands. And information is implicitly encoded in the name. >>> >>> I didn't quite get that, could you give an example? >> >> It was a larger variant of "no, apart from what is obvious". > > Ha, joking again. :) > >>>>>> If the mask is computed by a function you are of course out of luck, >>>>>> but I don't see how you'd manage to infer knowledge from nowhere either. >>>>> >>>>> Well, take simpler example >>>>> >>>>> a = {0}; >>>>> for ( ; *p; p += 16) >>>>> a &= pattern > (vec)*p; >>>>> >>>>> res = a ? v0 : v1; >>>>> >>>>> In this case it is simple to analyse that a is a comparison, but you >>>>> cannot embed the operations of a into VEC_COND_EXPR. >>>> >>>> Sure, but if the above is C source the frontend would generate >>>> res = a != 0 ? v0 : v1; initially. An optimization pass could still >>>> track this information and replace VEC_COND_EXPR <a != 0, v0, v1> >>>> with VEC_COND_EXPR <a, v0, v1> (no existing one would track >>>> vector contents though). >>> >>> Yeah, sure. My point is, that we must be able to pass this information >>> in the backend, that we checked everything, and we are sure that a is >>> a corerct mask, please don't add any != 0 to it. >> >> But all masks are correct as soon as they appear in a VEC_COND_EXPR. >> That's the whole point of the bitwise semantics. It's only the C frontend >> that needs to be careful to impose its stricter semantics. >> >> Richard. >> > > Ok, I see the last difference in the approaches we envision. > I am assuming, that frontend does not put != 0, but the later > optimisations (veclower in my case) check every mask in VEC_COND_EXPR > and does the same functionality as you describe. So the philosophical > question why it is better to first add and then remove, rather than > just add if needed? Well, it's "better be right than sorry". Thus, default to the conservatively correct way and let optimizers "optimize" it. > In all the rest I think we agreed. Fine. Thanks, Richard. > > Artem. >
On Mon, Aug 22, 2011 at 8:50 PM, Uros Bizjak <ubizjak@gmail.com> wrote: > On Mon, Aug 22, 2011 at 5:34 PM, Richard Guenther > <richard.guenther@gmail.com> wrote: > >>> In this case it is simple to analyse that a is a comparison, but you >>> cannot embed the operations of a into VEC_COND_EXPR. >> >> Sure, but if the above is C source the frontend would generate >> res = a != 0 ? v0 : v1; initially. An optimization pass could still >> track this information and replace VEC_COND_EXPR <a != 0, v0, v1> >> with VEC_COND_EXPR <a, v0, v1> (no existing one would track >> vector contents though). >> >>> Ok, I am testing the patch that removes hooks. Could you push a little >>> bit the backend-patterns business? >> >> Well, I suppose we're waiting for Uros here. I hadn't much luck with >> fiddling with the mode-iterator stuff myself. > > It is not _that_ trivial change, since we have ix86_expand_fp_vcond > and ix86_expand_int_vcond to merge. ATM, FP version deals with FP > operands and vice versa. We have to merge them somehow and split out > comparison part that handles FP as well as integer operands. > > I also don't know why vcond is not allowed to FAIL... probably > middle-end should be enhanced for a fallback if some comparison isn't > supported by optab. > > Uros. > Uros My biggest problem in the backend is to create a correct description in *.md, which would accept the generic case. I can imagine adding all the cases, but as I mentioned already it explodes. I mean, we will have to do it for SSE, then the same number of cases for AVX, and so on. I would assume that there is a chance to persuade md, that the only thing that matters is the size of the element type of mask and operands. If you can help me with the pattern, I am happy to merge x86 code. Thanks, Artem.
On Mon, Aug 22, 2011 at 9:42 PM, Richard Guenther <richard.guenther@gmail.com> wrote: > On Mon, Aug 22, 2011 at 5:58 PM, Artem Shinkarov > <artyom.shinkaroff@gmail.com> wrote: >> On Mon, Aug 22, 2011 at 4:50 PM, Richard Guenther >> <richard.guenther@gmail.com> wrote: >>> On Mon, Aug 22, 2011 at 5:43 PM, Artem Shinkarov >>> <artyom.shinkaroff@gmail.com> wrote: >>>> On Mon, Aug 22, 2011 at 4:34 PM, Richard Guenther >>>> <richard.guenther@gmail.com> wrote: >>>>> On Mon, Aug 22, 2011 at 5:21 PM, Artem Shinkarov >>>>> <artyom.shinkaroff@gmail.com> wrote: >>>>>> On Mon, Aug 22, 2011 at 4:01 PM, Richard Guenther >>>>>> <richard.guenther@gmail.com> wrote: >>>>>>> On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov >>>>>>> <artyom.shinkaroff@gmail.com> wrote: >>>>>>>> On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther >>>>>>>> <richard.guenther@gmail.com> wrote: >>>>>>>>> On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov >>>>>>>>> <artyom.shinkaroff@gmail.com> wrote: >>>>>>>>>> Richard >>>>>>>>>> >>>>>>>>>> I formalized an approach a little-bit, now it works without target >>>>>>>>>> hooks, but some polishing is still required. I want you to comment on >>>>>>>>>> the several important approaches that I use in the patch. >>>>>>>>>> >>>>>>>>>> So how does it work. >>>>>>>>>> 1) All the vector comparisons at the level of type-checker are >>>>>>>>>> introduced using VEC_COND_EXPR with constant selection operands being >>>>>>>>>> {-1} and {0}. For example v0 > v1 is transformed into VEC_COND_EXPR<v0 >>>>>>>>>>> v1, {-1}, {0}>. >>>>>>>>>> >>>>>>>>>> 2) When optabs expand VEC_COND_EXPR, two cases are considered: >>>>>>>>>> 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing changes. >>>>>>>>>> 2.b) first operand is something else, in that case, we specially mark >>>>>>>>>> this case, recognize it in the backend, and do not create a >>>>>>>>>> comparison, but use the mask as it was a result of some comparison. >>>>>>>>>> >>>>>>>>>> 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a >>>>>>>>>> vector comparison we use is_vector_comparison function, if it returns >>>>>>>>>> false, then we replace mask with mask != {0}. >>>>>>>>>> >>>>>>>>>> So we end-up with the following functionality: >>>>>>>>>> VEC_COND_EXPR<mask, v0,v1> -- if we know that mask is a result of >>>>>>>>>> comparison of two vectors, we leave it as it is, otherwise change with >>>>>>>>>> mask != {0}. >>>>>>>>>> >>>>>>>>>> Plain vector comparison a <op> b is represented with VEC_COND_EXPR, >>>>>>>>>> which correctly expands, without creating useless masking. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Basically for me there are two questions: >>>>>>>>>> 1) Can we perform information passing in optabs in a nicer way? >>>>>>>>>> 2) How is_vector_comparison could be improved? I have several ideas, >>>>>>>>>> like checking if constant vector all consists of 0 and -1, and so on. >>>>>>>>>> But first is it conceptually fine. >>>>>>>>>> >>>>>>>>>> P.S. I tired to put the functionality of is_vector_comparison in >>>>>>>>>> tree-ssa-forwprop, but the thing is that it is called only with -On, >>>>>>>>>> which I find inappropriate, and the functionality gets more >>>>>>>>>> complicated. >>>>>>>>> >>>>>>>>> Why is it inappropriate to not optimize it at -O0? If the user >>>>>>>>> separates comparison and ?: expression it's his own fault. >>>>>>>> >>>>>>>> Well, because all the information is there, and I perfectly envision >>>>>>>> the case when user expressed comparison separately, just to avoid code >>>>>>>> duplication. >>>>>>>> >>>>>>>> Like: >>>>>>>> mask = a > b; >>>>>>>> res1 = mask ? v0 : v1; >>>>>>>> res2 = mask ? v2 : v3; >>>>>>>> >>>>>>>> Which in this case would be different from >>>>>>>> res1 = a > b ? v0 : v1; >>>>>>>> res2 = a > b ? v2 : v3; >>>>>>>> >>>>>>>>> Btw, the new hook is still in the patch. >>>>>>>>> >>>>>>>>> I would simply always create != 0 if it isn't and let optimizers >>>>>>>>> (tree-ssa-forwprop.c) optimize this. You still have to deal with >>>>>>>>> non-comparison operands during expansion though, but if >>>>>>>>> you always forced a != 0 from the start you can then simply >>>>>>>>> interpret it as a proper comparison result (in which case I'd >>>>>>>>> modify the backends to have an alternate pattern or directly >>>>>>>>> expand to masking operations - using the fake comparison >>>>>>>>> RTX is too much of a hack). >>>>>>>> >>>>>>>> Richard, I think you didn't get the problem. >>>>>>>> I really need the way, to pass the information, that the expression >>>>>>>> that is in the first operand of vcond is an appropriate mask. I though >>>>>>>> for quite a while and this hack is the only answer I found, is there a >>>>>>>> better way to do that. I could for example introduce another >>>>>>>> tree-node, but it would be overkill as well. >>>>>>>> >>>>>>>> Now why do I need it so much: >>>>>>>> I want to implement the comparison in a way that {1, 5, 0, -1} is >>>>>>>> actually {-1,-1,-1,-1}. So whenever I am not sure that mask of >>>>>>>> VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not >>>>>>>> always). To check the stuff, I use is_vector_comparison in >>>>>>>> tree-vect-generic. >>>>>>>> >>>>>>>> So I really have the difference between mask ? x : y and mask != {0} ? >>>>>>>> x : y, otherwise I could treat mask != {0} in the backend as just >>>>>>>> mask. >>>>>>>> >>>>>>>> If this link between optabs and backend breaks, then the patch falls >>>>>>>> apart. Because every time the comparison is taken out VEC_COND_EXPR, I >>>>>>>> will have to put != {0}. Keep in mind, that I cannot always put the >>>>>>>> comparison inside the VEC_COND_EXPR, what if it is defined in a >>>>>>>> function-comparison, or somehow else? >>>>>>>> >>>>>>>> So what would be an appropriate way to connect optabs and the backend? >>>>>>> >>>>>>> Well, there is no problem in having the only valid mask operand for >>>>>>> VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,....} mask. >>>>>>> Just the C parser has to transform mask ? vec1 : vec2 to mask != 0 ? >>>>>>> vec1 : vec2. >>>>>> >>>>>> This happens already in the new version of patch (not submitted yet). >>>>>> >>>>>>> This comparison can be eliminated by optimization passes >>>>>>> that >>>>>>> either replace it by the real comparison computing the mask or just >>>>>>> propagating the information this mask is already {-1,...} / {0,....} by simply >>>>>>> dropping the comparison against zero. >>>>>> >>>>>> This is not a problem, because the backend recognizes these patterns, >>>>>> so no optimization is needed in this part. >>>>> >>>>> I mean for >>>>> >>>>> mask = v1 < v2 ? {-1,...} : {0,...}; >>>>> val = VCOND_EXPR <mask != 0, v3, v4>; >>>>> >>>>> optimizers can see how mask is defined and drop the != 0 test or replace >>>>> it by v1 < v2. >>>> >>>> Yes, sure. >>>> >>>>>>> For the backends I'd have vcond patterns for both an embedded comparison >>>>>>> and for a mask. (Now we can rewind the discussion a bit and allow >>>>>>> arbitrary masks and define a vcond with a mask operand to do bitwise >>>>>>> selection - what matters is the C frontend semantics which we need to >>>>>>> translate to what the middle-end thinks of a VEC_COND_EXPR, they >>>>>>> do not have to agree). >>>>>> >>>>>> But it seems like another combinatorial explosion here. Considering >>>>>> what Richard said in his e-mail, in order to support "generic" vcond, >>>>>> we just need to enumerate all the possible cases. Or I didn't >>>>>> understand right? >>>>> >>>>> Well, the question is still what VCOND_EXPR and thus the vcond pattern >>>>> semantically does for a non-comparison operand. I'd argue that using >>>>> the bitwise selection semantic gives us maximum flexibility and a native >>>>> instruction with AMD XOP. This non-comparison VCOND_EXPR is >>>>> also easy to implement in the middle-end expansion code if there is >>>>> no native instruction for it - by simply emitting the bitwise operations. >>>>> >>>>> But I have the feeling we are talking past each other ...? >>>> >>>> I am all for the bitwise behaviour in the backend pattern, that is >>>> something that I rely on at the moment. What I don't want to have is >>>> the same behaviour in the frontend. So If we can guarantee, that we >>>> add != 0, when we don't know the "nature" of the mask, then I am >>>> perfectly fine with the back-end having bitwise-selection behaviour. >>> >>> Well, the C frontend would simply always add that != 0 (because it >>> doesn't know). >>> >>>>>> I mean, I don't mind of course, but it seems to me that it would be >>>>>> cleaner to have one generic enough pattern. >>>>>> >>>>>> Is there seriously no way to pass something from optab into the backend?? >>>>> >>>>> You can pass operands. And information is implicitly encoded in the name. >>>> >>>> I didn't quite get that, could you give an example? >>> >>> It was a larger variant of "no, apart from what is obvious". >> >> Ha, joking again. :) >> >>>>>>> If the mask is computed by a function you are of course out of luck, >>>>>>> but I don't see how you'd manage to infer knowledge from nowhere either. >>>>>> >>>>>> Well, take simpler example >>>>>> >>>>>> a = {0}; >>>>>> for ( ; *p; p += 16) >>>>>> a &= pattern > (vec)*p; >>>>>> >>>>>> res = a ? v0 : v1; >>>>>> >>>>>> In this case it is simple to analyse that a is a comparison, but you >>>>>> cannot embed the operations of a into VEC_COND_EXPR. >>>>> >>>>> Sure, but if the above is C source the frontend would generate >>>>> res = a != 0 ? v0 : v1; initially. An optimization pass could still >>>>> track this information and replace VEC_COND_EXPR <a != 0, v0, v1> >>>>> with VEC_COND_EXPR <a, v0, v1> (no existing one would track >>>>> vector contents though). >>>> >>>> Yeah, sure. My point is, that we must be able to pass this information >>>> in the backend, that we checked everything, and we are sure that a is >>>> a corerct mask, please don't add any != 0 to it. >>> >>> But all masks are correct as soon as they appear in a VEC_COND_EXPR. >>> That's the whole point of the bitwise semantics. It's only the C frontend >>> that needs to be careful to impose its stricter semantics. >>> >>> Richard. >>> >> >> Ok, I see the last difference in the approaches we envision. >> I am assuming, that frontend does not put != 0, but the later >> optimisations (veclower in my case) check every mask in VEC_COND_EXPR >> and does the same functionality as you describe. So the philosophical >> question why it is better to first add and then remove, rather than >> just add if needed? > > Well, it's "better be right than sorry". Thus, default to the > conservatively correct > way and let optimizers "optimize" it. How can we get sorry, it is impossible to skip the vcond during the optimisation, but whatever, it is not really so important when to add. Currently I have a bigger problem, see below. > >> In all the rest I think we agreed. > > Fine. > > Thanks, > Richard. > >> >> Artem. >> > I found out that I cannot really gimplify correctly the vcond<a >b , c, d> expression when a > b is vcond<a > b, -1, 0>. The problem is that gimplifier pulls a > b always as a separate expression during the gimplification, and I don't think that we can avoid it. So what happens is: vcond <a > b , c , d> transformed to x = b > c; x1 = vcond <x , -1, 0> vcond <x1, c, d> and so on, infinitely long. In order to fix the problem we need whether to introduce a new code like VEC_COMP_LT, VEC_COMP_GT, and so on whether a builtin function which we would lower whether stick back to the idea of hook. Anyway, representing a >b using vcond does not work. What would be your thinking here? Thanks, Artem.
On Mon, Aug 22, 2011 at 10:49 PM, Artem Shinkarov <artyom.shinkaroff@gmail.com> wrote: > On Mon, Aug 22, 2011 at 9:42 PM, Richard Guenther > <richard.guenther@gmail.com> wrote: >> On Mon, Aug 22, 2011 at 5:58 PM, Artem Shinkarov >> <artyom.shinkaroff@gmail.com> wrote: >>> On Mon, Aug 22, 2011 at 4:50 PM, Richard Guenther >>> <richard.guenther@gmail.com> wrote: >>>> On Mon, Aug 22, 2011 at 5:43 PM, Artem Shinkarov >>>> <artyom.shinkaroff@gmail.com> wrote: >>>>> On Mon, Aug 22, 2011 at 4:34 PM, Richard Guenther >>>>> <richard.guenther@gmail.com> wrote: >>>>>> On Mon, Aug 22, 2011 at 5:21 PM, Artem Shinkarov >>>>>> <artyom.shinkaroff@gmail.com> wrote: >>>>>>> On Mon, Aug 22, 2011 at 4:01 PM, Richard Guenther >>>>>>> <richard.guenther@gmail.com> wrote: >>>>>>>> On Mon, Aug 22, 2011 at 2:05 PM, Artem Shinkarov >>>>>>>> <artyom.shinkaroff@gmail.com> wrote: >>>>>>>>> On Mon, Aug 22, 2011 at 12:25 PM, Richard Guenther >>>>>>>>> <richard.guenther@gmail.com> wrote: >>>>>>>>>> On Mon, Aug 22, 2011 at 12:53 AM, Artem Shinkarov >>>>>>>>>> <artyom.shinkaroff@gmail.com> wrote: >>>>>>>>>>> Richard >>>>>>>>>>> >>>>>>>>>>> I formalized an approach a little-bit, now it works without target >>>>>>>>>>> hooks, but some polishing is still required. I want you to comment on >>>>>>>>>>> the several important approaches that I use in the patch. >>>>>>>>>>> >>>>>>>>>>> So how does it work. >>>>>>>>>>> 1) All the vector comparisons at the level of type-checker are >>>>>>>>>>> introduced using VEC_COND_EXPR with constant selection operands being >>>>>>>>>>> {-1} and {0}. For example v0 > v1 is transformed into VEC_COND_EXPR<v0 >>>>>>>>>>>> v1, {-1}, {0}>. >>>>>>>>>>> >>>>>>>>>>> 2) When optabs expand VEC_COND_EXPR, two cases are considered: >>>>>>>>>>> 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing changes. >>>>>>>>>>> 2.b) first operand is something else, in that case, we specially mark >>>>>>>>>>> this case, recognize it in the backend, and do not create a >>>>>>>>>>> comparison, but use the mask as it was a result of some comparison. >>>>>>>>>>> >>>>>>>>>>> 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a >>>>>>>>>>> vector comparison we use is_vector_comparison function, if it returns >>>>>>>>>>> false, then we replace mask with mask != {0}. >>>>>>>>>>> >>>>>>>>>>> So we end-up with the following functionality: >>>>>>>>>>> VEC_COND_EXPR<mask, v0,v1> -- if we know that mask is a result of >>>>>>>>>>> comparison of two vectors, we leave it as it is, otherwise change with >>>>>>>>>>> mask != {0}. >>>>>>>>>>> >>>>>>>>>>> Plain vector comparison a <op> b is represented with VEC_COND_EXPR, >>>>>>>>>>> which correctly expands, without creating useless masking. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Basically for me there are two questions: >>>>>>>>>>> 1) Can we perform information passing in optabs in a nicer way? >>>>>>>>>>> 2) How is_vector_comparison could be improved? I have several ideas, >>>>>>>>>>> like checking if constant vector all consists of 0 and -1, and so on. >>>>>>>>>>> But first is it conceptually fine. >>>>>>>>>>> >>>>>>>>>>> P.S. I tired to put the functionality of is_vector_comparison in >>>>>>>>>>> tree-ssa-forwprop, but the thing is that it is called only with -On, >>>>>>>>>>> which I find inappropriate, and the functionality gets more >>>>>>>>>>> complicated. >>>>>>>>>> >>>>>>>>>> Why is it inappropriate to not optimize it at -O0? If the user >>>>>>>>>> separates comparison and ?: expression it's his own fault. >>>>>>>>> >>>>>>>>> Well, because all the information is there, and I perfectly envision >>>>>>>>> the case when user expressed comparison separately, just to avoid code >>>>>>>>> duplication. >>>>>>>>> >>>>>>>>> Like: >>>>>>>>> mask = a > b; >>>>>>>>> res1 = mask ? v0 : v1; >>>>>>>>> res2 = mask ? v2 : v3; >>>>>>>>> >>>>>>>>> Which in this case would be different from >>>>>>>>> res1 = a > b ? v0 : v1; >>>>>>>>> res2 = a > b ? v2 : v3; >>>>>>>>> >>>>>>>>>> Btw, the new hook is still in the patch. >>>>>>>>>> >>>>>>>>>> I would simply always create != 0 if it isn't and let optimizers >>>>>>>>>> (tree-ssa-forwprop.c) optimize this. You still have to deal with >>>>>>>>>> non-comparison operands during expansion though, but if >>>>>>>>>> you always forced a != 0 from the start you can then simply >>>>>>>>>> interpret it as a proper comparison result (in which case I'd >>>>>>>>>> modify the backends to have an alternate pattern or directly >>>>>>>>>> expand to masking operations - using the fake comparison >>>>>>>>>> RTX is too much of a hack). >>>>>>>>> >>>>>>>>> Richard, I think you didn't get the problem. >>>>>>>>> I really need the way, to pass the information, that the expression >>>>>>>>> that is in the first operand of vcond is an appropriate mask. I though >>>>>>>>> for quite a while and this hack is the only answer I found, is there a >>>>>>>>> better way to do that. I could for example introduce another >>>>>>>>> tree-node, but it would be overkill as well. >>>>>>>>> >>>>>>>>> Now why do I need it so much: >>>>>>>>> I want to implement the comparison in a way that {1, 5, 0, -1} is >>>>>>>>> actually {-1,-1,-1,-1}. So whenever I am not sure that mask of >>>>>>>>> VEC_COND_EXPR is a real comparison I transform it to mask != {0} (not >>>>>>>>> always). To check the stuff, I use is_vector_comparison in >>>>>>>>> tree-vect-generic. >>>>>>>>> >>>>>>>>> So I really have the difference between mask ? x : y and mask != {0} ? >>>>>>>>> x : y, otherwise I could treat mask != {0} in the backend as just >>>>>>>>> mask. >>>>>>>>> >>>>>>>>> If this link between optabs and backend breaks, then the patch falls >>>>>>>>> apart. Because every time the comparison is taken out VEC_COND_EXPR, I >>>>>>>>> will have to put != {0}. Keep in mind, that I cannot always put the >>>>>>>>> comparison inside the VEC_COND_EXPR, what if it is defined in a >>>>>>>>> function-comparison, or somehow else? >>>>>>>>> >>>>>>>>> So what would be an appropriate way to connect optabs and the backend? >>>>>>>> >>>>>>>> Well, there is no problem in having the only valid mask operand for >>>>>>>> VEC_COND_EXPRs being either a comparison or a {-1,...} / {0,....} mask. >>>>>>>> Just the C parser has to transform mask ? vec1 : vec2 to mask != 0 ? >>>>>>>> vec1 : vec2. >>>>>>> >>>>>>> This happens already in the new version of patch (not submitted yet). >>>>>>> >>>>>>>> This comparison can be eliminated by optimization passes >>>>>>>> that >>>>>>>> either replace it by the real comparison computing the mask or just >>>>>>>> propagating the information this mask is already {-1,...} / {0,....} by simply >>>>>>>> dropping the comparison against zero. >>>>>>> >>>>>>> This is not a problem, because the backend recognizes these patterns, >>>>>>> so no optimization is needed in this part. >>>>>> >>>>>> I mean for >>>>>> >>>>>> mask = v1 < v2 ? {-1,...} : {0,...}; >>>>>> val = VCOND_EXPR <mask != 0, v3, v4>; >>>>>> >>>>>> optimizers can see how mask is defined and drop the != 0 test or replace >>>>>> it by v1 < v2. >>>>> >>>>> Yes, sure. >>>>> >>>>>>>> For the backends I'd have vcond patterns for both an embedded comparison >>>>>>>> and for a mask. (Now we can rewind the discussion a bit and allow >>>>>>>> arbitrary masks and define a vcond with a mask operand to do bitwise >>>>>>>> selection - what matters is the C frontend semantics which we need to >>>>>>>> translate to what the middle-end thinks of a VEC_COND_EXPR, they >>>>>>>> do not have to agree). >>>>>>> >>>>>>> But it seems like another combinatorial explosion here. Considering >>>>>>> what Richard said in his e-mail, in order to support "generic" vcond, >>>>>>> we just need to enumerate all the possible cases. Or I didn't >>>>>>> understand right? >>>>>> >>>>>> Well, the question is still what VCOND_EXPR and thus the vcond pattern >>>>>> semantically does for a non-comparison operand. I'd argue that using >>>>>> the bitwise selection semantic gives us maximum flexibility and a native >>>>>> instruction with AMD XOP. This non-comparison VCOND_EXPR is >>>>>> also easy to implement in the middle-end expansion code if there is >>>>>> no native instruction for it - by simply emitting the bitwise operations. >>>>>> >>>>>> But I have the feeling we are talking past each other ...? >>>>> >>>>> I am all for the bitwise behaviour in the backend pattern, that is >>>>> something that I rely on at the moment. What I don't want to have is >>>>> the same behaviour in the frontend. So If we can guarantee, that we >>>>> add != 0, when we don't know the "nature" of the mask, then I am >>>>> perfectly fine with the back-end having bitwise-selection behaviour. >>>> >>>> Well, the C frontend would simply always add that != 0 (because it >>>> doesn't know). >>>> >>>>>>> I mean, I don't mind of course, but it seems to me that it would be >>>>>>> cleaner to have one generic enough pattern. >>>>>>> >>>>>>> Is there seriously no way to pass something from optab into the backend?? >>>>>> >>>>>> You can pass operands. And information is implicitly encoded in the name. >>>>> >>>>> I didn't quite get that, could you give an example? >>>> >>>> It was a larger variant of "no, apart from what is obvious". >>> >>> Ha, joking again. :) >>> >>>>>>>> If the mask is computed by a function you are of course out of luck, >>>>>>>> but I don't see how you'd manage to infer knowledge from nowhere either. >>>>>>> >>>>>>> Well, take simpler example >>>>>>> >>>>>>> a = {0}; >>>>>>> for ( ; *p; p += 16) >>>>>>> a &= pattern > (vec)*p; >>>>>>> >>>>>>> res = a ? v0 : v1; >>>>>>> >>>>>>> In this case it is simple to analyse that a is a comparison, but you >>>>>>> cannot embed the operations of a into VEC_COND_EXPR. >>>>>> >>>>>> Sure, but if the above is C source the frontend would generate >>>>>> res = a != 0 ? v0 : v1; initially. An optimization pass could still >>>>>> track this information and replace VEC_COND_EXPR <a != 0, v0, v1> >>>>>> with VEC_COND_EXPR <a, v0, v1> (no existing one would track >>>>>> vector contents though). >>>>> >>>>> Yeah, sure. My point is, that we must be able to pass this information >>>>> in the backend, that we checked everything, and we are sure that a is >>>>> a corerct mask, please don't add any != 0 to it. >>>> >>>> But all masks are correct as soon as they appear in a VEC_COND_EXPR. >>>> That's the whole point of the bitwise semantics. It's only the C frontend >>>> that needs to be careful to impose its stricter semantics. >>>> >>>> Richard. >>>> >>> >>> Ok, I see the last difference in the approaches we envision. >>> I am assuming, that frontend does not put != 0, but the later >>> optimisations (veclower in my case) check every mask in VEC_COND_EXPR >>> and does the same functionality as you describe. So the philosophical >>> question why it is better to first add and then remove, rather than >>> just add if needed? >> >> Well, it's "better be right than sorry". Thus, default to the >> conservatively correct >> way and let optimizers "optimize" it. > > How can we get sorry, it is impossible to skip the vcond during the > optimisation, but whatever, it is not really so important when to add. > Currently I have a bigger problem, see below. >> >>> In all the rest I think we agreed. >> >> Fine. >> >> Thanks, >> Richard. >> >>> >>> Artem. >>> >> > > I found out that I cannot really gimplify correctly the vcond<a >b , > c, d> expression when a > b is vcond<a > b, -1, 0>. The problem is > that gimplifier pulls a > b always as a separate expression during the > gimplification, and I don't think that we can avoid it. So what > happens is: > > vcond <a > b , c , d> > transformed to > x = b > c; > x1 = vcond <x , -1, 0> > vcond <x1, c, d> > > and so on, infinitely long. Sounds like a bug that is possible to fix. > In order to fix the problem we need whether to introduce a new code > like VEC_COMP_LT, VEC_COMP_GT, and so on > whether a builtin function which we would lower > whether stick back to the idea of hook. > > Anyway, representing a >b using vcond does not work. Well, sure it will work, it just needs some work appearantly. > What would be your thinking here? Do you have a patch that exposes this problem? I can have a look tomorrow. Richard. > > Thanks, > Artem. >
Index: gcc/doc/tm.texi =================================================================== --- gcc/doc/tm.texi (revision 177665) +++ gcc/doc/tm.texi (working copy) @@ -5738,6 +5738,10 @@ misalignment value (@var{misalign}). Return true if vector alignment is reachable (by peeling N iterations) for the given type. @end deftypefn +@deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VEC_COMPARE (gimple_stmt_iterator *@var{gsi}, tree @var{type}, tree @var{v0}, tree @var{v1}, enum tree_code @var{code}) +This hook should check whether it is possible to express vectorcomparison using the hardware-specific instructions and return resulttree. Hook should return NULL_TREE if expansion is impossible. +@end deftypefn + @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VEC_PERM (tree @var{type}, tree *@var{mask_element_type}) Target builtin that implements vector permute. @end deftypefn Index: gcc/doc/tm.texi.in =================================================================== --- gcc/doc/tm.texi.in (revision 177665) +++ gcc/doc/tm.texi.in (working copy) @@ -5676,6 +5676,8 @@ misalignment value (@var{misalign}). Return true if vector alignment is reachable (by peeling N iterations) for the given type. @end deftypefn +@hook TARGET_VECTORIZE_BUILTIN_VEC_COMPARE + @hook TARGET_VECTORIZE_BUILTIN_VEC_PERM Target builtin that implements vector permute. @end deftypefn Index: gcc/targhooks.c =================================================================== --- gcc/targhooks.c (revision 177665) +++ gcc/targhooks.c (working copy) @@ -969,6 +969,18 @@ default_builtin_vector_alignment_reachab return true; } +/* Replaces vector comparison with the target-specific instructions + and returns the resulting variable or NULL_TREE otherwise. */ +tree +default_builtin_vec_compare (gimple_stmt_iterator *gsi ATTRIBUTE_UNUSED, + tree type ATTRIBUTE_UNUSED, + tree v0 ATTRIBUTE_UNUSED, + tree v1 ATTRIBUTE_UNUSED, + enum tree_code code ATTRIBUTE_UNUSED) +{ + return NULL_TREE; +} + /* By default, assume that a target supports any factor of misalignment memory access if it supports movmisalign patten. is_packed is true if the memory access is defined in a packed struct. */ Index: gcc/targhooks.h =================================================================== --- gcc/targhooks.h (revision 177665) +++ gcc/targhooks.h (working copy) @@ -86,6 +86,11 @@ extern int default_builtin_vectorization extern tree default_builtin_reciprocal (unsigned int, bool, bool); extern bool default_builtin_vector_alignment_reachable (const_tree, bool); + +extern tree default_builtin_vec_compare (gimple_stmt_iterator *gsi, + tree type, tree v0, tree v1, + enum tree_code code); + extern bool default_builtin_support_vector_misalignment (enum machine_mode mode, const_tree, Index: gcc/target.def =================================================================== --- gcc/target.def (revision 177665) +++ gcc/target.def (working copy) @@ -988,6 +988,15 @@ DEFHOOK bool, (tree vec_type, tree mask), hook_bool_tree_tree_true) +/* Implement hardware vector comparison or return false. */ +DEFHOOK +(builtin_vec_compare, + "This hook should check whether it is possible to express vector\ +comparison using the hardware-specific instructions and return result\ +tree. Hook should return NULL_TREE if expansion is impossible.", + tree, (gimple_stmt_iterator *gsi, tree type, tree v0, tree v1, enum tree_code code), + default_builtin_vec_compare) + /* Return true if the target supports misaligned store/load of a specific factor denoted in the third parameter. The last parameter is true if the access is defined in a packed struct. */ Index: gcc/optabs.c =================================================================== --- gcc/optabs.c (revision 177665) +++ gcc/optabs.c (working copy) @@ -6572,16 +6572,37 @@ expand_vec_cond_expr (tree vec_cond_type if (icode == CODE_FOR_nothing) return 0; - comparison = vector_compare_rtx (op0, unsignedp, icode); rtx_op1 = expand_normal (op1); rtx_op2 = expand_normal (op2); + + if (COMPARISON_CLASS_P (op0)) + { + comparison = vector_compare_rtx (op0, unsignedp, icode); + create_output_operand (&ops[0], target, mode); + create_input_operand (&ops[1], rtx_op1, mode); + create_input_operand (&ops[2], rtx_op2, mode); + create_fixed_operand (&ops[3], comparison); + create_fixed_operand (&ops[4], XEXP (comparison, 0)); + create_fixed_operand (&ops[5], XEXP (comparison, 1)); + + } + else + { + rtx rtx_op0; + rtx vec; + + rtx_op0 = expand_normal (op0); + comparison = gen_rtx_NE (mode, NULL_RTX, NULL_RTX); + vec = CONST0_RTX (mode); + + create_output_operand (&ops[0], target, mode); + create_input_operand (&ops[1], rtx_op1, mode); + create_input_operand (&ops[2], rtx_op2, mode); + create_input_operand (&ops[3], comparison, mode); + create_input_operand (&ops[4], rtx_op0, mode); + create_input_operand (&ops[5], vec, mode); + } - create_output_operand (&ops[0], target, mode); - create_input_operand (&ops[1], rtx_op1, mode); - create_input_operand (&ops[2], rtx_op2, mode); - create_fixed_operand (&ops[3], comparison); - create_fixed_operand (&ops[4], XEXP (comparison, 0)); - create_fixed_operand (&ops[5], XEXP (comparison, 1)); expand_insn (icode, 6, ops); return ops[0].value; } Index: gcc/target.h =================================================================== --- gcc/target.h (revision 177665) +++ gcc/target.h (working copy) @@ -51,6 +51,7 @@ #define GCC_TARGET_H #include "insn-modes.h" +#include "gimple.h" #ifdef ENABLE_CHECKING Index: gcc/fold-const.c =================================================================== --- gcc/fold-const.c (revision 177665) +++ gcc/fold-const.c (working copy) @@ -5930,12 +5930,21 @@ extract_muldiv_1 (tree t, tree c, enum t } /* Return a node which has the indicated constant VALUE (either 0 or - 1), and is of the indicated TYPE. */ + 1 for scalars and is either {-1,-1,..} or {0,0,...} for vectors), + and is of the indicated TYPE. */ tree constant_boolean_node (int value, tree type) { - if (type == integer_type_node) + if (TREE_CODE (type) == VECTOR_TYPE) + { + tree tval; + + gcc_assert (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE); + tval = build_int_cst (TREE_TYPE (type), value); + return build_vector_from_val (type, tval); + } + else if (type == integer_type_node) return value ? integer_one_node : integer_zero_node; else if (type == boolean_type_node) return value ? boolean_true_node : boolean_false_node; @@ -9073,26 +9082,29 @@ fold_comparison (location_t loc, enum tr floating-point, we can only do some of these simplifications.) */ if (operand_equal_p (arg0, arg1, 0)) { + int true_val = TREE_CODE (type) == VECTOR_TYPE ? -1 : 0; + tree arg0_type = TREE_TYPE (arg0); + switch (code) { case EQ_EXPR: - if (! FLOAT_TYPE_P (TREE_TYPE (arg0)) - || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0)))) - return constant_boolean_node (1, type); + if (! FLOAT_TYPE_P (arg0_type) + || ! HONOR_NANS (TYPE_MODE (arg0_type))) + return constant_boolean_node (true_val, type); break; case GE_EXPR: case LE_EXPR: - if (! FLOAT_TYPE_P (TREE_TYPE (arg0)) - || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0)))) - return constant_boolean_node (1, type); + if (! FLOAT_TYPE_P (arg0_type) + || ! HONOR_NANS (TYPE_MODE (arg0_type))) + return constant_boolean_node (true_val, type); return fold_build2_loc (loc, EQ_EXPR, type, arg0, arg1); case NE_EXPR: /* For NE, we can only do this simplification if integer or we don't honor IEEE floating point NaNs. */ - if (FLOAT_TYPE_P (TREE_TYPE (arg0)) - && HONOR_NANS (TYPE_MODE (TREE_TYPE (arg0)))) + if (FLOAT_TYPE_P (arg0_type) + && HONOR_NANS (TYPE_MODE (arg0_type))) break; /* ... fall through ... */ case GT_EXPR: Index: gcc/c-typeck.c =================================================================== --- gcc/c-typeck.c (revision 177665) +++ gcc/c-typeck.c (working copy) @@ -4058,6 +4058,94 @@ build_conditional_expr (location_t colon type2 = TREE_TYPE (op2); code2 = TREE_CODE (type2); + if (TREE_CODE (TREE_TYPE (ifexp)) == VECTOR_TYPE) + { + bool maybe_const = true; + tree sc; + + if (TREE_CODE (type1) != VECTOR_TYPE + || TREE_CODE (type2) != VECTOR_TYPE) + { + error_at (colon_loc, "vector comparison arguments must be of " + "type vector"); + return error_mark_node; + } + + if (TREE_CODE (TREE_TYPE (TREE_TYPE (ifexp))) != INTEGER_TYPE) + { + error_at (colon_loc, "non-integer type in vector condition"); + return error_mark_node; + } + + if (TYPE_VECTOR_SUBPARTS (type1) != TYPE_VECTOR_SUBPARTS (type2) + || TYPE_VECTOR_SUBPARTS (TREE_TYPE (ifexp)) + != TYPE_VECTOR_SUBPARTS (type1)) + { + error_at (colon_loc, "vectors of different length found in " + "vector comparison"); + return error_mark_node; + } + + if (TREE_TYPE (type1) != TREE_TYPE (type2)) + { + error_at (colon_loc, "vectors of different types involved in " + "vector comparison"); + return error_mark_node; + } + + if (TYPE_SIZE (TREE_TYPE (TREE_TYPE (ifexp))) + != TYPE_SIZE (TREE_TYPE (type1))) + { + error_at (colon_loc, "vector-condition element type must be " + "the same as result vector element type"); + return error_mark_node; + } + + /* Avoid C_MAYBE_CONST in VEC_COND_EXPR. */ + sc = c_fully_fold (ifexp, false, &maybe_const); + sc = save_expr (sc); + if (!maybe_const) + ifexp = c_wrap_maybe_const (sc, true); + else + ifexp = sc; + + sc = c_fully_fold (op1, false, &maybe_const); + sc = save_expr (sc); + if (!maybe_const) + op1 = c_wrap_maybe_const (sc, true); + else + op1 = sc; + + sc = c_fully_fold (op2, false, &maybe_const); + sc = save_expr (sc); + if (!maybe_const) + op2 = c_wrap_maybe_const (sc, true); + else + op2 = sc; + + /* Currently the expansion of VEC_COND_EXPR does not allow + expessions where the type of vectors you compare differs + form the type of vectors you select from. For the time + being we insert implicit conversions. */ + if ((COMPARISON_CLASS_P (ifexp) + && TREE_TYPE (TREE_OPERAND (ifexp, 0)) != type1) + || TREE_TYPE (ifexp) != type1) + { + tree comp_type = COMPARISON_CLASS_P (ifexp) + ? TREE_TYPE (TREE_OPERAND (ifexp, 0)) + : TREE_TYPE (ifexp); + tree vcond; + + op1 = convert (comp_type, op1); + op2 = convert (comp_type, op2); + vcond = build3 (VEC_COND_EXPR, comp_type, ifexp, op1, op2); + vcond = convert (type1, vcond); + return vcond; + } + else + return build3 (VEC_COND_EXPR, type1, ifexp, op1, op2); + } + /* C90 does not permit non-lvalue arrays in conditional expressions. In C99 they will be pointers by now. */ if (code1 == ARRAY_TYPE || code2 == ARRAY_TYPE) @@ -9906,6 +9995,37 @@ build_binary_op (location_t location, en case EQ_EXPR: case NE_EXPR: + if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE) + { + tree intt; + if (TREE_TYPE (type0) != TREE_TYPE (type1)) + { + error_at (location, "comparing vectors with different " + "element types"); + return error_mark_node; + } + + if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1)) + { + error_at (location, "comparing vectors with different " + "number of elements"); + return error_mark_node; + } + + /* Always construct signed integer vector type. */ + intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)),0); + result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0)); + converted = 1; + /*break; */ + + ret = build3 (VEC_COND_EXPR, result_type, + build2 (code, result_type, op0, op1), + build_vector_from_val (result_type, + build_int_cst (intt, -1)), + build_vector_from_val (result_type, + build_int_cst (intt, 0))); + goto return_build_binary_op; + } if (FLOAT_TYPE_P (type0) || FLOAT_TYPE_P (type1)) warning_at (location, OPT_Wfloat_equal, @@ -10018,6 +10138,37 @@ build_binary_op (location_t location, en case GE_EXPR: case LT_EXPR: case GT_EXPR: + if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE) + { + tree intt; + if (TREE_TYPE (type0) != TREE_TYPE (type1)) + { + error_at (location, "comparing vectors with different " + "element types"); + return error_mark_node; + } + + if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1)) + { + error_at (location, "comparing vectors with different " + "number of elements"); + return error_mark_node; + } + + /* Always construct signed integer vector type. */ + intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)),0); + result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0)); + converted = 1; + /* break; */ + ret = build3 (VEC_COND_EXPR, result_type, + build2 (code, result_type, op0, op1), + build_vector_from_val (result_type, + build_int_cst (intt, -1)), + build_vector_from_val (result_type, + build_int_cst (intt, 0))); + goto return_build_binary_op; + + } build_type = integer_type_node; if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE || code0 == FIXED_POINT_TYPE) @@ -10425,6 +10576,10 @@ c_objc_common_truthvalue_conversion (loc case FUNCTION_TYPE: gcc_unreachable (); + case VECTOR_TYPE: + error_at (location, "used vector type where scalar is required"); + return error_mark_node; + default: break; } Index: gcc/gimplify.c =================================================================== --- gcc/gimplify.c (revision 177665) +++ gcc/gimplify.c (working copy) @@ -7064,6 +7064,22 @@ gimplify_expr (tree *expr_p, gimple_seq } break; + case VEC_COND_EXPR: + { + enum gimplify_status r0, r1, r2; + + r0 = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, + post_p, is_gimple_condexpr, fb_rvalue); + r1 = gimplify_expr (&TREE_OPERAND (*expr_p, 1), pre_p, + post_p, is_gimple_val, fb_rvalue); + r2 = gimplify_expr (&TREE_OPERAND (*expr_p, 2), pre_p, + post_p, is_gimple_val, fb_rvalue); + recalculate_side_effects (*expr_p); + + ret = MIN (r0, MIN (r1, r2)); + } + break; + case TARGET_MEM_REF: { enum gimplify_status r0 = GS_ALL_DONE, r1 = GS_ALL_DONE; @@ -7348,6 +7364,11 @@ gimplify_expr (tree *expr_p, gimple_seq { tree type = TREE_TYPE (TREE_OPERAND (*expr_p, 1)); + /* Vector comparisons is a valid gimple expression + which could be lowered down later. */ + if (TREE_CODE (type) == VECTOR_TYPE) + goto expr_2; + if (!AGGREGATE_TYPE_P (type)) { tree org_type = TREE_TYPE (*expr_p); Index: gcc/tree.def =================================================================== --- gcc/tree.def (revision 177665) +++ gcc/tree.def (working copy) @@ -704,7 +704,10 @@ DEFTREECODE (TRUTH_NOT_EXPR, "truth_not_ The others are allowed only for integer (or pointer or enumeral) or real types. In all cases the operands will have the same type, - and the value is always the type used by the language for booleans. */ + and the value is either the type used by the language for booleans + or an integer vector type of the same size and with the same number + of elements as the comparison operands. True for a vector of + comparison results has all bits set while false is equal to zero. */ DEFTREECODE (LT_EXPR, "lt_expr", tcc_comparison, 2) DEFTREECODE (LE_EXPR, "le_expr", tcc_comparison, 2) DEFTREECODE (GT_EXPR, "gt_expr", tcc_comparison, 2) Index: gcc/emit-rtl.c =================================================================== --- gcc/emit-rtl.c (revision 177665) +++ gcc/emit-rtl.c (working copy) @@ -5474,6 +5474,11 @@ gen_const_vector (enum machine_mode mode return tem; } +rtx +gen_const_vector1 (enum machine_mode mode, int constant) +{ + return gen_const_vector (mode, constant); +} /* Generate a vector like gen_rtx_raw_CONST_VEC, but use the zero vector when all elements are zero, and the one vector when all elements are one. */ rtx Index: gcc/tree-vect-generic.c =================================================================== --- gcc/tree-vect-generic.c (revision 177665) +++ gcc/tree-vect-generic.c (working copy) @@ -30,11 +30,16 @@ along with GCC; see the file COPYING3. #include "tree-pass.h" #include "flags.h" #include "ggc.h" +#include "target.h" /* Need to include rtl.h, expr.h, etc. for optabs. */ #include "expr.h" #include "optabs.h" + +static void expand_vector_operations_1 (gimple_stmt_iterator *); + + /* Build a constant of type TYPE, made of VALUE's bits replicated every TYPE_SIZE (INNER_TYPE) bits to fit TYPE's precision. */ static tree @@ -125,6 +130,21 @@ do_binop (gimple_stmt_iterator *gsi, tre return gimplify_build2 (gsi, code, inner_type, a, b); } + +/* Construct expression (A[BITPOS] code B[BITPOS]) ? -1 : 0; */ +static tree +do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b, + tree bitpos, tree bitsize, enum tree_code code) +{ + tree cond; + a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos); + b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos); + cond = gimplify_build2 (gsi, code, inner_type, a, b); + return gimplify_build3 (gsi, COND_EXPR, inner_type, cond, + build_int_cst (inner_type, -1), + build_int_cst (inner_type, 0)); +} + /* Expand vector addition to scalars. This does bit twiddling in order to increase parallelism: @@ -333,6 +353,24 @@ uniform_vector_p (tree vec) return NULL_TREE; } +/* Try to expand vector comparison expression OP0 CODE OP1 using + builtin_vec_compare hardware hook, in case target does not + support comparison of type TYPE, extract comparison piecewise. + GSI is used inside the target hook to create the code needed + for the given comparison. */ +static tree +expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0, + tree op1, enum tree_code code) +{ + tree t = targetm.vectorize.builtin_vec_compare (gsi, type, op0, op1, code); + + if (t == NULL_TREE) + t = expand_vector_piecewise (gsi, do_compare, type, + TREE_TYPE (TREE_TYPE (op0)), op0, op1, code); + return t; + +} + static tree expand_vector_operation (gimple_stmt_iterator *gsi, tree type, tree compute_type, gimple assign, enum tree_code code) @@ -375,8 +413,24 @@ expand_vector_operation (gimple_stmt_ite case BIT_NOT_EXPR: return expand_vector_parallel (gsi, do_unop, type, gimple_assign_rhs1 (assign), - NULL_TREE, code); - + NULL_TREE, code); + case EQ_EXPR: + case NE_EXPR: + case GT_EXPR: + case LT_EXPR: + case GE_EXPR: + case LE_EXPR: + case UNEQ_EXPR: + case UNGT_EXPR: + case UNLT_EXPR: + case UNGE_EXPR: + case UNLE_EXPR: + case LTGT_EXPR: + case ORDERED_EXPR: + case UNORDERED_EXPR: + return expand_vector_comparison (gsi, type, + gimple_assign_rhs1 (assign), + gimple_assign_rhs2 (assign), code); default: break; } @@ -432,6 +486,122 @@ type_for_widest_vector_mode (enum machin } } + + +/* Expand vector condition EXP which should have the form + VEC_COND_EXPR<cond, vec0, vec1> into the following + vector: + {cond[i] != 0 ? vec0[i] : vec1[i], ... } + i changes from 0 to TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec0)). */ +static tree +expand_vec_cond_expr_piecewise (gimple_stmt_iterator *gsi, tree exp) +{ + tree cond = TREE_OPERAND (exp, 0); + tree vec0 = TREE_OPERAND (exp, 1); + tree vec1 = TREE_OPERAND (exp, 2); + tree type = TREE_TYPE (vec0); + tree lhs, rhs, notmask; + tree var, new_rhs; + optab op = NULL; + gimple new_stmt; + gimple_stmt_iterator gsi_tmp; + tree t; + + if (!COMPARISON_CLASS_P (cond)) + cond = build2 (EQ_EXPR, TREE_TYPE (cond), cond, + build_vector_from_val (TREE_TYPE (cond), + build_int_cst (TREE_TYPE (TREE_TYPE (cond)), -1))); + + /* Expand vector condition inside of VEC_COND_EXPR. */ + op = optab_for_tree_code (TREE_CODE (cond), type, optab_default); + if (!op || optab_handler (op, TYPE_MODE (type)) == CODE_FOR_nothing) + { + var = create_tmp_reg (TREE_TYPE (cond), "cond"); + new_rhs = expand_vector_comparison (gsi, TREE_TYPE (cond), + TREE_OPERAND (cond, 0), + TREE_OPERAND (cond, 1), + TREE_CODE (cond)); + new_stmt = gimple_build_assign (var, new_rhs); + gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT); + update_stmt (gsi_stmt (*gsi)); + } + else + var = cond; + + gsi_tmp = *gsi; + gsi_prev (&gsi_tmp); + + /* Expand VCOND<mask, v0, v1> to ((v0 & mask) | (v1 & ~mask)) */ + lhs = gimplify_build2 (gsi, BIT_AND_EXPR, type, var, vec0); + notmask = gimplify_build1 (gsi, BIT_NOT_EXPR, type, var); + rhs = gimplify_build2 (gsi, BIT_AND_EXPR, type, notmask, vec1); + t = gimplify_build2 (gsi, BIT_IOR_EXPR, type, lhs, rhs); + + /* Run vecower on the expresisons we have introduced. */ + for (; gsi_tmp.ptr != gsi->ptr; gsi_next (&gsi_tmp)) + expand_vector_operations_1 (&gsi_tmp); + + return t; +} + +#define pp(x) fprintf (stderr, "-- %s\n", x) + +static bool +is_vector_comparison (gimple_stmt_iterator *gsi, tree expr) +{ + tree type = TREE_TYPE (expr); + + if (TREE_CODE (expr) == VEC_COND_EXPR) + return true; + + if (COMPARISON_CLASS_P (expr) && TREE_CODE (type) == VECTOR_TYPE) + return true; + + if (TREE_CODE (expr) == BIT_IOR_EXPR || TREE_CODE (expr) == BIT_AND_EXPR + || TREE_CODE (expr) == BIT_XOR_EXPR) + return is_vector_comparison (gsi, TREE_OPERAND (expr, 0)) + & is_vector_comparison (gsi, TREE_OPERAND (expr, 1)); + + if (TREE_CODE (expr) == VAR_DECL) + { + gimple_stmt_iterator gsi_tmp; + gsi_tmp = *gsi; + tree name = DECL_NAME (expr); + tree var = NULL_TREE; + + for (; gsi_tmp.ptr; gsi_prev (&gsi_tmp)) + { + gimple stmt = gsi_stmt (gsi_tmp); + + if (gimple_code (stmt) != GIMPLE_ASSIGN) + continue; + + if (TREE_CODE (gimple_assign_lhs (stmt)) == VAR_DECL + && DECL_NAME (gimple_assign_lhs (stmt)) == name) + return is_vector_comparison (&gsi_tmp, + gimple_assign_rhs_to_tree (stmt)); + } + } + + if (TREE_CODE (expr) == SSA_NAME) + { + enum tree_code code; + gimple exprdef = SSA_NAME_DEF_STMT (expr); + + if (gimple_code (exprdef) != GIMPLE_ASSIGN) + return false; + + if (TREE_CODE (gimple_expr_type (exprdef)) != VECTOR_TYPE) + return false; + + + return is_vector_comparison (gsi, + gimple_assign_rhs_to_tree (exprdef)); + } + + return false; +} + /* Process one statement. If we identify a vector operation, expand it. */ static void @@ -450,11 +620,34 @@ expand_vector_operations_1 (gimple_stmt_ code = gimple_assign_rhs_code (stmt); rhs_class = get_gimple_rhs_class (code); + lhs = gimple_assign_lhs (stmt); + + if (code == VEC_COND_EXPR) + { + tree exp = gimple_assign_rhs1 (stmt); + tree cond = TREE_OPERAND (exp, 0); + + if (!is_vector_comparison (gsi, cond)) + TREE_OPERAND (exp, 0) = + build2 (NE_EXPR, TREE_TYPE (cond), cond, + build_vector_from_val (TREE_TYPE (cond), + build_int_cst (TREE_TYPE (TREE_TYPE (cond)), 0))); + + if (expand_vec_cond_expr_p (TREE_TYPE (exp), + TYPE_MODE (TREE_TYPE (exp)))) + { + update_stmt (gsi_stmt (*gsi)); + return; + } + + new_rhs = expand_vec_cond_expr_piecewise (gsi, exp); + gimple_assign_set_rhs_from_tree (gsi, new_rhs); + update_stmt (gsi_stmt (*gsi)); + } if (rhs_class != GIMPLE_UNARY_RHS && rhs_class != GIMPLE_BINARY_RHS) return; - lhs = gimple_assign_lhs (stmt); rhs1 = gimple_assign_rhs1 (stmt); type = gimple_expr_type (stmt); if (rhs_class == GIMPLE_BINARY_RHS) Index: gcc/Makefile.in =================================================================== --- gcc/Makefile.in (revision 177665) +++ gcc/Makefile.in (working copy) @@ -888,7 +888,7 @@ EXCEPT_H = except.h $(HASHTAB_H) vecprim TARGET_DEF = target.def target-hooks-macros.h C_TARGET_DEF = c-family/c-target.def target-hooks-macros.h COMMON_TARGET_DEF = common/common-target.def target-hooks-macros.h -TARGET_H = $(TM_H) target.h $(TARGET_DEF) insn-modes.h +TGT = $(TM_H) target.h $(TARGET_DEF) insn-modes.h C_TARGET_H = c-family/c-target.h $(C_TARGET_DEF) COMMON_TARGET_H = common/common-target.h $(INPUT_H) $(COMMON_TARGET_DEF) MACHMODE_H = machmode.h mode-classes.def insn-modes.h @@ -919,8 +919,9 @@ TREE_H = tree.h all-tree.def tree.def c- REGSET_H = regset.h $(BITMAP_H) hard-reg-set.h BASIC_BLOCK_H = basic-block.h $(PREDICT_H) $(VEC_H) $(FUNCTION_H) cfghooks.h GIMPLE_H = gimple.h gimple.def gsstruct.def pointer-set.h $(VEC_H) \ - vecir.h $(GGC_H) $(BASIC_BLOCK_H) $(TARGET_H) tree-ssa-operands.h \ + vecir.h $(GGC_H) $(BASIC_BLOCK_H) $(TGT) tree-ssa-operands.h \ tree-ssa-alias.h $(INTERNAL_FN_H) +TARGET_H = $(TGT) gimple.h GCOV_IO_H = gcov-io.h gcov-iov.h auto-host.h COVERAGE_H = coverage.h $(GCOV_IO_H) DEMANGLE_H = $(srcdir)/../include/demangle.h @@ -3185,7 +3186,7 @@ tree-vect-generic.o : tree-vect-generic. $(TM_H) $(TREE_FLOW_H) $(GIMPLE_H) tree-iterator.h $(TREE_PASS_H) \ $(FLAGS_H) $(OPTABS_H) $(MACHMODE_H) $(EXPR_H) \ langhooks.h $(FLAGS_H) $(DIAGNOSTIC_H) gt-tree-vect-generic.h $(GGC_H) \ - coretypes.h insn-codes.h + coretypes.h insn-codes.h target.h df-core.o : df-core.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \ insn-config.h $(RECOG_H) $(FUNCTION_H) $(REGS_H) alloc-pool.h \ hard-reg-set.h $(BASIC_BLOCK_H) $(DF_H) $(BITMAP_H) sbitmap.h $(TIMEVAR_H) \ Index: gcc/tree-cfg.c =================================================================== --- gcc/tree-cfg.c (revision 177665) +++ gcc/tree-cfg.c (working copy) @@ -3191,6 +3191,38 @@ verify_gimple_comparison (tree type, tre return true; } + if (TREE_CODE (type) == VECTOR_TYPE) + { + if (TREE_CODE (op0_type) != VECTOR_TYPE + || TREE_CODE (op1_type) != VECTOR_TYPE) + { + error ("non-vector operands in vector comparison"); + debug_generic_expr (op0_type); + debug_generic_expr (op1_type); + return true; + } + + if (!useless_type_conversion_p (op0_type, op1_type) + && !useless_type_conversion_p (op1_type, op0_type)) + { + error ("type mismatch in vector comparison"); + debug_generic_expr (op0_type); + debug_generic_expr (op1_type); + return true; + } + + if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type) + && TYPE_PRECISION (TREE_TYPE (op0_type)) + != TYPE_PRECISION (TREE_TYPE (type))) + { + error ("invalid vector comparison resulting type"); + debug_generic_expr (type); + return true; + } + + return false; + } + /* For comparisons we do not have the operations type as the effective type the comparison is carried out in. Instead we require that either the first operand is trivially Index: gcc/c-parser.c =================================================================== --- gcc/c-parser.c (revision 177665) +++ gcc/c-parser.c (working copy) @@ -5339,6 +5339,15 @@ c_parser_conditional_expression (c_parse tree eptype = NULL_TREE; middle_loc = c_parser_peek_token (parser)->location; + + if (TREE_CODE (TREE_TYPE (cond.value)) == VECTOR_TYPE) + { + error_at (middle_loc, "cannot ommit middle operator in " + "vector comparison"); + ret.value = error_mark_node; + return ret; + } + pedwarn (middle_loc, OPT_pedantic, "ISO C forbids omitting the middle term of a ?: expression"); warn_for_omitted_condop (middle_loc, cond.value); @@ -5357,9 +5366,12 @@ c_parser_conditional_expression (c_parse } else { - cond.value - = c_objc_common_truthvalue_conversion - (cond_loc, default_conversion (cond.value)); + if (TREE_CODE (TREE_TYPE (cond.value)) != VECTOR_TYPE) + { + cond.value + = c_objc_common_truthvalue_conversion + (cond_loc, default_conversion (cond.value)); + } c_inhibit_evaluation_warnings += cond.value == truthvalue_false_node; exp1 = c_parser_expression_conv (parser); mark_exp_read (exp1.value); Index: gcc/config/i386/i386.c =================================================================== --- gcc/config/i386/i386.c (revision 177665) +++ gcc/config/i386/i386.c (working copy) @@ -25,6 +25,7 @@ along with GCC; see the file COPYING3. #include "tm.h" #include "rtl.h" #include "tree.h" +#include "tree-flow.h" #include "tm_p.h" #include "regs.h" #include "hard-reg-set.h" @@ -18402,27 +18403,55 @@ ix86_expand_sse_fp_minmax (rtx dest, enu return true; } +rtx rtx_build_vector_from_val (enum machine_mode, HOST_WIDE_INT); + +/* Returns a vector of mode MODE where all the elements are ARG. */ +rtx +rtx_build_vector_from_val (enum machine_mode mode, HOST_WIDE_INT arg) +{ + rtvec v; + int units, i; + enum machine_mode inner; + + units = GET_MODE_NUNITS (mode); + inner = GET_MODE_INNER (mode); + v = rtvec_alloc (units); + for (i = 0; i < units; ++i) + RTVEC_ELT (v, i) = gen_rtx_CONST_INT (inner, arg); + + return gen_rtx_raw_CONST_VECTOR (mode, v); +} + /* Expand an sse vector comparison. Return the register with the result. */ static rtx ix86_expand_sse_cmp (rtx dest, enum rtx_code code, rtx cmp_op0, rtx cmp_op1, - rtx op_true, rtx op_false) + rtx op_true, rtx op_false, bool no_comparison) { enum machine_mode mode = GET_MODE (dest); rtx x; - cmp_op0 = force_reg (mode, cmp_op0); - if (!nonimmediate_operand (cmp_op1, mode)) - cmp_op1 = force_reg (mode, cmp_op1); + /* Avoid useless comparison. */ + if (no_comparison) + { + cmp_op0 = force_reg (mode, cmp_op0); + x = cmp_op0; + } + else + { + cmp_op0 = force_reg (mode, cmp_op0); + if (!nonimmediate_operand (cmp_op1, mode)) + cmp_op1 = force_reg (mode, cmp_op1); + + x = gen_rtx_fmt_ee (code, mode, cmp_op0, cmp_op1); + } if (optimize || reg_overlap_mentioned_p (dest, op_true) || reg_overlap_mentioned_p (dest, op_false)) dest = gen_reg_rtx (mode); - x = gen_rtx_fmt_ee (code, mode, cmp_op0, cmp_op1); emit_insn (gen_rtx_SET (VOIDmode, dest, x)); - return dest; } @@ -18434,8 +18463,14 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp { enum machine_mode mode = GET_MODE (dest); rtx t2, t3, x; - - if (op_false == CONST0_RTX (mode)) + rtx mask_true; + + if (rtx_equal_p (op_true, rtx_build_vector_from_val (mode, -1)) + && rtx_equal_p (op_false, CONST0_RTX (mode))) + { + emit_insn (gen_rtx_SET (VOIDmode, dest, cmp)); + } + else if (op_false == CONST0_RTX (mode)) { op_true = force_reg (mode, op_true); x = gen_rtx_AND (mode, cmp, op_true); @@ -18512,7 +18547,7 @@ ix86_expand_fp_movcc (rtx operands[]) return true; tmp = ix86_expand_sse_cmp (operands[0], code, op0, op1, - operands[2], operands[3]); + operands[2], operands[3], false); ix86_expand_sse_movcc (operands[0], tmp, operands[2], operands[3]); return true; } @@ -18555,7 +18590,7 @@ ix86_expand_fp_vcond (rtx operands[]) return true; cmp = ix86_expand_sse_cmp (operands[0], code, operands[4], operands[5], - operands[1], operands[2]); + operands[1], operands[2], false); ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]); return true; } @@ -18569,7 +18604,9 @@ ix86_expand_int_vcond (rtx operands[]) enum rtx_code code = GET_CODE (operands[3]); bool negate = false; rtx x, cop0, cop1; + rtx comp; + comp = operands[3]; cop0 = operands[4]; cop1 = operands[5]; @@ -18681,8 +18718,18 @@ ix86_expand_int_vcond (rtx operands[]) } } - x = ix86_expand_sse_cmp (operands[0], code, cop0, cop1, - operands[1+negate], operands[2-negate]); + if (GET_CODE (comp) == NE && XEXP (comp, 0) == NULL_RTX + && XEXP (comp, 1) == NULL_RTX) + { + rtx vec = CONST0_RTX (mode); + x = ix86_expand_sse_cmp (operands[0], code, cop0, vec, + operands[1+negate], operands[2-negate], true); + } + else + { + x = ix86_expand_sse_cmp (operands[0], code, cop0, cop1, + operands[1+negate], operands[2-negate], false); + } ix86_expand_sse_movcc (operands[0], x, operands[1+negate], operands[2-negate]); @@ -18774,7 +18821,7 @@ ix86_expand_sse_unpack (rtx operands[2], tmp = force_reg (imode, CONST0_RTX (imode)); else tmp = ix86_expand_sse_cmp (gen_reg_rtx (imode), GT, CONST0_RTX (imode), - operands[1], pc_rtx, pc_rtx); + operands[1], pc_rtx, pc_rtx, false); emit_insn (unpack (dest, operands[1], tmp)); } @@ -32827,6 +32874,276 @@ ix86_vectorize_builtin_vec_perm (tree ve return ix86_builtins[(int) fcode]; } +/* Find target specific sequence for vector comparison of + real-type vectors V0 and V1. Returns variable containing + result of the comparison or NULL_TREE in other case. */ +static tree +vector_fp_compare (gimple_stmt_iterator *gsi, tree rettype, + enum machine_mode mode, tree v0, tree v1, + enum tree_code code) +{ + enum ix86_builtins fcode; + int arg = -1; + tree fdef, frtype, tmp, var, t; + gimple new_stmt; + bool reverse = false; + +#define SWITCH_MODE(mode, fcode, code, value) \ +switch (mode) \ + { \ + case V2DFmode: \ + if (!TARGET_SSE2) return NULL_TREE; \ + fcode = IX86_BUILTIN_CMP ## code ## PD; \ + break; \ + case V4DFmode: \ + if (!TARGET_AVX) return NULL_TREE; \ + fcode = IX86_BUILTIN_CMPPD256; \ + arg = value; \ + break; \ + case V4SFmode: \ + if (!TARGET_SSE) return NULL_TREE; \ + fcode = IX86_BUILTIN_CMP ## code ## PS; \ + break; \ + case V8SFmode: \ + if (!TARGET_AVX) return NULL_TREE; \ + fcode = IX86_BUILTIN_CMPPS256; \ + arg = value; \ + break; \ + default: \ + return NULL_TREE; \ + /* FIXME: Similar instructions for MMX. */ \ + } + + switch (code) + { + case EQ_EXPR: + SWITCH_MODE (mode, fcode, EQ, 0); + break; + + case NE_EXPR: + SWITCH_MODE (mode, fcode, NEQ, 4); + break; + + case GT_EXPR: + SWITCH_MODE (mode, fcode, LT, 1); + reverse = true; + break; + + case LT_EXPR: + SWITCH_MODE (mode, fcode, LT, 1); + break; + + case LE_EXPR: + SWITCH_MODE (mode, fcode, LE, 2); + break; + + case GE_EXPR: + SWITCH_MODE (mode, fcode, LE, 2); + reverse = true; + break; + + default: + return NULL_TREE; + } +#undef SWITCH_MODE + + fdef = ix86_builtins[(int)fcode]; + frtype = TREE_TYPE (TREE_TYPE (fdef)); + + tmp = create_tmp_var (frtype, "tmp"); + var = create_tmp_var (rettype, "tmp"); + + if (arg == -1) + if (reverse) + new_stmt = gimple_build_call (fdef, 2, v1, v0); + else + new_stmt = gimple_build_call (fdef, 2, v0, v1); + else + if (reverse) + new_stmt = gimple_build_call (fdef, 3, v0, v1, + build_int_cst (char_type_node, arg)); + else + new_stmt = gimple_build_call (fdef, 3, v1, v0, + build_int_cst (char_type_node, arg)); + + gimple_call_set_lhs (new_stmt, tmp); + gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT); + t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, rettype, tmp); + new_stmt = gimple_build_assign (var, t); + gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT); + + return var; +} + +/* Find target specific sequence for vector comparison of + integer-type vectors V0 and V1. Returns variable containing + result of the comparison or NULL_TREE in other case. */ +static tree +vector_int_compare (gimple_stmt_iterator *gsi, tree rettype, + enum machine_mode mode, tree v0, tree v1, + enum tree_code code) +{ + enum ix86_builtins feq, fgt; + tree var, t, tmp, tmp1, tmp2, defeq, defgt, gtrtype, eqrtype; + gimple new_stmt; + + switch (mode) + { + /* SSE integer-type vectors. */ + case V2DImode: + if (!TARGET_SSE4_2) return NULL_TREE; + feq = IX86_BUILTIN_PCMPEQQ; + fgt = IX86_BUILTIN_PCMPGTQ; + break; + + case V4SImode: + if (!TARGET_SSE2) return NULL_TREE; + feq = IX86_BUILTIN_PCMPEQD128; + fgt = IX86_BUILTIN_PCMPGTD128; + break; + + case V8HImode: + if (!TARGET_SSE2) return NULL_TREE; + feq = IX86_BUILTIN_PCMPEQW128; + fgt = IX86_BUILTIN_PCMPGTW128; + break; + + case V16QImode: + if (!TARGET_SSE2) return NULL_TREE; + feq = IX86_BUILTIN_PCMPEQB128; + fgt = IX86_BUILTIN_PCMPGTB128; + break; + + /* MMX integer-type vectors. */ + case V2SImode: + if (!TARGET_MMX) return NULL_TREE; + feq = IX86_BUILTIN_PCMPEQD; + fgt = IX86_BUILTIN_PCMPGTD; + break; + + case V4HImode: + if (!TARGET_MMX) return NULL_TREE; + feq = IX86_BUILTIN_PCMPEQW; + fgt = IX86_BUILTIN_PCMPGTW; + break; + + case V8QImode: + if (!TARGET_MMX) return NULL_TREE; + feq = IX86_BUILTIN_PCMPEQB; + fgt = IX86_BUILTIN_PCMPGTB; + break; + + /* FIXME: Similar instructions for AVX. */ + default: + return NULL_TREE; + } + + + var = create_tmp_var (rettype, "ret"); + defeq = ix86_builtins[(int)feq]; + defgt = ix86_builtins[(int)fgt]; + eqrtype = TREE_TYPE (TREE_TYPE (defeq)); + gtrtype = TREE_TYPE (TREE_TYPE (defgt)); + +#define EQGT_CALL(gsi, stmt, var, op0, op1, gteq) \ +do { \ + var = create_tmp_var (gteq ## rtype, "tmp"); \ + stmt = gimple_build_call (def ## gteq, 2, op0, op1); \ + gimple_call_set_lhs (stmt, var); \ + gsi_insert_before (gsi, stmt, GSI_SAME_STMT); \ +} while (0) + + switch (code) + { + case EQ_EXPR: + EQGT_CALL (gsi, new_stmt, tmp, v0, v1, eq); + break; + + case NE_EXPR: + tmp = create_tmp_var (eqrtype, "tmp"); + + EQGT_CALL (gsi, new_stmt, tmp1, v0, v1, eq); + EQGT_CALL (gsi, new_stmt, tmp2, v0, v0, eq); + + /* t = tmp1 ^ {-1, -1,...} */ + t = gimplify_build2 (gsi, BIT_XOR_EXPR, eqrtype, tmp1, tmp2); + new_stmt = gimple_build_assign (tmp, t); + gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT); + break; + + case GT_EXPR: + EQGT_CALL (gsi, new_stmt, tmp, v0, v1, gt); + break; + + case LT_EXPR: + EQGT_CALL (gsi, new_stmt, tmp, v1, v0, gt); + break; + + case GE_EXPR: + if (eqrtype != gtrtype) + return NULL_TREE; + tmp = create_tmp_var (eqrtype, "tmp"); + EQGT_CALL (gsi, new_stmt, tmp1, v0, v1, gt); + EQGT_CALL (gsi, new_stmt, tmp2, v0, v1, eq); + t = gimplify_build2 (gsi, BIT_IOR_EXPR, eqrtype, tmp1, tmp2); + new_stmt = gimple_build_assign (tmp, t); + gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT); + break; + + case LE_EXPR: + if (eqrtype != gtrtype) + return NULL_TREE; + tmp = create_tmp_var (eqrtype, "tmp"); + EQGT_CALL (gsi, new_stmt, tmp1, v1, v0, gt); + EQGT_CALL (gsi, new_stmt, tmp2, v0, v1, eq); + t = gimplify_build2 (gsi, BIT_IOR_EXPR, eqrtype, tmp1, tmp2); + new_stmt = gimple_build_assign (tmp, t); + gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT); + break; + + default: + return NULL_TREE; + } +#undef EQGT_CALL + + t = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, rettype, tmp); + new_stmt = gimple_build_assign (var, t); + gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT); + return var; +} + +/* Lower a comparison of two vectors V0 and V1, returning a + variable with the result of comparison. Returns NULL_TREE + when it is impossible to find a target specific sequence. */ +static tree +ix86_vectorize_builtin_vec_compare (gimple_stmt_iterator *gsi, tree rettype, + tree v0, tree v1, enum tree_code code) +{ + tree type; + + /* Make sure we are comparing the same types. */ + if (TREE_TYPE (v0) != TREE_TYPE (v1) + || TREE_TYPE (TREE_TYPE (v0)) != TREE_TYPE (TREE_TYPE (v1))) + return NULL_TREE; + + type = TREE_TYPE (v0); + + /* Cannot compare packed unsigned integers + unless it is EQ or NEQ operations. */ + if (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE + && TYPE_UNSIGNED (TREE_TYPE (type))) + if (code != EQ_EXPR && code != NE_EXPR) + return NULL_TREE; + + + if (TREE_CODE (TREE_TYPE (type)) == REAL_TYPE) + return vector_fp_compare (gsi, rettype, TYPE_MODE (type), v0, v1, code); + else if (TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE) + return vector_int_compare (gsi, rettype, TYPE_MODE (type), v0, v1, code); + else + return NULL_TREE; +} + /* Return a vector mode with twice as many elements as VMODE. */ /* ??? Consider moving this to a table generated by genmodes.c. */ @@ -35270,6 +35587,11 @@ ix86_autovectorize_vector_sizes (void) #define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \ ix86_autovectorize_vector_sizes +#undef TARGET_VECTORIZE_BUILTIN_VEC_COMPARE +#define TARGET_VECTORIZE_BUILTIN_VEC_COMPARE \ + ix86_vectorize_builtin_vec_compare + + #undef TARGET_SET_CURRENT_FUNCTION #define TARGET_SET_CURRENT_FUNCTION ix86_set_current_function
Richard I formalized an approach a little-bit, now it works without target hooks, but some polishing is still required. I want you to comment on the several important approaches that I use in the patch. So how does it work. 1) All the vector comparisons at the level of type-checker are introduced using VEC_COND_EXPR with constant selection operands being {-1} and {0}. For example v0 > v1 is transformed into VEC_COND_EXPR<v0 > v1, {-1}, {0}>. 2) When optabs expand VEC_COND_EXPR, two cases are considered: 2.a) first operand of VEC_COND_EXPR is comparison, in that case nothing changes. 2.b) first operand is something else, in that case, we specially mark this case, recognize it in the backend, and do not create a comparison, but use the mask as it was a result of some comparison. 3) In order to make sure that mask in VEC_COND_EXPR<mask, v0, v1> is a vector comparison we use is_vector_comparison function, if it returns false, then we replace mask with mask != {0}. So we end-up with the following functionality: VEC_COND_EXPR<mask, v0,v1> -- if we know that mask is a result of comparison of two vectors, we leave it as it is, otherwise change with mask != {0}. Plain vector comparison a <op> b is represented with VEC_COND_EXPR, which correctly expands, without creating useless masking. Basically for me there are two questions: 1) Can we perform information passing in optabs in a nicer way? 2) How is_vector_comparison could be improved? I have several ideas, like checking if constant vector all consists of 0 and -1, and so on. But first is it conceptually fine. P.S. I tired to put the functionality of is_vector_comparison in tree-ssa-forwprop, but the thing is that it is called only with -On, which I find inappropriate, and the functionality gets more complicated. Thanks, Artem.