Constant-fold vector comparisons

Message ID	alpine.DEB.2.02.1209291454140.11224@stedding.saclay.inria.fr
State	New
Headers	From: Marc Glisse <marc.glisse@inria.fr>; To: gcc-patches@gcc.gnu.org; Date: Sat, 29 Sep 2012 15:25:25 +0200 (CEST); Subject: Constant-fold vector comparisons

Marc Glisse Sept. 29, 2012, 1:25 p.m. UTC

Hello,

this patch does 2 things (I should have split it in 2, but the questions 
go together):

1) it handles constant folding of vector comparisons,

2) it fixes another place where vectors are not expected (I'll probably 
wait to have front-end support and testcases to do more of those, but 
there is something to discuss).

I wasn't sure what integer_truep should test exactly. For integer: == 1 or 
!= 0? For vectors: == -1 or < 0? I chose the one that worked best for the 
forwprop case where I used it.
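To make the options concrete, here is a plain-C sketch. It models constants as ordinary integers and int arrays instead of INTEGER_CST/VECTOR_CST trees, and all helper names are hypothetical, not GCC functions. The integer_truep in the patch below corresponds to the "== 1" scalar test and the "every element < 0" vector test.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical models of the candidate meanings of integer_truep.  */

/* Scalar, strict: only the value 1 means true.  */
static bool scalar_true_is_one (long x) { return x == 1; }

/* Scalar, C-like: any nonzero value means true.  */
static bool scalar_true_nonzero (long x) { return x != 0; }

/* Vector, strict: every element must be all-ones (-1).  */
static bool vec_true_all_minus_one (const int *v, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (v[i] != -1)
      return false;
  return true;
}

/* Vector, OpenCL-like: every element must have its sign (high) bit set.  */
static bool vec_true_all_negative (const int *v, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (v[i] >= 0)
      return false;
  return true;
}
```

For a vector such as {-1, -5} the two vector readings disagree. That is exactly the kind of value the choice of semantics has to settle.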

It seems that before this patch, the middle-end didn't know how comparison 
results were encoded (a good reason for VEC_COND_EXPR to require a 
comparison as its first argument). I am using the OpenCL encoding, in which 
only the high bit of each vector element matters. I am not quite sure 
what happens for targets (are there any?) that use a different encoding. 
When expanding vcond, they can do the comparison as they like. When 
expanding an isolated comparison, I expect they have to expand it as 
vcond(a<b,-1,0). So it should be ok, but I could easily have missed 
something.
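The encoding can be illustrated with GNU C vector extensions, assuming they are available as in GCC or Clang. A vector comparison already yields -1 or 0 per element, which is the same as vcond(a<b,-1,0) spelled out one element at a time. The function names are hypothetical, and the sketch assumes 4-element int vectors.

```c
#include <stdbool.h>

/* Generic vector of four ints (GNU C vector extension).  */
typedef int v4si __attribute__ ((vector_size (4 * sizeof (int))));

/* Elementwise a < b: each lane of the result is -1 (true) or 0 (false).  */
static v4si cmp_lt (v4si a, v4si b)
{
  return a < b;
}

/* The same result written as vcond (a < b, -1, 0), lane by lane.  */
static v4si cmp_lt_as_vcond (v4si a, v4si b)
{
  v4si r = { 0, 0, 0, 0 };
  for (int i = 0; i < 4; i++)
    r[i] = a[i] < b[i] ? -1 : 0;
  return r;
}

/* OpenCL reading of one mask lane: only the high (sign) bit matters.  */
static bool lane_is_true (int lane)
{
  return lane < 0;
}
```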


2012-10-01  Marc Glisse  <marc.glisse@inria.fr>

gcc/
 	* tree.c (integer_truep): New function.
 	* tree.h (integer_truep): Declare.
 	* tree-ssa-forwprop.c (forward_propagate_into_cond): Call it.
 	Don't use boolean_type_node for vectors.
 	* fold-const.c (fold_relational_const): Handle VECTOR_CST.

gcc/testsuite/
 	* gcc.dg/tree-ssa/foldconst-6.c: New testcase.

Richard Biener Oct. 1, 2012, 11:39 a.m. UTC | #1

On Sat, Sep 29, 2012 at 3:25 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
> Hello,
>
> this patch does 2 things (I should have split it in 2, but the questions go
> together):
>
> 1) it handles constant folding of vector comparisons,
>
> 2) it fixes another place where vectors are not expected (I'll probably wait
> to have front-end support and testcases to do more of those, but there is
> something to discuss).
>
> I wasn't sure what integer_truep should test exactly. For integer: == 1 or
> != 0? For vectors: == -1 or < 0? I chose the one that worked best for the
> forwprop case where I used it.
>
> It seems that before this patch, the middle-end didn't know how comparison
> results were encoded (a good reason for VEC_COND_EXPR to require a
> comparison as its first argument). I am using the OpenCL encoding that what
> matters is the high bit of each vector element. I am not quite sure what
> happens for targets (are there any?) that use a different encoding. When
> expanding vcond, they can do the comparison as they like. When expanding an
> isolated comparison, I expect they have to expand it as vcond(a<b,-1,0). So
> it should be ok, but I could easily have missed something.

Comments below

>
> 2012-10-01  Marc Glisse  <marc.glisse@inria.fr>
>
> gcc/
>         * tree.c (integer_truep): New function.
>         * tree.h (integer_truep): Declare.
>         * tree-ssa-forwprop.c (forward_propagate_into_cond): Call it.
>         Don't use boolean_type_node for vectors.
>         * fold-const.c (fold_relational_const): Handle VECTOR_CST.
>
> gcc/testsuite/
>         * gcc.dg/tree-ssa/foldconst-6.c: New testcase.
>
> --
> Marc Glisse
> Index: gcc/tree.h
> ===================================================================
> --- gcc/tree.h  (revision 191850)
> +++ gcc/tree.h  (working copy)
> @@ -5272,20 +5272,25 @@ extern int integer_zerop (const_tree);
>
>  /* integer_onep (tree x) is nonzero if X is an integer constant of value 1.  */
>
>  extern int integer_onep (const_tree);
>
>  /* integer_all_onesp (tree x) is nonzero if X is an integer constant
>     all of whose significant bits are 1.  */
>
>  extern int integer_all_onesp (const_tree);
>
> +/* integer_truep (tree x) is nonzero if X is an integer constant of value 1,
> +   or a vector constant of value < 0.  */
> +
> +extern bool integer_truep (const_tree);
> +
>  /* integer_pow2p (tree x) is nonzero is X is an integer constant with
>     exactly one bit 1.  */
>
>  extern int integer_pow2p (const_tree);
>
>  /* integer_nonzerop (tree x) is nonzero if X is an integer constant
>     with a nonzero value.  */
>
>  extern int integer_nonzerop (const_tree);
>
> Index: gcc/tree-ssa-forwprop.c
> ===================================================================
> --- gcc/tree-ssa-forwprop.c     (revision 191850)
> +++ gcc/tree-ssa-forwprop.c     (working copy)
> @@ -564,46 +564,46 @@ forward_propagate_into_cond (gimple_stmt
>        enum tree_code code;
>        tree name = cond;
>        gimple def_stmt = get_prop_source_stmt (name, true, NULL);
>        if (!def_stmt || !can_propagate_from (def_stmt))
>         return 0;
>
>        code = gimple_assign_rhs_code (def_stmt);
>        if (TREE_CODE_CLASS (code) == tcc_comparison)
>         tmp = fold_build2_loc (gimple_location (def_stmt),
>                                code,
> -                              boolean_type_node,
> +                              TREE_TYPE (cond),

That's obvious.

>                                gimple_assign_rhs1 (def_stmt),
>                                gimple_assign_rhs2 (def_stmt));
>        else if ((code == BIT_NOT_EXPR
>                 && TYPE_PRECISION (TREE_TYPE (cond)) == 1)
>                || (code == BIT_XOR_EXPR
> -                  && integer_onep (gimple_assign_rhs2 (def_stmt))))
> +                  && integer_truep (gimple_assign_rhs2 (def_stmt))))

See below.

>         {
>           tmp = gimple_assign_rhs1 (def_stmt);
>           swap = true;
>         }
>      }
>
>    if (tmp
>        && is_gimple_condexpr (tmp))
>      {
>        if (dump_file && tmp)
>         {
>           fprintf (dump_file, "  Replaced '");
>           print_generic_expr (dump_file, cond, 0);
>           fprintf (dump_file, "' with '");
>           print_generic_expr (dump_file, tmp, 0);
>           fprintf (dump_file, "'\n");
>         }
>
> -      if (integer_onep (tmp))
> +      if (integer_truep (tmp))
>         gimple_assign_set_rhs_from_tree (gsi_p, gimple_assign_rhs2 (stmt));
>        else if (integer_zerop (tmp))
>         gimple_assign_set_rhs_from_tree (gsi_p, gimple_assign_rhs3 (stmt));
>        else
>         {
>           gimple_assign_set_rhs1 (stmt, unshare_expr (tmp));
>           if (swap)
>             {
>               tree t = gimple_assign_rhs2 (stmt);
>               gimple_assign_set_rhs2 (stmt, gimple_assign_rhs3 (stmt));
> Index: gcc/testsuite/gcc.dg/tree-ssa/foldconst-6.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/foldconst-6.c (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/foldconst-6.c (revision 0)
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-ccp1" } */
> +
> +typedef long vec __attribute__ ((vector_size (2 * sizeof(long))));
> +
> +vec f ()
> +{
> +  vec a = { -2, 666 };
> +  vec b = { 3, 2 };
> +  return a < b;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "666" "ccp1"} } */
> +/* { dg-final { cleanup-tree-dump "ccp1" } } */
>
> Property changes on: gcc/testsuite/gcc.dg/tree-ssa/foldconst-6.c
> ___________________________________________________________________
> Added: svn:keywords
>    + Author Date Id Revision URL
> Added: svn:eol-style
>    + native
>
> Index: gcc/fold-const.c
> ===================================================================
> --- gcc/fold-const.c    (revision 191850)
> +++ gcc/fold-const.c    (working copy)
> @@ -16084,20 +16084,44 @@ fold_relational_const (enum tree_code co
>                                           TREE_IMAGPART (op0),
>                                           TREE_IMAGPART (op1));
>        if (code == EQ_EXPR)
>         return fold_build2 (TRUTH_ANDIF_EXPR, type, rcond, icond);
>        else if (code == NE_EXPR)
>         return fold_build2 (TRUTH_ORIF_EXPR, type, rcond, icond);
>        else
>         return NULL_TREE;
>      }
>
> +  if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
> +    {
> +      int count = VECTOR_CST_NELTS (op0);
> +      tree *elts =  XALLOCAVEC (tree, count);
> +      gcc_assert (TREE_CODE (type) == VECTOR_TYPE);
> +
> +      for (int i = 0; i < count; i++)
> +       {
> +         tree elem_type = TREE_TYPE (type);
> +         tree elem0 = VECTOR_CST_ELT (op0, i);
> +         tree elem1 = VECTOR_CST_ELT (op1, i);
> +
> +         elts[i] = fold_relational_const (code, elem_type,
> +                                          elem0, elem1);
> +
> +         if(elts[i] == NULL_TREE)
> +           return NULL_TREE;
> +
> +         elts[i] = fold_negate_const (elts[i], elem_type);

I think you need to invent something new similar to STORE_FLAG_VALUE
or use STORE_FLAG_VALUE here.  With the above you try to map
{0, 1} to {0, -1} which is only true if the operation on the element types
returns {0, 1} (thus, STORE_FLAG_VALUE is 1).

> +       }
> +
> +      return build_vector (type, elts);
> +    }
> +
>    /* From here on we only handle LT, LE, GT, GE, EQ and NE.
>
>       To compute GT, swap the arguments and do LT.
>       To compute GE, do LT and invert the result.
>       To compute LE, swap the arguments, do LT and invert the result.
>       To compute NE, do EQ and invert the result.
>
>       Therefore, the code below must handle only EQ and LT.  */
>
>    if (code == LE_EXPR || code == GT_EXPR)
> Index: gcc/tree.c
> ===================================================================
> --- gcc/tree.c  (revision 191850)
> +++ gcc/tree.c  (working copy)
> @@ -1835,20 +1835,48 @@ integer_all_onesp (const_tree expr)
>        else
>         high_value = ((HOST_WIDE_INT) 1 << shift_amount) - 1;
>
>        return (TREE_INT_CST_LOW (expr) == ~(unsigned HOST_WIDE_INT) 0
>               && TREE_INT_CST_HIGH (expr) == high_value);
>      }
>    else
>      return TREE_INT_CST_LOW (expr) == ((unsigned HOST_WIDE_INT) 1 << prec) - 1;
>  }
>
> +/* Return true if EXPR is an integer constant representing true.  */
> +
> +bool
> +integer_truep (const_tree expr)
> +{
> +  STRIP_NOPS (expr);
> +
> +  switch (TREE_CODE (expr))
> +    {
> +    case INTEGER_CST:
> +      /* Do not just test != 0, some places expect the value 1.  */
> +      return (TREE_INT_CST_LOW (expr) == 1
> +             && TREE_INT_CST_HIGH (expr) == 0);

I wonder if using STORE_FLAG_VALUE is better here (note that it
usually differs for FP vs. integral comparisons and the mode passed
to STORE_FLAG_VALUE is that of the comparison result).

That said, until we are sure what semantics we want here (forwprop
for example doesn't look at 'comparisons' but operations on special
values and types) I'd prefer to not introduce integer_truep ().

Thanks,
Richard.

> +    case VECTOR_CST:
> +      {
> +       for (unsigned i = 0; i < VECTOR_CST_NELTS (expr); ++i)
> +         {
> +           tree elm = VECTOR_CST_ELT (expr, i);
> +           if (TREE_CODE (elm) != INTEGER_CST || !tree_int_cst_sign_bit (elm))
> +             return false;
> +         }
> +       return true;
> +      }
> +    default:
> +      return false;
> +    }
> +}
> +
>  /* Return 1 if EXPR is an integer constant that is a power of 2 (i.e., has only
>     one bit on).  */
>
>  int
>  integer_pow2p (const_tree expr)
>  {
>    int prec;
>    unsigned HOST_WIDE_INT high, low;
>
>    STRIP_NOPS (expr);
>

Marc Glisse Oct. 1, 2012, 3:57 p.m. UTC | #2

[merging both threads, thanks for the answers]

On Mon, 1 Oct 2012, Richard Guenther wrote:

>>> optabs should be fixed instead, an is_gimple_val condition is implicitly
>>> val != 0.
>>
>> For vectors, I think it should be val < 0 (with an appropriate cast of val
>> to a signed integer vector type if necessary). Or (val & highbit) != 0, but
>> that's longer.
>
> I don't think so.  Throughout the compiler we generally assume false == 0
> and anything else is true.  (yes, for FP there is STORE_FLAG_VALUE, but
> its scope is quite limited - if we want something similar for vectors we'd have to
> invent it).

See below.

>>> If we for example have
>>>
>>> predicate = a < b;
>>> x = predicate ? d : e;
>>> y = predicate ? f : g;
>>>
>>> we ideally want to re-use the predicate computation on targets where
>>> that would be optimal (and combine should be able to recover the
>>> case where it is not).
>>
>> That I don't understand. The vcond instruction implemented by targets takes
>> as arguments d, e, cmp, a, b and emits the comparison itself. I don't see
>> how I can avoid sending to the targets both (d,e,<,a,b) and (f,g,<,a,b).
>> They will notice eventually that a<b is computed twice and remove one of the
>> two, but I don't see how to do that in optabs.c. Or I can compute x = a < b,
>> use x < 0 as the comparison passed to the targets, and expect targets (those
>> for which it is true) to recognize that < 0 is useless in a vector condition
>> (PR54700), or is useless on a comparison result.
>
> But that's a limitation of how vcond works.  ISTR there is/was a vselect
> instruction as well, taking a "mask" and two vectors to select from.  At least
> that's how vcond works internally for some sub-targets.

vselect seems to only appear in config/. Would it be defined as:
vselect(m,a,b)=(a&m)|(b&~m) ? I would almost be tempted to just define a 
pattern in .md files and let combine handle it, although it might be one 
instruction too long for that (and if m is x<y, ~m might look like x>=y).
Or would it match the OpenCL select: "For each component of a vector type,
result[i] = if MSB of c[i] is set ? b[i] : a[i]."? Or the pattern with &
and | but with a precondition that the value of each element of the mask
must be 0 or ±1?
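The candidate definitions can be compared on a small, hypothetical mask, using plain int32_t arrays rather than any target instruction. For select_msb the argument order follows vselect(m,a,b) above (pick a where the mask is true), which is the reverse of the argument order in OpenCL's own select.

```c
#include <stdint.h>

enum { LANES = 4 };

/* Candidate 1: bitwise select, vselect (m, a, b) = (a & m) | (b & ~m).  */
static void select_bitwise (const int32_t *m, const int32_t *a,
                            const int32_t *b, int32_t *out)
{
  for (int i = 0; i < LANES; i++)
    out[i] = (a[i] & m[i]) | (b[i] & ~m[i]);
}

/* Candidate 2: OpenCL-style select, looking only at the MSB of each
   mask lane: a[i] if the sign bit of m[i] is set, else b[i].  */
static void select_msb (const int32_t *m, const int32_t *a,
                        const int32_t *b, int32_t *out)
{
  for (int i = 0; i < LANES; i++)
    out[i] = m[i] < 0 ? a[i] : b[i];
}
```

With the mask {-1, 0, INT32_MIN, 1} the two definitions agree on the first two lanes and disagree on the last two. Only when every mask element is 0 or -1 are they interchangeable.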

I don't find vcond that bad, as long as targets check for trivial 
comparisons in the expansion (what trivial means may depend on the 
platform). It is quite flexible for targets.

On Mon, 1 Oct 2012, Richard Guenther wrote:

>>         tmp = fold_build2_loc (gimple_location (def_stmt),
>>                                code,
>> -                              boolean_type_node,
>> +                              TREE_TYPE (cond),
>
> That's obvious.

Ok, I'll test and commit that line separately.

>> +  if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
>> +    {
>> +      int count = VECTOR_CST_NELTS (op0);
>> +      tree *elts =  XALLOCAVEC (tree, count);
>> +      gcc_assert (TREE_CODE (type) == VECTOR_TYPE);
>> +
>> +      for (int i = 0; i < count; i++)
>> +       {
>> +         tree elem_type = TREE_TYPE (type);
>> +         tree elem0 = VECTOR_CST_ELT (op0, i);
>> +         tree elem1 = VECTOR_CST_ELT (op1, i);
>> +
>> +         elts[i] = fold_relational_const (code, elem_type,
>> +                                          elem0, elem1);
>> +
>> +         if(elts[i] == NULL_TREE)
>> +           return NULL_TREE;
>> +
>> +         elts[i] = fold_negate_const (elts[i], elem_type);
>
> I think you need to invent something new similar to STORE_FLAG_VALUE
> or use STORE_FLAG_VALUE here.  With the above you try to map
> {0, 1} to {0, -1} which is only true if the operation on the element types
> returns {0, 1} (thus, STORE_FLAG_VALUE is 1).

Er, seems to me that constant folding of a scalar comparison in the
front/middle-end only returns {0, 1}.

>> +/* Return true if EXPR is an integer constant representing true.  */
>> +
>> +bool
>> +integer_truep (const_tree expr)
>> +{
>> +  STRIP_NOPS (expr);
>> +
>> +  switch (TREE_CODE (expr))
>> +    {
>> +    case INTEGER_CST:
>> +      /* Do not just test != 0, some places expect the value 1.  */
>> +      return (TREE_INT_CST_LOW (expr) == 1
>> +             && TREE_INT_CST_HIGH (expr) == 0);
>
> I wonder if using STORE_FLAG_VALUE is better here (note that it
> usually differs for FP vs. integral comparisons and the mode passed
> to STORE_FLAG_VALUE is that of the comparison result).

I notice there is already a VECTOR_STORE_FLAG_VALUE (used only once in
simplify-rtx, in a way that seems a bit strange but I'll try to
understand that later). Thanks for showing me this macro, it seems
important indeed. However the STORE_FLAG_VALUE mechanism seems to be for
the RTL level.

It looks like it would be possible to have 3 different semantics:
source code is OpenCL, middle-end whatever we want (0 / 1 for instance),
and back-end is whatever the target wants. The front-end would generate
for a<b : vec_cond_expr(a<b,-1,0) and for a?b:c : vec_cond_expr(a<0,b,c)
and there is no need for the middle-end to use the same representation
of comparisons as the front-ends or targets (expand of a vec_cond_expr
whose first argument is not a comparison would use != 0 if we chose a 0
/ 1 encoding).
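A minimal model of this split follows. It assumes plain C arrays stand in for vectors and a hypothetical vec_cond helper stands in for VEC_COND_EXPR with an already-evaluated condition. It only shows the two front-end rewrites described above; it is not how any GCC front end is actually implemented.

```c
#include <stdbool.h>

enum { N = 4 };

/* Hypothetical model of VEC_COND_EXPR whose condition has already been
   evaluated to one bool per lane.  */
static void vec_cond (const bool *pred, const int *t, const int *f, int *out)
{
  for (int i = 0; i < N; i++)
    out[i] = pred[i] ? t[i] : f[i];
}

/* Front-end rewrite of OpenCL a < b into vec_cond (a < b, -1, 0).  */
static void lower_lt (const int *a, const int *b, int *out)
{
  bool p[N];
  const int minus1[N] = { -1, -1, -1, -1 };
  const int zero[N] = { 0, 0, 0, 0 };
  for (int i = 0; i < N; i++)
    p[i] = a[i] < b[i];
  vec_cond (p, minus1, zero, out);
}

/* Front-end rewrite of OpenCL c ? x : y into vec_cond (c < 0, x, y).  */
static void lower_select (const int *c, const int *x, const int *y, int *out)
{
  bool p[N];
  for (int i = 0; i < N; i++)
    p[i] = c[i] < 0;
  vec_cond (p, x, y, out);
}
```

In this model the middle end only ever sees per-lane booleans, so it is free to pick its own encoding for them.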

However, since the front-ends and many targets agree on the OpenCL
semantics, it means introducing a conversion back and forth in the
middle-end, which may complicate things a bit. Note also that we already
have constant_boolean_node that returns a vector of -1 for true, and
that the front-end doesn't currently generate a vec_cond_expr for a<b.

It is true though that using 1 for true would fit better with the scalar
ops.

Well, if someone feels like taking a decision... I am happy with either
choice, as long as I know where to go.

> That said, until we are sure what semantics we want here (forwprop
> for example doesn't look at 'comparisons' but operations on special
> values and types) I'd prefer to not introduce integer_truep ().

I completely agree that defining the semantics comes first :-)

Richard Biener Oct. 2, 2012, 12:41 p.m. UTC | #3

On Mon, Oct 1, 2012 at 5:57 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
> [merging both threads, thanks for the answers]
>
>
> On Mon, 1 Oct 2012, Richard Guenther wrote:
>
>>>> optabs should be fixed instead, an is_gimple_val condition is implicitly
>>>> val != 0.
>>>
>>>
>>> For vectors, I think it should be val < 0 (with an appropriate cast of val
>>> to a signed integer vector type if necessary). Or (val & highbit) != 0, but
>>> that's longer.
>>
>>
>> I don't think so.  Throughout the compiler we generally assume false == 0
>> and anything else is true.  (yes, for FP there is STORE_FLAG_VALUE, but
>> its scope is quite limited - if we want something similar for vectors we'd have to
>> invent it).
>
>
> See below.
>
>
>>>> If we for example have
>>>>
>>>> predicate = a < b;
>>>> x = predicate ? d : e;
>>>> y = predicate ? f : g;
>>>>
>>>> we ideally want to re-use the predicate computation on targets where
>>>> that would be optimal (and combine should be able to recover the
>>>> case where it is not).
>>>
>>>
>>> That I don't understand. The vcond instruction implemented by targets takes
>>> as arguments d, e, cmp, a, b and emits the comparison itself. I don't see
>>> how I can avoid sending to the targets both (d,e,<,a,b) and (f,g,<,a,b).
>>> They will notice eventually that a<b is computed twice and remove one of the
>>> two, but I don't see how to do that in optabs.c. Or I can compute x = a < b,
>>> use x < 0 as the comparison passed to the targets, and expect targets (those
>>> for which it is true) to recognize that < 0 is useless in a vector condition
>>> (PR54700), or is useless on a comparison result.
>>
>>
>> But that's a limitation of how vcond works.  ISTR there is/was a vselect
>> instruction as well, taking a "mask" and two vectors to select from.  At least
>> that's how vcond works internally for some sub-targets.
>
>
> vselect seems to only appear in config/. Would it be defined as:
> vselect(m,a,b)=(a&m)|(b&~m) ? I would almost be tempted to just define a
> pattern in .md files and let combine handle it, although it might be one
> instruction too long for that (and if m is x<y, ~m might look like x>=y).
> Or would it match the OpenCL select: "For each component of a vector type,
> result[i] = if MSB of c[i] is set ? b[i] : a[i]."? Or the pattern with &
> and | but with a precondition that the value of each element of the mask
> must be 0 or ±1?
>
> I don't find vcond that bad, as long as targets check for trivial
> comparisons in the expansion (what trivial means may depend on the
> platform). It is quite flexible for targets.

Well, ok.

>
> On Mon, 1 Oct 2012, Richard Guenther wrote:
>
>>>         tmp = fold_build2_loc (gimple_location (def_stmt),
>>>                                code,
>>> -                              boolean_type_node,
>>> +                              TREE_TYPE (cond),
>>
>>
>> That's obvious.
>
>
> Ok, I'll test and commit that line separately.
>
>>> +  if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
>>> +    {
>>> +      int count = VECTOR_CST_NELTS (op0);
>>> +      tree *elts =  XALLOCAVEC (tree, count);
>>> +      gcc_assert (TREE_CODE (type) == VECTOR_TYPE);
>>> +
>>> +      for (int i = 0; i < count; i++)
>>> +       {
>>> +         tree elem_type = TREE_TYPE (type);
>>> +         tree elem0 = VECTOR_CST_ELT (op0, i);
>>> +         tree elem1 = VECTOR_CST_ELT (op1, i);
>>> +
>>> +         elts[i] = fold_relational_const (code, elem_type,
>>> +                                          elem0, elem1);
>>> +
>>> +         if(elts[i] == NULL_TREE)
>>> +           return NULL_TREE;
>>> +
>>> +         elts[i] = fold_negate_const (elts[i], elem_type);
>>
>>
>> I think you need to invent something new similar to STORE_FLAG_VALUE
>> or use STORE_FLAG_VALUE here.  With the above you try to map
>> {0, 1} to {0, -1} which is only true if the operation on the element types
>> returns {0, 1} (thus, STORE_FLAG_VALUE is 1).
>
>
> Er, seems to me that constant folding of a scalar comparison in the
> front/middle-end only returns {0, 1}.

The point is we need to define some semantics for vector comparison
results.  One variant is to make it target independent, which in turn
would inhibit exploiting some target features (or make it more difficult).
You for example use {0, -1} for truth values - probably to exploit target
features - even though the most natural middle-end way would be to
use {0, 1} as for everything else (caveat: there may be both signed
and unsigned bools, we don't allow vector components with non-mode precision,
thus you could argue that a signed bool : 1 is just "sign-extended"
for your solution).  A different variant is to make it target dependent
to leverage optimization opportunities - that's why STORE_FLAG_VALUE
exists.  For example, when performing bitwise operations on the result
of a vector comparison a < v, you either have to make the target
expand code to produce {0, -1} even if the natural compare instruction
would, say, produce {0, 0x80000} - or not constrain the possible values
of its result (like forwprop would do with your patch).  In general we
want constant folding to yield the same results as if the HW carried
out the operation to make -O0 code not diverge from -O1.  Thus,

v4si g;
int main() { g = { 1, 2, 3, 4 } < { 4, 3, 2, 1}; }

should not assign different values to g dependent on constant propagation
performed or not.  The easiest way out is something like STORE_FLAG_VALUE
if there does not exist a middle-end choice for vector true / false components
that can be easily generated from what the target produces.
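The snippet above is shorthand: in C a bare braced list cannot be an operand of <. A compilable variant using GNU C vector extensions (assumed available) might look like this. The volatile reads exist only to keep the compiler from folding the second version, so the two functions exercise a possibly folded path and a run-time path. Whether the first one is actually folded depends on the optimization level.

```c
typedef int v4si __attribute__ ((vector_size (4 * sizeof (int))));

v4si g;

/* Operands are compile-time constants, so the comparison may be folded.  */
void set_g_constant (void)
{
  const v4si a = { 1, 2, 3, 4 };
  const v4si b = { 4, 3, 2, 1 };
  g = a < b;
}

/* Operands are read through volatile objects, so the comparison
   cannot be folded and is carried out at run time.  */
void set_g_runtime (void)
{
  volatile v4si va = { 1, 2, 3, 4 };
  volatile v4si vb = { 4, 3, 2, 1 };
  v4si a = va, b = vb;
  g = a < b;
}
```

Both calls should leave g equal to {-1, -1, 0, 0}, the value OpenCL requires.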

Like if you perform a FP comparison

int main () { double x = 1.0; static _Bool b; b = x < 3.0; }

you get without CCP on x86_64:

        ucomisd -8(%rbp), %xmm0
        seta    %al
        movb    %al, b.1715(%rip)

thus the equivalent of

    flag_reg = x < 3.0;
    b = flag_reg ? 1 : 0;

for vector compares you get something similar:

    flag_vec = x < y;
    res = flag_vec ? { 1, ... } : { 0, ... };

which I think you can see being produced by generic vector lowering
(in do_compare).  Where I can see we indeed use {0, -1} ... which
would match your constant folding behavior.

We may not be able to easily recover from this intermediate step
with combine (I'm not sure), so a target dependent value may
be preferred.
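To illustrate the cost described above, here is a sketch with a made-up target whose vector compare sets a true lane to 0x80000, the hypothetical value from the earlier paragraph. HYPOTHETICAL_VEC_FLAG_VALUE is not a real GCC macro; it only stands in for a STORE_FLAG_VALUE-like per-target constant.

```c
#include <stdbool.h>
#include <stdint.h>

/* Made-up per-lane "true" value of a hypothetical target compare.  */
#define HYPOTHETICAL_VEC_FLAG_VALUE 0x80000

/* What such a target's raw vector compare would leave in one lane.  */
static int32_t raw_lane_lt (int32_t a, int32_t b)
{
  return a < b ? HYPOTHETICAL_VEC_FLAG_VALUE : 0;
}

/* Extra step needed if the middle end insists on {0, -1}:
   the equivalent of flag ? -1 : 0 applied to the raw lane.  */
static int32_t canonical_lane_lt (int32_t a, int32_t b)
{
  return raw_lane_lt (a, b) != 0 ? -1 : 0;
}

/* OpenCL reading of a lane: true iff the sign bit is set.  */
static bool opencl_lane_true (int32_t lane)
{
  return lane < 0;
}
```

Under the OpenCL reading, the raw 0x80000 lane is not even "true" because its sign bit is clear. A middle end that assumes {0, -1} therefore needs the extra canonicalization step on such a target.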


>>> +/* Return true if EXPR is an integer constant representing true.  */
>>> +
>>> +bool
>>> +integer_truep (const_tree expr)
>>> +{
>>> +  STRIP_NOPS (expr);
>>> +
>>> +  switch (TREE_CODE (expr))
>>> +    {
>>> +    case INTEGER_CST:
>>> +      /* Do not just test != 0, some places expect the value 1.  */
>>> +      return (TREE_INT_CST_LOW (expr) == 1
>>> +             && TREE_INT_CST_HIGH (expr) == 0);
>>
>>
>> I wonder if using STORE_FLAG_VALUE is better here (note that it
>> usually differs for FP vs. integral comparisons and the mode passed
>> to STORE_FLAG_VALUE is that of the comparison result).
>
>
> I notice there is already a VECTOR_STORE_FLAG_VALUE (used only once in
> simplify-rtx, in a way that seems a bit strange but I'll try to
> understand that later). Thanks for showing me this macro, it seems
> important indeed. However the STORE_FLAG_VALUE mechanism seems to be for
> the RTL level.
>
> It looks like it would be possible to have 3 different semantics:
> source code is OpenCL, middle-end whatever we want (0 / 1 for instance),
> and back-end is whatever the target wants. The front-end would generate
> for a<b : vec_cond_expr(a<b,-1,0)

seems like the middle-end uses this for lowering vector compares,
a < b -> { a[0] < b[0] ? -1 : 0, ... }

> and for a?b:c : vec_cond_expr(a<0,b,c)

it looks like ?: is not generally handled by tree-vect-generic, so it must
be either not supported by the frontend or lowered therein (ISTR
it is forced to appear as a != {0,...} ? ... : ...)

> and there is no need for the middle-end to use the same representation
> of comparisons as the front-ends or targets (expand of a vec_cond_expr
> whose first argument is not a comparison would use != 0 if we chose a 0
> / 1 encoding).
>
> However, since the front-ends and many targets agree on the OpenCL
> semantics, it means introducing a conversion back and forth in the
> middle-end, which may complicate things a bit. Note also that we already
> have constant_boolean_node that returns a vector of -1 for true, and
> that the front-end doesn't currently generate a vec_cond_expr for a<b.

Yeah, I realized that now.

> It is true though that using 1 for true would fit better with the scalar
> ops.
>
> Well, if someone feels like taking a decision... I am happy with either
> choice, as long as I know where to go.

I'd say adjust your fold-const patch to not negate the scalar result
but build a proper -1 / 0 value based on integer_zerop().

Thanks,
Richard.

>> That said, until we are sure what semantics we want here (forwprop
>> for example doesn't look at 'comparisons' but operations on special
>> values and types) I'd prefer to not introduce integer_truep ().
>
>
> I completely agree that defining the semantics comes first :-)
>
> --
> Marc Glisse

Marc Glisse Oct. 5, 2012, 3:01 p.m. UTC | #4

[I am still a little confused, sorry for the long email...]

On Tue, 2 Oct 2012, Richard Guenther wrote:

>>>> +  if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
>>>> +    {
>>>> +      int count = VECTOR_CST_NELTS (op0);
>>>> +      tree *elts =  XALLOCAVEC (tree, count);
>>>> +      gcc_assert (TREE_CODE (type) == VECTOR_TYPE);
>>>> +
>>>> +      for (int i = 0; i < count; i++)
>>>> +       {
>>>> +         tree elem_type = TREE_TYPE (type);
>>>> +         tree elem0 = VECTOR_CST_ELT (op0, i);
>>>> +         tree elem1 = VECTOR_CST_ELT (op1, i);
>>>> +
>>>> +         elts[i] = fold_relational_const (code, elem_type,
>>>> +                                          elem0, elem1);
>>>> +
>>>> +         if(elts[i] == NULL_TREE)
>>>> +           return NULL_TREE;
>>>> +
>>>> +         elts[i] = fold_negate_const (elts[i], elem_type);
>>>
>>>
>>> I think you need to invent something new similar to STORE_FLAG_VALUE
>>> or use STORE_FLAG_VALUE here.  With the above you try to map
>>> {0, 1} to {0, -1} which is only true if the operation on the element types
>>> returns {0, 1} (thus, STORE_FLAG_VALUE is 1).
>>
>> Er, seems to me that constant folding of a scalar comparison in the
>> front/middle-end only returns {0, 1}.
[and later]
> I'd say adjust your fold-const patch to not negate the scalar result
> but build a proper -1 / 0 value based on integer_zerop().

I don't mind doing it that way, but I would like to understand first. 
LT_EXPR on scalars is guaranteed (in generic.texi) to be 0 or 1. So 
negating should be the same as testing with integer_zerop to build -1 or 
0. Is it just a matter of style (then I am ok), or am I missing a reason 
which makes the negation wrong?
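The question can be checked with a tiny model where each lane holds the int32_t that a scalar comparison fold returned. The helper names are hypothetical. Both mappings agree whenever the scalar result is 0 or 1, which is what generic.texi guarantees for LT_EXPR. They diverge only if some other nonzero value could appear, such as 0x80000.

```c
#include <stdint.h>

/* Mapping used in the original patch: negate the 0/1 scalar result.  */
static int32_t lane_by_negate (int32_t scalar_result)
{
  return -scalar_result;
}

/* Mapping suggested in review: test for zero and build -1 or 0
   explicitly, independent of which nonzero value meant "true".  */
static int32_t lane_by_zero_test (int32_t scalar_result)
{
  return scalar_result == 0 ? 0 : -1;
}
```

The zero-test form does not rely on the scalar result being exactly 1; under the current 0/1 guarantee the two are interchangeable.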

> The point is we need to define some semantics for vector comparison
> results.

Yes. I think a documentation patch should come first: generic.texi is 
missing an entry for VEC_COND_EXPR and the entry for LT_EXPR doesn't 
mention vectors. But before that we need to decide what to put there...

> One variant is to make it target independent which in turn
> would inhibit (or make it more difficult) to exploit some target features.
> You for example use {0, -1} for truth values - probably to exploit target
> features -

Actually it was mostly because that is the meaning in the language. OpenCL 
says that a<b is a vector of 0 and -1, and that ?: only looks at the MSB 
of the elements in the condition. The fact that it matches what some 
targets do is a simple consequence of the fact that OpenCL was based on 
what hardware already did.

> even though the most natural middle-end way would be to
> use {0, 1} as for everything else

I agree that it would be natural and convenient in a number of places.

> (caveat: there may be both signed and unsigned bools, we don't allow 
> vector components with non-mode precision, thus you could argue that a 
> signed bool : 1 is just "sign-extended" for your solution).

Not sure how that would translate into the code.

> A different variant is to make it target dependent to leverage 
> optimization opportunities

That's an interesting possibility...

> that's why STORE_FLAG_VALUE exists.

AFAICS it only appears when we go from gimple to rtl, not before (and 
there is already a VECTOR_STORE_FLAG_VALUE, although no target defines 
it). Which doesn't mean we couldn't make it appear earlier for vectors.

> For example with vector comparisons a < v result, when
> performing bitwise operations on it, you either have to make the target
> expand code to produce {0, -1} even if the natural compare instruction
> would, say, produce {0, 0x80000} - or not constrain the possible values
> of its result (like forwprop would do with your patch).  In general we
> want constant folding to yield the same results as if the HW carried
> out the operation to make -O0 code not diverge from -O1.  Thus,
>
> v4si g;
> int main() { g = { 1, 2, 3, 4 } < { 4, 3, 2, 1}; }
>
> should not assign different values to g dependent on constant propagation
> performed or not.

That one is clear, OpenCL constrains the answer to be {-1,-1,0,0}, whether 
your target likes it or not. Depending on how things are handled, 
comparisons could be constrained internally to only appear (possibly 
indirectly) in the first argument of a vec_cond_expr.

> The easiest way out is something like STORE_FLAG_VALUE
> if there does not exist a middle-end choice for vector true / false components
> that can be easily generated from what the target produces.
>
> Like if you perform a FP comparison
>
> int main () { double x = 1.0; static _Bool b; b = x < 3.0; }
>
> you get without CCP on x86_64:
>
>        ucomisd -8(%rbp), %xmm0
>        seta    %al
>        movb    %al, b.1715(%rip)
>
> thus the equivalent of
>
>    flag_reg = x < 3.0;
>    b = flag_reg ? 1 : 0;

where this expansion happens in the back-end.

> for vector compares you get something similar:
>
>    flag_vec = x < y;
>    res = flag_vec ? { 1, ... } : { 0, ... };
>
> which I think you can see being produced by generic vector lowering
> (in do_compare).  Where I can see we indeed use {0, -1} ... which
> would match your constant folding behavior.
>
> We may not be able to easily recover from this intermediate step
> with combine (I'm not sure), so a target dependent value may
> be preferred.

Being able to optimize it is indeed a key point. Let's try on an example 
(not assuming any specific representation in the middle-end for now). Say 
I write this C/OpenCL code: ((a<b)&&(c<d))?x:y (not currently supported)

The front-end gives to the middle-end: ((((a<b)?-1:0)&((c<d)?-1:0))<0)?x:y

On an architecture like sse, neon or altivec where VECTOR_STORE_FLAG_VALUE 
is -1 (well, should be), expansion of (a<b)?-1:0 would just be a<b. The <0 
can also disappear if the vcond instruction only looks at the MSB (x86). 
And we are left in the back-end with ((a<b)&(c<d))?x:y, as desired.
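Here is a scalar-loop sketch of the lowering described above. It uses hypothetical helper names and arrays standing in for vectors, and checks that the -1/0 form computes the same thing as the source expression.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What the front-end hands to the middle-end for ((a<b)&&(c<d))?x:y:
   ((((a<b)?-1:0) & ((c<d)?-1:0)) < 0) ? x : y, one lane at a time.  */
static void lowered_select (const int32_t *a, const int32_t *b,
                            const int32_t *c, const int32_t *d,
                            const int32_t *x, const int32_t *y,
                            int32_t *r, size_t n)
{
  assert (n == 0 || r != NULL);
  for (size_t i = 0; i < n; i++)
    {
      int32_t m1 = a[i] < b[i] ? -1 : 0;  /* (a<b)?-1:0 */
      int32_t m2 = c[i] < d[i] ? -1 : 0;  /* (c<d)?-1:0 */
      int32_t m = m1 & m2;                /* && on -1/0 lanes is a plain & */
      r[i] = m < 0 ? x[i] : y[i];         /* the trailing <0 select */
    }
}

/* The semantics the source code asks for, written directly.  */
static void direct_select (const int32_t *a, const int32_t *b,
                           const int32_t *c, const int32_t *d,
                           const int32_t *x, const int32_t *y,
                           int32_t *r, size_t n)
{
  assert (n == 0 || r != NULL);
  for (size_t i = 0; i < n; i++)
    r[i] = (a[i] < b[i] && c[i] < d[i]) ? x[i] : y[i];
}
```

Suppose a target's comparisons already produce -1/0 lanes and its select only inspects the MSB. On such a target, the ?-1:0 and the <0 in `lowered_select` are no-ops. This is the x86 situation described above.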

On other architectures, expecting the back-end to simplify everything does 
seem hard. But it isn't obvious how to handle it in the middle end either.
Some other forms we could imagine the middle-end producing:
(a<b)?(c<d)?x:y:y
or assuming that VECTOR_STORE_FLAG_VALUE is defined:
(((a<b)&(c<d))!=0)?x:y (back-end would remove the != 0 on altivec)
Both would require special code to happen.

But then how do we handle for instance sparc, where IIUC comparing 2 
vectors returns an integer, where bits 0, 1, etc of the integer represent 
true/false for the comparisons of elements 0, 1, etc of the vectors (as in 
vec_merge, but not constant)? Defining VECTOR_STORE_FLAG_VALUE is not 
possible since comparisons don't return a vector, but we would still want 
to compute a<b, c<d, and perform an AND of those 2 integers before calling 
the usual code for the selection.
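Below is a hedged sketch of the bitmask representation described above. The message itself says "IIUC", and the per-lane bit layout here is an assumption, not a description of the actual SPARC instructions. Helper names are hypothetical. The select follows RTL vec_merge, where a set bit takes the lane from the first operand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit i of the result is set iff a[i] < b[i].  */
static uint32_t cmp_lt_bits (const int32_t *a, const int32_t *b, size_t n)
{
  assert (n <= 32);
  uint32_t bits = 0;
  for (size_t i = 0; i < n; i++)
    if (a[i] < b[i])
      bits |= (uint32_t) 1 << i;
  return bits;
}

/* Selection driven by a bitmask, in the spirit of vec_merge:
   a set bit i takes x[i], a clear bit takes y[i].  */
static void select_bits (uint32_t bits, const int32_t *x, const int32_t *y,
                         int32_t *r, size_t n)
{
  assert (n <= 32);
  for (size_t i = 0; i < n; i++)
    r[i] = ((bits >> i) & 1) ? x[i] : y[i];
}

/* Expand the bitmask into the -1/0 vector form.  */
static void bits_to_mask (uint32_t bits, int32_t *m, size_t n)
{
  assert (n <= 32);
  for (size_t i = 0; i < n; i++)
    m[i] = ((bits >> i) & 1) ? -1 : 0;
}
```

In this model the AND happens on two integers rather than on two vectors, which is why a VECTOR_STORE_FLAG_VALUE-style definition does not fit. `bits_to_mask` shows the extra work needed to recover the -1/0 vector form.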

If we assume a -1/0 and MSB representation in the middle-end, the 
front-end could just pass ((a<b)&(c<d))?x:y to the middle-end. When 
moving to the back-end, "nothing" would happen on x86.

Comparing x86, neon and altivec, they all have comparisons that return a 
vector of -1 and 0. On the other hand, they have different selection 
instructions. x86 uses <0, altivec uses !=0 and neon has a bitwise select 
and thus requires exactly -1 or 0. It thus seems to me that we should 
decide in the middle-end that vector comparisons return vectors of -1
and 0. VEC_COND_EXPR is more complicated. We could for instance require
that it takes as first argument a vector of -1 and 0 (thus <0, !=0 and
the neon thing are equivalent). That would leave us to decide what the
expansion of vec_cond_expr passes to the targets when the first
argument is not a comparison: !=0, <0, ==-1 or something else (I vote
for <0 because of OpenCL). One issue is that targets wouldn't know
whether it was a dummy comparison that can safely be ignored, because
the compared operand is the result of logical operations on
comparisons (thus composed of -1 and 0), or a genuine comparison with
an arbitrary vector. So a new optimization would be needed (in the
back-end, I guess, or we would need an alternative instruction to
vcond) to detect whether a vector is a "signed boolean" vector.
We could instead say that vec_cond_expr really follows OpenCL's
semantics and looks at the MSB of each element. I am not sure that
would change much; it would mostly delay the appearance of <0 until
RTL expansion time (and thus make gimple slightly lighter).
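A per-lane model makes the equivalence concrete. It covers the three selection flavours attributed above to x86 (<0), AltiVec (!=0) and NEON (bitwise select). These are simplified models, not exact instruction semantics. Lanes are handled as raw `uint32_t` bit patterns so that the bitwise select is well defined. All names are hypothetical. `is_signed_boolean` is the kind of check the "signed boolean" detection mentioned above would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* "<0" flavour: only the most significant bit of the mask lane matters.  */
static uint32_t sel_msb (uint32_t m, uint32_t x, uint32_t y)
{
  return (m >> 31) ? x : y;
}

/* "!=0" flavour: any nonzero mask lane selects X.  */
static uint32_t sel_nonzero (uint32_t m, uint32_t x, uint32_t y)
{
  return m != 0 ? x : y;
}

/* Bitwise select: each result bit comes from X where the mask bit is 1
   and from Y where it is 0, so only -1/0 lanes give a clean per-lane pick.  */
static uint32_t sel_bitwise (uint32_t m, uint32_t x, uint32_t y)
{
  return (m & x) | (~m & y);
}

/* True iff every lane is all-ones (-1) or zero.  */
static int is_signed_boolean (const uint32_t *m, size_t n)
{
  assert (n == 0 || m != NULL);
  for (size_t i = 0; i < n; i++)
    if (m[i] != 0 && m[i] != UINT32_MAX)
      return 0;
  return 1;
}
```

With a -1/0 mask all three flavours agree. With a lane such as 1 or 0xFFFF0000 they diverge, so the contract for the first operand of VEC_COND_EXPR has to be written down somewhere.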

>>>> +/* Return true if EXPR is an integer constant representing true.  */
>>>> +
>>>> +bool
>>>> +integer_truep (const_tree expr)
>>>> +{
>>>> +  STRIP_NOPS (expr);
>>>> +
>>>> +  switch (TREE_CODE (expr))
>>>> +    {
>>>> +    case INTEGER_CST:
>>>> +      /* Do not just test != 0, some places expect the value 1.  */
>>>> +      return (TREE_INT_CST_LOW (expr) == 1
>>>> +             && TREE_INT_CST_HIGH (expr) == 0);
>>>
>>>
>>> I wonder if using STORE_FLAG_VALUE is better here (note that it
>>> usually differs for FP vs. integral comparisons and the mode passed
>>> to STORE_FLAG_VALUE is that of the comparison result).
>>
>>
>> I notice there is already a VECTOR_STORE_FLAG_VALUE (used only once in
>> simplify-rtx, in a way that seems a bit strange but I'll try to
>> understand that later). Thanks for showing me this macro, it seems
>> important indeed. However the STORE_FLAG_VALUE mechanism seems to be for
>> the RTL level.
>>
>> It looks like it would be possible to have 3 different semantics:
>> source code is OpenCL, middle-end whatever we want (0 / 1 for instance),
>> and back-end is whatever the target wants. The front-end would generate
>> for a<b : vec_cond_expr(a<b,-1,0)
>
> seems like the middle-end uses this for lowering vector compares,
> a < b -> { a[0] < b[0] ? -1 : 0, ... }
>
>> and for a?b:c : vec_cond_expr(a<0,b,c)
>
> it looks like ?: is not generally handled by tree-vect-generic, so it must
> be either not supported by the frontend or lowered therein (ISTR
> it is forced to appear as a != {0,...} ? ... : ...)

Not supported by the front-end yet (not even by the gimplifier), I have 
(bad) patches but I can't really finish them before this conversation is 
done.

I think there are quite few places in the middle-end that assume that 
comparisons return a vector of -1/0 and even fewer that vec_cond_expr only 
looks at the MSB of each element. So there is still time to change that if 
you want to. But if we want to change it, I think it should happen now 
before even more vector code gets in (not particularly my patches, I am 
thinking of cilk and others too).

Ok, that's long enough, I need to send it now...

Richard Biener Oct. 8, 2012, 9:04 a.m. UTC | #5

On Fri, Oct 5, 2012 at 5:01 PM, Marc Glisse <marc.glisse@inria.fr> wrote:
> [I am still a little confused, sorry for the long email...]
>
>
> On Tue, 2 Oct 2012, Richard Guenther wrote:
>
>>>>> +  if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
>>>>> +    {
>>>>> +      int count = VECTOR_CST_NELTS (op0);
>>>>> +      tree *elts =  XALLOCAVEC (tree, count);
>>>>> +      gcc_assert (TREE_CODE (type) == VECTOR_TYPE);
>>>>> +
>>>>> +      for (int i = 0; i < count; i++)
>>>>> +       {
>>>>> +         tree elem_type = TREE_TYPE (type);
>>>>> +         tree elem0 = VECTOR_CST_ELT (op0, i);
>>>>> +         tree elem1 = VECTOR_CST_ELT (op1, i);
>>>>> +
>>>>> +         elts[i] = fold_relational_const (code, elem_type,
>>>>> +                                          elem0, elem1);
>>>>> +
>>>>> +         if(elts[i] == NULL_TREE)
>>>>> +           return NULL_TREE;
>>>>> +
>>>>> +         elts[i] = fold_negate_const (elts[i], elem_type);
>>>>
>>>>
>>>>
>>>> I think you need to invent something new similar to STORE_FLAG_VALUE
>>>> or use STORE_FLAG_VALUE here.  With the above you try to map
>>>> {0, 1} to {0, -1} which is only true if the operation on the element
>>>> types returns {0, 1} (thus, STORE_FLAG_VALUE is 1).
>>>
>>>
>>> Er, seems to me that constant folding of a scalar comparison in the
>>> front/middle-end only returns {0, 1}.
>
> [and later]
>
>> I'd say adjust your fold-const patch to not negate the scalar result
>> but build a proper -1 / 0 value based on integer_zerop().
>
>
> I don't mind doing it that way, but I would like to understand first.
> LT_EXPR on scalars is guaranteed (in generic.texi) to be 0 or 1. So negating
> should be the same as testing with integer_zerop to build -1 or 0. Is it
> just a matter of style (then I am ok), or am I missing a reason which makes
> the negation wrong?

Just a matter of style.  Negating is a lot less descriptive for the actual
set of return values we produce.
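The fold under discussion can be modelled in plain C. This is a hypothetical model, not GCC's tree API. The scalar fold returns 0 or 1, as LT_EXPR and friends do on scalars, and each lane is then turned into -1 or 0. Both lane builders give identical results for 0/1 inputs. The zero test simply spells out the possible outputs, which is the stylistic point made above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum cmp_code { CMP_LT, CMP_LE, CMP_GT, CMP_GE, CMP_EQ, CMP_NE };

/* Scalar relational folding: returns 0 or 1.  */
static int fold_scalar_cmp (enum cmp_code code, int32_t a, int32_t b)
{
  switch (code)
    {
    case CMP_LT: return a < b;
    case CMP_LE: return a <= b;
    case CMP_GT: return a > b;
    case CMP_GE: return a >= b;
    case CMP_EQ: return a == b;
    case CMP_NE: return a != b;
    }
  assert (0 && "unknown comparison code");
  return 0;
}

/* Style 1: negate the 0/1 result (0 -> 0, 1 -> -1).  */
static int32_t lane_by_negation (int t) { return -t; }

/* Style 2: build -1 or 0 explicitly from a zero test.  */
static int32_t lane_by_zero_test (int t) { return t == 0 ? 0 : -1; }

/* Fold a comparison of two constant vectors lane by lane.  */
static void fold_vector_cmp (enum cmp_code code, const int32_t *a,
                             const int32_t *b, int32_t *r, size_t n)
{
  assert (n == 0 || (a != NULL && b != NULL && r != NULL));
  for (size_t i = 0; i < n; i++)
    r[i] = lane_by_zero_test (fold_scalar_cmp (code, a[i], b[i]));
}
```

Folding {1, 2, 3, 4} < {4, 3, 2, 1} this way yields {-1, -1, 0, 0}, the same value the hardware computes at run time.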

>> The point is we need to define some semantics for vector comparison
>> results.
>
>
> Yes. I think a documentation patch should come first: generic.texi is
> missing an entry for VEC_COND_EXPR and the entry for LT_EXPR doesn't mention
> vectors. But before that we need to decide what to put there...
>
>
>> One variant is to make it target independent which in turn
>> would inhibit (or make it more difficult) to exploit some target features.
>> You for example use {0, -1} for truth values - probably to exploit target
>> features -
>
>
> Actually it was mostly because that is the meaning in the language. OpenCL
> says that a<b is a vector of 0 and -1, and that ?: only looks at the MSB of
> the elements in the condition. The fact that it matches what some targets do
> is a simple consequence of the fact that OpenCL was based on what hardware
> already did.

Yes, it seems that the {0, -1} choice is most reasonable for GENERIC.  So
let's document that.

>
>> even though the most natural middle-end way would be to
>> use {0, 1} as for everything else
>
>
> I agree that it would be natural and convenient in a number of places.
>
>
>> (caveat: there may be both signed and unsigned bools, we don't allow
>> vector components with non-mode precision, thus you could argue that a
>> signed bool : 1 is just "sign-extended" for your solution).
>
>
> Not sure how that would translate into code.
>
>
>> A different variant is to make it target dependent to leverage
>> optimization opportunities
>
>
> That's an interesting possibility...
>
>
>> that's why STORE_FLAG_VALUE exists.
>
>
> AFAICS it only appears when we go from gimple to rtl, not before (and there
> is already a VECTOR_STORE_FLAG_VALUE, although no target defines it). Which
> doesn't mean we couldn't make it appear earlier for vectors.
>
>
>> For example with vector comparisons a < v result, when
>> performing bitwise operations on it, you either have to make the target
>> expand code to produce {0, -1} even if the natural compare instruction
>> would, say, produce {0, 0x80000} - or not constrain the possible values
>> of its result (like forwprop would do with your patch).  In general we
>> want constant folding to yield the same results as if the HW carried
>> out the operation to make -O0 code not diverge from -O1.  Thus,
>>
>> typedef int v4si __attribute__ ((vector_size (16)));
>> v4si g;
>> int main() { g = (v4si){ 1, 2, 3, 4 } < (v4si){ 4, 3, 2, 1 }; }
>>
>> should not assign different values to g dependent on constant propagation
>> performed or not.
>
>
> That one is clear, OpenCL constrains the answer to be {-1,-1,0,0}, whether
> your target likes it or not. Depending on how things are handled,
> comparisons could be constrained internally to only appear (possibly
> indirectly) in the first argument of a vec_cond_expr.

Yes, I realized that later.

>
>> The easiest way out is something like STORE_FLAG_VALUE
>> if there does not exist a middle-end choice for vector true / false
>> components that can be easily generated from what the target produces.
>>
>> Like if you perform a FP comparison
>>
>> int main () { double x = 1.0; static _Bool b; b = x < 3.0; }
>>
>> you get without CCP on x86_64:
>>
>>        ucomisd -8(%rbp), %xmm0
>>        seta    %al
>>        movb    %al, b.1715(%rip)
>>
>> thus the equivalent of
>>
>>    flag_reg = x < 3.0;
>>    b = flag_reg ? 1 : 0;
>
>
> where this expansion happens in the back-end.

In the target-specific expander for the comparison.

>
>> for vector compares you get something similar:
>>
>>    flag_vec = x < y;
>>    res = flag_vec ? { 1, ... } : { 0, ... };
>>
>> which I think you can see being produced by generic vector lowering
>> (in do_compare).  Where I can see we indeed use {0, -1} ... which
>> would match your constant folding behavior.
>>
>> We may not be able to easily recover from this intermediate step
>> with combine (I'm not sure), so a target dependent value may
>> be preferred.
>
>
> Being able to optimize it is indeed a key point. Let's try on an example
> (not assuming any specific representation in the middle-end for now). Say I
> write this C/OpenCL code: ((a<b)&&(c<d))?x:y (not currently supported)
>
> The front-end gives to the middle-end: ((((a<b)?-1:0)&((c<d)?-1:0))<0)?x:y
>
> On an architecture like sse, neon or altivec where VECTOR_STORE_FLAG_VALUE
> is -1 (well, should be), expansion of (a<b)?-1:0 would just be a<b. The <0
> can also disappear if the vcond instruction only looks at the MSB (x86). And
> we are left in the back-end with ((a<b)&(c<d))?x:y, as desired.
>
> On other architectures, expecting the back-end to simplify everything does
> seem hard. But it isn't obvious how to handle it in the middle end either.
> Some other forms we could imagine the middle-end producing:
> (a<b)?(c<d)?x:y:y
> or assuming that VECTOR_STORE_FLAG_VALUE is defined:
> (((a<b)&(c<d))!=0)?x:y (back-end would remove the != 0 on altivec)
> Both would require special code to happen.

True.

> But then how do we handle for instance sparc, where IIUC comparing 2 vectors
> returns an integer, where bits 0, 1, etc of the integer represent true/false
> for the comparisons of elements 0, 1, etc of the vectors (as in vec_merge,
> but not constant)? Defining VECTOR_STORE_FLAG_VALUE is not possible since
> comparisons don't return a vector, but we would still want to compute a<b,
> c<d, and perform an AND of those 2 integers before calling the usual code
> for the selection.

Yeah ... :/

>
> If we assume a -1/0 and MSB representation in the middle-end, the front-end
> could just pass ((a<b)&(c<d))?x:y to the middle-end. When moving to the
> back-end, "nothing" would happen on x86.

But the frontend needs to follow a language standard (which seems to
be OpenCL for C).  It could of course see that the conditions are only
used in a COND_EXPR and try to optimize that.

> Comparing x86, neon and altivec, they all have comparisons that return a
> vector of -1 and 0. On the other hand, they have different selection
> instructions. x86 uses <0, altivec uses !=0 and neon has a bitwise select
> and thus requires exactly -1 or 0. It thus seems to me that we should decide
> in the middle-end that vector comparisons return vectors of -1 and 0.

Yes, I think -1 and 0 are indeed the best choice.

> VEC_COND_EXPR is more complicated. We could for instance require that
> it takes as first argument a vector of -1 and 0 (thus <0, !=0 and the
> neon thing are equivalent). That would leave us to decide what the
> expansion of vec_cond_expr passes to the targets when the first
> argument is not a comparison: !=0, <0, ==-1 or something else (I vote
> for <0 because of OpenCL). One issue is that targets wouldn't know
> whether it was a dummy comparison that can safely be ignored, because
> the compared operand is the result of logical operations on
> comparisons (thus composed of -1 and 0), or a genuine comparison with
> an arbitrary vector. So a new optimization would be needed (in the
> back-end, I guess, or we would need an alternative instruction to
> vcond) to detect whether a vector is a "signed boolean" vector.
> We could instead say that vec_cond_expr really follows OpenCL's
> semantics and looks at the MSB of each element. I am not sure that
> would change much; it would mostly delay the appearance of <0 until
> RTL expansion time (and thus make gimple slightly lighter).

I think we should delay the decision on how to optimize this.  It's indeed
not trivial and the GIMPLE middle-end aggressively forwards feeding
comparisons into the VEC_COND_EXPR expressions already (somewhat
defeating any CSE that might be possible here) in forwprop.

>
>
>>>>> +/* Return true if EXPR is an integer constant representing true.  */
>>>>> +
>>>>> +bool
>>>>> +integer_truep (const_tree expr)
>>>>> +{
>>>>> +  STRIP_NOPS (expr);
>>>>> +
>>>>> +  switch (TREE_CODE (expr))
>>>>> +    {
>>>>> +    case INTEGER_CST:
>>>>> +      /* Do not just test != 0, some places expect the value 1.  */
>>>>> +      return (TREE_INT_CST_LOW (expr) == 1
>>>>> +             && TREE_INT_CST_HIGH (expr) == 0);
>>>>
>>>>
>>>>
>>>> I wonder if using STORE_FLAG_VALUE is better here (note that it
>>>> usually differs for FP vs. integral comparisons and the mode passed
>>>> to STORE_FLAG_VALUE is that of the comparison result).
>>>
>>>
>>>
>>> I notice there is already a VECTOR_STORE_FLAG_VALUE (used only once in
>>> simplify-rtx, in a way that seems a bit strange but I'll try to
>>> understand that later). Thanks for showing me this macro, it seems
>>> important indeed. However the STORE_FLAG_VALUE mechanism seems to be for
>>> the RTL level.
>>>
>>> It looks like it would be possible to have 3 different semantics:
>>> source code is OpenCL, middle-end whatever we want (0 / 1 for instance),
>>> and back-end is whatever the target wants. The front-end would generate
>>> for a<b : vec_cond_expr(a<b,-1,0)
>>
>>
>> seems like the middle-end uses this for lowering vector compares,
>> a < b -> { a[0] < b[0] ? -1 : 0, ... }
>>
>>> and for a?b:c : vec_cond_expr(a<0,b,c)
>>
>>
>> it looks like ?: is not generally handled by tree-vect-generic, so it must
>> be either not supported by the frontend or lowered therein (ISTR
>> it is forced to appear as a != {0,...} ? ... : ...)
>
>
> Not supported by the front-end yet (not even by the gimplifier), I have
> (bad) patches but I can't really finish them before this conversation is
> done.
>
>
>
> I think there are quite few places in the middle-end that assume that
> comparisons return a vector of -1/0 and even fewer that vec_cond_expr only
> looks at the MSB of each element. So there is still time to change that if you
> want to. But if we want to change it, I think it should happen now before
> even more vector code gets in (not particularly my patches, I am thinking of
> cilk and others too).

I think we should document the -1/0 fact and stick to it.

Thanks,
Richard.

>
> Ok, that's long enough, I need to send it now...
>
> --
> Marc Glisse

Marc Glisse Oct. 8, 2012, 9:34 a.m. UTC | #6

On Mon, 8 Oct 2012, Richard Guenther wrote:

>> VEC_COND_EXPR is more complicated. We could for instance require that
>> it takes as first argument a vector of -1 and 0 (thus <0, !=0 and the
>> neon thing are equivalent). That would leave us to decide what the
>> expansion of vec_cond_expr passes to the targets when the first
>> argument is not a comparison: !=0, <0, ==-1 or something else (I vote
>> for <0 because of OpenCL). One issue is that targets wouldn't know
>> whether it was a dummy comparison that can safely be ignored, because
>> the compared operand is the result of logical operations on
>> comparisons (thus composed of -1 and 0), or a genuine comparison with
>> an arbitrary vector. So a new optimization would be needed (in the
>> back-end, I guess, or we would need an alternative instruction to
>> vcond) to detect whether a vector is a "signed boolean" vector.
>> We could instead say that vec_cond_expr really follows OpenCL's
>> semantics and looks at the MSB of each element. I am not sure that
>> would change much; it would mostly delay the appearance of <0 until
>> RTL expansion time (and thus make gimple slightly lighter).
>
> I think we should delay the decision on how to optimize this.  It's indeed
> not trivial and the GIMPLE middle-end aggressively forwards feeding
> comparisons into the VEC_COND_EXPR expressions already (somewhat
> defeating any CSE that might be possible here) in forwprop.

Thanks for going through the long email :-)

What does that imply for the first argument of VEC_COND_EXPR? Currently, 
the expander asserts that it is a comparison, but that is not reflected in 
the gimple checkers.

If we document that VEC_COND_EXPR takes a vector of -1 and 0 (which is the 
case for a comparison), I don't think it prevents us from later relaxing that 
to <0 or !=0. But then I don't know how to handle expansion when the 
argument is neither a comparison (vcond) nor a constant (vec_merge? I 
haven't tried but that should be doable), I would have to pass <0 or !=0 
to the target. So is the best choice to document that VEC_COND_EXPR takes 
as first argument a comparison and make gimple checking reflect that? 
(seems sad, but at least that would tell me what I can/can't do)

By the way, since we are documenting comparisons as returning 0 and -1, 
does that bring back the integer_truep predicate?

Richard Biener Oct. 8, 2012, 9:44 a.m. UTC | #7

On Mon, Oct 8, 2012 at 11:34 AM, Marc Glisse <marc.glisse@inria.fr> wrote:
> On Mon, 8 Oct 2012, Richard Guenther wrote:
>
>>> VEC_COND_EXPR is more complicated. We could for instance require that
>>> it takes as first argument a vector of -1 and 0 (thus <0, !=0 and the
>>> neon thing are equivalent). That would leave us to decide what the
>>> expansion of vec_cond_expr passes to the targets when the first
>>> argument is not a comparison: !=0, <0, ==-1 or something else (I vote
>>> for <0 because of OpenCL). One issue is that targets wouldn't know
>>> whether it was a dummy comparison that can safely be ignored, because
>>> the compared operand is the result of logical operations on
>>> comparisons (thus composed of -1 and 0), or a genuine comparison with
>>> an arbitrary vector. So a new optimization would be needed (in the
>>> back-end, I guess, or we would need an alternative instruction to
>>> vcond) to detect whether a vector is a "signed boolean" vector.
>>> We could instead say that vec_cond_expr really follows OpenCL's
>>> semantics and looks at the MSB of each element. I am not sure that
>>> would change much; it would mostly delay the appearance of <0 until
>>> RTL expansion time (and thus make gimple slightly lighter).
>>
>>
>> I think we should delay the decision on how to optimize this.  It's indeed
>> not trivial and the GIMPLE middle-end aggressively forwards feeding
>> comparisons into the VEC_COND_EXPR expressions already (somewhat
>> defeating any CSE that might be possible here) in forwprop.
>
>
> Thanks for going through the long email :-)
>
> What does that imply for the first argument of VEC_COND_EXPR? Currently, the
> expander asserts that it is a comparison, but that is not reflected in the
> gimple checkers.

And I don't think we should reflect that in the gimple checkers, but rather fix up
the expander (transparently use p != 0 or p < 0).
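The expander fix-up suggested here can be sketched under one assumption: the target only offers a comparison-based select, modelled as a hypothetical `vcond_model`. A non-comparison condition p is then rewritten as p < 0 or p != 0 against a zero vector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum cmp_code { CMP_LT, CMP_NE };

/* Model of a vcond-style pattern: r[i] = (op0[i] CODE op1[i]) ? x[i] : y[i].  */
static void vcond_model (enum cmp_code code, const int32_t *op0,
                         const int32_t *op1, const int32_t *x,
                         const int32_t *y, int32_t *r, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      int t = code == CMP_LT ? op0[i] < op1[i] : op0[i] != op1[i];
      r[i] = t ? x[i] : y[i];
    }
}

/* VEC_COND_EXPR <p, x, y> where P is a mask rather than a comparison:
   rewrite the condition as (p CODE {0,...}), with CODE being CMP_LT
   ("p < 0") or CMP_NE ("p != 0"), and reuse the comparison pattern.  */
static void expand_mask_cond (enum cmp_code code, const int32_t *p,
                              const int32_t *x, const int32_t *y,
                              int32_t *r, size_t n)
{
  enum { MAX_LANES = 16 };
  static const int32_t zero[MAX_LANES];   /* the {0,...} operand */
  assert (n <= MAX_LANES);
  vcond_model (code, p, zero, x, y, r, n);
}
```

For a -1/0 mask the two choices are interchangeable. They only differ for lanes such as 1, which the -1/0 convention for comparison results rules out.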

> If we document that VEC_COND_EXPR takes a vector of -1 and 0 (which is the
> case for a comparison), I don't think it prevents us from later relaxing that
> to <0 or !=0. But then I don't know how to handle expansion when the
> argument is neither a comparison (vcond) nor a constant (vec_merge? I
> haven't tried but that should be doable), I would have to pass <0 or !=0 to
> the target.

Yes.

> So is the best choice to document that VEC_COND_EXPR takes as
> first argument a comparison and make gimple checking reflect that? (seems
> sad, but at least that would tell me what I can/can't do)

No, that would just mean that in GIMPLE you'd add this p != 0 or p < 0.
And at some point in the future I really really want to push this embedded
expression to a separate statement so I have an SSA definition for it.

> By the way, since we are documenting comparisons as returning 0 and -1, does
> that bring back the integer_truep predicate?

Not sure, true would still be != 0 or all_onesp (all bits of the
precision are 1), no?

Richard.

> --
> Marc Glisse

Marc Glisse Oct. 10, 2012, 11:20 p.m. UTC | #8

On Mon, 8 Oct 2012, Richard Guenther wrote:

> On Mon, Oct 8, 2012 at 11:34 AM, Marc Glisse <marc.glisse@inria.fr> wrote:
>> On Mon, 8 Oct 2012, Richard Guenther wrote:
>>
>>>> VEC_COND_EXPR is more complicated. We could for instance require that
>>>> it takes as first argument a vector of -1 and 0 (thus <0, !=0 and the
>>>> neon thing are equivalent). That would leave us to decide what the
>>>> expansion of vec_cond_expr passes to the targets when the first
>>>> argument is not a comparison: !=0, <0, ==-1 or something else (I vote
>>>> for <0 because of OpenCL). One issue is that targets wouldn't know
>>>> whether it was a dummy comparison that can safely be ignored, because
>>>> the compared operand is the result of logical operations on
>>>> comparisons (thus composed of -1 and 0), or a genuine comparison with
>>>> an arbitrary vector. So a new optimization would be needed (in the
>>>> back-end, I guess, or we would need an alternative instruction to
>>>> vcond) to detect whether a vector is a "signed boolean" vector.
>>>> We could instead say that vec_cond_expr really follows OpenCL's
>>>> semantics and looks at the MSB of each element. I am not sure that
>>>> would change much; it would mostly delay the appearance of <0 until
>>>> RTL expansion time (and thus make gimple slightly lighter).
>>>
>>>
>>> I think we should delay the decision on how to optimize this.  It's indeed
>>> not trivial and the GIMPLE middle-end aggressively forwards feeding
>>> comparisons into the VEC_COND_EXPR expressions already (somewhat
>>> defeating any CSE that might be possible here) in forwprop.
>>
>>
>> Thanks for going through the long email :-)
>>
>> What does that imply for the first argument of VEC_COND_EXPR? Currently, the
>> expander asserts that it is a comparison, but that is not reflected in the
>> gimple checkers.
>
> And I don't think we should reflect that in the gimple checkers, but rather fix up
> the expander (transparently use p != 0 or p < 0).

I guess I'll pick p < 0 then (just because I am more interested in x86 and 
it makes the optimization easier on x86). Having an expander other than 
vcond (one that takes the mask directly instead of a comparison, and for 
which we promise that the argument will be a vector of -1/0) would be 
convenient...

>> So is the best choice to document that VEC_COND_EXPR takes as
>> first argument a comparison and make gimple checking reflect that? (seems
>> sad, but at least that would tell me what I can/can't do)
>
> No, that would just mean that in GIMPLE you'd add this p != 0 or p < 0.
> And at some point in the future I really really want to push this embedded
> expression to a separate statement so I have an SSA definition for it.

Once the expander is ready to accept it, ok. It seems to me that the 
scalar COND_EXPR may also have an embedded expression, so I assume 
COND_EXPR and VEC_COND_EXPR are meant to diverge (or maybe you also want 
to do the same for COND_EXPR?).

>> By the way, since we are documenting comparisons as returning 0 and -1, does
>> that bring back the integer_truep predicate?
>
> Not sure, true would still be != 0 or all_onesp (all bits of the
> precision are 1), no?

I was going to make truep equivalent to onep for scalars and all_onesp for 
vectors (since -1 will be the only value documented as "true" for 
vectors). I guess it can wait, I can manually inline it for now.
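The truep semantics proposed here can be sketched on raw integer bit patterns instead of GCC trees: onep for a scalar, and all_onesp (all bits of the precision set) for every lane of a vector. All names are hypothetical. The functions only mimic the GCC predicates with similar names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mask covering the low PREC bits of a 64-bit value.  */
static uint64_t low_bits_mask (unsigned prec)
{
  assert (prec >= 1 && prec <= 64);
  return prec == 64 ? UINT64_MAX : ((uint64_t) 1 << prec) - 1;
}

/* all_onesp-style test: are all PREC bits of V set?  */
static bool all_ones_p (uint64_t v, unsigned prec)
{
  uint64_t mask = low_bits_mask (prec);
  return (v & mask) == mask;
}

/* Scalar "true": exactly 1 within PREC bits (onep).  */
static bool scalar_truep (uint64_t v, unsigned prec)
{
  return (v & low_bits_mask (prec)) == 1;
}

/* Vector "true": every lane is -1, i.e. all ones within PREC bits.  */
static bool vector_truep (const uint64_t *lanes, size_t n, unsigned prec)
{
  for (size_t i = 0; i < n; i++)
    if (!all_ones_p (lanes[i], prec))
      return false;
  return true;
}
```

Under this definition a 32-bit lane value of 0xFFFFFFFF is "true" for vectors but not for scalars, where only 1 qualifies.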

Since we are documenting that comparisons of vectors return -1 and 0 in 
the middle-end, I was wondering whether the comparison expanders would 
need updating so they forward to vcond(...,-1,0), at least on platforms 
that don't define VECTOR_STORE_FLAG_VALUE to constm1_rtx for this mode. 
But a simple test on sparc shows it is already fine :-)

Richard Biener Oct. 11, 2012, 12:54 p.m. UTC | #9

On Thu, Oct 11, 2012 at 1:20 AM, Marc Glisse <marc.glisse@inria.fr> wrote:
> On Mon, 8 Oct 2012, Richard Guenther wrote:
>
>> On Mon, Oct 8, 2012 at 11:34 AM, Marc Glisse <marc.glisse@inria.fr> wrote:
>>>
>>> On Mon, 8 Oct 2012, Richard Guenther wrote:
>>>
>>>>> VEC_COND_EXPR is more complicated. We could for instance require that
>>>>> it takes as first argument a vector of -1 and 0 (thus <0, !=0 and the
>>>>> neon thing are equivalent). That would leave us to decide what the
>>>>> expansion of vec_cond_expr passes to the targets when the first
>>>>> argument is not a comparison: !=0, <0, ==-1 or something else (I vote
>>>>> for <0 because of OpenCL). One issue is that targets wouldn't know
>>>>> whether it was a dummy comparison that can safely be ignored, because
>>>>> the compared operand is the result of logical operations on
>>>>> comparisons (thus composed of -1 and 0), or a genuine comparison with
>>>>> an arbitrary vector. So a new optimization would be needed (in the
>>>>> back-end, I guess, or we would need an alternative instruction to
>>>>> vcond) to detect whether a vector is a "signed boolean" vector.
>>>>> We could instead say that vec_cond_expr really follows OpenCL's
>>>>> semantics and looks at the MSB of each element. I am not sure that
>>>>> would change much; it would mostly delay the appearance of <0 until
>>>>> RTL expansion time (and thus make gimple slightly lighter).
>>>>
>>>>
>>>>
>>>> I think we should delay the decision on how to optimize this.  It's
>>>> indeed not trivial and the GIMPLE middle-end aggressively forwards
>>>> feeding comparisons into the VEC_COND_EXPR expressions already
>>>> (somewhat defeating any CSE that might be possible here) in forwprop.
>>>
>>>
>>>
>>> Thanks for going through the long email :-)
>>>
>>> What does that imply for the first argument of VEC_COND_EXPR? Currently,
>>> the expander asserts that it is a comparison, but that is not reflected
>>> in the gimple checkers.
>>
>>
>> And I don't think we should reflect that in the gimple checkers, but
>> rather fix up the expander (transparently use p != 0 or p < 0).
>
>
> I guess I'll pick p < 0 then (just because I am more interested in x86 and
> it makes the optimization easier on x86). Having an expander other than vcond
> (one that takes the mask directly instead of a comparison, and for which we
> promise that the argument will be a vector of -1/0) would be convenient...
>
>
>>> So is the best choice to document that VEC_COND_EXPR takes as
>>> first argument a comparison and make gimple checking reflect that? (seems
>>> sad, but at least that would tell me what I can/can't do)
>>
>>
>> No, that would just mean that in GIMPLE you'd add this p != 0 or p < 0.
>> And at some point in the future I really really want to push this embedded
>> expression to a separate statement so I have an SSA definition for it.
>
>
> Once the expander is ready to accept it, ok. It seems to me that the scalar
> COND_EXPR may also have an embedded expression, so I assume COND_EXPR and
> VEC_COND_EXPR are meant to diverge (or maybe you also want to do the same
> for COND_EXPR?).

Yes, I want the same for COND_EXPR and even GIMPLE_COND.  I had
patches to do this about two years ago but was too lazy to fix up all the
fallout.  My plan was to eventually return to this and first tackle COND_EXPR
and VEC_COND_EXPR only, leaving GIMPLE_COND in place.

>>> By the way, since we are documenting comparisons as returning 0 and -1,
>>> does that bring back the integer_truep predicate?
>>
>>
>> Not sure, true would still be != 0 or all_onesp (all bits of the
>> precision are 1), no?
>
>
> I was going to make truep equivalent to onep for scalars and all_onesp for
> vectors (since -1 will be the only value documented as "true" for vectors).
> I guess it can wait, I can manually inline it for now.

Yes please.

> Since we are documenting that comparisons of vectors return -1 and 0 in the
> middle-end, I was wondering whether the comparison expanders would need
> updating so they forward to vcond(...,-1,0), at least on platforms that
> don't define VECTOR_STORE_FLAG_VALUE to constm1_rtx for this mode. But a
> simple test on sparc shows it is already fine :-)

Heh, good.

Richard.

> --
> Marc Glisse
