diff mbox

Fix fold making valid VEC_PERMs invalid

Message ID alpine.DEB.2.02.1411021029500.15070@stedding.saclay.inria.fr
State New
Headers show

Commit Message

Marc Glisse Nov. 2, 2014, 10:08 a.m. UTC
On Thu, 30 Oct 2014, Richard Biener wrote:

> On Thu, 30 Oct 2014, Jakub Jelinek wrote:
>
>> On Thu, Oct 30, 2014 at 09:56:32AM +0100, Richard Biener wrote:
>>>
>>> The following patch makes fold_ternary no longer make
>>> VEC_PERMs valid for the target invalid.  As pointed out
>>> in the PR we only need to make sure this doesn't happen
>>> after vector lowering.
>>
>> Well, even if you do that before vector lowering, if the original
>> mask is fine and new one is not, you'd seriously pessimize code.
>
> Note that what fold does here isn't very elaborate - it just
> tries hard to make a two-input VEC_PERM a one-input one which
> is good for canonicalization and CSE.  I'd say that the
> optabs.c code should ideally be able to recover the working
> variant (or the target expanders should be more clever...).

It may be hard for optabs.c. The target expanders should really be fixed. 
This isn't the same as when we were talking of combining any permutations, 
here it is something that should be fairly easy to do. In that sense, 
simply avoiding the ICE (at least in release mode) and leaving 
optimization to the targets should be enough. Still, I am proposing 
something closer to Jakub's suggestion below.

>> How about moving the VEC_PERM_EXPR arg2 == VECTOR_CST folding
>> into a separate function with single_arg argument, call it with
>> operand_equal_p (op0, op1, 0) initially and call that function again
>> if single_arg and !can_vec_perm_p (...), that time with
>> single_arg parameter false?

If the original permutation is !can_vec_perm_p, I believe we should 
transform. That can indeed be done with a function. I did it without a 
function, but I can try to rewrite it if you want.

> Or how about removing the code instead and doing it during
> vector lowering if the original permute is not !can_vec_perm_p?

I'd rather not delay that much.


Here is a proposed patch that passed bootstrap+testsuite on 
x86_64-linux-gnu. I did not test on arm (or whichever platform it was that 
failed). If can_vec_perm_p is costly, we can test single_arg first 
(otherwise the 2 can_vec_perm_p must be the same) and test 
PROP_gimple_lvec before the second can_vec_perm_p (which must answer true 
then).

I don't remember what the arg2 == op2 test is about, so I kept it. I also 
didn't try to fix TREE_SIDE_EFFECTS handling, a quick test didn't trigger 
that issue.

2014-11-03  Marc Glisse  <marc.glisse@inria.fr>

 	* fold-const.c: Include "optabs.h".
 	(fold_ternary_loc) <VEC_PERM_EXPR>: Avoid canonicalizing a
 	can_vec_perm_p permutation to one that is not.

Comments

Marc Glisse Nov. 2, 2014, 10:43 a.m. UTC | #1
On Sun, 2 Nov 2014, Marc Glisse wrote:

> On Thu, 30 Oct 2014, Richard Biener wrote:
>
>> On Thu, 30 Oct 2014, Jakub Jelinek wrote:
>> 
>>> On Thu, Oct 30, 2014 at 09:56:32AM +0100, Richard Biener wrote:
>>>> 
>>>> The following patch makes fold_ternary no longer make
>>>> VEC_PERMs valid for the target invalid.  As pointed out
>>>> in the PR we only need to make sure this doesn't happen
>>>> after vector lowering.
>>> 
>>> Well, even if you do that before vector lowering, if the original
>>> mask is fine and new one is not, you'd seriously pessimize code.
>> 
>> Note that what fold does here isn't very elaborate - it just
>> tries hard to make a two-input VEC_PERM a one-input one which
>> is good for canonicalization and CSE.  I'd say that the
>> optabs.c code should ideally be able to recover the working
>> variant (or the target expanders should be more clever...).
>
> It may be hard for optabs.c. The target expanders should really be fixed. 
> This isn't the same as when we were talking of combining any permutations, 
> here it is something that should be fairly easy to do. In that sense, simply 
> avoiding the ICE (at least in release mode) and leaving optimization to the 
> targets should be enough. Still, I am proposing something closer to Jakub's 
> suggestion below.
>
>>> How about moving the VEC_PERM_EXPR arg2 == VECTOR_CST folding
>>> into a separate function with single_arg argument, call it with
>>> operand_equal_p (op0, op1, 0) initially and call that function again
>>> if single_arg and !can_vec_perm_p (...), that time with
>>> single_arg parameter false?
>
> If the original permutation is !can_vec_perm_p, I believe we should 
> transform. That can indeed be done with a function. I did it without a 
> function, but I can try to rewrite it if you want.
>
>> Or how about removing the code instead and doing it during
>> vector lowering if the original permute is not !can_vec_perm_p?
>
> I'd rather not delay that much.
>
>
> Here is a proposed patch that passed bootstrap+testsuite on x86_64-linux-gnu. 
> I did not test on arm (or whichever platform it was that failed). If 
> can_vec_perm_p is costly, we can test single_arg first (otherwise the 2 
> can_vec_perm_p must be the same) and test PROP_gimple_lvec before the second 
> can_vec_perm_p (which must answer true then).
>
> I don't remember what the arg2 == op2 test is about, so I kept it. I also 
> didn't try to fix TREE_SIDE_EFFECTS handling, a quick test didn't trigger 
> that issue.
>
> 2014-11-03  Marc Glisse  <marc.glisse@inria.fr>
>

 	PR tree-optimization/63666

> 	* fold-const.c: Include "optabs.h".
> 	(fold_ternary_loc) <VEC_PERM_EXPR>: Avoid canonicalizing a
> 	can_vec_perm_p permutation to one that is not.

(sorry for the incomplete ChangeLog in the previous email)
Jakub Jelinek Nov. 2, 2014, 11:21 a.m. UTC | #2
On Sun, Nov 02, 2014 at 11:08:57AM +0100, Marc Glisse wrote:
> @@ -14189,47 +14190,47 @@ fold_ternary_loc (location_t loc, enum t
>  	return fold_build2_loc (loc, PLUS_EXPR, type,
>  				const_binop (MULT_EXPR, arg0, arg1), arg2);
>        if (integer_zerop (arg2))
>  	return fold_build2_loc (loc, MULT_EXPR, type, arg0, arg1);
>  
>        return fold_fma (loc, type, arg0, arg1, arg2);
>  
>      case VEC_PERM_EXPR:
>        if (TREE_CODE (arg2) == VECTOR_CST)
>  	{
> -	  unsigned int nelts = TYPE_VECTOR_SUBPARTS (type), i, mask;
> +	  unsigned int nelts = TYPE_VECTOR_SUBPARTS (type), i, mask, mask2;
>  	  unsigned char *sel = XALLOCAVEC (unsigned char, nelts);
> +	  unsigned char *sel2 = XALLOCAVEC (unsigned char, nelts);

Can't you just XALLOCAVEC 2 * nelts and set sel2 = sel + nelts; ?

>  	  bool need_mask_canon = false;
> +	  bool need_mask_canon2 = false;
>  	  bool all_in_vec0 = true;
>  	  bool all_in_vec1 = true;
>  	  bool maybe_identity = true;
>  	  bool single_arg = (op0 == op1);
>  	  bool changed = false;
>  
>  	  mask = single_arg ? (nelts - 1) : (2 * nelts - 1);
> +	  mask2 = 2 * nelts - 1;

Perhaps mask2 = 2 * nelts - 1; first and use mask2 in mask = ?

Otherwise LGTM.

	Jakub
diff mbox

Patch

Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	(revision 217007)
+++ gcc/fold-const.c	(working copy)
@@ -75,20 +75,21 @@  along with GCC; see the file COPYING3.
 #include "gimple.h"
 #include "gimplify.h"
 #include "tree-dfa.h"
 #include "hash-table.h"  /* Required for ENABLE_FOLD_CHECKING.  */
 #include "builtins.h"
 #include "hash-map.h"
 #include "plugin-api.h"
 #include "ipa-ref.h"
 #include "cgraph.h"
 #include "generic-match.h"
+#include "optabs.h"
 
 /* Nonzero if we are folding constants inside an initializer; zero
    otherwise.  */
 int folding_initializer = 0;
 
 /* The following constants represent a bit based encoding of GCC's
    comparison operators.  This encoding simplifies transformations
    on relational comparison operators, such as AND and OR.  */
 enum comparison_code {
   COMPCODE_FALSE = 0,
@@ -14189,47 +14190,47 @@  fold_ternary_loc (location_t loc, enum t
 	return fold_build2_loc (loc, PLUS_EXPR, type,
 				const_binop (MULT_EXPR, arg0, arg1), arg2);
       if (integer_zerop (arg2))
 	return fold_build2_loc (loc, MULT_EXPR, type, arg0, arg1);
 
       return fold_fma (loc, type, arg0, arg1, arg2);
 
     case VEC_PERM_EXPR:
       if (TREE_CODE (arg2) == VECTOR_CST)
 	{
-	  unsigned int nelts = TYPE_VECTOR_SUBPARTS (type), i, mask;
+	  unsigned int nelts = TYPE_VECTOR_SUBPARTS (type), i, mask, mask2;
 	  unsigned char *sel = XALLOCAVEC (unsigned char, nelts);
+	  unsigned char *sel2 = XALLOCAVEC (unsigned char, nelts);
 	  bool need_mask_canon = false;
+	  bool need_mask_canon2 = false;
 	  bool all_in_vec0 = true;
 	  bool all_in_vec1 = true;
 	  bool maybe_identity = true;
 	  bool single_arg = (op0 == op1);
 	  bool changed = false;
 
 	  mask = single_arg ? (nelts - 1) : (2 * nelts - 1);
+	  mask2 = 2 * nelts - 1;
 	  gcc_assert (nelts == VECTOR_CST_NELTS (arg2));
 	  for (i = 0; i < nelts; i++)
 	    {
 	      tree val = VECTOR_CST_ELT (arg2, i);
 	      if (TREE_CODE (val) != INTEGER_CST)
 		return NULL_TREE;
 
 	      /* Make sure that the perm value is in an acceptable
 		 range.  */
 	      wide_int t = val;
-	      if (wi::gtu_p (t, mask))
-		{
-		  need_mask_canon = true;
-		  sel[i] = t.to_uhwi () & mask;
-		}
-	      else
-		sel[i] = t.to_uhwi ();
+	      need_mask_canon |= wi::gtu_p (t, mask);
+	      need_mask_canon2 |= wi::gtu_p (t, mask2);
+	      sel[i] = t.to_uhwi () & mask;
+	      sel2[i] = t.to_uhwi () & mask2;
 
 	      if (sel[i] < nelts)
 		all_in_vec1 = false;
 	      else
 		all_in_vec0 = false;
 
 	      if ((sel[i] & (nelts-1)) != i)
 		maybe_identity = false;
 	    }
 
@@ -14257,20 +14258,31 @@  fold_ternary_loc (location_t loc, enum t
 		  || TREE_CODE (op1) == CONSTRUCTOR))
 	    {
 	      tree t = fold_vec_perm (type, op0, op1, sel);
 	      if (t != NULL_TREE)
 		return t;
 	    }
 
 	  if (op0 == op1 && !single_arg)
 	    changed = true;
 
+	  /* Some targets are deficient and fail to expand a single
+	     argument permutation while still allowing an equivalent
+	     2-argument version.  */
+	  if (need_mask_canon && arg2 == op2
+	      && !can_vec_perm_p (TYPE_MODE (type), false, sel)
+	      && can_vec_perm_p (TYPE_MODE (type), false, sel2))
+	    {
+	      need_mask_canon = need_mask_canon2;
+	      sel = sel2;
+	    }
+
 	  if (need_mask_canon && arg2 == op2)
 	    {
 	      tree *tsel = XALLOCAVEC (tree, nelts);
 	      tree eltype = TREE_TYPE (TREE_TYPE (arg2));
 	      for (i = 0; i < nelts; i++)
 		tsel[i] = build_int_cst (eltype, sel[i]);
 	      op2 = build_vector (TREE_TYPE (arg2), tsel);
 	      changed = true;
 	    }