[RFC] Try vector<bool> as a new representation for vector masks

On 08 Sep 15:37, Ilya Enkovich wrote:
> 2015-09-04 23:42 GMT+03:00 Jeff Law <law@redhat.com>:
> >
> > So do we have enough confidence in this representation that we want to go
> > ahead and commit to it?
> 
> I think new representation fits nice mostly. There are some places
> where I have to make some exceptions for vector of bools to make it
> work. This is mostly to avoid target modifications. I'd like to avoid
> necessity to change all targets currently supporting vec_cond. It
> makes me add some special handling of vec<bool> in GIMPLE, e.g. I add
> a special code in vect_init_vector to build vec<bool> invariants with
> proper casting to int. Otherwise I'd need to do it on a target side.
> 
> I made several fixes and current patch (still allowing integer vector
> result for vector comparison and applying bool patterns) passes
> bootstrap and regression testing on x86_64. Now I'll try to fully
> switch to vec<bool> and see how it goes.
> 
> Thanks,
> Ilya
> 

Hi,

I made a step forward forcing vector comparisons have a mask (vec<bool>) result and disabling bool patterns in case vector comparison is supported by target.  Several issues were met.

 - c/c++ front-ends generate vector comparison with integer vector result.  I had to make some modifications to use vec_cond instead.  Don't know if there are other front-ends producing vector comparisons.
 - vector lowering fails to expand vector masks due to mismatch of type and mode sizes.  I fixed vector type size computation to match mode size and added a special handling of mask expand.
 - I disabled canonical type creation for vector mask because we can't layout it with VOID mode. I don't know why we may need a canonical type here.  But get_mask_mode call may be moved into type layout to get it.
 - Expand of vec<bool> constants/contstructors requires special handling.  Common case should require target hooks/optabs to expand vector into required mode.  But I suppose we want to have a generic code to handle vector of int mode case to avoid modification of multiple targets which use default vec<bool> modes.

Currently 'make check' shows two types of regression.
  - missed vector expression pattern recongnition (MIN, MAX, ABX, VEC_COND).  This must be due to my front-end changes.  Hope it will be easy to fix.
  - missed vectorization. All of them appear due to bool patterns disabling.  I didn't look into all of them but it seems the main problem is in mixed type sizes.  With bool patterns and integer vector masks we just put int->(other sized int) conversion for masks and it gives us required mask transformation.  With boolean mask we don't have a proper scalar statements to do that.  I think mask widening/narrowing may be directly supported in masked statements vectorization.  Going to look into it.

I attach what I currently have for a prototype.  It grows bigger so I split into several parts.

Thanks,
Ilya
--
* avx512-vec-bool-01-add-truth-vector.ChangeLog

2015-09-15  Ilya Enkovich  <enkovich.gnu@gmail.com>

	* doc/tm.texi: Regenerated.
	* doc/tm.texi.in (TARGET_VECTORIZE_GET_MASK_MODE): New.
	* stor-layout.c (layout_type): Use mode to get vector mask size.
	(vector_type_mode): Likewise.
	* target.def (get_mask_mode): New.
	* targhooks.c (default_vector_alignment): Use mode alignment
	for vector masks.
	(default_get_mask_mode): New.
	* targhooks.h (default_get_mask_mode): New.
	* tree.c (make_vector_type): Vector mask has no canonical type.
	(build_truth_vector_type): New.
	(build_same_sized_truth_vector_type): New.
	(truth_type_for): Support vector masks.
	* tree.h (VECTOR_MASK_TYPE_P): New.
	(build_truth_vector_type): New.
	(build_same_sized_truth_vector_type): New.

* avx512-vec-bool-02-no-int-vec-cmp.ChangeLog

gcc/

2015-09-15  Ilya Enkovich  <enkovich.gnu@gmail.com>

	* tree-cfg.c (verify_gimple_comparison) Require vector mask
	type for vector comparison.
	(verify_gimple_assign_ternary): Likewise.

gcc/c

2015-09-15  Ilya Enkovich  <enkovich.gnu@gmail.com>

	* c-typeck.c (build_conditional_expr): Use vector mask
	type for vector comparison.
	(build_vec_cmp): New.
	(build_binary_op): Use build_vec_cmp for comparison.

gcc/cp

2015-09-15  Ilya Enkovich  <enkovich.gnu@gmail.com>

	* call.c (build_conditional_expr_1): Use vector mask
	type for vector comparison.
	* typeck.c (build_vec_cmp): New.
	(cp_build_binary_op): Use build_vec_cmp for comparison.

* avx512-vec-bool-03-vec-lower.ChangeLog

2015-09-15  Ilya Enkovich  <enkovich.gnu@gmail.com>

	* tree-vect-generic.c (tree_vec_extract): Use additional
	comparison when extracting boolean value.
	(do_bool_compare): New.
	(expand_vector_comparison): Add casts for vector mask.
	(expand_vector_divmod): Use vector mask type for vector
	comparison.
	(expand_vector_operations_1) Skip scalar mode mask statements.

* avx512-vec-bool-04-vectorize.ChangeLog

gcc/

2015-09-15  Ilya Enkovich  <enkovich.gnu@gmail.com>

	* expr.c (do_store_flag): Use expand_vec_cmp_expr for mask results.
	(const_vector_mask_from_tree): New.
	(const_vector_from_tree): Use const_vector_mask_from_tree for vector
	masks.
	* internal-fn.c (expand_MASK_LOAD): Adjust to optab changes.
	(expand_MASK_STORE): Likewise.
	* optabs.c (vector_compare_rtx): Add OPNO arg.
	(expand_vec_cond_expr): Adjust to vector_compare_rtx change.
	(get_vec_cmp_icode): New.
	(expand_vec_cmp_expr_p): New.
	(expand_vec_cmp_expr): New.
	(can_vec_mask_load_store_p): Add MASK_MODE arg.
	* optabs.def (vec_cmp_optab): New.
	(vec_cmpu_optab): New.
	(maskload_optab): Transform into convert optab.
	(maskstore_optab): Likewise.
	* optabs.h (expand_vec_cmp_expr_p): New.
	(expand_vec_cmp_expr): New.
	(can_vec_mask_load_store_p): Add MASK_MODE arg.
	* tree-if-conv.c (ifcvt_can_use_mask_load_store): Adjust to
	can_vec_mask_load_store_p signature change.
	(predicate_mem_writes): Use boolean mask.
	* tree-vect-data-refs.c (vect_get_new_vect_var): Support vect_mask_var.
	(vect_create_destination_var): Likewise.
	* tree-vect-loop.c (vect_determine_vectorization_factor): Ignore mask
	operations for VF.  Add mask type computation.
	* tree-vect-stmts.c (vect_init_vector): Support mask invariants.
	(vect_get_vec_def_for_operand): Support mask constant.
	(vectorizable_mask_load_store): Adjust to can_vec_mask_load_store_p
	signature change.
	(vectorizable_condition): Use vector mask type for vector comparison.
	(vectorizable_comparison): New.
	(vect_analyze_stmt): Add vectorizable_comparison.
	(vect_transform_stmt): Likewise.
	(get_mask_type_for_scalar_type): New.
	* tree-vectorizer.h (enum stmt_vec_info_type): Add vect_mask_var
	(enum stmt_vec_info_type): Add comparison_vec_info_type.
	(get_mask_type_for_scalar_type): New.

* avx512-vec-bool-05-bool-patterns.ChangeLog

2015-09-15  Ilya Enkovich  <enkovich.gnu@gmail.com>

	* tree-vect-patterns.c (check_bool_pattern): Check fails
	if we can vectorize comparison directly.
	(search_type_for_mask): New.
	(vect_recog_bool_pattern): Support cases when bool pattern
	check fails.

* avx512-vec-bool-06-i386.ChangeLog

2015-09-15  Ilya Enkovich  <enkovich.gnu@gmail.com>

	* config/i386/i386-protos.h (ix86_expand_mask_vec_cmp): New.
	(ix86_expand_int_vec_cmp): New.
	(ix86_expand_fp_vec_cmp): New.
	* config/i386/i386.c (ix86_expand_sse_cmp): Allow NULL for
	op_true and op_false.
	(ix86_int_cmp_code_to_pcmp_immediate): New.
	(ix86_fp_cmp_code_to_pcmp_immediate): New.
	(ix86_cmp_code_to_pcmp_immediate): New.
	(ix86_expand_mask_vec_cmp): New.
	(ix86_expand_fp_vec_cmp): New.
	(ix86_expand_int_sse_cmp): New.
	(ix86_expand_int_vcond): Use ix86_expand_int_sse_cmp.
	(ix86_expand_int_vec_cmp): New.
	(ix86_get_mask_mode): New.
	(TARGET_VECTORIZE_GET_MASK_MODE): New.
	* config/i386/sse.md (avx512fmaskmodelower): New.
	(vec_cmp<mode><avx512fmaskmodelower>): New.
	(vec_cmp<mode><sseintvecmodelower>): New.
	(vec_cmpv2div2di): New.
	(vec_cmpu<mode><avx512fmaskmodelower>): New.
	(vec_cmpu<mode><sseintvecmodelower>): New.
	(vec_cmpuv2div2di): New.
	(maskload<mode>): Rename to ...
	(maskload<mode><sseintvecmodelower>): ... this.
	(maskstore<mode>): Rename to ...
	(maskstore<mode><sseintvecmodelower>): ... this.
	(maskload<mode><avx512fmaskmodelower>): New.
	(maskstore<mode><avx512fmaskmodelower>): New.

diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index e8c8189..6ea4f19 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -4753,6 +4753,18 @@ build_conditional_expr (location_t colon_loc, tree ifexp, bool ifexp_bcp,
 		       && TREE_CODE (orig_op2) == INTEGER_CST
 		       && !TREE_OVERFLOW (orig_op2)));
     }
+
+  /* Need to convert condition operand into a vector mask.  */
+  if (VECTOR_TYPE_P (TREE_TYPE (ifexp)))
+    {
+      tree vectype = TREE_TYPE (ifexp);
+      tree elem_type = TREE_TYPE (vectype);
+      tree zero = build_int_cst (elem_type, 0);
+      tree zero_vec = build_vector_from_val (vectype, zero);
+      tree cmp_type = build_same_sized_truth_vector_type (vectype);
+      ifexp = build2 (NE_EXPR, cmp_type, ifexp, zero_vec);
+    }
+
   if (int_const || (ifexp_bcp && TREE_CODE (ifexp) == INTEGER_CST))
     ret = fold_build3_loc (colon_loc, COND_EXPR, result_type, ifexp, op1, op2);
   else
@@ -10195,6 +10207,19 @@ push_cleanup (tree decl, tree cleanup, bool eh_only)
   STATEMENT_LIST_STMT_EXPR (list) = stmt_expr;
 }
 
+/* Build a vector comparison using VEC_COND_EXPR.  */
+
+static tree
+build_vec_cmp (tree_code code, tree type,
+	       tree arg0, tree arg1)
+{
+  tree zero_vec = build_zero_cst (type);
+  tree minus_one_vec = build_minus_one_cst (type);
+  tree cmp_type = build_same_sized_truth_vector_type (type);
+  tree cmp = build2 (code, cmp_type, arg0, arg1);
+  return build3 (VEC_COND_EXPR, type, cmp, minus_one_vec, zero_vec);
+}
+
 /* Build a binary-operation expression without default conversions.
    CODE is the kind of expression to build.
    LOCATION is the operator's location.
@@ -10753,7 +10778,8 @@ build_binary_op (location_t location, enum tree_code code,
           result_type = build_opaque_vector_type (intt,
 						  TYPE_VECTOR_SUBPARTS (type0));
           converted = 1;
-          break;
+	  ret = build_vec_cmp (resultcode, result_type, op0, op1);
+          goto return_build_binary_op;
         }
       if (FLOAT_TYPE_P (type0) || FLOAT_TYPE_P (type1))
 	warning_at (location,
@@ -10895,7 +10921,8 @@ build_binary_op (location_t location, enum tree_code code,
           result_type = build_opaque_vector_type (intt,
 						  TYPE_VECTOR_SUBPARTS (type0));
           converted = 1;
-          break;
+	  ret = build_vec_cmp (resultcode, result_type, op0, op1);
+          goto return_build_binary_op;
         }
       build_type = integer_type_node;
       if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE
diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 8d4a9e2..7f16e84 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -4727,8 +4727,10 @@ build_conditional_expr_1 (location_t loc, tree arg1, tree arg2, tree arg3,
 	}
 
       if (!COMPARISON_CLASS_P (arg1))
-	arg1 = cp_build_binary_op (loc, NE_EXPR, arg1,
-				   build_zero_cst (arg1_type), complain);
+	{
+	  tree cmp_type = build_same_sized_truth_vector_type (arg1_type);
+	  arg1 = build2 (NE_EXPR, cmp_type, arg1, build_zero_cst (arg1_type));
+	}
       return fold_build3 (VEC_COND_EXPR, arg2_type, arg1, arg2, arg3);
     }
 
diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index 83fd34c..89bacc2 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -3898,6 +3898,18 @@ build_binary_op (location_t location, enum tree_code code, tree op0, tree op1,
   return cp_build_binary_op (location, code, op0, op1, tf_warning_or_error);
 }
 
+/* Build a vector comparison using VEC_COND_EXPR.  */
+
+static tree
+build_vec_cmp (tree_code code, tree type,
+	       tree arg0, tree arg1)
+{
+  tree zero_vec = build_zero_cst (type);
+  tree minus_one_vec = build_minus_one_cst (type);
+  tree cmp_type = build_same_sized_truth_vector_type(type);
+  tree cmp = build2 (code, cmp_type, arg0, arg1);
+  return build3 (VEC_COND_EXPR, type, cmp, minus_one_vec, zero_vec);
+}
 
 /* Build a binary-operation expression without default conversions.
    CODE is the kind of expression to build.
@@ -4757,7 +4769,7 @@ cp_build_binary_op (location_t location,
 	  result_type = build_opaque_vector_type (intt,
 						  TYPE_VECTOR_SUBPARTS (type0));
 	  converted = 1;
-	  break;
+	  return build_vec_cmp (resultcode, result_type, op0, op1);
 	}
       build_type = boolean_type_node;
       if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 5ac73b3..2ce5a84 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3464,10 +3464,10 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
           return true;
         }
     }
-  /* Or an integer vector type with the same size and element count
+  /* Or a boolean vector type with the same element count
      as the comparison operand types.  */
   else if (TREE_CODE (type) == VECTOR_TYPE
-	   && TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE)
+	   && TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE)
     {
       if (TREE_CODE (op0_type) != VECTOR_TYPE
 	  || TREE_CODE (op1_type) != VECTOR_TYPE)
@@ -3478,12 +3478,7 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
           return true;
         }
 
-      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type)
-	  || (GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (type)))
-	      != GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op0_type))))
-	  /* The result of a vector comparison is of signed
-	     integral type.  */
-	  || TYPE_UNSIGNED (TREE_TYPE (type)))
+      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type))
         {
           error ("invalid vector comparison resulting type");
           debug_generic_expr (type);
@@ -3970,15 +3965,13 @@ verify_gimple_assign_ternary (gassign *stmt)
       break;
 
     case VEC_COND_EXPR:
-      if (!VECTOR_INTEGER_TYPE_P (rhs1_type)
-	  || TYPE_SIGN (rhs1_type) != SIGNED
-	  || TYPE_SIZE (rhs1_type) != TYPE_SIZE (lhs_type)
+      if (!VECTOR_MASK_TYPE_P (rhs1_type)
 	  || TYPE_VECTOR_SUBPARTS (rhs1_type)
 	     != TYPE_VECTOR_SUBPARTS (lhs_type))
 	{
-	  error ("the first argument of a VEC_COND_EXPR must be of a signed "
-		 "integral vector type of the same size and number of "
-		 "elements as the result");
+	  error ("the first argument of a VEC_COND_EXPR must be of a "
+		 "boolean vector type of the same number of elements "
+		 "as the result");
 	  debug_generic_expr (lhs_type);
 	  debug_generic_expr (rhs1_type);
 	  return true;
diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index be3d27f..a89b08c 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -122,7 +122,19 @@ tree_vec_extract (gimple_stmt_iterator *gsi, tree type,
 		  tree t, tree bitsize, tree bitpos)
 {
   if (bitpos)
-    return gimplify_build3 (gsi, BIT_FIELD_REF, type, t, bitsize, bitpos);
+    {
+      if (TREE_CODE (type) == BOOLEAN_TYPE)
+	{
+	  tree itype
+	    = build_nonstandard_integer_type (tree_to_uhwi (bitsize), 0);
+	  tree field = gimplify_build3 (gsi, BIT_FIELD_REF, itype, t,
+					bitsize, bitpos);
+	  return gimplify_build2 (gsi, NE_EXPR, type, field,
+				  build_zero_cst (itype));
+	}
+      else
+	return gimplify_build3 (gsi, BIT_FIELD_REF, type, t, bitsize, bitpos);
+    }
   else
     return gimplify_build1 (gsi, VIEW_CONVERT_EXPR, type, t);
 }
@@ -171,6 +183,21 @@ do_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
 			  build_int_cst (comp_type, 0));
 }
 
+/* Construct expression (A[BITPOS] code B[BITPOS])
+
+   INNER_TYPE is the type of A and B elements
+
+   returned expression is of boolean type.  */
+static tree
+do_bool_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
+		 tree bitpos, tree bitsize, enum tree_code code)
+{
+  a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
+  b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
+
+  return gimplify_build2 (gsi, code, boolean_type_node, a, b);
+}
+
 /* Expand vector addition to scalars.  This does bit twiddling
    in order to increase parallelism:
 
@@ -350,9 +377,31 @@ expand_vector_comparison (gimple_stmt_iterator *gsi, tree type, tree op0,
                           tree op1, enum tree_code code)
 {
   tree t;
-  if (! expand_vec_cond_expr_p (type, TREE_TYPE (op0)))
-    t = expand_vector_piecewise (gsi, do_compare, type,
-				 TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
+  if (!expand_vec_cmp_expr_p (TREE_TYPE (op0), type)
+      && !expand_vec_cond_expr_p (type, TREE_TYPE (op0)))
+    {
+      if (VECTOR_MODE_P (TYPE_MODE (type)))
+	{
+	  tree inner_type = TREE_TYPE (TREE_TYPE (op0));
+	  tree elem_type = build_nonstandard_integer_type
+	    (GET_MODE_BITSIZE (TYPE_MODE (inner_type)), 0);
+	  tree int_vec_type = build_vector_type (elem_type,
+						 TYPE_VECTOR_SUBPARTS (type));
+	  tree vec = expand_vector_piecewise (gsi, do_compare, int_vec_type,
+					      TREE_TYPE (TREE_TYPE (op0)),
+					      op0, op1, code);
+	  gimple stmt;
+
+	  return gimplify_build1 (gsi, VIEW_CONVERT_EXPR, type, vec);
+	  t = make_ssa_name (type);
+	  stmt = gimple_build_assign (t, build1 (VIEW_CONVERT_EXPR, type, vec));
+	  gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
+	}
+      else
+	t = expand_vector_piecewise (gsi, do_bool_compare, type,
+				     TREE_TYPE (TREE_TYPE (op0)),
+				     op0, op1, code);
+    }
   else
     t = NULL_TREE;
 
@@ -625,11 +674,12 @@ expand_vector_divmod (gimple_stmt_iterator *gsi, tree type, tree op0,
 	  if (addend == NULL_TREE
 	      && expand_vec_cond_expr_p (type, type))
 	    {
-	      tree zero, cst, cond;
+	      tree zero, cst, cond, mask_type;
 	      gimple stmt;
 
+	      mask_type = build_same_sized_truth_vector_type (type);
 	      zero = build_zero_cst (type);
-	      cond = build2 (LT_EXPR, type, op0, zero);
+	      cond = build2 (LT_EXPR, mask_type, op0, zero);
 	      for (i = 0; i < nunits; i++)
 		vec[i] = build_int_cst (TREE_TYPE (type),
 					((unsigned HOST_WIDE_INT) 1
@@ -1506,6 +1556,12 @@ expand_vector_operations_1 (gimple_stmt_iterator *gsi)
   if (TREE_CODE (type) != VECTOR_TYPE)
     return;
 
+  /* A scalar operation pretending to be a vector one.  */
+  if (VECTOR_MASK_TYPE_P (type)
+      && !VECTOR_MODE_P (TYPE_MODE (type))
+      && TYPE_MODE (type) != BLKmode)
+    return;
+
   if (CONVERT_EXPR_CODE_P (code)
       || code == FLOAT_EXPR
       || code == FIX_TRUNC_EXPR
diff --git a/gcc/expr.c b/gcc/expr.c
index 1e820b4..6ae0c4d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11000,9 +11000,15 @@ do_store_flag (sepops ops, rtx target, machine_mode mode)
   if (TREE_CODE (ops->type) == VECTOR_TYPE)
     {
       tree ifexp = build2 (ops->code, ops->type, arg0, arg1);
-      tree if_true = constant_boolean_node (true, ops->type);
-      tree if_false = constant_boolean_node (false, ops->type);
-      return expand_vec_cond_expr (ops->type, ifexp, if_true, if_false, target);
+      if (VECTOR_MASK_TYPE_P (ops->type))
+	return expand_vec_cmp_expr (ops->type, ifexp, target);
+      else
+	{
+	  tree if_true = constant_boolean_node (true, ops->type);
+	  tree if_false = constant_boolean_node (false, ops->type);
+	  return expand_vec_cond_expr (ops->type, ifexp, if_true,
+				       if_false, target);
+	}
     }
 
   /* Get the rtx comparison code to use.  We know that EXP is a comparison
@@ -11289,6 +11295,39 @@ try_tablejump (tree index_type, tree index_expr, tree minval, tree range,
   return 1;
 }
 
+/* Return a CONST_VECTOR rtx representing vector mask for
+   a VECTOR_CST of booleans.  */
+static rtx
+const_vector_mask_from_tree (tree exp)
+{
+  rtvec v;
+  unsigned i;
+  int units;
+  tree elt;
+  machine_mode inner, mode;
+
+  mode = TYPE_MODE (TREE_TYPE (exp));
+  units = GET_MODE_NUNITS (mode);
+  inner = GET_MODE_INNER (mode);
+
+  v = rtvec_alloc (units);
+
+  for (i = 0; i < VECTOR_CST_NELTS (exp); ++i)
+    {
+      elt = VECTOR_CST_ELT (exp, i);
+
+      gcc_assert (TREE_CODE (elt) == INTEGER_CST);
+      if (integer_zerop (elt))
+	RTVEC_ELT (v, i) = CONST0_RTX (inner);
+      else if (integer_onep (elt))
+	RTVEC_ELT (v, i) = CONSTM1_RTX (inner);
+      else
+	gcc_unreachable ();
+    }
+
+  return gen_rtx_CONST_VECTOR (mode, v);
+}
+
 /* Return a CONST_VECTOR rtx for a VECTOR_CST tree.  */
 static rtx
 const_vector_from_tree (tree exp)
@@ -11304,6 +11343,9 @@ const_vector_from_tree (tree exp)
   if (initializer_zerop (exp))
     return CONST0_RTX (mode);
 
+  if (VECTOR_MASK_TYPE_P (TREE_TYPE (exp)))
+      return const_vector_mask_from_tree (exp);
+
   units = GET_MODE_NUNITS (mode);
   inner = GET_MODE_INNER (mode);
 
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index e785946..4ca0a40 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -1885,7 +1885,9 @@ expand_MASK_LOAD (gcall *stmt)
   create_output_operand (&ops[0], target, TYPE_MODE (type));
   create_fixed_operand (&ops[1], mem);
   create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
-  expand_insn (optab_handler (maskload_optab, TYPE_MODE (type)), 3, ops);
+  expand_insn (convert_optab_handler (maskload_optab, TYPE_MODE (type),
+				      TYPE_MODE (TREE_TYPE (maskt))),
+	       3, ops);
 }
 
 static void
@@ -1908,7 +1910,9 @@ expand_MASK_STORE (gcall *stmt)
   create_fixed_operand (&ops[0], mem);
   create_input_operand (&ops[1], reg, TYPE_MODE (type));
   create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
-  expand_insn (optab_handler (maskstore_optab, TYPE_MODE (type)), 3, ops);
+  expand_insn (convert_optab_handler (maskstore_optab, TYPE_MODE (type),
+				      TYPE_MODE (TREE_TYPE (maskt))),
+	       3, ops);
 }
 
 static void
diff --git a/gcc/optabs.c b/gcc/optabs.c
index e533e6e..fd9932f 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -6490,11 +6490,13 @@ get_rtx_code (enum tree_code tcode, bool unsignedp)
 }
 
 /* Return comparison rtx for COND. Use UNSIGNEDP to select signed or
-   unsigned operators. Do not generate compare instruction.  */
+   unsigned operators.  OPNO holds an index of the first comparison
+   operand in insn with code ICODE.  Do not generate compare instruction.  */
 
 static rtx
 vector_compare_rtx (enum tree_code tcode, tree t_op0, tree t_op1,
-		    bool unsignedp, enum insn_code icode)
+		    bool unsignedp, enum insn_code icode,
+		    unsigned int opno)
 {
   struct expand_operand ops[2];
   rtx rtx_op0, rtx_op1;
@@ -6520,7 +6522,7 @@ vector_compare_rtx (enum tree_code tcode, tree t_op0, tree t_op1,
 
   create_input_operand (&ops[0], rtx_op0, m0);
   create_input_operand (&ops[1], rtx_op1, m1);
-  if (!maybe_legitimize_operands (icode, 4, 2, ops))
+  if (!maybe_legitimize_operands (icode, opno, 2, ops))
     gcc_unreachable ();
   return gen_rtx_fmt_ee (rcode, VOIDmode, ops[0].value, ops[1].value);
 }
@@ -6843,16 +6845,25 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
       op0a = TREE_OPERAND (op0, 0);
       op0b = TREE_OPERAND (op0, 1);
       tcode = TREE_CODE (op0);
+      unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
     }
   else
     {
+      gcc_assert (VECTOR_MASK_TYPE_P (TREE_TYPE (op0)));
+      if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE ((op0)))) != MODE_VECTOR_INT)
+	{
+	  /* This is a vcond with mask.  To be supported soon...  */
+	  gcc_unreachable ();
+	}
       /* Fake op0 < 0.  */
-      gcc_assert (!TYPE_UNSIGNED (TREE_TYPE (op0)));
-      op0a = op0;
-      op0b = build_zero_cst (TREE_TYPE (op0));
-      tcode = LT_EXPR;
+      else
+	{
+	  op0a = op0;
+	  op0b = build_zero_cst (TREE_TYPE (op0));
+	  tcode = LT_EXPR;
+	  unsignedp = false;
+	}
     }
-  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
   cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a));
 
 
@@ -6863,7 +6874,7 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
   if (icode == CODE_FOR_nothing)
     return 0;
 
-  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode);
+  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode, 4);
   rtx_op1 = expand_normal (op1);
   rtx_op2 = expand_normal (op2);
 
@@ -6877,6 +6888,63 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
   return ops[0].value;
 }
 
+/* Return insn code for a comparison operator with VMODE
+   resultin MASK_MODE, unsigned if UNS is true.  */
+
+static inline enum insn_code
+get_vec_cmp_icode (machine_mode vmode, machine_mode mask_mode, bool uns)
+{
+  optab tab = uns ? vec_cmpu_optab : vec_cmp_optab;
+  return convert_optab_handler (tab, vmode, mask_mode);
+}
+
+/* Return TRUE if appropriate vector insn is available
+   for vector comparison expr with vector type VALUE_TYPE
+   and resulting mask with MASK_TYPE.  */
+
+bool
+expand_vec_cmp_expr_p (tree value_type, tree mask_type)
+{
+  enum insn_code icode = get_vec_cmp_icode (TYPE_MODE (value_type),
+					    TYPE_MODE (mask_type),
+					    TYPE_UNSIGNED (value_type));
+  return (icode != CODE_FOR_nothing);
+}
+
+/* Generate insns for a vector comparison into a mask.  */
+
+rtx
+expand_vec_cmp_expr (tree type, tree exp, rtx target)
+{
+  struct expand_operand ops[4];
+  enum insn_code icode;
+  rtx comparison;
+  machine_mode mask_mode = TYPE_MODE (type);
+  machine_mode vmode;
+  bool unsignedp;
+  tree op0a, op0b;
+  enum tree_code tcode;
+
+  op0a = TREE_OPERAND (exp, 0);
+  op0b = TREE_OPERAND (exp, 1);
+  tcode = TREE_CODE (exp);
+
+  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
+  vmode = TYPE_MODE (TREE_TYPE (op0a));
+
+  icode = get_vec_cmp_icode (vmode, mask_mode, unsignedp);
+  if (icode == CODE_FOR_nothing)
+    return 0;
+
+  comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode, 2);
+  create_output_operand (&ops[0], target, mask_mode);
+  create_fixed_operand (&ops[1], comparison);
+  create_fixed_operand (&ops[2], XEXP (comparison, 0));
+  create_fixed_operand (&ops[3], XEXP (comparison, 1));
+  expand_insn (icode, 4, ops);
+  return ops[0].value;
+}
+
 /* Return non-zero if a highpart multiply is supported of can be synthisized.
    For the benefit of expand_mult_highpart, the return value is 1 for direct,
    2 for even/odd widening, and 3 for hi/lo widening.  */
@@ -7002,26 +7070,32 @@ expand_mult_highpart (machine_mode mode, rtx op0, rtx op1,
 
 /* Return true if target supports vector masked load/store for mode.  */
 bool
-can_vec_mask_load_store_p (machine_mode mode, bool is_load)
+can_vec_mask_load_store_p (machine_mode mode,
+			   machine_mode mask_mode,
+			   bool is_load)
 {
   optab op = is_load ? maskload_optab : maskstore_optab;
-  machine_mode vmode;
   unsigned int vector_sizes;
 
   /* If mode is vector mode, check it directly.  */
   if (VECTOR_MODE_P (mode))
-    return optab_handler (op, mode) != CODE_FOR_nothing;
+    return convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing;
 
   /* Otherwise, return true if there is some vector mode with
      the mask load/store supported.  */
 
   /* See if there is any chance the mask load or store might be
      vectorized.  If not, punt.  */
-  vmode = targetm.vectorize.preferred_simd_mode (mode);
-  if (!VECTOR_MODE_P (vmode))
+  mode = targetm.vectorize.preferred_simd_mode (mode);
+  if (!VECTOR_MODE_P (mode))
+    return false;
+
+  mask_mode = targetm.vectorize.get_mask_mode (GET_MODE_NUNITS (mode),
+					       GET_MODE_SIZE (mode));
+  if (mask_mode == VOIDmode)
     return false;
 
-  if (optab_handler (op, vmode) != CODE_FOR_nothing)
+  if (convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing)
     return true;
 
   vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
@@ -7031,9 +7105,12 @@ can_vec_mask_load_store_p (machine_mode mode, bool is_load)
       vector_sizes &= ~cur;
       if (cur <= GET_MODE_SIZE (mode))
 	continue;
-      vmode = mode_for_vector (mode, cur / GET_MODE_SIZE (mode));
-      if (VECTOR_MODE_P (vmode)
-	  && optab_handler (op, vmode) != CODE_FOR_nothing)
+      mode = mode_for_vector (mode, cur / GET_MODE_SIZE (mode));
+      mask_mode = targetm.vectorize.get_mask_mode (GET_MODE_NUNITS (mode),
+						   cur);
+      if (VECTOR_MODE_P (mode)
+	  && mask_mode != VOIDmode
+	  && convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing)
 	return true;
     }
   return false;
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 888b21c..9804378 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -61,6 +61,10 @@ OPTAB_CD(vec_load_lanes_optab, "vec_load_lanes$a$b")
 OPTAB_CD(vec_store_lanes_optab, "vec_store_lanes$a$b")
 OPTAB_CD(vcond_optab, "vcond$a$b")
 OPTAB_CD(vcondu_optab, "vcondu$a$b")
+OPTAB_CD(vec_cmp_optab, "vec_cmp$a$b")
+OPTAB_CD(vec_cmpu_optab, "vec_cmpu$a$b")
+OPTAB_CD(maskload_optab, "maskload$a$b")
+OPTAB_CD(maskstore_optab, "maskstore$a$b")
 
 OPTAB_NL(add_optab, "add$P$a3", PLUS, "add", '3', gen_int_fp_fixed_libfunc)
 OPTAB_NX(add_optab, "add$F$a3")
@@ -264,8 +268,6 @@ OPTAB_D (udot_prod_optab, "udot_prod$I$a")
 OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
 OPTAB_D (usad_optab, "usad$I$a")
 OPTAB_D (ssad_optab, "ssad$I$a")
-OPTAB_D (maskload_optab, "maskload$a")
-OPTAB_D (maskstore_optab, "maskstore$a")
 OPTAB_D (vec_extract_optab, "vec_extract$a")
 OPTAB_D (vec_init_optab, "vec_init$a")
 OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
diff --git a/gcc/optabs.h b/gcc/optabs.h
index 95f5cbc..dfe9ebf 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -496,6 +496,12 @@ extern bool can_vec_perm_p (machine_mode, bool, const unsigned char *);
 extern rtx expand_vec_perm (machine_mode, rtx, rtx, rtx, rtx);
 
 /* Return tree if target supports vector operations for COND_EXPR.  */
+bool expand_vec_cmp_expr_p (tree, tree);
+
+/* Generate code for VEC_COND_EXPR.  */
+extern rtx expand_vec_cmp_expr (tree, tree, rtx);
+
+/* Return true if target supports vector comparison.  */
 bool expand_vec_cond_expr_p (tree, tree);
 
 /* Generate code for VEC_COND_EXPR.  */
@@ -508,7 +514,7 @@ extern int can_mult_highpart_p (machine_mode, bool);
 extern rtx expand_mult_highpart (machine_mode, rtx, rtx, rtx, bool);
 
 /* Return true if target supports vector masked load/store for mode.  */
-extern bool can_vec_mask_load_store_p (machine_mode, bool);
+extern bool can_vec_mask_load_store_p (machine_mode, machine_mode, bool);
 
 /* Return true if there is an inline compare and swap pattern.  */
 extern bool can_compare_and_swap_p (machine_mode, bool);
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 291e602..d66517d 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -811,7 +811,7 @@ ifcvt_can_use_mask_load_store (gimple stmt)
       || VECTOR_MODE_P (mode))
     return false;
 
-  if (can_vec_mask_load_store_p (mode, is_load))
+  if (can_vec_mask_load_store_p (mode, VOIDmode, is_load))
     return true;
 
   return false;
@@ -2068,7 +2068,7 @@ predicate_mem_writes (loop_p loop)
 	  {
 	    tree lhs = gimple_assign_lhs (stmt);
 	    tree rhs = gimple_assign_rhs1 (stmt);
-	    tree ref, addr, ptr, masktype, mask_op0, mask_op1, mask;
+	    tree ref, addr, ptr, masktype, mask;
 	    gimple new_stmt;
 	    int bitsize = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (lhs)));
 	    ref = TREE_CODE (lhs) == SSA_NAME ? rhs : lhs;
@@ -2082,15 +2082,47 @@ predicate_mem_writes (loop_p loop)
 	      mask = vect_masks[index];
 	    else
 	      {
-		masktype = build_nonstandard_integer_type (bitsize, 1);
-		mask_op0 = build_int_cst (masktype, swap ? 0 : -1);
-		mask_op1 = build_int_cst (masktype, swap ? -1 : 0);
-		cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond),
-						   is_gimple_condexpr,
-						   NULL_TREE,
-						   true, GSI_SAME_STMT);
-		mask = fold_build_cond_expr (masktype, unshare_expr (cond),
-					     mask_op0, mask_op1);
+		masktype = boolean_type_node;
+		if ((TREE_CODE (cond) == NE_EXPR
+		     || TREE_CODE (cond) == EQ_EXPR)
+		    && (integer_zerop (TREE_OPERAND (cond, 1))
+			|| integer_onep (TREE_OPERAND (cond, 1)))
+		    && TREE_CODE (TREE_TYPE (TREE_OPERAND (cond, 0)))
+		       == BOOLEAN_TYPE)
+		  {
+		    bool negate = (TREE_CODE (cond) == EQ_EXPR);
+		    if (integer_onep (TREE_OPERAND (cond, 1)))
+		      negate = !negate;
+		    if (swap)
+		      negate = !negate;
+		    mask = TREE_OPERAND (cond, 0);
+		    if (negate)
+		      {
+			mask = ifc_temp_var (masktype, unshare_expr (cond),
+					     &gsi);
+			mask = build1 (TRUTH_NOT_EXPR, masktype, mask);
+		      }
+		  }
+		else if (swap &&
+			 TREE_CODE_CLASS (TREE_CODE (cond)) == tcc_comparison)
+		  {
+		    tree op_type = TREE_TYPE (TREE_OPERAND (cond, 0));
+		    tree_code code
+		      = invert_tree_comparison (TREE_CODE (cond),
+						HONOR_NANS (op_type));
+		    if (code != ERROR_MARK)
+			mask = build2 (code, TREE_TYPE (cond),
+				       TREE_OPERAND (cond, 0),
+				       TREE_OPERAND (cond, 1));
+		    else
+		      {
+			mask = ifc_temp_var (masktype, unshare_expr (cond),
+					     &gsi);
+			mask = build1 (TRUTH_NOT_EXPR, masktype, mask);
+		      }
+		  }
+		else
+		  mask = unshare_expr (cond);
 		mask = ifc_temp_var (masktype, mask, &gsi);
 		/* Save mask and its size for further use.  */
 	        vect_sizes.safe_push (bitsize);
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index f1eaef4..0a39825 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -3849,6 +3849,9 @@ vect_get_new_vect_var (tree type, enum vect_var_kind var_kind, const char *name)
   case vect_scalar_var:
     prefix = "stmp";
     break;
+  case vect_mask_var:
+    prefix = "mask";
+    break;
   case vect_pointer_var:
     prefix = "vectp";
     break;
@@ -4403,7 +4406,11 @@ vect_create_destination_var (tree scalar_dest, tree vectype)
   tree type;
   enum vect_var_kind kind;
 
-  kind = vectype ? vect_simple_var : vect_scalar_var;
+  kind = vectype
+    ? VECTOR_MASK_TYPE_P (vectype)
+    ? vect_mask_var
+    : vect_simple_var
+    : vect_scalar_var;
   type = vectype ? vectype : TREE_TYPE (scalar_dest);
 
   gcc_assert (TREE_CODE (scalar_dest) == SSA_NAME);
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 59c75af..1810f78 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -193,19 +193,21 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
 {
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
-  int nbbs = loop->num_nodes;
+  unsigned nbbs = loop->num_nodes;
   unsigned int vectorization_factor = 0;
   tree scalar_type;
   gphi *phi;
   tree vectype;
   unsigned int nunits;
   stmt_vec_info stmt_info;
-  int i;
+  unsigned i;
   HOST_WIDE_INT dummy;
   gimple stmt, pattern_stmt = NULL;
   gimple_seq pattern_def_seq = NULL;
   gimple_stmt_iterator pattern_def_si = gsi_none ();
   bool analyze_pattern_stmt = false;
+  bool bool_result;
+  auto_vec<stmt_vec_info> mask_producers;
 
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location,
@@ -424,6 +426,8 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
 	      return false;
 	    }
 
+	  bool_result = false;
+
 	  if (STMT_VINFO_VECTYPE (stmt_info))
 	    {
 	      /* The only case when a vectype had been already set is for stmts
@@ -444,6 +448,32 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
 		scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
 	      else
 		scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
+
+	      /* Bool ops don't participate in vectorization factor
+		 computation.  For comparison use compared types to
+		 compute a factor.  */
+	      if (TREE_CODE (scalar_type) == BOOLEAN_TYPE)
+		{
+		  mask_producers.safe_push (stmt_info);
+		  bool_result = true;
+
+		  if (gimple_code (stmt) == GIMPLE_ASSIGN
+		      && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt))
+			 == tcc_comparison
+		      && TREE_CODE (TREE_TYPE (gimple_assign_rhs1 (stmt)))
+			 != BOOLEAN_TYPE)
+		    scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
+		  else
+		    {
+		      if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))
+			{
+			  pattern_def_seq = NULL;
+			  gsi_next (&si);
+			}
+		      continue;
+		    }
+		}
+
 	      if (dump_enabled_p ())
 		{
 		  dump_printf_loc (MSG_NOTE, vect_location,
@@ -466,7 +496,8 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
 		  return false;
 		}
 
-	      STMT_VINFO_VECTYPE (stmt_info) = vectype;
+	      if (!bool_result)
+		STMT_VINFO_VECTYPE (stmt_info) = vectype;
 
 	      if (dump_enabled_p ())
 		{
@@ -479,8 +510,9 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
 	  /* The vectorization factor is according to the smallest
 	     scalar type (or the largest vector size, but we only
 	     support one vector size per loop).  */
-	  scalar_type = vect_get_smallest_scalar_type (stmt, &dummy,
-						       &dummy);
+	  if (!bool_result)
+	    scalar_type = vect_get_smallest_scalar_type (stmt, &dummy,
+							 &dummy);
 	  if (dump_enabled_p ())
 	    {
 	      dump_printf_loc (MSG_NOTE, vect_location,
@@ -555,6 +587,100 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo)
     }
   LOOP_VINFO_VECT_FACTOR (loop_vinfo) = vectorization_factor;
 
+  for (i = 0; i < mask_producers.length (); i++)
+    {
+      tree mask_type = NULL;
+      bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (mask_producers[i]);
+
+      stmt = STMT_VINFO_STMT (mask_producers[i]);
+
+      if (gimple_code (stmt) == GIMPLE_ASSIGN
+	  && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison
+	  && TREE_CODE (TREE_TYPE (gimple_assign_rhs1 (stmt))) != BOOLEAN_TYPE)
+	{
+	  scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
+	  mask_type = get_mask_type_for_scalar_type (scalar_type);
+
+	  if (!mask_type)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "not vectorized: unsupported mask\n");
+	      return false;
+	    }
+	}
+      else
+	{
+	  tree rhs, def;
+	  ssa_op_iter iter;
+	  gimple def_stmt;
+	  enum vect_def_type dt;
+
+	  FOR_EACH_SSA_TREE_OPERAND (rhs, stmt, iter, SSA_OP_USE)
+	    {
+	      if (!vect_is_simple_use_1 (rhs, stmt, loop_vinfo, bb_vinfo,
+					 &def_stmt, &def, &dt, &vectype))
+		{
+		  if (dump_enabled_p ())
+		    {
+		      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				       "not vectorized: can't compute mask type "
+				       "for statement, ");
+		      dump_gimple_stmt (MSG_MISSED_OPTIMIZATION,  TDF_SLIM, stmt,
+					0);
+		      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+		    }
+		  return false;
+		}
+
+	      /* No vectype probably means external definition.
+		 Allow it in case there is another operand which
+		 allows to determine mask type.  */
+	      if (!vectype)
+		continue;
+
+	      if (!mask_type)
+		mask_type = vectype;
+	      else if (TYPE_VECTOR_SUBPARTS (mask_type)
+		       != TYPE_VECTOR_SUBPARTS (vectype))
+		{
+		  if (dump_enabled_p ())
+		    {
+		      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				       "not vectorized: different sized masks "
+				       "types in statement, ");
+		      dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
+					 mask_type);
+		      dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
+		      dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
+					 vectype);
+		      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+		    }
+		  return false;
+		}
+	    }
+	}
+
+      /* No mask_type should mean loop invariant predicate.
+	 This is probably a subject for optimization in
+	 if-conversion.  */
+      if (!mask_type)
+	{
+	  if (dump_enabled_p ())
+	    {
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "not vectorized: can't compute mask type "
+			       "for statement, ");
+	      dump_gimple_stmt (MSG_MISSED_OPTIMIZATION,  TDF_SLIM, stmt,
+				0);
+	      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+	    }
+	  return false;
+	}
+
+      STMT_VINFO_VECTYPE (mask_producers[i]) = mask_type;
+    }
+
   return true;
 }
 
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index f87c066..f3887be 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1316,27 +1316,61 @@ vect_init_vector_1 (gimple stmt, gimple new_stmt, gimple_stmt_iterator *gsi)
 tree
 vect_init_vector (gimple stmt, tree val, tree type, gimple_stmt_iterator *gsi)
 {
+  tree val_type = TREE_TYPE (val);
+  machine_mode mode = TYPE_MODE (type);
+  machine_mode val_mode = TYPE_MODE(val_type);
   tree new_var;
   gimple init_stmt;
   tree vec_oprnd;
   tree new_temp;
 
   if (TREE_CODE (type) == VECTOR_TYPE
-      && TREE_CODE (TREE_TYPE (val)) != VECTOR_TYPE)
-    {
-      if (!types_compatible_p (TREE_TYPE (type), TREE_TYPE (val)))
+      && TREE_CODE (val_type) != VECTOR_TYPE)
+    {
+      /* Handle vector of bool represented as a vector of
+	 integers here rather than on expand because it is
+	 a default mask type for targets.  Vector mask is
+	 built in a following way:
+
+	 tmp = (int)val
+	 vec_tmp = {tmp, ..., tmp}
+	 vec_cst = VIEW_CONVERT_EXPR<vector(N) _Bool>(vec_tmp);  */
+      if (TREE_CODE (val_type) == BOOLEAN_TYPE
+	  && VECTOR_MODE_P (mode)
+	  && SCALAR_INT_MODE_P (GET_MODE_INNER (mode))
+	  && GET_MODE_INNER (mode) != val_mode)
 	{
-	  if (CONSTANT_CLASS_P (val))
-	    val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), val);
-	  else
+	  unsigned size = GET_MODE_BITSIZE (GET_MODE_INNER (mode));
+	  tree stype = build_nonstandard_integer_type (size, 1);
+	  tree vectype = get_vectype_for_scalar_type (stype);
+
+	  new_temp = make_ssa_name (stype);
+	  init_stmt = gimple_build_assign (new_temp, NOP_EXPR, val);
+	  vect_init_vector_1 (stmt, init_stmt, gsi);
+
+	  val = make_ssa_name (vectype);
+	  new_temp = build_vector_from_val (vectype, new_temp);
+	  init_stmt = gimple_build_assign (val, new_temp);
+	  vect_init_vector_1 (stmt, init_stmt, gsi);
+
+	  val = build1 (VIEW_CONVERT_EXPR, type, val);
+	}
+      else
+	{
+	  if (!types_compatible_p (TREE_TYPE (type), val_type))
 	    {
-	      new_temp = make_ssa_name (TREE_TYPE (type));
-	      init_stmt = gimple_build_assign (new_temp, NOP_EXPR, val);
-	      vect_init_vector_1 (stmt, init_stmt, gsi);
-	      val = new_temp;
+	      if (CONSTANT_CLASS_P (val))
+		val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), val);
+	      else
+		{
+		  new_temp = make_ssa_name (TREE_TYPE (type));
+		  init_stmt = gimple_build_assign (new_temp, NOP_EXPR, val);
+		  vect_init_vector_1 (stmt, init_stmt, gsi);
+		  val = new_temp;
+		}
 	    }
+	  val = build_vector_from_val (type, val);
 	}
-      val = build_vector_from_val (type, val);
     }
 
   new_var = vect_get_new_vect_var (type, vect_simple_var, "cst_");
@@ -1368,6 +1402,7 @@ vect_get_vec_def_for_operand (tree op, gimple stmt, tree *scalar_def)
   gimple def_stmt;
   stmt_vec_info def_stmt_info = NULL;
   stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt);
+  tree stmt_vectype = STMT_VINFO_VECTYPE (stmt_vinfo);
   unsigned int nunits;
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
   tree def;
@@ -1411,7 +1446,12 @@ vect_get_vec_def_for_operand (tree op, gimple stmt, tree *scalar_def)
     /* Case 1: operand is a constant.  */
     case vect_constant_def:
       {
-	vector_type = get_vectype_for_scalar_type (TREE_TYPE (op));
+	if (TREE_CODE (TREE_TYPE (op)) == BOOLEAN_TYPE
+	    && VECTOR_MASK_TYPE_P (stmt_vectype))
+	  vector_type = stmt_vectype;
+	else
+	  vector_type = get_vectype_for_scalar_type (TREE_TYPE (op));
+
 	gcc_assert (vector_type);
 	nunits = TYPE_VECTOR_SUBPARTS (vector_type);
 
@@ -1429,7 +1469,11 @@ vect_get_vec_def_for_operand (tree op, gimple stmt, tree *scalar_def)
     /* Case 2: operand is defined outside the loop - loop invariant.  */
     case vect_external_def:
       {
-	vector_type = get_vectype_for_scalar_type (TREE_TYPE (def));
+	if (TREE_CODE (TREE_TYPE (op)) == BOOLEAN_TYPE
+	    && VECTOR_MASK_TYPE_P (stmt_vectype))
+	  vector_type = stmt_vectype;
+	else
+	  vector_type = get_vectype_for_scalar_type (TREE_TYPE (def));
 	gcc_assert (vector_type);
 
 	if (scalar_def)
@@ -1758,6 +1802,7 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi,
   bool nested_in_vect_loop = nested_in_vect_loop_p (loop, stmt);
   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  tree mask_vectype;
   tree elem_type;
   gimple new_stmt;
   tree dummy;
@@ -1785,8 +1830,8 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi,
 
   is_store = gimple_call_internal_fn (stmt) == IFN_MASK_STORE;
   mask = gimple_call_arg (stmt, 2);
-  if (TYPE_PRECISION (TREE_TYPE (mask))
-      != GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (vectype))))
+
+  if (TREE_CODE (TREE_TYPE (mask)) != BOOLEAN_TYPE)
     return false;
 
   /* FORNOW. This restriction should be relaxed.  */
@@ -1815,6 +1860,19 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi,
   if (STMT_VINFO_STRIDED_P (stmt_info))
     return false;
 
+  if (TREE_CODE (mask) != SSA_NAME)
+    return false;
+
+  if (!vect_is_simple_use_1 (mask, stmt, loop_vinfo, NULL,
+			     &def_stmt, &def, &dt, &mask_vectype))
+    return false;
+
+  if (!mask_vectype)
+    mask_vectype = get_mask_type_for_scalar_type (TREE_TYPE (vectype));
+
+  if (!mask_vectype)
+    return false;
+
   if (STMT_VINFO_GATHER_P (stmt_info))
     {
       gimple def_stmt;
@@ -1848,14 +1906,9 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi,
 				 : DR_STEP (dr), size_zero_node) <= 0)
     return false;
   else if (!VECTOR_MODE_P (TYPE_MODE (vectype))
-	   || !can_vec_mask_load_store_p (TYPE_MODE (vectype), !is_store))
-    return false;
-
-  if (TREE_CODE (mask) != SSA_NAME)
-    return false;
-
-  if (!vect_is_simple_use (mask, stmt, loop_vinfo, NULL,
-			   &def_stmt, &def, &dt))
+	   || !can_vec_mask_load_store_p (TYPE_MODE (vectype),
+					  TYPE_MODE (mask_vectype),
+					  !is_store))
     return false;
 
   if (is_store)
@@ -7229,10 +7282,7 @@ vectorizable_condition (gimple stmt, gimple_stmt_iterator *gsi,
 	   && TREE_CODE (else_clause) != FIXED_CST)
     return false;
 
-  unsigned int prec = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (vectype)));
-  /* The result of a vector comparison should be signed type.  */
-  tree cmp_type = build_nonstandard_integer_type (prec, 0);
-  vec_cmp_type = get_same_sized_vectype (cmp_type, vectype);
+  vec_cmp_type = build_same_sized_truth_vector_type (comp_vectype);
   if (vec_cmp_type == NULL_TREE)
     return false;
 
@@ -7373,6 +7423,201 @@ vectorizable_condition (gimple stmt, gimple_stmt_iterator *gsi,
   return true;
 }
 
+/* vectorizable_comparison.
+
+   Check if STMT is comparison expression that can be vectorized.
+   If VEC_STMT is also passed, vectorize the STMT: create a vectorized
+   comparison, put it in VEC_STMT, and insert it at GSI.
+
+   Return FALSE if not a vectorizable STMT, TRUE otherwise.  */
+
+bool
+vectorizable_comparison (gimple stmt, gimple_stmt_iterator *gsi,
+			 gimple *vec_stmt, tree reduc_def,
+			 slp_tree slp_node)
+{
+  tree lhs, rhs1, rhs2;
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  tree vectype1, vectype2;
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  tree vec_rhs1 = NULL_TREE, vec_rhs2 = NULL_TREE;
+  tree vec_compare;
+  tree new_temp;
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  tree def;
+  enum vect_def_type dt, dts[4];
+  unsigned nunits;
+  int ncopies;
+  enum tree_code code;
+  stmt_vec_info prev_stmt_info = NULL;
+  int i, j;
+  bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
+  vec<tree> vec_oprnds0 = vNULL;
+  vec<tree> vec_oprnds1 = vNULL;
+  tree mask_type;
+  tree mask;
+
+  if (!VECTOR_MASK_TYPE_P (vectype))
+    return false;
+
+  mask_type = vectype;
+  nunits = TYPE_VECTOR_SUBPARTS (vectype);
+
+  if (slp_node || PURE_SLP_STMT (stmt_info))
+    ncopies = 1;
+  else
+    ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
+
+  gcc_assert (ncopies >= 1);
+  if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
+    return false;
+
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
+      && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
+	   && reduc_def))
+    return false;
+
+  if (STMT_VINFO_LIVE_P (stmt_info))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "value used after loop.\n");
+      return false;
+    }
+
+  if (!is_gimple_assign (stmt))
+    return false;
+
+  code = gimple_assign_rhs_code (stmt);
+
+  if (TREE_CODE_CLASS (code) != tcc_comparison)
+    return false;
+
+  rhs1 = gimple_assign_rhs1 (stmt);
+  rhs2 = gimple_assign_rhs2 (stmt);
+
+  if (TREE_CODE (rhs1) == SSA_NAME)
+    {
+      gimple rhs1_def_stmt = SSA_NAME_DEF_STMT (rhs1);
+      if (!vect_is_simple_use_1 (rhs1, stmt, loop_vinfo, bb_vinfo,
+				 &rhs1_def_stmt, &def, &dt, &vectype1))
+	return false;
+    }
+  else if (TREE_CODE (rhs1) != INTEGER_CST && TREE_CODE (rhs1) != REAL_CST
+	   && TREE_CODE (rhs1) != FIXED_CST)
+    return false;
+
+  if (TREE_CODE (rhs2) == SSA_NAME)
+    {
+      gimple rhs2_def_stmt = SSA_NAME_DEF_STMT (rhs2);
+      if (!vect_is_simple_use_1 (rhs2, stmt, loop_vinfo, bb_vinfo,
+				 &rhs2_def_stmt, &def, &dt, &vectype2))
+	return false;
+    }
+  else if (TREE_CODE (rhs2) != INTEGER_CST && TREE_CODE (rhs2) != REAL_CST
+	   && TREE_CODE (rhs2) != FIXED_CST)
+    return false;
+
+  vectype = vectype1 ? vectype1 : vectype2;
+
+  if (!vectype
+      || nunits != TYPE_VECTOR_SUBPARTS (vectype))
+    return false;
+
+  if (!vec_stmt)
+    {
+      STMT_VINFO_TYPE (stmt_info) = comparison_vec_info_type;
+      return expand_vec_cmp_expr_p (vectype, mask_type);
+    }
+
+  /* Transform.  */
+  if (!slp_node)
+    {
+      vec_oprnds0.create (1);
+      vec_oprnds1.create (1);
+    }
+
+  /* Handle def.  */
+  lhs = gimple_assign_lhs (stmt);
+  mask = vect_create_destination_var (lhs, mask_type);
+
+  /* Handle cmp expr.  */
+  for (j = 0; j < ncopies; j++)
+    {
+      gassign *new_stmt = NULL;
+      if (j == 0)
+	{
+	  if (slp_node)
+	    {
+	      auto_vec<tree, 2> ops;
+	      auto_vec<vec<tree>, 2> vec_defs;
+
+	      ops.safe_push (rhs1);
+	      ops.safe_push (rhs2);
+	      vect_get_slp_defs (ops, slp_node, &vec_defs, -1);
+	      vec_oprnds1 = vec_defs.pop ();
+	      vec_oprnds0 = vec_defs.pop ();
+
+	      ops.release ();
+	      vec_defs.release ();
+	    }
+	  else
+	    {
+	      gimple gtemp;
+	      vec_rhs1
+		= vect_get_vec_def_for_operand (rhs1, stmt, NULL);
+	      vect_is_simple_use (rhs1, stmt, loop_vinfo, NULL,
+				  &gtemp, &def, &dts[0]);
+	      vec_rhs2 =
+		vect_get_vec_def_for_operand (rhs2, stmt, NULL);
+	      vect_is_simple_use (rhs2, stmt, loop_vinfo, NULL,
+				  &gtemp, &def, &dts[1]);
+	    }
+	}
+      else
+	{
+	  vec_rhs1 = vect_get_vec_def_for_stmt_copy (dts[0],
+						     vec_oprnds0.pop ());
+	  vec_rhs2 = vect_get_vec_def_for_stmt_copy (dts[1],
+						     vec_oprnds1.pop ());
+	}
+
+      if (!slp_node)
+	{
+	  vec_oprnds0.quick_push (vec_rhs1);
+	  vec_oprnds1.quick_push (vec_rhs2);
+	}
+
+      /* Arguments are ready.  Create the new vector stmt.  */
+      FOR_EACH_VEC_ELT (vec_oprnds0, i, vec_rhs1)
+	{
+	  vec_rhs2 = vec_oprnds1[i];
+
+	  vec_compare = build2 (code, mask_type, vec_rhs1, vec_rhs2);
+	  new_stmt = gimple_build_assign (mask, vec_compare);
+	  new_temp = make_ssa_name (mask, new_stmt);
+	  gimple_assign_set_lhs (new_stmt, new_temp);
+	  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+	  if (slp_node)
+	    SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt);
+	}
+
+      if (slp_node)
+	continue;
+
+      if (j == 0)
+	STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+      else
+	STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+
+      prev_stmt_info = vinfo_for_stmt (new_stmt);
+    }
+
+  vec_oprnds0.release ();
+  vec_oprnds1.release ();
+
+  return true;
+}
 
 /* Make sure the statement is vectorizable.  */
 
@@ -7576,7 +7821,8 @@ vect_analyze_stmt (gimple stmt, bool *need_to_vectorize, slp_tree node)
 	  || vectorizable_call (stmt, NULL, NULL, node)
 	  || vectorizable_store (stmt, NULL, NULL, node)
 	  || vectorizable_reduction (stmt, NULL, NULL, node)
-	  || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node));
+	  || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node)
+	  || vectorizable_comparison (stmt, NULL, NULL, NULL, node));
   else
     {
       if (bb_vinfo)
@@ -7588,7 +7834,8 @@ vect_analyze_stmt (gimple stmt, bool *need_to_vectorize, slp_tree node)
 	      || vectorizable_load (stmt, NULL, NULL, node, NULL)
 	      || vectorizable_call (stmt, NULL, NULL, node)
 	      || vectorizable_store (stmt, NULL, NULL, node)
-	      || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node));
+	      || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node)
+	      || vectorizable_comparison (stmt, NULL, NULL, NULL, node));
     }
 
   if (!ok)
@@ -7704,6 +7951,11 @@ vect_transform_stmt (gimple stmt, gimple_stmt_iterator *gsi,
       gcc_assert (done);
       break;
 
+    case comparison_vec_info_type:
+      done = vectorizable_comparison (stmt, gsi, &vec_stmt, NULL, slp_node);
+      gcc_assert (done);
+      break;
+
     case call_vec_info_type:
       done = vectorizable_call (stmt, gsi, &vec_stmt, slp_node);
       stmt = gsi_stmt (*gsi);
@@ -8038,6 +8290,23 @@ get_vectype_for_scalar_type (tree scalar_type)
   return vectype;
 }
 
+/* Function get_mask_type_for_scalar_type.
+
+   Returns the mask type corresponding to a result of comparison
+   of vectors of specified SCALAR_TYPE as supported by target.  */
+
+tree
+get_mask_type_for_scalar_type (tree scalar_type)
+{
+  tree vectype = get_vectype_for_scalar_type (scalar_type);
+
+  if (!vectype)
+    return NULL;
+
+  return build_truth_vector_type (TYPE_VECTOR_SUBPARTS (vectype),
+				  current_vector_size);
+}
+
 /* Function get_same_sized_vectype
 
    Returns a vector type corresponding to SCALAR_TYPE of size
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 58e8f10..94aea1a 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -28,7 +28,8 @@ along with GCC; see the file COPYING3.  If not see
 enum vect_var_kind {
   vect_simple_var,
   vect_pointer_var,
-  vect_scalar_var
+  vect_scalar_var,
+  vect_mask_var
 };
 
 /* Defines type of operation.  */
@@ -482,6 +483,7 @@ enum stmt_vec_info_type {
   call_simd_clone_vec_info_type,
   assignment_vec_info_type,
   condition_vec_info_type,
+  comparison_vec_info_type,
   reduc_vec_info_type,
   induc_vec_info_type,
   type_promotion_vec_info_type,
@@ -995,6 +997,7 @@ extern bool vect_can_advance_ivs_p (loop_vec_info);
 /* In tree-vect-stmts.c.  */
 extern unsigned int current_vector_size;
 extern tree get_vectype_for_scalar_type (tree);
+extern tree get_mask_type_for_scalar_type (tree);
 extern tree get_same_sized_vectype (tree, tree);
 extern bool vect_is_simple_use (tree, gimple, loop_vec_info,
 			        bb_vec_info, gimple *,
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 758ca38..cffacaa 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -2957,7 +2957,7 @@ check_bool_pattern (tree var, loop_vec_info loop_vinfo, bb_vec_info bb_vinfo)
     default:
       if (TREE_CODE_CLASS (rhs_code) == tcc_comparison)
 	{
-	  tree vecitype, comp_vectype;
+	  tree vecitype, comp_vectype, mask_type;
 
 	  /* If the comparison can throw, then is_gimple_condexpr will be
 	     false and we can't make a COND_EXPR/VEC_COND_EXPR out of it.  */
@@ -2968,6 +2968,11 @@ check_bool_pattern (tree var, loop_vec_info loop_vinfo, bb_vec_info bb_vinfo)
 	  if (comp_vectype == NULL_TREE)
 	    return false;
 
+	  mask_type = get_mask_type_for_scalar_type (TREE_TYPE (rhs1));
+	  if (mask_type
+	      && expand_vec_cmp_expr_p (comp_vectype, mask_type))
+	    return false;
+
 	  if (TREE_CODE (TREE_TYPE (rhs1)) != INTEGER_TYPE)
 	    {
 	      machine_mode mode = TYPE_MODE (TREE_TYPE (rhs1));
@@ -3192,6 +3197,75 @@ adjust_bool_pattern (tree var, tree out_type, tree trueval,
 }
 
 
+/* Try to determine a proper type for converting bool VAR
+   into an integer value.  The type is chosen so that
+   conversion has the same number of elements as a mask
+   producer.  */
+
+static tree
+search_type_for_mask (tree var, loop_vec_info loop_vinfo, bb_vec_info bb_vinfo)
+{
+  gimple def_stmt;
+  enum vect_def_type dt;
+  tree def, rhs1;
+  enum tree_code rhs_code;
+  tree res = NULL;
+
+  if (TREE_CODE (var) != SSA_NAME)
+    return NULL;
+
+  if ((TYPE_PRECISION (TREE_TYPE (var)) != 1
+       || !TYPE_UNSIGNED (TREE_TYPE (var)))
+      && TREE_CODE (TREE_TYPE (var)) != BOOLEAN_TYPE)
+    return NULL;
+
+  if (!vect_is_simple_use (var, NULL, loop_vinfo, bb_vinfo, &def_stmt, &def,
+			   &dt))
+    return NULL;
+
+  if (dt != vect_internal_def)
+    return NULL;
+
+  if (!is_gimple_assign (def_stmt))
+    return NULL;
+
+  rhs_code = gimple_assign_rhs_code (def_stmt);
+  rhs1 = gimple_assign_rhs1 (def_stmt);
+
+  switch (rhs_code)
+    {
+    case SSA_NAME:
+    case BIT_NOT_EXPR:
+    CASE_CONVERT:
+      res = search_type_for_mask (rhs1, loop_vinfo, bb_vinfo);
+      break;
+
+    case BIT_AND_EXPR:
+    case BIT_IOR_EXPR:
+    case BIT_XOR_EXPR:
+      if (!(res = search_type_for_mask (rhs1, loop_vinfo, bb_vinfo)))
+	res = search_type_for_mask (gimple_assign_rhs2 (def_stmt),
+					loop_vinfo, bb_vinfo);
+      break;
+
+    default:
+      if (TREE_CODE_CLASS (rhs_code) == tcc_comparison)
+	{
+	  if (TREE_CODE (TREE_TYPE (rhs1)) != INTEGER_TYPE
+	      || !TYPE_UNSIGNED (TREE_TYPE (rhs1)))
+	    {
+	      machine_mode mode = TYPE_MODE (TREE_TYPE (rhs1));
+	      res = build_nonstandard_integer_type (GET_MODE_BITSIZE (mode), 1);
+	    }
+	  else
+	    res = TREE_TYPE (rhs1);
+	}
+    }
+
+  return res;
+}
+
+
 /* Function vect_recog_bool_pattern
 
    Try to find pattern like following:
@@ -3249,6 +3323,7 @@ vect_recog_bool_pattern (vec<gimple> *stmts, tree *type_in,
   enum tree_code rhs_code;
   tree var, lhs, rhs, vectype;
   stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
+  stmt_vec_info new_stmt_info;
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_vinfo);
   gimple pattern_stmt;
@@ -3274,16 +3349,43 @@ vect_recog_bool_pattern (vec<gimple> *stmts, tree *type_in,
       if (vectype == NULL_TREE)
 	return NULL;
 
-      if (!check_bool_pattern (var, loop_vinfo, bb_vinfo))
-	return NULL;
-
-      rhs = adjust_bool_pattern (var, TREE_TYPE (lhs), NULL_TREE, stmts);
-      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
-      if (useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (rhs)))
-	pattern_stmt = gimple_build_assign (lhs, SSA_NAME, rhs);
+      if (check_bool_pattern (var, loop_vinfo, bb_vinfo))
+	{
+	  rhs = adjust_bool_pattern (var, TREE_TYPE (lhs), NULL_TREE, stmts);
+	  lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
+	  if (useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (rhs)))
+	    pattern_stmt = gimple_build_assign (lhs, SSA_NAME, rhs);
+	  else
+	    pattern_stmt
+	      = gimple_build_assign (lhs, NOP_EXPR, rhs);
+	}
       else
-	pattern_stmt
-	  = gimple_build_assign (lhs, NOP_EXPR, rhs);
+	{
+	  tree type = search_type_for_mask (var, loop_vinfo, bb_vinfo);
+	  tree cst0, cst1;
+
+	  if (!type || TYPE_MODE (type) == TYPE_MODE (TREE_TYPE (lhs)))
+	    type = TREE_TYPE (lhs);
+	  cst0 = build_int_cst (type, 0);
+	  cst1 = build_int_cst (type, 1);
+	  lhs = vect_recog_temp_ssa_var (type, NULL);
+	  pattern_stmt = gimple_build_assign (lhs, COND_EXPR, var, cst0, cst1);
+
+	  if (!useless_type_conversion_p (type, TREE_TYPE (lhs)))
+	    {
+	      tree new_vectype = get_vectype_for_scalar_type (type);
+	      new_stmt_info = new_stmt_vec_info (pattern_stmt, loop_vinfo,
+						 bb_vinfo);
+	      set_vinfo_for_stmt (pattern_stmt, new_stmt_info);
+	      STMT_VINFO_VECTYPE (new_stmt_info) = new_vectype;
+	      new_pattern_def_seq (stmt_vinfo, pattern_stmt);
+
+	      rhs = lhs;
+	      lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
+	      pattern_stmt = gimple_build_assign (lhs, CONVERT_EXPR, rhs);
+	    }
+	}
+
       *type_out = vectype;
       *type_in = vectype;
       stmts->safe_push (last_stmt);
@@ -3312,10 +3414,11 @@ vect_recog_bool_pattern (vec<gimple> *stmts, tree *type_in,
       if (get_vectype_for_scalar_type (type) == NULL_TREE)
 	return NULL;
 
-      if (!check_bool_pattern (var, loop_vinfo, bb_vinfo))
-	return NULL;
+      if (check_bool_pattern (var, loop_vinfo, bb_vinfo))
+	rhs = adjust_bool_pattern (var, type, NULL_TREE, stmts);
+      else
+	rhs = var;
 
-      rhs = adjust_bool_pattern (var, type, NULL_TREE, stmts);
       lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
       pattern_stmt 
 	  = gimple_build_assign (lhs, COND_EXPR,
@@ -3340,16 +3443,38 @@ vect_recog_bool_pattern (vec<gimple> *stmts, tree *type_in,
       gcc_assert (vectype != NULL_TREE);
       if (!VECTOR_MODE_P (TYPE_MODE (vectype)))
 	return NULL;
-      if (!check_bool_pattern (var, loop_vinfo, bb_vinfo))
-	return NULL;
 
-      rhs = adjust_bool_pattern (var, TREE_TYPE (vectype), NULL_TREE, stmts);
+      if (check_bool_pattern (var, loop_vinfo, bb_vinfo))
+	rhs = adjust_bool_pattern (var, TREE_TYPE (vectype),
+				   NULL_TREE, stmts);
+      else
+	{
+	  tree type = search_type_for_mask (var, loop_vinfo, bb_vinfo);
+	  tree cst0, cst1, new_vectype;
+
+	  if (!type || TYPE_MODE (type) == TYPE_MODE (TREE_TYPE (vectype)))
+	    type = TREE_TYPE (vectype);
+
+	  cst0 = build_int_cst (type, 0);
+	  cst1 = build_int_cst (type, 1);
+	  new_vectype = get_vectype_for_scalar_type (type);
+
+	  rhs = vect_recog_temp_ssa_var (type, NULL);
+	  pattern_stmt = gimple_build_assign (rhs, COND_EXPR, var, cst0, cst1);
+
+	  pattern_stmt_info = new_stmt_vec_info (pattern_stmt, loop_vinfo,
+						 bb_vinfo);
+	  set_vinfo_for_stmt (pattern_stmt, pattern_stmt_info);
+	  STMT_VINFO_VECTYPE (pattern_stmt_info) = new_vectype;
+	  append_pattern_def_seq (stmt_vinfo, pattern_stmt);
+	}
+
       lhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (vectype), lhs);
       if (!useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (rhs)))
 	{
 	  tree rhs2 = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
 	  gimple cast_stmt = gimple_build_assign (rhs2, NOP_EXPR, rhs);
-	  new_pattern_def_seq (stmt_vinfo, cast_stmt);
+	  append_pattern_def_seq (stmt_vinfo, cast_stmt);
 	  rhs = rhs2;
 	}
       pattern_stmt = gimple_build_assign (lhs, SSA_NAME, rhs);
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 6a17ef4..e22aa57 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -129,6 +129,9 @@ extern bool ix86_expand_fp_vcond (rtx[]);
 extern bool ix86_expand_int_vcond (rtx[]);
 extern void ix86_expand_vec_perm (rtx[]);
 extern bool ix86_expand_vec_perm_const (rtx[]);
+extern bool ix86_expand_mask_vec_cmp (rtx[]);
+extern bool ix86_expand_int_vec_cmp (rtx[]);
+extern bool ix86_expand_fp_vec_cmp (rtx[]);
 extern void ix86_expand_sse_unpack (rtx, rtx, bool, bool);
 extern bool ix86_expand_int_addcc (rtx[]);
 extern rtx ix86_expand_call (rtx, rtx, rtx, rtx, rtx, bool);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 070605f..d17c350 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21440,8 +21440,8 @@ ix86_expand_sse_cmp (rtx dest, enum rtx_code code, rtx cmp_op0, rtx cmp_op1,
     cmp_op1 = force_reg (cmp_ops_mode, cmp_op1);
 
   if (optimize
-      || reg_overlap_mentioned_p (dest, op_true)
-      || reg_overlap_mentioned_p (dest, op_false))
+      || (op_true && reg_overlap_mentioned_p (dest, op_true))
+      || (op_false && reg_overlap_mentioned_p (dest, op_false)))
     dest = gen_reg_rtx (maskcmp ? cmp_mode : mode);
 
   /* Compare patterns for int modes are unspec in AVX512F only.  */
@@ -21713,34 +21713,127 @@ ix86_expand_fp_movcc (rtx operands[])
   return true;
 }
 
-/* Expand a floating-point vector conditional move; a vcond operation
-   rather than a movcc operation.  */
+/* Helper for ix86_cmp_code_to_pcmp_immediate for int modes.  */
+
+static int
+ix86_int_cmp_code_to_pcmp_immediate (enum rtx_code code)
+{
+  switch (code)
+    {
+    case EQ:
+      return 0;
+    case LT:
+    case LTU:
+      return 1;
+    case LE:
+    case LEU:
+      return 2;
+    case NE:
+      return 4;
+    case GE:
+    case GEU:
+      return 5;
+    case GT:
+    case GTU:
+      return 6;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Helper for ix86_cmp_code_to_pcmp_immediate for fp modes.  */
+
+static int
+ix86_fp_cmp_code_to_pcmp_immediate (enum rtx_code code)
+{
+  switch (code)
+    {
+    case EQ:
+      return 0x08;
+    case NE:
+      return 0x04;
+    case GT:
+      return 0x16;
+    case LE:
+      return 0x1a;
+    case GE:
+      return 0x15;
+    case LT:
+      return 0x19;
+    default:
+      gcc_unreachable ();
+    }
+}
+
+/* Return immediate value to be used in UNSPEC_PCMP
+   for comparison CODE in MODE.  */
+
+static int
+ix86_cmp_code_to_pcmp_immediate (enum rtx_code code, machine_mode mode)
+{
+  if (FLOAT_MODE_P (mode))
+    return ix86_fp_cmp_code_to_pcmp_immediate (code);
+  return ix86_int_cmp_code_to_pcmp_immediate (code);
+}
+
+/* Expand AVX-512 vector comparison.  */
 
 bool
-ix86_expand_fp_vcond (rtx operands[])
+ix86_expand_mask_vec_cmp (rtx operands[])
 {
-  enum rtx_code code = GET_CODE (operands[3]);
+  machine_mode mask_mode = GET_MODE (operands[0]);
+  machine_mode cmp_mode = GET_MODE (operands[2]);
+  enum rtx_code code = GET_CODE (operands[1]);
+  rtx imm = GEN_INT (ix86_cmp_code_to_pcmp_immediate (code, cmp_mode));
+  int unspec_code;
+  rtx unspec;
+
+  switch (code)
+    {
+    case LEU:
+    case GTU:
+    case GEU:
+    case LTU:
+      unspec_code = UNSPEC_UNSIGNED_PCMP;
+    default:
+      unspec_code = UNSPEC_PCMP;
+    }
+
+  unspec = gen_rtx_UNSPEC (mask_mode, gen_rtvec (3, operands[2],
+						 operands[3], imm),
+			   unspec_code);
+  emit_insn (gen_rtx_SET (operands[0], unspec));
+
+  return true;
+}
+
+/* Expand fp vector comparison.  */
+
+bool
+ix86_expand_fp_vec_cmp (rtx operands[])
+{
+  enum rtx_code code = GET_CODE (operands[1]);
   rtx cmp;
 
   code = ix86_prepare_sse_fp_compare_args (operands[0], code,
-					   &operands[4], &operands[5]);
+					   &operands[2], &operands[3]);
   if (code == UNKNOWN)
     {
       rtx temp;
-      switch (GET_CODE (operands[3]))
+      switch (GET_CODE (operands[1]))
 	{
 	case LTGT:
-	  temp = ix86_expand_sse_cmp (operands[0], ORDERED, operands[4],
-				      operands[5], operands[0], operands[0]);
-	  cmp = ix86_expand_sse_cmp (operands[0], NE, operands[4],
-				     operands[5], operands[1], operands[2]);
+	  temp = ix86_expand_sse_cmp (operands[0], ORDERED, operands[2],
+				      operands[3], NULL, NULL);
+	  cmp = ix86_expand_sse_cmp (operands[0], NE, operands[2],
+				     operands[3], NULL, NULL);
 	  code = AND;
 	  break;
 	case UNEQ:
-	  temp = ix86_expand_sse_cmp (operands[0], UNORDERED, operands[4],
-				      operands[5], operands[0], operands[0]);
-	  cmp = ix86_expand_sse_cmp (operands[0], EQ, operands[4],
-				     operands[5], operands[1], operands[2]);
+	  temp = ix86_expand_sse_cmp (operands[0], UNORDERED, operands[2],
+				      operands[3], NULL, NULL);
+	  cmp = ix86_expand_sse_cmp (operands[0], EQ, operands[2],
+				     operands[3], NULL, NULL);
 	  code = IOR;
 	  break;
 	default:
@@ -21748,72 +21841,26 @@ ix86_expand_fp_vcond (rtx operands[])
 	}
       cmp = expand_simple_binop (GET_MODE (cmp), code, temp, cmp, cmp, 1,
 				 OPTAB_DIRECT);
-      ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]);
-      return true;
     }
+  else
+    cmp = ix86_expand_sse_cmp (operands[0], code, operands[2], operands[3],
+			       operands[1], operands[2]);
 
-  if (ix86_expand_sse_fp_minmax (operands[0], code, operands[4],
-				 operands[5], operands[1], operands[2]))
-    return true;
+  if (operands[0] != cmp)
+    emit_move_insn (operands[0], cmp);
 
-  cmp = ix86_expand_sse_cmp (operands[0], code, operands[4], operands[5],
-			     operands[1], operands[2]);
-  ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]);
   return true;
 }
 
-/* Expand a signed/unsigned integral vector conditional move.  */
-
-bool
-ix86_expand_int_vcond (rtx operands[])
+static rtx
+ix86_expand_int_sse_cmp (rtx dest, enum rtx_code code, rtx cop0, rtx cop1,
+			 rtx op_true, rtx op_false, bool *negate)
 {
-  machine_mode data_mode = GET_MODE (operands[0]);
-  machine_mode mode = GET_MODE (operands[4]);
-  enum rtx_code code = GET_CODE (operands[3]);
-  bool negate = false;
-  rtx x, cop0, cop1;
-
-  cop0 = operands[4];
-  cop1 = operands[5];
+  machine_mode data_mode = GET_MODE (dest);
+  machine_mode mode = GET_MODE (cop0);
+  rtx x;
 
-  /* Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31
-     and x < 0 ? 1 : 0 into (unsigned) x >> 31.  */
-  if ((code == LT || code == GE)
-      && data_mode == mode
-      && cop1 == CONST0_RTX (mode)
-      && operands[1 + (code == LT)] == CONST0_RTX (data_mode)
-      && GET_MODE_UNIT_SIZE (data_mode) > 1
-      && GET_MODE_UNIT_SIZE (data_mode) <= 8
-      && (GET_MODE_SIZE (data_mode) == 16
-	  || (TARGET_AVX2 && GET_MODE_SIZE (data_mode) == 32)))
-    {
-      rtx negop = operands[2 - (code == LT)];
-      int shift = GET_MODE_UNIT_BITSIZE (data_mode) - 1;
-      if (negop == CONST1_RTX (data_mode))
-	{
-	  rtx res = expand_simple_binop (mode, LSHIFTRT, cop0, GEN_INT (shift),
-					 operands[0], 1, OPTAB_DIRECT);
-	  if (res != operands[0])
-	    emit_move_insn (operands[0], res);
-	  return true;
-	}
-      else if (GET_MODE_INNER (data_mode) != DImode
-	       && vector_all_ones_operand (negop, data_mode))
-	{
-	  rtx res = expand_simple_binop (mode, ASHIFTRT, cop0, GEN_INT (shift),
-					 operands[0], 0, OPTAB_DIRECT);
-	  if (res != operands[0])
-	    emit_move_insn (operands[0], res);
-	  return true;
-	}
-    }
-
-  if (!nonimmediate_operand (cop1, mode))
-    cop1 = force_reg (mode, cop1);
-  if (!general_operand (operands[1], data_mode))
-    operands[1] = force_reg (data_mode, operands[1]);
-  if (!general_operand (operands[2], data_mode))
-    operands[2] = force_reg (data_mode, operands[2]);
+  *negate = false;
 
   /* XOP supports all of the comparisons on all 128-bit vector int types.  */
   if (TARGET_XOP
@@ -21834,13 +21881,13 @@ ix86_expand_int_vcond (rtx operands[])
 	case LE:
 	case LEU:
 	  code = reverse_condition (code);
-	  negate = true;
+	  *negate = true;
 	  break;
 
 	case GE:
 	case GEU:
 	  code = reverse_condition (code);
-	  negate = true;
+	  *negate = true;
 	  /* FALLTHRU */
 
 	case LT:
@@ -21861,14 +21908,14 @@ ix86_expand_int_vcond (rtx operands[])
 	    case EQ:
 	      /* SSE4.1 supports EQ.  */
 	      if (!TARGET_SSE4_1)
-		return false;
+		return NULL;
 	      break;
 
 	    case GT:
 	    case GTU:
 	      /* SSE4.2 supports GT/GTU.  */
 	      if (!TARGET_SSE4_2)
-		return false;
+		return NULL;
 	      break;
 
 	    default:
@@ -21929,12 +21976,13 @@ ix86_expand_int_vcond (rtx operands[])
 	    case V8HImode:
 	      /* Perform a parallel unsigned saturating subtraction.  */
 	      x = gen_reg_rtx (mode);
-	      emit_insn (gen_rtx_SET (x, gen_rtx_US_MINUS (mode, cop0, cop1)));
+	      emit_insn (gen_rtx_SET (x, gen_rtx_US_MINUS (mode, cop0,
+							   cop1)));
 
 	      cop0 = x;
 	      cop1 = CONST0_RTX (mode);
 	      code = EQ;
-	      negate = !negate;
+	      *negate = !*negate;
 	      break;
 
 	    default:
@@ -21943,22 +21991,162 @@ ix86_expand_int_vcond (rtx operands[])
 	}
     }
 
+  if (*negate)
+    std::swap (op_true, op_false);
+
   /* Allow the comparison to be done in one mode, but the movcc to
      happen in another mode.  */
   if (data_mode == mode)
     {
-      x = ix86_expand_sse_cmp (operands[0], code, cop0, cop1,
-			       operands[1+negate], operands[2-negate]);
+      x = ix86_expand_sse_cmp (dest, code, cop0, cop1,
+			       op_true, op_false);
     }
   else
     {
       gcc_assert (GET_MODE_SIZE (data_mode) == GET_MODE_SIZE (mode));
       x = ix86_expand_sse_cmp (gen_reg_rtx (mode), code, cop0, cop1,
-			       operands[1+negate], operands[2-negate]);
+			       op_true, op_false);
       if (GET_MODE (x) == mode)
 	x = gen_lowpart (data_mode, x);
     }
 
+  return x;
+}
+
+/* Expand integer vector comparison.  */
+
+bool
+ix86_expand_int_vec_cmp (rtx operands[])
+{
+  rtx_code code = GET_CODE (operands[1]);
+  bool negate = false;
+  rtx cmp = ix86_expand_int_sse_cmp (operands[0], code, operands[2],
+				     operands[3], NULL, NULL, &negate);
+
+  if (!cmp)
+    return false;
+
+  if (negate)
+    cmp = ix86_expand_int_sse_cmp (operands[0], EQ, cmp,
+				   CONST0_RTX (GET_MODE (cmp)),
+				   NULL, NULL, &negate);
+
+  gcc_assert (!negate);
+
+  if (operands[0] != cmp)
+    emit_move_insn (operands[0], cmp);
+
+  return true;
+}
+
+/* Expand a floating-point vector conditional move; a vcond operation
+   rather than a movcc operation.  */
+
+bool
+ix86_expand_fp_vcond (rtx operands[])
+{
+  enum rtx_code code = GET_CODE (operands[3]);
+  rtx cmp;
+
+  code = ix86_prepare_sse_fp_compare_args (operands[0], code,
+					   &operands[4], &operands[5]);
+  if (code == UNKNOWN)
+    {
+      rtx temp;
+      switch (GET_CODE (operands[3]))
+	{
+	case LTGT:
+	  temp = ix86_expand_sse_cmp (operands[0], ORDERED, operands[4],
+				      operands[5], operands[0], operands[0]);
+	  cmp = ix86_expand_sse_cmp (operands[0], NE, operands[4],
+				     operands[5], operands[1], operands[2]);
+	  code = AND;
+	  break;
+	case UNEQ:
+	  temp = ix86_expand_sse_cmp (operands[0], UNORDERED, operands[4],
+				      operands[5], operands[0], operands[0]);
+	  cmp = ix86_expand_sse_cmp (operands[0], EQ, operands[4],
+				     operands[5], operands[1], operands[2]);
+	  code = IOR;
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+      cmp = expand_simple_binop (GET_MODE (cmp), code, temp, cmp, cmp, 1,
+				 OPTAB_DIRECT);
+      ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]);
+      return true;
+    }
+
+  if (ix86_expand_sse_fp_minmax (operands[0], code, operands[4],
+				 operands[5], operands[1], operands[2]))
+    return true;
+
+  cmp = ix86_expand_sse_cmp (operands[0], code, operands[4], operands[5],
+			     operands[1], operands[2]);
+  ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]);
+  return true;
+}
+
+/* Expand a signed/unsigned integral vector conditional move.  */
+
+bool
+ix86_expand_int_vcond (rtx operands[])
+{
+  machine_mode data_mode = GET_MODE (operands[0]);
+  machine_mode mode = GET_MODE (operands[4]);
+  enum rtx_code code = GET_CODE (operands[3]);
+  bool negate = false;
+  rtx x, cop0, cop1;
+
+  cop0 = operands[4];
+  cop1 = operands[5];
+
+  /* Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31
+     and x < 0 ? 1 : 0 into (unsigned) x >> 31.  */
+  if ((code == LT || code == GE)
+      && data_mode == mode
+      && cop1 == CONST0_RTX (mode)
+      && operands[1 + (code == LT)] == CONST0_RTX (data_mode)
+      && GET_MODE_UNIT_SIZE (data_mode) > 1
+      && GET_MODE_UNIT_SIZE (data_mode) <= 8
+      && (GET_MODE_SIZE (data_mode) == 16
+	  || (TARGET_AVX2 && GET_MODE_SIZE (data_mode) == 32)))
+    {
+      rtx negop = operands[2 - (code == LT)];
+      int shift = GET_MODE_UNIT_BITSIZE (data_mode) - 1;
+      if (negop == CONST1_RTX (data_mode))
+	{
+	  rtx res = expand_simple_binop (mode, LSHIFTRT, cop0, GEN_INT (shift),
+					 operands[0], 1, OPTAB_DIRECT);
+	  if (res != operands[0])
+	    emit_move_insn (operands[0], res);
+	  return true;
+	}
+      else if (GET_MODE_INNER (data_mode) != DImode
+	       && vector_all_ones_operand (negop, data_mode))
+	{
+	  rtx res = expand_simple_binop (mode, ASHIFTRT, cop0, GEN_INT (shift),
+					 operands[0], 0, OPTAB_DIRECT);
+	  if (res != operands[0])
+	    emit_move_insn (operands[0], res);
+	  return true;
+	}
+    }
+
+  if (!nonimmediate_operand (cop1, mode))
+    cop1 = force_reg (mode, cop1);
+  if (!general_operand (operands[1], data_mode))
+    operands[1] = force_reg (data_mode, operands[1]);
+  if (!general_operand (operands[2], data_mode))
+    operands[2] = force_reg (data_mode, operands[2]);
+
+  x = ix86_expand_int_sse_cmp (operands[0], code, cop0, cop1,
+			       operands[1], operands[2], &negate);
+
+  if (!x)
+    return false;
+
   ix86_expand_sse_movcc (operands[0], x, operands[1+negate],
 			 operands[2-negate]);
   return true;
@@ -51678,6 +51866,30 @@ ix86_autovectorize_vector_sizes (void)
     (TARGET_AVX && !TARGET_PREFER_AVX128) ? 32 | 16 : 0;
 }
 
+/* Implemenation of targetm.vectorize.get_mask_mode.  */
+
+static machine_mode
+ix86_get_mask_mode (unsigned nunits, unsigned vector_size)
+{
+  /* Scalar mask case.  */
+  if (TARGET_AVX512F && vector_size == 64)
+    {
+      unsigned elem_size = vector_size / nunits;
+      if ((vector_size == 64 || TARGET_AVX512VL)
+	  && ((elem_size == 4 || elem_size == 8)
+	      || TARGET_AVX512BW))
+	return smallest_mode_for_size (nunits, MODE_INT);
+    }
+
+  unsigned elem_size = vector_size / nunits;
+  machine_mode elem_mode
+    = smallest_mode_for_size (elem_size * BITS_PER_UNIT, MODE_INT);
+
+  gcc_assert (elem_size * nunits == vector_size);
+
+  return mode_for_vector (elem_mode, nunits);
+}
+
 
 
 /* Return class of registers which could be used for pseudo of MODE
@@ -52612,6 +52824,8 @@ ix86_operands_ok_for_move_multiple (rtx *operands, bool load,
 #undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
 #define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
   ix86_autovectorize_vector_sizes
+#undef TARGET_VECTORIZE_GET_MASK_MODE
+#define TARGET_VECTORIZE_GET_MASK_MODE ix86_get_mask_mode
 #undef TARGET_VECTORIZE_INIT_COST
 #define TARGET_VECTORIZE_INIT_COST ix86_init_cost
 #undef TARGET_VECTORIZE_ADD_STMT_COST
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 4535570..a8d55cc 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -605,6 +605,15 @@
    (V16SF "HI") (V8SF  "QI") (V4SF  "QI")
    (V8DF  "QI") (V4DF  "QI") (V2DF  "QI")])
 
+;; Mapping of vector modes to corresponding mask size
+(define_mode_attr avx512fmaskmodelower
+  [(V64QI "di") (V32QI "si") (V16QI "hi")
+   (V32HI "si") (V16HI "hi") (V8HI  "qi") (V4HI "qi")
+   (V16SI "hi") (V8SI  "qi") (V4SI  "qi")
+   (V8DI  "qi") (V4DI  "qi") (V2DI  "qi")
+   (V16SF "hi") (V8SF  "qi") (V4SF  "qi")
+   (V8DF  "qi") (V4DF  "qi") (V2DF  "qi")])
+
 ;; Mapping of vector float modes to an integer mode of the same size
 (define_mode_attr sseintvecmode
   [(V16SF "V16SI") (V8DF  "V8DI")
@@ -2803,6 +2812,150 @@
 		      (const_string "0")))
    (set_attr "mode" "<MODE>")])
 
+(define_expand "vec_cmp<mode><avx512fmaskmodelower>"
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(match_operator:<avx512fmaskmode> 1 ""
+	  [(match_operand:V48_AVX512VL 2 "register_operand")
+	   (match_operand:V48_AVX512VL 3 "nonimmediate_operand")]))]
+  "TARGET_AVX512F"
+{
+  bool ok = ix86_expand_mask_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmp<mode><avx512fmaskmodelower>"
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(match_operator:<avx512fmaskmode> 1 ""
+	  [(match_operand:VI12_AVX512VL 2 "register_operand")
+	   (match_operand:VI12_AVX512VL 3 "nonimmediate_operand")]))]
+  "TARGET_AVX512BW"
+{
+  bool ok = ix86_expand_mask_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmp<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VI_256 2 "register_operand")
+	   (match_operand:VI_256 3 "nonimmediate_operand")]))]
+  "TARGET_AVX2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmp<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VI124_128 2 "register_operand")
+	   (match_operand:VI124_128 3 "nonimmediate_operand")]))]
+  "TARGET_SSE2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpv2div2di"
+  [(set (match_operand:V2DI 0 "register_operand")
+	(match_operator:V2DI 1 ""
+	  [(match_operand:V2DI 2 "register_operand")
+	   (match_operand:V2DI 3 "nonimmediate_operand")]))]
+  "TARGET_SSE4_2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmp<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VF_256 2 "register_operand")
+	   (match_operand:VF_256 3 "nonimmediate_operand")]))]
+  "TARGET_AVX"
+{
+  bool ok = ix86_expand_fp_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmp<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VF_128 2 "register_operand")
+	   (match_operand:VF_128 3 "nonimmediate_operand")]))]
+  "TARGET_SSE"
+{
+  bool ok = ix86_expand_fp_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpu<mode><avx512fmaskmodelower>"
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(match_operator:<avx512fmaskmode> 1 ""
+	  [(match_operand:VI48_AVX512VL 2 "register_operand")
+	   (match_operand:VI48_AVX512VL 3 "nonimmediate_operand")]))]
+  "TARGET_AVX512F"
+{
+  bool ok = ix86_expand_mask_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpu<mode><avx512fmaskmodelower>"
+  [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+	(match_operator:<avx512fmaskmode> 1 ""
+	  [(match_operand:VI12_AVX512VL 2 "register_operand")
+	   (match_operand:VI12_AVX512VL 3 "nonimmediate_operand")]))]
+  "TARGET_AVX512BW"
+{
+  bool ok = ix86_expand_mask_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpu<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VI_256 2 "register_operand")
+	   (match_operand:VI_256 3 "nonimmediate_operand")]))]
+  "TARGET_AVX2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpu<mode><sseintvecmodelower>"
+  [(set (match_operand:<sseintvecmode> 0 "register_operand")
+	(match_operator:<sseintvecmode> 1 ""
+	  [(match_operand:VI124_128 2 "register_operand")
+	   (match_operand:VI124_128 3 "nonimmediate_operand")]))]
+  "TARGET_SSE2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
+(define_expand "vec_cmpuv2div2di"
+  [(set (match_operand:V2DI 0 "register_operand")
+	(match_operator:V2DI 1 ""
+	  [(match_operand:V2DI 2 "register_operand")
+	   (match_operand:V2DI 3 "nonimmediate_operand")]))]
+  "TARGET_SSE4_2"
+{
+  bool ok = ix86_expand_int_vec_cmp (operands);
+  gcc_assert (ok);
+  DONE;
+})
+
 (define_expand "vcond<V_512:mode><VF_512:mode>"
   [(set (match_operand:V_512 0 "register_operand")
 	(if_then_else:V_512
@@ -17895,7 +18048,7 @@
    (set_attr "btver2_decode" "vector") 
    (set_attr "mode" "<sseinsnmode>")])
 
-(define_expand "maskload<mode>"
+(define_expand "maskload<mode><sseintvecmodelower>"
   [(set (match_operand:V48_AVX2 0 "register_operand")
 	(unspec:V48_AVX2
 	  [(match_operand:<sseintvecmode> 2 "register_operand")
@@ -17903,7 +18056,23 @@
 	  UNSPEC_MASKMOV))]
   "TARGET_AVX")
 
-(define_expand "maskstore<mode>"
+(define_expand "maskload<mode><avx512fmaskmodelower>"
+  [(set (match_operand:V48_AVX512VL 0 "register_operand")
+	(vec_merge:V48_AVX512VL
+	  (match_operand:V48_AVX512VL 1 "memory_operand")
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 2 "register_operand")))]
+  "TARGET_AVX512F")
+
+(define_expand "maskload<mode><avx512fmaskmodelower>"
+  [(set (match_operand:VI12_AVX512VL 0 "register_operand")
+	(vec_merge:VI12_AVX512VL
+	  (match_operand:VI12_AVX512VL 1 "memory_operand")
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 2 "register_operand")))]
+  "TARGET_AVX512BW")
+
+(define_expand "maskstore<mode><sseintvecmodelower>"
   [(set (match_operand:V48_AVX2 0 "memory_operand")
 	(unspec:V48_AVX2
 	  [(match_operand:<sseintvecmode> 2 "register_operand")
@@ -17912,6 +18081,22 @@
 	  UNSPEC_MASKMOV))]
   "TARGET_AVX")
 
+(define_expand "maskstore<mode><avx512fmaskmodelower>"
+  [(set (match_operand:V48_AVX512VL 0 "memory_operand")
+	(vec_merge:V48_AVX512VL
+	  (match_operand:V48_AVX512VL 1 "register_operand")
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 2 "register_operand")))]
+  "TARGET_AVX512F")
+
+(define_expand "maskstore<mode><avx512fmaskmodelower>"
+  [(set (match_operand:VI12_AVX512VL 0 "memory_operand")
+	(vec_merge:VI12_AVX512VL
+	  (match_operand:VI12_AVX512VL 1 "register_operand")
+	  (match_dup 0)
+	  (match_operand:<avx512fmaskmode> 2 "register_operand")))]
+  "TARGET_AVX512BW")
+
 (define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>"
   [(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m")
 	(unspec:AVX256MODE2P


[RFC] Try vector<bool> as a new representation for vector masks

Commit Message

Comments

Patch