[vec-tails,07/10] Support loop epilogue combining

On 16 Jun 10:54, Jeff Law wrote:
> On 05/19/2016 01:44 PM, Ilya Enkovich wrote:
> >Hi,
> >
> >This patch introduces support for loop epilogue combining.  This includes
> >support in cost estimation and all required changes required to mask
> >vectorized loop.
> >
> >Thanks,
> >Ilya
> >--
> >gcc/
> >
> >2016-05-19  Ilya Enkovich  <ilya.enkovich@intel.com>
> >
> >	* dbgcnt.def (vect_tail_combine): New.
> >	* params.def (PARAM_VECT_COST_INCREASE_COMBINE_THRESHOLD): New.
> >	* tree-vect-data-refs.c (vect_get_new_ssa_name): Support vect_mask_var.
> >	* tree-vect-loop-manip.c (slpeel_tree_peel_loop_to_edge): Support
> >	epilogue combined with loop body.
> >	(vect_do_peeling_for_loop_bound): Likewise.
> >	* tree-vect-loop.c Include alias.h and dbgcnt.h.
> >	(vect_estimate_min_profitable_iters): Add ret_min_profitable_combine_niters
> >	arg, compute number of iterations for which loop epilogue combining is
> >	profitable.
> >	(vect_generate_tmps_on_preheader): Support combined apilogue.
> >	(vect_gen_ivs_for_masking): New.
> >	(vect_get_mask_index_for_elems): New.
> >	(vect_get_mask_index_for_type): New.
> >	(vect_gen_loop_masks): New.
> >	(vect_mask_reduction_stmt): New.
> >	(vect_mask_mask_load_store_stmt): New.
> >	(vect_mask_load_store_stmt): New.
> >	(vect_combine_loop_epilogue): New.
> >	(vect_transform_loop): Support combined apilogue.
> >
> >
> >diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
> >index fab5879..b3c0668 100644
> >--- a/gcc/tree-vect-loop-manip.c
> >+++ b/gcc/tree-vect-loop-manip.c
> >@@ -1464,11 +1469,20 @@ slpeel_tree_peel_loop_to_edge (struct loop *loop, struct loop *scalar_loop,
> >   bb_between_loops = new_exit_bb;
> >   bb_after_second_loop = split_edge (single_exit (second_loop));
> >
> >-  pre_condition =
> >-	fold_build2 (EQ_EXPR, boolean_type_node, *first_niters, niters);
> >-  skip_e = slpeel_add_loop_guard (bb_between_loops, pre_condition, NULL,
> >-                                  bb_after_second_loop, bb_before_first_loop,
> >-				  inverse_probability (second_guard_probability));
> >+  if (skip_second_after_first)
> >+    /* We can just redirect edge from bb_between_loops to
> >+       bb_after_second_loop but we have many code assuming
> >+       we have a guard after the first loop.  So just make
> >+       always taken condtion.  */
> >+    pre_condition = fold_build2 (EQ_EXPR, boolean_type_node, integer_zero_node,
> >+				 integer_zero_node);
> This isn't ideal, but I don't think it's that big of an issue.
> 
> >@@ -1758,8 +1772,10 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo,
> >   basic_block preheader;
> >   int loop_num;
> >   int max_iter;
> >+  int bound2;
> >   tree cond_expr = NULL_TREE;
> >   gimple_seq cond_expr_stmt_list = NULL;
> >+  bool combine = LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo);
> >
> >   if (dump_enabled_p ())
> >     dump_printf_loc (MSG_NOTE, vect_location,
> >@@ -1769,12 +1785,13 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo,
> >
> >   loop_num  = loop->num;
> >
> >+  bound2 = combine ? th : LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> Can you document what the TH parameter is to the various routines that use
> it in tree-vect-loop-manip.c?  I realize you didn't add it, but it would
> help anyone looking at this code in the future to know it's the threshold of
> iterations for vectorization without having to find it in other function
> comment headers ;-)
> 
> That's pre-approved to go in immediately :-)
> 
> >@@ -1803,7 +1820,11 @@ vect_do_peeling_for_loop_bound (loop_vec_info loop_vinfo,
> >   max_iter = (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
> > 	      ? LOOP_VINFO_VECT_FACTOR (loop_vinfo) * 2
> > 	      : LOOP_VINFO_VECT_FACTOR (loop_vinfo)) - 2;
> >-  if (check_profitability)
> >+  /* When epilogue is combined only profitability
> >+     treshold matters.  */
> s/treshold/threshold/
> 
> 
> 
> > static void
> > vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
> > 				    int *ret_min_profitable_niters,
> >-				    int *ret_min_profitable_estimate)
> >+				    int *ret_min_profitable_estimate,
> >+				    int *ret_min_profitable_combine_niters)
> I'm torn a bit here.  There's all kinds of things missing/incomplete in the
> function comments throughout the vectorizer.  And in some cases, like this
> one, the parameters are largely self-documenting.  But we've also got coding
> standards that we'd like to adhere to.
> 
> I don't think it's fair to require you to fix all these issues in the
> vectorizer (though if you wanted to, I'd fully support those an independent
> cleanups).
> 
> Perhaps just document LOOP_VINFO with a generic comment about the ret_*
> parameters for this function rather than a comment for each ret_* parameter.
> Pre-approved for the trunk independent of the vec-tails work.
> 
> 
> >@@ -3728,6 +3784,77 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
> > 		     min_profitable_estimate);
> >
> >+
> >+      unsigned combine_treshold
> >+	= PARAM_VALUE (PARAM_VECT_COST_INCREASE_COMBINE_THRESHOLD);
> >+      /* Calculate profitability combining epilogue with the main loop.
> >+	 We have a threshold for inside cost overhead (not applied
> >+	 for low trip count loop case):
> >+	 MIC * 100 < VIC * CT
> >+	 Masked iteration should be better than a scalar prologue:
> >+	 MIC + VIC < SIC * epilogue_niters  */
> Can you double-check the whitespace formatting here.  Where does the "100"
> come from and should it be a param?

I checked the formatting.  We have 100 here because combine_treshold
is measured in percent.  E.g. value 2 means iterations masking overhead
shouldn't exceed 2% of vector iteration cost.

> 
> 
> >@@ -6886,6 +7030,485 @@ vect_generate_tmps_on_preheader (loop_vec_info loop_vinfo,
> >   return;
> > }
> >
> 
> >+
> >+/* Function vect_gen_loop_masks.
> >+
> >+   Create masks to mask a loop desvribed by LOOP_VINFO.  Masks
> s/desvribed/described/
> 
> >+   are created according to LOOP_VINFO_REQUIRED_MASKS and are stored
> >+   into MASKS vector.
> >+
> >+   Index of a mask in a vector is computed according to a number
> >+   of masks's elements.  Masks are sorted by number of its elements
> >+   in descending order.  Index 0 is used to access a mask with
> >+   current_vector_size elements.  Among masks with the same number
> >+   of elements the one with lower index is used to mask iterations
> >+   with smaller iteration counter.  Note that you may get NULL elements
> >+   for masks which are not required.  Use vect_get_mask_index_for_elems
> >+   or vect_get_mask_index_for_type to access resulting vector.  */
> >+
> >+static void
> >+vect_gen_loop_masks (loop_vec_info loop_vinfo, vec<tree> *masks)
> >+{
> >+  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> >+  edge pe = loop_preheader_edge (loop);
> >+  tree niters = LOOP_VINFO_NITERS (loop_vinfo);
> >+  unsigned min_mask_elems, max_mask_elems, nmasks;
> >+  unsigned iv_elems, cur_mask, prev_mask, cur_mask_elems;
> >+  auto_vec<tree> ivs;
> >+  tree vectype, mask_type;
> >+  tree vec_niters, vec_niters_val, mask;
> >+  gimple *stmt;
> >+  basic_block bb;
> >+  gimple_stmt_iterator gsi = gsi_after_labels (loop->header);
> >+  unsigned vec_size;
> >+
> >+  /* Create required IVs.  */
> >+  vect_gen_ivs_for_masking (loop_vinfo, &ivs);
> >+  vectype = TREE_TYPE (ivs[0]);
> >+
> >+  vec_size = tree_to_uhwi (TYPE_SIZE_UNIT (vectype));
> >+  iv_elems = TYPE_VECTOR_SUBPARTS (vectype);
> >+
> >+  /* Get a proper niter to build a vector.  */
> >+  if (!is_gimple_val (niters))
> >+    {
> >+      gimple_seq seq = NULL;
> >+      niters = force_gimple_operand (niters, &seq, true, NULL);
> >+      gsi_insert_seq_on_edge_immediate (pe, seq);
> >+    }
> >+  /* We may need a type cast in case niter has a too small type
> >+     for generated IVs.  */
> Nit.  There should be vertical whitespace after the close brace and the
> comment for the next logical block of code.  Can you do a scan over the
> patchkit looking for other instances where the vertical whitespace is
> needed.
> 
> Generally, if you find that a blob of code needs a comment, then the comment
> and blob of code should have that vertical whitespace to visually separate
> it from everything else.
> 
> 
> 
> >+/* Function vect_combine_loop_epilogue.
> >+
> >+   Combine loop epilogue with the main vectorized body.  It requires
> >+   masking of memory accesses and reductions.  */
> So you mask reductions, loads & stores.  Is there anything else that we
> might potentially need to mask to combine the loop & epilogue via masking?
> 
> 
> I don't see anything particularly worrisome here either -- I have a slight
> concern about correctness issues with only masking loads/stores and
> reductions.  But I will defer to your judgment on whether or not there's
> other stuff that we need to mask to combine the epilogue with the loop via
> masking.

We have to mask operations which may cause errors if executed speculatively.
For others we just ignore produced result.  So we don't truly mask reductions
but fix-up their results.  I assume memory accesses are only ones we have to
truly mask (plus non-const calls which are rejected now).  For signalling
arithmetic I assumed we just don't vectorize it.

Basically we should act similar to if-conversion.  I'll check if it has
restrictions I miss.

> 
> Jeff

Here is an updated patch version.

Thanks,
Ilya
--
gcc/

2016-05-28  Ilya Enkovich  <ilya.enkovich@intel.com>

	* dbgcnt.def (vect_tail_combine): New.
	* params.def (PARAM_VECT_COST_INCREASE_COMBINE_THRESHOLD): New.
	* tree-vect-data-refs.c (vect_get_new_ssa_name): Support vect_mask_var.
	* tree-vect-loop-manip.c (slpeel_tree_peel_loop_to_edge): Support
	epilogue combined with loop body.
	(vect_do_peeling_for_loop_bound): LIkewise.
	(vect_do_peeling_for_alignment): ???
	* tree-vect-loop.c Include alias.h and dbgcnt.h.
	(vect_estimate_min_profitable_iters): Add ret_min_profitable_combine_niters
	arg, compute number of iterations for which loop epilogue combining is
	profitable.
	(vect_generate_tmps_on_preheader): Support combined apilogue.
	(vect_gen_ivs_for_masking): New.
	(vect_get_mask_index_for_elems): New.
	(vect_get_mask_index_for_type): New.
	(vect_gen_loop_masks): New.
	(vect_mask_reduction_stmt): New.
	(vect_mask_mask_load_store_stmt): New.
	(vect_mask_load_store_stmt): New.
	(vect_combine_loop_epilogue): New.
	(vect_transform_loop): Support combined apilogue.

[vec-tails,07/10] Support loop epilogue combining

Commit Message

Comments

Patch