
[RFC,2/5] vect: Introduce loop reduction affine closure to vect pattern recog

Message ID LV2PR01MB783928896DF80235611B7FD2F7AF2@LV2PR01MB7839.prod.exchangelabs.com
State New
Series [RFC,1/5] vect: Fix single_imm_use in tree_vect_patterns

Commit Message

Feng Xue OS July 21, 2024, 9:15 a.m. UTC
For a sum-based loop reduction, its affine closure is composed of statements
whose results and derived computations only end up in the reduction and are
not used in any non-linear transform operation. The concept underlies the
generalized lane-reducing pattern recognition in the coming patches. As
mathematically proved, it is legitimate to optimize evaluation of a value
with a lane-reducing pattern only if its definition statement lies within
the affine closure. That is to say, the canonicalized representation of a
loop reduction could be of the following affine form, in which "opX" denotes
an operation for a lane-reducing pattern, and h(i) represents the remaining
operations irrelevant to those patterns.

  for (i)
    sum += cst0 * op0 + cst1 * op1 + ... + cstN * opN + h(i);
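
As an illustration only (this example is neither part of the patch nor of
its testcases), a source loop whose reduction fits the above form could look
like the function below; the two widening products are candidates for
lane-reducing operations such as DOT_PROD_EXPR, while c[i] only contributes
through h(i):

  /* Hypothetical example, for illustration only.  */
  int
  foo (signed char *a, signed char *b, signed char *x, signed char *y,
       int *c, int n)
  {
    int sum = 0;
    for (int i = 0; i < n; i++)
      sum += a[i] * b[i]            /* op0: dot-product candidate.  */
             + 2 * (x[i] * y[i])    /* cst1 * op1: scaled candidate.  */
             + c[i];                /* h(i): unrelated to the patterns.  */
    return sum;
  }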

At initialization, we invoke a preprocessing step to mark all statements in
the affine closure, which eases retrieval of this property during pattern
matching. Since a pattern hit replaces the original statement with new
pattern statements, we resort to a postprocessing step after recognition to
parse the semantics of those new statements and incrementally update the
affine closure, or to roll back the pattern change if it would break the
completeness of the existing closure.

Thus, inside the affine closure, the recog framework can uniformly handle
both lane-reducing and normal patterns. This patch also makes it possible
to add more elaborate logic to enhance lane-reducing patterns.
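
Conversely (again a hypothetical sketch, not taken from the patch), when a
value also feeds a non-linear operation, its definition statement falls
outside the affine closure, so no lane-reducing pattern may be applied to it:

  /* Hypothetical example, for illustration only.  */
  void
  bar (signed char *a, signed char *b, int n, int *psum, int *pprod)
  {
    int sum = 0, prod = 1;
    for (int i = 0; i < n; i++)
      {
        /* t also has a non-linear use (prod *= t), so its definition is
           not in the affine closure and cannot become a lane-reducing
           operation feeding sum.  */
        int t = a[i] * b[i];
        sum += t;
        prod *= t;
      }
    *psum = sum;
    *pprod = prod;
  }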

Thanks,
Feng
---
gcc/
	* tree-vectorizer.h (enum vect_reduc_pattern_status): New enum.
	(_stmt_vec_info): Add a new field reduc_pattern_status.
	* tree-vect-patterns.cc (vect_split_statement): Adjust statement
	status for reduction affine closure.
	(vect_convert_input): Do not reuse conversion statement in process.
	(vect_reassociating_reduction_p): Add a condition check to only allow
	statement in reduction affine closure.
	(vect_pattern_expr_invariant_p): New function.
	(vect_get_affine_operands_mask): Likewise.
	(vect_mark_reduction_affine_closure): Likewise.
	(vect_mark_stmts_for_reduction_pattern_recog): Likewise.
	(vect_get_prev_reduction_stmt): Likewise.
	(vect_mark_reduction_pattern_sequence_formed): Likewise.
	(vect_check_pattern_stmts_for_reduction): Likewise.
	(vect_pattern_recog_1): Check if a pattern recognition would break
	existing lane-reducing pattern statements.
	(vect_pattern_recog): Mark loop reduction affine closure.
---
 gcc/tree-vect-patterns.cc | 722 +++++++++++++++++++++++++++++++++++++-
 gcc/tree-vectorizer.h     |  23 ++
 2 files changed, 742 insertions(+), 3 deletions(-)

Patch

From 737e7ea35dff9d85f5dbd5ec908e8b8229a6631d Mon Sep 17 00:00:00 2001
From: Feng Xue <fxue@os.amperecomputing.com>
Date: Mon, 8 Apr 2024 10:57:54 +0800
Subject: [PATCH 2/5] vect: Introduce loop reduction affine closure to vect
 pattern recog

For a sum-based loop reduction, its affine closure is composed of statements
whose results and derived computations only end up in the reduction and are
not used in any non-linear transform operation. The concept underlies the
generalized lane-reducing pattern recognition in the coming patches. As
mathematically proved, it is legitimate to optimize evaluation of a value
with a lane-reducing pattern only if its definition statement lies within
the affine closure. That is to say, the canonicalized representation of a
loop reduction could be of the following affine form, in which "opX" denotes
an operation for a lane-reducing pattern, and h(i) represents the remaining
operations irrelevant to those patterns.

  for (i)
    sum += cst0 * op0 + cst1 * op1 + ... + cstN * opN + h(i);

At initialization, we invoke a preprocessing step to mark all statements in
the affine closure, which eases retrieval of this property during pattern
matching. Since a pattern hit replaces the original statement with new
pattern statements, we resort to a postprocessing step after recognition to
parse the semantics of those new statements and incrementally update the
affine closure, or to roll back the pattern change if it would break the
completeness of the existing closure.

Thus, inside the affine closure, the recog framework can uniformly handle
both lane-reducing and normal patterns. This patch also makes it possible
to add more elaborate logic to enhance lane-reducing patterns.

2024-04-08 Feng Xue <fxue@os.amperecomputing.com>

gcc/
	* tree-vectorizer.h (enum vect_reduc_pattern_status): New enum.
	(_stmt_vec_info): Add a new field reduc_pattern_status.
	* tree-vect-patterns.cc (vect_split_statement): Adjust statement
	status for reduction affine closure.
	(vect_convert_input): Do not reuse conversion statement in process.
	(vect_reassociating_reduction_p): Add a condition check to only allow
	statement in reduction affine closure.
	(vect_pattern_expr_invariant_p): New function.
	(vect_get_affine_operands_mask): Likewise.
	(vect_mark_reduction_affine_closure): Likewise.
	(vect_mark_stmts_for_reduction_pattern_recog): Likewise.
	(vect_get_prev_reduction_stmt): Likewise.
	(vect_mark_reduction_pattern_sequence_formed): Likewise.
	(vect_check_pattern_stmts_for_reduction): Likewise.
	(vect_pattern_recog_1): Check if a pattern recognition would break
	existing lane-reducing pattern statements.
	(vect_pattern_recog): Mark loop reduction affine closure.
---
 gcc/tree-vect-patterns.cc | 722 +++++++++++++++++++++++++++++++++++++-
 gcc/tree-vectorizer.h     |  23 ++
 2 files changed, 742 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index ca8809e7cfd..02f6b942026 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -750,7 +750,6 @@  vect_split_statement (vec_info *vinfo, stmt_vec_info stmt2_info, tree new_rhs,
 	  gimple_stmt_iterator gsi = gsi_for_stmt (stmt2_info->stmt, def_seq);
 	  gsi_insert_before_without_update (&gsi, stmt1, GSI_SAME_STMT);
 	}
-      return true;
     }
   else
     {
@@ -783,9 +782,35 @@  vect_split_statement (vec_info *vinfo, stmt_vec_info stmt2_info, tree new_rhs,
 	  dump_printf_loc (MSG_NOTE, vect_location, "and: %G",
 			   (gimple *) new_stmt2);
 	}
+    }
 
-      return true;
+  /* Since this function changes the existing conversion statement no matter
+     whether the pattern is finally applied or not, we should check whether
+     the reduction's affine closure needs adjusting for impacted statements.  */
+  unsigned int status = stmt2_info->reduc_pattern_status;
+
+  if (status != rpatt_none)
+    {
+      tree rhs_type = TREE_TYPE (gimple_assign_rhs1 (stmt1));
+      tree new_rhs_type = TREE_TYPE (new_rhs);
+
+      /* The new statement generated by splitting is a nature widening
+	 conversion. */
+      gcc_assert (TYPE_PRECISION (rhs_type) < TYPE_PRECISION (new_rhs_type));
+      gcc_assert (TYPE_UNSIGNED (rhs_type) || !TYPE_UNSIGNED (new_rhs_type));
+
+      /* The new statement would not break transform invariance of a lane-
+	 reducing operation if the original conversion depends on the one
+	 formed previously.  In that case, it should also be marked with the
+	 rpatt_formed status.  */
+      if (status & rpatt_formed)
+	vinfo->lookup_stmt (stmt1)->reduc_pattern_status = rpatt_formed;
+
+      if (!is_pattern_stmt_p (stmt2_info))
+	STMT_VINFO_RELATED_STMT (stmt2_info)->reduc_pattern_status = status;
     }
+
+  return true;
 }
 
 /* Look for the following pattern
@@ -890,7 +915,10 @@  vect_convert_input (vec_info *vinfo, stmt_vec_info stmt_info, tree type,
     return wide_int_to_tree (type, wi::to_widest (unprom->op));
 
   tree input = unprom->op;
-  if (unprom->caster)
+
+  /* We should not reuse the conversion if it is just the statement under
+     pattern recognition.  */
+  if (unprom->caster && unprom->caster != stmt_info)
     {
       tree lhs = gimple_get_lhs (unprom->caster->stmt);
       tree lhs_type = TREE_TYPE (lhs);
@@ -1018,6 +1046,11 @@  vect_reassociating_reduction_p (vec_info *vinfo,
   if (!loop_info)
     return false;
 
+  /* As a candidate for lane-reducing pattern matching, the statement must
+     be inside the affine closure of a loop reduction.  */
+  if (!(stmt_info->reduc_pattern_status & rpatt_allow))
+    return false;
+
   gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
   if (!assign || gimple_assign_rhs_code (assign) != code)
     return false;
@@ -7201,6 +7234,672 @@  static vect_recog_func vect_vect_recog_func_ptrs[] = {
 
 const unsigned int NUM_PATTERNS = ARRAY_SIZE (vect_vect_recog_func_ptrs);
 
+/* Check if EXPR is invariant with regard to vectorization region VINFO.  */
+
+static bool
+vect_pattern_expr_invariant_p (vec_info *vinfo, tree expr)
+{
+  enum vect_def_type dt;
+
+  if (TREE_CODE (expr) == SSA_NAME)
+    {
+      if (SSA_NAME_IS_DEFAULT_DEF (expr))
+	return true;
+
+      /* This is a value that is defined by a pattern statement that has not
+	 been bound to its original statement.  */
+      if (!gimple_bb (SSA_NAME_DEF_STMT (expr)))
+	return false;
+    }
+
+  if (!vect_is_simple_use (expr, vinfo, &dt))
+    return false;
+
+  if (dt == vect_external_def || dt == vect_constant_def)
+    return true;
+
+  return false;
+}
+
+/* If OP is a linear transform operation, return an index bit mask of all
+   possible variant operands, otherwise return 0.  */
+
+static int
+vect_get_affine_operands_mask (vec_info *vinfo, const gimple_match_op &op)
+{
+  switch (op.code.safe_as_tree_code ())
+    {
+      CASE_CONVERT:
+	if (!tree_nop_conversion_p (op.type, TREE_TYPE (op.ops[0])))
+	  break;
+	/* FALLTHRU */
+
+      case SSA_NAME:
+      case NEGATE_EXPR:
+      case BIT_NOT_EXPR:
+	return 1 << 0;
+
+      case PLUS_EXPR:
+      case MINUS_EXPR:
+	return (1 << 0) | (1 << 1);
+
+      case MULT_EXPR:
+	if (vect_pattern_expr_invariant_p (vinfo, op.ops[0]))
+	  return 1 << 1;
+	/* FALLTHRU */
+
+      case LSHIFT_EXPR:
+	if (vect_pattern_expr_invariant_p (vinfo, op.ops[1]))
+	  return 1 << 0;
+	break;
+
+      default:
+	if (lane_reducing_op_p (op.code))
+	  {
+	    /* The last operand of lane-reducing op is for reduction.  */
+	    gcc_assert (op.num_ops > 1);
+	    return 1 << (op.num_ops - 1);
+	  }
+	break;
+    }
+
+  return 0;
+}
+
+/* Mark all statements in the affine closure whose computation leads to START,
+   which is the non-reduction addend of a loop reduction statement.  The
+   corresponding reduction PHI is represented by REDUC_INFO.  For an ssa name
+   defined by a marked statement, we record the count of uses that have not
+   been marked so far into the hash map USE_CNT_MAP.  This function is to be
+   called for all reduction statements in the loop.  */
+
+static void
+vect_mark_reduction_affine_closure (loop_vec_info loop_vinfo,
+				    tree start, stmt_vec_info reduc_info,
+				    hash_map<tree, unsigned> &use_cnt_map)
+{
+  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  auto_vec<tree> worklist;
+
+  worklist.safe_push (start);
+
+  do
+    {
+      tree value = worklist.pop ();
+      stmt_vec_info stmt_info = loop_vinfo->lookup_def (value);
+
+      if (!stmt_info || STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def)
+	continue;
+
+      if (!has_single_use (value))
+	{
+	  bool exist;
+	  auto &use_cnt = use_cnt_map.get_or_insert (value, &exist);
+
+	  if (!exist)
+	    use_cnt = num_imm_uses (value);
+
+	  gcc_checking_assert (use_cnt > 0);
+
+	  /* As long as the value is not referred to by a statement outside of
+	     the reduction affine closure, we are free to apply lane-reducing
+	     patterns to it without duplication, no matter whether the value
+	     has a single use or not; thus even sharing a lane-reducing
+	     operation among multiple loop reductions could be possible.  */
+	  if (--use_cnt)
+	    continue;
+	}
+
+      gimple *stmt = stmt_info->stmt;
+      gimple_match_op op;
+
+      /* Skip reduction PHI statement and leaf statement like "x = const".  */
+      if (!gimple_extract_op (stmt, &op))
+	continue;
+
+      if (needs_fold_left_reduction_p (op.type, op.code)
+	  || gimple_bb (stmt)->loop_father != loop)
+	continue;
+
+      stmt_info->reduc_pattern_status = rpatt_allow;
+
+      /* Vectorizable analysis and transform of a lane-reducing operation
+	 need some information from the associated reduction PHI statement.  */
+      STMT_VINFO_REDUC_DEF (stmt_info) = reduc_info;
+
+      if (auto mask = vect_get_affine_operands_mask (loop_vinfo, op))
+	{
+	  /* Try to expand the affine closure to dependent affine operands.  */
+	  for (unsigned i = 0; i < op.num_ops; i++)
+	    {
+	      if (mask & (1 << i))
+		worklist.safe_push (op.ops[i]);
+	    }
+	}
+    } while (!worklist.is_empty ());
+}
+
+/* The prerequisite to optimize evaluation of a value with a lane-reducing
+   pattern is that its definition statement must be located in the affine
+   closure of the non-reduction addends of loop reduction statements.  To be
+   specific, the value and all its derived computations only end up in loop
+   reductions, and are not used in any non-linear transform operation.  That
+   is to say, if such patterns are matched, the final pattern statements for
+   a loop reduction could be canonicalized to the following affine form, in
+   which "opX" denotes a lane-reducing operation and h(i) represents other
+   operations irrelevant to those patterns.
+
+     for (i)
+       sum += cst0 * op0 + cst1 * op1 + ... + cstN * opN + h(i);
+
+   This function traverses all loop reductions to discover affine closures
+   and mark all statements inside them.  */
+
+static void
+vect_mark_stmts_for_reduction_pattern_recog (loop_vec_info loop_vinfo)
+{
+  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  const edge latch = loop_latch_edge (loop);
+  basic_block header = loop->header;
+  hash_map<tree, unsigned> use_cnt_map;
+
+  DUMP_VECT_SCOPE ("vect_mark_stmts_for_reduction_pattern_recog");
+
+  for (auto gsi = gsi_start_phis (header); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gphi *phi = gsi.phi ();
+      stmt_vec_info reduc_info = loop_vinfo->lookup_stmt (phi);
+
+      if (!reduc_info
+	  || STMT_VINFO_DEF_TYPE (reduc_info) != vect_reduction_def
+	  || STMT_VINFO_REDUC_CODE (reduc_info) != PLUS_EXPR
+	  || STMT_VINFO_REDUC_TYPE (reduc_info) != TREE_CODE_REDUCTION)
+	continue;
+
+      tree start_def = PHI_RESULT (phi);
+      tree reduc_def = PHI_ARG_DEF_FROM_EDGE (phi, latch);
+      auto_vec<stmt_vec_info, 8> reduc_stmts;
+      auto_vec<tree, 8> addends;
+
+      while (reduc_def != start_def)
+	{
+	  gimple *stmt = SSA_NAME_DEF_STMT (reduc_def);
+	  gimple_match_op op;
+
+	  /* Do not step into the inner loop.  */
+	  if (gimple_bb (stmt)->loop_father != loop)
+	    break;
+
+	  if (!gimple_extract_op (stmt, &op))
+	    {
+	      gcc_assert (gimple_code (stmt) == GIMPLE_PHI);
+	      break;
+	    }
+
+	  stmt_vec_info stmt_info = loop_vinfo->lookup_stmt (stmt);
+	  int reduc_idx = STMT_VINFO_REDUC_IDX (stmt_info);
+
+	  gcc_assert (reduc_idx >= 0 && reduc_idx < (int) op.num_ops);
+
+	  if (op.code == PLUS_EXPR || op.code == MINUS_EXPR)
+	    {
+	      if (needs_fold_left_reduction_p (op.type, op.code))
+		break;
+
+	      /* Record non-reduction addend.  */
+	      addends.safe_push (op.ops[reduc_idx ? 0 : 1]);
+	    }
+	  else
+	    {
+	      gcc_assert (CONVERT_EXPR_CODE_P (op.code));
+	      if (!tree_nop_conversion_p (op.type, TREE_TYPE (op.ops[0])))
+		break;
+	    }
+
+	  reduc_stmts.safe_push (stmt_info);
+	  reduc_def = op.ops[reduc_idx];
+	}
+
+      if (reduc_def == start_def)
+	{
+	  /* Mark reduction PHI statement although it would not be matched
+	     against any pattern.  */
+	  reduc_info->reduc_pattern_status = rpatt_allow;
+
+	  for (auto stmt_info : reduc_stmts)
+	    {
+	      /* Mark reduction statement itself.  */
+	      stmt_info->reduc_pattern_status = rpatt_allow;
+
+	      /* Vectorizable analysis and transform of a lane-reducing
+		 operation need some information from the associated reduction
+		 PHI statement.  */
+	      STMT_VINFO_REDUC_DEF (stmt_info) = reduc_info;
+	    }
+
+	  /* Mark statements that participate in loop reduction indirectly
+	     through non-reduction addends.  */
+	  for (auto addend : addends)
+	    vect_mark_reduction_affine_closure (loop_vinfo, addend,
+						reduc_info, use_cnt_map);
+	}
+    }
+}
+
+/* For a reduction statement STMT_INFO, which could also be the reduction PHI,
+   return the previous reduction statement that it depends on. */
+
+static stmt_vec_info
+vect_get_prev_reduction_stmt (loop_vec_info loop_vinfo,
+			      stmt_vec_info stmt_info)
+{
+  gimple *stmt = stmt_info->stmt;
+  tree prev_def;
+
+  if (is_a <gphi *> (stmt))
+    {
+      class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+      const edge latch = loop_latch_edge (loop);
+
+      gcc_assert (STMT_VINFO_REDUC_DEF (stmt_info));
+      gcc_assert (loop == gimple_bb (stmt)->loop_father);
+      prev_def = PHI_ARG_DEF_FROM_EDGE (stmt, latch);
+    }
+  else
+    {
+      int reduc_idx = STMT_VINFO_REDUC_IDX (stmt_info);
+      gimple_match_op op;
+
+      if (!gimple_extract_op (stmt, &op))
+	gcc_unreachable ();
+
+      gcc_assert (reduc_idx >= 0 && reduc_idx < (int) op.num_ops);
+      prev_def = op.ops[reduc_idx];
+    }
+
+  return vect_stmt_to_vectorize (loop_vinfo->lookup_def (prev_def));
+}
+
+/* Given the pattern statement sequence for ORIG_STMT_INFO (including
+   PATTERN_STMT and STMT_VINFO_PATTERN_DEF_SEQ), a subset of it, represented
+   by FORMED_STMTS, is known to depend on (or just be) lane-reducing
+   operations.  In this function, the subset is marked with rpatt_formed
+   first, then the status is forward propagated to every dependent pattern
+   statement along paths that contribute to PATTERN_STMT; other statements
+   remain unchanged.  FORMED_STMTS is reset to empty upon completion.  */
+
+static void
+vect_mark_reduction_pattern_sequence_formed (loop_vec_info loop_vinfo,
+					     stmt_vec_info orig_stmt_info,
+					     gimple *pattern_stmt,
+					     vec<stmt_vec_info> &formed_stmts)
+{
+  stmt_vec_info last_stmt = formed_stmts.last ();
+  stmt_vec_info related_stmt = STMT_VINFO_RELATED_STMT (last_stmt);
+  gimple_seq pattern_seq = STMT_VINFO_PATTERN_DEF_SEQ (orig_stmt_info);
+  hash_map<tree, auto_vec<stmt_vec_info>> use_map;
+
+  /* Due to the lack of a mechanism to quickly get immediate uses of a pattern
+     def, we have to build a simple def-use graph out of the pattern statement
+     sequence.  */
+  for (auto seq : { pattern_stmt, pattern_seq })
+    for (auto gsi = gsi_last (seq); !gsi_end_p (gsi); gsi_prev (&gsi))
+      {
+	stmt_vec_info stmt_info = loop_vinfo->lookup_stmt (gsi_stmt (gsi));
+	gimple_match_op op;
+
+	gcc_assert (STMT_VINFO_RELATED_STMT (stmt_info) == related_stmt);
+
+	/* Since elements are placed into FORMED_STMTS so that those nearer to
+	   PATTERN_STMT come first, pattern statements after the last one in
+	   the set cannot depend on a lane-reducing operation, so there is no
+	   need to process them.  */
+	if (stmt_info == last_stmt)
+	  goto out;
+
+	if (!gimple_extract_op (stmt_info->stmt, &op))
+	  continue;
+
+	for (unsigned i = 0; i < op.num_ops; i++)
+	  {
+	    if (TREE_CODE (op.ops[i]) == SSA_NAME)
+	      use_map.get_or_insert (op.ops[i]).safe_push (stmt_info);
+	  }
+      }
+out:
+
+  basic_block bb = gimple_bb (pattern_stmt);
+
+  do
+    {
+      stmt_vec_info stmt_info = formed_stmts.pop ();
+      gimple *stmt = stmt_info->stmt;
+
+      gcc_assert (gimple_bb (stmt) == bb);
+
+      /* A statement may be reached from more than one lane-reducing
+	 operation, for example a case in which two dot-products are added
+	 together.  */
+      if (stmt_info->reduc_pattern_status & rpatt_formed)
+	continue;
+
+      stmt_info->reduc_pattern_status |= rpatt_formed;
+
+      /* Do not propagate status outside of pattern statement sequence.  */
+      if (stmt != pattern_stmt)
+	{
+	  auto *uses = use_map.get (gimple_get_lhs (stmt));
+
+	  gcc_assert (uses);
+	  for (auto use : *uses)
+	    formed_stmts.safe_push (use);
+	}
+    } while (!formed_stmts.is_empty ());
+}
+
+/* A successful pattern recognition replaces the matched statement with new
+   pattern statements, which might cause the loop reduction affine closure to
+   change.  On the one hand, new linear-transform-like pattern statements could
+   be pulled into the closure; for example, this could happen with a pattern
+   that decomposes a multiply-by-constant into a series of additions and
+   shifts.  On the other hand, some statements that are originally in the
+   closure have to be kicked out if the linearity of a relay statement linking
+   them into the closure would be broken, e.g. due to the introduction of a
+   non-trivial conversion.  However, this gets us into a conflict when the
+   impacted statement connects a lane-reducing and a loop reduction statement,
+   in that a lane-reducing pattern cannot be reverted once it has been formed.
+   The only alternative is to invalidate the other pattern in process.
+
+   Therefore, after a pattern is recognized on ORIG_STMT_INFO, this function
+   is called to parse the semantics of all new pattern statements (including
+   PATTERN_STMT), and to check whether the resulting adjustment of the loop
+   reduction affine closure would conflict with existing lane-reducing
+   statements; if not, return true, otherwise return false.  */
+
+static bool
+vect_check_pattern_stmts_for_reduction (loop_vec_info loop_vinfo,
+					stmt_vec_info orig_stmt_info,
+					gimple *pattern_stmt)
+{
+  unsigned status = orig_stmt_info->reduc_pattern_status;
+
+  /* Nothing to do if the original statement is irrelevant to reduction.  */
+  if (status == rpatt_none)
+    return true;
+
+  /* A degraded lane-reducing statement is not in the reduction affine
+     closure.  Pattern recognition on such a statement should be very rare.
+     Do not allow it, for simplicity.  */
+  if (!(status & rpatt_allow))
+    return false;
+
+  auto_vec<stmt_vec_info> non_reduc_stmts;
+  auto_vec<stmt_vec_info> rpatt_formed_stmts;
+  gimple_seq pattern_seq = STMT_VINFO_PATTERN_DEF_SEQ (orig_stmt_info);
+  gimple_match_op op;
+
+  for (auto seq : { pattern_stmt, pattern_seq })
+    for (auto gsi = gsi_last (seq); !gsi_end_p (gsi); gsi_prev (&gsi))
+      {
+	gimple *stmt = gsi_stmt (gsi);
+	stmt_vec_info stmt_info = loop_vinfo->lookup_stmt (stmt);
+
+	if (!stmt_info)
+	  stmt_info = loop_vinfo->add_stmt (stmt);
+
+	/* Replacement of the original statement by the pattern statement
+	   sequence has not been committed yet, so the basic block is not set.
+	   This fact can be used to distinguish these pending pattern
+	   statements from existing ones.  */
+	gcc_assert (!gimple_bb (stmt));
+
+	/* Initially mark the pattern statement as in the affine closure; this
+	   status might be changed later according to the def/use relationships
+	   among all pattern statements.  */
+	stmt_info->reduc_pattern_status = rpatt_allow;
+      }
+
+  /* Traverse statements in the order that use precedes def.  */
+  for (auto seq : { pattern_stmt, pattern_seq })
+    for (auto gsi = gsi_last (seq); !gsi_end_p (gsi); gsi_prev (&gsi))
+      {
+	gimple *stmt = gsi_stmt (gsi);
+
+	/* No further work is needed for a leaf statement like "x = const".  */
+	if (!gimple_extract_op (stmt, &op))
+	  continue;
+
+	stmt_vec_info stmt_info = loop_vinfo->lookup_stmt (stmt);
+	int affine_oprnds_mask = 0;
+
+	if (needs_fold_left_reduction_p (op.type, op.code))
+	  stmt_info->reduc_pattern_status = rpatt_none;
+	else if (stmt_info->reduc_pattern_status == rpatt_allow)
+	  affine_oprnds_mask = vect_get_affine_operands_mask (loop_vinfo, op);
+
+	/* Record lane-reducing statement into a set from which forward
+	   propagation of rpatt_formed status would start.  */
+	if (lane_reducing_op_p (op.code))
+	  rpatt_formed_stmts.safe_push (stmt_info);
+
+	for (unsigned i = 0; i < op.num_ops; i++)
+	  {
+	    tree oprnd = op.ops[i];
+	    stmt_vec_info oprnd_info = loop_vinfo->lookup_def (oprnd);
+
+	    if (oprnd_info)
+	      oprnd_info = vect_stmt_to_vectorize (oprnd_info);
+	    else if (vect_pattern_expr_invariant_p (loop_vinfo, oprnd))
+	      continue;
+	    else
+	      {
+		/* The pattern statement contains an unvectorizable operand;
+		   simply bail out.  */
+		return false;
+	      }
+
+	    if (!(affine_oprnds_mask & (1 << i)))
+	      {
+		/* It is expected that this operand is not in the affine
+		   closure.  */
+
+		if (!gimple_bb (oprnd_info->stmt))
+		  {
+		    /* The operand is defined by another uncommitted pattern
+		       statement, whose status should be changed to
+		       rpatt_none.  */
+		    oprnd_info->reduc_pattern_status = rpatt_none;
+		  }
+		else if (oprnd_info->reduc_pattern_status & rpatt_formed)
+		  {
+		    /* Conflict with an existing lane-reducing pattern
+		       statement, so fail the check.  TODO: Allow pattern
+		       statements that use a value defined by a degraded lane-
+		       reducing statement.  */
+		    return false;
+		  }
+		else if (oprnd_info->reduc_pattern_status & rpatt_allow)
+		  {
+		    /* This statement has to be removed from the affine
+		       closure.  Here we only record it into a set; the actual
+		       removal will be performed recursively later on it and
+		       all statements that are linked to the closure through
+		       it.  */
+		    non_reduc_stmts.safe_push (oprnd_info);
+		  }
+	      }
+	    else if (oprnd_info->reduc_pattern_status & rpatt_formed)
+	      {
+		/* There must be a path from the original statement to some
+		   lane-reducing statement.  */
+		gcc_assert (status & rpatt_formed);
+
+		/* The operand definition statement should not be an
+		   uncommitted pattern statement, for which propagation of the
+		   rpatt_formed status has not been started.  */
+		gcc_assert (gimple_bb (oprnd_info->stmt));
+
+		/* The operand definition statement should be in the reduction
+		   affine closure.  */
+		gcc_assert (oprnd_info->reduc_pattern_status & rpatt_allow);
+
+		/* This uncommitted pattern statement is a boundary point to
+		   which the rpatt_formed status would be propagated from
+		   other existing statements.  */
+		rpatt_formed_stmts.safe_push (stmt_info);
+	      }
+	  }
+      }
+
+  /* Forward propagate rpatt_formed status inside uncommitted pattern statement
+     sequence.  */
+  if (!rpatt_formed_stmts.is_empty ())
+    vect_mark_reduction_pattern_sequence_formed (loop_vinfo, orig_stmt_info,
+						 pattern_stmt,
+						 rpatt_formed_stmts);
+
+  stmt_vec_info pattern_stmt_info = loop_vinfo->lookup_stmt (pattern_stmt);
+  unsigned pattern_status = pattern_stmt_info->reduc_pattern_status;
+
+  /* Overriding formed lane-reducing operation by another new normal pattern
+     matching is not allowed.  */
+  if ((status & rpatt_formed) && !(pattern_status & rpatt_formed))
+    return false;
+
+  if (pattern_status == rpatt_none && vect_is_reduction (orig_stmt_info))
+    {
+      auto prev = vect_get_prev_reduction_stmt (loop_vinfo, orig_stmt_info);
+
+      /* Since statements in a reduction chain are cyclically dependent, we
+	 have to exclude the whole chain from affine closure if one reduction
+	 statement does not meet lane-reducing prerequisite.  Then prepare for
+	 propagating rpatt_none status to the previous reduction statement.  */
+      non_reduc_stmts.safe_push (prev);
+
+      /* Terminate propagation when rotating back to the original
+	 statement.  */
+      orig_stmt_info->reduc_pattern_status = rpatt_none;
+    }
+
+  /* Backward propagate rpatt_none status to existing statements.  */
+  while (!non_reduc_stmts.is_empty ())
+    {
+      stmt_vec_info stmt_info = non_reduc_stmts.pop ();
+      gimple *stmt = stmt_info->stmt;
+
+      gcc_assert (gimple_bb (stmt));
+
+      if (stmt_info->reduc_pattern_status == rpatt_none)
+	continue;
+
+      gcc_assert (!(stmt_info->reduc_pattern_status & rpatt_formed));
+      stmt_info->reduc_pattern_status = rpatt_none;
+
+      if (is_a <gphi *> (stmt))
+	{
+	  auto prev = vect_get_prev_reduction_stmt (loop_vinfo, stmt_info);
+
+	  /* For a reduction PHI, propagation should be confined inside the
+	     loop, so it goes only through the latch edge.  */
+	  non_reduc_stmts.safe_push (prev);
+	  continue;
+	}
+
+      if (!gimple_extract_op (stmt, &op))
+	continue;
+
+      gcc_assert (!lane_reducing_op_p (op.code));
+
+      for (unsigned i = 0; i < op.num_ops; i++)
+	{
+	  if (auto oprnd_info = loop_vinfo->lookup_def (op.ops[i]))
+	    non_reduc_stmts.safe_push (vect_stmt_to_vectorize (oprnd_info));
+	}
+    }
+
+  if ((status & rpatt_formed) || !(pattern_status & rpatt_formed))
+    {
+      /* If a lane-reducing statement already exists on another path to the
+	 original statement, no need to propagate rpatt_formed status again.
+	 Or, if no lane-reducing statement is generated, nothing to do.  */
+      return true;
+    }
+
+  rpatt_formed_stmts.safe_push (orig_stmt_info);
+
+  /* Forward propagate rpatt_formed status inside existing pattern statement
+     sequence.  */
+  if (is_pattern_stmt_p (orig_stmt_info))
+    {
+      stmt_vec_info root_orig_info = vect_orig_stmt (orig_stmt_info);
+      stmt_vec_info root_pattern = vect_stmt_to_vectorize (root_orig_info);
+
+      vect_mark_reduction_pattern_sequence_formed (loop_vinfo, root_orig_info,
+						   root_pattern->stmt,
+						   rpatt_formed_stmts);
+
+      gcc_assert (root_pattern->reduc_pattern_status & rpatt_formed);
+      rpatt_formed_stmts.safe_push (root_orig_info);
+    }
+
+  /* Forward propagate rpatt_formed status to existing statements that have
+     not been processed for pattern recognition.  */
+  do
+    {
+      stmt_vec_info stmt_info = rpatt_formed_stmts.pop ();
+      gimple *stmt = stmt_info->stmt;
+
+      gcc_assert (gimple_bb (stmt));
+
+      if (stmt_info->reduc_pattern_status & rpatt_formed)
+	continue;
+
+      gcc_assert (stmt_info->reduc_pattern_status & rpatt_allow);
+      stmt_info->reduc_pattern_status |= rpatt_formed;
+
+      if (vect_is_reduction (stmt_info) || is_a <gphi *> (stmt))
+	{
+	  auto prev = vect_get_prev_reduction_stmt (loop_vinfo, stmt_info);
+
+	  /* We must consider statements in a reduction chain as a whole in
+	     order to ensure legality of lane-reducing operations, for which
+	     all reduction statements should be marked with rpatt_formed
+	     status.  As a special handling, here we traverse reduction
+	     statements backward, since some of them might be created by a
+	     pattern, and currently there is no straightforward way to obtain
+	     immediate uses of a value defined by a pattern statement.  Since
+	     reduction statements are in a cyclic chain, the approach leads to
+	     the same marking as forward propagation.  */
+	  rpatt_formed_stmts.safe_push (prev);
+	  continue;
+	}
+
+      tree lhs = gimple_get_lhs (stmt);
+      imm_use_iterator iter;
+      gimple *use_stmt;
+
+      /* The statement has not been processed yet, so we can walk the def/use
+	 chain by normal means.  */
+      gcc_assert (!STMT_VINFO_IN_PATTERN_P (stmt_info));
+      gcc_assert (!is_pattern_stmt_p (stmt_info));
+
+      FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs)
+	{
+	  if (is_gimple_debug (use_stmt))
+	    continue;
+
+	  stmt_vec_info use_stmt_info = loop_vinfo->lookup_stmt (use_stmt);
+
+	  /* Because the statement is not a reduction statement or PHI, it
+	     should not have any use outside of the loop.  */
+	  gcc_assert (gimple_has_lhs (use_stmt) && use_stmt_info);
+	  rpatt_formed_stmts.safe_push (use_stmt_info);
+	}
+    } while (!rpatt_formed_stmts.is_empty ());
+
+  return true;
+}
+
 /* Mark statements that are involved in a pattern.  */
 
 void
@@ -7383,6 +8082,19 @@  vect_pattern_recog_1 (vec_info *vinfo,
       return;
     }
 
+  loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+
+  /* Check if the pattern would break existing lane-reducing pattern
+     statements.  */
+  if (loop_vinfo
+      && !vect_check_pattern_stmts_for_reduction (loop_vinfo, stmt_info,
+						  pattern_stmt))
+    {
+      /* Invalidate the pattern when detecting conflict.  */
+      STMT_VINFO_PATTERN_DEF_SEQ (stmt_info) = NULL;
+      return;
+    }
+
   /* Found a vectorizable pattern.  */
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location,
@@ -7481,6 +8193,10 @@  vect_pattern_recog (vec_info *vinfo)
 
   DUMP_VECT_SCOPE ("vect_pattern_recog");
 
+  /* Mark loop reduction affine closure for lane-reducing patterns.  */
+  if (auto loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
+    vect_mark_stmts_for_reduction_pattern_recog (loop_vinfo);
+
   /* Scan through the stmts in the region, applying the pattern recognition
      functions starting at each stmt visited.  */
   for (unsigned i = 0; i < nbbs; i++)
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index df6c8ada2f7..52793ee87e9 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1185,6 +1185,25 @@  enum slp_vect_type {
   hybrid
 };
 
+/* The status of a statement for lane-reducing pattern matching.  */
+enum vect_reduc_pattern_status {
+  /* Statement is not in loop reduction affine closure.  */
+  rpatt_none = 0,
+
+  /* Statement is part of a loop reduction affine closure, so it is a
+     candidate for lane-reducing patterns.  */
+  rpatt_allow = 1,
+
+  /* Statement is, or depends on, a lane-reducing pattern statement; once
+     marked, the status cannot be changed.  In most situations, the statement
+     also has the rpatt_allow status.  One exceptional case is that when
+     lane-reducing to a given result type is not supported by the target, we
+     could settle for second best by creating a degraded lane-reducing
+     statement with a smaller intermediate result type.  Such a statement is
+     not in the affine closure.  */
+  rpatt_formed = 2
+};
+
 /* Says whether a statement is a load, a store of a vectorized statement
    result, or a store of an invariant value.  */
 enum vec_load_store_type {
@@ -1431,6 +1450,10 @@  public:
   /* Whether on this stmt reduction meta is recorded.  */
   bool is_reduc_info;
 
+  /* Describes how the statement is handled when performing lane-reducing
+     pattern matching.  */
+  unsigned int reduc_pattern_status;
+
   /* If nonzero, the lhs of the statement could be truncated to this
      many bits without affecting any users of the result.  */
   unsigned int min_output_precision;
-- 
2.17.1