
tree-optimization/116818 - try VMAT_GATHER_SCATTER also for SLP

Message ID 20240923132905.C9C79385840A@sourceware.org
State New
Series tree-optimization/116818 - try VMAT_GATHER_SCATTER also for SLP

Commit Message

Richard Biener Sept. 23, 2024, 1:26 p.m. UTC
When not doing SLP and we end up with VMAT_ELEMENTWISE, we consider
using strided loads, aka VMAT_GATHER_SCATTER.  The following moves
this logic down so it also applies to SLP, where we can now end up
with VMAT_ELEMENTWISE as well.
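
To illustrate the kind of access this affects, here is a minimal
sketch (a made-up example, not the testcase from the PR): a load
whose stride is only known at run time, which the vectorizer treats
as a single-element group.

/* Hypothetical example (not the PR testcase): a single-element load
   with a runtime stride.  Without strided-gather support the access
   x[i * stride] is vectorized one element at a time
   (VMAT_ELEMENTWISE); with this change the SLP path can also
   consider a strided gather (VMAT_GATHER_SCATTER) on targets that
   provide one.  */
double
sum_strided (const double *x, long stride, int n)
{
  double s = 0.0;
  for (int i = 0; i < n; i++)
    s += x[i * stride];
  return s;
}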

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

	PR tree-optimization/116818
	* tree-vect-stmts.cc (get_group_load_store_type): Consider
	VMAT_GATHER_SCATTER instead of VMAT_ELEMENTWISE also for SLP.
---
 gcc/tree-vect-stmts.cc | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

Patch

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index ad08fbe5511..d74497822c4 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2264,21 +2264,21 @@  get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
 		}
 	    }
 	}
+    }
 
-      /* As a last resort, trying using a gather load or scatter store.
+  /* As a last resort, trying using a gather load or scatter store.
 
-	 ??? Although the code can handle all group sizes correctly,
-	 it probably isn't a win to use separate strided accesses based
-	 on nearby locations.  Or, even if it's a win over scalar code,
-	 it might not be a win over vectorizing at a lower VF, if that
-	 allows us to use contiguous accesses.  */
-      if (*memory_access_type == VMAT_ELEMENTWISE
-	  && single_element_p
-	  && loop_vinfo
-	  && vect_use_strided_gather_scatters_p (stmt_info, loop_vinfo,
-						 masked_p, gs_info))
-	*memory_access_type = VMAT_GATHER_SCATTER;
-    }
+     ??? Although the code can handle all group sizes correctly,
+     it probably isn't a win to use separate strided accesses based
+     on nearby locations.  Or, even if it's a win over scalar code,
+     it might not be a win over vectorizing at a lower VF, if that
+     allows us to use contiguous accesses.  */
+  if (*memory_access_type == VMAT_ELEMENTWISE
+      && single_element_p
+      && loop_vinfo
+      && vect_use_strided_gather_scatters_p (stmt_info, loop_vinfo,
+					     masked_p, gs_info))
+    *memory_access_type = VMAT_GATHER_SCATTER;
 
   if (*memory_access_type == VMAT_GATHER_SCATTER
       || *memory_access_type == VMAT_ELEMENTWISE)
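
The hunk is purely structural: the last-resort check itself is
unchanged, only its position (and thus the paths that reach it)
moves.  A heavily simplified, hypothetical sketch of the resulting
fallback ordering, with invented names standing in for the real
get_group_load_store_type logic:

/* Heavily simplified sketch of the fallback ordering after this
   patch; all names below are invented for illustration and the real
   code in gcc/tree-vect-stmts.cc considers many more factors.  */
enum vmat { VMAT_CONTIGUOUS, VMAT_ELEMENTWISE, VMAT_GATHER_SCATTER };

static enum vmat
classify_access (int contiguous_ok, int single_element_p,
		 int in_loop_p, int strided_gather_ok)
{
  if (contiguous_ok)
    return VMAT_CONTIGUOUS;

  enum vmat type = VMAT_ELEMENTWISE;

  /* The moved last-resort check: with the patch it is reached on the
     SLP path as well, not only in the non-SLP case.  */
  if (type == VMAT_ELEMENTWISE
      && single_element_p
      && in_loop_p
      && strided_gather_ok)
    type = VMAT_GATHER_SCATTER;

  return type;
}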