Fix fallout of peeling for gap improvements

Message ID 20240614070335.2446D13AB5@imap1.dmz-prg2.suse.org

Commit Message

Richard Biener June 14, 2024, 7:03 a.m. UTC
The following hopefully addresses an observed bootstrap issue on aarch64
where maybe-uninit diagnostics occur.  It also fixes bogus napkin math
on my part: I was confusing the rounded-up size of a single access with
the rounded-up size of the group accessed in a single scalar iteration.
So the following puts in a correctness check, rejecting some cases of
peeling for gaps as insufficient.  Those could be rectified by splitting
the last load into multiple ones, but I'm leaving that for a followup;
it's better to quickly fix the reported wrong-code.
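
To illustrate the arithmetic behind that check, here is a standalone
sketch (not GCC code; the names cnunits, cremain, cpart_size and
group_size mirror the patch's locals, and the values are invented):

#include <stdio.h>

/* Smallest power of two >= x, standing in for GCC's 1 << ceil_log2 (x).  */
static unsigned
next_pow2 (unsigned x)
{
  unsigned p = 1;
  while (p < x)
    p <<= 1;
  return p;
}

int
main (void)
{
  /* Invented values: a 16-element vector, groups of 2 scalars.  */
  unsigned cnunits = 16, group_size = 2;
  unsigned examples[] = { 5, 7 };  /* two possible cremain values */

  for (int i = 0; i < 2; i++)
    {
      unsigned cremain = examples[i];
      unsigned cpart_size = next_pow2 (cremain);
      /* The new correctness test: rounding the last load up to
	 cpart_size elements must not over-read more than the single
	 peeled scalar iteration (group_size elements) guarantees.  */
      if (cpart_size != cnunits && cremain + group_size < cpart_size)
	printf ("cremain=%u: reject, %u-element load over-reads\n",
		cremain, cpart_size);
      else
	printf ("cremain=%u: %u-element partial load is fine\n",
		cremain, cpart_size);
    }
  return 0;
}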

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

	* tree-vect-stmts.cc (get_group_load_store_type): Do not
	re-use poly-int remain but re-compute with non-poly values.
	Verify the shortened load is good enough to be covered with
	a single scalar gap iteration before accepting it.

	* gcc.dg/vect/pr115385.c: Enable AVX2 if available.
---
 gcc/testsuite/gcc.dg/vect/pr115385.c |  1 +
 gcc/tree-vect-stmts.cc               | 12 +++++++-----
 2 files changed, 8 insertions(+), 5 deletions(-)

Patch

diff --git a/gcc/testsuite/gcc.dg/vect/pr115385.c b/gcc/testsuite/gcc.dg/vect/pr115385.c
index a18cd665d7d..baea0b2473f 100644
--- a/gcc/testsuite/gcc.dg/vect/pr115385.c
+++ b/gcc/testsuite/gcc.dg/vect/pr115385.c
@@ -1,4 +1,5 @@ 
 /* { dg-require-effective-target mmap } */
+/* { dg-additional-options "-mavx2" { target avx2_runtime } } */
 
 #include <sys/mman.h>
 #include <stdio.h>
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index e32d44050e5..ca6052662a3 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2148,15 +2148,17 @@  get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
 	    {
 	      /* But peeling a single scalar iteration is enough if
 		 we can use the next power-of-two sized partial
-		 access.  */
+		 access and that is sufficiently small to be covered
+		 by the single scalar iteration.  */
 	      unsigned HOST_WIDE_INT cnunits, cvf, cremain, cpart_size;
 	      if (!nunits.is_constant (&cnunits)
 		  || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&cvf)
-		  || ((cremain = remain.to_constant (), true)
+		  || (((cremain = (group_size * cvf - gap) % cnunits), true)
 		      && ((cpart_size = (1 << ceil_log2 (cremain))) != cnunits)
-		      && vector_vector_composition_type
-			   (vectype, cnunits / cpart_size,
-			    &half_vtype) == NULL_TREE))
+		      && (cremain + group_size < cpart_size
+			  || vector_vector_composition_type
+			       (vectype, cnunits / cpart_size,
+				&half_vtype) == NULL_TREE)))
 		{
 		  if (dump_enabled_p ())
 		    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
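
To make the new guard concrete with the same invented numbers: with
cnunits == 16 and group_size == 2, a cremain of 5 rounds up to
cpart_size == 8, but the peeled scalar iteration only guarantees
5 + 2 == 7 accessible elements, so the 8-element partial load is now
rejected; a cremain of 7 covers 7 + 2 >= 8 and is still accepted,
provided vector_vector_composition_type can compose the full vector
from 16 / 8 == 2 pieces.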