From patchwork Mon Jun  3 09:52:52 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Biener <rguenther@suse.de>
X-Patchwork-Id: 1942794
Date: Mon, 3 Jun 2024 11:52:52 +0200 (CEST)
From: Richard Biener <rguenther@suse.de>
To: gcc-patches@gcc.gnu.org
cc: richard.sandiford@arm.com
Subject: [PATCH 1/2][final] Avoid inserting after a GIMPLE_COND with SLP and
 early break
MIME-Version: 1.0
Message-Id: <20240603095312.56CD6396E01F@sourceware.org>

When vectorizing an early break loop with LENs (are we missing a
check here that should disallow this?), SLP scheduling can end up
inserting stmts after a GIMPLE_COND while trying to be conservative
about the placement of stmts that depend only on the implicit loop
mask/len.  The following avoids this.  I guess it's not perfect,
but it does the job of fixing an observed RISC-V regression.
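
To make the scheduling problem concrete, here is a toy model of picking
the insertion point for stmts that must follow the last stmt they depend
on.  The types and functions (Stmt, insertion_index,
insert_after_dependency) are hypothetical and are not GCC's gimple or gsi
API.  The sketch assumes, as in the surrounding scheduling code, that new
stmts are inserted before the iterator.  Since a GIMPLE_COND ends its
basic block, the sketch stays at the condition and inserts before it
instead of advancing past it, which is what the
gimple_code (last_stmt) != GIMPLE_COND check in the patch does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for a basic block's statement list.
enum class StmtKind { Assign, Cond };

struct Stmt {
  StmtKind kind;
  std::string text;
};

// Return the index before which new statements are inserted so that they
// follow the statement at LAST.  A condition terminates the block, so code
// must never be placed after it; stay at the condition instead.
std::size_t insertion_index (const std::vector<Stmt> &bb, std::size_t last)
{
  assert (last < bb.size ());
  if (bb[last].kind == StmtKind::Cond)
    return last;      // do not advance: insert before the condition
  return last + 1;    // usual case: advance past LAST
}

// Insert TEXT as a new assignment at the computed position.
void insert_after_dependency (std::vector<Stmt> &bb, std::size_t last,
                              const std::string &text)
{
  auto pos = static_cast<std::ptrdiff_t> (insertion_index (bb, last));
  bb.insert (bb.begin () + pos, Stmt{StmtKind::Assign, text});
}
```

With an early break the condition can be the only stmt in the loop
header, in which case the new stmt ends up ahead of it rather than after
the block terminator.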

	* tree-vect-slp.cc (vect_schedule_slp_node): For mask/len
	loops make sure not to advance the insertion iterator
	beyond a GIMPLE_COND.
---
 gcc/tree-vect-slp.cc | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index bf1f467f53f..11ec82086fc 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -9650,7 +9650,12 @@ vect_schedule_slp_node (vec_info *vinfo,
       else
 	{
 	  si = gsi_for_stmt (last_stmt);
-	  gsi_next (&si);
+	  /* When we're getting gsi_after_labels from the starting
+	     condition of a fully masked/len loop avoid insertion
+	     after a GIMPLE_COND that can appear as the only header
+	     stmt with early break vectorization.  */
+	  if (gimple_code (last_stmt) != GIMPLE_COND)
+	    gsi_next (&si);
 	}
     }
 

From patchwork Mon Jun  3 09:57:35 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Biener <rguenther@suse.de>
X-Patchwork-Id: 1942795
Date: Mon, 3 Jun 2024 11:57:35 +0200 (CEST)
From: Richard Biener <rguenther@suse.de>
To: gcc-patches@gcc.gnu.org
cc: richard.sandiford@arm.com
Subject: [PATCH 2/2][final] RISC-V: Do single-lane SLP discovery for
 reductions
MIME-Version: 1.0
Message-Id: <20240603095759.DF845396E42D@sourceware.org>

The following performs single-lane SLP discovery for reductions.
It requires a fixup for outer loop vectorization: a check for
multiple types needs adjusting, as otherwise bogus pointer IV
increments are generated when there are multiple copies of vector
stmts in the inner loop.
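
The discovery order described here and in the ChangeLog below can be
modelled roughly as follows.  This is a simplified sketch with
hypothetical types: Reduction stands in for stmt_vec_info and the
try_build callback stands in for vect_build_slp_instance.  Lane-reducing
ops are never combined and get a single-lane instance directly; the
remaining reductions are first tried as one multi-lane SLP reduction
group, and if that fails (or there is only one), each gets its own
single-lane instance.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical reduction descriptor, not GCC's stmt_vec_info.
struct Reduction {
  int id;
  bool lane_reducing;   // e.g. a dot-product style op
};

using Group = std::vector<int>;
// Stand-in for SLP instance building: returns true on success.
using BuildFn = std::function<bool (const Group &)>;

// Returns the groups for which an SLP instance was built, in build order.
std::vector<Group> discover_reductions (const std::vector<Reduction> &reds,
                                        const BuildFn &try_build)
{
  std::vector<Group> built;
  Group combinable;
  for (const Reduction &r : reds)
    {
      if (r.lane_reducing)
        {
          // Never combined with others: single-lane discovery right away.
          if (try_build ({r.id}))
            built.push_back ({r.id});
        }
      else
        combinable.push_back (r.id);
    }
  // First try all combinable reductions as one multi-lane group.
  if (combinable.size () > 1 && try_build (combinable))
    {
      built.push_back (combinable);
      return built;
    }
  // Fallback: a single-lane instance for every remaining reduction.
  for (int id : combinable)
    if (try_build ({id}))
      built.push_back ({id});
  return built;
}
```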

For reduction epilog handling this extends the optimized path
(direct reduction opcode or shift reduction) to cover the trivial
single-lane SLP reduction case.
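
As an illustration of why the shift path only applies when every lane
belongs to the same reduction, here is a sketch of reducing a vector with
log2(N) whole-vector shift-and-add steps, using std::array as a stand-in
for a vector register (reduce_with_shifts is a hypothetical name, not a
GCC function).  With a single-lane SLP reduction all lanes hold partial
results of one reduction, so this is safe; with a multi-lane group,
lanes of different reductions would be added together.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Sum all lanes of V using log2(N) steps of "shift down by HALF lanes,
// then add lane-wise".  The final result ends up in lane 0.
template <std::size_t N>
int reduce_with_shifts (std::array<int, N> v)
{
  static_assert (N > 0 && (N & (N - 1)) == 0, "N must be a power of two");
  for (std::size_t half = N / 2; half > 0; half /= 2)
    {
      std::array<int, N> shifted{};   // zero-filled upper lanes
      for (std::size_t i = 0; i + half < N; ++i)
        shifted[i] = v[i + half];
      for (std::size_t i = 0; i < N; ++i)
        v[i] += shifted[i];
    }
  return v[0];
}
```

For {1, 2, 3, 4} the steps give {4, 6, 3, 4} and then {10, 9, 7, 4}, so
lane 0 holds the sum 10.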

The fix for PR65518 implemented in vect_grouped_load_supported for
non-SLP needs an SLP counterpart, which I put in get_group_load_store_type.
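
The new check punts when the group size may exceed the number of vector
lanes.  The arithmetic below sketches why, under simplifying
assumptions: fixed-length vectors, plain integers instead of poly_int,
and only element 0 of each group being read (a[i * group_size]).  Any
nunits consecutive elements contain a multiple of group_size when
group_size <= nunits, so every contiguous vector load is used; once
group_size > nunits, some loads fetch nothing the loop needs.  The
function name unused_vector_loads is hypothetical.

```cpp
#include <cassert>

// Count the vector loads (per VF scalar iterations) that contain no used
// element when a span of GROUP_SIZE * VF elements is loaded contiguously
// with vectors of NUNITS lanes and only element 0 of each group is used.
unsigned unused_vector_loads (unsigned group_size, unsigned nunits,
                              unsigned vf)
{
  assert (group_size > 0 && nunits > 0);
  unsigned span = group_size * vf;
  unsigned nloads = (span + nunits - 1) / nunits;
  unsigned unused = 0;
  for (unsigned v = 0; v < nloads; ++v)
    {
      unsigned lo = v * nunits, hi = lo + nunits;   // lanes [lo, hi)
      // First used element (a multiple of GROUP_SIZE) at or after LO.
      unsigned first = (lo + group_size - 1) / group_size * group_size;
      if (first >= hi || first >= span)
        ++unused;
    }
  return unused;
}
```

For example, with four lanes and VF 4, a group size of 8 leaves 4 of the
8 loads unused and a group size of 16 leaves 12 of 16 unused, while
group sizes 2 and 4 leave none.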

I've decided to adjust three testcases for the newly appearing
single-lane SLP instances instead of suppressing the "vectorizing
stmts using SLP" dump for single-lane instances, as that would also
require testsuite adjustments.

This is the final version of the series, where the set of FAILs
it causes is reduced to arm and RISC-V architecture specific ones
plus a few generic ones that will be resolved when the load permute
part is merged.  From there it should be possible to start filling
in missing pieces like generating load-lane/store-lane via SLP
patterns (or permute optimization?) and to implement missing SLP
support in a few places.  After the load part is in, I plan to add
a default-off --param that makes vectorization FAIL if any non-SLP
vectorization survives.

I plan to push this version if the CI goes through without surprises.

Thanks,
Richard.

	* tree-vect-slp.cc (vect_build_slp_tree_2): Only multi-lane
	discoveries are reduction chains and need special backedge
	treatment.
	(vect_analyze_slp): Fall back to single-lane SLP discovery
	for reductions.  Make sure to try single-lane SLP reduction
	for all reductions as fallback.
	* tree-vect-loop.cc (vect_create_epilog_for_reduction): Allow
	direct opcode and shift reduction also for SLP reductions
	with a single lane.
	* tree-vect-stmts.cc (get_group_load_store_type): For SLP also
	check for the PR65518 single-element interleaving case as done in
	vect_grouped_load_supported.
	(vectorizable_load): Avoid outer loop SLP vectorization with
	multi-copy vector stmts in the inner loop.
	(vectorizable_store): Likewise.

	* gcc.dg/vect/slp-24.c: Expect another SLP instance for the
	reduction.
	* gcc.dg/vect/slp-24-big-array.c: Likewise.
	* gcc.dg/vect/slp-reduc-6.c: Remove scan for zero SLP instances.
---
 gcc/testsuite/gcc.dg/vect/slp-24-big-array.c |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-24.c           |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-reduc-6.c      |  1 -
 gcc/tree-vect-loop.cc                        |  4 +-
 gcc/tree-vect-slp.cc                         | 71 +++++++++++++++-----
 gcc/tree-vect-stmts.cc                       | 24 ++++++-
 6 files changed, 80 insertions(+), 24 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-24-big-array.c b/gcc/testsuite/gcc.dg/vect/slp-24-big-array.c
index 5eaea9600ac..63f744338a1 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-24-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-24-big-array.c
@@ -92,4 +92,4 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { vect_no_align && ilp32 } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail { vect_no_align && ilp32 } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { xfail { vect_no_align && ilp32 } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-24.c b/gcc/testsuite/gcc.dg/vect/slp-24.c
index 59178f2c0f2..7814d7c324e 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-24.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-24.c
@@ -78,4 +78,4 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { vect_no_align && ilp32 } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail { vect_no_align && ilp32 } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { xfail { vect_no_align && ilp32 } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-reduc-6.c b/gcc/testsuite/gcc.dg/vect/slp-reduc-6.c
index 1fd15aa3c87..5566705a704 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-reduc-6.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-reduc-6.c
@@ -45,6 +45,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { xfail { vect_no_int_add || { ! { vect_unpack || vect_strided2 } } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } } */
 /* { dg-final { scan-tree-dump-times "different interleaving chains in one node" 1 "vect" { target { ! vect_no_int_add } } } } */
 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index a08357acc11..06292ed8bbe 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6504,7 +6504,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
   /* 2.3 Create the reduction code, using one of the three schemes described
          above. In SLP we simply need to extract all the elements from the 
          vector (without reducing them), so we use scalar shifts.  */
-  else if (reduc_fn != IFN_LAST && !slp_reduc)
+  else if (reduc_fn != IFN_LAST && (!slp_reduc || group_size == 1))
     {
       tree tmp;
       tree vec_elem_type;
@@ -6674,7 +6674,7 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
       gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
       reduc_inputs[0] = new_temp;
 
-      if (reduce_with_shift && !slp_reduc)
+      if (reduce_with_shift && (!slp_reduc || group_size == 1))
 	{
 	  int element_bitsize = tree_to_uhwi (bitsize);
 	  /* Enforced by vectorizable_reduction, which disallows SLP reductions
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 11ec82086fc..ba1190c7155 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1911,7 +1911,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
 	    /* Reduction chain backedge defs are filled manually.
 	       ???  Need a better way to identify a SLP reduction chain PHI.
 	       Or a better overall way to SLP match those.  */
-	    if (all_same && def_type == vect_reduction_def)
+	    if (stmts.length () > 1
+		&& all_same && def_type == vect_reduction_def)
 	      skip_args[loop_latch_edge (loop)->dest_idx] = true;
 	  }
 	else if (def_type != vect_internal_def)
@@ -3909,9 +3910,10 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
 	  }
 
       /* Find SLP sequences starting from groups of reductions.  */
-      if (loop_vinfo->reductions.length () > 1)
+      if (loop_vinfo->reductions.length () > 0)
 	{
-	  /* Collect reduction statements.  */
+	  /* Collect reduction statements we can combine into
+	     a SLP reduction.  */
 	  vec<stmt_vec_info> scalar_stmts;
 	  scalar_stmts.create (loop_vinfo->reductions.length ());
 	  for (auto next_info : loop_vinfo->reductions)
@@ -3924,23 +3926,58 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
 		     reduction path.  In that case we'd have to reverse
 		     engineer that conversion stmt following the chain using
 		     reduc_idx and from the PHI using reduc_def.  */
-		  && STMT_VINFO_DEF_TYPE (next_info) == vect_reduction_def
-		  /* Do not discover SLP reductions for lane-reducing ops, that
-		     will fail later.  */
-		  && (!(g = dyn_cast <gassign *> (STMT_VINFO_STMT (next_info)))
-		      || !lane_reducing_op_p (gimple_assign_rhs_code (g))))
-		scalar_stmts.quick_push (next_info);
+		  && STMT_VINFO_DEF_TYPE (next_info) == vect_reduction_def)
+		{
+		  /* Do not discover SLP reductions combining lane-reducing
+		     ops, that will fail later.  */
+		  if (!(g = dyn_cast <gassign *> (STMT_VINFO_STMT (next_info)))
+		      || !lane_reducing_op_p (gimple_assign_rhs_code (g)))
+		    scalar_stmts.quick_push (next_info);
+		  else
+		    {
+		      /* Do SLP discovery for single-lane reductions.  */
+		      vec<stmt_vec_info> stmts;
+		      vec<stmt_vec_info> roots = vNULL;
+		      vec<tree> remain = vNULL;
+		      stmts.create (1);
+		      stmts.quick_push (next_info);
+		      vect_build_slp_instance (vinfo,
+					       slp_inst_kind_reduc_group,
+					       stmts, roots, remain,
+					       max_tree_size, &limit,
+					       bst_map, NULL);
+		    }
+		}
 	    }
-	  if (scalar_stmts.length () > 1)
+	  /* Save for re-processing on failure.  */
+	  vec<stmt_vec_info> saved_stmts = scalar_stmts.copy ();
+	  vec<stmt_vec_info> roots = vNULL;
+	  vec<tree> remain = vNULL;
+	  if (scalar_stmts.length () <= 1
+	      || !vect_build_slp_instance (loop_vinfo,
+					   slp_inst_kind_reduc_group,
+					   scalar_stmts, roots, remain,
+					   max_tree_size, &limit, bst_map,
+					   NULL))
 	    {
-	      vec<stmt_vec_info> roots = vNULL;
-	      vec<tree> remain = vNULL;
-	      vect_build_slp_instance (loop_vinfo, slp_inst_kind_reduc_group,
-				       scalar_stmts, roots, remain,
-				       max_tree_size, &limit, bst_map, NULL);
+	      if (scalar_stmts.length () <= 1)
+		scalar_stmts.release ();
+	      /* Do SLP discovery for single-lane reductions.  */
+	      for (auto stmt_info : saved_stmts)
+		{
+		  vec<stmt_vec_info> stmts;
+		  vec<stmt_vec_info> roots = vNULL;
+		  vec<tree> remain = vNULL;
+		  stmts.create (1);
+		  stmts.quick_push (vect_stmt_to_vectorize (stmt_info));
+		  vect_build_slp_instance (vinfo,
+					   slp_inst_kind_reduc_group,
+					   stmts, roots, remain,
+					   max_tree_size, &limit,
+					   bst_map, NULL);
+		}
+	      saved_stmts.release ();
 	    }
-	  else
-	    scalar_stmts.release ();
 	}
     }
 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 935d80f0e1b..b26cc74f417 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2160,6 +2160,23 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
 		}
 	      overrun_p = true;
 	    }
+
+	  /* If this is single-element interleaving with an element
+	     distance that leaves unused vector loads around punt - we
+	     at least create very sub-optimal code in that case (and
+	     blow up memory, see PR65518).  */
+	  if (loop_vinfo
+	      && *memory_access_type == VMAT_CONTIGUOUS
+	      && SLP_TREE_LOAD_PERMUTATION (slp_node).exists ()
+	      && single_element_p
+	      && maybe_gt (group_size, TYPE_VECTOR_SUBPARTS (vectype)))
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "single-element interleaving not supported "
+				 "for not adjacent vector loads\n");
+	      return false;
+	    }
 	}
     }
   else
@@ -8202,7 +8219,9 @@ vectorizable_store (vec_info *vinfo,
   gcc_assert (ncopies >= 1);
 
   /* FORNOW.  This restriction should be relaxed.  */
-  if (loop && nested_in_vect_loop_p (loop, stmt_info) && ncopies > 1)
+  if (loop
+      && nested_in_vect_loop_p (loop, stmt_info)
+      && (ncopies > 1 || (slp && SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) > 1)))
     {
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -9945,7 +9964,8 @@ vectorizable_load (vec_info *vinfo,
   gcc_assert (ncopies >= 1);
 
   /* FORNOW. This restriction should be relaxed.  */
-  if (nested_in_vect_loop && ncopies > 1)
+  if (nested_in_vect_loop
+      && (ncopies > 1 || (slp && SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) > 1)))
     {
       if (dump_enabled_p ())
         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,