From patchwork Fri Oct  4 10:40:51 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford <richard.sandiford@arm.com>
X-Patchwork-Id: 1992694
From: Richard Sandiford <richard.sandiford@arm.com>
To: rguenther@suse.de,
	tamar.christina@arm.com,
	gcc-patches@gcc.gnu.org
Cc: Richard Sandiford <richard.sandiford@arm.com>
Subject: [PATCH 1/4] vect: Variable lane indices in
 vectorizable_slp_permutation_1
Date: Fri,  4 Oct 2024 11:40:51 +0100
Message-Id: <20241004104054.2653382-2-richard.sandiford@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20241004104054.2653382-1-richard.sandiford@arm.com>
References: <20241004104054.2653382-1-richard.sandiford@arm.com>
MIME-Version: 1.0

The main patch for PR116583 needs to create variable indices into
an input vector.  This pre-patch changes the types to allow that.
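As a rough illustration of why the lane type matters, here is a toy model of a poly_uint64-style value c0 + c1 * x, where x is only known at runtime. ToyPolyU64 and split_lane are hypothetical names and this is not GCC's poly_int API. The split into {vector index, lane} mirrors the constant-length path, where to_constant () is only valid because the vectors are known to be constant-length.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Toy model of a poly_uint64-style value c0 + c1 * x, where x is a
// runtime quantity (for example, derived from the SVE vector length).
// This is NOT GCC's poly_int class; it only illustrates the idea.
struct ToyPolyU64
{
  std::uint64_t c0, c1;

  // The value is a compile-time constant only if it does not depend on x.
  bool is_constant () const { return c1 == 0; }

  std::uint64_t to_constant () const
  {
    assert (is_constant ());  // Only valid for compile-time constants.
    return c0;
  }

  ToyPolyU64 operator+ (ToyPolyU64 other) const
  {
    return {c0 + other.c0, c1 + other.c1};
  }
};

// Hypothetical helper: split a constant scalar lane index (plus OFFSET)
// into {vector index, lane within that vector}, as the constant-length
// path does with VNUNITS elements per vector.
std::pair<unsigned, unsigned>
split_lane (ToyPolyU64 lane, unsigned offset, unsigned vnunits)
{
  unsigned l = static_cast<unsigned> (lane.to_constant ()) + offset;
  return {l / vnunits, l % vnunits};
}
```

A variable lane such as ToyPolyU64{0, 2} can be stored and added to like any other index, but only constant lanes can be split into a vector number and an element number.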

There is no pretty-print format for poly_uint64 because of issues
with passing C++ objects through "...", so the dump code now prints
the lane index separately using dump_dec.

gcc/
	PR tree-optimization/116583
	* tree-vect-slp.cc (vectorizable_slp_permutation_1): Use
	poly_uint64 for scalar lane indices.
---
 gcc/tree-vect-slp.cc | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 482b9d50496..7aeda69f447 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -10296,8 +10296,8 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi,
      from the { SLP operand, scalar lane } permutation as recorded in the
      SLP node as intermediate step.  This part should already work
      with SLP children with arbitrary number of lanes.  */
-  auto_vec<std::pair<std::pair<unsigned, unsigned>, unsigned> > vperm;
-  auto_vec<unsigned> active_lane;
+  auto_vec<std::pair<std::pair<unsigned, unsigned>, poly_uint64>> vperm;
+  auto_vec<poly_uint64> active_lane;
   vperm.create (olanes);
   active_lane.safe_grow_cleared (children.length (), true);
   for (unsigned i = 0; i < ncopies; ++i)
@@ -10312,8 +10312,9 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi,
 	    {
 	      /* We checked above that the vectors are constant-length.  */
 	      unsigned vnunits = TYPE_VECTOR_SUBPARTS (vtype).to_constant ();
-	      unsigned vi = (active_lane[p.first] + p.second) / vnunits;
-	      unsigned vl = (active_lane[p.first] + p.second) % vnunits;
+	      unsigned lane = active_lane[p.first].to_constant ();
+	      unsigned vi = (lane + p.second) / vnunits;
+	      unsigned vl = (lane + p.second) % vnunits;
 	      vperm.quick_push ({{p.first, vi}, vl});
 	    }
 	}
@@ -10339,9 +10340,10 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi,
 		  ? multiple_p (i, npatterns)
 		  : multiple_p (i, TYPE_VECTOR_SUBPARTS (vectype))))
 	    dump_printf (MSG_NOTE, ",");
-	  dump_printf (MSG_NOTE, " vops%u[%u][%u]",
-		       vperm[i].first.first, vperm[i].first.second,
-		       vperm[i].second);
+	  dump_printf (MSG_NOTE, " vops%u[%u][",
+		       vperm[i].first.first, vperm[i].first.second);
+	  dump_dec (MSG_NOTE, vperm[i].second);
+	  dump_printf (MSG_NOTE, "]");
 	}
       dump_printf (MSG_NOTE, "\n");
     }

From patchwork Fri Oct  4 10:40:52 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford <richard.sandiford@arm.com>
X-Patchwork-Id: 1992697
From: Richard Sandiford <richard.sandiford@arm.com>
To: rguenther@suse.de,
	tamar.christina@arm.com,
	gcc-patches@gcc.gnu.org
Cc: Richard Sandiford <richard.sandiford@arm.com>
Subject: [PATCH 2/4] vect: Restructure repeating_p case for SLP permutations
Date: Fri,  4 Oct 2024 11:40:52 +0100
Message-Id: <20241004104054.2653382-3-richard.sandiford@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20241004104054.2653382-1-richard.sandiford@arm.com>
References: <20241004104054.2653382-1-richard.sandiford@arm.com>
MIME-Version: 1.0

The repeating_p case previously handled the specific situation
in which the inputs have N lanes and the output has N lanes,
where N divides the number of vector elements.  In that case,
every output uses the same permute vector.
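A minimal sketch of that single permute vector, assuming constant-length vectors of NELTS elements. The mask has the series form { X1, ..., Xn, X1 + n, ..., Xn + n, ... } described in vectorizable_slp_permutation_1, so its first 3n elements (NPATTERNS = n, NELTS_PER_PATTERN = 3) determine the rest. expand_repeating_mask is a hypothetical helper, not part of GCC.

```cpp
#include <cassert>
#include <vector>

// Expand the repeating permute-mask encoding
//   { X1, ..., Xn, X1 + n, ..., Xn + n, X1 + 2n, ..., Xn + 2n, ... }
// for a vector of NELTS elements.  X holds the n base indices.
// Element I is X[I % n] plus n for every complete group before it.
std::vector<unsigned>
expand_repeating_mask (const std::vector<unsigned> &x, unsigned nelts)
{
  unsigned n = static_cast<unsigned> (x.size ());
  assert (n != 0 && nelts % n == 0);  // N must divide the vector length.
  std::vector<unsigned> mask (nelts);
  for (unsigned i = 0; i < nelts; ++i)
    mask[i] = x[i % n] + (i / n) * n;
  return mask;
}
```

Swapping adjacent pairs (X = {1, 0}) gives {1, 0, 3, 2} for 4 elements and {1, 0, 3, 2, 5, 4, 7, 6} for 8, so the same encoded mask serves every vector length that N divides.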

The code was therefore structured so that the outer loop only
constructed one permute vector, with an inner loop generating
as many VEC_PERM_EXPRs from it as required.

However, the main patch for PR116583 adds support for cycling
through N permute vectors, rather than just having one.
The current structure doesn't really handle that case well.
(We'd need to interleave the results after generating them,
which sounds a bit fragile.)

This patch instead makes the transform phase calculate each output
vector's permutation explicitly, like for the !repeating_p path.
As a bonus, it gets rid of one use of SLP_TREE_NUMBER_OF_VEC_STMTS.
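The restructured loop can be sketched as follows, assuming plain integers for OLANES and NOUTPUTS. Flattening the NOUTPUTS copies into one index space makes the output vector index VI available for every element, which is what lets each output get its own permutation and its own input vectors (first_vec.second + vi). The helper names are hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: enumerate (output vector, permutation element)
// pairs the way the restructured loop does, by flattening NOUTPUTS
// copies of the OLANES-element permutation into one index space.
std::vector<std::pair<unsigned, unsigned>>
flattened_order (unsigned olanes, unsigned noutputs)
{
  std::vector<std::pair<unsigned, unsigned>> order;
  unsigned total_nelts = olanes * noutputs;
  for (unsigned i = 0; i < total_nelts; ++i)
    {
      unsigned vi = i / olanes;  // Which output vector.
      unsigned ei = i % olanes;  // Which element of the permutation.
      order.push_back ({vi, ei});
    }
  return order;
}

// The equivalent nested form, for comparison.
std::vector<std::pair<unsigned, unsigned>>
nested_order (unsigned olanes, unsigned noutputs)
{
  std::vector<std::pair<unsigned, unsigned>> order;
  for (unsigned vi = 0; vi < noutputs; ++vi)
    for (unsigned ei = 0; ei < olanes; ++ei)
      order.push_back ({vi, ei});
  return order;
}
```

During analysis (no GSI) the real code only walks the first OLANES elements, so the extra cost is confined to the transform phase.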

This arguably undermines one of the justifications for using repeating_p
for constant-length vectors: that the repeating_p path involved less
work than the !repeating_p path.  That justification does still hold for
the analysis phase, though, and that should be the more time-sensitive
part.  And the other justification -- to get more coverage of the code --
still applies.  So I'd prefer that we continue to use repeating_p for
constant-length vectors unless that causes a known missed optimisation.

gcc/
	PR tree-optimization/116583
	* tree-vect-slp.cc (vectorizable_slp_permutation_1): Remove
	the noutputs_per_mask inner loop and instead generate a
	separate permute vector for each output.
---
 gcc/tree-vect-slp.cc | 75 ++++++++++++++++++++++++--------------------
 1 file changed, 41 insertions(+), 34 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 7aeda69f447..470128ea775 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -10243,26 +10243,33 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi,
       return 1;
     }
 
-  /* REPEATING_P is true if every output vector is guaranteed to use the
-     same permute vector.  We can handle that case for both variable-length
-     and constant-length vectors, but we only handle other cases for
-     constant-length vectors.
+  /* Set REPEATING_P to true if every output uses the same permute vector
+     and if we can generate the vectors in a vector-length agnostic way.
+
+     When REPEATING_P is true, NOUTPUTS holds the total number of outputs
+     that we actually need to generate.  */
+  uint64_t noutputs = 0;
+  loop_vec_info linfo = dyn_cast <loop_vec_info> (vinfo);
+  if (!linfo
+      || !constant_multiple_p (LOOP_VINFO_VECT_FACTOR (linfo)
+			       * SLP_TREE_LANES (node), nunits, &noutputs))
+    repeating_p = false;
+
+  /* We can handle the conditions described for REPEATING_P above for
+     both variable- and constant-length vectors.  The fallback requires
+     us to generate every element of every permute vector explicitly,
+     which is only possible for constant-length permute vectors.
 
      Set:
 
      - NPATTERNS and NELTS_PER_PATTERN to the encoding of the permute
-       mask vector that we want to build.
+       mask vectors that we want to build.
 
      - NCOPIES to the number of copies of PERM that we need in order
-       to build the necessary permute mask vectors.
-
-     - NOUTPUTS_PER_MASK to the number of output vectors we want to create
-       for each permute mask vector.  This is only relevant when GSI is
-       nonnull.  */
+       to build the necessary permute mask vectors.  */
   uint64_t npatterns;
   unsigned nelts_per_pattern;
   uint64_t ncopies;
-  unsigned noutputs_per_mask;
   if (repeating_p)
     {
       /* We need a single permute mask vector that has the form:
@@ -10274,7 +10281,6 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi,
 	 that we use for permutes requires 3n elements.  */
       npatterns = SLP_TREE_LANES (node);
       nelts_per_pattern = ncopies = 3;
-      noutputs_per_mask = SLP_TREE_NUMBER_OF_VEC_STMTS (node);
     }
   else
     {
@@ -10284,10 +10290,8 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi,
 	  || !TYPE_VECTOR_SUBPARTS (op_vectype).is_constant ())
 	return -1;
       nelts_per_pattern = ncopies = 1;
-      if (loop_vec_info linfo = dyn_cast <loop_vec_info> (vinfo))
-	if (!LOOP_VINFO_VECT_FACTOR (linfo).is_constant (&ncopies))
-	  return -1;
-      noutputs_per_mask = 1;
+      if (linfo && !LOOP_VINFO_VECT_FACTOR (linfo).is_constant (&ncopies))
+	return -1;
     }
   unsigned olanes = ncopies * SLP_TREE_LANES (node);
   gcc_assert (repeating_p || multiple_p (olanes, nunits));
@@ -10364,16 +10368,24 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi,
   mask.quick_grow (count);
   vec_perm_indices indices;
   unsigned nperms = 0;
-  for (unsigned i = 0; i < vperm.length (); ++i)
-    {
-      mask_element = vperm[i].second;
+  /* When REPEATING_P is true, we only have one unique permute vector
+     to check during analysis, but we need to generate NOUTPUTS vectors
+     during transformation.  */
+  unsigned total_nelts = olanes;
+  if (repeating_p && gsi)
+    total_nelts *= noutputs;
+  for (unsigned i = 0; i < total_nelts; ++i)
+    {
+      unsigned vi = i / olanes;
+      unsigned ei = i % olanes;
+      mask_element = vperm[ei].second;
       if (first_vec.first == -1U
-	  || first_vec == vperm[i].first)
-	first_vec = vperm[i].first;
+	  || first_vec == vperm[ei].first)
+	first_vec = vperm[ei].first;
       else if (second_vec.first == -1U
-	       || second_vec == vperm[i].first)
+	       || second_vec == vperm[ei].first)
 	{
-	  second_vec = vperm[i].first;
+	  second_vec = vperm[ei].first;
 	  mask_element += nunits;
 	}
       else
@@ -10437,17 +10449,12 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi,
 	      if (!identity_p)
 		mask_vec = vect_gen_perm_mask_checked (vectype, indices);
 
-	      for (unsigned int vi = 0; vi < noutputs_per_mask; ++vi)
-		{
-		  tree first_def
-		    = vect_get_slp_vect_def (first_node,
-					     first_vec.second + vi);
-		  tree second_def
-		    = vect_get_slp_vect_def (second_node,
-					     second_vec.second + vi);
-		  vect_add_slp_permutation (vinfo, gsi, node, first_def,
-					    second_def, mask_vec, mask[0]);
-		}
+	      tree first_def
+		= vect_get_slp_vect_def (first_node, first_vec.second + vi);
+	      tree second_def
+		= vect_get_slp_vect_def (second_node, second_vec.second + vi);
+	      vect_add_slp_permutation (vinfo, gsi, node, first_def,
+					second_def, mask_vec, mask[0]);
 	    }
 
 	  index = 0;

From patchwork Fri Oct  4 10:40:53 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford <richard.sandiford@arm.com>
X-Patchwork-Id: 1992698
From: Richard Sandiford <richard.sandiford@arm.com>
To: rguenther@suse.de,
	tamar.christina@arm.com,
	gcc-patches@gcc.gnu.org
Cc: Richard Sandiford <richard.sandiford@arm.com>
Subject: [PATCH 3/4] vect: Support more VLA SLP permutations
Date: Fri,  4 Oct 2024 11:40:53 +0100
Message-Id: <20241004104054.2653382-4-richard.sandiford@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20241004104054.2653382-1-richard.sandiford@arm.com>
References: <20241004104054.2653382-1-richard.sandiford@arm.com>
MIME-Version: 1.0

This is the main patch for PR116583.  Previously, we only
supported VLA SLP permutations for which the output and inputs
have the same number of lanes, and for which that number of
lanes divides the number of vector elements.

The patch extends this to handle:

(1) "packs" of a single 2N-vector input into an N-vector output
(2) "unpacks" of N-vector inputs into an XN-vector output
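For intuition, both cases can be modelled on constant-length 4-element vectors. vec_perm below is a simplified stand-in for VEC_PERM_EXPR (indices below N select from the first input, the rest from the second, taken modulo 2N). The pack mask selects the even elements of two consecutive inputs, and the unpack masks follow the UNPACK_STEP window rule described in the patch's comments, with UNPACK_FACTOR = 2. All helper names are hypothetical.

```cpp
#include <cassert>
#include <vector>

using Vec = std::vector<int>;

// Constant-length model of VEC_PERM_EXPR: index J < N selects A[J],
// otherwise B[J - N].  Indices are reduced modulo 2N.
Vec
vec_perm (const Vec &a, const Vec &b, const std::vector<unsigned> &sel)
{
  unsigned n = static_cast<unsigned> (a.size ());
  Vec out (sel.size ());
  for (unsigned i = 0; i < sel.size (); ++i)
    {
      unsigned j = sel[i] % (2 * n);
      out[i] = j < n ? a[j] : b[j - n];
    }
  return out;
}

// Pack: output X takes every other element of inputs 2X and 2X+1.
std::vector<unsigned>
pack_even_mask (unsigned n)
{
  std::vector<unsigned> sel (n);
  for (unsigned i = 0; i < n; ++i)
    sel[i] = 2 * i;
  return sel;
}

// Unpack with factor 2 (an interleave of two inputs): output vector I
// uses a window of STEP = N / 2 lanes of each input, starting at lane
// STEP * (I % 2).
std::vector<unsigned>
interleave_mask (unsigned n, unsigned output_index)
{
  unsigned step = n / 2;
  unsigned base = step * (output_index % 2);
  std::vector<unsigned> sel (n);
  for (unsigned k = 0; k < n; ++k)
    sel[k] = base + k / 2 + (k % 2) * n;  // Alternate A and B lanes.
  return sel;
}
```

With a = {0, 1, 2, 3} and b = {10, 11, 12, 13}, the two unpack outputs are {0, 10, 1, 11} and {2, 12, 3, 13}: output 0 reads lanes 0 and 1 of each input, and output 1 reads lanes 2 and 3.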

Hopefully the comments in the code explain the approach.

The contents of the:

  for (unsigned i = 0; i < ncopies; ++i)

loop do not change apart from reindentation; the patch simply adds
an outer loop around it.
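A simplified model of the resulting vperm construction on the repeating_p path, using plain unsigned lanes where GCC uses poly_uint64 (UNPACK_STEP may depend on the runtime vector length). build_vperm and its parameters are hypothetical; the inner loop body corresponds to the unchanged ncopies loop.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Simplified model of the repeating_p path that builds VPERM.
// PERM holds { child, lane } pairs; CHILD_LANES[c] is the lane count of
// child C.  Lanes are plain unsigned here; in GCC they are poly_uint64.
std::vector<std::pair<unsigned, unsigned>>
build_vperm (const std::vector<std::pair<unsigned, unsigned>> &perm,
	     const std::vector<unsigned> &child_lanes,
	     unsigned unpack_factor, unsigned unpack_step, unsigned ncopies)
{
  std::vector<std::pair<unsigned, unsigned>> vperm;
  std::vector<unsigned> active_lane (child_lanes.size ());
  for (unsigned ui = 0; ui < unpack_factor; ++ui)
    {
      // Each unpack window starts UI * UNPACK_STEP lanes into each input.
      for (unsigned j = 0; j < child_lanes.size (); ++j)
	active_lane[j] = ui * unpack_step;
      // This loop matches the pre-patch "ncopies" loop.
      for (unsigned i = 0; i < ncopies; ++i)
	{
	  for (auto p : perm)
	    vperm.push_back ({p.first, p.second + active_lane[p.first]});
	  // Advance to the next group.
	  for (unsigned j = 0; j < child_lanes.size (); ++j)
	    active_lane[j] += child_lanes[j];
	}
    }
  return vperm;
}
```

For an interleave of two 1-lane children with UNPACK_FACTOR = 2, UNPACK_STEP = 2 and NCOPIES = 3, the second window starts at lane 2 of each input. A pack of a single 2-lane child down to 1 lane gives lanes 0, 2, 4, the start of the series { 0, 2, 4, ... }. With UNPACK_FACTOR = 1 the outer loop runs once and the result matches the pre-patch behaviour.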

The patch removes the XFAILs in slp-13.c and slp-13-big-array.c
and also improves the SVE vect.exp results with vect-force-slp=1.
I haven't added new tests specifically for this, since presumably
the existing ones will cover it once the SLP switch is flipped.

gcc/
	PR tree-optimization/116583
	* tree-vect-slp.cc (vectorizable_slp_permutation_1): Handle
	variable-length pack and unpack permutations.

gcc/testsuite/
	PR tree-optimization/116583
	* gcc.dg/vect/slp-13.c: Remove xfail for vect_variable_length.
	* gcc.dg/vect/slp-13-big-array.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/slp-13-big-array.c |   2 +-
 gcc/testsuite/gcc.dg/vect/slp-13.c           |   2 +-
 gcc/tree-vect-slp.cc                         | 107 ++++++++++++++-----
 3 files changed, 82 insertions(+), 29 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c b/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c
index ca70856c1dd..e45f8aab133 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-13-big-array.c
@@ -137,4 +137,4 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { { vect_interleave && vect_extract_even_odd } && { ! vect_pack_trunc } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { ! vect_pack_trunc } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target { { vect_interleave && vect_extract_even_odd } && vect_pack_trunc } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_pack_trunc xfail vect_variable_length } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_pack_trunc } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-13.c b/gcc/testsuite/gcc.dg/vect/slp-13.c
index b7f947e6dbe..d6346aef978 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-13.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-13.c
@@ -131,4 +131,4 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { { vect_interleave && vect_extract_even_odd } && { ! vect_pack_trunc } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { ! vect_pack_trunc } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target { { vect_interleave && vect_extract_even_odd } && vect_pack_trunc } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_pack_trunc xfail vect_variable_length } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_pack_trunc } } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 470128ea775..66f5906ebb9 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -10194,6 +10194,13 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi,
   unsigned i;
   poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
   bool repeating_p = multiple_p (nunits, SLP_TREE_LANES (node));
+  /* True if we're permuting a single input of 2N vectors down
+     to N vectors.  This case doesn't generalize beyond 2 since
+     VEC_PERM_EXPR only takes 2 inputs.  */
+  bool pack_p = false;
+  /* If we're permuting inputs of N vectors each into X*N outputs,
+     this is the value of X, otherwise it is 1.  */
+  unsigned int unpack_factor = 1;
   tree op_vectype = NULL_TREE;
   FOR_EACH_VEC_ELT (children, i, child)
     if (SLP_TREE_VECTYPE (child))
@@ -10215,7 +10222,20 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi,
 			     "Unsupported vector types in lane permutation\n");
 	  return -1;
 	}
-      if (SLP_TREE_LANES (child) != SLP_TREE_LANES (node))
+      auto op_nunits = TYPE_VECTOR_SUBPARTS (op_vectype);
+      unsigned int this_unpack_factor;
+      /* Check whether the input has twice as many lanes per vector.  */
+      if (children.length () == 1
+	  && known_eq (SLP_TREE_LANES (child) * nunits,
+		       SLP_TREE_LANES (node) * op_nunits * 2))
+	pack_p = true;
+      /* Check whether the output has N times as many lanes per vector.  */
+      else if (constant_multiple_p (SLP_TREE_LANES (node) * op_nunits,
+				    SLP_TREE_LANES (child) * nunits,
+				    &this_unpack_factor)
+	       && (i == 0 || unpack_factor == this_unpack_factor))
+	unpack_factor = this_unpack_factor;
+      else
 	repeating_p = false;
     }
 
@@ -10243,14 +10263,25 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi,
       return 1;
     }
 
-  /* Set REPEATING_P to true if every output uses the same permute vector
+  /* Set REPEATING_P to true if the permutations are cyclical wrt UNPACK_FACTOR
      and if we can generate the vectors in a vector-length agnostic way.
+     This requires UNPACK_STEP == NUNITS / UNPACK_FACTOR to be known at
+     compile time.
+
+     The significance of UNPACK_STEP is that, when PACK_P is false,
+     output vector I operates on a window of UNPACK_STEP elements from each
+     input, starting at lane UNPACK_STEP * (I % UNPACK_FACTOR).  For example,
+     when UNPACK_FACTOR is 2, the first output vector operates on lanes
+     [0, NUNITS / 2 - 1] of each input vector and the second output vector
+     operates on lanes [NUNITS / 2, NUNITS - 1] of each input vector.
 
      When REPEATING_P is true, NOUTPUTS holds the total number of outputs
      that we actually need to generate.  */
   uint64_t noutputs = 0;
+  poly_uint64 unpack_step = 0;
   loop_vec_info linfo = dyn_cast <loop_vec_info> (vinfo);
   if (!linfo
+      || !multiple_p (nunits, unpack_factor, &unpack_step)
       || !constant_multiple_p (LOOP_VINFO_VECT_FACTOR (linfo)
 			       * SLP_TREE_LANES (node), nunits, &noutputs))
     repeating_p = false;
@@ -10272,7 +10303,7 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi,
   uint64_t ncopies;
   if (repeating_p)
     {
-      /* We need a single permute mask vector that has the form:
+      /* We need permute mask vectors that have the form:
 
 	   { X1, ..., Xn, X1 + n, ..., Xn + n, X1 + 2n, ..., Xn + 2n, ... }
 
@@ -10292,8 +10323,10 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi,
       nelts_per_pattern = ncopies = 1;
       if (linfo && !LOOP_VINFO_VECT_FACTOR (linfo).is_constant (&ncopies))
 	return -1;
+      pack_p = false;
+      unpack_factor = 1;
     }
-  unsigned olanes = ncopies * SLP_TREE_LANES (node);
+  unsigned olanes = unpack_factor * ncopies * SLP_TREE_LANES (node);
   gcc_assert (repeating_p || multiple_p (olanes, nunits));
 
   /* Compute the { { SLP operand, vector index}, lane } permutation sequence
@@ -10304,27 +10337,34 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi,
   auto_vec<poly_uint64> active_lane;
   vperm.create (olanes);
   active_lane.safe_grow_cleared (children.length (), true);
-  for (unsigned i = 0; i < ncopies; ++i)
+  for (unsigned int ui = 0; ui < unpack_factor; ++ui)
     {
-      for (unsigned pi = 0; pi < perm.length (); ++pi)
+      for (unsigned j = 0; j < children.length (); ++j)
+	active_lane[j] = ui * unpack_step;
+      for (unsigned i = 0; i < ncopies; ++i)
 	{
-	  std::pair<unsigned, unsigned> p = perm[pi];
-	  tree vtype = SLP_TREE_VECTYPE (children[p.first]);
-	  if (repeating_p)
-	    vperm.quick_push ({{p.first, 0}, p.second + active_lane[p.first]});
-	  else
+	  for (unsigned pi = 0; pi < perm.length (); ++pi)
 	    {
-	      /* We checked above that the vectors are constant-length.  */
-	      unsigned vnunits = TYPE_VECTOR_SUBPARTS (vtype).to_constant ();
-	      unsigned lane = active_lane[p.first].to_constant ();
-	      unsigned vi = (lane + p.second) / vnunits;
-	      unsigned vl = (lane + p.second) % vnunits;
-	      vperm.quick_push ({{p.first, vi}, vl});
+	      std::pair<unsigned, unsigned> p = perm[pi];
+	      tree vtype = SLP_TREE_VECTYPE (children[p.first]);
+	      if (repeating_p)
+		vperm.quick_push ({{p.first, 0},
+				   p.second + active_lane[p.first]});
+	      else
+		{
+		  /* We checked above that the vectors are constant-length.  */
+		  unsigned vnunits = TYPE_VECTOR_SUBPARTS (vtype)
+		    .to_constant ();
+		  unsigned lane = active_lane[p.first].to_constant ();
+		  unsigned vi = (lane + p.second) / vnunits;
+		  unsigned vl = (lane + p.second) % vnunits;
+		  vperm.quick_push ({{p.first, vi}, vl});
+		}
 	    }
+	  /* Advance to the next group.  */
+	  for (unsigned j = 0; j < children.length (); ++j)
+	    active_lane[j] += SLP_TREE_LANES (children[j]);
 	}
-      /* Advance to the next group.  */
-      for (unsigned j = 0; j < children.length (); ++j)
-	active_lane[j] += SLP_TREE_LANES (children[j]);
     }
 
   if (dump_p)
@@ -10368,19 +10408,32 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi,
   mask.quick_grow (count);
   vec_perm_indices indices;
   unsigned nperms = 0;
-  /* When REPEATING_P is true, we only have one unique permute vector
-     to check during analysis, but we need to generate NOUTPUTS vectors
-     during transformation.  */
+  /* When REPEATING_P is true, we only have UNPACK_FACTOR unique permute
+     vectors to check during analysis, but we need to generate NOUTPUTS
+     vectors during transformation.  */
   unsigned total_nelts = olanes;
   if (repeating_p && gsi)
-    total_nelts *= noutputs;
+    total_nelts = (total_nelts / unpack_factor) * noutputs;
   for (unsigned i = 0; i < total_nelts; ++i)
     {
-      unsigned vi = i / olanes;
+      /* VI is the input vector index when generating code for REPEATING_P.  */
+      unsigned vi = i / olanes * (pack_p ? 2 : 1);
       unsigned ei = i % olanes;
       mask_element = vperm[ei].second;
-      if (first_vec.first == -1U
-	  || first_vec == vperm[ei].first)
+      if (pack_p)
+	{
+	  /* In this case, we have N outputs and the single child provides 2N
+	     inputs.  Output X permutes inputs 2X and 2X+1.
+
+	     The mask indices are taken directly from the SLP permutation node.
+	     Index X selects from the first vector if (X / NUNITS) % 2 == 0;
+	     X selects from the second vector otherwise.  These conditions
+	     are only known at compile time for constant-length vectors.  */
+	  first_vec = std::make_pair (0, 0);
+	  second_vec = std::make_pair (0, 1);
+	}
+      else if (first_vec.first == -1U
+	       || first_vec == vperm[ei].first)
 	first_vec = vperm[ei].first;
       else if (second_vec.first == -1U
 	       || second_vec == vperm[ei].first)
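The index arithmetic in this hunk is hard to follow from the diff alone, so here is a minimal standalone C++ sketch of the constant-length (!repeating_p) path. It assumes every vector has NUNITS elements and models PERM as (child, lane) pairs. build_vperm, pack_input_for_index and VpermEntry are hypothetical names and are not part of tree-vect-slp.cc.

In the sketch, each unpack step restarts the active lanes at ui * unpack_step. Each of the NCOPIES groups then splits lane + offset into a vector index and an element index. The second function is only a reading of the new PACK_P comment (output X permutes inputs 2X and 2X+1) and should not be taken as the GCC implementation.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical model of a vperm entry: ((child, vector index), element).
using VpermEntry = std::pair<std::pair<unsigned, unsigned>, unsigned>;

// Sketch of the !repeating_p path: CHILD_LANES[j] is the number of SLP
// lanes in child j, every vector has NUNITS elements, and PERM holds
// (child, lane-within-group) pairs.
std::vector<VpermEntry>
build_vperm (const std::vector<std::pair<unsigned, unsigned>> &perm,
             const std::vector<unsigned> &child_lanes,
             unsigned nunits, unsigned ncopies,
             unsigned unpack_factor, unsigned unpack_step)
{
  std::vector<VpermEntry> vperm;
  std::vector<unsigned> active_lane (child_lanes.size ());
  for (unsigned ui = 0; ui < unpack_factor; ++ui)
    {
      // Each unpack step restarts from lane UI * UNPACK_STEP.
      for (unsigned j = 0; j < child_lanes.size (); ++j)
        active_lane[j] = ui * unpack_step;
      for (unsigned i = 0; i < ncopies; ++i)
        {
          for (const auto &p : perm)
            {
              // Split the absolute lane into (vector, element).
              unsigned lane = active_lane[p.first] + p.second;
              vperm.push_back ({{p.first, lane / nunits}, lane % nunits});
            }
          // Advance to the next group.
          for (unsigned j = 0; j < child_lanes.size (); ++j)
            active_lane[j] += child_lanes[j];
        }
    }
  return vperm;
}

// Reading of the PACK_P comment: output OUT permutes inputs 2*OUT and
// 2*OUT+1, and mask index IDX selects the first of them when
// (IDX / NUNITS) % 2 == 0, otherwise the second.
unsigned
pack_input_for_index (unsigned out, unsigned idx, unsigned nunits)
{
  return 2 * out + (idx / nunits) % 2;
}
```

As an example, build_vperm ({{0, 1}, {0, 0}}, {2}, 4, 3, 1, 0) describes one two-lane child with its lanes swapped and three copies of 4-element vectors. It yields ((0,0),1), ((0,0),0), ((0,0),3), ((0,0),2), ((0,1),1), ((0,1),0), so the third copy spills into vector 1 of the child.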

From patchwork Fri Oct  4 10:40:54 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford <richard.sandiford@arm.com>
X-Patchwork-Id: 1992696
From: Richard Sandiford <richard.sandiford@arm.com>
To: rguenther@suse.de,
	tamar.christina@arm.com,
	gcc-patches@gcc.gnu.org
Cc: Richard Sandiford <richard.sandiford@arm.com>
Subject: [PATCH 4/4] vect: Add more dump messages for VLA SLP permutation
Date: Fri,  4 Oct 2024 11:40:54 +0100
Message-Id: <20241004104054.2653382-5-richard.sandiford@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20241004104054.2653382-1-richard.sandiford@arm.com>
References: <20241004104054.2653382-1-richard.sandiford@arm.com>

Taking the !repeating_p route for VLA vectors causes analysis
to fail, but the dump files did not show when this had happened
or which node had caused it.

gcc/
	PR tree-optimization/116583
	* tree-vect-slp.cc (vectorizable_slp_permutation_1): Add more
	dump messages.
---
 gcc/tree-vect-slp.cc | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 66f5906ebb9..56fb55cb628 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -10319,10 +10319,22 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi,
 	 instead of relying on the pattern described above.  */
       if (!nunits.is_constant (&npatterns)
 	  || !TYPE_VECTOR_SUBPARTS (op_vectype).is_constant ())
-	return -1;
+	{
+	  if (dump_p)
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "unsupported permutation %p on variable-length"
+			     " vectors\n", (void *) node);
+	  return -1;
+	}
       nelts_per_pattern = ncopies = 1;
       if (linfo && !LOOP_VINFO_VECT_FACTOR (linfo).is_constant (&ncopies))
-	return -1;
+	{
+	  if (dump_p)
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "unsupported permutation %p for variable VF\n",
+			     (void *) node);
+	  return -1;
+	}
       pack_p = false;
       unpack_factor = 1;
     }
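A toy model of the two bail-outs instrumented here shows where each new message fires. ToyPoly is a hypothetical stand-in for GCC's poly_int, whose value is COEFF0 + COEFF1 * N for a runtime N. The sketch simplifies the GCC code in three ways:

- the op_vectype check and the LOOP_VINFO test are folded away;
- messages go into a vector rather than through dump_printf_loc;
- 0 is only a placeholder for "analysis continues".

This is a sketch under those assumptions, not the GCC code.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for GCC's poly_int: the value is
// COEFF0 + COEFF1 * N, where N is only known at runtime.
struct ToyPoly
{
  unsigned coeff0;
  unsigned coeff1;

  // Like poly_int::is_constant (T *): succeed only if there is no
  // runtime term, storing the constant value in *OUT.
  bool is_constant (unsigned *out) const
  {
    if (coeff1 != 0)
      return false;
    *out = coeff0;
    return true;
  }
};

// Sketch of the two early exits: -1 means "unsupported", as in the patch.
// NODE plays the role of the SLP node printed with %p, and MESSAGES plays
// the role of the dump file.
int
check_vla_permutation (void *node, ToyPoly nunits, ToyPoly vf, bool dump_p,
                       std::vector<std::string> &messages)
{
  char buf[128];
  unsigned npatterns, ncopies;
  if (!nunits.is_constant (&npatterns))
    {
      if (dump_p)
        {
          std::snprintf (buf, sizeof buf, "unsupported permutation %p on"
                         " variable-length vectors", node);
          messages.push_back (buf);
        }
      return -1;
    }
  if (!vf.is_constant (&ncopies))
    {
      if (dump_p)
        {
          std::snprintf (buf, sizeof buf, "unsupported permutation %p for"
                         " variable VF", node);
          messages.push_back (buf);
        }
      return -1;
    }
  return 0;
}
```

With dump_p false, the function still fails in the same cases but leaves MESSAGES untouched. That matches the patch, which only adds output and does not change behaviour.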