From patchwork Wed Jul 3 13:24:05 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 1956242
Date: Wed, 3 Jul 2024 15:24:05 +0200 (CEST)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 4/5] Support group-size of three in SLP load permutation lowering
Message-Id: <20240703132633.3BEF53860C3D@sourceware.org>

The following adds support for group-size three in SLP load permutation
lowering to match the non-SLP capabilities.  This is done by using
the non-interleaving fallback code, which at VF == 4 creates from
{ { a0, b0, c0 }, { a1, b1, c1 }, { a2, b2, c2 }, { a3, b3, c3 } }
the intermediate vectors { c0, c0, c1, c1 } and { c2, c2, c3, c3 }
to produce { c0, c1, c2, c3 }.

This turns out to be more effective than the scheme implemented
for non-SLP for SSE, only slightly worse for AVX512 and somewhat
worse for AVX2.  It seems to me that this would extend to other
non-power-of-two group-sizes (but the patch does not do that).
Optimal schemes are likely difficult to lay out in VF agnostic form.

I'll note that while the lowering assumes even/odd extract is
generally available for all vector element sizes (which is probably
a good assumption), it doesn't in any way constrain the other
permutes it generates based on target availability.  That is again
difficult to do in a VF agnostic way (but at least currently the
vector type is fixed).

I'll also note that the SLP store side merges lanes in a way
producing three-vector permutes for a store group-size of three, so
the testcase uses a store group-size of four.

	* tree-vect-slp.cc (vect_lower_load_permutations): Support
	group-size of three.

	* gcc.dg/vect/slp-52.c: New testcase.
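The VF == 4 example from the description can be replayed with scalar code.
The following is only a sketch: permute2 and extract_c are hypothetical
helpers, not GCC functions, and the selector values are an assumption about
how the three loaded vectors would feed the two intermediate permutes.  The
description only states the intermediate and final vector contents.

```c
#include <stddef.h>

enum { VF = 4 };  /* elements per vector, as in the VF == 4 example */

/* Two-input permute: out[i] = concat (a, b)[sel[i]].  */
static void permute2(const int *a, const int *b, const int *sel, int *out)
{
    for (size_t i = 0; i < VF; ++i)
        out[i] = sel[i] < VF ? a[sel[i]] : b[sel[i] - VF];
}

/* Extract lane "c" from four interleaved { a, b, c } groups.
   Memory is viewed as three vectors:
     v0 = { a0, b0, c0, a1 }, v1 = { b1, c1, a2, b2 }, v2 = { c2, a3, b3, c3 }.  */
void extract_c(const int mem[3 * VF], int out[VF])
{
    const int *v0 = mem, *v1 = mem + VF, *v2 = mem + 2 * VF;
    /* Assumed selectors; they yield the intermediates named in the text.  */
    static const int sel_lo[VF] = { 2, 2, 5, 5 };    /* { c0, c0, c1, c1 } */
    static const int sel_hi[VF] = { 0, 0, 3, 3 };    /* { c2, c2, c3, c3 } */
    static const int sel_even[VF] = { 0, 2, 4, 6 };  /* even extract */
    int lo[VF], hi[VF];

    permute2(v0, v1, sel_lo, lo);
    permute2(v2, v2, sel_hi, hi);
    permute2(lo, hi, sel_even, out);
}
```

Every step here takes at most two vector inputs, which is the restriction
the lowering is meant to satisfy.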
---
 gcc/testsuite/gcc.dg/vect/slp-52.c | 14 ++++++++++++
 gcc/tree-vect-slp.cc               | 35 +++++++++++++++++-------------
 2 files changed, 34 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/slp-52.c

diff --git a/gcc/testsuite/gcc.dg/vect/slp-52.c b/gcc/testsuite/gcc.dg/vect/slp-52.c
new file mode 100644
index 00000000000..ba49f0046e2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/slp-52.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+
+void foo (int * __restrict x, int *y)
+{
+  for (int i = 0; i < 1024; ++i)
+    {
+      x[4*i+0] = y[3*i+0];
+      x[4*i+1] = y[3*i+1] * 2;
+      x[4*i+2] = y[3*i+2] + 3;
+      x[4*i+3] = y[3*i+2] * 2 - 5;
+    }
+}
+
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { vect_int && vect_int_mult } } } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index fdefee90e92..c62b0b5cf88 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -3718,7 +3718,8 @@ vect_build_slp_instance (vec_info *vinfo,
                  with the least number of lanes to one and then repeat until
                  we end up with two inputs.  That scheme makes sure we end
                  up with permutes satisfying the restriction of requiring at
-                 most two vector inputs to produce a single vector output.  */
+                 most two vector inputs to produce a single vector output
+                 when the number of lanes is even.  */
               while (SLP_TREE_CHILDREN (perm).length () > 2)
                 {
                   /* Pick the two nodes with the least number of lanes,
@@ -3995,11 +3996,10 @@ vect_lower_load_permutations (loop_vec_info loop_vinfo,
     = DR_GROUP_FIRST_ELEMENT (SLP_TREE_SCALAR_STMTS (loads[0])[0]);
 
   /* Only a power-of-two number of lanes matches interleaving with N levels.
-     The non-SLP path also supports DR_GROUP_SIZE == 3.
      ???  An even number of lanes could be reduced to 1<[...]
-      if (SLP_TREE_LANES (load) >= group_lanes / 2)
+      if (SLP_TREE_LANES (load) >= (group_lanes + 1) / 2)
         continue;
 
       /* First build (and possibly re-use) a load node for the
@@ -4052,7 +4052,7 @@ vect_lower_load_permutations (loop_vec_info loop_vinfo,
       while (1)
         {
           unsigned group_lanes = SLP_TREE_LANES (l0);
-          if (SLP_TREE_LANES (load) >= group_lanes / 2)
+          if (SLP_TREE_LANES (load) >= (group_lanes + 1) / 2)
             break;
 
           /* Try to lower by reducing the group to half its size using an
@@ -4062,19 +4062,24 @@ vect_lower_load_permutations (loop_vec_info loop_vinfo,
              Thus { e, e, o, o, e, e, o, o } woud be an even/odd decomposition
              with N == 2.  */
           /* ???  Only an even number of lanes can be handed this way, but the
-             fallback below could work for any number.  */
-          gcc_assert ((group_lanes & 1) == 0);
-          unsigned even = (1 << ceil_log2 (group_lanes)) - 1;
-          unsigned odd = even;
-          for (auto l : final_perm)
+             fallback below could work for any number.  We have to make sure
+             to round up in that case.  */
+          gcc_assert ((group_lanes & 1) == 0 || group_lanes == 3);
+          unsigned even = 0, odd = 0;
+          if ((group_lanes & 1) == 0)
             {
-              even &= ~l.second;
-              odd &= l.second;
+              even = (1 << ceil_log2 (group_lanes)) - 1;
+              odd = even;
+              for (auto l : final_perm)
+                {
+                  even &= ~l.second;
+                  odd &= l.second;
+                }
             }
 
           /* Now build an even or odd extraction from the unpermuted load.  */
           lane_permutation_t perm;
-          perm.create (group_lanes / 2);
+          perm.create ((group_lanes + 1) / 2);
           unsigned level;
           if (even
               && ((level = 1 << ctz_hwi (even)), true)
@@ -4109,7 +4114,7 @@ vect_lower_load_permutations (loop_vec_info loop_vinfo,
               bitmap_iterator bi;
               EXECUTE_IF_SET_IN_BITMAP (l, 0, i, bi)
                 perm.quick_push (std::make_pair (0, i));
-              while (perm.length () < group_lanes / 2)
+              while (perm.length () < (group_lanes + 1) / 2)
                 perm.quick_push (perm.last ());
             }
 
@@ -4145,7 +4150,7 @@ vect_lower_load_permutations (loop_vec_info loop_vinfo,
              have a "local" CSE map here.  */
           SLP_TREE_SCALAR_STMTS (p) = perm_stmts;
 
-          /* We now have a node for group_lanes / 2 lanes.  */
+          /* We now have a node for (group_lanes + 1) / 2 lanes.  */
           l0 = p;
         }
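
The switch from group_lanes / 2 to (group_lanes + 1) / 2 and the padding
loop in the fallback can be illustrated in isolation.  This is a minimal
sketch with hypothetical names (half_lanes_rounded_up, pad_fallback_perm),
using plain arrays in place of GCC's bitmap and lane_permutation_t; it
assumes at least one wanted lane and no more wanted lanes than the rounded-up
half.

```c
#include <stddef.h>

/* Lanes kept after one lowering step; rounds up so that an odd
   group size such as 3 yields 2 rather than 1.  */
size_t half_lanes_rounded_up(size_t group_lanes)
{
    return (group_lanes + 1) / 2;
}

/* Copy the n wanted lane indices into perm, then repeat the last
   index until the rounded-up half is reached.  Returns the final
   length.  Assumes 1 <= n <= half_lanes_rounded_up (group_lanes)
   and that perm has room for that many entries.  */
size_t pad_fallback_perm(const unsigned *wanted, size_t n,
                         size_t group_lanes, unsigned *perm)
{
    size_t target = half_lanes_rounded_up(group_lanes);
    size_t len = 0;

    for (; len < n; ++len)
        perm[len] = wanted[len];
    while (len < target)
    {
        perm[len] = perm[len - 1];
        ++len;
    }
    return len;
}
```

For a group of three with only lane 2 wanted, this yields the two-lane
permutation { 2, 2 }, matching the duplicated { c, c } lanes in the
description's intermediate vectors.  With the old group_lanes / 2 the target
would be 1 and no duplication would take place.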