From patchwork Wed Jul 3 13:23:57 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 1956240
Date: Wed, 3 Jul 2024 15:23:57 +0200 (CEST)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 3/5] Handle gaps in SLP load permutation lowering
List-Id:
 Gcc-patches mailing list
Message-Id: <20240703132524.336CB386101B@sourceware.org>

The following adds handling of gaps by representing them with NULL
entries in SLP_TREE_SCALAR_STMTS for the unpermuted load node.

The SLP discovery changes could be elided if we manually build the
load node instead.

    * tree-vect-slp.cc (vect_build_slp_tree_1): Handle NULL stmt.
    (vect_build_slp_tree_2): Likewise. Release load permutation
    when there's a NULL in SLP_TREE_SCALAR_STMTS and assert there's
    no actual permutation in that case.
    (vect_lower_load_permutations): Handle gaps in loads.

    * gcc.dg/vect/slp-51.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/slp-51.c | 17 +++++++++++
 gcc/tree-vect-slp.cc               | 49 ++++++++++++++++++------------
 2 files changed, 47 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/slp-51.c

diff --git a/gcc/testsuite/gcc.dg/vect/slp-51.c b/gcc/testsuite/gcc.dg/vect/slp-51.c
new file mode 100644
index 00000000000..91ae763be30
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/slp-51.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+
+void foo (int * __restrict x, int *y)
+{
+  x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__);
+  y = __builtin_assume_aligned (y, __BIGGEST_ALIGNMENT__);
+  for (int i = 0; i < 1024; ++i)
+    {
+      x[4*i+0] = y[4*i+0];
+      x[4*i+1] = y[4*i+2] * 2;
+      x[4*i+2] = y[4*i+0] + 3;
+      x[4*i+3] = y[4*i+2] * 2 - 5;
+    }
+}
+
+/* Check we can handle SLP with gaps and an interleaving scheme. */
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { vect_int && vect_int_mult } } } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 6f3822af950..fdefee90e92 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1080,10 +1080,15 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
   stmt_vec_info stmt_info;
   FOR_EACH_VEC_ELT (stmts, i, stmt_info)
     {
-      gimple *stmt = stmt_info->stmt;
       swap[i] = 0;
       matches[i] = false;
+      if (!stmt_info)
+        {
+          matches[i] = true;
+          continue;
+        }
 
+      gimple *stmt = stmt_info->stmt;
       if (dump_enabled_p ())
         dump_printf_loc (MSG_NOTE, vect_location, "Build SLP for %G", stmt);
 
@@ -1984,10 +1989,16 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
           stmt_vec_info first_stmt_info
             = DR_GROUP_FIRST_ELEMENT (SLP_TREE_SCALAR_STMTS (node)[0]);
           bool any_permute = false;
+          bool any_null = false;
           FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), j, load_info)
             {
               int load_place;
-              if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
+              if (! load_info)
+                {
+                  load_place = j;
+                  any_null = true;
+                }
+              else if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
                 load_place = vect_get_place_in_interleaving_chain
                     (load_info, first_stmt_info);
               else
@@ -1996,6 +2007,11 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
               any_permute |= load_place != j;
               load_permutation.quick_push (load_place);
             }
+          if (any_null)
+            {
+              gcc_assert (!any_permute);
+              load_permutation.release ();
+            }
 
           if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
             {
@@ -3978,24 +3994,11 @@ vect_lower_load_permutations (loop_vec_info loop_vinfo,
   stmt_vec_info first
     = DR_GROUP_FIRST_ELEMENT (SLP_TREE_SCALAR_STMTS (loads[0])[0]);
 
-  /* ??? In principle we have to consider a gap up to the next full
-     vector, but we have to actually represent a scalar stmt for the
-     gaps value so delay handling this. The same is true for
-     inbetween gaps which the load places in the load-permutation
-     represent. It's probably not worth trying an intermediate packing
-     to vectors without gap even if that might handle some more cases.
-     Instead get the gap case correct in some way. */
-  unsigned group_lanes = 0;
-  for (stmt_vec_info s = first; s; s = DR_GROUP_NEXT_ELEMENT (s))
-    {
-      if ((s == first && DR_GROUP_GAP (s) != 0)
-          || (s != first && DR_GROUP_GAP (s) != 1))
-        return;
-      group_lanes++;
-    }
   /* Only a power-of-two number of lanes matches interleaving with N levels.
+     The non-SLP path also supports DR_GROUP_SIZE == 3.
      ??? An even number of lanes could be reduced to 1< stmts;
   stmts.create (group_lanes);
   for (stmt_vec_info s = first; s; s = DR_GROUP_NEXT_ELEMENT (s))
-    stmts.quick_push (s);
+    {
+      if (s != first)
+        for (unsigned i = 1; i < DR_GROUP_GAP (s); ++i)
+          stmts.quick_push (NULL);
+      stmts.quick_push (s);
+    }
+  for (unsigned i = 0; i < DR_GROUP_GAP (first); ++i)
+    stmts.quick_push (NULL);
   poly_uint64 max_nunits;
   bool *matches = XALLOCAVEC (bool, group_lanes);
   unsigned limit = 1;
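
For illustration only, the gap expansion that the last hunk adds to
vect_lower_load_permutations can be sketched outside of GCC. This is a
minimal standalone model, not the vectorizer's code: Member and
expand_lanes are hypothetical names, and the gap field is assumed to
follow the convention the patch's loops read from DR_GROUP_GAP (for the
first member, the number of unused elements after the last member; for
every other member, the distance from the previous member, with 1
meaning adjacent).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one member of a load group.
// This is NOT GCC's stmt_vec_info; it only carries what the sketch needs.
struct Member {
  int id;        // label for the scalar load
  unsigned gap;  // first member: unused elements after the last member;
                 // other members: distance from the previous member
};

// Expand a group into one entry per lane, with nullptr standing in for
// a gap.  The loop structure mirrors the one in the patch: gap - 1 null
// entries before each non-first member, then the first member's gap as
// trailing null entries.
std::vector<const Member *> expand_lanes(const std::vector<Member> &group) {
  assert(!group.empty());
  std::vector<const Member *> lanes;
  for (std::size_t k = 0; k < group.size(); ++k) {
    if (k != 0)
      for (unsigned i = 1; i < group[k].gap; ++i)
        lanes.push_back(nullptr);
    lanes.push_back(&group[k]);
  }
  for (unsigned i = 0; i < group[0].gap; ++i)
    lanes.push_back(nullptr);
  return lanes;
}
```

Assuming the loads in slp-51.c, y[4*i+0] and y[4*i+2], are recorded as
a group with a trailing gap of 1 on the first member and a gap of 2 on
the second, expand_lanes yields four lanes in the order load, gap,
load, gap. Every real entry then already sits at its own lane index,
which is the situation in which vect_build_slp_tree_2 asserts that no
permutation remains and releases the load permutation.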