From: Richard Biener
Date: Tue, 15 Oct 2024 14:04:00 +0200 (CEST)
To: gcc-patches@gcc.gnu.org
cc: RISC-V CI
Subject: [PATCH 1/3] Remove SLP_INSTANCE_UNROLLING_FACTOR, compute VF in vect_make_slp_decision

The following prepares us for SLP instances with a non-uniform number of lanes. We already have this with load permutation lowering, but there we managed to stay within the constraints of the per-SLP-instance VF, which is computed from the instance's max_nunits (with a vector type fixed for each node) and the instance group size, i.e. the number of lanes in the SLP instance root. Once arbitrary splitting and merging of SLP nodes at non-power-of-two lane boundaries is allowed, this simple calculation based on the outgoing group size falls apart.

Instead of computing a VF during SLP instance discovery, the following computes it at vect_make_slp_decision time by walking the SLP graph and looking at each SLP node in isolation. We do track max_nunits per node; this could become a per-node VF instead, or we could do away with both completely (though for BB vectorization we need to communicate a VF > 1 requirement upward, or compute it after the fact). In the end we'd like to delay vector type assignment and only compute a minimum VF here, allowing vector types to grow when the actual VF is bigger.

There is a slight complication with permutes of externs/constants, as those get their vector type (and thus max_nunits) assigned late. While we currently force them to have the same vector type as the result, their number of lanes can differ. So for now they are handled explicitly, raising the VF as needed. The alternative is to fail vectorization; I have an addition to vect_maybe_update_slp_op_vectype that would FAIL if the vector type being set is not within the constraints of the VF.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
        * tree-vectorizer.h (SLP_INSTANCE_UNROLLING_FACTOR): Remove.
        (slp_instance::unrolling_factor): Likewise.
        * tree-vect-slp.cc (vect_build_slp_instance): Do not set
        SLP_INSTANCE_UNROLLING_FACTOR.  Remove the now dead code.
        Compute and set max_nunits from the RHS nodes merged.
        (vect_update_slp_vf_for_node): New function.
        (vect_make_slp_decision): Use vect_update_slp_vf_for_node
        to compute VF recursively.
        (vect_build_slp_store_interleaving): Get max_nunits and
        properly set that on the permute nodes built.
        (vect_analyze_slp): Do not set SLP_INSTANCE_UNROLLING_FACTOR.
---
 gcc/tree-vect-slp.cc  | 72 +++++++++++++++++++++++++++++++++----------
 gcc/tree-vectorizer.h |  4 ---
 2 files changed, 55 insertions(+), 21 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 28acd9ad147..959468cad8a 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -3557,13 +3557,15 @@ vect_analyze_slp_instance (vec_info *vinfo,
 
 static slp_tree
 vect_build_slp_store_interleaving (vec<slp_tree> &rhs_nodes,
-                                   vec<stmt_vec_info> &scalar_stmts)
+                                   vec<stmt_vec_info> &scalar_stmts,
+                                   poly_uint64 max_nunits)
 {
   unsigned int group_size = scalar_stmts.length ();
   slp_tree node = vect_create_new_slp_node (scalar_stmts,
                                             SLP_TREE_CHILDREN
                                               (rhs_nodes[0]).length ());
   SLP_TREE_VECTYPE (node) = SLP_TREE_VECTYPE (rhs_nodes[0]);
+  node->max_nunits = max_nunits;
   for (unsigned l = 0;
        l < SLP_TREE_CHILDREN (rhs_nodes[0]).length (); ++l)
     {
@@ -3573,6 +3575,7 @@ vect_build_slp_store_interleaving (vec<slp_tree> &rhs_nodes,
       SLP_TREE_CHILDREN (node).quick_push (perm);
       SLP_TREE_LANE_PERMUTATION (perm).create (group_size);
       SLP_TREE_VECTYPE (perm) = SLP_TREE_VECTYPE (node);
+      perm->max_nunits = max_nunits;
       SLP_TREE_LANES (perm) = group_size;
       /* ??? We should set this NULL but that's not expected. */
       SLP_TREE_REPRESENTATIVE (perm)
@@ -3628,6 +3631,7 @@ vect_build_slp_store_interleaving (vec<slp_tree> &rhs_nodes,
               SLP_TREE_LANES (permab) = n;
               SLP_TREE_LANE_PERMUTATION (permab).create (n);
               SLP_TREE_VECTYPE (permab) = SLP_TREE_VECTYPE (perm);
+              permab->max_nunits = max_nunits;
               /* ??? Should be NULL but that's not expected. */
               SLP_TREE_REPRESENTATIVE (permab) = SLP_TREE_REPRESENTATIVE (perm);
               SLP_TREE_CHILDREN (permab).quick_push (a);
@@ -3698,6 +3702,7 @@ vect_build_slp_store_interleaving (vec<slp_tree> &rhs_nodes,
               SLP_TREE_LANES (permab) = n;
               SLP_TREE_LANE_PERMUTATION (permab).create (n);
               SLP_TREE_VECTYPE (permab) = SLP_TREE_VECTYPE (perm);
+              permab->max_nunits = max_nunits;
               /* ??? Should be NULL but that's not expected. */
               SLP_TREE_REPRESENTATIVE (permab) = SLP_TREE_REPRESENTATIVE (perm);
               SLP_TREE_CHILDREN (permab).quick_push (a);
@@ -3828,7 +3833,6 @@ vect_build_slp_instance (vec_info *vinfo,
           /* Create a new SLP instance. */
           slp_instance new_instance = XNEW (class _slp_instance);
           SLP_INSTANCE_TREE (new_instance) = node;
-          SLP_INSTANCE_UNROLLING_FACTOR (new_instance) = unrolling_factor;
           SLP_INSTANCE_LOADS (new_instance) = vNULL;
           SLP_INSTANCE_ROOT_STMTS (new_instance) = root_stmt_infos;
           SLP_INSTANCE_REMAIN_DEFS (new_instance) = remain;
@@ -4027,9 +4031,7 @@ vect_build_slp_instance (vec_info *vinfo,
       /* Analyze the stored values and pinch them together with
          a permute node so we can preserve the whole store group. */
       auto_vec<slp_tree> rhs_nodes;
-
-      /* Calculate the unrolling factor based on the smallest type. */
-      poly_uint64 unrolling_factor = 1;
+      poly_uint64 max_nunits = 1;
       unsigned int rhs_common_nlanes = 0;
       unsigned int start = 0, end = i;
 
@@ -4046,13 +4048,8 @@ vect_build_slp_instance (vec_info *vinfo,
                                        matches, limit, &tree_size, bst_map);
           if (node)
             {
-              /* ??? Possibly not safe, but not sure how to check
-                 and fail SLP build? */
-              unrolling_factor
-                = force_common_multiple (unrolling_factor,
-                                         calculate_unrolling_factor
-                                           (max_nunits, end - start));
               rhs_nodes.safe_push (node);
+              vect_update_max_nunits (&max_nunits, node->max_nunits);
               if (start == 0)
                 rhs_common_nlanes = SLP_TREE_LANES (node);
               else if (rhs_common_nlanes != SLP_TREE_LANES (node))
@@ -4116,6 +4113,7 @@ vect_build_slp_instance (vec_info *vinfo,
                                               SLP_TREE_CHILDREN (rhs_nodes[0]).length ());
           SLP_TREE_VECTYPE (node) = SLP_TREE_VECTYPE (rhs_nodes[0]);
+          node->max_nunits = max_nunits;
           node->ldst_lanes = true;
           SLP_TREE_CHILDREN (node)
             .reserve_exact (SLP_TREE_CHILDREN (rhs_nodes[0]).length ()
@@ -4132,7 +4130,8 @@ vect_build_slp_instance (vec_info *vinfo,
                 child->refcnt++;
             }
           else
-            node = vect_build_slp_store_interleaving (rhs_nodes, scalar_stmts);
+            node = vect_build_slp_store_interleaving (rhs_nodes, scalar_stmts,
+                                                      max_nunits);
 
           while (!rhs_nodes.is_empty ())
             vect_free_slp_tree (rhs_nodes.pop ());
@@ -4140,7 +4139,6 @@ vect_build_slp_instance (vec_info *vinfo,
           /* Create a new SLP instance. */
           slp_instance new_instance = XNEW (class _slp_instance);
           SLP_INSTANCE_TREE (new_instance) = node;
-          SLP_INSTANCE_UNROLLING_FACTOR (new_instance) = unrolling_factor;
           SLP_INSTANCE_LOADS (new_instance) = vNULL;
           SLP_INSTANCE_ROOT_STMTS (new_instance) = root_stmt_infos;
           SLP_INSTANCE_REMAIN_DEFS (new_instance) = remain;
@@ -4881,7 +4879,6 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size,
                   slp_tree invnode = vect_create_new_slp_node (ops);
                   SLP_TREE_DEF_TYPE (invnode) = vect_external_def;
                   SLP_INSTANCE_TREE (new_instance) = invnode;
-                  SLP_INSTANCE_UNROLLING_FACTOR (new_instance) = 1;
                   SLP_INSTANCE_LOADS (new_instance) = vNULL;
                   SLP_INSTANCE_ROOT_STMTS (new_instance) = roots;
                   SLP_INSTANCE_REMAIN_DEFS (new_instance) = vNULL;
@@ -7198,6 +7195,47 @@ vect_gather_slp_loads (vec_info *vinfo)
     }
 }
 
+/* For NODE update VF based on the number of lanes and the vector types
+   used. */
+
+static void
+vect_update_slp_vf_for_node (slp_tree node, poly_uint64 &vf,
+                             hash_set<slp_tree> &visited)
+{
+  if (!node || SLP_TREE_DEF_TYPE (node) != vect_internal_def)
+    return;
+  if (visited.add (node))
+    return;
+
+  for (slp_tree child : SLP_TREE_CHILDREN (node))
+    vect_update_slp_vf_for_node (child, vf, visited);
+
+  /* We do not visit SLP nodes for constants or externals - those neither
+     have a vector type set yet (vectorizable_* does this) nor do they
+     have max_nunits set. Instead we rely on internal nodes max_nunit
+     to cover constant/external operands.
+     Note that when we stop using fixed size vectors externs and constants
+     shouldn't influence the (minimum) vectorization factor, instead
+     vectorizable_* should honor the vectorization factor when trying to
+     assign vector types to constants and externals and cause iteration
+     to a higher vectorization factor when required. */
+  poly_uint64 node_vf
+    = calculate_unrolling_factor (node->max_nunits, SLP_TREE_LANES (node));
+  vf = force_common_multiple (vf, node_vf);
+
+  /* For permute nodes that are fed from externs or constants we have to
+     consider their number of lanes as well. Likewise for store-lanes. */
+  if (SLP_TREE_CODE (node) == VEC_PERM_EXPR
+      || node->ldst_lanes)
+    for (slp_tree child : SLP_TREE_CHILDREN (node))
+      if (SLP_TREE_DEF_TYPE (child) != vect_internal_def)
+        {
+          poly_uint64 child_vf
+            = calculate_unrolling_factor (node->max_nunits,
+                                          SLP_TREE_LANES (child));
+          vf = force_common_multiple (vf, child_vf);
+        }
+}
 /* For each possible SLP instance decide whether to SLP it and calculate
    overall unrolling factor needed to SLP the loop. Return TRUE if decided to SLP at
@@ -7215,6 +7253,7 @@ vect_make_slp_decision (loop_vec_info loop_vinfo)
 
   DUMP_VECT_SCOPE ("vect_make_slp_decision");
 
+  hash_set<slp_tree> visited;
   FOR_EACH_VEC_ELT (slp_instances, i, instance)
     {
       /* FORNOW: SLP if you can. */
@@ -7223,9 +7262,8 @@ vect_make_slp_decision (loop_vec_info loop_vinfo)
          GET_MODE_SIZE (vinfo->vector_mode) * X for some rational X,
          so they must have a common multiple. */
-      unrolling_factor
-        = force_common_multiple (unrolling_factor,
-                                 SLP_INSTANCE_UNROLLING_FACTOR (instance));
+      vect_update_slp_vf_for_node (SLP_INSTANCE_TREE (instance),
+                                   unrolling_factor, visited);
 
       /* Mark all the stmts that belong to INSTANCE as PURE_SLP stmts.
          Later we call vect_detect_hybrid_slp () to find stmts that need hybrid SLP and
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 5247745cf46..743b4cf0f4a 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -315,9 +315,6 @@ public:
      otherwise. */
   vec<tree> remain_defs;
 
-  /* The unrolling factor required to vectorized this SLP instance. */
-  poly_uint64 unrolling_factor;
-
   /* The group of nodes that contain loads of this SLP instance. */
   vec<slp_tree> loads;
 
@@ -340,7 +337,6 @@ public:
 
 /* Access Functions. */
 #define SLP_INSTANCE_TREE(S) (S)->root
-#define SLP_INSTANCE_UNROLLING_FACTOR(S) (S)->unrolling_factor
 #define SLP_INSTANCE_LOADS(S) (S)->loads
 #define SLP_INSTANCE_ROOT_STMTS(S) (S)->root_stmts
 #define SLP_INSTANCE_REMAIN_DEFS(S) (S)->remain_defs

From: Richard Biener
Date: Tue, 15 Oct 2024 14:04:16 +0200 (CEST)
To: gcc-patches@gcc.gnu.org
cc: RISC-V CI
Subject: [PATCH 2/3] tree-optimization/117050 - fix ICE with non-grouped .MASK_LOAD SLP

The following is a more complete fix for PR117050, restoring the ability to permute non-grouped .MASK_LOADs.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

        PR tree-optimization/117050
        * tree-vect-slp.cc (vect_build_slp_tree_2): Properly handle
        non-grouped masked loads when handling permutations.
---
 gcc/tree-vect-slp.cc | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 959468cad8a..af00c5e35dd 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1991,7 +1991,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
           stmt_vec_info load_info;
           load_permutation.create (group_size);
           stmt_vec_info first_stmt_info
-            = DR_GROUP_FIRST_ELEMENT (SLP_TREE_SCALAR_STMTS (node)[0]);
+            = STMT_VINFO_GROUPED_ACCESS (stmt_info)
+              ? DR_GROUP_FIRST_ELEMENT (stmt_info) : stmt_info;
           bool any_permute = false;
           bool any_null = false;
           FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), j, load_info)
@@ -2035,8 +2036,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
              loads with gaps. */
           if ((STMT_VINFO_GROUPED_ACCESS (stmt_info)
                && (DR_GROUP_GAP (first_stmt_info) != 0 || has_gaps))
-              || STMT_VINFO_STRIDED_P (stmt_info)
-              || (!STMT_VINFO_GROUPED_ACCESS (stmt_info) && any_permute))
+              || STMT_VINFO_STRIDED_P (stmt_info))
             {
               load_permutation.release ();
               matches[0] = false;
@@ -2051,17 +2051,17 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
                 {
                   /* Discover the whole unpermuted load. */
                   vec<stmt_vec_info> stmts2;
-                  stmts2.create (DR_GROUP_SIZE (first_stmt_info));
-                  stmts2.quick_grow_cleared (DR_GROUP_SIZE (first_stmt_info));
+                  unsigned dr_group_size = STMT_VINFO_GROUPED_ACCESS (stmt_info)
+                      ? DR_GROUP_SIZE (first_stmt_info) : 1;
+                  stmts2.create (dr_group_size);
+                  stmts2.quick_grow_cleared (dr_group_size);
                   unsigned i = 0;
                   for (stmt_vec_info si = first_stmt_info;
                        si; si = DR_GROUP_NEXT_ELEMENT (si))
                     stmts2[i++] = si;
-                  bool *matches2
-                    = XALLOCAVEC (bool, DR_GROUP_SIZE (first_stmt_info));
+                  bool *matches2 = XALLOCAVEC (bool, dr_group_size);
                   slp_tree unperm_load
-                    = vect_build_slp_tree (vinfo, stmts2,
-                                           DR_GROUP_SIZE (first_stmt_info),
+                    = vect_build_slp_tree (vinfo, stmts2, dr_group_size,
                                            &this_max_nunits, matches2, limit,
                                            &this_tree_size, bst_map);
                   /* When we are able to do the full masked load emit that

From: Richard Biener
Date: Tue, 15 Oct 2024 14:04:35 +0200 (CEST)
To: gcc-patches@gcc.gnu.org
cc: RISC-V CI
Subject: [PATCH 3/3] Avoid using SLP_TREE_LOAD_PERMUTATION for non-grouped SLP loads

The following makes sure to use a VEC_PERM SLP node to produce lane duplications for non-grouped SLP loads, as those are not lowered later by load permutation lowering.

For some reason, permute optimization now fails for gcc.dg/vect/pr106081.c; in particular it no longer elides the vector reversal for the reduction.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

        * tree-vect-slp.cc (vect_build_slp_tree_2): Use a VEC_PERM SLP
        node to duplicate lanes for non-grouped loads.

        * gcc.dg/vect/pr106081.c: Adjust.
---
 gcc/testsuite/gcc.dg/vect/pr106081.c |  2 +-
 gcc/tree-vect-slp.cc                 | 38 +++++++++++++++++++++++++++-
 2 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/pr106081.c b/gcc/testsuite/gcc.dg/vect/pr106081.c
index 8f97af2d642..1864320c803 100644
--- a/gcc/testsuite/gcc.dg/vect/pr106081.c
+++ b/gcc/testsuite/gcc.dg/vect/pr106081.c
@@ -30,4 +30,4 @@ test(double *k)
 }
 
 /* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
-/* { dg-final { scan-tree-dump-times "VEC_PERM" 4 "optimized" { target x86_64-*-* i?86-*-* } } } */
+/* { dg-final { scan-tree-dump-times "VEC_PERM" 5 "optimized" { target x86_64-*-* i?86-*-* } } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index af00c5e35dd..b34064103bd 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -2088,7 +2088,43 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
             }
           else
             {
-              SLP_TREE_LOAD_PERMUTATION (node) = load_permutation;
+              if (!STMT_VINFO_GROUPED_ACCESS (stmt_info))
+                {
+                  /* Do not use SLP_TREE_LOAD_PERMUTATION for non-grouped
+                     accesses. Instead when duplicated to so via a
+                     VEC_PERM node. */
+                  if (!any_permute)
+                    load_permutation.release ();
+                  else
+                    {
+                      gcc_assert (group_size != 1);
+                      vec<stmt_vec_info> stmts2;
+                      stmts2.create (1);
+                      stmts2.quick_push (stmt_info);
+                      bool matches2;
+                      slp_tree unperm_load
+                        = vect_build_slp_tree (vinfo, stmts2, 1,
+                                               &this_max_nunits, &matches2,
+                                               limit, &this_tree_size, bst_map);
+                      gcc_assert (unperm_load);
+                      lane_permutation_t lperm;
+                      lperm.create (group_size);
+                      for (unsigned j = 0; j < load_permutation.length (); ++j)
+                        {
+                          gcc_assert (load_permutation[j] == 0);
+                          lperm.quick_push (std::make_pair (0, 0));
+                        }
+                      SLP_TREE_CODE (node) = VEC_PERM_EXPR;
+                      SLP_TREE_CHILDREN (node).safe_push (unperm_load);
+                      SLP_TREE_LANE_PERMUTATION (node) = lperm;
+                      load_permutation.release ();
+                      *max_nunits = this_max_nunits;
+                      (*tree_size)++;
+                      return node;
+                    }
+                }
+              else
+                SLP_TREE_LOAD_PERMUTATION (node) = load_permutation;
               return node;
             }
         }