From patchwork Thu Oct 10 12:33:09 2024
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1995432
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Cc: rguenther@suse.de
Subject: [PATCH] vect: Avoid divide by zero for permutes of extern VLA vectors
Date: Thu, 10 Oct 2024 13:33:09 +0100

My recent VLA SLP patches caused a regression with cross compilers
in gcc.dg/torture/neon-sve-bridge.c.
There we have a VEC_PERM_EXPR created from two BIT_FIELD_REFs,
with the child node being an external VLA vector:

note:   node 0x3704a70 (max_nunits=1, refcnt=2) vector(2) long int
note:   op: VEC_PERM_EXPR
note:      stmt 0 val1Return_9 = BIT_FIELD_REF ;
note:      stmt 1 val2Return_10 = BIT_FIELD_REF ;
note:      lane permutation { 0[0] 0[1] }
note:      children 0x3704b08
note:   node (external) 0x3704b08 (max_nunits=1, refcnt=1) svint64_t
note:      { }

For this kind of external node, the SLP_TREE_LANES is normally
the total number of lanes in the vector, but it is zero if the
vector has variable length:

      auto nunits = TYPE_VECTOR_SUBPARTS (SLP_TREE_VECTYPE (vnode));
      unsigned HOST_WIDE_INT const_nunits;
      if (nunits.is_constant (&const_nunits))
        SLP_TREE_LANES (vnode) = const_nunits;

This led to division by zero in:

          /* Check whether the output has N times as many lanes per vector. */
          else if (constant_multiple_p (SLP_TREE_LANES (node) * op_nunits,
                                        SLP_TREE_LANES (child) * nunits,
                                        &this_unpack_factor)
                   && (i == 0 || unpack_factor == this_unpack_factor))
            unpack_factor = this_unpack_factor;

No repetition takes place for this kind of external node, so this
patch goes with Richard's suggestion to check for external nodes
that have no scalar statements.

This didn't show up in my native testing since division by zero
doesn't trap on AArch64.

Bootstrapped & regression-tested on aarch64-linux-gnu and spot-checked
with a cross compiler. OK to install?

gcc/
        * tree-vect-slp.cc (vectorizable_slp_permutation_1): Set repeating_p
        to false if we have an external node for a pre-existing vector.
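
For illustration only, the ordering of the checks can be modelled
outside GCC roughly as follows. This is a minimal sketch, not the
code in tree-vect-slp.cc: ToyNode, Kind and classify are hypothetical
names, plain integers stand in for poly_uint64, the children.length ()
test is omitted, and the explicit "divisor != 0" test exists only to
keep the sketch itself well defined.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an SLP node; this is not GCC's slp_tree.
struct ToyNode {
  bool is_external;             // models SLP_TREE_DEF_TYPE == vect_external_def
  std::vector<int> scalar_ops;  // models SLP_TREE_SCALAR_OPS
  uint64_t lanes;               // models SLP_TREE_LANES; 0 for a VLA vector
};

enum class Kind { NotRepeating, Pack, Unpack, Other };

// Classify one child of a permute node.  NUNITS and OP_NUNITS model the
// number of elements in the output and input vector types respectively.
Kind classify(const ToyNode &node, const ToyNode &child,
              uint64_t nunits, uint64_t op_nunits, uint64_t *unpack_factor) {
  // Pre-existing external vector: no scalar ops, and the lane count may
  // be zero.  Bail out before any arithmetic that divides by it.
  if (child.is_external && child.scalar_ops.empty())
    return Kind::NotRepeating;

  // The input has twice as many lanes per vector as the output.
  if (child.lanes * nunits == node.lanes * op_nunits * 2)
    return Kind::Pack;

  // The output has N times as many lanes per vector as the input.
  // Without the early return above, a zero lane count would make the
  // divisor zero here.
  uint64_t dividend = node.lanes * op_nunits;
  uint64_t divisor = child.lanes * nunits;
  if (divisor != 0 && dividend % divisor == 0) {
    *unpack_factor = dividend / divisor;
    return Kind::Unpack;
  }
  return Kind::Other;
}
```

With a child such as ToyNode{true, {}, 0}, classify returns
Kind::NotRepeating without ever evaluating the division, which is the
effect the patch aims for by testing for the external node first.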
---
 gcc/tree-vect-slp.cc | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 9bb765e2cba..1991fb1d3b6 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -10288,10 +10288,19 @@ vectorizable_slp_permutation_1 (vec_info *vinfo, gimple_stmt_iterator *gsi,
             }
           auto op_nunits = TYPE_VECTOR_SUBPARTS (op_vectype);
           unsigned int this_unpack_factor;
+          /* Detect permutations of external, pre-existing vectors. The external
+             node's SLP_TREE_LANES stores the total number of units in the vector,
+             or zero if the vector has variable length.
+
+             We are expected to keep the original VEC_PERM_EXPR for such cases.
+             There is no repetition to model. */
+          if (SLP_TREE_DEF_TYPE (child) == vect_external_def
+              && SLP_TREE_SCALAR_OPS (child).is_empty ())
+            repeating_p = false;
           /* Check whether the input has twice as many lanes per vector. */
-          if (children.length () == 1
-              && known_eq (SLP_TREE_LANES (child) * nunits,
-                           SLP_TREE_LANES (node) * op_nunits * 2))
+          else if (children.length () == 1
+                   && known_eq (SLP_TREE_LANES (child) * nunits,
+                                SLP_TREE_LANES (node) * op_nunits * 2))
             pack_p = true;
           /* Check whether the output has N times as many lanes per vector. */
           else if (constant_multiple_p (SLP_TREE_LANES (node) * op_nunits,