From patchwork Thu Jan 26 15:44:16 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1732347
To: gcc-patches@gcc.gnu.org
Cc: rguenther@suse.de
Subject: [PATCH] vect/aarch64: Fix various sve/cond*.c failures
Date: Thu, 26 Jan 2023 15:44:16 +0000
From: Richard Sandiford

Quite a few gcc.target/aarch64/sve/cond*.c tests started failing after
g:68e0063397ba820e71adc220b2da0581dce29ffa, but it turns out that the
passes we got before that patch were cheating. The tests involve
comparing the cost of N wide compares, a pack sequence, and a narrow
COND_EXPR with the cost of a single COND_EXPR on fewer elements. The
costs for the former included all operations, but the costs for the
latter didn't model the comparison embedded in the COND_EXPR.
The patch made us include the comparison on both sides, making it
apples-for-apples, but that's enough to tip the balance in favour of
using the wider types. I think the new choice does reflect the current
SVE cost model correctly. (Whether and how the model should be tweaked
is a different question.)

This patch therefore changes the tuning vector length to one that makes
the choice more obvious. That in turn needs a tweak to
compare_inside_loop_cost. The function compares body_cost1/vf1 with
body_cost2/vf2, but for fully-masked loops, it limits vf to the actual
number of iterations. This is so that (say) an expensive 16-element
vector body doesn't win over a cheaper 8-element vector body when there
are only 7 elements to process.

However, the limit was applied using known_le, regardless of the tuning
target. For a heuristic like this, it seems better to use the likely
minimum (which is a concept that was only added after this code went
in).

g:68e0063397ba820e71adc220b2da0581dce29ffa also fixed vcond_4_costly.c.

Tested on aarch64-linux-gnu. OK to install?

Richard


gcc/
	* tree-vectorizer.cc (vector_costs::compare_inside_loop_cost): Use
	the likely minimum VF when bounding the denominators to the
	estimated number of iterations.

gcc/testsuite/
	* gcc.target/aarch64/sve/cond_asrd_1.c: Tune for a 256-bit
	vector length.
	* gcc.target/aarch64/sve/cond_cnot_4.c: Likewise.
	* gcc.target/aarch64/sve/cond_cnot_6.c: Likewise.
	* gcc.target/aarch64/sve/cond_unary_5.c: Likewise.
	* gcc.target/aarch64/sve/cond_unary_6.c: Likewise.
	* gcc.target/aarch64/sve/cond_uxt_5.c: Likewise.
	* gcc.target/aarch64/sve/vcond_4_costly.c: Remove XFAILs.
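
For reference, the 16-element versus 8-element case described above can
be worked through with a small standalone sketch. This is not the code
in tree-vectorizer.cc: the BodyCost type, the function name and the cost
values 10 and 6 are hypothetical, and the VFs are plain integers rather
than poly_ints.

```cpp
#include <algorithm>

// Hypothetical stand-in for the per-loop cost data; not GCC's vector_costs.
struct BodyCost {
  long body_cost;  // cost of one vector iteration
  long vf;         // scalar iterations covered by one vector iteration
};

// Returns -1 if A is cheaper per scalar iteration, 1 if B is cheaper and
// 0 if they are equal.  A non-negative MAX_NITER bounds both VFs first,
// mirroring the limit described for fully-masked loops.
int compare_inside_loop_cost_sketch(BodyCost a, BodyCost b, long max_niter) {
  if (max_niter >= 0) {
    a.vf = std::min(a.vf, max_niter);
    b.vf = std::min(b.vf, max_niter);
  }
  // Compare a.body_cost / a.vf with b.body_cost / b.vf by cross-multiplying,
  // which avoids fractional arithmetic.
  long lhs = a.body_cost * b.vf;
  long rhs = b.body_cost * a.vf;
  return lhs < rhs ? -1 : lhs > rhs ? 1 : 0;
}
```

With a wide body {10, 16} and a narrow body {6, 8}, the unbounded
comparison is 10/16 against 6/8, so the wide body wins. Bounding both
VFs to 7 iterations makes it 10/7 against 6/7, so the narrow body wins.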
---
 gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c    | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c    | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_6.c    | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c   | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/cond_unary_6.c   | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c     | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/vcond_4_costly.c | 4 ++--
 gcc/tree-vectorizer.cc                                | 6 ++++--
 8 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c
index 478b52ac27c..aac06bd8093 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_asrd_1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize" } */
+/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
 
 #include
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c
index 729d3f4f2ac..f6278916e1a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize" } */
+/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
 
 #include
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_6.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_6.c
index d44e357f44a..ef1b067172f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_6.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_cnot_6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize" } */
+/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
 
 #include
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c
index 17b3f86c8c6..03a6636f2d2 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize" } */
+/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
 
 #include
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_6.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_6.c
index 1bd342b65d4..c49a3040b21 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_6.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_unary_6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize" } */
+/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
 
 #include
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c b/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c
index 18866286b7f..9a2bd8f152f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_uxt_5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize" } */
+/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256" } */
 
 #include
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/vcond_4_costly.c b/gcc/testsuite/gcc.target/aarch64/sve/vcond_4_costly.c
index 4aa567e3709..76d7a288612 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/vcond_4_costly.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/vcond_4_costly.c
@@ -61,8 +61,8 @@ TEST_CMP (nuge)
 TEST_CMP (nugt)
 
 /* 2 each for: eq, ne, ueq, nueq. */
-/* { dg-final { scan-assembler-times {\tfcm(?:eq|ne)\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 8 { xfail *-*-* } } } */
-/* { dg-final { scan-assembler-times {\tfcm(?:eq|ne)\tp[0-9]+\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 16 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tfcm(?:eq|ne)\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 8 } } */
+/* { dg-final { scan-assembler-times {\tfcm(?:eq|ne)\tp[0-9]+\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 16 } } */
 
 /* 2 each for: olt, ult, nult, ogt, ugt, nugt. */
 /* { dg-final { scan-assembler-times {\tfcm[lg]t\tp[0-9]+\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 12 } } */
diff --git a/gcc/tree-vectorizer.cc b/gcc/tree-vectorizer.cc
index 875acbbf948..89cd0b88b61 100644
--- a/gcc/tree-vectorizer.cc
+++ b/gcc/tree-vectorizer.cc
@@ -1973,9 +1973,11 @@ vector_costs::compare_inside_loop_cost (const vector_costs *other) const
   HOST_WIDE_INT estimated_max_niter = likely_max_stmt_executions_int (loop);
   if (estimated_max_niter != -1)
     {
-      if (known_le (estimated_max_niter, this_vf))
+      if (estimated_poly_value (this_vf, POLY_VALUE_MIN)
+          >= estimated_max_niter)
         this_vf = estimated_max_niter;
-      if (known_le (estimated_max_niter, other_vf))
+      if (estimated_poly_value (other_vf, POLY_VALUE_MIN)
+          >= estimated_max_niter)
         other_vf = estimated_max_niter;
     }
 
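
To illustrate why the known_le test and the estimated minimum can
disagree for a length-agnostic VF, here is a toy model. It is only a
sketch under stated assumptions: PolyVF, known_le_sketch and
estimated_min_sketch are hypothetical names, the VF is modelled as
c0 + c1 * x with c1 >= 0 and x >= 0 unknown at compile time, and the
estimate simply evaluates the VF at a tuned vector width in 128-bit
steps. The real estimated_poly_value result is decided by the target.

```cpp
// Toy model of a length-agnostic VF: c0 + c1 * x, where x >= 0 is only
// known at run time.  Assumes c1 >= 0.  Not GCC's poly_int.
struct PolyVF {
  long c0;
  long c1;
};

// True only if N <= VF for every possible x.  With c1 >= 0 the smallest
// value of VF is at x == 0, so this reduces to N <= c0.
bool known_le_sketch(long n, PolyVF vf) { return n <= vf.c0; }

// Hypothetical tuning-based estimate: evaluate VF at the x implied by a
// tuned vector width of TUNED_BITS, counting in 128-bit steps.
long estimated_min_sketch(PolyVF vf, long tuned_bits) {
  long x = (tuned_bits - 128) / 128;
  return vf.c0 + vf.c1 * x;
}

// The bound as a heuristic: limit the VF estimate to MAX_NITER when the
// estimate is at least MAX_NITER.
long bounded_vf_sketch(PolyVF vf, long tuned_bits, long max_niter) {
  long est = estimated_min_sketch(vf, tuned_bits);
  return est >= max_niter ? max_niter : est;
}
```

For a VF of 4 + 4x and 7 iterations, known_le_sketch (7, vf) is false
because 7 > 4, so the old test would not apply the bound. Under a
256-bit tuning assumption the estimate is 8, which is >= 7, so the bound
does apply and the VF used for costing becomes 7.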