From: "Roger Sayle"
To: "'GCC Patches'"
Cc: "'Uros Bizjak'"
Subject: [x86 PATCH] PR target/107548: Handle vec_select in STV.
Date: Thu, 22 Dec 2022 23:19:21 -0000


This patch enhances x86's STV pass to handle VEC_SELECT during general
scalar chain conversion, performing SImode scalar extraction from V4SI
and DImode scalar extraction from V2DI vector registers.

The motivating test case from bugzilla is:

typedef unsigned int v4si __attribute__((vector_size(16)));

unsigned int f (v4si a, v4si b)
{
  a[0] += b[0];
  return a[0] + a[1];
}

Currently, with -O2 -march=znver2, this generates:

        vpextrd $1, %xmm0, %edx
        vmovd   %xmm0, %eax
        addl    %edx, %eax
        vmovd   %xmm1, %edx
        addl    %edx, %eax
        ret

which performs three transfers from the vector unit to the scalar unit
(one vpextrd and two vmovd), and then performs the two additions there.
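For comparison, keeping the whole computation in the vector unit can be
modelled by hand with SSE2 intrinsics.  This is an illustrative sketch,
not compiler output; the names f_scalar, f_vector and check_f are
hypothetical, and it assumes an x86 target with SSE2 and GCC vector
extensions.

```c
#include <assert.h>
#include <emmintrin.h>

typedef unsigned int v4si __attribute__((vector_size(16)));

/* Reference: the motivating test case from the PR.  */
unsigned int f_scalar (v4si a, v4si b)
{
  a[0] += b[0];
  return a[0] + a[1];
}

/* Hypothetical hand-written model of a vector-unit sequence:
   pshufd $85, paddd, paddd, movd.  Unsigned wraparound matches the
   modulo-2^32 behaviour of paddd.  */
unsigned int f_vector (v4si a, v4si b)
{
  __m128i va = (__m128i) a;
  __m128i vb = (__m128i) b;
  __m128i a1 = _mm_shuffle_epi32 (va, 0x55);  /* pshufd $85: broadcast a[1].  */
  __m128i sum = _mm_add_epi32 (va, a1);       /* paddd: lane 0 = a[0] + a[1].  */
  sum = _mm_add_epi32 (sum, vb);              /* paddd: lane 0 += b[0].  */
  return (unsigned int) _mm_cvtsi128_si32 (sum);  /* movd: lane 0 to %eax.  */
}

/* Aborts if the two versions disagree on the given inputs.  */
static void check_f (v4si a, v4si b)
{
  assert (f_scalar (a, b) == f_vector (a, b));
}
```

Selecting element k with pshufd in this way uses the immediate k * 0x55
(85, 170 or 255 for k = 1, 2, 3), because each of the four 2-bit selector
fields holds k; the wanted element therefore ends up in lane 0, where the
following paddd and movd use it.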
With this patch, we now generate:

        vmovdqa %xmm0, %xmm2
        vpshufd $85, %xmm0, %xmm0
        vpaddd  %xmm0, %xmm2, %xmm0
        vpaddd  %xmm1, %xmm0, %xmm0
        vmovd   %xmm0, %eax
        ret

which performs the two additions in the vector unit, and then transfers
the result to the scalar unit.  Technically, the (cheap) movdqa wouldn't
be needed with better register allocation (or it could be cleaned up
during peephole2), but even so this transform is still a win.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2022-12-22  Roger Sayle

gcc/ChangeLog
        PR target/107548
        * config/i386/i386-features.cc (scalar_chain::add_insn): The
        operands of a VEC_SELECT don't need to be added to the scalar chain.
        (general_scalar_chain::compute_convert_gain) : Provide gains for
        performing STV on a VEC_SELECT.
        (general_scalar_chain::convert_insn): Convert VEC_SELECT to pshufd,
        psrldq or no-op.
        (general_scalar_to_vector_candidate_p): Handle VEC_SELECT of a
        single element from a vector register to a scalar register.

gcc/testsuite/ChangeLog
        PR target/107548
        * gcc.target/i386/pr107548-1.c: New test V4SI case.
        * gcc.target/i386/pr107548-2.c: New test V2DI case.


Thanks in advance,
Roger

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index fd212262..cb21d3b 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -429,6 +429,11 @@ scalar_chain::add_insn (bitmap candidates, unsigned int insn_uid)
   for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
     if (!HARD_REGISTER_P (DF_REF_REG (ref)))
       analyze_register_chain (candidates, ref);
+
+  /* The operand(s) of VEC_SELECT don't need to be converted/convertible.  */
+  if (def_set && GET_CODE (SET_SRC (def_set)) == VEC_SELECT)
+    return;
+
   for (ref = DF_INSN_UID_USES (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
     if (!DF_REF_REG_MEM_P (ref))
       analyze_register_chain (candidates, ref);
@@ -629,6 +634,23 @@ general_scalar_chain::compute_convert_gain ()
             }
           break;
 
+        case VEC_SELECT:
+          if (XVECEXP (XEXP (src, 1), 0, 0) == const0_rtx)
+            {
+              // movd (4 bytes) replaced with movdqa (4 bytes).
+              if (!optimize_insn_for_size_p ())
+                igain += ix86_cost->sse_to_integer - ix86_cost->xmm_move;
+            }
+          else
+            {
+              // pshufd; movd replaced with pshufd.
+              if (optimize_insn_for_size_p ())
+                igain += COSTS_N_BYTES (4);
+              else
+                igain += ix86_cost->sse_to_integer;
+            }
+          break;
+
         default:
           gcc_unreachable ();
         }
@@ -1167,6 +1189,24 @@ general_scalar_chain::convert_insn (rtx_insn *insn)
       convert_op (&src, insn);
       break;
 
+    case VEC_SELECT:
+      if (XVECEXP (XEXP (src, 1), 0, 0) == const0_rtx)
+        src = XEXP (src, 0);
+      else if (smode == DImode)
+        {
+          rtx tmp = gen_lowpart (V1TImode, XEXP (src, 0));
+          dst = gen_lowpart (V1TImode, dst);
+          src = gen_rtx_LSHIFTRT (V1TImode, tmp, GEN_INT (64));
+        }
+      else
+        {
+          rtx tmp = XVECEXP (XEXP (src, 1), 0, 0);
+          rtvec vec = gen_rtvec (4, tmp, tmp, tmp, tmp);
+          rtx par = gen_rtx_PARALLEL (VOIDmode, vec);
+          src = gen_rtx_VEC_SELECT (vmode, XEXP (src, 0), par);
+        }
+      break;
+
     default:
       gcc_unreachable ();
     }
@@ -1917,6 +1957,16 @@ general_scalar_to_vector_candidate_p (rtx_insn *insn, enum machine_mode mode)
     case CONST_INT:
       return REG_P (dst);
 
+    case VEC_SELECT:
+      /* Excluding MEM_P (dst) avoids interfering with vpextr[dq].  */
+      return REG_P (dst)
+             && REG_P (XEXP (src, 0))
+             && GET_MODE (XEXP (src, 0)) == (mode == DImode ? V2DImode
+                                                             : V4SImode)
+             && GET_CODE (XEXP (src, 1)) == PARALLEL
+             && XVECLEN (XEXP (src, 1), 0) == 1
+             && CONST_INT_P (XVECEXP (XEXP (src, 1), 0, 0));
+
     default:
       return false;
     }
diff --git a/gcc/testsuite/gcc.target/i386/pr107548-1.c b/gcc/testsuite/gcc.target/i386/pr107548-1.c
new file mode 100644
index 0000000..da78f75
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr107548-1.c
@@ -0,0 +1,25 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mstv -mno-stackrealign" } */
+typedef unsigned int v4si __attribute__((vector_size(16)));
+
+unsigned int foo1 (v4si a, v4si b)
+{
+  a[0] += b[0];
+  return a[0] + a[1];
+}
+
+unsigned int foo2 (v4si a, v4si b)
+{
+  a[0] += b[0];
+  return a[0] + a[2];
+}
+
+unsigned int foo3 (v4si a, v4si b)
+{
+  a[0] += b[0];
+  return a[0] + a[3];
+}
+
+/* { dg-final { scan-assembler-times "\tmovd\t" 3 } } */
+/* { dg-final { scan-assembler-times "paddd" 6 } } */
+/* { dg-final { scan-assembler-not "addl" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr107548-2.c b/gcc/testsuite/gcc.target/i386/pr107548-2.c
new file mode 100644
index 0000000..b57594e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr107548-2.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mstv -mno-stackrealign" } */
+typedef unsigned long long v2di __attribute__((vector_size(16)));
+
+unsigned long long foo(v2di a, v2di b)
+{
+  a[0] += b[0];
+  return a[0] + a[1];
+}
+
+/* { dg-final { scan-assembler-not "\taddq\t" } } */
+/* { dg-final { scan-assembler-times "paddq" 2 } } */
+/* { dg-final { scan-assembler "psrldq" } } */