From patchwork Sun Jul 9 20:35:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Roger Sayle X-Patchwork-Id: 1805450 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=nextmovesoftware.com header.i=@nextmovesoftware.com header.a=rsa-sha256 header.s=default header.b=nzjtmozC; dkim-atps=neutral Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Qzf6l5NLSz20Ph for ; Mon, 10 Jul 2023 06:36:11 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id B36BE3857352 for ; Sun, 9 Jul 2023 20:36:08 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from server.nextmovesoftware.com (server.nextmovesoftware.com [162.254.253.69]) by sourceware.org (Postfix) with ESMTPS id 639843858C2C for ; Sun, 9 Jul 2023 20:35:56 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 639843858C2C Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=nextmovesoftware.com DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID: Date:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=WdEsrLNskwk7nyHQqjQhfXUu6ms92a6NgCSSIGJXEJE=; b=nzjtmozCQD7KvAuA3bNTOtQXu3 2ZzdB3ZBzxV/L+POx6ukvd+BO/6lY2Kcf1gt/KCpDnocckpX99RlrsEuQ5a8s1tzhmABivj3LquUN Pe4Uc6q9OCeoM0FZYDlMBeauBv042fzLTbSjsX6QHJwgtRu0vYi3XszVmtEB9KLsw5ve5OBFqClAy 4vUwT/7R0tT9Tfp5HWbaqBSOsXLn5z2OY8Put25/0M3FS82IiLZThZfNCCuS77eDlWaVkwuxOBBAo QntCpQ6FJYtBd0J3KSuOkehb9fBfMUNv+xsMSTuxIgxEC1MEQuaRdg9lBnb8t5u+nPv0stQepzrhp ieL0vwhg==; Received: from host86-161-68-50.range86-161.btcentralplus.com ([86.161.68.50]:52393 helo=Dell) by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1qIb8N-0001B1-1d; Sun, 09 Jul 2023 16:35:55 -0400 From: "Roger Sayle" To: Cc: "'Uros Bizjak'" Subject: [x86 PATCH] Add AVX512 support for STV of SI/DImode rotation by constant. Date: Sun, 9 Jul 2023 21:35:53 +0100 Message-ID: <035a01d9b2a4$f6078950$e2169bf0$@nextmovesoftware.com> MIME-Version: 1.0 X-Mailer: Microsoft Outlook 16.0 Content-Language: en-gb Thread-Index: Admyo6VrHd7kgPeISMKGusLm8YSK8A== X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com X-AntiAbuse: Original Domain - gcc.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - nextmovesoftware.com X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com X-Source: X-Source-Args: X-Source-Dir: X-Spam-Status: No, score=-10.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_BARRACUDACENTRAL, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" Following Uros' suggestion, this patch adds support for AVX512VL's vpro[lr][dq] instructions to the recently added scalar-to-vector (STV) enhancements to handle DImode and SImode rotations by a constant. For the test cases: unsigned long long rot1(unsigned long long x) { return (x>>1) | (x<<63); } void mem1(unsigned long long *p) { *p = rot1(*p); } with -m32 -O2 -mavx512vl, we currently generate: rot1: movl 4(%esp), %eax movl 8(%esp), %edx movl %eax, %ecx shrdl $1, %edx, %eax shrdl $1, %ecx, %edx ret mem1: movl 4(%esp), %eax vmovq (%eax), %xmm0 vpshufd $20, %xmm0, %xmm0 vpsrlq $1, %xmm0, %xmm0 vpshufd $136, %xmm0, %xmm0 vmovq %xmm0, (%eax) ret with this patch, we now generate: rot1: vmovq 4(%esp), %xmm0 vprorq $1, %xmm0, %xmm0 vmovd %xmm0, %eax vpextrd $1, %xmm0, %edx ret mem1: movl 4(%esp), %eax vmovq (%eax), %xmm0 vprorq $1, %xmm0, %xmm0 vmovq %xmm0, (%eax) ret This patch has been tested on x86_64-pc-linux-gnu (cascadelake which has avx512) with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2023-07-09 Roger Sayle gcc/ChangeLog * config/i386/i386-features.cc (compute_convert_gain): Tweak gains/costs for ROTATE/ROTATERT by integer constant on AVX512VL. (general_scalar_chain::convert_rotate): On TARGET_AVX512F generate avx512vl_rolv2di or avx412vl_rolv4si when appropriate. gcc/testsuite/ChangeLog * gcc.target/i386/avx512vl-stv-rotatedi-1.c: New test case. Cheers, Roger diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc index 2e751d1..4d69251 100644 --- a/gcc/config/i386/i386-features.cc +++ b/gcc/config/i386/i386-features.cc @@ -585,7 +585,9 @@ general_scalar_chain::compute_convert_gain () case ROTATE: case ROTATERT: igain += m * ix86_cost->shift_const; - if (smode == DImode) + if (TARGET_AVX512F) + igain -= ix86_cost->sse_op; + else if (smode == DImode) { int bits = INTVAL (XEXP (src, 1)); if ((bits & 0x0f) == 0) @@ -1225,6 +1227,8 @@ general_scalar_chain::convert_rotate (enum rtx_code code, rtx op0, rtx op1, emit_insn_before (pat, insn); result = gen_lowpart (V2DImode, tmp1); } + else if (TARGET_AVX512F) + result = simplify_gen_binary (code, V2DImode, op0, op1); else if (bits == 16 || bits == 48) { rtx tmp1 = gen_reg_rtx (V8HImode); @@ -1269,6 +1273,8 @@ general_scalar_chain::convert_rotate (enum rtx_code code, rtx op0, rtx op1, emit_insn_before (pat, insn); result = gen_lowpart (V4SImode, tmp1); } + else if (TARGET_AVX512F) + result = simplify_gen_binary (code, V4SImode, op0, op1); else { if (code == ROTATE) diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-stv-rotatedi-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-stv-rotatedi-1.c new file mode 100644 index 0000000..2f0ead8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512vl-stv-rotatedi-1.c @@ -0,0 +1,35 @@ +/* { dg-do compile { target ia32 } } */ +/* { dg-options "-O2 -mavx512vl" } */ + +unsigned long long rot1(unsigned long long x) { return (x>>1) | (x<<63); } +unsigned long long rot2(unsigned long long x) { return (x>>2) | (x<<62); } +unsigned long long rot3(unsigned long long x) { return (x>>3) | (x<<61); } +unsigned long long rot4(unsigned long long x) { return (x>>4) | (x<<60); } +unsigned long long rot5(unsigned long long x) { return (x>>5) | (x<<59); } +unsigned long long rot6(unsigned long long x) { return (x>>6) | (x<<58); } +unsigned long long rot7(unsigned long long x) { return (x>>7) | (x<<57); } +unsigned long long rot8(unsigned long long x) { return (x>>8) | (x<<56); } +unsigned long long rot9(unsigned long long x) { return (x>>9) | (x<<55); } +unsigned long long rot10(unsigned long long x) { return (x>>10) | (x<<54); } +unsigned long long rot15(unsigned long long x) { return (x>>15) | (x<<49); } +unsigned long long rot16(unsigned long long x) { return (x>>16) | (x<<48); } +unsigned long long rot17(unsigned long long x) { return (x>>17) | (x<<47); } +unsigned long long rot20(unsigned long long x) { return (x>>20) | (x<<44); } +unsigned long long rot24(unsigned long long x) { return (x>>24) | (x<<40); } +unsigned long long rot30(unsigned long long x) { return (x>>30) | (x<<34); } +unsigned long long rot31(unsigned long long x) { return (x>>31) | (x<<33); } +unsigned long long rot32(unsigned long long x) { return (x>>32) | (x<<32); } +unsigned long long rot33(unsigned long long x) { return (x>>33) | (x<<31); } +unsigned long long rot34(unsigned long long x) { return (x>>34) | (x<<30); } +unsigned long long rot40(unsigned long long x) { return (x>>40) | (x<<24); } +unsigned long long rot42(unsigned long long x) { return (x>>42) | (x<<22); } +unsigned long long rot48(unsigned long long x) { return (x>>48) | (x<<16); } +unsigned long long rot50(unsigned long long x) { return (x>>50) | (x<<14); } +unsigned long long rot56(unsigned long long x) { return (x>>56) | (x<<8); } +unsigned long long rot58(unsigned long long x) { return (x>>58) | (x<<6); } +unsigned long long rot60(unsigned long long x) { return (x>>60) | (x<<4); } +unsigned long long rot61(unsigned long long x) { return (x>>61) | (x<<3); } +unsigned long long rot62(unsigned long long x) { return (x>>62) | (x<<2); } +unsigned long long rot63(unsigned long long x) { return (x>>63) | (x<<1); } + +/* { dg-final { scan-assembler-times "vpro\[lr\]q" 29 } } */