From patchwork Sat May 6 16:04:57 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Roger Sayle
X-Patchwork-Id: 1778013
From: "Roger Sayle"
To: "'GCC Patches'"
Cc: "'Tom de Vries'"
Subject: [PATCH] nvptx: Add support for __builtin_nvptx_brev intrinsic.
Date: Sat, 6 May 2023 17:04:57 +0100
Message-ID: <007301d98034$82486ea0$86d94be0$@nextmovesoftware.com>

This patch adds support for a pair of bit reversal intrinsics,
__builtin_nvptx_brev and __builtin_nvptx_brevll, which perform 32-bit
and 64-bit bit reversal (using nvptx's brev instruction), matching the
__brev and __brevll intrinsics provided by NVidia's nvcc compiler.
https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__INT.html

This patch has been tested on nvptx-none with make and make -k check
with no new failures.  Ok for mainline?

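For reference, the operation the new builtins perform can be sketched in
portable C.  The helper names ref_brev32, ref_brev64 and
ref_brev_selfcheck below are hypothetical and not part of this patch;
the patch itself expands the builtins to the brev instruction rather
than to code like this.  The sketch uses the usual swap-adjacent-groups
technique, and the check values are taken from the new brev-2.c and
brevll-2.c tests.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference helper (not in the patch): reverse the bit
   order of a 32-bit value by swapping progressively larger groups.  */
static uint32_t
ref_brev32 (uint32_t x)
{
  x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);  /* single bits */
  x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);  /* bit pairs */
  x = ((x >> 4) & 0x0f0f0f0fu) | ((x & 0x0f0f0f0fu) << 4);  /* nibbles */
  x = ((x >> 8) & 0x00ff00ffu) | ((x & 0x00ff00ffu) << 8);  /* bytes */
  return (x >> 16) | (x << 16);                             /* halfwords */
}

/* Hypothetical 64-bit counterpart: reverse each 32-bit half and
   exchange the two halves.  */
static uint64_t
ref_brev64 (uint64_t x)
{
  uint32_t lo = (uint32_t) x;
  uint32_t hi = (uint32_t) (x >> 32);
  return ((uint64_t) ref_brev32 (lo) << 32) | ref_brev32 (hi);
}

/* Check a few of the vectors used by brev-2.c and brevll-2.c.
   Returns 1 if every assertion holds.  */
static int
ref_brev_selfcheck (void)
{
  assert (ref_brev32 (0x00000001u) == 0x80000000u);
  assert (ref_brev32 (0x01234567u) == 0xe6a2c480u);
  assert (ref_brev64 (0x0000000000000001ull) == 0x8000000000000000ull);
  assert (ref_brev64 (0x0123456789abcdefull) == 0xf7b3d591e6a2c480ull);
  return 1;
}
```

Bit reversal is its own inverse, which is why the tests check each
non-trivial vector in both directions.
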
2023-05-06  Roger Sayle

gcc/ChangeLog
        * config/nvptx/nvptx.cc (nvptx_expand_brev): Expand target
        builtin for bit reversal using brev instruction.
        (enum nvptx_builtins): Add NVPTX_BUILTIN_BREV and
        NVPTX_BUILTIN_BREVLL.
        (nvptx_init_builtins): Define "brev" and "brevll".
        (nvptx_expand_builtin): Expand NVPTX_BUILTIN_BREV and
        NVPTX_BUILTIN_BREVLL via nvptx_expand_brev function.
        * doc/extend.texi (Nvidia PTX Built-in Functions): New
        section, document __builtin_nvptx_brev{,ll}.

gcc/testsuite/ChangeLog
        * gcc.target/nvptx/brev-1.c: New 32-bit test case.
        * gcc.target/nvptx/brev-2.c: Likewise.
        * gcc.target/nvptx/brevll-1.c: New 64-bit test case.
        * gcc.target/nvptx/brevll-2.c: Likewise.

Thanks in advance,
Roger

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc index 89349da..1b99fca 100644 --- a/gcc/config/nvptx/nvptx.cc +++ b/gcc/config/nvptx/nvptx.cc @@ -6047,6 +6047,29 @@ nvptx_expand_shuffle (tree exp, rtx target, machine_mode mode, int ignore) return target; } +/* Expander for the bit reverse builtins. 
*/ + +static rtx +nvptx_expand_brev (tree exp, rtx target, machine_mode mode, int ignore) +{ + if (ignore) + return target; + + rtx arg = expand_expr (CALL_EXPR_ARG (exp, 0), + NULL_RTX, mode, EXPAND_NORMAL); + if (!REG_P (arg)) + arg = copy_to_mode_reg (mode, arg); + if (!target) + target = gen_reg_rtx (mode); + rtx pat; + if (mode == SImode) + pat = gen_bitrevsi2 (target, arg); + else + pat = gen_bitrevdi2 (target, arg); + emit_insn (pat); + return target; +} + const char * nvptx_output_red_partition (rtx dst, rtx offset) { @@ -6164,6 +6187,8 @@ enum nvptx_builtins NVPTX_BUILTIN_BAR_RED_AND, NVPTX_BUILTIN_BAR_RED_OR, NVPTX_BUILTIN_BAR_RED_POPC, + NVPTX_BUILTIN_BREV, + NVPTX_BUILTIN_BREVLL, NVPTX_BUILTIN_MAX }; @@ -6292,6 +6317,9 @@ nvptx_init_builtins (void) DEF (BAR_RED_POPC, "bar_red_popc", (UINT, UINT, UINT, UINT, UINT, NULL_TREE)); + DEF (BREV, "brev", (UINT, UINT, NULL_TREE)); + DEF (BREVLL, "brevll", (LLUINT, LLUINT, NULL_TREE)); + #undef DEF #undef ST #undef UINT @@ -6339,6 +6367,10 @@ nvptx_expand_builtin (tree exp, rtx target, rtx ARG_UNUSED (subtarget), case NVPTX_BUILTIN_BAR_RED_POPC: return nvptx_expand_bar_red (exp, target, mode, ignore); + case NVPTX_BUILTIN_BREV: + case NVPTX_BUILTIN_BREVLL: + return nvptx_expand_brev (exp, target, mode, ignore); + default: gcc_unreachable (); } } diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index ac47680..871f0cf 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -14682,6 +14682,7 @@ instructions, but allow the compiler to schedule those calls. * Other MIPS Built-in Functions:: * MSP430 Built-in Functions:: * NDS32 Built-in Functions:: +* Nvidia PTX Built-in Functions:: * Basic PowerPC Built-in Functions:: * PowerPC AltiVec/VSX Built-in Functions:: * PowerPC Hardware Transactional Memory Built-in Functions:: @@ -17941,6 +17942,20 @@ Enable global interrupt. Disable global interrupt. 
@enddefbuiltin +@node Nvidia PTX Built-in Functions +@subsection Nvidia PTX Built-in Functions + +These built-in functions are available for the Nvidia PTX target: + +@defbuiltin{unsigned int __builtin_nvptx_brev (unsigned int @var{x})} +Reverse the bit order of a 32-bit unsigned integer. +Bit 0 of @var{x} becomes bit 31 of the result, bit 1 becomes bit 30, and so on. +@enddefbuiltin + +@defbuiltin{unsigned long long __builtin_nvptx_brevll (unsigned long long @var{x})} +Reverse the bit order of a 64-bit unsigned integer. +@enddefbuiltin + @node Basic PowerPC Built-in Functions @subsection Basic PowerPC Built-in Functions diff --git a/gcc/testsuite/gcc.target/nvptx/brev-1.c b/gcc/testsuite/gcc.target/nvptx/brev-1.c new file mode 100644 index 0000000..fbb4fff --- /dev/null +++ b/gcc/testsuite/gcc.target/nvptx/brev-1.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +unsigned int foo(unsigned int x) +{ + return __builtin_nvptx_brev(x); +} + +/* { dg-final { scan-assembler "brev.b32" } } */ diff --git a/gcc/testsuite/gcc.target/nvptx/brev-2.c b/gcc/testsuite/gcc.target/nvptx/brev-2.c new file mode 100644 index 0000000..9d0defe --- /dev/null +++ b/gcc/testsuite/gcc.target/nvptx/brev-2.c @@ -0,0 +1,94 @@ +/* { dg-do run } */ +/* { dg-options "-O2" } */ +unsigned int bitreverse32(unsigned int x) +{ + return __builtin_nvptx_brev(x); +} + +int main(void) +{ + if (bitreverse32(0x00000000) != 0x00000000) + __builtin_abort(); + if (bitreverse32(0xffffffff) != 0xffffffff) + __builtin_abort(); + + if (bitreverse32(0x00000001) != 0x80000000) + __builtin_abort(); + if (bitreverse32(0x00000002) != 0x40000000) + __builtin_abort(); + if (bitreverse32(0x00000004) != 0x20000000) + __builtin_abort(); + if (bitreverse32(0x00000008) != 0x10000000) + __builtin_abort(); + if (bitreverse32(0x00000010) != 0x08000000) + __builtin_abort(); + if (bitreverse32(0x00000020) != 0x04000000) + __builtin_abort(); + if (bitreverse32(0x00000040) != 0x02000000) + __builtin_abort(); + if (bitreverse32(0x00000080) != 0x01000000) + 
__builtin_abort(); + if (bitreverse32(0x00000100) != 0x00800000) + __builtin_abort(); + if (bitreverse32(0x00000200) != 0x00400000) + __builtin_abort(); + if (bitreverse32(0x00000400) != 0x00200000) + __builtin_abort(); + if (bitreverse32(0x00000800) != 0x00100000) + __builtin_abort(); + if (bitreverse32(0x00001000) != 0x00080000) + __builtin_abort(); + if (bitreverse32(0x00002000) != 0x00040000) + __builtin_abort(); + if (bitreverse32(0x00004000) != 0x00020000) + __builtin_abort(); + if (bitreverse32(0x00008000) != 0x00010000) + __builtin_abort(); + if (bitreverse32(0x00010000) != 0x00008000) + __builtin_abort(); + if (bitreverse32(0x00020000) != 0x00004000) + __builtin_abort(); + if (bitreverse32(0x00040000) != 0x00002000) + __builtin_abort(); + if (bitreverse32(0x00080000) != 0x00001000) + __builtin_abort(); + if (bitreverse32(0x00100000) != 0x00000800) + __builtin_abort(); + if (bitreverse32(0x00200000) != 0x00000400) + __builtin_abort(); + if (bitreverse32(0x00400000) != 0x00000200) + __builtin_abort(); + if (bitreverse32(0x00800000) != 0x00000100) + __builtin_abort(); + if (bitreverse32(0x01000000) != 0x00000080) + __builtin_abort(); + if (bitreverse32(0x02000000) != 0x00000040) + __builtin_abort(); + if (bitreverse32(0x04000000) != 0x00000020) + __builtin_abort(); + if (bitreverse32(0x08000000) != 0x00000010) + __builtin_abort(); + if (bitreverse32(0x10000000) != 0x00000008) + __builtin_abort(); + if (bitreverse32(0x20000000) != 0x00000004) + __builtin_abort(); + if (bitreverse32(0x40000000) != 0x00000002) + __builtin_abort(); + if (bitreverse32(0x80000000) != 0x00000001) + __builtin_abort(); + + if (bitreverse32(0x01234567) != 0xe6a2c480) + __builtin_abort(); + if (bitreverse32(0xe6a2c480) != 0x01234567) + __builtin_abort(); + if (bitreverse32(0xdeadbeef) != 0xf77db57b) + __builtin_abort(); + if (bitreverse32(0xf77db57b) != 0xdeadbeef) + __builtin_abort(); + if (bitreverse32(0xcafebabe) != 0x7d5d7f53) + __builtin_abort(); + if (bitreverse32(0x7d5d7f53) != 
0xcafebabe) + __builtin_abort(); + return 0; +} + diff --git a/gcc/testsuite/gcc.target/nvptx/brevll-1.c b/gcc/testsuite/gcc.target/nvptx/brevll-1.c new file mode 100644 index 0000000..7009d5f --- /dev/null +++ b/gcc/testsuite/gcc.target/nvptx/brevll-1.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ +unsigned long foo(unsigned long x) +{ + return __builtin_nvptx_brevll(x); +} + +/* { dg-final { scan-assembler "brev.b64" } } */ diff --git a/gcc/testsuite/gcc.target/nvptx/brevll-2.c b/gcc/testsuite/gcc.target/nvptx/brevll-2.c new file mode 100644 index 0000000..56054b1 --- /dev/null +++ b/gcc/testsuite/gcc.target/nvptx/brevll-2.c @@ -0,0 +1,154 @@ +/* { dg-do run } */ +/* { dg-options "-O2" } */ +unsigned long long bitreverse64(unsigned long long x) +{ + return __builtin_nvptx_brevll(x); +} + +int main(void) +{ + if (bitreverse64(0x0000000000000000ll) != 0x0000000000000000ll) + __builtin_abort(); + if (bitreverse64(0xffffffffffffffffll) != 0xffffffffffffffffll) + __builtin_abort(); + + if (bitreverse64(0x0000000000000001ll) != 0x8000000000000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000000000002ll) != 0x4000000000000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000000000004ll) != 0x2000000000000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000000000008ll) != 0x1000000000000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000000000010ll) != 0x0800000000000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000000000020ll) != 0x0400000000000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000000000040ll) != 0x0200000000000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000000000080ll) != 0x0100000000000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000000000100ll) != 0x0080000000000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000000000200ll) != 0x0040000000000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000000000400ll) != 0x0020000000000000ll) + __builtin_abort(); + if 
(bitreverse64(0x0000000000000800ll) != 0x0010000000000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000000001000ll) != 0x0008000000000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000000002000ll) != 0x0004000000000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000000004000ll) != 0x0002000000000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000000008000ll) != 0x0001000000000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000000010000ll) != 0x0000800000000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000000020000ll) != 0x0000400000000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000000040000ll) != 0x0000200000000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000000080000ll) != 0x0000100000000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000000100000ll) != 0x0000080000000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000000200000ll) != 0x0000040000000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000000400000ll) != 0x0000020000000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000000800000ll) != 0x0000010000000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000001000000ll) != 0x0000008000000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000002000000ll) != 0x0000004000000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000004000000ll) != 0x0000002000000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000008000000ll) != 0x0000001000000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000010000000ll) != 0x0000000800000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000020000000ll) != 0x0000000400000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000040000000ll) != 0x0000000200000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000080000000ll) != 0x0000000100000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000100000000ll) != 0x0000000080000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000200000000ll) != 0x0000000040000000ll) + __builtin_abort(); + 
if (bitreverse64(0x0000000400000000ll) != 0x0000000020000000ll) + __builtin_abort(); + if (bitreverse64(0x0000000800000000ll) != 0x0000000010000000ll) + __builtin_abort(); + if (bitreverse64(0x0000001000000000ll) != 0x0000000008000000ll) + __builtin_abort(); + if (bitreverse64(0x0000002000000000ll) != 0x0000000004000000ll) + __builtin_abort(); + if (bitreverse64(0x0000004000000000ll) != 0x0000000002000000ll) + __builtin_abort(); + if (bitreverse64(0x0000008000000000ll) != 0x0000000001000000ll) + __builtin_abort(); + if (bitreverse64(0x0000010000000000ll) != 0x0000000000800000ll) + __builtin_abort(); + if (bitreverse64(0x0000020000000000ll) != 0x0000000000400000ll) + __builtin_abort(); + if (bitreverse64(0x0000040000000000ll) != 0x0000000000200000ll) + __builtin_abort(); + if (bitreverse64(0x0000080000000000ll) != 0x0000000000100000ll) + __builtin_abort(); + if (bitreverse64(0x0000100000000000ll) != 0x0000000000080000ll) + __builtin_abort(); + if (bitreverse64(0x0000200000000000ll) != 0x0000000000040000ll) + __builtin_abort(); + if (bitreverse64(0x0000400000000000ll) != 0x0000000000020000ll) + __builtin_abort(); + if (bitreverse64(0x0000800000000000ll) != 0x0000000000010000ll) + __builtin_abort(); + if (bitreverse64(0x0001000000000000ll) != 0x0000000000008000ll) + __builtin_abort(); + if (bitreverse64(0x0002000000000000ll) != 0x0000000000004000ll) + __builtin_abort(); + if (bitreverse64(0x0004000000000000ll) != 0x0000000000002000ll) + __builtin_abort(); + if (bitreverse64(0x0008000000000000ll) != 0x0000000000001000ll) + __builtin_abort(); + if (bitreverse64(0x0010000000000000ll) != 0x0000000000000800ll) + __builtin_abort(); + if (bitreverse64(0x0020000000000000ll) != 0x0000000000000400ll) + __builtin_abort(); + if (bitreverse64(0x0040000000000000ll) != 0x0000000000000200ll) + __builtin_abort(); + if (bitreverse64(0x0080000000000000ll) != 0x0000000000000100ll) + __builtin_abort(); + if (bitreverse64(0x0100000000000000ll) != 0x0000000000000080ll) + __builtin_abort(); 
+ if (bitreverse64(0x0200000000000000ll) != 0x0000000000000040ll) + __builtin_abort(); + if (bitreverse64(0x0400000000000000ll) != 0x0000000000000020ll) + __builtin_abort(); + if (bitreverse64(0x0800000000000000ll) != 0x0000000000000010ll) + __builtin_abort(); + if (bitreverse64(0x1000000000000000ll) != 0x0000000000000008ll) + __builtin_abort(); + if (bitreverse64(0x2000000000000000ll) != 0x0000000000000004ll) + __builtin_abort(); + if (bitreverse64(0x4000000000000000ll) != 0x0000000000000002ll) + __builtin_abort(); + if (bitreverse64(0x8000000000000000ll) != 0x0000000000000001ll) + __builtin_abort(); + + if (bitreverse64(0x0123456789abcdefll) != 0xf7b3d591e6a2c480ll) + __builtin_abort(); + if (bitreverse64(0xf7b3d591e6a2c480ll) != 0x0123456789abcdefll) + __builtin_abort(); + if (bitreverse64(0xdeadbeefcafebabell) != 0x7d5d7f53f77db57bll) + __builtin_abort(); + if (bitreverse64(0x7d5d7f53f77db57bll) != 0xdeadbeefcafebabell) + __builtin_abort(); + return 0; +} +