From: "Roger Sayle"
Cc: "'Thomas Schwinge'", "'Tom de Vries'"
Subject: [nvptx PATCH] Implement rtx_costs target hook for nvptx backend.
Date: Thu, 11 Jul 2024 17:54:00 +0100

This patch implements the TARGET_RTX_COSTS target hook for the nvptx
backend. Currently, nvptx relies on GCC's default instruction cost
estimates; this patch provides (slightly) more accurate ones. The most
significant difference is that integer division is much more expensive,
relative to other instructions, than the defaults assume, so the
middle-end's expand_divmod now makes more use of cheaper multiply and
shift sequences when dividing by a constant.
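For an unsigned 32-bit division by a constant, that cheaper sequence is a
multiply-high by a precomputed "magic" reciprocal followed by a right shift.
The sketch below is plain C, not code from the patch, and the helper names
div10_magic and count_div10_mismatches are hypothetical. It models the
division by 10 case, assuming the multiplier 0xCCCCCCCD (printed by ptx as the
signed immediate -858993459) and a total shift of 35 bits (32 from taking the
high half of the product, plus a further 3), and checks it against real
division over a range of inputs.

```c
#include <stdint.h>

/* Hypothetical helper (not part of GCC): computes x / 10 using a
   multiply-high by 0xCCCCCCCD followed by a right shift of 3, mirroring
   a mul.hi.u32 / shr.u32 instruction pair.  */
uint32_t div10_magic(uint32_t x)
{
  /* mul.hi.u32: upper 32 bits of the full 64-bit product.  */
  uint32_t hi = (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 32);
  /* shr.u32 by 3, giving a total shift of 35 bits.  */
  return hi >> 3;
}

/* Hypothetical helper: counts the inputs in [lo, hi] for which the
   multiply-and-shift sequence disagrees with a real division by 10.
   A 64-bit loop counter avoids wrap-around when hi == UINT32_MAX.  */
unsigned long count_div10_mismatches(uint32_t lo, uint32_t hi)
{
  unsigned long mismatches = 0;
  for (uint64_t x = lo; x <= hi; x++)
    if (div10_magic((uint32_t)x) != (uint32_t)x / 10u)
      mismatches++;
  return mismatches;
}
```

The sequence is exact for every 32-bit input because 10 * 0xCCCCCCCD exceeds
2^35 by only 2, so the accumulated rounding error stays below one for all
x < 2^34.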
For an example of the benefit, consider:

int foo(unsigned int x)
{
  return x/10;
}

Currently, with -O2, we generate:

.visible .func (.param.u32 %value_out) foo (.param.u32 %in_ar0)
{
        .reg.u32 %value;
        .reg.u32 %ar0;
        ld.param.u32 %ar0, [%in_ar0];
        .reg.u32 %r24;
        mov.u32 %r24, %ar0;
        div.u32 %value, %r24, 10;
        st.param.u32 [%value_out], %value;
        ret;
}

but with this patch, we now generate:

.visible .func (.param.u32 %value_out) foo (.param.u32 %in_ar0)
{
        .reg.u32 %value;
        .reg.u32 %ar0;
        ld.param.u32 %ar0, [%in_ar0];
        .reg.u32 %r24;
        .reg.u32 %r26;
        mov.u32 %r24, %ar0;
        mul.hi.u32 %r26, %r24, -858993459;
        shr.u32 %value, %r26, 3;
        st.param.u32 [%value_out], %value;
        ret;
}

The performance benefit can be measured with the attached microbenchmark,
bench.c, run using nvptx-none-run-single.

Before:
result = 266546680000
19004366269 ticks
15.203493 seconds

After:
result = 266546680000
5153988012 ticks
4.123190 seconds

That is, roughly a 3.7x speed-up, with an identical result.

This patch has been tested with make and make -k check on nvptx-none,
hosted on x86_64-pc-linux-gnu, with no new failures. Ok for mainline?


2024-07-11  Roger Sayle

gcc/ChangeLog
        * config/nvptx/nvptx.cc (nvptx_rtx_size_costs): New function to
        estimate the size of an RTX expression (in ptxas instructions).
        (nvptx_rtx_costs): Implementation of rtx_costs target hook.
        (TARGET_RTX_COSTS): Define to nvptx_rtx_costs.

gcc/testsuite/ChangeLog
        * gcc.target/nvptx/div10.c: New test case.


Thanks in advance,
Roger
--

---
#include
unsigned long bench()
{
  unsigned long total = 0;
  for (unsigned int i=0; i<20000; i++)
    for (unsigned int j=0;j