From patchwork Tue May 23 15:31:11 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Thomas Schwinge X-Patchwork-Id: 766060 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3wXKJN4lj4z9sP4 for ; Wed, 24 May 2017 01:31:47 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="NyETz9q0"; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:in-reply-to:references:date:message-id:mime-version :content-type:content-transfer-encoding; q=dns; s=default; b=oCU AKaf3gxqz8LSqjMji1I4dYrTpr6xqj6nv7F31PC9Oy0v+7AWBFFp9tY858/fPZ4T 6ppb6yshJb+WMhuUa+D4hpvNJhkF/3hY+o/9GUL3BYag3ki6/l5fuY/5jbeCUiGq JHpj1e84kOCbjcqvpRqjvVO5mNNVu+/8xHd1BKn4= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:in-reply-to:references:date:message-id:mime-version :content-type:content-transfer-encoding; s=default; bh=Y3FKuAC85 86nBKsoKer6vTSQExw=; b=NyETz9q0sUfHZYr+eFinkBDfnEUzyag721kQchP/h mp61Ln6UUhLSGL+A9S4rcxaJdAe3J5WsWp/H20n0rZPykFTJTmjnwLYChLcBts9x FVnXx+4Is92XwEU5h3l3XCBhKoPGAeUWVd2nRNpUKs1tuY3IN5zjNFp0RENmfwm1 PE= Received: (qmail 10512 invoked by alias); 23 May 2017 15:31:34 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 10394 invoked by uid 89); 23 May 2017 15:31:28 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-24.6 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_NONE, SPF_PASS, URIBL_RED autolearn=ham version=3.3.2 spammy=formally X-HELO: relay1.mentorg.com Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 23 May 2017 15:31:25 +0000 Received: from svr-orw-fem-06.mgc.mentorg.com ([147.34.97.120]) by relay1.mentorg.com with esmtp id 1dDBmQ-0003MS-03 from Thomas_Schwinge@mentor.com ; Tue, 23 May 2017 08:31:26 -0700 Received: from tftp-cs (147.34.91.1) by SVR-ORW-FEM-06.mgc.mentorg.com (147.34.97.120) with Microsoft SMTP Server id 14.3.224.2; Tue, 23 May 2017 08:31:22 -0700 Received: by tftp-cs (Postfix, from userid 49978) id AF36BC2301; Tue, 23 May 2017 08:31:21 -0700 (PDT) From: Thomas Schwinge To: Nathan Sidwell , GCC Patches , Jakub Jelinek Subject: Make the OpenACC C++ acc_on_device wrapper "always inline" (was: [openacc] on_device fix) In-Reply-To: <5632B856.4050509@acm.org> References: <56327E51.1090507@acm.org> <568AB69F.4020502@acm.org> <568D3764.1050104@acm.org> <5632B856.4050509@acm.org> User-Agent: Notmuch/0.9-125-g4686d11 (http://notmuchmail.org) Emacs/24.5.1 (x86_64-pc-linux-gnu) Date: Tue, 23 May 2017 17:31:11 +0200 Message-ID: <87h90ba8u8.fsf@euler.schwinge.homeip.net> MIME-Version: 1.0 Hi! On Thu, 29 Oct 2015 17:22:46 -0700, Nathan Sidwell wrote: > acc_on_device and it's builtin had a conflict. The function formally takes an > enum argument, but the builtin takes an int -- primarily to avoid the compiler > having to generate the enum type internally. > > This works fine for C, where the external declaration of the function (in > openacc.h) matches up with the builtin, and we optimize the builtin as expected. > > It fails for C++ where the builtin doesn't match the declaration in the header. > We end up with emitting a call to acc_on_device, which is resolved by > libgomp. Unfortunately that means we fail to optimize. [...] > [Nathan's trunk r229562] leaves things unchanged for C -- declare a function with an enum arg. > But for C++ we the extern "C" declaration takes an int -- and therefore > matches the builtin. We insert an inline wrapper that takes an enum argument. > Because of C++'s overload resolution both the wrapper and the int-taking > declaration can have the same source name. > --- libgomp/openacc.h (revision 229535) > +++ libgomp/openacc.h (working copy) > -int acc_on_device (acc_device_t) __GOACC_NOTHROW; > +#ifdef __cplusplus > +int acc_on_device (int __arg) __GOACC_NOTHROW; > +#else > +int acc_on_device (acc_device_t __arg) __GOACC_NOTHROW; > +#endif > #ifdef __cplusplus > } > + > +/* Forwarding function with correctly typed arg. */ > + > +inline int acc_on_device (acc_device_t __arg) __GOACC_NOTHROW > +{ > + return acc_on_device ((int) __arg); > +} > #endif > --- libgomp/testsuite/libgomp.oacc-c-c++-common/acc-on-device.c (revision 0) > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/acc-on-device.c (working copy) > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-additional-options "-O2" } */ > + > +#include > + > +int Foo (acc_device_t x) > +{ > + return acc_on_device (x); > +} > + > +/* { dg-final { scan-assembler-not "acc_on_device" } } */ As a user, I'd expect that when compiling such code with "-O0" instead of "-O2", but adding "__attribute__ ((optimize ("O2")))" to "Foo", that I'd then get "acc_on_device" expanded as a builtin, and no calls to the "acc_on_device library function. In C++ that is currently not working, because the "Forwarding function with correctly typed arg" (cited above) doesn't "inherit" that "optimize" attribute. Making that one "always inline" resolves the problem. Also I cleaned up and extended testing some more. OK for trunk? commit 9cc3a384c17e9f692f7864c604d2e2f9fbf0bac9 Author: Thomas Schwinge Date: Tue May 23 13:21:14 2017 +0200 Make the OpenACC C++ acc_on_device wrapper "always inline" libgomp/ * openacc.h [__cplusplus] (acc_on_device): Mark as "always inline". * testsuite/libgomp.oacc-c-c++-common/acc-on-device-2.c: Remove file; test cases already present... * testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c: ... in this file. Update. * testsuite/libgomp.oacc-c-c++-common/acc-on-device.c: Remove file; test cases now present... * testsuite/libgomp.oacc-c-c++-common/acc_on_device-2.c: ... in this new file. * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Update. --- libgomp/openacc.h | 3 +- .../libgomp.oacc-c-c++-common/acc-on-device-2.c | 22 ------------- .../libgomp.oacc-c-c++-common/acc-on-device.c | 12 ------- .../libgomp.oacc-c-c++-common/acc_on_device-1.c | 38 +++++++++++++--------- .../libgomp.oacc-c-c++-common/acc_on_device-2.c | 21 ++++++++++++ .../libgomp.oacc-c-c++-common/parallel-dims.c | 14 ++++---- 6 files changed, 52 insertions(+), 58 deletions(-) Grüße Thomas diff --git libgomp/openacc.h libgomp/openacc.h index 137e2c1..266f559 100644 --- libgomp/openacc.h +++ libgomp/openacc.h @@ -121,7 +121,8 @@ int acc_set_cuda_stream (int, void *) __GOACC_NOTHROW; /* Forwarding function with correctly typed arg. */ #pragma acc routine seq -inline int acc_on_device (acc_device_t __arg) __GOACC_NOTHROW +inline __attribute__ ((__always_inline__)) int +acc_on_device (acc_device_t __arg) __GOACC_NOTHROW { return acc_on_device ((int) __arg); } diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/acc-on-device-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/acc-on-device-2.c deleted file mode 100644 index bfcb67d..0000000 --- libgomp/testsuite/libgomp.oacc-c-c++-common/acc-on-device-2.c +++ /dev/null @@ -1,22 +0,0 @@ -/* Test the acc_on_device library function. */ -/* { dg-additional-options "-fno-builtin-acc_on_device" } */ - -#include - -int main () -{ - int dev; - -#pragma acc parallel copyout (dev) - { - dev = acc_on_device (acc_device_not_host); - } - - int expect = 1; - -#if ACC_DEVICE_TYPE_host - expect = 0; -#endif - - return dev != expect; -} diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/acc-on-device.c libgomp/testsuite/libgomp.oacc-c-c++-common/acc-on-device.c deleted file mode 100644 index e0d8710..0000000 --- libgomp/testsuite/libgomp.oacc-c-c++-common/acc-on-device.c +++ /dev/null @@ -1,12 +0,0 @@ -/* { dg-do compile } */ -/* We don't expect this to work with optimizations disabled. - { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */ - -#include - -int Foo (acc_device_t x) -{ - return acc_on_device (x); -} - -/* { dg-final { scan-assembler-not "acc_on_device" } } */ diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c index 8112745..eb962e4 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c +++ libgomp/testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c @@ -1,6 +1,9 @@ /* Disable the acc_on_device builtin; we want to test the libgomp library function. */ +/* { dg-additional-options "-DACC_ON_DEVICE=acc_on_device" } */ /* { dg-additional-options "-fno-builtin-acc_on_device" } */ +/* { dg-additional-options "-fdump-rtl-expand" } + { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]* acc_on_device" 12 "expand" } } */ #include #include @@ -11,13 +14,13 @@ main (int argc, char *argv[]) /* Host. */ { - if (!acc_on_device (acc_device_none)) + if (!ACC_ON_DEVICE (acc_device_none)) abort (); - if (!acc_on_device (acc_device_host)) + if (!ACC_ON_DEVICE (acc_device_host)) abort (); - if (acc_on_device (acc_device_not_host)) + if (ACC_ON_DEVICE (acc_device_not_host)) abort (); - if (acc_on_device (acc_device_nvidia)) + if (ACC_ON_DEVICE (acc_device_nvidia)) abort (); } @@ -26,39 +29,44 @@ main (int argc, char *argv[]) #pragma acc parallel if(0) { - if (!acc_on_device (acc_device_none)) + if (!ACC_ON_DEVICE (acc_device_none)) abort (); - if (!acc_on_device (acc_device_host)) + if (!ACC_ON_DEVICE (acc_device_host)) abort (); - if (acc_on_device (acc_device_not_host)) + if (ACC_ON_DEVICE (acc_device_not_host)) abort (); - if (acc_on_device (acc_device_nvidia)) + if (ACC_ON_DEVICE (acc_device_nvidia)) abort (); } -#if !ACC_DEVICE_TYPE_host + int on_host_p; +#if ACC_DEVICE_TYPE_host + on_host_p = 1; +#else + on_host_p = 0; +#endif /* Offloaded. */ #pragma acc parallel { - if (acc_on_device (acc_device_none)) + if (on_host_p != ACC_ON_DEVICE (acc_device_none)) abort (); - if (acc_on_device (acc_device_host)) + if (on_host_p != ACC_ON_DEVICE (acc_device_host)) abort (); - if (!acc_on_device (acc_device_not_host)) + if (on_host_p == ACC_ON_DEVICE (acc_device_not_host)) abort (); + #if ACC_DEVICE_TYPE_nvidia - if (!acc_on_device (acc_device_nvidia)) + if (!ACC_ON_DEVICE (acc_device_nvidia)) abort (); #else - if (acc_on_device (acc_device_nvidia)) + if (ACC_ON_DEVICE (acc_device_nvidia)) abort (); #endif } -#endif return 0; } diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/acc_on_device-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/acc_on_device-2.c new file mode 100644 index 0000000..c3b3378 --- /dev/null +++ libgomp/testsuite/libgomp.oacc-c-c++-common/acc_on_device-2.c @@ -0,0 +1,21 @@ +/* With the acc_on_device builtin enabled, we don't expect any calls to the + libgomp library function. */ +/* { dg-additional-options "-fdump-rtl-expand" } + { dg-final { scan-rtl-dump-times "\\\(call \[^\\n\]* acc_on_device" 0 "expand" } } */ + +#include + +#ifdef __OPTIMIZE__ +# define ACC_ON_DEVICE acc_on_device +#else +/* Without optimizations enabled, we're not expecting the acc_on_device builtin + to be used, so use here a "-O2" wrapper. */ +#pragma acc routine seq +static int __attribute__ ((optimize ("O2"))) +ACC_ON_DEVICE (acc_device_t arg) +{ + return acc_on_device (arg); +} +#endif + +#include "acc_on_device-1.c" diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c index 8308f7c..1c48ab3 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c +++ libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c @@ -4,14 +4,12 @@ #include #include -/* TODO: "(int) acc_device_*" casts because of the C++ acc_on_device wrapper - not behaving as expected for -O0. */ #pragma acc routine seq static unsigned int __attribute__ ((optimize ("O2"))) acc_gang () { - if (acc_on_device ((int) acc_device_host)) + if (acc_on_device (acc_device_host)) return 0; - else if (acc_on_device ((int) acc_device_nvidia)) + else if (acc_on_device (acc_device_nvidia)) { unsigned int r; asm volatile ("mov.u32 %0,%%ctaid.x;" : "=r" (r)); @@ -24,9 +22,9 @@ static unsigned int __attribute__ ((optimize ("O2"))) acc_gang () #pragma acc routine seq static unsigned int __attribute__ ((optimize ("O2"))) acc_worker () { - if (acc_on_device ((int) acc_device_host)) + if (acc_on_device (acc_device_host)) return 0; - else if (acc_on_device ((int) acc_device_nvidia)) + else if (acc_on_device (acc_device_nvidia)) { unsigned int r; asm volatile ("mov.u32 %0,%%tid.y;" : "=r" (r)); @@ -39,9 +37,9 @@ static unsigned int __attribute__ ((optimize ("O2"))) acc_worker () #pragma acc routine seq static unsigned int __attribute__ ((optimize ("O2"))) acc_vector () { - if (acc_on_device ((int) acc_device_host)) + if (acc_on_device (acc_device_host)) return 0; - else if (acc_on_device ((int) acc_device_nvidia)) + else if (acc_on_device (acc_device_nvidia)) { unsigned int r; asm volatile ("mov.u32 %0,%%tid.x;" : "=r" (r));