From patchwork Thu Jun 6 12:02:05 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Thomas Schwinge
X-Patchwork-Id: 1944601
From: Thomas Schwinge
To: gcc-patches@gcc.gnu.org
Cc: Tom de Vries
Subject: nvptx offloading: Global constructor, destructor support, via
 nvptx-tools 'ld' (was: nvptx: Support global constructors/destructors via
 'collect2' for offloading)
In-Reply-To: <87r0wqp7jf.fsf@euler.schwinge.homeip.net>
References: <878rjqaku5.fsf@dem-tschwing-1.ger.mentorg.com>
 <87y1rq7wt4.fsf@dem-tschwing-1.ger.mentorg.com>
 <87r0wqp7jf.fsf@euler.schwinge.homeip.net>
Date: Thu, 06 Jun 2024 14:02:05 +0200
Message-ID: <87wmn2mg8y.fsf@euler.schwinge.ddns.net>
List-Id: Gcc-patches mailing list

Hi!

On 2022-12-23T14:35:16+0100, I wrote:
> On 2022-12-02T14:35:35+0100, I wrote:
>> On 2022-12-01T22:13:38+0100, I wrote:
>>> I'm working on support for global constructors/destructors with
>>> GCC/nvptx
>>
>> See "nvptx: Support global constructors/destructors via 'collect2'"
>> [posted before]

..., which I then recently revised; see
commit d9c90c82d900fdae95df4499bf5f0a4ecb903b53
"nvptx target: Global constructor, destructor support, via nvptx-tools 'ld'".

> Building on that, attached is now the additional "for offloading" piece:
> "nvptx: Support global constructors/destructors via 'collect2' for offloading".

Similarly revised, I've now pushed to trunk branch
commit 5bbe5350a0932c78d4ffce292ba4104a6fe6ef96
"nvptx offloading: Global constructor, destructor support, via nvptx-tools 'ld'",
see attached.


Grüße
 Thomas


From 5bbe5350a0932c78d4ffce292ba4104a6fe6ef96 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge
Date: Wed, 5 Jun 2024 12:40:50 +0200
Subject: [PATCH] nvptx offloading: Global constructor, destructor support, via
 nvptx-tools 'ld'

This extends commit d9c90c82d900fdae95df4499bf5f0a4ecb903b53
"nvptx target: Global constructor, destructor support, via nvptx-tools 'ld'"
for offloading.

        libgcc/
        * config/nvptx/gbl-ctors.c ["mgomp"]
        (__do_global_ctors__entry__mgomp)
        (__do_global_dtors__entry__mgomp): New.
        [!"mgomp"] (__do_global_ctors__entry, __do_global_dtors__entry):
        New.
        libgomp/
        * plugin/plugin-nvptx.c (nvptx_do_global_cdtors): New.
        (nvptx_close_device, GOMP_OFFLOAD_load_image)
        (GOMP_OFFLOAD_unload_image): Call it.
---
 libgcc/config/nvptx/gbl-ctors.c |  55 +++++++++++++++
 libgomp/plugin/plugin-nvptx.c   | 117 +++++++++++++++++++++++++++++++-
 2 files changed, 171 insertions(+), 1 deletion(-)

diff --git a/libgcc/config/nvptx/gbl-ctors.c b/libgcc/config/nvptx/gbl-ctors.c
index a2ca053e5e3..a56d64f8ef8 100644
--- a/libgcc/config/nvptx/gbl-ctors.c
+++ b/libgcc/config/nvptx/gbl-ctors.c
@@ -68,6 +68,61 @@ __gbl_ctors (void)
 }
 
 
+/* For nvptx offloading configurations, need '.entry' wrappers.  */
+
+# if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+
+/* OpenMP */
+
+/* See 'crt0.c', 'mgomp.c'.  */
+extern void *__nvptx_stacks[32] __attribute__((shared,nocommon));
+extern unsigned __nvptx_uni[32] __attribute__((shared,nocommon));
+
+__attribute__((kernel)) void __do_global_ctors__entry__mgomp (void *);
+
+void
+__do_global_ctors__entry__mgomp (void *nvptx_stacks_0)
+{
+  __nvptx_stacks[0] = nvptx_stacks_0;
+  __nvptx_uni[0] = 0;
+
+  __static_do_global_ctors ();
+}
+
+__attribute__((kernel)) void __do_global_dtors__entry__mgomp (void *);
+
+void
+__do_global_dtors__entry__mgomp (void *nvptx_stacks_0)
+{
+  __nvptx_stacks[0] = nvptx_stacks_0;
+  __nvptx_uni[0] = 0;
+
+  __static_do_global_dtors ();
+}
+
+# else
+
+/* OpenACC */
+
+__attribute__((kernel)) void __do_global_ctors__entry (void);
+
+void
+__do_global_ctors__entry (void)
+{
+  __static_do_global_ctors ();
+}
+
+__attribute__((kernel)) void __do_global_dtors__entry (void);
+
+void
+__do_global_dtors__entry (void)
+{
+  __static_do_global_dtors ();
+}
+
+# endif
+
+
 /* The following symbol just provides a means for the nvptx-tools 'ld' to
    trigger linking in this file.  */
 
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 4cedc5390a3..0f3a3be1898 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -346,6 +346,11 @@ static struct ptx_device **ptx_devices;
    default is set here.  */
 static unsigned lowlat_pool_size = 8 * 1024;
 
+static bool nvptx_do_global_cdtors (CUmodule, struct ptx_device *,
+                                    const char *);
+static size_t nvptx_stacks_size ();
+static void *nvptx_stacks_acquire (struct ptx_device *, size_t, int);
+
 static inline struct nvptx_thread *
 nvptx_thread (void)
 {
@@ -565,6 +570,18 @@ nvptx_close_device (struct ptx_device *ptx_dev)
   if (!ptx_dev)
     return true;
 
+  bool ret = true;
+
+  for (struct ptx_image_data *image = ptx_dev->images;
+       image != NULL;
+       image = image->next)
+    {
+      if (!nvptx_do_global_cdtors (image->module, ptx_dev,
+                                   "__do_global_dtors__entry"
+                                   /* or "__do_global_dtors__entry__mgomp" */))
+        ret = false;
+    }
+
   for (struct ptx_free_block *b = ptx_dev->free_blocks; b;)
     {
       struct ptx_free_block *b_next = b->next;
@@ -585,7 +602,8 @@ nvptx_close_device (struct ptx_device *ptx_dev)
     CUDA_CALL (cuCtxDestroy, ptx_dev->ctx);
 
   free (ptx_dev);
-  return true;
+
+  return ret;
 }
 
 static int
@@ -1317,6 +1335,93 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev)
     GOMP_PLUGIN_fatal ("cuMemcpyHtoD error: %s", cuda_error (r));
 }
 
+/* Invoke MODULE's global constructors/destructors.  */
+
+static bool
+nvptx_do_global_cdtors (CUmodule module, struct ptx_device *ptx_dev,
+                        const char *funcname)
+{
+  bool ret = true;
+  char *funcname_mgomp = NULL;
+  CUresult r;
+  CUfunction funcptr;
+  r = CUDA_CALL_NOCHECK (cuModuleGetFunction,
+                         &funcptr, module, funcname);
+  GOMP_PLUGIN_debug (0, "cuModuleGetFunction (%s): %s\n",
+                     funcname, cuda_error (r));
+  if (r == CUDA_ERROR_NOT_FOUND)
+    {
+      /* Try '[funcname]__mgomp'.  */
+
+      size_t funcname_len = strlen (funcname);
+      const char *mgomp_suffix = "__mgomp";
+      size_t mgomp_suffix_len = strlen (mgomp_suffix);
+      funcname_mgomp
+        = GOMP_PLUGIN_malloc (funcname_len + mgomp_suffix_len + 1);
+      memcpy (funcname_mgomp, funcname, funcname_len);
+      memcpy (funcname_mgomp + funcname_len,
+              mgomp_suffix, mgomp_suffix_len + 1);
+      funcname = funcname_mgomp;
+
+      r = CUDA_CALL_NOCHECK (cuModuleGetFunction,
+                             &funcptr, module, funcname);
+      GOMP_PLUGIN_debug (0, "cuModuleGetFunction (%s): %s\n",
+                         funcname, cuda_error (r));
+    }
+  if (r == CUDA_ERROR_NOT_FOUND)
+    ;
+  else if (r != CUDA_SUCCESS)
+    {
+      GOMP_PLUGIN_error ("cuModuleGetFunction (%s) error: %s",
+                         funcname, cuda_error (r));
+      ret = false;
+    }
+  else
+    {
+      /* If necessary, set up soft stack.  */
+      void *nvptx_stacks_0;
+      void *kargs[1];
+      if (funcname_mgomp)
+        {
+          size_t stack_size = nvptx_stacks_size ();
+          pthread_mutex_lock (&ptx_dev->omp_stacks.lock);
+          nvptx_stacks_0 = nvptx_stacks_acquire (ptx_dev, stack_size, 1);
+          nvptx_stacks_0 += stack_size;
+          kargs[0] = &nvptx_stacks_0;
+        }
+      r = CUDA_CALL_NOCHECK (cuLaunchKernel,
+                             funcptr,
+                             1, 1, 1, 1, 1, 1,
+                             /* sharedMemBytes */ 0,
+                             /* hStream */ NULL,
+                             /* kernelParams */ funcname_mgomp ? kargs : NULL,
+                             /* extra */ NULL);
+      if (r != CUDA_SUCCESS)
+        {
+          GOMP_PLUGIN_error ("cuLaunchKernel (%s) error: %s",
+                             funcname, cuda_error (r));
+          ret = false;
+        }
+
+      r = CUDA_CALL_NOCHECK (cuStreamSynchronize,
+                             NULL);
+      if (r != CUDA_SUCCESS)
+        {
+          GOMP_PLUGIN_error ("cuStreamSynchronize (%s) error: %s",
+                             funcname, cuda_error (r));
+          ret = false;
+        }
+
+      if (funcname_mgomp)
+        pthread_mutex_unlock (&ptx_dev->omp_stacks.lock);
+    }
+
+  if (funcname_mgomp)
+    free (funcname_mgomp);
+
+  return ret;
+}
+
 /* Load the (partial) program described by TARGET_DATA to device
    number ORD.  Allocate and return TARGET_TABLE.  If not NULL, REV_FN_TABLE
    will contain the on-device addresses of the functions for reverse offload.
@@ -1546,6 +1651,11 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
 
   nvptx_set_clocktick (module, dev);
 
+  if (!nvptx_do_global_cdtors (module, dev,
+                               "__do_global_ctors__entry"
+                               /* or "__do_global_ctors__entry__mgomp" */))
+    return -1;
+
   return fn_entries + var_entries + other_entries;
 }
 
@@ -1571,6 +1681,11 @@ GOMP_OFFLOAD_unload_image (int ord, unsigned version, const void *target_data)
   for (prev_p = &dev->images; (image = *prev_p) != 0; prev_p = &image->next)
     if (image->target_data == target_data)
       {
+        if (!nvptx_do_global_cdtors (image->module, dev,
+                                     "__do_global_dtors__entry"
+                                     /* or "__do_global_dtors__entry__mgomp" */))
+          ret = false;
+
         *prev_p = image->next;
         if (CUDA_CALL_NOCHECK (cuModuleUnload, image->module) != CUDA_SUCCESS)
           ret = false;
-- 
2.34.1
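
Illustration (not part of the commit): the name-fallback step in 'nvptx_do_global_cdtors' can be tried out on the host without CUDA. The sketch below is only a rough model under stated assumptions: 'lookup_fn', 'append_mgomp_suffix', 'resolve_entry' and the two toy modules are hypothetical names, plain 'malloc' stands in for 'GOMP_PLUGIN_malloc', and a callback stands in for 'cuModuleGetFunction'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for cuModuleGetFunction: returns true if NAME
   is present in the "module".  */
typedef bool (*lookup_fn) (const char *name);

/* Return a newly allocated "<base>__mgomp" string, or NULL on allocation
   failure.  The caller frees the result.  */
char *
append_mgomp_suffix (const char *base)
{
  assert (base != NULL);
  const char *suffix = "__mgomp";
  size_t base_len = strlen (base);
  size_t suffix_len = strlen (suffix);
  char *name = malloc (base_len + suffix_len + 1);
  if (name == NULL)
    return NULL;
  memcpy (name, base, base_len);
  /* The '+ 1' also copies the suffix's terminating NUL.  */
  memcpy (name + base_len, suffix, suffix_len + 1);
  return name;
}

/* Try BASE first, then "<base>__mgomp".  Return a newly allocated copy of
   the name that was found, or NULL if neither exists.  *IS_MGOMP tells the
   caller whether the soft-stack variant was selected.  */
char *
resolve_entry (lookup_fn lookup, const char *base, bool *is_mgomp)
{
  *is_mgomp = false;
  if (lookup (base))
    {
      char *copy = malloc (strlen (base) + 1);
      if (copy != NULL)
        strcpy (copy, base);
      return copy;
    }
  char *mgomp_name = append_mgomp_suffix (base);
  if (mgomp_name != NULL && lookup (mgomp_name))
    {
      *is_mgomp = true;
      return mgomp_name;
    }
  free (mgomp_name);
  return NULL;
}

/* Toy "modules" (assumptions for demonstration only): one exposes just the
   plain entry point, the other just the '__mgomp' one.  */
bool
toy_openacc_module (const char *name)
{
  return strcmp (name, "__do_global_ctors__entry") == 0;
}

bool
toy_openmp_module (const char *name)
{
  return strcmp (name, "__do_global_ctors__entry__mgomp") == 0;
}
```

As in the patch, finding neither name is not treated as an error here: a NULL result corresponds to the 'CUDA_ERROR_NOT_FOUND' branch that does nothing, and the 'is_mgomp' flag corresponds to the decision whether a soft stack pointer has to be passed as the kernel argument.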