From patchwork Tue Jun  4 16:25:45 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Thomas Schwinge <tschwinge@baylibre.com>
X-Patchwork-Id: 1943471
From: Thomas Schwinge <tschwinge@baylibre.com>
To: gcc-patches@gcc.gnu.org, Tobias Burnus <tburnus@baylibre.com>, Jakub
 Jelinek <jakub@redhat.com>, Tom de Vries <tdevries@suse.de>
Cc: Andrew Stubbs <ams@baylibre.com>
Subject: nvptx offloading: 'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE'
 environment variable [PR97384, PR105274]
User-Agent: Notmuch/0.30+8~g47a4bad (https://notmuchmail.org) Emacs/29.3
 (x86_64-pc-linux-gnu)
Date: Tue, 04 Jun 2024 18:25:45 +0200
Message-ID: <877cf4hdyu.fsf@euler.schwinge.ddns.net>
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>

Hi!

Any comments before I push the attached
"nvptx offloading: 'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE' environment variable [PR97384, PR105274]"
to trunk?

While this happens to implement some baseline work for the PRs indicated,
my original need for this is in upcoming libgomp Fortran test cases
(where I can't easily call 'cuCtxSetLimit(CU_LIMIT_STACK_SIZE, [bytes])'
in the test cases themselves).


Regards
 Thomas

From d32f1a6a73b767ab5cf2da502fc88975612b80f2 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <tschwinge@baylibre.com>
Date: Fri, 31 May 2024 17:04:39 +0200
Subject: [PATCH] nvptx offloading: 'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE'
 environment variable [PR97384, PR105274]

... as a means to manually set the "native" GPU thread stack size.

	PR libgomp/97384
	PR libgomp/105274
	libgomp/
	* plugin/cuda-lib.def (cuCtxSetLimit): Add.
	* plugin/plugin-nvptx.c (nvptx_open_device): Handle
	'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE' environment variable.
---
 libgomp/plugin/cuda-lib.def   |  1 +
 libgomp/plugin/plugin-nvptx.c | 45 +++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/libgomp/plugin/cuda-lib.def b/libgomp/plugin/cuda-lib.def
index 007c6e0f4df..9255c1cff68 100644
--- a/libgomp/plugin/cuda-lib.def
+++ b/libgomp/plugin/cuda-lib.def
@@ -4,6 +4,7 @@ CUDA_ONE_CALL (cuCtxGetCurrent)
 CUDA_ONE_CALL (cuCtxGetDevice)
 CUDA_ONE_CALL (cuCtxPopCurrent)
 CUDA_ONE_CALL (cuCtxPushCurrent)
+CUDA_ONE_CALL (cuCtxSetLimit)
 CUDA_ONE_CALL (cuCtxSynchronize)
 CUDA_ONE_CALL (cuDeviceGet)
 CUDA_ONE_CALL (cuDeviceGetAttribute)
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index a4a050521b4..e722ee2b400 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -150,6 +150,8 @@ init_cuda_lib (void)
 
 #include "secure_getenv.h"
 
+static void notify_var (const char *, const char *);
+
 #undef MIN
 #undef MAX
 #define MIN(X,Y) ((X) < (Y) ? (X) : (Y))
@@ -341,6 +343,9 @@ struct ptx_device
 
 static struct ptx_device **ptx_devices;
 
+/* "Native" GPU thread stack size.  */
+static unsigned native_gpu_thread_stack_size = 0;
+
 /* OpenMP kernels reserve a small amount of ".shared" space for use by
    omp_alloc.  The size is configured using GOMP_NVPTX_LOWLAT_POOL, but the
    default is set here.  */
@@ -550,6 +555,46 @@ nvptx_open_device (int n)
   ptx_dev->free_blocks = NULL;
   pthread_mutex_init (&ptx_dev->free_blocks_lock, NULL);
 
+  /* "Native" GPU thread stack size.  */
+  {
+    /* This is intentionally undocumented, until we work out a proper, common
+       scheme (as much as makes sense) between all offload plugins as well
+       as between nvptx offloading use of "native" stacks for OpenACC vs.
+       OpenMP "soft stacks" vs. OpenMP '-msoft-stack-reserve-local=[...]'.
+
+       GCN offloading has a 'GCN_STACK_SIZE' environment variable (without
+       'GOMP_' prefix): documented; presumably used for all things OpenACC and
+       OpenMP?  Based on GCN command-line option '-mstack-size=[...]' (marked
+       "obsolete"), that one may be set via a GCN 'mkoffload'-synthesized
+       'constructor' function.  */
+    const char *var_name = "GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE";
+    const char *env_var = secure_getenv (var_name);
+    notify_var (var_name, env_var);
+
+    if (env_var != NULL)
+      {
+	char *endptr;
+	errno = 0;
+	unsigned long val = strtoul (env_var, &endptr, 10);
+	if (endptr == NULL || *endptr != '\0'
+	    || errno == ERANGE || errno == EINVAL || val > UINT_MAX)
+	  GOMP_PLUGIN_error ("Error parsing %s", var_name);
+	else
+	  native_gpu_thread_stack_size = val;
+      }
+  }
+  if (native_gpu_thread_stack_size == 0)
+    ; /* Zero means use default.  */
+  else
+    {
+      GOMP_PLUGIN_debug (0, "Setting \"native\" GPU thread stack size"
+			 " ('CU_LIMIT_STACK_SIZE') to %u bytes\n",
+			 native_gpu_thread_stack_size);
+      CUDA_CALL (cuCtxSetLimit,
+		 CU_LIMIT_STACK_SIZE, (size_t) native_gpu_thread_stack_size);
+    }
+
+  /* OpenMP "soft stacks".  */
   ptx_dev->omp_stacks.ptr = 0;
   ptx_dev->omp_stacks.size = 0;
   pthread_mutex_init (&ptx_dev->omp_stacks.lock, NULL);
-- 
2.34.1
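
The environment variable parsing in the 'nvptx_open_device' hunk above can
be exercised in isolation, without a GPU.  What follows is only a minimal
sketch under stated assumptions, not the plugin's code: the helper name
'parse_stack_size' is hypothetical, and unlike the patch it rejects an empty
string instead of treating it as zero ("use default").  It clears 'errno'
before calling 'strtoul', because 'strtoul' only sets 'errno' on failure and
never resets it.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper, not part of the patch: parse STR as a decimal byte
   count that fits in 'unsigned'.  Returns true and stores the value in *OUT
   on success; returns false and leaves *OUT untouched otherwise.  */
static bool
parse_stack_size (const char *str, unsigned *out)
{
  char *endptr;

  /* 'strtoul' only sets 'errno' on failure, so clear any stale value.  */
  errno = 0;
  unsigned long val = strtoul (str, &endptr, 10);

  if (endptr == str		/* No digits were consumed.  */
      || *endptr != '\0'	/* Trailing characters after the number.  */
      || errno != 0		/* Out of range for 'unsigned long'.  */
      || val > UINT_MAX)	/* Does not fit the 'unsigned' result.  */
    return false;

  *out = (unsigned) val;
  return true;
}
```

With this helper, "4096" and "0" are accepted, whereas "", "12abc", and a
digit string too long for 'unsigned long' are rejected.  A caller would
then treat a stored value of zero as "keep the CUDA default", as the patch
does.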