From patchwork Mon Jul 15 09:29:35 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Thomas Schwinge
X-Patchwork-Id: 1960532
From: Thomas Schwinge
To: gcc-patches@gcc.gnu.org, Andrew Stubbs
Cc: Tobias Burnus, Jakub Jelinek
Subject: GCN: Honor OpenMP 5.1 'num_teams' lower bound (was: [PATCH] libgomp,
 nvptx, v3: Honor OpenMP 5.1 num_teams lower bound)
In-Reply-To: <20211112175804.GJ2710@tucnak>
References: <20211112132023.GC2710@tucnak> <20211112132716.GD2710@tucnak>
 <20211112175804.GJ2710@tucnak>
User-Agent: Notmuch/0.30+8~g47a4bad (https://notmuchmail.org) Emacs/29.4
 (x86_64-pc-linux-gnu)
Date: Mon, 15 Jul 2024 11:29:35 +0200
Message-ID: <87y163t340.fsf@euler.schwinge.ddns.net>
List-Id: Gcc-patches mailing list

Hi!

On 2021-11-12T18:58:04+0100, Jakub Jelinek via Gcc-patches wrote:
> And finally here is a third version, [...]

... which became commit 9fa72756d90e0d9edadf6e6f5f56476029925788
"libgomp, nvptx: Honor OpenMP 5.1 num_teams lower bound".

Attached here is "GCN: Honor OpenMP 5.1 'num_teams' lower bound", which
are exactly the corresponding changes for GCN (see below Jakub's nvptx
changes for reference); OK to push?


Grüße
 Thomas


> 2021-11-12  Jakub Jelinek
>
>         * config/nvptx/team.c (__gomp_team_num): Define as
>         __attribute__((shared)) var.
>         (gomp_nvptx_main): Initialize __gomp_team_num to 0.
>         * config/nvptx/target.c (__gomp_team_num): Declare as
>         extern __attribute__((shared)) var.
>         (GOMP_teams4): Use __gomp_team_num as the team number instead of
>         %ctaid.x.  If first, initialize it to %ctaid.x.  If num_teams_lower
>         is bigger than num_blocks, use num_teams_lower teams and arrange for
>         bumping of __gomp_team_num if !first and returning false once we run
>         out of teams.
>         * config/nvptx/teams.c (__gomp_team_num): Declare as
>         extern __attribute__((shared)) var.
>         (omp_get_team_num): Return __gomp_team_num value instead of %ctaid.x.
>
> --- libgomp/config/nvptx/team.c.jj 2021-05-25 13:43:02.793121350 +0200
> +++ libgomp/config/nvptx/team.c 2021-11-12 17:49:02.847341650 +0100
> @@ -32,6 +32,7 @@
>  #include
>
>  struct gomp_thread *nvptx_thrs __attribute__((shared,nocommon));
> +int __gomp_team_num __attribute__((shared));
>
>  static void gomp_thread_start (struct gomp_thread_pool *);
>
> @@ -57,6 +58,7 @@ gomp_nvptx_main (void (*fn) (void *), vo
>    /* Starting additional threads is not supported.  */
>    gomp_global_icv.dyn_var = true;
>
> +  __gomp_team_num = 0;
>    nvptx_thrs = alloca (ntids * sizeof (*nvptx_thrs));
>    memset (nvptx_thrs, 0, ntids * sizeof (*nvptx_thrs));
>
> --- libgomp/config/nvptx/target.c.jj 2021-11-12 15:57:29.400632875 +0100
> +++ libgomp/config/nvptx/target.c 2021-11-12 17:47:39.499533296 +0100
> @@ -26,28 +26,41 @@
>  #include "libgomp.h"
>  #include
>
> +extern int __gomp_team_num __attribute__((shared));
> +
>  bool
>  GOMP_teams4 (unsigned int num_teams_lower, unsigned int num_teams_upper,
>               unsigned int thread_limit, bool first)
>  {
> +  unsigned int num_blocks, block_id;
> +  asm ("mov.u32 %0, %%nctaid.x;" : "=r" (num_blocks));
>    if (!first)
> -    return false;
> +    {
> +      unsigned int team_num;
> +      if (num_blocks > gomp_num_teams_var)
> +        return false;
> +      team_num = __gomp_team_num;
> +      if (team_num > gomp_num_teams_var - num_blocks)
> +        return false;
> +      __gomp_team_num = team_num + num_blocks;
> +      return true;
> +    }
>    if (thread_limit)
>      {
>        struct gomp_task_icv *icv = gomp_icv (true);
>        icv->thread_limit_var
>          = thread_limit > INT_MAX ? UINT_MAX : thread_limit;
>      }
> -  unsigned int num_blocks, block_id;
> -  asm ("mov.u32 %0, %%nctaid.x;" : "=r" (num_blocks));
> -  asm ("mov.u32 %0, %%ctaid.x;" : "=r" (block_id));
> -  /* FIXME: If num_teams_lower > num_blocks, we want to loop multiple
> -     times for some CTAs.  */
> -  (void) num_teams_lower;
> -  if (!num_teams_upper || num_teams_upper >= num_blocks)
> +  if (!num_teams_upper)
>      num_teams_upper = num_blocks;
> -  else if (block_id >= num_teams_upper)
> +  else if (num_blocks < num_teams_lower)
> +    num_teams_upper = num_teams_lower;
> +  else if (num_blocks < num_teams_upper)
> +    num_teams_upper = num_blocks;
> +  asm ("mov.u32 %0, %%ctaid.x;" : "=r" (block_id));
> +  if (block_id >= num_teams_upper)
>      return false;
> +  __gomp_team_num = block_id;
>    gomp_num_teams_var = num_teams_upper - 1;
>    return true;
>  }
> --- libgomp/config/nvptx/teams.c.jj 2021-05-25 13:43:02.793121350 +0200
> +++ libgomp/config/nvptx/teams.c 2021-11-12 17:37:18.933361024 +0100
> @@ -28,6 +28,8 @@
>
>  #include "libgomp.h"
>
> +extern int __gomp_team_num __attribute__((shared));
> +
>  void
>  GOMP_teams_reg (void (*fn) (void *), void *data, unsigned int num_teams,
>                  unsigned int thread_limit, unsigned int flags)
> @@ -48,9 +50,7 @@ omp_get_num_teams (void)
>  int
>  omp_get_team_num (void)
>  {
> -  int ctaid;
> -  asm ("mov.u32 %0, %%ctaid.x;" : "=r" (ctaid));
> -  return ctaid;
> +  return __gomp_team_num;
>  }
>
>  ialias (omp_get_num_teams)
>
>
>         Jakub


From f078b635f033dcb80ce8cd48de3bf62ad5e285bf Mon Sep 17 00:00:00 2001
From: Thomas Schwinge
Date: Mon, 15 Jul 2024 11:19:28 +0200
Subject: [PATCH] GCN: Honor OpenMP 5.1 'num_teams' lower bound

Corresponding to commit 9fa72756d90e0d9edadf6e6f5f56476029925788
"libgomp, nvptx: Honor OpenMP 5.1 num_teams lower bound", these are the
GCN offloading changes to fix:

    PASS: libgomp.c/../libgomp.c-c++-common/teams-2.c (test for excess errors)
    [-FAIL:-]{+PASS:+} libgomp.c/../libgomp.c-c++-common/teams-2.c execution test

    PASS: libgomp.c++/../libgomp.c-c++-common/teams-2.c (test for excess errors)
    [-FAIL:-]{+PASS:+} libgomp.c++/../libgomp.c-c++-common/teams-2.c execution test

..., and omptests' 't-critical' test case.  I've cross checked that those test
cases are the ones that regress for nvptx offloading, if I locally revert the
"libgomp, nvptx: Honor OpenMP 5.1 num_teams lower bound" changes.

        libgomp/
        * config/gcn/libgomp-gcn.h (GOMP_TEAM_NUM): Inject.
        * config/gcn/target.c (GOMP_teams4): Handle.
        * config/gcn/team.c (gomp_gcn_enter_kernel): Initialize.
        * config/gcn/teams.c (omp_get_team_num): Adjust.
---
 libgomp/config/gcn/libgomp-gcn.h |  9 +++++----
 libgomp/config/gcn/target.c      | 29 ++++++++++++++++++++---------
 libgomp/config/gcn/team.c        |  3 +++
 libgomp/config/gcn/teams.c       |  5 +++--
 4 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/libgomp/config/gcn/libgomp-gcn.h b/libgomp/config/gcn/libgomp-gcn.h
index e94f0c7ae68..48a3741b04d 100644
--- a/libgomp/config/gcn/libgomp-gcn.h
+++ b/libgomp/config/gcn/libgomp-gcn.h
@@ -34,10 +34,11 @@
 #define DEFAULT_TEAM_ARENA_SIZE  (64*1024)
 
 /* These define the LDS location of data needed by OpenMP.  */
-#define TEAM_ARENA_START 16  /* LDS offset of free pointer.  */
-#define TEAM_ARENA_FREE  24  /* LDS offset of free pointer.  */
-#define TEAM_ARENA_END   32  /* LDS offset of end pointer.  */
-#define GCN_LOWLAT_HEAP  40  /* LDS offset of the OpenMP low-latency heap.  */
+#define GOMP_TEAM_NUM    16
+#define TEAM_ARENA_START 24  /* LDS offset of free pointer.  */
+#define TEAM_ARENA_FREE  32  /* LDS offset of free pointer.  */
+#define TEAM_ARENA_END   40  /* LDS offset of end pointer.  */
+#define GCN_LOWLAT_HEAP  48  /* LDS offset of the OpenMP low-latency heap.  */
 
 struct heap
 {
diff --git a/libgomp/config/gcn/target.c b/libgomp/config/gcn/target.c
index 1d4a23cb8d2..e57d2e5f93f 100644
--- a/libgomp/config/gcn/target.c
+++ b/libgomp/config/gcn/target.c
@@ -33,26 +33,37 @@ bool
 GOMP_teams4 (unsigned int num_teams_lower, unsigned int num_teams_upper,
              unsigned int thread_limit, bool first)
 {
+  int __lds *gomp_team_num = (int __lds *) GOMP_TEAM_NUM;
+  unsigned int num_workgroups = __builtin_gcn_dim_size (0);
   if (!first)
-    return false;
+    {
+      unsigned int team_num;
+      if (num_workgroups > gomp_num_teams_var)
+        return false;
+      team_num = *gomp_team_num;
+      if (team_num > gomp_num_teams_var - num_workgroups)
+        return false;
+      *gomp_team_num = team_num + num_workgroups;
+      return true;
+    }
   if (thread_limit)
     {
       struct gomp_task_icv *icv = gomp_icv (true);
       icv->thread_limit_var
         = thread_limit > INT_MAX ? UINT_MAX : thread_limit;
     }
-  unsigned int num_workgroups, workgroup_id;
-  num_workgroups = __builtin_gcn_dim_size (0);
-  workgroup_id = __builtin_gcn_dim_pos (0);
-  /* FIXME: If num_teams_lower > num_workgroups, we want to loop
-     multiple times at least for some workgroups.  */
-  (void) num_teams_lower;
-  if (!num_teams_upper || num_teams_upper >= num_workgroups)
+  if (!num_teams_upper)
     num_teams_upper = ((GOMP_ADDITIONAL_ICVS.nteams > 0
                         && num_workgroups > GOMP_ADDITIONAL_ICVS.nteams)
                        ? GOMP_ADDITIONAL_ICVS.nteams : num_workgroups);
-  else if (workgroup_id >= num_teams_upper)
+  else if (num_workgroups < num_teams_lower)
+    num_teams_upper = num_teams_lower;
+  else if (num_workgroups < num_teams_upper)
+    num_teams_upper = num_workgroups;
+  unsigned int workgroup_id = __builtin_gcn_dim_pos (0);
+  if (workgroup_id >= num_teams_upper)
     return false;
+  *gomp_team_num = workgroup_id;
   gomp_num_teams_var = num_teams_upper - 1;
   return true;
 }
diff --git a/libgomp/config/gcn/team.c b/libgomp/config/gcn/team.c
index bd3df448b52..aa68b3abe0b 100644
--- a/libgomp/config/gcn/team.c
+++ b/libgomp/config/gcn/team.c
@@ -68,6 +68,9 @@ gomp_gcn_enter_kernel (void)
       /* Starting additional threads is not supported.  */
       gomp_global_icv.dyn_var = true;
 
+      int __lds *gomp_team_num = (int __lds *) GOMP_TEAM_NUM;
+      *gomp_team_num = 0;
+
       /* Initialize the team arena for optimized memory allocation.
          The arena has been allocated on the host side, and the address
          passed in via the kernargs.  Each team takes a small slice of it.  */
diff --git a/libgomp/config/gcn/teams.c b/libgomp/config/gcn/teams.c
index 8a91ba8f5c1..57404184c89 100644
--- a/libgomp/config/gcn/teams.c
+++ b/libgomp/config/gcn/teams.c
@@ -44,10 +44,11 @@ omp_get_num_teams (void)
   return gomp_num_teams_var + 1;
 }
 
-int __attribute__ ((__optimize__ ("O2")))
+int
 omp_get_team_num (void)
 {
-  return __builtin_gcn_dim_pos (0);
+  int __lds *gomp_team_num = (int __lds *) GOMP_TEAM_NUM;
+  return *gomp_team_num;
 }
 
 ialias (omp_get_num_teams)
-- 
2.34.1
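
For reference, here is a host-side sketch of how the 'GOMP_teams4' change
above is meant to distribute team numbers when 'num_teams_lower' exceeds the
number of workgroups.  It is not part of the patch.  The names 'wg_state',
'model_teams4' and 'run_grid' are hypothetical, 'thread_limit' and
'GOMP_ADDITIONAL_ICVS' handling are left out, and the driver loop (call once
with 'first' set, then repeat with 'first' cleared until false is returned)
is my assumption about how the compiler-generated code uses the return value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-workgroup model of the state used by the patch.  */
struct wg_state
{
  unsigned int team_num;      /* Models '*gomp_team_num' (LDS slot).  */
  unsigned int num_teams_var; /* Models 'gomp_num_teams_var'.  */
};

/* Mirrors the control flow of the patched 'GOMP_teams4' for one workgroup;
   'workgroup_id' and 'num_workgroups' stand in for the GCN builtins.  */
static bool
model_teams4 (struct wg_state *s, unsigned int num_workgroups,
              unsigned int workgroup_id, unsigned int num_teams_lower,
              unsigned int num_teams_upper, bool first)
{
  if (!first)
    {
      /* Enough workgroups for all teams: nothing left to iterate.  */
      if (num_workgroups > s->num_teams_var)
        return false;
      /* The next stride would step past the last team number.  */
      if (s->team_num > s->num_teams_var - num_workgroups)
        return false;
      s->team_num += num_workgroups;
      return true;
    }
  if (!num_teams_upper)
    num_teams_upper = num_workgroups;
  else if (num_workgroups < num_teams_lower)
    num_teams_upper = num_teams_lower;
  else if (num_workgroups < num_teams_upper)
    num_teams_upper = num_workgroups;
  if (workgroup_id >= num_teams_upper)
    return false;
  s->team_num = workgroup_id;
  s->num_teams_var = num_teams_upper - 1;
  return true;
}

/* Assumed driver: every workgroup loops until 'model_teams4' returns false.
   'hits[t]' counts how often team number t ran; returns the total count.  */
static unsigned int
run_grid (unsigned int num_workgroups, unsigned int num_teams_lower,
          unsigned int num_teams_upper, unsigned int *hits,
          unsigned int max_teams)
{
  unsigned int total = 0;
  for (unsigned int wg = 0; wg < num_workgroups; wg++)
    {
      struct wg_state s = { 0, 0 };
      bool ok = model_teams4 (&s, num_workgroups, wg, num_teams_lower,
                              num_teams_upper, true);
      while (ok)
        {
          assert (s.team_num < max_teams);
          hits[s.team_num]++;
          total++;
          ok = model_teams4 (&s, num_workgroups, wg, num_teams_lower,
                             num_teams_upper, false);
        }
    }
  return total;
}
```

With 4 workgroups and a lower and upper bound of 10, this model has workgroup
0 run teams 0, 4 and 8, workgroup 1 run teams 1, 5 and 9, and workgroups 2
and 3 run two teams each, so every team number from 0 to 9 runs exactly once.
If there are at least as many workgroups as requested teams, each
participating workgroup runs a single team and the '!first' call returns
false right away.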