Message ID | 87y1aue0vd.fsf@euler.schwinge.ddns.net |
---|---|
State | New |
Headers | show |
Series | GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is not fatal (was: [PATCH 7/7 libgomp,amdgcn] GCN Libgomp Plugin) | expand |
On 07/03/2024 11:29, Thomas Schwinge wrote: > Hi! > > On 2019-11-12T13:29:16+0000, Andrew Stubbs <ams@codesourcery.com> wrote: >> This patch contributes the GCN libgomp plugin, with the various >> configure and make bits to go with it. > > An issue with libgomp GCN plugin 'GCN_SUPPRESS_HOST_FALLBACK' (which is > different from the libgomp-level host-fallback execution): > >> --- /dev/null >> +++ b/libgomp/plugin/plugin-gcn.c > >> +/* Flag to decide if the runtime should suppress a possible fallback to host >> + execution. */ >> + >> +static bool suppress_host_fallback; > >> +static void >> +init_environment_variables (void) >> +{ >> + [...] >> + if (secure_getenv ("GCN_SUPPRESS_HOST_FALLBACK")) >> + suppress_host_fallback = true; >> + else >> + suppress_host_fallback = false; > >> +/* Return true if the HSA runtime can run function FN_PTR. */ >> + >> +bool >> +GOMP_OFFLOAD_can_run (void *fn_ptr) >> +{ >> + struct kernel_info *kernel = (struct kernel_info *) fn_ptr; >> + >> + init_kernel (kernel); >> + if (kernel->initialization_failed) >> + goto failure; >> + >> + return true; >> + >> +failure: >> + if (suppress_host_fallback) >> + GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed"); >> + GCN_WARNING ("GCN target cannot be launched, doing a host fallback\n"); >> + return false; >> +} > > This originates in the libgomp HSA plugin, where the idea was -- in my > understanding -- that you wouldn't have device code available for all > 'fn_ptr's, and in that case transparently (shared-memory system!) do > host-fallback execution. Or, with 'GCN_SUPPRESS_HOST_FALLBACK' set, > you'd get those diagnosed. > > This has then been copied into the libgomp GCN plugin (see above). > However, is it really still applicable there; don't we assume that we're > generating device code for all relevant functions? (I suppose everyone > really is testing with 'GCN_SUPPRESS_HOST_FALLBACK' set?) Should we thus > actually remove 'suppress_host_fallback' (that is, make it > always-'true'), including removal of the 'can_run' hook? (I suppose that > even in a future shared-memory "GCN" configuration, we're not expecting > to use this again; expecting always-'true' for 'can_run'?) > > > Now my actual issue: the libgomp GCN plugin then invented an additional > use of 'GCN_SUPPRESS_HOST_FALLBACK': > >> +/* Initialize hsa_context if it has not already been done. >> + Return TRUE on success. */ >> + >> +static bool >> +init_hsa_context (void) >> +{ >> + hsa_status_t status; >> + int agent_index = 0; >> + >> + if (hsa_context.initialized) >> + return true; >> + init_environment_variables (); >> + if (!init_hsa_runtime_functions ()) >> + { >> + GCN_WARNING ("Run-time could not be dynamically opened\n"); >> + if (suppress_host_fallback) >> + GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed"); >> + return false; >> + } > > That is, if 'GCN_SUPPRESS_HOST_FALLBACK' is (routinely) set (for its > original purpose), and you have the libgomp GCN plugin configured, but > don't have 'libhsa-runtime64.so.1' available, you run into a fatal error. > > The libgomp nvptx plugin in such cases silently disables the > plugin/device (and thus lets libgomp proper do its thing), and I propose > we do the same here. OK to push the attached > "GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is not fatal"? If you try to run the offload testsuite on a device that is not properly configured then we want FAIL, not pass-via-fallback. You're breaking that. Andrew
Hi, Thomas Schwinge wrote: > An issue with libgomp GCN plugin 'GCN_SUPPRESS_HOST_FALLBACK' (which is > different from the libgomp-level host-fallback execution): >> +failure: >> + if (suppress_host_fallback) >> + GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed"); >> + GCN_WARNING ("GCN target cannot be launched, doing a host fallback\n"); >> + return false; >> +} > > This originates in the libgomp HSA plugin, where the idea was -- in my > understanding -- that you wouldn't have device code available for all > 'fn_ptr's, and in that case transparently (shared-memory system!) do > host-fallback execution. Or, with 'GCN_SUPPRESS_HOST_FALLBACK' set, > you'd get those diagnosed. > > This has then been copied into the libgomp GCN plugin (see above). > However, is it really still applicable there; don't we assume that we're > generating device code for all relevant functions? (I suppose everyone > really is testing with 'GCN_SUPPRESS_HOST_FALLBACK' set?) First, I think most users do not set GCN_SUPPRESS_HOST_FALLBACK – and it is also not really desirable. If I run on my Linux system the system compiler with nvptx + gcn suppost installed, I get (with a nvptx permission problem): $ GCN_SUPPRESS_HOST_FALLBACK=1 ./a.out libgomp: GCN host fallback has been suppressed And exit code = 1. The same result with '-foffload=disable' or with '-foffload=nvptx-none'. > Should we thus > actually remove 'suppress_host_fallback' (that is, make it > always-'true'), If we want to remove it, we can make it always false - but I am strongly against making it always true. Use OMP_TARGET_OFFLOAD=mandatory (or that GCN env) if you want to prevent the host fallback, but don't break somewhat common systems. Tobias
Hi Andrew! On 2024-03-07T11:38:27+0000, Andrew Stubbs <ams@baylibre.com> wrote: > On 07/03/2024 11:29, Thomas Schwinge wrote: >> On 2019-11-12T13:29:16+0000, Andrew Stubbs <ams@codesourcery.com> wrote: >>> This patch contributes the GCN libgomp plugin, with the various >>> configure and make bits to go with it. >> >> An issue with libgomp GCN plugin 'GCN_SUPPRESS_HOST_FALLBACK' (which is >> different from the libgomp-level host-fallback execution): >> >>> --- /dev/null >>> +++ b/libgomp/plugin/plugin-gcn.c >> >>> +/* Flag to decide if the runtime should suppress a possible fallback to host >>> + execution. */ >>> + >>> +static bool suppress_host_fallback; >> >>> +static void >>> +init_environment_variables (void) >>> +{ >>> + [...] >>> + if (secure_getenv ("GCN_SUPPRESS_HOST_FALLBACK")) >>> + suppress_host_fallback = true; >>> + else >>> + suppress_host_fallback = false; >> >>> +/* Return true if the HSA runtime can run function FN_PTR. */ >>> + >>> +bool >>> +GOMP_OFFLOAD_can_run (void *fn_ptr) >>> +{ >>> + struct kernel_info *kernel = (struct kernel_info *) fn_ptr; >>> + >>> + init_kernel (kernel); >>> + if (kernel->initialization_failed) >>> + goto failure; >>> + >>> + return true; >>> + >>> +failure: >>> + if (suppress_host_fallback) >>> + GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed"); >>> + GCN_WARNING ("GCN target cannot be launched, doing a host fallback\n"); >>> + return false; >>> +} >> >> This originates in the libgomp HSA plugin, where the idea was -- in my >> understanding -- that you wouldn't have device code available for all >> 'fn_ptr's, and in that case transparently (shared-memory system!) do >> host-fallback execution. Or, with 'GCN_SUPPRESS_HOST_FALLBACK' set, >> you'd get those diagnosed. >> >> This has then been copied into the libgomp GCN plugin (see above). >> However, is it really still applicable there; don't we assume that we're >> generating device code for all relevant functions? (I suppose everyone >> really is testing with 'GCN_SUPPRESS_HOST_FALLBACK' set?) Should we thus >> actually remove 'suppress_host_fallback' (that is, make it >> always-'true'), including removal of the 'can_run' hook? (I suppose that >> even in a future shared-memory "GCN" configuration, we're not expecting >> to use this again; expecting always-'true' for 'can_run'?) >> >> >> Now my actual issue: the libgomp GCN plugin then invented an additional >> use of 'GCN_SUPPRESS_HOST_FALLBACK': >> >>> +/* Initialize hsa_context if it has not already been done. >>> + Return TRUE on success. */ >>> + >>> +static bool >>> +init_hsa_context (void) >>> +{ >>> + hsa_status_t status; >>> + int agent_index = 0; >>> + >>> + if (hsa_context.initialized) >>> + return true; >>> + init_environment_variables (); >>> + if (!init_hsa_runtime_functions ()) >>> + { >>> + GCN_WARNING ("Run-time could not be dynamically opened\n"); >>> + if (suppress_host_fallback) >>> + GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed"); >>> + return false; >>> + } >> >> That is, if 'GCN_SUPPRESS_HOST_FALLBACK' is (routinely) set (for its >> original purpose), and you have the libgomp GCN plugin configured, but >> don't have 'libhsa-runtime64.so.1' available, you run into a fatal error. >> >> The libgomp nvptx plugin in such cases silently disables the >> plugin/device (and thus lets libgomp proper do its thing), and I propose >> we do the same here. OK to push the attached >> "GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is not fatal"? > > If you try to run the offload testsuite on a device that is not properly > configured then we want FAIL Exactly, and that's what I'm working towards. (Currently we're not implementing that properly.) But why is 'GCN_SUPPRESS_HOST_FALLBACK' controlling 'init_hsa_runtime_functions' relevant for that? As you know, that function only deals with dynamically loading 'libhsa-runtime64.so.1', and Failure to load that one (because it doesn't exist) should have the agreed-upon behavior of *not* raising an error. (Any other, later errors should be fatal, I certainly agree.) > not pass-via-fallback. You're breaking that. Sorry, I don't follow, please explain? Grüße Thomas
Hi Tobias! On 2024-03-07T12:43:07+0100, Tobias Burnus <tburnus@baylibre.com> wrote: > Thomas Schwinge wrote: >> An issue with libgomp GCN plugin 'GCN_SUPPRESS_HOST_FALLBACK' (which is >> different from the libgomp-level host-fallback execution): >>> +failure: >>> + if (suppress_host_fallback) >>> + GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed"); >>> + GCN_WARNING ("GCN target cannot be launched, doing a host fallback\n"); >>> + return false; >>> +} >> >> This originates in the libgomp HSA plugin, where the idea was -- in my >> understanding -- that you wouldn't have device code available for all >> 'fn_ptr's, and in that case transparently (shared-memory system!) do >> host-fallback execution. Or, with 'GCN_SUPPRESS_HOST_FALLBACK' set, >> you'd get those diagnosed. >> >> This has then been copied into the libgomp GCN plugin (see above). >> However, is it really still applicable there; don't we assume that we're >> generating device code for all relevant functions? (I suppose everyone >> really is testing with 'GCN_SUPPRESS_HOST_FALLBACK' set?) > > First, I think most users do not set GCN_SUPPRESS_HOST_FALLBACK – and it > is also not really desirable. External users probably don't, but certainly all our internal testing is setting it, and also implicitly all nvptx offloading testing: simply by means of having such knob in the libgomp nvptx plugin. That is, the libgomp nvptx plugin has an implicit 'suppress_host_fallback = true' for (the original meaning of) that flag (and does not have the "init"-error behavior that I consider bogus, and try to remove from the libgomp GCN plugin). And, one step back: how is (the original meaning of) 'suppress_host_fallback = false' even supposed to work on non-shared memory systems as currently implemented by the libgomp GCN plugin? > If I run on my Linux system the system compiler with nvptx + gcn suppost > installed, I get (with a nvptx permission problem): > > $ GCN_SUPPRESS_HOST_FALLBACK=1 ./a.out > > libgomp: GCN host fallback has been suppressed > > And exit code = 1. The same result with '-foffload=disable' or with > '-foffload=nvptx-none'. I can't tell if that's what you expect to see there, or not? (For avoidance of doubt: I'm expecting silent host-fallback execution in case that libgomp GCN and/or nvptx plugins are available, but no corresponding devices. That's what my patch achieves.) >> Should we thus >> actually remove 'suppress_host_fallback' (that is, make it >> always-'true'), > > If we want to remove it, we can make it always false - but I am strongly > against making it always true. I'm confused. So you want the GCN and nvptx plugins to behave differently in that regard? What is the rationale for that? In particular also regarding this whole concept of dynamic plugin-level host-fallback execution being in conflict with our current non-shared memory system configurations? > Use OMP_TARGET_OFFLOAD=mandatory (or that GCN env) if you want to > prevent the host fallback, but don't break somewhat common systems. That's an orthogonal concept? Grüße Thomas
On 07/03/2024 13:37, Thomas Schwinge wrote: > Hi Andrew! > > On 2024-03-07T11:38:27+0000, Andrew Stubbs <ams@baylibre.com> wrote: >> On 07/03/2024 11:29, Thomas Schwinge wrote: >>> On 2019-11-12T13:29:16+0000, Andrew Stubbs <ams@codesourcery.com> wrote: >>>> This patch contributes the GCN libgomp plugin, with the various >>>> configure and make bits to go with it. >>> >>> An issue with libgomp GCN plugin 'GCN_SUPPRESS_HOST_FALLBACK' (which is >>> different from the libgomp-level host-fallback execution): >>> >>>> --- /dev/null >>>> +++ b/libgomp/plugin/plugin-gcn.c >>> >>>> +/* Flag to decide if the runtime should suppress a possible fallback to host >>>> + execution. */ >>>> + >>>> +static bool suppress_host_fallback; >>> >>>> +static void >>>> +init_environment_variables (void) >>>> +{ >>>> + [...] >>>> + if (secure_getenv ("GCN_SUPPRESS_HOST_FALLBACK")) >>>> + suppress_host_fallback = true; >>>> + else >>>> + suppress_host_fallback = false; >>> >>>> +/* Return true if the HSA runtime can run function FN_PTR. */ >>>> + >>>> +bool >>>> +GOMP_OFFLOAD_can_run (void *fn_ptr) >>>> +{ >>>> + struct kernel_info *kernel = (struct kernel_info *) fn_ptr; >>>> + >>>> + init_kernel (kernel); >>>> + if (kernel->initialization_failed) >>>> + goto failure; >>>> + >>>> + return true; >>>> + >>>> +failure: >>>> + if (suppress_host_fallback) >>>> + GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed"); >>>> + GCN_WARNING ("GCN target cannot be launched, doing a host fallback\n"); >>>> + return false; >>>> +} >>> >>> This originates in the libgomp HSA plugin, where the idea was -- in my >>> understanding -- that you wouldn't have device code available for all >>> 'fn_ptr's, and in that case transparently (shared-memory system!) do >>> host-fallback execution. Or, with 'GCN_SUPPRESS_HOST_FALLBACK' set, >>> you'd get those diagnosed. >>> >>> This has then been copied into the libgomp GCN plugin (see above). >>> However, is it really still applicable there; don't we assume that we're >>> generating device code for all relevant functions? (I suppose everyone >>> really is testing with 'GCN_SUPPRESS_HOST_FALLBACK' set?) Should we thus >>> actually remove 'suppress_host_fallback' (that is, make it >>> always-'true'), including removal of the 'can_run' hook? (I suppose that >>> even in a future shared-memory "GCN" configuration, we're not expecting >>> to use this again; expecting always-'true' for 'can_run'?) >>> >>> >>> Now my actual issue: the libgomp GCN plugin then invented an additional >>> use of 'GCN_SUPPRESS_HOST_FALLBACK': >>> >>>> +/* Initialize hsa_context if it has not already been done. >>>> + Return TRUE on success. */ >>>> + >>>> +static bool >>>> +init_hsa_context (void) >>>> +{ >>>> + hsa_status_t status; >>>> + int agent_index = 0; >>>> + >>>> + if (hsa_context.initialized) >>>> + return true; >>>> + init_environment_variables (); >>>> + if (!init_hsa_runtime_functions ()) >>>> + { >>>> + GCN_WARNING ("Run-time could not be dynamically opened\n"); >>>> + if (suppress_host_fallback) >>>> + GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed"); >>>> + return false; >>>> + } >>> >>> That is, if 'GCN_SUPPRESS_HOST_FALLBACK' is (routinely) set (for its >>> original purpose), and you have the libgomp GCN plugin configured, but >>> don't have 'libhsa-runtime64.so.1' available, you run into a fatal error. >>> >>> The libgomp nvptx plugin in such cases silently disables the >>> plugin/device (and thus lets libgomp proper do its thing), and I propose >>> we do the same here. OK to push the attached >>> "GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is not fatal"? >> >> If you try to run the offload testsuite on a device that is not properly >> configured then we want FAIL > > Exactly, and that's what I'm working towards. (Currently we're not > implementing that properly.) > > But why is 'GCN_SUPPRESS_HOST_FALLBACK' controlling > 'init_hsa_runtime_functions' relevant for that? As you know, that > function only deals with dynamically loading 'libhsa-runtime64.so.1', and > Failure to load that one (because it doesn't exist) should have the > agreed-upon behavior of *not* raising an error. (Any other, later errors > should be fatal, I certainly agree.) > >> not pass-via-fallback. You're breaking that. > > Sorry, I don't follow, please explain? If the plugin load fails then libgomp will run in host-fallback. In that case, IIRC, this is the *only* opportunity we get to enforce GCN_SUPPRESS_HOST_FALLBACK. As far as I'm aware, that variable is internal, undocumented, meant for dev testing only. It says "I'm testing GCN features and if they're not working then I want to know about it." Users should be using official OMP features. Andrew
Hi Thomas, first, I have the feeling we talk about (more or less) the same code region and use the same words – but we talk about rather different things. Thus, you confuse me (and possibly Andrew) – and my reply confuses you. Thomas Schwinge wrote: > On 2024-03-07T12:43:07+0100, Tobias Burnus<tburnus@baylibre.com> wrote: >> Thomas Schwinge wrote: >>> First, I think most users do not set GCN_SUPPRESS_HOST_FALLBACK – and it >>> is also not really desirable. > External users probably don't, but certainly all our internal testing is > setting it, First, I doubt it – secondly, if it were true, it was broken for the last 5 years or so as we definitely did not notice fails due to not working offload devices. – Neither for AMD GCN nor ... > and also implicitly all nvptx offloading testing: simply by > means of having such knob in the libgomp nvptx plugin. I did see it at some places set for AMD but I do not see any nvptx-specific environment variable which permits to do the same. However: > That is, the > libgomp nvptx plugin has an implicit 'suppress_host_fallback = true' for > (the original meaning of) that flag I think that's one of the problems here – you talk about suppress_host_fallback (implicit, original meaning), while I talk about the GCN_SUPPRESS_HOST_FALLBACK environment variable. Besides all the talk about suppress_host_fallback, 'init_hsa_runtime_functions' is not fatal' of the subject line seems to be something to be considered (beyond the patches you already suggested). >> If I run on my Linux system the system compiler with nvptx + gcn suppost >> installed, I get (with a nvptx permission problem): >> >> $ GCN_SUPPRESS_HOST_FALLBACK=1 ./a.out >> >> libgomp: GCN host fallback has been suppressed >> >> And exit code = 1. The same result with '-foffload=disable' or with >> '-foffload=nvptx-none'. > I can't tell if that's what you expect to see there, or not? Well, obviously not that I get this error by default – and as your wording indicated that the internal variable will be always true – and not only when the env var GCN_SUPPRESS_HOST_FALLBACK is explicit set, I worry that I would get the error any time. > (For avoidance of doubt: I'm expecting silent host-fallback execution in > case that libgomp GCN and/or nvptx plugins are available, but no > corresponding devices. That's what my patch achieves.) I concur that the silent host fallback should happen by default (unless env vars tell otherwise) - at least when either no code was generated for the device (e.g. -foffload=disable) or when the vendor runtime library is not available or no device (be it no hardware or no permission). That's the current behavior and if that remains, my main concern evaporates. * * * >> If we want to remove it, we can make it always false - but I am strongly >> against making it always true. > I'm confused. So you want the GCN and nvptx plugins to behave > differently in that regard? No – or at least: not unless GCN_SUPPRESS_HOST_FALLBACK is set. >> Use OMP_TARGET_OFFLOAD=mandatory (or that GCN env) if you want to >> prevent the host fallback, but don't break somewhat common systems. > That's an orthogonal concept? No – It's the same concept of the main use of the GCN_SUPPRESS_HOST_FALLBACK environment variable: You get a run-time error instead of a silent host fallback. But I have in the whole thread the feeling that – while talking about the same code region and throwing in the same words – we actually talk about completely different things. Tobias
Hi! On 2024-03-07T15:07:32+0100, Tobias Burnus <tburnus@baylibre.com> wrote: > first, I have the feeling we talk about (more or less) the same code > region and use the same words – but we talk about rather different > things. Thus, you confuse me (and possibly Andrew) – and my reply > confuses you. That, indeed, is my impression, too. :-/ And actually the biggest confusion seems to be that both you would like 'GCN_SUPPRESS_HOST_FALLBACK' to mean something else than 'HSA_SUPPRESS_HOST_FALLBACK' originally meant. Hopefully the "GCN: The original meaning of 'GCN_SUPPRESS_HOST_FALLBACK' isn't applicable (non-shared memory system)" does clarify that. Just to close this out, let's try again for the other discussion items: > Thomas Schwinge wrote: >> On 2024-03-07T12:43:07+0100, Tobias Burnus<tburnus@baylibre.com> wrote: >>> Thomas Schwinge wrote: >>>> First, I think most users do not set GCN_SUPPRESS_HOST_FALLBACK – and it >>>> is also not really desirable. >> External users probably don't, but certainly all our internal testing is >> setting it, > > First, I doubt it 'git grep --cached GCN_SUPPRESS_HOST_FALLBACK' in our internal scripts is your friend. > secondly, if it were true, it was broken for the > last 5 years or so as we definitely did not notice fails due to not > working offload devices. – Neither for AMD GCN nor ... You're saying that 'GCN_SUPPRESS_HOST_FALLBACK=1' doesn't report as fatal certain errors during device probing? That's not what the code as well as my experience says. >> and also implicitly all nvptx offloading testing: simply by >> means of having ["no" missing here -- sorry!] such knob in the libgomp nvptx plugin. > > I did see it at some places set for AMD but I do not see any > nvptx-specific environment variable which permits to do the same. Right, that was confusing: there was a "no" missing in that sentence -- sorry! > However: >> That is, the >> libgomp nvptx plugin has an implicit 'suppress_host_fallback = true' for >> (the original meaning of) that flag > > I think that's one of the problems here – you talk about > suppress_host_fallback (implicit, original meaning), while I talk about > the GCN_SUPPRESS_HOST_FALLBACK environment variable. The 'suppress_host_fallback' internal variable directly corresponds to the 'GCN_SUPPRESS_HOST_FALLBACK' environment variable. > Besides all the talk about suppress_host_fallback, > 'init_hsa_runtime_functions' is not fatal' of the subject line seems to > be something to be considered (beyond the patches you already suggested). I'll next submit "GCN, nvptx: Errors during device probing are fatal". >>> If I run on my Linux system the system compiler with nvptx + gcn suppost >>> installed, I get (with a nvptx permission problem): >>> >>> $ GCN_SUPPRESS_HOST_FALLBACK=1 ./a.out >>> >>> libgomp: GCN host fallback has been suppressed >>> >>> And exit code = 1. The same result with '-foffload=disable' or with >>> '-foffload=nvptx-none'. >> I can't tell if that's what you expect to see there, or not? > > Well, obviously In this discussion thread here, nothing was obvious to my anymore... ;-| > not that I get this error by default – and as your > wording indicated that the internal variable will be always true That always-'true' suggestion was only for the *original* meaning of the variable: the use in 'GOMP_OFFLOAD_can_run'. > – and > not only when the env var GCN_SUPPRESS_HOST_FALLBACK is explicit set, I > worry that I would get the error any time. That was exactly the point of my patch in this thread: to get rid of the *additional*/*new* behavior that the libgomp GCN plugin derives from 'GCN_SUPPRESS_HOST_FALLBACK', different from what 'HSA_SUPPRESS_HOST_FALLBACK' originally meant. However, I now understand that Andrew would like to keep that *new* behavior. >> (For avoidance of doubt: I'm expecting silent host-fallback execution in >> case that libgomp GCN and/or nvptx plugins are available, but no >> corresponding devices. That's what my patch achieves.) > > I concur that the silent host fallback should happen by default (unless > env vars tell otherwise) - at least when either no code was generated > for the device (e.g. -foffload=disable) or when the vendor runtime > library is not available or no device (be it no hardware or no permission). > > That's the current behavior and if that remains, my main concern evaporates. ACK, thanks. Grüße Thomas >>> If we want to remove it, we can make it always false - but I am strongly >>> against making it always true. >> I'm confused. So you want the GCN and nvptx plugins to behave >> differently in that regard? > No – or at least: not unless GCN_SUPPRESS_HOST_FALLBACK is set. >>> Use OMP_TARGET_OFFLOAD=mandatory (or that GCN env) if you want to >>> prevent the host fallback, but don't break somewhat common systems. >> That's an orthogonal concept? > > No – It's the same concept of the main use of the > GCN_SUPPRESS_HOST_FALLBACK environment variable: You get a run-time > error instead of a silent host fallback. > > But I have in the whole thread the feeling that – while talking about > the same code region and throwing in the same words – we actually talk > about completely different things. > > Tobias
From f037d2d8274940f042633a0ecb18a53942c075f5 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge <tschwinge@baylibre.com> Date: Thu, 7 Mar 2024 10:43:15 +0100 Subject: [PATCH] GCN: Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is not fatal 'GCN_SUPPRESS_HOST_FALLBACK' controls the libgomp GCN plugin's capability to transparently use host-fallback execution for certain device functions; it shouldn't control failure of libgomp GCN plugin initialization (which libgomp handles fine: triggering use of a different plugin/device, or general host-fallback execution, or fatal error, as applicable). libgomp/ * plugin/plugin-gcn.c (init_hsa_context): Even with 'GCN_SUPPRESS_HOST_FALLBACK' set, failure to 'init_hsa_runtime_functions' is not fatal. --- libgomp/plugin/plugin-gcn.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c index 2771123252a..464164afb03 100644 --- a/libgomp/plugin/plugin-gcn.c +++ b/libgomp/plugin/plugin-gcn.c @@ -1524,8 +1524,6 @@ init_hsa_context (void) if (!init_hsa_runtime_functions ()) { GCN_WARNING ("Run-time could not be dynamically opened\n"); - if (suppress_host_fallback) - GOMP_PLUGIN_fatal ("GCN host fallback has been suppressed"); return false; } status = hsa_fns.hsa_init_fn (); -- 2.34.1