libgomp/plugin/plugin-gcn.c: Show device number in ISA error

Message ID 2bbc038d-41d2-46f8-806a-fe87baa57708@baylibre.com
State New

Commit Message

Tobias Burnus Nov. 11, 2024, 9:42 a.m. UTC
Currently, for GCN, only one offload ISA is supported; this might lead
to errors when multiple different AMD GPUs are installed on the same system,
at least when using the "wrong" device/device number.

In case of the testsuite, this occurs for instance
with libgomp.c-c++-common/icv-9.c which iterates all devices.

In order to be more helpful, this patch also outputs the device
number:

   libgomp: GCN fatal error: GCN code object ISA 'gfx90a' does not match
   GPU ISA 'gfx906' of device 1.
   Try to recompile with '-foffload-options=-march=gfx906'.

OK for mainline?

Tobias

PS: I increased the buffer size so that the new ' of device ' text,
device numbers <= 99, code-object ISA names like 'gfx10-3-generic',
and GPU ISA names like 'gfx1103' will all fit.
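
The sizing argument in the PS can be checked mechanically. The following is a minimal sketch, not part of the patch: it reuses the format string from the diff, and the helper name isa_mismatch_msg_len is hypothetical. It relies on the C99 behavior of snprintf, which returns the length the output would need when given a NULL buffer of size 0.

```c
#include <assert.h>
#include <stdio.h>

/* Same format string as in the patch under review.  */
#define ISA_MISMATCH_FMT \
  "GCN code object ISA '%s' does not match GPU ISA '%s' of device %d.\n" \
  "Try to recompile with '-foffload-options=-march=%s'.\n"

/* Hypothetical helper (not in plugin-gcn.c): return the number of
   characters the message needs, excluding the terminating NUL.  */
static int
isa_mismatch_msg_len (const char *isa_s, const char *agent_isa_s,
		      int device_id)
{
  /* With a NULL buffer and size 0, snprintf only reports the length.  */
  int n = snprintf (NULL, 0, ISA_MISMATCH_FMT,
		    isa_s, agent_isa_s, device_id, agent_isa_s);
  assert (n >= 0);
  return n;
}
```

Calling isa_mismatch_msg_len ("gfx10-3-generic", "gfx1103", 99) and adding one for the terminating NUL gives the buffer size needed for the case described in the PS, which can then be compared against the old size of 120 and the new size of 146.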

Comments

Andrew Stubbs Nov. 11, 2024, 10:52 a.m. UTC | #1
On 11/11/2024 09:42, Tobias Burnus wrote:
> Currently, for GCN, only one offload ISA is supported; this might lead
> to errors when multiple different AMD GPUs are installed on the same system,
> at least when using the "wrong" device/device number.
> 
> In case of the testsuite, this occurs for instance
> with libgomp.c-c++-common/icv-9.c which iterates all devices.
> 
> In order to be more helpful, this patch also outputs the device
> number:
> 
>    libgomp: GCN fatal error: GCN code object ISA 'gfx90a' does not match
>    GPU ISA 'gfx906' of device 1.
>    Try to recompile with '-foffload-options=-march=gfx906'.
>
> OK for mainline?
> 
> Tobias
> 
> PS: I increased the buffer size so that the new ' of device ' text,
> device numbers <= 99, code-object ISA names like 'gfx10-3-generic',
> and GPU ISA names like 'gfx1103' will all fit.
> 

I think I'd prefer

   libgomp: GCN fatal error: GCN code object ISA 'gfx90a' does not match
   GPU ISA 'gfx906' (device 1).
   Try to recompile with '-foffload-options=-march=gfx906',
   or use ROCR_VISIBLE_DEVICES to disable incompatible devices.

So, brackets instead of "of", and explain how to fix both possible 
issues. Disabling the device will also allow host-fallback to work, 
which might be the right thing for some end-users.

Andrew

Tobias Burnus Nov. 11, 2024, 11:33 a.m. UTC | #2
Hi Andrew,

Andrew Stubbs wrote:
> I think I'd prefer […]
> So, brackets instead of "of", and explain how to fix both possible 
> issues. Disabling the device will also allow host-fallback to work, 
> which might be the right thing for some end-users.

Done so in commit r15-5080-g8473010807a264.

Thanks for the suggestion / review.

Tobias

PS: See 
https://rocm.docs.amd.com/en/latest/conceptual/gpu-isolation.html for 
details on ROCR_VISIBLE_DEVICES, GPU_DEVICE_ORDINAL, HIP_VISIBLE_DEVICES, …
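
For illustration, the wording Andrew proposed in the review could be formatted roughly as follows. This is only a sketch and not the code from the commit: the function name format_isa_mismatch and its signature are hypothetical, and the caller is assumed to supply a buffer large enough for the longer text. The return value of snprintf lets the caller detect truncation.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format the ISA mismatch message using the wording
   suggested in the review.  Returns the snprintf result, i.e. the length
   the full message needs (excluding the NUL); a value >= SIZE means the
   output was truncated.  */
static int
format_isa_mismatch (char *buf, size_t size, const char *isa_s,
		     const char *agent_isa_s, int device_id)
{
  return snprintf (buf, size,
		   "GCN code object ISA '%s' does not match GPU ISA '%s' "
		   "(device %d).\n"
		   "Try to recompile with '-foffload-options=-march=%s',\n"
		   "or use ROCR_VISIBLE_DEVICES to disable incompatible "
		   "devices.\n",
		   isa_s, agent_isa_s, device_id, agent_isa_s);
}
```

Because this text is longer than the variant in the patch below, any fixed-size buffer would have to be sized accordingly.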

Patch

libgomp/plugin/plugin-gcn.c: Show device number in ISA error

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (isa_matches_agent): Mention the device number
	when reporting an ISA mismatch error.

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 592a7b6daba..7718b6fe3bc 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -2414,14 +2414,15 @@  isa_matches_agent (struct agent_info *agent, Elf64_Ehdr *image)
 
   if (isa_field != agent->device_isa)
     {
-      char msg[120];
+      char msg[146];
       const char *agent_isa_s = isa_name (agent->device_isa);
       assert (agent_isa_s);
 
       snprintf (msg, sizeof msg,
-		"GCN code object ISA '%s' does not match GPU ISA '%s'.\n"
+		"GCN code object ISA '%s' does not match GPU ISA '%s' of "
+		"device %d.\n"
 		"Try to recompile with '-foffload-options=-march=%s'.\n",
-		isa_s, agent_isa_s, agent_isa_s);
+		isa_s, agent_isa_s, agent->device_id, agent_isa_s);
 
       hsa_error (msg, HSA_STATUS_ERROR);
       return false;