Message ID | 2bbc038d-41d2-46f8-806a-fe87baa57708@baylibre.com |
---|---|
State | New |
Series | libgomp/plugin/plugin-gcn.c: Show device number in ISA error |
On 11/11/2024 09:42, Tobias Burnus wrote:
> Currently, for GCN, only one offload ISA is supported; this might lead
> to errors when multiple different AMD GPUs are installed on the same system,
> at least when using the "wrong" device/device number.
>
> In case of the testsuite, this occurs for instance
> with libgomp.c-c++-common/icv-9.c which iterates all devices.
>
> In order to be more helpful, this patch also outputs the device
> number:
>
> libgomp: GCN fatal error: GCN code object ISA 'gfx90a' does not match
> GPU ISA 'gfx906' of device 1.
> Try to recompile with '-foffload-options=-march=gfx906'.
>
> OK for mainline?
>
> Tobias
>
> PS: I increased the buffer size to ensure the new ' of device '
> and device numbers <= 99 and ISA names like 'gfx10-3-generic'
> and GPU ISA names like gfx1103 will fit.

I think I'd prefer

libgomp: GCN fatal error: GCN code object ISA 'gfx90a' does not match GPU ISA 'gfx906' (device 1).
Try to recompile with '-foffload-options=-march=gfx906', or use ROCR_VISIBLE_DEVICES to disable incompatible devices.

So, brackets instead of "of", and explain how to fix both possible issues. Disabling the device will also allow host-fallback to work, which might be the right thing for some end-users.

Andrew
Hi Andrew,

Andrew Stubbs wrote:
> I think I'd prefer […]
> So, brackets instead of "of", and explain how to fix both possible
> issues. Disabling the device will also allow host-fallback to work,
> which might be the right thing for some end-users.

Done so in commit r15-5080-g8473010807a264.

Thanks for the suggestion / review.

Tobias

PS: See https://rocm.docs.amd.com/en/latest/conceptual/gpu-isolation.html for details on ROCR_VISIBLE_DEVICES, GPU_DEVICE_ORDINAL, HIP_VISIBLE_DEVICES, …
libgomp/plugin/plugin-gcn.c: Show device number in ISA error

libgomp/ChangeLog:

        * plugin/plugin-gcn.c (isa_matches_agent): Mention the device number
        when reporting an ISA mismatch error.

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 592a7b6daba..7718b6fe3bc 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -2414,14 +2414,15 @@ isa_matches_agent (struct agent_info *agent, Elf64_Ehdr *image)
 
   if (isa_field != agent->device_isa)
     {
-      char msg[120];
+      char msg[146];
       const char *agent_isa_s = isa_name (agent->device_isa);
       assert (agent_isa_s);
 
       snprintf (msg, sizeof msg,
-                "GCN code object ISA '%s' does not match GPU ISA '%s'.\n"
+                "GCN code object ISA '%s' does not match GPU ISA '%s' of "
+                "device %d.\n"
                 "Try to recompile with '-foffload-options=-march=%s'.\n",
-                isa_s, agent_isa_s, agent_isa_s);
+                isa_s, agent_isa_s, agent->device_id, agent_isa_s);
 
       hsa_error (msg, HSA_STATUS_ERROR);
       return false;
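
The PS in the cover letter says the buffer was enlarged so that the new ' of device ' text, device numbers <= 99, a code object ISA name like 'gfx10-3-generic' and a GPU ISA name like gfx1103 still fit. The following is a minimal standalone sketch of how that claim could be checked. It is not part of libgomp: the helper names isa_msg_length and isa_msg_fits and the macro ISA_MSG_SIZE are hypothetical. It uses the format string from the diff as posted, not the reworded message of the later commit, whose exact text is not shown in this thread.

```c
#include <assert.h>
#include <stdio.h>

/* Buffer size taken from the posted patch (char msg[146]).  */
#define ISA_MSG_SIZE 146

/* Hypothetical helper, not part of libgomp: return the number of
   characters the ISA-mismatch message needs, excluding the trailing NUL.
   With a NULL buffer and size 0, snprintf only measures the output.  */
static int
isa_msg_length (const char *code_isa, const char *gpu_isa, int device_id)
{
  assert (code_isa && gpu_isa);
  return snprintf (NULL, 0,
                   "GCN code object ISA '%s' does not match GPU ISA '%s' of "
                   "device %d.\n"
                   "Try to recompile with '-foffload-options=-march=%s'.\n",
                   code_isa, gpu_isa, device_id, gpu_isa);
}

/* Hypothetical helper: nonzero if the message plus its trailing NUL
   fits into a buffer of ISA_MSG_SIZE bytes.  */
static int
isa_msg_fits (const char *code_isa, const char *gpu_isa, int device_id)
{
  int len = isa_msg_length (code_isa, gpu_isa, device_id);
  return len >= 0 && len < ISA_MSG_SIZE;
}
```

Calling isa_msg_fits ("gfx10-3-generic", "gfx1103", 99) exercises the case named in the PS. Longer GPU ISA names or device numbers above 99 need more space; since the patched code passes sizeof msg to snprintf, such a message would be truncated rather than overflow the buffer.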