diff mbox series

libgomp: Enable USM for AMD APUs and MI200 devices

Message ID 1cff6e57-a218-475d-8d34-22cb071cdc1f@baylibre.com
State New
Headers show
Series libgomp: Enable USM for AMD APUs and MI200 devices | expand

Commit Message

Tobias Burnus May 29, 2024, 12:15 p.m. UTC
This patch depends (on the libgomp/target.c parts) of the patch
"[patch] libgomp: Enable USM for some nvptx devices",
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652987.html

AMD GPUs that are either APU devices or MI200 [or MI300X]
(with HSA_XNACK=1 set) can access host memory; the run-time library
returns in that case HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT = true.

Thus, it makes sense to enable USM support for those devices, which
this patch does. — A simple test with all unified_shared_memory tests
shipping with sollve_vv now works:*

   Test passed on the device.

as tested on an MI200 series device. In line with (some) other compilers,
it requires that HSA_XNACK=1 is set, otherwise the code will be executed
on the host.

(* Well, for C++, -O2 -fno-exception was used but stillonly 5 test case PASS, 1 delete[] etc. link error 1 ICE (segfault during 
IPA pass: cpin gcn gcc) 1 runtime fail for 
tests/5.2/unified_shared_mem/test_target_struct_obj_access.cpp [**] but 
all 15 Fortran and 16 C tests PASS.)

Comments, remarks, suggestions?
Any reason not to commit it to mainline?

Tobias

PS: Richard confirmed that his gfx1036 APU also has
HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT == true; at least when
he disables the discrete gfx1030, which neither supports xnack not
is an APU.

** rocgdb shows:

Thread 4 "a.out" received signal SIGSEGV, Segmentation fault.
[Switching to thread 4, lane 0 (AMDGPU Lane 1:1:1:1/0 (0,0,0)[0,0,0])]
0x00007ffff7309c30 in main._omp_fn () at tests/5.2/unified_shared_mem/test_target_struct_obj_access.cpp:88
88            if (Emp.name[i] != RefStr[i]) {

but I have not tried to debug this.

Comments

Jakub Jelinek May 29, 2024, 12:24 p.m. UTC | #1
On Wed, May 29, 2024 at 02:15:07PM +0200, Tobias Burnus wrote:
> +      bool b;
> +      hsa_status_t status;
> +      status = hsa_fns.hsa_system_get_info_fn (
> +		 HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT, &b);
> +      if (status != HSA_STATUS_SUCCESS)
> +	GOMP_PLUGIN_error (
> +	  "HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT failed");

Formatting, the (s at the end of lines look terrible.
In the first case, perhaps using a temporary would help,
      hsa_system_info_t arg = HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT;
      status = hsa_fns.hsa_system_get_info_fn (arg, &b);
(or use something else instead of arg, as long as its short), while in the
second
	GOMP_PLUGIN_error ("HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT "
			   "failed");
will do.

Other than that LGTM.

	Jakub
Andrew Stubbs May 31, 2024, 1:03 p.m. UTC | #2
On 29/05/2024 13:15, Tobias Burnus wrote:
> This patch depends (on the libgomp/target.c parts) of the patch
> "[patch] libgomp: Enable USM for some nvptx devices",
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652987.html
> 
> AMD GPUs that are either APU devices or MI200 [or MI300X]
> (with HSA_XNACK=1 set) can access host memory; the run-time library
> returns in that case HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT = true.
> 
> Thus, it makes sense to enable USM support for those devices, which
> this patch does. — A simple test with all unified_shared_memory tests
> shipping with sollve_vv now works:*
> 
>    Test passed on the device.
> 
> as tested on an MI200 series device. In line with (some) other compilers,
> it requires that HSA_XNACK=1 is set, otherwise the code will be executed
> on the host.
> 
> (* Well, for C++, -O2 -fno-exception was used but stillonly 5 test case PASS, 1 delete[] etc. link error 1 ICE (segfault during 
> IPA pass: cpin gcn gcc) 1 runtime fail for 
> tests/5.2/unified_shared_mem/test_target_struct_obj_access.cpp [**] but 
> all 15 Fortran and 16 C tests PASS.)
> 
> Comments, remarks, suggestions?
> Any reason not to commit it to mainline?


> index f9b5d9daf85..3c7be95d7fd 100644
> --- a/include/hsa.h
> +++ b/include/hsa.h
> @@ -466,7 +466,9 @@ typedef enum {
>    /**
>    * String containing the ROCr build identifier.
>    */
> -  HSA_AMD_SYSTEM_INFO_BUILD_VERSION = 0x200
> +  HSA_AMD_SYSTEM_INFO_BUILD_VERSION = 0x200,
> +
> +  HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT = 0x202
>  } hsa_system_info_t;
>  
>  /**

Please don't edit imported files.

Andrew
diff mbox series

Patch

libgomp: Enable USM for AMD APUs and MI200 devices

If HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT is true,
all GPUs on the system support unified shared memory. That's
the case for APUs and MI200 devices when XNACK is enabled.

XNACK can be enabled by setting HSA_XNACK=1 as env var for
supported devices; otherwise, if disable, USM code will
use host fallback.

gcc/ChangeLog:

	* config/gcn/gcn-hsa.h (gcn_local_sym_hash): Fix typo.

include/ChangeLog:

	* hsa.h (HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT): Add
	enum value.

libgomp/ChangeLog:

	* libgomp.texi (gcn): Update USM handling
	* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Handle
	USM if HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT is true.

 gcc/config/gcn/gcn-hsa.h    |  2 +-
 include/hsa.h               |  4 +++-
 libgomp/libgomp.texi        |  9 +++++++--
 libgomp/plugin/plugin-gcn.c | 18 ++++++++++++++++++
 4 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h
index 4611bc55392..03220555075 100644
--- a/gcc/config/gcn/gcn-hsa.h
+++ b/gcc/config/gcn/gcn-hsa.h
@@ -80,7 +80,7 @@  extern unsigned int gcn_local_sym_hash (const char *name);
    writes a new AMD GPU object file and the ABI version needs to be the
    same. - LLVM <= 17 defaults to 4 while LLVM >= 18 defaults to 5.
    GCC supports LLVM >= 13.0.1 and only LLVM >= 14 supports version 5.
-   Note that Fiji is only suppored with LLVM <= 17 as version 3 is no longer
+   Note that Fiji is only supported with LLVM <= 17 as version 3 is no longer
    supported in LLVM >= 18.  */
 #define ABI_VERSION_SPEC "march=fiji:--amdhsa-code-object-version=3;" \
 			 "!march=*|march=*:--amdhsa-code-object-version=4"
diff --git a/include/hsa.h b/include/hsa.h
index f9b5d9daf85..3c7be95d7fd 100644
--- a/include/hsa.h
+++ b/include/hsa.h
@@ -466,7 +466,9 @@  typedef enum {
   /**
   * String containing the ROCr build identifier.
   */
-  HSA_AMD_SYSTEM_INFO_BUILD_VERSION = 0x200
+  HSA_AMD_SYSTEM_INFO_BUILD_VERSION = 0x200,
+
+  HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT = 0x202
 } hsa_system_info_t;
 
 /**
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 22868635230..e79bd7a3392 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -6360,8 +6360,13 @@  The implementation remark:
       such that the next reverse offload region is only executed after the previous
       one returned.
 @item OpenMP code that has a @code{requires} directive with
-      @code{unified_shared_memory} will remove any GCN device from the list of
-      available devices (``host fallback'').
+      @code{unified_shared_memory} is only supported if all AMD GPUs have the
+      @code{HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT} property; for
+      discrete GPUs, this may require setting the @code{HSA_XNACK} environment
+      variable to @samp{1}; for systems with both an APU and a discrete GPU that
+      does not support XNACK, consider using @code{ROCR_VISIBLE_DEVICES} to
+      enable only the APU.  If not supported, all AMD GPU devices are removed
+      from the list of available devices (``host fallback'').
 @item The available stack size can be changed using the @code{GCN_STACK_SIZE}
       environment variable; the default is 32 kiB per thread.
 @item Low-latency memory (@code{omp_low_lat_mem_space}) is supported when the
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 3cdc7ba929f..ba1f51b3d8a 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -3355,8 +3355,26 @@  GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask)
   if (hsa_context.agent_count > 0
       && ((omp_requires_mask
 	   & ~(GOMP_REQUIRES_UNIFIED_ADDRESS
+	       | GOMP_REQUIRES_UNIFIED_SHARED_MEMORY
 	       | GOMP_REQUIRES_REVERSE_OFFLOAD)) != 0))
     return -1;
+  /* Check whether host page access is supported; this is per system level
+     (all GPUs supported by HSA).  While intrinsically true for APUs, it
+     requires XNACK support for discrete GPUs.  */
+  if (hsa_context.agent_count > 0
+      && (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY))
+    {
+      bool b;
+      hsa_status_t status;
+      status = hsa_fns.hsa_system_get_info_fn (
+		 HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT, &b);
+      if (status != HSA_STATUS_SUCCESS)
+	GOMP_PLUGIN_error (
+	  "HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT failed");
+      if (!b)
+	return -1;
+    }
+
   return hsa_context.agent_count;
 }