diff mbox series

AMD GCN: Set HSA_XNACK for USM and 'xnack+' / 'xnack-'

Message ID aa154203-0ed5-4c1d-be40-1df99de935dd@baylibre.com
State New
Headers show
Series AMD GCN: Set HSA_XNACK for USM and 'xnack+' / 'xnack-' | expand

Commit Message

Tobias Burnus Oct. 29, 2024, 11:44 a.m. UTC
While users can set HSA_XNACK themselves, it is much more convenient if
the compiler sets it for them (at least if it is overriddable).

Some systems don't have XNACK, but for those that have it, the somewhat
newisher object code versions support three modes: unset (GCC: '-mxnack=any';
supporting both XNACK and not), set and unset; the last two only work if the
compiled-for mode matches the actual mode in which the GPU is running.
Therefore, setting HSA_XNACK in this case makes sense.

XNACK (when available) also needs to be enabled in order to have a working
unified-shared memory access, hence, setting it in that case also makes sense.
Therefore, this patch sets HSA_XNACK to 0 or 1.

This somewhat matches what is done in OG13 and in Andrew's patch at
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655951.html
albeit the code is somewhat different.
[For some reasons, this code is not in OG14 ?!?]

While doing so, I also updated the documentation and moved the code from
the existing stack-size constructor in the existing 'init' constructor to
reduce the number of used constructors.

OK for mainline?

Tobias

Comments

Andrew Stubbs Oct. 29, 2024, 12:07 p.m. UTC | #1
On 29/10/2024 11:44, Tobias Burnus wrote:
> While users can set HSA_XNACK themselves, it is much more convenient if
> the compiler sets it for them (at least if it is overriddable).
> 
> Some systems don't have XNACK, but for those that have it, the somewhat
> newisher object code versions support three modes: unset (GCC: '- 
> mxnack=any';
> supporting both XNACK and not), set and unset; the last two only work if 
> the
> compiled-for mode matches the actual mode in which the GPU is running.
> Therefore, setting HSA_XNACK in this case makes sense.
> 
> XNACK (when available) also needs to be enabled in order to have a working
> unified-shared memory access, hence, setting it in that case also makes 
> sense.
> Therefore, this patch sets HSA_XNACK to 0 or 1.
> 
> This somewhat matches what is done in OG13 and in Andrew's patch at
> https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655951.html
> albeit the code is somewhat different.
> [For some reasons, this code is not in OG14 ?!?]
> 
> While doing so, I also updated the documentation and moved the code from
> the existing stack-size constructor in the existing 'init' constructor to
> reduce the number of used constructors.
> 
> OK for mainline?

This conflicts with my patch that already does (some of) this that is 
submitted as part of the USM series that is still awaiting review.

https://patchwork.sourceware.org/project/gcc/patch/20240628102449.562467-6-ams@baylibre.com/

Andrew
Tobias Burnus Oct. 29, 2024, 12:10 p.m. UTC | #2
Hi Andrew,

Am 29.10.24 um 13:07 schrieb Andrew Stubbs:
> On 29/10/2024 11:44, Tobias Burnus wrote:
>> This somewhat matches what is done in OG13 and in Andrew's patch at
>> https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655951.html
>> albeit the code is somewhat different.
>> [For some reasons, this code is not in OG14 ?!?]
...
> This conflicts with my patch that already does (some of) this that is 
> submitted as part of the USM series that is still awaiting review.
>
> https://patchwork.sourceware.org/project/gcc/patch/20240628102449.562467-6-ams@baylibre.com/ 
>

Well, you you go to the link above, it shows the same patch …

Tobias
Andrew Stubbs Oct. 29, 2024, 12:15 p.m. UTC | #3
On 29/10/2024 12:10, Tobias Burnus wrote:
> Hi Andrew,
> 
> Am 29.10.24 um 13:07 schrieb Andrew Stubbs:
>> On 29/10/2024 11:44, Tobias Burnus wrote:
>>> This somewhat matches what is done in OG13 and in Andrew's patch at
>>> https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655951.html
>>> albeit the code is somewhat different.
>>> [For some reasons, this code is not in OG14 ?!?]
> ...
>> This conflicts with my patch that already does (some of) this that is 
>> submitted as part of the USM series that is still awaiting review.
>>
>> https://patchwork.sourceware.org/project/gcc/ 
>> patch/20240628102449.562467-6-ams@baylibre.com/
> 
> Well, you you go to the link above, it shows the same patch …

So, can we get the USM stuff reviewed and cleared before we deliberately 
introduce conflicts?

BTW, it's not on OG14 because it was delayed while I was reworking it to 
repost. I should have pushed the series there at that time, but I 
forgot. :-(  I should fix that.

Andrew
diff mbox series

Patch

AMD GCN: Set HSA_XNACK for USM and 'xnack+' / 'xnack-'

Code compiled with explicit 'xnack-' (-mxnack=off) or 'xnack+' (-mnack=on)
only runs when the hardware mode matches.  Additionally, on GPUs that support
XNACK, the GPU has to use this mode in order that unified-shared memory access
is possible.

This commit adds code to the constructor, created by mkoffload.cc, to set the
HSA_XNACK environment variable (if not set by the user) - either to 0 for
'xnack-' or to 1 for 'xnack+' and if the code contains
'omp requires self_maps' or '... unified_shared_memory'.

There is a compile-time warning when combining 'xnack-' with USM. At runtime,
when XNACK is supported but not enabled, the GPU is excluded (host fallback).

The new setenv has been added to the 'init' constructor - and the existing
setenv code for the stack size has been moved there to reduce the pointless
larger overhead due having multiple constructors.

Note: In GCC, gfx90{0,6,8} default to 'xnack-'; hence, the constructor will
set HSA_XNACK=0 by default. 

Note 2: There is probably a preexisting endian issue in handling omp_requires
if the host compiler and the target compiler (in the GCC sense) have a different
endianess, which would be fixable. [If target and offload target would have
different endianess, it would be mostly unfixable, esp. with USM.]

Co-Authored-By: Andrew Stubbs <ams@baylibre.com>

gcc/ChangeLog:

	* config/gcn/mkoffload.cc (process_asm): Take omp_requires argument;
	extend conditions for requiring an '#include' in the generated C file.
	Remove code generation for the GCN_STACK_SIZE constructor.
	(process_obj): Inside the generated 'init' constructor, set
	GCN_STACK_SIZE and HSA_XNACK, if applicable.
	(main): Update process_asm call.

libgomp/ChangeLog:

	* libgomp.texi (nvptx): Mention 'self_maps' besides
	'unified_shared_memory' clause.
	(AMD GCN): Likewise; state that GCC might automatically
	set HSA_XNACK.
	* testsuite/libgomp.c-c++-common/requires-4.c: Add dg-prune-output
	for the -mxnack=off with USM warning.
	* testsuite/libgomp.c-c++-common/requires-4a.c: Likewise.
	* testsuite/libgomp.c-c++-common/requires-5.c: Likewise.
	* testsuite/libgomp.c-c++-common/requires-7.c: Likewise.
	* testsuite/libgomp.c-c++-common/target-implicit-map-4.c: Likewise.
	* testsuite/libgomp.c-c++-common/target-link-3.c: Likewise.
	* testsuite/libgomp.c-c++-common/target-link-4.c: Likewise.
	* testsuite/libgomp.fortran/self_maps.f90: Likewise.

 gcc/config/gcn/mkoffload.cc                        | 67 ++++++++++++++++------
 libgomp/libgomp.texi                               | 24 +++++---
 .../testsuite/libgomp.c-c++-common/requires-4.c    |  5 ++
 .../testsuite/libgomp.c-c++-common/requires-4a.c   |  5 ++
 .../testsuite/libgomp.c-c++-common/requires-5.c    |  5 ++
 .../testsuite/libgomp.c-c++-common/requires-7.c    |  5 ++
 .../libgomp.c-c++-common/target-implicit-map-4.c   |  5 ++
 .../testsuite/libgomp.c-c++-common/target-link-3.c |  5 ++
 .../testsuite/libgomp.c-c++-common/target-link-4.c |  5 ++
 libgomp/testsuite/libgomp.fortran/self_maps.f90    |  5 ++
 10 files changed, 105 insertions(+), 26 deletions(-)

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 17a33421134..27066ba1be3 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -140,6 +140,7 @@  uint32_t elf_arch = EF_AMDGPU_MACH_AMDGCN_GFX900;  // Default GPU architecture.
 uint32_t elf_flags = EF_AMDGPU_FEATURE_SRAMECC_UNSUPPORTED_V4;
 
 static int gcn_stack_size = 0;  /* Zero means use default.  */
+bool xnack_supported = false;
 
 /* Delete tempfiles.  */
 
@@ -442,7 +443,7 @@  copy_early_debug_info (const char *infile, const char *outfile)
    encoded as structured data.  */
 
 static void
-process_asm (FILE *in, FILE *out, FILE *cfile)
+process_asm (FILE *in, FILE *out, FILE *cfile, uint32_t omp_requires)
 {
   int fn_count = 0, var_count = 0, ind_fn_count = 0;
   int dims_count = 0, regcount_count = 0;
@@ -616,7 +617,14 @@  process_asm (FILE *in, FILE *out, FILE *cfile)
   struct oaccdims *dims = XOBFINISH (&dims_os, struct oaccdims *);
   struct regcount *regcounts = XOBFINISH (&regcounts_os, struct regcount *);
 
-  if (gcn_stack_size)
+  /* Used in the constructor, generated in process_obj.  */
+  if (gcn_stack_size
+      || TEST_XNACK_OFF (elf_flags)
+      || TEST_XNACK_ON (elf_flags)
+      || (SET_XNACK_ANY (elf_flags)
+	  && (omp_requires
+	      & (GOMP_REQUIRES_UNIFIED_SHARED_MEMORY
+		 | GOMP_REQUIRES_SELF_MAPS))))
     {
       fprintf (cfile, "#include <stdlib.h>\n");
       fprintf (cfile, "#include <stdbool.h>\n\n");
@@ -662,18 +670,6 @@  process_asm (FILE *in, FILE *out, FILE *cfile)
     }
   fprintf (cfile, "\n};\n\n");
 
-  /* Set the stack size if the user configured a value.  */
-  if (gcn_stack_size)
-    fprintf (cfile,
-	     "static __attribute__((constructor))\n"
-	     "void configure_stack_size (void)\n"
-	     "{\n"
-	     "  const char *val = getenv (\"GCN_STACK_SIZE\");\n"
-	     "  if (!val || val[0] == '\\0')\n"
-	     "    setenv (\"GCN_STACK_SIZE\", \"%d\", true);\n"
-	     "}\n\n",
-	     gcn_stack_size);
-
   obstack_free (&fns_os, NULL);
   for (i = 0; i < dims_count; i++)
     free (dims[i].name);
@@ -736,8 +732,42 @@  process_obj (const char *fname_in, FILE *cfile, uint32_t omp_requires)
 
   fprintf (cfile, "extern const void *const __OFFLOAD_TABLE__[];\n\n");
 
+  /* Create the constructor to handle stack size and xnack and to call
+     GOMP_offload_register_ver. */
+
   fprintf (cfile, "static __attribute__((constructor)) void init (void)\n"
-	   "{\n"
+	   "{\n");
+
+  /* Set the stack size if the user configured a value.  */
+  if (gcn_stack_size)
+    fprintf (cfile,
+	     "  const char *val2 = getenv (\"GCN_STACK_SIZE\");\n"
+	     "  if (!val2 || val2[0] == '\\0')\n"
+	     "    setenv (\"GCN_STACK_SIZE\", \"%d\", true);\n",
+	     gcn_stack_size);
+
+  /* If the user explicitly disabled XNACK ('xnack-', -mxnack=off), set at
+     runtime HSA_XNACK=0.  Otherwise, if either USM was requested or explicitly
+     compiled for XNACK ('xnack-', -mxnack=off), set HSA_XNACK=1.
+     In either case, a user-set HSA_XNACK env var is not overridden.  */
+  if (TEST_XNACK_OFF (elf_flags)
+      && (omp_requires & (GOMP_REQUIRES_UNIFIED_SHARED_MEMORY
+			  | GOMP_REQUIRES_SELF_MAPS)))
+    warning (0, "%<-mxnack=off%> is incompatible with OpenMP's "
+		"%<self_maps%> and %<unified_shared_memory%> clauses");
+  if (TEST_XNACK_OFF (elf_flags)
+      || TEST_XNACK_ON (elf_flags)
+      || (SET_XNACK_ANY (elf_flags)
+	  && (omp_requires
+	      & (GOMP_REQUIRES_UNIFIED_SHARED_MEMORY
+		 | GOMP_REQUIRES_SELF_MAPS))))
+    fprintf (cfile,
+	     "  const char *val = getenv (\"HSA_XNACK\");\n"
+	     "  if (!val || val[0] == '\\0')\n"
+	     "    setenv (\"HSA_XNACK\", \"%d\", true);\n",
+	     TEST_XNACK_OFF (elf_flags) ? 0 : 1);
+
+  fprintf (cfile,
 	   "  GOMP_offload_register_ver (%#x, __OFFLOAD_TABLE__,"
 	   " %d/*GCN*/, &gcn_data);\n"
 	   "};\n",
@@ -1016,7 +1046,9 @@  main (int argc, char **argv)
       gcc_unreachable ();
     }
 
-  /* Set the default ELF flags for XNACK.  */
+  /* Set the default ELF flags for XNACK; additionally set
+     xnack_supported for ANY but not for those defaulting
+     to NO.  */
   switch (elf_arch)
     {
 #define GCN_DEVICE(name, NAME, ELF, ISA, XNACK, SRAM, ...) \
@@ -1024,6 +1056,7 @@  main (int argc, char **argv)
 #define HSACO_ATTR_UNSUPPORTED SET_XNACK_UNSET (elf_flags)
 #define HSACO_ATTR_OFF SET_XNACK_OFF (elf_flags)
 #define HSACO_ATTR_ANY \
+      xnack_supported = true; \
       if (TEST_XNACK_UNSET (elf_flags)) SET_XNACK_ANY (elf_flags)
 #include "gcn-devices.def"
 #undef HSACO_ATTR_UNSUPPORTED
@@ -1250,7 +1283,7 @@  main (int argc, char **argv)
       if (!out)
 	fatal_error (input_location, "cannot open %qs", gcn_s2_name);
 
-      process_asm (in, out, cfile);
+      process_asm (in, out, cfile, omp_requires);
 
       fclose (in);
       fclose (out);
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 6860963f368..667edd9ecc4 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -6576,14 +6576,20 @@  The implementation remark:
       @code{device(ancestor:1)}) are processed serially per @code{target} region
       such that the next reverse offload region is only executed after the previous
       one returned.
-@item OpenMP code that has a @code{requires} directive with
-      @code{unified_shared_memory} is only supported if all AMD GPUs have the
-      @code{HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT} property; for
-      discrete GPUs, this may require setting the @code{HSA_XNACK} environment
-      variable to @samp{1}; for systems with both an APU and a discrete GPU that
-      does not support XNACK, consider using @code{ROCR_VISIBLE_DEVICES} to
-      enable only the APU.  If not supported, all AMD GPU devices are removed
+@item OpenMP code that has a @code{requires} directive with a @code{self_maps} or
+      @code{unified_shared_memory} clause is only supported if all AMD GPUs have
+      the @code{HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT} property.  With
+      some discrete GPUs, this requires that the @code{HSA_XNACK} environment
+      variable is set to @samp{1}. For systems with both an APU and a discrete
+      GPU that does not support XNACK, consider using @code{ROCR_VISIBLE_DEVICES}
+      to enable only the APU.  If not supported, all AMD GPU devices are removed
       from the list of available devices (``host fallback'').
+@item If unset and XNACK is supported by the architecture, GCC automatically
+      sets the  @code{HSA_XNACK} environment variable as follows:  It is set to
+      @samp{0} when compiled with @code{-mxnack=off}; otherwise, it is set to
+      @samp{1} if either compiled with @code{-mxnack=on} or if the OpenMP
+      @code{self_maps} or @code{unified_shared_memory} requirement clause has
+      been specified. Else, it is not set.
 @item The available stack size can be changed using the @code{GCN_STACK_SIZE}
       environment variable; the default is 32 kiB per thread.
 @item Low-latency memory (@code{omp_low_lat_mem_space}) is supported when the
@@ -6663,8 +6669,8 @@  The implementation remark:
       Per device, reverse offload regions are processed serially such that
       the next reverse offload region is only executed after the previous
       one returned.
-@item OpenMP code that has a @code{requires} directive with
-      @code{unified_shared_memory} runs on nvptx devices if and only if
+@item OpenMP code that has a @code{requires} directive with a @code{self_maps} or
+      @code{unified_shared_memory} clause runs on nvptx devices if and only if
       all of those support the @code{pageableMemoryAccess} property;@footnote{
       @uref{https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-requirements}}
       otherwise, all nvptx device are removed from the list of available
diff --git a/libgomp/testsuite/libgomp.c-c++-common/requires-4.c b/libgomp/testsuite/libgomp.c-c++-common/requires-4.c
index 8cb4821ee53..e76c6ff4f5f 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/requires-4.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/requires-4.c
@@ -34,3 +34,8 @@  main (void)
   foo ();
   return 0;
 }
+
+/* AMD GCN's gfx900/gfx906/gfx908 support in principle the XNACK flag, but the support is lurkwarm;
+   hence, GCC defaults to OFF, causing the warning
+     gcn mkoffload: warning: ‘-mxnack=off’ is incompatible with OpenMP's ‘self_maps’ and ‘unified_shared_memory’ clauses */
+/* { dg-prune-output "mxnack=off" } */
diff --git a/libgomp/testsuite/libgomp.c-c++-common/requires-4a.c b/libgomp/testsuite/libgomp.c-c++-common/requires-4a.c
index 0e0db927c2c..de7412913f1 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/requires-4a.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/requires-4a.c
@@ -38,3 +38,8 @@  main (void)
   __builtin_free (a);
   return 0;
 }
+
+/* AMD GCN's gfx900/gfx906/gfx908 support in principle the XNACK flag, but the support is lurkwarm;
+   hence, GCC defaults to OFF, causing the warning
+     gcn mkoffload: warning: ‘-mxnack=off’ is incompatible with OpenMP's ‘self_maps’ and ‘unified_shared_memory’ clauses */
+/* { dg-prune-output "mxnack=off" } */
diff --git a/libgomp/testsuite/libgomp.c-c++-common/requires-5.c b/libgomp/testsuite/libgomp.c-c++-common/requires-5.c
index d43d78db6fa..dda758c318f 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/requires-5.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/requires-5.c
@@ -28,3 +28,8 @@  main (void)
   foo ();
   return 0;
 }
+
+/* AMD GCN's gfx900/gfx906/gfx908 support in principle the XNACK flag, but the support is lurkwarm;
+   hence, GCC defaults to OFF, causing the warning
+     gcn mkoffload: warning: ‘-mxnack=off’ is incompatible with OpenMP's ‘self_maps’ and ‘unified_shared_memory’ clauses */
+/* { dg-prune-output "mxnack=off" } */
diff --git a/libgomp/testsuite/libgomp.c-c++-common/requires-7.c b/libgomp/testsuite/libgomp.c-c++-common/requires-7.c
index 63afcc92b9a..7f5dd28bb2c 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/requires-7.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/requires-7.c
@@ -30,3 +30,8 @@  main (void)
      ..., but we may still verify that the rest of the diagnostic is correct:
      { dg-note {' has 'unified_address'} {} { target *-*-* } 0 } */
 /* { dg-excess-errors "Ignore messages like: errors during merging of translation units|mkoffload returned 1 exit status" } */
+
+/* AMD GCN's gfx900/gfx906/gfx908 support in principle the XNACK flag, but the support is lurkwarm;
+   hence, GCC defaults to OFF, causing the warning
+     gcn mkoffload: warning: ‘-mxnack=off’ is incompatible with OpenMP's ‘self_maps’ and ‘unified_shared_memory’ clauses */
+/* { dg-prune-output "mxnack=off" } */
diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-implicit-map-4.c b/libgomp/testsuite/libgomp.c-c++-common/target-implicit-map-4.c
index d0b0cd178c0..c7de68ed410 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/target-implicit-map-4.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-implicit-map-4.c
@@ -157,3 +157,8 @@  main()
     test_device (i);
   return 0;
 }
+
+/* AMD GCN's gfx900/gfx906/gfx908 support in principle the XNACK flag, but the support is lurkwarm;
+   hence, GCC defaults to OFF, causing the warning
+     gcn mkoffload: warning: ‘-mxnack=off’ is incompatible with OpenMP's ‘self_maps’ and ‘unified_shared_memory’ clauses */
+/* { dg-prune-output "mxnack=off" } */
diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-link-3.c b/libgomp/testsuite/libgomp.c-c++-common/target-link-3.c
index c707b38b7d4..5de44cc4532 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/target-link-3.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-link-3.c
@@ -50,3 +50,8 @@  main ()
       q = 42;
     }
 }
+
+/* AMD GCN's gfx900/gfx906/gfx908 support in principle the XNACK flag, but the support is lurkwarm;
+   hence, GCC defaults to OFF, causing the warning
+     gcn mkoffload: warning: ‘-mxnack=off’ is incompatible with OpenMP's ‘self_maps’ and ‘unified_shared_memory’ clauses */
+/* { dg-prune-output "mxnack=off" } */
diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-link-4.c b/libgomp/testsuite/libgomp.c-c++-common/target-link-4.c
index 785055e216d..a624e969819 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/target-link-4.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-link-4.c
@@ -50,3 +50,8 @@  main ()
       q = 42;
     }
 }
+
+/* AMD GCN's gfx900/gfx906/gfx908 support in principle the XNACK flag, but the support is lurkwarm;
+   hence, GCC defaults to OFF, causing the warning
+     gcn mkoffload: warning: ‘-mxnack=off’ is incompatible with OpenMP's ‘self_maps’ and ‘unified_shared_memory’ clauses */
+/* { dg-prune-output "mxnack=off" } */
diff --git a/libgomp/testsuite/libgomp.fortran/self_maps.f90 b/libgomp/testsuite/libgomp.fortran/self_maps.f90
index 208fd1c71d5..8486e769101 100644
--- a/libgomp/testsuite/libgomp.fortran/self_maps.f90
+++ b/libgomp/testsuite/libgomp.fortran/self_maps.f90
@@ -47,3 +47,8 @@  do i = 0, omp_get_num_devices()
   !$omp end target
 end do
 end
+
+! AMD GCN's gfx900/gfx906/gfx908 support in principle the XNACK flag, but the support is lurkwarm;
+! hence, GCC defaults to OFF, causing the warning
+!   gcn mkoffload: warning: ‘-mxnack=off’ is incompatible with OpenMP's ‘self_maps’ and ‘unified_shared_memory’ clauses
+! { dg-prune-output "mxnack=off" }