diff mbox series

[v2,5/8] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK

Message ID 20240628102449.562467-6-ams@baylibre.com
State New
Headers show
Series OpenMP: Unified Shared Memory via Managed Memory | expand

Commit Message

Andrew Stubbs June 28, 2024, 10:24 a.m. UTC
From: Andrew Stubbs <ams@codesourcery.com>

The AMD GCN runtime must be set to the correct mode for Unified Shared Memory
to work, but this is not always clear at compile and link time due to the split
nature of the offload compilation pipeline.

This patch sets a new attribute on OpenMP offload functions to ensure that the
information is passed all the way to the backend.  The backend then places a
marker in the assembler code for mkoffload to find. Finally mkoffload places a
constructor function into the final program to ensure that the HSA_XNACK
environment variable passes the correct mode to the GPU.

The HSA_XNACK variable must be set before the HSA runtime is even loaded, so
it makes more sense to have this set within the constructor than at some point
later within libgomp or the GCN plugin.

Other toolchains require the end-user to set HSA_XNACK manually (or else wonder
why it's not working), so the constructor also checks that any existing manual
setting is compatible with the binary's requirements.

gcc/ChangeLog:

	* config/gcn/gcn.c (unified_shared_memory_enabled): New variable.
	(gcn_init_cumulative_args): Handle attribute "omp unified memory".
	(gcn_hsa_declare_function_name): Emit "MKOFFLOAD OPTIONS: USM+".
	* config/gcn/mkoffload.c (TEST_XNACK_OFF): New macro.
	(process_asm): Detect "MKOFFLOAD OPTIONS: USM+".
	Emit configure_xnack constructor, as required.
	* omp-low.c (create_omp_child_function): Add attribute "omp unified
	memory".
---
 gcc/config/gcn/gcn.cc       | 32 +++++++++++++++++++++++++++++++-
 gcc/config/gcn/mkoffload.cc | 35 ++++++++++++++++++++++++++++++++++-
 gcc/omp-low.cc              |  4 ++++
 3 files changed, 69 insertions(+), 2 deletions(-)

Comments

Tobias Burnus Oct. 29, 2024, 12:27 p.m. UTC | #1
Hi Andrew,

Am 28.06.24 um 12:24 schrieb Andrew Stubbs:
> --- a/gcc/config/gcn/gcn.cc
> +++ b/gcc/config/gcn/gcn.cc
> @@ -70,6 +70,11 @@ static bool ext_gcn_constants_init = 0;
>   
>   enum gcn_isa gcn_isa = ISA_GCN3;	/* Default to GCN3.  */
>   
> +/* Record whether the host compiler added "omp unifed memory" attributes to
> +   any functions.  We can then pass this on to mkoffload to ensure xnack is
> +   compatible there too.  */
> +static bool unified_shared_memory_enabled = false;

…

Why is this needed instead of relying on omp_requires?

> +  if (fndecl && lookup_attribute ("omp unified memory",
> +	case PROCESSOR_GFX1036:
> +	case PROCESSOR_GFX1100:
> +	case PROCESSOR_GFX1103:
> +	  error ("GPU architecture does not support Unified Shared Memory");
> +	  break;

This seems to be wrong in two ways:

First, while not really usable, it feels wrong to print an error (and 
not a warning) if USM is not supported. Running the code with host 
fallback seems to be fine, albeit there is admittedly the question 
whether it makes sense to generate the GCN code in this case.

Secondly, gfx1036 has the property 
HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT, i.e. this APU supports 
USM, albeit not XNACK.

* * *

> +	default:
> +	  if (flag_xnack == HSACO_ATTR_OFF)
> +	    error ("Unified Shared Memory is enabled, but XNACK is disabled");

Likewise – I understand that USM won't work in this case, but the 
question is whether that should be a warning or an error as it does work 
(by using host fallback in this case).

* * *

> +  /* Emit a constructor function to set the HSA_XNACK environment variable.
> +     This must be done before the ROCr runtime library is loaded.
> +     We never override a user value (exit empty string), but we do emit a
> +     useful diagnostic in the wrong mode (the ROCr message is not good.  */
> +  if (TEST_XNACK_OFF (elf_flags) && unified_shared_memory_enabled)
> +    fatal_error (input_location,
> +		 "conflicting settings; XNACK is forced off but Unified "
> +		 "Shared Memory is on");

Is this reachable? I thought the code in gcn-lto1 already prints a 
diagnostic?

Otherwise, the same applies here: error vs. warning.

> +  if (!TEST_XNACK_ANY (elf_flags) || unified_shared_memory_enabled)

I think you need to exclude XNACK UNSUPPORTED on the RHS, albeit I might 
have missed some condition, which ensures that 
unified_shared_memory_enabled is not set in that case.

> +    fprintf (cfile,
> +	     "static __attribute__((constructor))\n"
> +	     "void configure_xnack (void)\n"
> +	     "{\n"

Constructors are somewhat expensive. Why don't you combine all three 
constructors features into a single constructor?

> +	     "  const char *val = getenv (\"HSA_XNACK\");\n"
> +	     "  if (!val || val[0] == '\\0')\n"
> +	     "    setenv (\"HSA_XNACK\", \"%d\", true);\n"
> +	     "  else if (%s)\n"
> +	     "    {\n"
> +	     "      fprintf (stderr, \"error: HSA_XNACK=%%s is incompatible; "
> +			    "please unset\\n\", val);\n"
> +	     "      exit (1);\n"
> +	     "    }\n"

This looks wrong – not having support for USM is not an error but should 
fall back to a host execution.

But, admittedly, XNACK- and HSA_XNACK=1 or +/0 are incompatible, i.e. we 
might want to keep the warning in that case.

> --- a/gcc/omp-low.cc
> +++ b/gcc/omp-low.cc
> @@ -2124,6 +2124,10 @@ create_omp_child_function (omp_context *ctx, bool task_copy)
>   	DECL_ATTRIBUTES (decl)
>   	  = tree_cons (get_identifier (target_attr),
>   		       NULL_TREE, DECL_ATTRIBUTES (decl));
> +      if (omp_requires_mask & OMP_REQUIRES_UNIFIED_SHARED_MEMORY)
> +	DECL_ATTRIBUTES (decl)
> +	  = tree_cons (get_identifier ("omp unified memory"),
> +		       NULL_TREE, DECL_ATTRIBUTES (decl));

As mentioned, it is unclear to me why 'omp_requires' is not enough. In 
any case, it also needs to handle the (admittedly newer) self_maps flag.

Finally, I think it would be user friendly to mention this feature in 
libgomp.texi.

Tobias
diff mbox series

Patch

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index d6531f55190..6a83ff2a1b4 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -70,6 +70,11 @@  static bool ext_gcn_constants_init = 0;
 
 enum gcn_isa gcn_isa = ISA_GCN3;	/* Default to GCN3.  */
 
+/* Record whether the host compiler added "omp unifed memory" attributes to
+   any functions.  We can then pass this on to mkoffload to ensure xnack is
+   compatible there too.  */
+static bool unified_shared_memory_enabled = false;
+
 /* Reserve this much space for LDS (for propagating variables from
    worker-single mode to worker-partitioned mode), per workgroup.  Global
    analysis could calculate an exact bound, but we don't do that yet.
@@ -2942,6 +2947,29 @@  gcn_init_cumulative_args (CUMULATIVE_ARGS *cum /* Argument info to init */ ,
   if (!caller && cfun->machine->normal_function)
     gcn_detect_incoming_pointer_arg (fndecl);
 
+  if (fndecl && lookup_attribute ("omp unified memory",
+				  DECL_ATTRIBUTES (fndecl)))
+    {
+      unified_shared_memory_enabled = true;
+
+      switch (gcn_arch)
+	{
+	case PROCESSOR_FIJI:
+	case PROCESSOR_VEGA10:
+	case PROCESSOR_VEGA20:
+	case PROCESSOR_GFX908:
+	case PROCESSOR_GFX1030:
+	case PROCESSOR_GFX1036:
+	case PROCESSOR_GFX1100:
+	case PROCESSOR_GFX1103:
+	  error ("GPU architecture does not support Unified Shared Memory");
+	  break;
+	default:
+	  if (flag_xnack == HSACO_ATTR_OFF)
+	    error ("Unified Shared Memory is enabled, but XNACK is disabled");
+	}
+    }
+
   reinit_regs ();
 }
 
@@ -6820,12 +6848,14 @@  gcn_hsa_declare_function_name (FILE *file, const char *name,
   fputs (",@function\n", file);
   ASM_OUTPUT_FUNCTION_LABEL (file, name, decl);
 
-  /* This comment is read by mkoffload.  */
+  /* These comments are read by mkoffload.  */
   if (flag_openacc)
     fprintf (file, "\t;; OPENACC-DIMS: %d, %d, %d : %s\n",
 	     oacc_get_fn_dim_size (cfun->decl, GOMP_DIM_GANG),
 	     oacc_get_fn_dim_size (cfun->decl, GOMP_DIM_WORKER),
 	     oacc_get_fn_dim_size (cfun->decl, GOMP_DIM_VECTOR), name);
+  if (unified_shared_memory_enabled)
+    fprintf (asm_out_file, "\t;; MKOFFLOAD OPTIONS: USM+\n");
 }
 
 /* Implement TARGET_ASM_SELECT_SECTION.
diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 810298a799b..3dcb6943c45 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -487,6 +487,7 @@  process_asm (FILE *in, FILE *out, FILE *cfile)
 {
   int fn_count = 0, var_count = 0, ind_fn_count = 0;
   int dims_count = 0, regcount_count = 0;
+  bool unified_shared_memory_enabled = false;
   struct obstack fns_os, dims_os, regcounts_os;
   obstack_init (&fns_os);
   obstack_init (&dims_os);
@@ -511,6 +512,7 @@  process_asm (FILE *in, FILE *out, FILE *cfile)
   fn_count += 2;
 
   char buf[1000];
+  char dummy;
   enum
     { IN_CODE,
       IN_METADATA,
@@ -531,6 +533,9 @@  process_asm (FILE *in, FILE *out, FILE *cfile)
 		dims_count++;
 	      }
 
+	    if (sscanf (buf, " ;; MKOFFLOAD OPTIONS: USM+%c", &dummy) > 0)
+	      unified_shared_memory_enabled = true;
+
 	    break;
 	  }
 	case IN_METADATA:
@@ -591,7 +596,6 @@  process_asm (FILE *in, FILE *out, FILE *cfile)
 	  }
 	}
 
-      char dummy;
       if (sscanf (buf, " .section .gnu.offload_vars%c", &dummy) > 0)
 	{
 	  state = IN_VARS;
@@ -660,6 +664,7 @@  process_asm (FILE *in, FILE *out, FILE *cfile)
   fprintf (cfile, "#include <stdlib.h>\n");
   fprintf (cfile, "#include <stdint.h>\n");
   fprintf (cfile, "#include <stdbool.h>\n\n");
+  fprintf (cfile, "#include <stdio.h>\n\n");
 
   fprintf (cfile, "static const int gcn_num_vars = %d;\n\n", var_count);
   fprintf (cfile, "static const int gcn_num_ind_funcs = %d;\n\n", ind_fn_count);
@@ -713,6 +718,34 @@  process_asm (FILE *in, FILE *out, FILE *cfile)
 	     "}\n\n",
 	     gcn_stack_size);
 
+  /* Emit a constructor function to set the HSA_XNACK environment variable.
+     This must be done before the ROCr runtime library is loaded.
+     We never override a user value (exit empty string), but we do emit a
+     useful diagnostic in the wrong mode (the ROCr message is not good.  */
+  if (TEST_XNACK_OFF (elf_flags) && unified_shared_memory_enabled)
+    fatal_error (input_location,
+		 "conflicting settings; XNACK is forced off but Unified "
+		 "Shared Memory is on");
+  if (!TEST_XNACK_ANY (elf_flags) || unified_shared_memory_enabled)
+    fprintf (cfile,
+	     "static __attribute__((constructor))\n"
+	     "void configure_xnack (void)\n"
+	     "{\n"
+	     "  const char *val = getenv (\"HSA_XNACK\");\n"
+	     "  if (!val || val[0] == '\\0')\n"
+	     "    setenv (\"HSA_XNACK\", \"%d\", true);\n"
+	     "  else if (%s)\n"
+	     "    {\n"
+	     "      fprintf (stderr, \"error: HSA_XNACK=%%s is incompatible; "
+			    "please unset\\n\", val);\n"
+	     "      exit (1);\n"
+	     "    }\n"
+	     "}\n\n",
+	     unified_shared_memory_enabled || TEST_XNACK_ON (elf_flags),
+	     (unified_shared_memory_enabled || TEST_XNACK_ON (elf_flags)
+	      ? "val[0] != '1' || val[1] != '\\0'"
+	      : "val[0] == '1' && val[1] == '\\0'"));
+
   obstack_free (&fns_os, NULL);
   for (i = 0; i < dims_count; i++)
     free (dims[i].name);
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index d3f9ccc4567..b21b48280f2 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -2124,6 +2124,10 @@  create_omp_child_function (omp_context *ctx, bool task_copy)
 	DECL_ATTRIBUTES (decl)
 	  = tree_cons (get_identifier (target_attr),
 		       NULL_TREE, DECL_ATTRIBUTES (decl));
+      if (omp_requires_mask & OMP_REQUIRES_UNIFIED_SHARED_MEMORY)
+	DECL_ATTRIBUTES (decl)
+	  = tree_cons (get_identifier ("omp unified memory"),
+		       NULL_TREE, DECL_ATTRIBUTES (decl));
     }
 
   t = build_decl (DECL_SOURCE_LOCATION (decl),