diff mbox

[3/3] Handle GOMP_NVPTX_JIT={-O[0-4],-ori,-arch=<n>} in libgomp nvptx plugin

Message ID 5e41dd53-5217-cbcc-ca3d-f30dc153c18d@mentor.com
State New

Commit Message

Tom de Vries July 4, 2017, 10:23 a.m. UTC
On 07/04/2017 12:05 PM, Tom de Vries wrote:
> On 07/03/2017 04:24 PM, Tom de Vries wrote:
>> On 07/03/2017 04:08 PM, Thomas Schwinge wrote:
>>> Hi!
>>>
>>> On Mon, 26 Jun 2017 17:29:11 +0200, Jakub Jelinek <jakub@redhat.com> 
>>> wrote:
>>>> On Mon, Jun 26, 2017 at 03:26:57PM +0000, Joseph Myers wrote:
>>>>> On Mon, 26 Jun 2017, Tom de Vries wrote:
>>>>>
>>>>>>> 2. Handle GOMP_OPENACC_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx 
>>>>>>> plugin
>>>>>>
>>>>>> This patch adds handling of:
>>>>>> - GOMP_OPENACC_NVPTX_SAVE_TEMPS=[01], and
>>>>>> - GOMP_OPENACC_NVPTX_DISASM=[01]
>>>
>>> Why the "OPENACC" in these names?
>>
>> I took the format from 'GOMP_OPENACC_DIM'.
>>
>>> Doesn't this debugging aid apply to
>>> any variant of offloading?
>>
>> I guess you're right. These environment variables would also be 
>> applicable for f.i. offloading via openmp on nvptx. I'll strip the 
>> 'OPENACC_' bit from the variables.
>>
>>>>>> The filename used for dumping the module is plugin-nvptx.<pid>.cubin.
>>>
>>> Also, I suggest to make these names similar to their controlling 
>>> options,
>>> that is: "gomp-nvptx*", for example.
>>>
>>
>> Makes sense, will do.
> 
> Changes in the patch series:
> - removed OPENACC_ from environment variable names
> - made temp files use gomp-nvptx prefix.
> - fixed build error due to missing _GNU_SOURCE in libgomp-nvptx.c.
> - merged the three GOMP_NVPTX_JIT patches into one
> - rewrote GOMP_NVPTX_JIT to add no extra flags to the JIT compiler
>    invocation if GOMP_NVPTX_JIT is not defined, removing the need for
>    hardcoding default values
> - added CU_JIT_TARGET to plugin/cuda/cuda.h
> 
> Built on x86_64 with nvptx offloading enabled (using plugin/cuda/cuda.h).
> 
> The patch series now looks like:
> 1. Handle GOMP_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin
> 2. Handle GOMP_NVPTX_PTXRW in libgomp nvptx plugin
> 3. Handle GOMP_NVPTX_JIT={-O[0-4],-ori,-arch=<n>} in libgomp nvptx
>     plugin
> 
> I'll repost the patch series in reply to this email.
> 

3. Handle GOMP_NVPTX_JIT={-O[0-4],-ori,-arch=<n>} in libgomp nvptx
    plugin
    ( combination of 3 GOMP_NVPTX_JIT patches originally submitted at:
      https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01921.html
      https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01920.html
      https://gcc.gnu.org/ml/gcc-patches/2017-06/msg02407.html )

Thanks,
- Tom
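
For readers who want to see the accepted option grammar in isolation,
here is a minimal standalone sketch of a parser for a
GOMP_NVPTX_JIT-style string. It is not the code from the patch: the
names jit_settings and parse_jit_string are hypothetical, and it
deliberately differs from the patch in two ways, which are assumptions
of this sketch: it accepts trailing blanks, and it rejects "-arch="
when no digits follow or when the number is followed by junk.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the parsed settings.  The "not set"
   values (-1, -1, 0) follow the defaults used in the patch.  */
struct jit_settings
{
  long opt_level;        /* From -O[0-4], or -1 if absent.  */
  long ori;              /* 1 if -ori was given, else -1.  */
  unsigned long target;  /* From -arch=<n>, or 0 if absent.  */
};

/* Parse a GOMP_NVPTX_JIT-style string S into OUT.  Return false on a
   malformed token.  Hypothetical helper, not the plugin's function.  */
static bool
parse_jit_string (const char *s, struct jit_settings *out)
{
  out->opt_level = -1;
  out->ori = -1;
  out->target = 0;

  const char *c = s;
  for (;;)
    {
      while (*c == ' ')
        c++;
      if (*c == '\0')
        return true;  /* Empty input and trailing blanks are accepted.  */

      if (c[0] == '-' && c[1] == 'O' && c[2] >= '0' && c[2] <= '4'
          && (c[3] == '\0' || c[3] == ' '))
        {
          out->opt_level = c[2] - '0';
          c += 3;
          continue;
        }

      if (strncmp (c, "-ori", 4) == 0 && (c[4] == '\0' || c[4] == ' '))
        {
          out->ori = 1;
          c += 4;
          continue;
        }

      /* Require a leading digit so that strtoul cannot silently accept
         a sign or leading whitespace.  */
      if (strncmp (c, "-arch=", 6) == 0 && c[6] >= '0' && c[6] <= '9')
        {
          char *end;
          errno = 0;
          unsigned long val = strtoul (c + 6, &end, 10);
          /* Reject overflow and junk directly after the number.  */
          if (errno == 0 && (*end == '\0' || *end == ' '))
            {
              out->target = val;
              c = end;
              continue;
            }
        }

      return false;
    }
}
```

With this sketch, " -O3 -ori -arch=35 " yields opt_level 3, ori 1 and
target 35, while "-O5", "-orig" and "-arch=" are rejected.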

Comments

Tom de Vries Aug. 29, 2017, 6:56 a.m. UTC | #1
On 07/04/2017 12:23 PM, Tom de Vries wrote:
> On 07/04/2017 12:05 PM, Tom de Vries wrote:
>> On 07/03/2017 04:24 PM, Tom de Vries wrote:
>>> On 07/03/2017 04:08 PM, Thomas Schwinge wrote:
>>>> Hi!
>>>>
>>>> On Mon, 26 Jun 2017 17:29:11 +0200, Jakub Jelinek <jakub@redhat.com> 
>>>> wrote:
>>>>> On Mon, Jun 26, 2017 at 03:26:57PM +0000, Joseph Myers wrote:
>>>>>> On Mon, 26 Jun 2017, Tom de Vries wrote:
>>>>>>
>>>>>>>> 2. Handle GOMP_OPENACC_NVPTX_{DISASM,SAVE_TEMPS} in libgomp 
>>>>>>>> nvptx plugin
>>>>>>>
>>>>>>> This patch adds handling of:
>>>>>>> - GOMP_OPENACC_NVPTX_SAVE_TEMPS=[01], and
>>>>>>> - GOMP_OPENACC_NVPTX_DISASM=[01]
>>>>
>>>> Why the "OPENACC" in these names?
>>>
>>> I took the format from 'GOMP_OPENACC_DIM'.
>>>
>>>> Doesn't this debugging aid apply to
>>>> any variant of offloading?
>>>
>>> I guess you're right. These environment variables would also be 
>>> applicable for f.i. offloading via openmp on nvptx. I'll strip the 
>>> 'OPENACC_' bit from the variables.
>>>
>>>>>>> The filename used for dumping the module is 
>>>>>>> plugin-nvptx.<pid>.cubin.
>>>>
>>>> Also, I suggest to make these names similar to their controlling 
>>>> options,
>>>> that is: "gomp-nvptx*", for example.
>>>>
>>>
>>> Makes sense, will do.
>>
>> Changes in the patch series:
>> - removed OPENACC_ from environment variable names
>> - made temp files use gomp-nvptx prefix.
>> - fixed build error due to missing _GNU_SOURCE in libgomp-nvptx.c.
>> - merged the three GOMP_NVPTX_JIT patches into one
>> - rewrote GOMP_NVPTX_JIT to add no extra flags to the JIT compiler
>>    invocation if GOMP_NVPTX_JIT is not defined, removing the need for
>>    hardcoding default values
>> - added CU_JIT_TARGET to plugin/cuda/cuda.h
>>
>> Built on x86_64 with nvptx offloading enabled (using plugin/cuda/cuda.h).
>>
>> The patch series now looks like:
>> 1. Handle GOMP_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin
>> 2. Handle GOMP_NVPTX_PTXRW in libgomp nvptx plugin
>> 3. Handle GOMP_NVPTX_JIT={-O[0-4],-ori,-arch=<n>} in libgomp nvptx
>>     plugin
>>
>> I'll repost the patch series in reply to this email.
>>
> 
> 3. Handle GOMP_NVPTX_JIT={-O[0-4],-ori,-arch=<n>} in libgomp nvptx
>     plugin
>     ( combination of 3 GOMP_NVPTX_JIT patches originally submitted at:
>       https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01921.html
>       https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01920.html
>       https://gcc.gnu.org/ml/gcc-patches/2017-06/msg02407.html )
> 


Ping. I'd like to use GOMP_NVPTX_JIT in a workaround for a CUDA JIT bug 
triggered in libgomp.c/for-5.c (see PR81805), like this:
...
  /* { dg-set-target-env-var GOMP_NVPTX_JIT "-O0" } */
...

Thanks,
- Tom
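
To illustrate what a setting such as "-O0" amounts to on the plugin
side, here is a small self-contained sketch of the "append only what
was specified" approach that link_ptx takes in the patch. The enum
sketch_jit_option and the function append_jit_options are hypothetical
stand-ins so that the example compiles without CUDA headers. The
numeric values mirror the ones the patch adds to plugin/cuda/cuda.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in option identifiers (hypothetical).  The values mirror the
   entries the patch adds to enum CUjit_option.  */
enum sketch_jit_option
{
  SKETCH_JIT_OPTIMIZATION_LEVEL = 7,
  SKETCH_JIT_TARGET = 9,
  SKETCH_JIT_NEW_SM3X_OPT = 15
};

/* Append an entry to OPTS/OPTVALS for each setting that differs from
   its "not set" value (-1 for O and ORI, 0 for TARGET), starting at
   index NOPTS.  Return the new number of entries.  The caller must
   provide room for up to three additional entries.  */
static size_t
append_jit_options (enum sketch_jit_option *opts, void **optvals,
                    size_t nopts, intptr_t o, intptr_t ori,
                    uintptr_t target)
{
  if (o != -1)
    {
      opts[nopts] = SKETCH_JIT_OPTIMIZATION_LEVEL;
      optvals[nopts] = (void *) o;
      nopts++;
    }
  if (ori != -1)
    {
      opts[nopts] = SKETCH_JIT_NEW_SM3X_OPT;
      optvals[nopts] = (void *) ori;
      nopts++;
    }
  if (target != 0)
    {
      opts[nopts] = SKETCH_JIT_TARGET;
      optvals[nopts] = (void *) target;
      nopts++;
    }
  return nopts;
}
```

With six fixed entries already present, a parsed "-O0" (o == 0, the
rest unset) produces a count of 7 and a single extra
optimization-level entry. With the variable unset, the count stays
at 6, so the JIT compiler sees no extra options.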


Patch

Handle GOMP_NVPTX_JIT={-O[0-4],-ori,-arch=<n>} in libgomp nvptx plugin

2017-06-26  Tom de Vries  <tom@codesourcery.com>

	* plugin/cuda/cuda.h (enum CUjit_option): Add CU_JIT_OPTIMIZATION_LEVEL,
	CU_JIT_NEW_SM3X_OPT and CU_JIT_TARGET.
	* plugin/plugin-nvptx.c (parse_number): New function.
	(process_GOMP_NVPTX_JIT): New function.
	(link_ptx): Add CU_JIT_OPTIMIZATION_LEVEL, CU_JIT_NEW_SM3X_OPT and
	CU_JIT_TARGET to opts if specified.

---
 libgomp/plugin/cuda/cuda.h    |   5 +-
 libgomp/plugin/plugin-nvptx.c | 108 ++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 109 insertions(+), 4 deletions(-)

diff --git a/libgomp/plugin/cuda/cuda.h b/libgomp/plugin/cuda/cuda.h
index 25d5d19..7d190f1 100644
--- a/libgomp/plugin/cuda/cuda.h
+++ b/libgomp/plugin/cuda/cuda.h
@@ -88,7 +88,10 @@  typedef enum {
   CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES = 4,
   CU_JIT_ERROR_LOG_BUFFER = 5,
   CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES = 6,
-  CU_JIT_LOG_VERBOSE = 12
+  CU_JIT_OPTIMIZATION_LEVEL = 7,
+  CU_JIT_TARGET = 9,
+  CU_JIT_LOG_VERBOSE = 12,
+  CU_JIT_NEW_SM3X_OPT = 15
 } CUjit_option;
 
 typedef enum {
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index cc2ee5e..f5b9502 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -144,6 +144,10 @@  init_cuda_lib (void)
 
 #include "secure_getenv.h"
 
+#if CUDA_VERSION < 8000
+#define CU_JIT_NEW_SM3X_OPT 15
+#endif
+
 /* Convenience macros for the frequently used CUDA library call and
    error handling sequence as well as CUDA library calls that
    do the error checking themselves or don't do it at all.  */
@@ -1106,11 +1110,77 @@  post_process_ptx (unsigned num, const char **res_code, size_t *res_size)
 }
 
 static bool
+parse_number (const char *c, unsigned long *resp, char **end)
+{
+  unsigned long res;
+
+  errno = 0;
+  res = strtoul (c, end, 10);
+  if (errno)
+    return false;
+
+  *resp = res;
+  return true;
+}
+
+static void
+process_GOMP_NVPTX_JIT (intptr_t *gomp_nvptx_o, intptr_t *gomp_nvptx_ori,
+			uintptr_t *gomp_nvptx_target)
+{
+  const char *var_name = "GOMP_NVPTX_JIT";
+  const char *env_var = getenv (var_name);
+  notify_var (var_name, env_var);
+
+  if (env_var == NULL)
+    return;
+
+  const char *c = env_var;
+  while (*c != '\0')
+    {
+      while (*c == ' ')
+	c++;
+
+      if (c[0] == '-' && c[1] == 'O'
+	  && '0' <= c[2] && c[2] <= '4'
+	  && (c[3] == '\0' || c[3] == ' '))
+	{
+	  *gomp_nvptx_o = c[2] - '0';
+	  c += 3;
+	  continue;
+	}
+
+      if (c[0] == '-' && c[1] == 'o' && c[2] == 'r' && c[3] == 'i'
+	  && (c[4] == '\0' || c[4] == ' '))
+	{
+	  *gomp_nvptx_ori = 1;
+	  c += 4;
+	  continue;
+	}
+
+      if (c[0] == '-' && c[1] == 'a' && c[2] == 'r' && c[3] == 'c'
+	  && c[4] == 'h' && c[5] == '=')
+	{
+	  char *end;
+	  unsigned long val;
+	  if (parse_number (&c[6], &val, &end))
+	    {
+	      *gomp_nvptx_target = val;
+	      c = end;
+	      continue;
+	    }
+	}
+
+      GOMP_PLUGIN_error ("Error parsing %s", var_name);
+      break;
+    }
+}
+
+static bool
 link_ptx (CUmodule *module, const struct targ_ptx_obj *ptx_objs,
 	  unsigned num_objs)
 {
-  CUjit_option opts[6];
-  void *optvals[6];
+  CUjit_option opts[9];
+  void *optvals[9];
   float elapsed = 0.0;
   char elog[1024];
   char ilog[16384];
@@ -1137,7 +1207,39 @@  link_ptx (CUmodule *module, const struct targ_ptx_obj *ptx_objs,
   opts[5] = CU_JIT_LOG_VERBOSE;
   optvals[5] = (void *) 1;
 
-  CUDA_CALL (cuLinkCreate, 6, opts, optvals, &linkstate);
+  static intptr_t gomp_nvptx_o = -1;
+  static intptr_t gomp_nvptx_ori = -1;
+  static uintptr_t gomp_nvptx_target = 0;
+
+  static bool init_done = false;
+  if (!init_done)
+    {
+      process_GOMP_NVPTX_JIT (&gomp_nvptx_o, &gomp_nvptx_ori,
+			      &gomp_nvptx_target);
+      init_done = true;
+    }
+
+  int nopts = 6;
+  if (gomp_nvptx_o != -1)
+    {
+      opts[nopts] = CU_JIT_OPTIMIZATION_LEVEL;
+      optvals[nopts] = (void *) gomp_nvptx_o;
+      nopts++;
+    }
+  if (gomp_nvptx_ori != -1)
+    {
+      opts[nopts] = CU_JIT_NEW_SM3X_OPT;
+      optvals[nopts] = (void *) gomp_nvptx_ori;
+      nopts++;
+    }
+  if (gomp_nvptx_target != 0)
+    {
+      opts[nopts] = CU_JIT_TARGET;
+      optvals[nopts] = (void *) gomp_nvptx_target;
+      nopts++;
+    }
+
+  CUDA_CALL (cuLinkCreate, nopts, opts, optvals, &linkstate);
 
   for (; num_objs--; ptx_objs++)
     {