diff mbox series

[v2] OpenMP: Add omp_get_device_from_uid/omp_get_uid_from_device routines

Message ID 902f00ac-fdad-488d-a3a1-76d613bc85ba@baylibre.com
State New

Commit Message

Tobias Burnus Sept. 19, 2024, 5:11 p.m. UTC
Minor update – addressing the issues that Andre raised (thanks!):

'Add.' → 'New functions.' in the ChangeLog entry for 'fortran.c'; otherwise,
only libgomp.texi changed:

A bunch of typo fixes (preexisting and in the new text). I also added a
made-up example UUID for the GPUs, which should help reduce confusion.

Any additional comments or suggestions?

Tobias

Tobias Burnus wrote:
> in order to know and potentially re-use a specific offload device
> (reproducibility, affinity wise close to a CPU (socket), …) a mapping
> between an (universal?) unique identifier and the OpenMP device number
> is useful. Thus, TR13 added support for it.
>
> This is a collateral patch caused by looking at the API routines for
> other reasons and looking at that part of the spec during the OpenMP F2F.
>
> Besides the added API routines, the UID will be used elsewhere:
> * In context selectors: 'target_device' supports 'uid(<string>)'.
> * In the OMP_AVAILABLE_DEVICES and OMP_DEFAULT_DEVICE env vars.
>
> @Sandra: Besides the usual .texi part, for the 'target_device' trait set:
> if you add a new GOMP routine for kind/arch/isa, can you also add a
> UID argument so that we don't have to update the API when it is needed
> in the not-so-distant future?
>
> @Andrew + @Thomas: Any comments? Especially on the nvptx/gcn side
> (plugin + .texi)?
>
> @Jakub or anyone else — any comments, suggestions, remarks?
>
> [The patch was tested without GPUs, with one Nvidia GPU and one AMD GPU
> and seems to work fine.]

Patch

OpenMP: Add omp_get_device_from_uid/omp_get_uid_from_device routines

These TR13/OpenMP 6.0 routines permit reproducible offloading to
a specific device by mapping an OpenMP device number to a
unique ID (UID). The GPU device UIDs should be universally unique;
the one for the host is not.

gcc/ChangeLog:

	* omp-general.cc (omp_runtime_api_procname): Add
	get_device_from_uid and get_uid_from_device.

include/ChangeLog:

	* cuda/cuda.h (cuDeviceGetUuid): Declare.
	(cuDeviceGetUuid_v2): Add prototype.

libgomp/ChangeLog:

	* config/gcn/target.c (omp_get_uid_from_device,
	omp_get_device_from_uid): Add stub implementation.
	* config/nvptx/target.c (omp_get_uid_from_device,
	omp_get_device_from_uid): Likewise.
	* fortran.c (omp_get_uid_from_device_,
	omp_get_uid_from_device_8_): New functions.
	* libgomp-plugin.h (GOMP_OFFLOAD_get_uid): Add prototype.
	* libgomp.h (struct gomp_device_descr): Add 'uid' and 'get_uid_func'.
	* libgomp.map (GOMP_6.0): New, including the new UID routines.
	* libgomp.texi (OpenMP Technical Report 13): Mark UID routines as 'Y'.
	(Device Information Routines): Document new UID routines.
	(Offload-Target Specifics): Document UID format.
	* omp.h.in (omp_get_device_from_uid, omp_get_uid_from_device):
	New prototype.
	* omp_lib.f90.in (omp_get_device_from_uid, omp_get_uid_from_device):
	New interface.
	* omp_lib.h.in: Likewise.
	* plugin/cuda-lib.def: Add cuDeviceGetUuid and cuDeviceGetUuid_v2 via
	CUDA_ONE_CALL_MAYBE_NULL.
	* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_uid): New.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_uid): New.
	* target.c (str_omp_initial_device): New static var.
	(STR_OMP_DEV_PREFIX): Define.
	(gomp_get_uid_for_device, omp_get_uid_from_device,
	omp_get_device_from_uid): New.
	(gomp_load_plugin_for_device): DLSYM_OPT the function 'get_uid'.
	(gomp_target_init): Set the device's 'uid' field to NULL.
	* testsuite/libgomp.c/device_uid.c: New test.
	* testsuite/libgomp.fortran/device_uid.f90: New test.

 gcc/omp-general.cc                               |  4 +-
 include/cuda/cuda.h                              |  7 ++
 libgomp/config/gcn/target.c                      | 14 ++++
 libgomp/config/nvptx/target.c                    | 14 ++++
 libgomp/fortran.c                                | 15 ++++
 libgomp/libgomp-plugin.h                         |  1 +
 libgomp/libgomp.h                                |  2 +
 libgomp/libgomp.map                              |  8 +++
 libgomp/libgomp.texi                             | 89 ++++++++++++++++++++++--
 libgomp/omp.h.in                                 |  3 +
 libgomp/omp_lib.f90.in                           | 23 ++++++
 libgomp/omp_lib.h.in                             | 23 ++++++
 libgomp/plugin/cuda-lib.def                      |  2 +
 libgomp/plugin/plugin-gcn.c                      | 16 +++++
 libgomp/plugin/plugin-nvptx.c                    | 34 +++++++++
 libgomp/target.c                                 | 56 +++++++++++++++
 libgomp/testsuite/libgomp.c/device_uid.c         | 38 ++++++++++
 libgomp/testsuite/libgomp.fortran/device_uid.f90 | 42 +++++++++++
 18 files changed, 384 insertions(+), 7 deletions(-)

diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
index de91ba8a4a7..12788ad0249 100644
--- a/gcc/omp-general.cc
+++ b/gcc/omp-general.cc
@@ -3260,6 +3260,7 @@  omp_runtime_api_procname (const char *name)
       "alloc",
       "calloc",
       "free",
+      "get_device_from_uid",
       "get_interop_int",
       "get_interop_ptr",
       "get_mapped_ptr",
@@ -3338,12 +3339,13 @@  omp_runtime_api_procname (const char *name)
 	 as DECL_NAME only omp_* and omp_*_8 appear.  */
       "display_env",
       "get_ancestor_thread_num",
-      "init_allocator",
+      "get_uid_from_device",
       "get_partition_place_nums",
       "get_place_num_procs",
       "get_place_proc_ids",
       "get_schedule",
       "get_team_size",
+      "init_allocator",
       "set_default_device",
       "set_dynamic",
       "set_max_active_levels",
diff --git a/include/cuda/cuda.h b/include/cuda/cuda.h
index 804d08ca57e..0f90ade57c8 100644
--- a/include/cuda/cuda.h
+++ b/include/cuda/cuda.h
@@ -201,6 +201,10 @@  typedef struct {
   size_t WidthInBytes, Height, Depth;
 } CUDA_MEMCPY3D_PEER;
 
+typedef struct {
+  char bytes[16];
+} CUuuid;
+
 #define cuCtxCreate cuCtxCreate_v2
 CUresult cuCtxCreate (CUcontext *, unsigned, CUdevice);
 #define cuCtxDestroy cuCtxDestroy_v2
@@ -214,6 +218,9 @@  CUresult cuCtxPushCurrent (CUcontext);
 CUresult cuCtxSynchronize (void);
 CUresult cuCtxSetLimit (CUlimit, size_t);
 CUresult cuDeviceGet (CUdevice *, int);
+/* _v2 was added in CUDA 11.4 and 'will supplant' the old one in 12.0.  */
+CUresult cuDeviceGetUuid (CUuuid *, CUdevice);
+CUresult cuDeviceGetUuid_v2 (CUuuid *, CUdevice);
 #define cuDeviceTotalMem cuDeviceTotalMem_v2
 CUresult cuDeviceTotalMem (size_t *, CUdevice);
 CUresult cuDeviceGetAttribute (int *, CUdevice_attribute, CUdevice);
diff --git a/libgomp/config/gcn/target.c b/libgomp/config/gcn/target.c
index f7fa6aa6396..0a3008454b7 100644
--- a/libgomp/config/gcn/target.c
+++ b/libgomp/config/gcn/target.c
@@ -283,6 +283,18 @@  omp_get_interop_rc_desc (const omp_interop_t interop __attribute__ ((unused)),
   return rc_strings[omp_irc_no_value - ret_code];
 }
 
+const char *
+omp_get_uid_from_device (int device_num __attribute__ ((unused)))
+{
+  return NULL;
+}
+
+int
+omp_get_device_from_uid (const char *uid __attribute__ ((unused)))
+{
+  return omp_invalid_device;
+}
+
 ialias (omp_get_num_interop_properties)
 ialias (omp_get_interop_int)
 ialias (omp_get_interop_ptr)
@@ -290,3 +302,5 @@  ialias (omp_get_interop_str)
 ialias (omp_get_interop_name)
 ialias (omp_get_interop_type_desc)
 ialias (omp_get_interop_rc_desc)
+ialias (omp_get_uid_from_device)
+ialias (omp_get_device_from_uid)
diff --git a/libgomp/config/nvptx/target.c b/libgomp/config/nvptx/target.c
index 69666578c29..811396122b4 100644
--- a/libgomp/config/nvptx/target.c
+++ b/libgomp/config/nvptx/target.c
@@ -295,6 +295,18 @@  omp_get_interop_rc_desc (const omp_interop_t interop __attribute__ ((unused)),
   return rc_strings[omp_irc_no_value - ret_code];
 }
 
+const char *
+omp_get_uid_from_device (int device_num __attribute__ ((unused)))
+{
+  return NULL;
+}
+
+int
+omp_get_device_from_uid (const char *uid __attribute__ ((unused)))
+{
+  return omp_invalid_device;
+}
+
 ialias (omp_get_num_interop_properties)
 ialias (omp_get_interop_int)
 ialias (omp_get_interop_ptr)
@@ -302,3 +314,5 @@  ialias (omp_get_interop_str)
 ialias (omp_get_interop_name)
 ialias (omp_get_interop_type_desc)
 ialias (omp_get_interop_rc_desc)
+ialias (omp_get_uid_from_device)
+ialias (omp_get_device_from_uid)
diff --git a/libgomp/fortran.c b/libgomp/fortran.c
index a76c33cee52..732475e3ff4 100644
--- a/libgomp/fortran.c
+++ b/libgomp/fortran.c
@@ -834,6 +834,21 @@  omp_get_interop_rc_desc_ (const char **res, size_t *res_len,
   *res_len = *res ? strlen (*res) : 0;
 }
 
+void
+omp_get_uid_from_device_ (const char **res, size_t *res_len,
+			  int32_t device_num)
+{
+  *res = omp_get_uid_from_device (device_num);
+  *res_len = *res ? strlen (*res) : 0;
+}
+
+void
+omp_get_uid_from_device_8_ (const char **res, size_t *res_len,
+			    int64_t device_num)
+{
+  omp_get_uid_from_device_ (res, res_len, (int32_t) device_num);
+}
+
 #ifndef LIBGOMP_OFFLOADED_ONLY
 
 void
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index 0c9c28c65cf..ce8f7f3236f 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -127,6 +127,7 @@  extern void GOMP_PLUGIN_target_rev (uint64_t, uint64_t, uint64_t, uint64_t,
 
 /* Prototypes for functions implemented by libgomp plugins.  */
 extern const char *GOMP_OFFLOAD_get_name (void);
+extern const char *GOMP_OFFLOAD_get_uid (int);
 extern unsigned int GOMP_OFFLOAD_get_caps (void);
 extern int GOMP_OFFLOAD_get_type (void);
 extern int GOMP_OFFLOAD_get_num_devices (unsigned int);
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 089393846d1..f3ecd95b377 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1387,6 +1387,7 @@  struct gomp_device_descr
 
   /* The name of the device.  */
   const char *name;
+  const char *uid;
 
   /* Capabilities of device (supports OpenACC, OpenMP).  */
   unsigned int capabilities;
@@ -1399,6 +1400,7 @@  struct gomp_device_descr
 
   /* Function handlers.  */
   __typeof (GOMP_OFFLOAD_get_name) *get_name_func;
+  __typeof (GOMP_OFFLOAD_get_uid) *get_uid_func;
   __typeof (GOMP_OFFLOAD_get_caps) *get_caps_func;
   __typeof (GOMP_OFFLOAD_get_type) *get_type_func;
   __typeof (GOMP_OFFLOAD_get_num_devices) *get_num_devices_func;
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 7c2345eb29b..0023d3e1b6d 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -443,6 +443,14 @@  GOMP_5.1.3 {
 	omp_get_interop_rc_desc_;
 } GOMP_5.1.2;
 
+GOMP_6.0 {
+  global:
+	omp_get_device_from_uid;
+	omp_get_uid_from_device;
+	omp_get_uid_from_device_;
+	omp_get_uid_from_device_8_;
+} GOMP_5.1.3;
+
 OACC_2.0 {
   global:
 	acc_get_num_devices;
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index e8003df6f02..936d3f4e2e4 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -443,7 +443,7 @@  to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
       of the @code{interop} construct @tab N @tab
 @item Invoke virtual member functions of C++ objects created on the host device
       on other devices @tab N @tab
-@item @code{iterator} and @code{mapper} as map-type modifier in @code{declare mappter}
+@item @code{iterator} and @code{mapper} as map-type modifier in @code{declare mapper}
       @tab N @tab
 @end multitable
 
@@ -582,7 +582,7 @@  Technical Report (TR) 13 is the third preview for OpenMP 6.0.
 @item @code{omp_is_free_agent} and @code{omp_ancestor_is_free_agent} routines
       @tab N @tab
 @item @code{omp_get_device_from_uid} and @code{omp_get_uid_from_device} routines
-      @tab N @tab
+      @tab Y @tab
 @item @code{omp_get_device_num_teams}, @code{omp_set_device_num_teams},
       @code{omp_get_device_teams_thread_limit}, and
       @code{omp_set_device_teams_thread_limit} routines @tab N @tab
@@ -1675,12 +1675,12 @@  They have C linkage and do not throw exceptions.
 @menu
 * omp_get_num_procs::           Number of processors online
 @c * omp_get_max_progress_width:: <fixme>/TR11
-@c * omp_get_device_from_uid::  <fixme>/TR13
-@c * omp_get_uid_from_device::  <fixme>/TR13
 * omp_set_default_device::      Set the default device for target regions
 * omp_get_default_device::      Get the default device for target regions
 * omp_get_num_devices::         Number of target devices
 * omp_get_device_num::          Get device that current thread is running on
+* omp_get_device_from_uid::     Obtain the device number to a unique id
+* omp_get_uid_from_device::     Obtain the unique id of a device
 * omp_is_initial_device::       Whether executing on the host device
 * omp_get_initial_device::      Device number of host device
 @c * omp_get_device_num_teams::  <fixme>/TR13
@@ -1830,6 +1830,71 @@  as required since OpenMP 5.0.
 
 
 
+@node omp_get_device_from_uid
+@subsection @code{omp_get_device_from_uid} -- Obtain the device number to a unique id
+@table @asis
+@item @emph{Description}:
+This function returns the device number associated with the passed
+unique-identifier (UID) string.  If no device with this UID is available, the value
+@code{omp_invalid_device} is returned.  The effect of running this routine in a
+@code{target} region is unspecified.
+
+GCC treats the UID string as case sensitive; for the initial device, GCC
+currently only accepts the value @code{OMP_INITIAL_DEVICE} and returns the
+value of @code{omp_initial_device} for it.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{int omp_get_device_from_uid(const char *uid);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{integer function omp_get_device_from_uid(uid)}
+@item                   @tab @code{character(len=*), intent(in) :: uid}
+@end multitable
+
+@item @emph{See also}:
+@ref{omp_get_uid_from_device}, @ref{Offload-Target Specifics}
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v6.0}, Section 24.7
+@end table
+
+
+
+@node omp_get_uid_from_device
+@subsection @code{omp_get_uid_from_device} -- Obtain the unique id of a device
+@table @asis
+@item @emph{Description}:
+This function returns a pointer to a string that represents a unique identifier
+(UID) for the device specified by @var{device_num}.  It returns a @code{NULL} (C/C++)
+or a disassociated pointer (Fortran) for @code{omp_invalid_device}.  The effect of
+running this routine in a @code{target} region is unspecified.
+
+GCC currently returns the value @code{OMP_INITIAL_DEVICE} for the initial device.
+
+@item @emph{C/C++}:
+@multitable @columnfractions .20 .80
+@item @emph{Prototype}: @tab @code{const char *omp_get_uid_from_device(int device_num);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{character(:) function omp_get_uid_from_device(device_num)}
+@item                   @tab @code{pointer :: omp_get_uid_from_device}
+@item                   @tab @code{integer, intent(in) :: device_num}
+@end multitable
+
+@item @emph{See also}:
+@ref{omp_get_device_from_uid}, @ref{Offload-Target Specifics}
+
+@item @emph{Reference}:
+@uref{https://www.openmp.org, OpenMP specification v6.0}, Section 24.8
+@end table
+
+
+
 @node omp_is_initial_device
 @subsection @code{omp_is_initial_device} -- Whether executing on the host device
 @table @asis
@@ -3455,13 +3520,13 @@  The effect when invoked from within a @code{target} region is unspecified.
 
 @item @emph{Fortran}:
 @multitable @columnfractions .20 .80
-@item @emph{Interface}: @tab @code{subroutine omp_display_env(vebose)}
+@item @emph{Interface}: @tab @code{subroutine omp_display_env(verbose)}
 @item                   @tab @code{logical, intent(in) :: verbose}
 @end multitable
 
 @item @emph{Example}:
 Note that the GCC-specific ICVs, such as the shown @code{GOMP_SPINCOUNT},
-are only printed when @var{varbose} set to @code{true}.
+are only printed when @var{verbose} set to @code{true}.
 
 @smallexample
 OPENMP DISPLAY ENVIRONMENT BEGIN
@@ -6517,6 +6582,11 @@  The implementation remark:
       @code{omp_thread_mem_alloc}, all use low-latency memory as first
       preference, and fall back to main graphics memory when the low-latency
       pool is exhausted.
+@item The unique identifier (UID), used with OpenMP's API UID routines, is the
+      value returned by the HSA runtime library for @code{HSA_AMD_AGENT_INFO_UUID}.
+      For GPUs, it is currently @samp{GPU-} followed by 16 lower-case hex digits,
+      yielding a string like @code{GPU-f914a2142fc3413a}.  The output matches
+      the one used by @code{rocminfo}.
 @end itemize
 
 
@@ -6604,6 +6674,13 @@  The implementation remark:
       @code{omp_thread_mem_alloc}, all use low-latency memory as first
       preference, and fall back to main graphics memory when the low-latency
       pool is exhausted.
+@item The unique identifier (UID), used with OpenMP's API UID routines, consists
+      of the @samp{GPU-} prefix followed by the 16-byte UUID as returned by
+      the CUDA runtime library.  This UUID is output in grouped lower-case
+      hex digits; the grouping of those 32 digits is: 8 digits, hyphen,
+      4 digits, hyphen, 4 digits, hyphen, 4 digits, hyphen, 12 digits.  This
+      leads to a string like @code{GPU-a8081c9e-f03e-18eb-1827-bf5ba95afa5d}.
+      The output matches the format used by @code{nvidia-smi}.
 @end itemize
 
 
diff --git a/libgomp/omp.h.in b/libgomp/omp.h.in
index 4ce790833ed..04aae8b51a3 100644
--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -425,6 +425,9 @@  extern const char *omp_get_interop_type_desc (const omp_interop_t,
 extern const char *omp_get_interop_rc_desc (const omp_interop_t,
 					    omp_interop_rc_t) __GOMP_NOTHROW;
 
+extern int omp_get_device_from_uid (const char *) __GOMP_NOTHROW;
+extern const char *omp_get_uid_from_device (int) __GOMP_NOTHROW;
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/libgomp/omp_lib.f90.in b/libgomp/omp_lib.f90.in
index 1861c40266b..360352c5a07 100644
--- a/libgomp/omp_lib.f90.in
+++ b/libgomp/omp_lib.f90.in
@@ -1003,6 +1003,29 @@ 
           end function omp_get_interop_rc_desc
         end interface
 
+        interface
+          ! Note: In gfortran, strings are \0 terminated
+          integer(c_int) function omp_get_device_from_uid(uid) bind(C)
+            use iso_c_binding
+            character(c_char), intent(in) :: uid(*)
+          end function omp_get_device_from_uid
+        end interface
+
+        interface omp_get_uid_from_device
+          ! Deviation from OpenMP 6.0: VALUE added.
+          character(:) function omp_get_uid_from_device (device_num)
+            use iso_c_binding
+            pointer :: omp_get_uid_from_device
+            integer(c_int32_t), intent(in), value :: device_num
+          end function omp_get_uid_from_device
+
+          character(:) function omp_get_uid_from_device_8 (device_num)
+            use iso_c_binding
+            pointer :: omp_get_uid_from_device_8
+            integer(c_int64_t), intent(in), value :: device_num
+          end function omp_get_uid_from_device_8
+        end interface omp_get_uid_from_device
+
 #if _OPENMP >= 201811
 !GCC$ ATTRIBUTES DEPRECATED :: omp_get_nested, omp_set_nested
 !GCC$ ATTRIBUTES DEPRECATED :: omp_lock_hint_kind, omp_lock_hint_none
diff --git a/libgomp/omp_lib.h.in b/libgomp/omp_lib.h.in
index 6959f1e96c7..10038611d80 100644
--- a/libgomp/omp_lib.h.in
+++ b/libgomp/omp_lib.h.in
@@ -610,3 +610,26 @@ 
           integer (omp_interop_rc_kind), value :: ret_code
         end function omp_get_interop_rc_desc
       end interface
+
+      interface
+!       Note: In gfortran, strings are \0 terminated
+        integer(c_int) function omp_get_device_from_uid(uid) bind(C)
+          use iso_c_binding
+          character(c_char), intent(in) :: uid(*)
+        end function omp_get_device_from_uid
+      end interface
+
+      interface omp_get_uid_from_device
+!       Deviation from OpenMP 6.0: VALUE added.
+        character(:) function omp_get_uid_from_device (device_num)
+          use iso_c_binding
+          pointer :: omp_get_uid_from_device
+          integer(c_int32_t), intent(in), value :: device_num
+        end function omp_get_uid_from_device
+
+        character(:) function omp_get_uid_from_device_8 (device_num)
+          use iso_c_binding
+          pointer :: omp_get_uid_from_device_8
+          integer(c_int64_t), intent(in), value :: device_num
+        end function omp_get_uid_from_device_8
+      end interface omp_get_uid_from_device
diff --git a/libgomp/plugin/cuda-lib.def b/libgomp/plugin/cuda-lib.def
index 9255c1cff68..eb562ace95e 100644
--- a/libgomp/plugin/cuda-lib.def
+++ b/libgomp/plugin/cuda-lib.def
@@ -10,6 +10,8 @@  CUDA_ONE_CALL (cuDeviceGet)
 CUDA_ONE_CALL (cuDeviceGetAttribute)
 CUDA_ONE_CALL (cuDeviceGetCount)
 CUDA_ONE_CALL (cuDeviceGetName)
+CUDA_ONE_CALL_MAYBE_NULL (cuDeviceGetUuid)
+CUDA_ONE_CALL_MAYBE_NULL (cuDeviceGetUuid_v2)
 CUDA_ONE_CALL (cuDeviceTotalMem)
 CUDA_ONE_CALL (cuDriverGetVersion)
 CUDA_ONE_CALL (cuEventCreate)
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 3d882b5ab63..bf6ad371ea2 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -3316,6 +3316,22 @@  GOMP_OFFLOAD_get_name (void)
   return "gcn";
 }
 
+const char *
+GOMP_OFFLOAD_get_uid (int ord)
+{
+  char *str;
+  hsa_status_t status;
+  struct agent_info *agent = get_agent_info (ord);
+
+  /* HSA documentation states: maximally 21 characters including NUL.  */
+  str = GOMP_PLUGIN_malloc (21 * sizeof (char));
+  status = hsa_fns.hsa_agent_get_info_fn (agent->id, HSA_AMD_AGENT_INFO_UUID,
+					  str);
+  if (status != HSA_STATUS_SUCCESS)
+    hsa_fatal ("Could not obtain device UUID", status);
+  return str;
+}
+
 /* Return the specific capabilities the HSA accelerator have.  */
 
 unsigned int
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 99cbcb699b3..261eb868611 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1242,6 +1242,40 @@  GOMP_OFFLOAD_get_name (void)
   return "nvptx";
 }
 
+const char *
+GOMP_OFFLOAD_get_uid (int ord)
+{
+  CUresult r;
+  CUuuid s;
+  struct ptx_device *dev = ptx_devices[ord];
+
+  if (CUDA_CALL_EXISTS (cuDeviceGetUuid_v2))
+    r = CUDA_CALL_NOCHECK (cuDeviceGetUuid_v2, &s, dev->dev);
+  else if (CUDA_CALL_EXISTS (cuDeviceGetUuid))
+    r = CUDA_CALL_NOCHECK (cuDeviceGetUuid, &s, dev->dev);
+  else
+    r = CUDA_ERROR_NOT_FOUND;
+  if (r != CUDA_SUCCESS)
+    GOMP_PLUGIN_fatal ("cuDeviceGetUuid error: %s", cuda_error (r));
+
+  size_t len = strlen ("GPU-12345678-9abc-defg-hijk-lmniopqrstuv");
+  char *str = (char *) GOMP_PLUGIN_malloc (len + 1);
+  sprintf (str,
+	   "GPU-%02x" "%02x" "%02x" "%02x"
+	   "-%02x" "%02x"
+	   "-%02x" "%02x"
+	   "-%02x" "%02x" "-%02x" "%02x" "%02x" "%02x" "%02x" "%02x",
+	   (unsigned char) s.bytes[0], (unsigned char) s.bytes[1],
+	   (unsigned char) s.bytes[2], (unsigned char) s.bytes[3],
+	   (unsigned char) s.bytes[4], (unsigned char) s.bytes[5],
+	   (unsigned char) s.bytes[6], (unsigned char) s.bytes[7],
+	   (unsigned char) s.bytes[8], (unsigned char) s.bytes[9],
+	   (unsigned char) s.bytes[10], (unsigned char) s.bytes[11],
+	   (unsigned char) s.bytes[12], (unsigned char) s.bytes[13],
+	   (unsigned char) s.bytes[14], (unsigned char) s.bytes[15]);
+  return str;
+}
+
 unsigned int
 GOMP_OFFLOAD_get_caps (void)
 {
diff --git a/libgomp/target.c b/libgomp/target.c
index 47ec36928a6..fe7879b3741 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -51,6 +51,9 @@ 
 #define splay_tree_c
 #include "splay-tree.h"
 
+/* Used by omp_get_device_from_uid / omp_get_uid_from_device for the host.  */
+static const char *str_omp_initial_device = "OMP_INITIAL_DEVICE";
+#define STR_OMP_DEV_PREFIX "OMP_DEV_"
 
 typedef uintptr_t *hash_entry_type;
 static inline void * htab_alloc (size_t size) { return gomp_malloc (size); }
@@ -5223,6 +5226,56 @@  ialias (omp_get_interop_name)
 ialias (omp_get_interop_type_desc)
 ialias (omp_get_interop_rc_desc)
 
+static const char *
+gomp_get_uid_for_device (struct gomp_device_descr *devicep, int device_num)
+{
+  if (devicep->uid)
+    return devicep->uid;
+
+  if (devicep->get_uid_func)
+    devicep->uid = devicep->get_uid_func (devicep->target_id);
+  if (!devicep->uid)
+    {
+      size_t ln = strlen (STR_OMP_DEV_PREFIX) + 10 + 1;
+      char *uid;
+      uid = gomp_malloc (ln);
+      snprintf (uid, ln, "%s%d", STR_OMP_DEV_PREFIX, device_num);
+      devicep->uid = uid;
+    }
+  return devicep->uid;
+}
+
+const char *
+omp_get_uid_from_device (int device_num)
+{
+  if (device_num < omp_initial_device || device_num > gomp_get_num_devices ())
+    return NULL;
+
+  if (device_num == omp_initial_device || device_num == gomp_get_num_devices ())
+    return str_omp_initial_device;
+
+  struct gomp_device_descr *devicep = resolve_device (device_num, false);
+  if (devicep == NULL)
+    return NULL;
+  return gomp_get_uid_for_device (devicep, device_num);
+}
+
+int
+omp_get_device_from_uid (const char *uid)
+{
+  if (uid == NULL)
+    return omp_invalid_device;
+  if (strcmp (uid, str_omp_initial_device) == 0)
+    return omp_initial_device;
+  for (int dev = 0; dev < gomp_get_num_devices (); dev++)
+    if (strcmp (uid, gomp_get_uid_for_device (&devices[dev], dev)) == 0)
+      return dev;
+  return omp_invalid_device;
+}
+
+ialias (omp_get_uid_from_device)
+ialias (omp_get_device_from_uid)
+
 #ifdef PLUGIN_SUPPORT
 
 /* This function tries to load a plugin for DEVICE.  Name of plugin is passed
@@ -5264,6 +5317,7 @@  gomp_load_plugin_for_device (struct gomp_device_descr *device,
     }
 
   DLSYM (get_name);
+  DLSYM_OPT (get_uid, get_uid);
   DLSYM (get_caps);
   DLSYM (get_type);
   DLSYM (get_num_devices);
@@ -5449,6 +5503,8 @@  gomp_target_init (void)
 		  }
 
 		current_device.name = current_device.get_name_func ();
+		/* Defer UID setting until needed + after gomp_init_device.  */
+	        current_device.uid = NULL;
 		/* current_device.capabilities has already been set.  */
 		current_device.type = current_device.get_type_func ();
 		current_device.mem_map.root = NULL;
diff --git a/libgomp/testsuite/libgomp.c/device_uid.c b/libgomp/testsuite/libgomp.c/device_uid.c
new file mode 100644
index 00000000000..0412d06f615
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/device_uid.c
@@ -0,0 +1,38 @@ 
+#include <stdlib.h>
+#include <string.h>
+#include <omp.h>
+
+int main()
+{
+  const char **strs = (const char **) malloc (sizeof (char*) * (omp_get_num_devices () + 1));
+  for (int i = omp_invalid_device - 1; i <= omp_get_num_devices () + 1; i++)
+    {
+      const char *str = omp_get_uid_from_device (i);
+      int dev = omp_get_device_from_uid (str);
+// __builtin_printf("%i -> %s -> %d\n", i, str, dev);
+      if (i < omp_initial_device || i > omp_get_num_devices ())
+	{
+	  if (dev != omp_invalid_device || str != NULL)
+	    abort ();
+	  continue;
+	}
+      if (i == omp_initial_device || i == omp_get_num_devices ())
+	{
+	  if ((dev != omp_initial_device && dev != omp_get_num_devices ())
+	      || str == NULL
+	      || strcmp (str, "OMP_INITIAL_DEVICE") != 0) /* GCC impl. choice */
+	    abort ();
+	  dev = omp_get_num_devices ();
+	}
+      else if (dev != i || str == NULL || str[0] == '\0')
+	abort ();
+      strs[dev] = str;
+    }
+
+  for (int i = 0; i < omp_get_num_devices (); i++)
+    for (int j = i + 1; j <= omp_get_num_devices (); j++)
+      if (strcmp (strs[i], strs[j]) == 0)
+	abort ();
+  free (strs);
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.fortran/device_uid.f90 b/libgomp/testsuite/libgomp.fortran/device_uid.f90
new file mode 100644
index 00000000000..5104984f55e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/device_uid.f90
@@ -0,0 +1,42 @@ 
+program main
+  use omp_lib
+  implicit none (type, external)
+  integer :: i, j, dev
+  character(:), pointer :: str
+  type t
+    character(:), pointer :: str
+  end type t
+  type(t), allocatable :: strs(:)
+
+  allocate(strs(0:omp_get_num_devices ()))
+
+  do i = omp_invalid_device - 1, omp_get_num_devices () + 1
+    str => omp_get_uid_from_device (i)
+    dev = omp_get_device_from_uid (str)
+!print *, i, str, dev
+    if (i < omp_initial_device .or. i > omp_get_num_devices ()) then
+      if (dev /= omp_invalid_device .or. associated(str)) &
+        stop 1
+      cycle
+    end if
+    if (.not. associated(str)) &
+      stop 2
+    if (i == omp_initial_device .or. i == omp_get_num_devices ()) then
+      if ((dev /= omp_initial_device .and. dev /= omp_get_num_devices ()) &
+          .or. str /= "OMP_INITIAL_DEVICE") & ! /* GCC impl. choice */
+       stop 3
+      dev = omp_get_num_devices ()
+    else if (dev /= i .or. len(str) == 0) then
+      stop 4
+    end if
+    strs(dev)%str => str
+  end do 
+
+  do i = 0, omp_get_num_devices () - 1
+    do j = i + 1, omp_get_num_devices ()
+      if (strs(i)%str == strs(j)%str) &
+        stop 5
+    end do
+  end do
+  deallocate (strs)
+end