OpenACC: Add Fortran routines acc_{alloc,free,hostptr,deviceptr,memcpy_{to,from}_device*}

Message ID 3ef24b00-7ce1-43df-a62e-2817b2700fb9@baylibre.com
State New

Commit Message

Tobias Burnus Feb. 19, 2024, 9:36 p.m. UTC
While waiting for some testing to finish, I got distracted and added the
very low-hanging OpenACC 3.3 fruits, i.e. those Fortran routines that directly
map to their C counterparts.

Comments, remarks?

Tobias

Comments

Thomas Schwinge Feb. 27, 2024, 11:18 a.m. UTC | #1
Hi Tobias!

On 2024-02-19T22:36:51+0100, Tobias Burnus <tburnus@baylibre.com> wrote:
> While waiting for some testing to finish, I got distracted and added the
> very low-hanging OpenACC 3.3 fruits, i.e. those Fortran routines that directly
> map to their C counterparts.
>
> Comments, remarks?

Thanks, that largely looks straightforward.  I've not done an in-depth
review, just a few comments.  Resolve these as you think is necessary,
and then 'git push'.

I don't know much about Fortran interfaces -- I trust you've got that
under control.  ;-)

Thanks for the test cases.  Would be nice to have test cases covering all
interfaces -- but I don't think we're currently complete in that regard,
so shall not hold your contribution to higher standards.

> OpenACC: Add Fortran routines acc_{alloc,free,hostptr,deviceptr,memcpy_{to,from}_device*}
>
> These routines map simply to the C counterpart and are meanwhile
> defined in OpenACC 3.3. (There are additional routine changes,
> including the Fortran addition of acc_attach/acc_detach, that
> require more work than a simple addition of an interface and
> are therefore excluded.)

I saw:

  - <https://gcc.gnu.org/PR113997> "Bogus 'Warning: Interface mismatch in global procedure' with C binding"
  - <https://gcc.gnu.org/PR114002> "[OpenACC][OpenACC 3.3] Add 'acc_attach'/'acc_detach' routine"

> --- a/libgomp/libgomp.texi
> +++ b/libgomp/libgomp.texi

>  @section @code{acc_malloc} -- Allocate device memory.
>  @table @asis
>  @item @emph{Description}
> -This function allocates @var{len} bytes of device memory. It returns
> +This function allocates @var{bytes} of device memory. It returns

Not '@var{bytes} {+bytes+}' or similar?

>  @section @code{acc_memcpy_to_device} -- Copy host memory to device memory.

>  @item @emph{C/C++}:
>  @multitable @columnfractions .20 .80
> -@item @emph{Prototype}: @tab @code{acc_memcpy_to_device(d_void *dest, h_void *src, size_t bytes);}
> +@item @emph{Prototype}: @tab @code{void acc_memcpy_to_device(d_void* data_dev_dest,}
> +@item                   @tab @code{h_void* data_host_src, size_t bytes);}
> +@item @emph{Prototype}: @tab @code{void acc_memcpy_to_device_async(d_void* data_dev_dest,}
> +@item                   @tab @code{h_void* data_host_src, size_t bytes, int async_arg);}
> +@end multitable
> +
> +@item @emph{Fortran}:
> +@multitable @columnfractions .20 .80
> +@item @emph{Interface}: @tab @code{subroutine acc_memcpy_to_device(data_dev_dest, &}
> +@item                   @tab @code{data_host_src, bytes)}
> +@item @emph{Interface}: @tab @code{subroutine acc_memcpy_to_device_async(data_dev_dest, &}
> +@item                   @tab @code{data_host_src, bytes, async_arg)}
> +@item                   @tab @code{type(c_ptr), value :: data_dev_dest}
> +@item                   @tab @code{type(*), dimension(*) :: data_host_src}
> +@item                   @tab @code{integer(c_size_t), value :: bytes}
> +@item                   @tab @code{integer(acc_handle_kind), value :: async_arg}
>  @end multitable

I did wonder whether we should (here, and elsewhere) also update the
'@menu' in "OpenACC Runtime Library Routines" to list the 'async'
routines -- but the OpenACC specification also doesn't, so it shall be
fine as is here, too.

>  @item @emph{Reference}:
>  @uref{https://www.openacc.org, OpenACC specification v2.6}, section
> -3.2.31.
> +3.2.31  @uref{https://www.openacc.org, OpenACC specification v3.3}, section

(Fine as is, of course, but could -- generally -- simplify the 'diff' by
starting the new '@uref' on its own line.)

> +3.2.26..

Double '.'.

> --- a/libgomp/openacc.f90
> +++ b/libgomp/openacc.f90
> @@ -758,6 +758,93 @@ module openacc_internal
>        integer (c_int), value :: async
>      end subroutine
>    end interface
> +
> +  interface
> +    type(c_ptr) function acc_malloc (bytes) bind(C)
> +[...]
> +    end subroutine
> +  end interface
>  end module openacc_internal

Assuming that 'module openacc_internal' currently is sorted per
appearance in the OpenACC specification (?), I suggest we continue to do
so.  (..., like in 'openacc_lib.h', too.)

> @@ -794,6 +881,9 @@ module openacc
>    public :: acc_copyin_async, acc_create_async, acc_copyout_async
>    public :: acc_delete_async, acc_update_device_async, acc_update_self_async
>    public :: acc_copyout_finalize, acc_delete_finalize
> +  public :: acc_malloc, acc_free, acc_map_data, acc_unmap_data, acc_deviceptr
> +  public :: acc_hostptr, acc_memcpy_to_device, acc_memcpy_to_device_async
> +  public :: acc_memcpy_from_device, acc_memcpy_from_device_async

Likewise.

> @@ -871,9 +961,6 @@ module openacc
>      procedure :: acc_on_device_h
>    end interface
>  
> -  ! acc_malloc: Only available in C/C++
> -  ! acc_free: Only available in C/C++
> -
>    ! As vendor extension, the following code supports both 32bit and 64bit
>    ! arguments for "size"; the OpenACC standard only permits default-kind
>    ! integers, which are of kind 4 (i.e. 32 bits).
> @@ -953,20 +1040,12 @@ module openacc
>      procedure :: acc_update_self_array_h
>    end interface
>  
> -  ! acc_map_data: Only available in C/C++
> -  ! acc_unmap_data: Only available in C/C++
> -  ! acc_deviceptr: Only available in C/C++
> -  ! acc_hostptr: Only available in C/C++
> -
>    interface acc_is_present
>      procedure :: acc_is_present_32_h
>      procedure :: acc_is_present_64_h
>      procedure :: acc_is_present_array_h
>    end interface
>  
> -  ! acc_memcpy_to_device: Only available in C/C++
> -  ! acc_memcpy_from_device: Only available in C/C++
> -
>    interface acc_copyin_async
>      procedure :: acc_copyin_async_32_h
>      procedure :: acc_copyin_async_64_h

Is that now a different style that we're not listing the new interfaces
in 'module openacc' here?

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.fortran/acc_host_device_ptr.f90
> @@ -0,0 +1,43 @@
> +! { dg-do run }
> +! { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } }
> +
> +! Fortran version of libgomp.oacc-c-c++-common/lib-59.c 

I like to also put a cross reference into the originating C/C++ test
case, so that anyone adjusting either one also is aware that another one
may need adjusting, too.

> +  ! The following assumes sizeof(void*) being the same on host and device:

That's generally required anyway.

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/acc-memcpy.f90
> @@ -0,0 +1,47 @@
> +! { dg-do run }
> +! { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } }
> +
> +! based on libgomp.oacc-c-c++-common/lib-60.c

Likewise.


Grüße
 Thomas

Tobias Burnus Feb. 27, 2024, 12:29 p.m. UTC | #2
Hi Thomas,

(Regarding 'call acc_attach(x)': the problem is that one needs both the
address '&x' and 'x' itself. While 'x' is readily available, '&x' must be
obtained without any temporary variable getting involved, and there are
plenty of ways temporaries can get introduced. For most cases, an
interface exists that prevents this, but those interfaces are mutually
exclusive. Hence, this needs support in the FE. The simplest workaround
for a user is to use '!$acc attach' instead ...)

Thomas Schwinge:
>>   @table @asis
>>   @item @emph{Description}
>> -This function allocates @var{len} bytes of device memory. It returns
>> +This function allocates @var{bytes} of device memory. It returns
> Not '@var{bytes} {+bytes+}' or similar?

I think either works – depending on how one parses @var{<name>} mentally, 
one of the variants sounds smooth and the other very odd. But I can/will 
change it.

>> --- a/libgomp/openacc.f90
>> +++ b/libgomp/openacc.f90
> Assuming that 'module openacc_internal' currently is sorted per
> appearance in the OpenACC specification (?), I suggest we continue to do
> so.  (..., like in 'openacc_lib.h', too.)
I will check – it looks only block-wise sorted, but I might be wrong. I 
followed the location of the comments, placing each interface before the 
routines that followed the comment, assuming that the comments were at 
the right spot.
>> @@ -794,6 +881,9 @@ module openacc
>> ...
>> +  public :: acc_malloc, acc_free, acc_map_data, acc_unmap_data, acc_deviceptr
>> +  public :: acc_hostptr, acc_memcpy_to_device, acc_memcpy_to_device_async
>> +  public :: acc_memcpy_from_device, acc_memcpy_from_device_async
>>   ...
>> -  ! acc_malloc: Only available in C/C++
>> -  ! acc_free: Only available in C/C++
>> -
>> ...
>>     interface acc_is_present
>>       procedure :: acc_is_present_32_h
>>       procedure :: acc_is_present_64_h
>>       procedure :: acc_is_present_array_h
>>     end interface
> Is that now a different style that we're not listing the new interfaces
> in 'module openacc' here?

As there is no precedent for this type of interface, the style is by 
nature different. But the question is which style is better. The 
current 'openacc' is very short – and contains not a single specific 
interface, but only generic interfaces. The actual specific-procedure 
declarations are only in 'openacc_internal'.

Those new procedures are the first ones that have only a specific 
interface and no generic one. Thus, one can either put the specific 
one into 'openacc_internal' and refer to it from 'openacc' (via 'use 
openacc_internal' + 'public :: acc_<routine-name>') – or place the 
interface directly into 'openacc' (without touching 'openacc_internal' 
at all).

During development, I accidentally had a mixture of both and then 
settled for the current variant. Possibly, moving the interface 
to 'openacc' is clearer?

Thoughts?

>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.fortran/acc_host_device_ptr.f90
>> [...]
>> +! Fortran version of libgomp.oacc-c-c++-common/lib-59.c
> I like to also put a cross reference into the originating C/C++ test
> case, so that anyone adjusting either one also is aware that another one
> may need adjusting, too.
OK - I will do so.
>> +  ! The following assumes sizeof(void*) being the same on host and device:
> That's generally required anyway.

I have to admit that I don't know OpenACC well enough to see whether 
that's the case or not. And, while I am not very consistent, I do try to 
document stricter requirements / implementation-specific parts in 
test cases.

I know that OpenMP permits the pointer size to differ, and 'void *p = 
omp_target_alloc (...);' might in this case not return the device 
pointer but a handle to the device ptr. (For instance, it could be a 
pointer to a uint128_t variable for a 128-bit device pointer; I think 
such hardware really exists and uses several bits for other 
purposes like flags.)

In that case, host-side pointer arithmetic won't work and 
'is_device_ptr' clauses etc. need to do transfer work.
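
A minimal C sketch of the handle scenario described above. It is not taken from GCC or from the OpenMP specification: the 128-bit address type `dev_addr128`, the table `handle_table`, and the helpers `sketch_alloc_handle` and `sketch_resolve` are all hypothetical names chosen for illustration. It only shows why host-side arithmetic on a handle does not address later bytes of the device allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 128-bit device address, wider than a host pointer. */
typedef struct {
    uint64_t hi;   /* upper bits, e.g. flags */
    uint64_t lo;   /* lower bits, here a plain device byte address */
} dev_addr128;

/* Hypothetical host-side table; the host only ever holds &handle_table[i]. */
#define MAX_HANDLES 4
static dev_addr128 handle_table[MAX_HANDLES];
static size_t handles_used;

/* Stand-in for an allocator that returns a handle, not a device address. */
static void *sketch_alloc_handle(uint64_t dev_byte_addr)
{
    if (handles_used == MAX_HANDLES)
        return NULL;
    dev_addr128 *h = &handle_table[handles_used++];
    h->hi = 0;
    h->lo = dev_byte_addr;
    return h;
}

/* The "transfer work": turn handle plus byte offset into a device address. */
static dev_addr128 sketch_resolve(const void *handle, uint64_t byte_offset)
{
    assert(handle != NULL);
    dev_addr128 addr = *(const dev_addr128 *)handle;
    addr.lo += byte_offset;   /* carry into hi is ignored in this sketch */
    return addr;
}
```

With two handles created for the assumed device addresses 0x1000 and 0x9000, adding 16 to the first handle on the host yields the second handle (device address 0x9000), whereas `sketch_resolve(h0, 16)` yields 0x1010, which is what pointer arithmetic on a bare device pointer would have meant.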

But, admittedly, in GCC it is assumed in many places that both 
sides use the same pointer size* and also during specification 
development, everyone implicitly assumes that routines and clauses yield 
bare device pointers and not some opaque pointer to the actual data (a 
handle); hence, one has to keep reminding oneself that the spec permits 
systems where that's not the case.

Tobias

(* There are a few spots which handle a smaller device pointer than the 
host pointer or consider a different size, but that's not done very 
consistently and is largely lacking.)

Thomas Schwinge Feb. 27, 2024, 1:47 p.m. UTC | #3
Hi Tobias!

On 2024-02-27T13:29:33+0100, Tobias Burnus <tburnus@baylibre.com> wrote:
> Thomas Schwinge:
>>>   @table @asis
>>>   @item @emph{Description}
>>> -This function allocates @var{len} bytes of device memory. It returns
>>> +This function allocates @var{bytes} of device memory. It returns

>> Not '@var{bytes} {+bytes+}' or similar?
>
> I think either works – depending on how one parses @var{<name>} mentally, 
> one of the variants sounds smooth and the other very odd. But I can/will 
> change it.

Yeah, I see.  Not the strongest argument ("upstream vs. local" style),
but I see that while OpenACC 3.3 doesn't for 'acc_malloc', it does, for
example, for 'acc_copyin' talk about "'bytes' bytes" (or, avoiding the
issue: "'bytes' specifies the data size in bytes").


>>> --- a/libgomp/openacc.f90
>>> +++ b/libgomp/openacc.f90

>> Assuming that 'module openacc_internal' currently is sorted per
>> appearance in the OpenACC specification (?), I suggest we continue to do
>> so.  (..., like in 'openacc_lib.h', too.)

> I will check – it looks only block-wise sorted but I might be wrong.

OK, but please don't sink too much time into that.

> I 
> followed location of the comments, placing it before the routines that 
> followed the comment, assuming that the comments were at the right spot.


>>> @@ -794,6 +881,9 @@ module openacc
>>> ...
>>> +  public :: acc_malloc, acc_free, acc_map_data, acc_unmap_data, acc_deviceptr
>>> +  public :: acc_hostptr, acc_memcpy_to_device, acc_memcpy_to_device_async
>>> +  public :: acc_memcpy_from_device, acc_memcpy_from_device_async
>>>   ...
>>> -  ! acc_malloc: Only available in C/C++
>>> -  ! acc_free: Only available in C/C++
>>> -
>>> ...
>>>     interface acc_is_present
>>>       procedure :: acc_is_present_32_h
>>>       procedure :: acc_is_present_64_h
>>>       procedure :: acc_is_present_array_h
>>>     end interface

>> Is that now a different style that we're not listing the new interfaces
>> in 'module openacc' here?
>
> As there is no precedent for this type of interface, the style is by 
> nature different. But the question is which style is better. The 
> current 'openacc' is very short – and contains not a single specific 
> interface, but only generic interfaces. The actual specific-procedure 
> declarations are only in 'openacc_internal'.
>
> Those new procedures are the first ones that have only a specific 
> interface and no generic one. Thus, one can either put the specific 
> one into 'openacc_internal' and refer to it from 'openacc' (via 'use 
> openacc_internal' + 'public :: acc_<routine-name>') – or place the 
> interface directly into 'openacc' (without touching 'openacc_internal' 
> at all).
>
> During development, I accidentally had a mixture of both and then 
> settled for the current variant. Possibly, moving the interface 
> to 'openacc' is clearer?
>
> Thoughts?

No, sorry.  As I said: "I don't know much about Fortran interfaces".  :-|


>>> --- /dev/null
>>> +++ b/libgomp/testsuite/libgomp.fortran/acc_host_device_ptr.f90

>>> +  ! The following assumes sizeof(void*) being the same on host and device:

>> That's generally required anyway.
>
> I have to admit that I don't know OpenACC well enough to see whether 
> that's the case or not.

My thinking, "simply", is that this follows implicitly from the fact that
data layout has to match between host and device, and if pointers have
different sizes, that breaks?

For example, OpenACC 3.3, 2.6.4 "Data Structures with Pointers":

| [...]
| When a data object is copied to device memory, the values are copied exactly. If the data is a data
| structure that includes a pointer, or is just a pointer, the pointer value copied to device memory
| will be the host pointer value. [...]
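
To make the layout argument concrete, here is a small C sketch. It assumes, purely for illustration, one side with 64-bit and one side with 32-bit pointers; the struct names `node_ptr64` and `node_ptr32` and the helper `layouts_match` are hypothetical, and fixed-width integers stand in for the pointer member so that the result does not depend on the machine compiling it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-width integers stand in for pointers in both views. */
struct node_ptr64 {      /* side with 64-bit pointers */
    int32_t  n;
    uint64_t next;
};

struct node_ptr32 {      /* hypothetical side with 32-bit pointers */
    int32_t  n;
    uint32_t next;
};

/* Compile-time check that the two layouts really disagree in size. */
static_assert(sizeof(struct node_ptr64) != sizeof(struct node_ptr32),
              "pointer width changes the struct size");

/* Returns 1 only if a byte-for-byte copy would line up on both sides. */
static int layouts_match(void)
{
    return sizeof(struct node_ptr64) == sizeof(struct node_ptr32)
        && offsetof(struct node_ptr64, next)
           == offsetof(struct node_ptr32, next);
}
```

Because `layouts_match()` returns 0 here, copying such a structure "exactly" between the two sides could not work, which is the implicit reason for requiring equal pointer sizes.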

> And, while I am not very consistent, I do try to 
> document stricter requirements / implementation-specific parts in 
> test cases.

ACK, that's always good practice.

> I know that OpenMP permits that the pointer size differs

Oh, really!?

> and 'void *p = 
> omp_target_alloc (...);' might in this case not return the device 
> pointer but a handle to the device ptr. (For instance, it could be a 
> pointer to a uint128_t variable for a 128-bit device pointer; I think 
> such hardware really exists and uses several bits for other 
> purposes like flags.)

I do see in OpenMP 5.2, 1.2.6 "Data Terminology":

| *device address*  An address of an object that may be referenced on a _target device_.

| *device pointer*  An _implementation-defined handle_ that refers to a _device address_.

..., and I'm now -- at least vaguely -- curious how OpenMP handles
different-sized pointers for host vs. device in host/device-shared data
layout.  ("Fortunately" ;-) I have too many higher-priority items to look
after, so I'm not able to spend more time on that question...)

> In that case, host-side pointer arithmetic won't work and 
> 'is_device_ptr' clauses etc. need to do transfer work.
>
> But, admittedly, in GCC it is assumed in many places that both 
> sides use the same pointer size* and also during specification 
> development, everyone implicitly assumes that routines and clauses yield 
> bare device pointers and not some opaque pointer to the actual data (a 
> handle); hence, one has to keep reminding oneself that the spec permits 
> systems where that's not the case.

> (* There are a few spots which handle a smaller device pointer than the 
> host pointer or consider a different size, but that's not done very 
> consistently and is largely lacking.)

Yeah, I guess.


Grüße
 Thomas

Patch

OpenACC: Add Fortran routines acc_{alloc,free,hostptr,deviceptr,memcpy_{to,from}_device*}

These routines map simply to the C counterpart and are meanwhile
defined in OpenACC 3.3. (There are additional routine changes,
including the Fortran addition of acc_attach/acc_detach, that
require more work than a simple addition of an interface and
are therefore excluded.)

libgomp/ChangeLog:

	* libgomp.texi (OpenACC Runtime Library Routines): Document new 3.3
	routines that simply map to their C counterpart.
	* openacc.f90 (openacc_internal, openacc): Add them.
	* openacc_lib.h: Likewise.
	* testsuite/libgomp.fortran/acc_host_device_ptr.f90: New test.
	* testsuite/libgomp.oacc-fortran/acc-memcpy.f90: New test.

 libgomp/libgomp.texi                               | 171 ++++++++++++++++-----
 libgomp/openacc.f90                                | 101 ++++++++++--
 libgomp/openacc_lib.h                              |  94 ++++++++++-
 .../libgomp.fortran/acc_host_device_ptr.f90        |  43 ++++++
 .../testsuite/libgomp.oacc-fortran/acc-memcpy.f90  |  47 ++++++
 5 files changed, 399 insertions(+), 57 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index f57190f203c..d7da799a922 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -2157,8 +2157,6 @@  dimensions.
 Running this routine in a @code{target} region is not supported except on
 the initial device.
 
-
-
 @item @emph{C/C++}
 @multitable @columnfractions .20 .80
 @item @emph{Prototype}: @tab @code{int omp_target_memcpy_rect_async(void *dst,}
@@ -4684,7 +4682,6 @@  returns @code{false}.
 @item                   @tab @code{logical acc_on_device}
 @end multitable
 
-
 @item @emph{Reference}:
 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
 3.2.17.
@@ -4696,17 +4693,24 @@  returns @code{false}.
 @section @code{acc_malloc} -- Allocate device memory.
 @table @asis
 @item @emph{Description}
-This function allocates @var{len} bytes of device memory. It returns
+This function allocates @var{bytes} of device memory. It returns
 the device address of the allocated memory.
 
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
-@item @emph{Prototype}: @tab @code{d_void* acc_malloc(size_t len);}
+@item @emph{Prototype}: @tab @code{d_void* acc_malloc(size_t bytes);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{type(c_ptr) function acc_malloc(bytes)}
+@item                   @tab @code{integer(c_size_t), value :: bytes}
 @end multitable
 
 @item @emph{Reference}:
 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
-3.2.18.
+3.2.18.  @uref{https://www.openacc.org, openacc specification v3.3}, section
+3.2.16.
 @end table
 
 
@@ -4715,16 +4719,23 @@  the device address of the allocated memory.
 @section @code{acc_free} -- Free device memory.
 @table @asis
 @item @emph{Description}
-Free previously allocated device memory at the device address @code{a}.
+Free previously allocated device memory at the device address @code{data_dev}.
 
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
-@item @emph{Prototype}: @tab @code{acc_free(d_void *a);}
+@item @emph{Prototype}: @tab @code{void acc_free(d_void *data_dev);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_free(data_dev)}
+@item                   @tab @code{type(c_ptr), value :: data_dev}
 @end multitable
 
 @item @emph{Reference}:
 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
-3.2.19.
+3.2.19.  @uref{https://www.openacc.org, openacc specification v3.3}, section
+3.2.17.
 @end table
 
 
@@ -5092,17 +5103,26 @@  array element and @var{len} specifies the length in bytes.
 @table @asis
 @item @emph{Description}
 This function maps previously allocated device and host memory. The device
-memory is specified with the device address @var{d}. The host memory is
-specified with the host address @var{h} and a length of @var{len}.
+memory is specified with the device address @var{data_dev}. The host memory is
+specified with the host address @var{data_arg} and a length of @var{bytes}.
 
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
-@item @emph{Prototype}: @tab @code{acc_map_data(h_void *h, d_void *d, size_t len);}
+@item @emph{Prototype}: @tab @code{void acc_map_data(h_void *data_arg, d_void *data_dev, size_t bytes);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_map_data(data_arg, data_dev, bytes)}
+@item                   @tab @code{type(*), dimension(*) :: data_arg}
+@item                   @tab @code{type(c_ptr), value :: data_dev}
+@item                   @tab @code{integer(c_size_t), value :: bytes}
 @end multitable
 
 @item @emph{Reference}:
 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
-3.2.26.
+3.2.26.  @uref{https://www.openacc.org, OpenACC specification v3.3}, section
+3.2.21.
 @end table
 
 
@@ -5112,16 +5132,23 @@  specified with the host address @var{h} and a length of @var{len}.
 @table @asis
 @item @emph{Description}
 This function unmaps previously mapped device and host memory. The latter
-specified by @var{h}.
+specified by @var{data_arg}.
 
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
-@item @emph{Prototype}: @tab @code{acc_unmap_data(h_void *h);}
+@item @emph{Prototype}: @tab @code{void acc_unmap_data(h_void *data_arg);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_unmap_data(data_arg)}
+@item                   @tab @code{type(*), dimension(*) :: data_arg}
 @end multitable
 
 @item @emph{Reference}:
 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
-3.2.27.
+3.2.27. @uref{https://www.openacc.org, OpenACC specification v3.3}, section
+3.2.22.
 @end table
 
 
@@ -5131,16 +5158,23 @@  specified by @var{h}.
 @table @asis
 @item @emph{Description}
 This function returns the device address that has been mapped to the
-host address specified by @var{h}.
+host address specified by @var{data_arg}.
 
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
-@item @emph{Prototype}: @tab @code{void *acc_deviceptr(h_void *h);}
+@item @emph{Prototype}: @tab @code{void *acc_deviceptr(h_void *data_arg);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{type(c_ptr) function acc_deviceptr(data_arg)}
+@item                   @tab @code{type(*), dimension(*) :: data_arg}
 @end multitable
 
 @item @emph{Reference}:
 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
-3.2.28.
+3.2.28.  @uref{https://www.openacc.org, OpenACC specification v3.3}, section
+3.2.23.
 @end table
 
 
@@ -5150,16 +5184,23 @@  host address specified by @var{h}.
 @table @asis
 @item @emph{Description}
 This function returns the host address that has been mapped to the
-device address specified by @var{d}.
+device address specified by @var{data_dev}.
 
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
-@item @emph{Prototype}: @tab @code{void *acc_hostptr(d_void *d);}
+@item @emph{Prototype}: @tab @code{void *acc_hostptr(d_void *data_dev);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{type(c_ptr) function acc_hostptr(data_dev)}
+@item                   @tab @code{type(c_ptr), value :: data_dev}
 @end multitable
 
 @item @emph{Reference}:
 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
-3.2.29.
+3.2.29.  @uref{https://www.openacc.org, OpenACC specification v3.3}, section
+3.2.24.
 @end table
 
 
@@ -5207,18 +5248,34 @@  a @code{false} is return to indicate the mapped memory is not present.
 @section @code{acc_memcpy_to_device} -- Copy host memory to device memory.
 @table @asis
 @item @emph{Description}
-This function copies host memory specified by host address of @var{src} to
-device memory specified by the device address @var{dest} for a length of
-@var{bytes} bytes.
+This function copies host memory specified by host address of
+@var{data_host_src} to device memory specified by the device address
+@var{data_dev_dest} for a length of @var{bytes} bytes.
 
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
-@item @emph{Prototype}: @tab @code{acc_memcpy_to_device(d_void *dest, h_void *src, size_t bytes);}
+@item @emph{Prototype}: @tab @code{void acc_memcpy_to_device(d_void* data_dev_dest,}
+@item                   @tab @code{h_void* data_host_src, size_t bytes);}
+@item @emph{Prototype}: @tab @code{void acc_memcpy_to_device_async(d_void* data_dev_dest,}
+@item                   @tab @code{h_void* data_host_src, size_t bytes, int async_arg);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_memcpy_to_device(data_dev_dest, &}
+@item                   @tab @code{data_host_src, bytes)}
+@item @emph{Interface}: @tab @code{subroutine acc_memcpy_to_device_async(data_dev_dest, &}
+@item                   @tab @code{data_host_src, bytes, async_arg)}
+@item                   @tab @code{type(c_ptr), value :: data_dev_dest}
+@item                   @tab @code{type(*), dimension(*) :: data_host_src}
+@item                   @tab @code{integer(c_size_t), value :: bytes}
+@item                   @tab @code{integer(acc_handle_kind), value :: async_arg}
 @end multitable
 
 @item @emph{Reference}:
 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
-3.2.31.
+3.2.31  @uref{https://www.openacc.org, OpenACC specification v3.3}, section
+3.2.26..
 @end table
 
 
@@ -5227,18 +5284,34 @@  device memory specified by the device address @var{dest} for a length of
 @section @code{acc_memcpy_from_device} -- Copy device memory to host memory.
 @table @asis
 @item @emph{Description}
-This function copies host memory specified by host address of @var{src} from
-device memory specified by the device address @var{dest} for a length of
-@var{bytes} bytes.
+This function copies device memory specified by device address of
+@var{data_dev_src} to host memory specified by the host address
+@var{data_host_dest} for a length of @var{bytes} bytes.
 
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
-@item @emph{Prototype}: @tab @code{acc_memcpy_from_device(d_void *dest, h_void *src, size_t bytes);}
+@item @emph{Prototype}: @tab @code{void acc_memcpy_from_device(h_void* data_host_dest,}
+@item                   @tab @code{d_void* data_dev_src, size_t bytes);}
+@item @emph{Prototype}: @tab @code{void acc_memcpy_from_device_async(h_void* data_host_dest,}
+@item                   @tab @code{d_void* data_dev_src, size_t bytes, int async_arg);}
+@end multitable
+
+@item @emph{Fortran}:
+@multitable @columnfractions .20 .80
+@item @emph{Interface}: @tab @code{subroutine acc_memcpy_from_device(data_host_dest, &}
+@item                   @tab @code{data_dev_src, bytes)}
+@item @emph{Interface}: @tab @code{subroutine acc_memcpy_from_device_async(data_host_dest, &}
+@item                   @tab @code{data_dev_src, bytes, async_arg)}
+@item                   @tab @code{type(*), dimension(*) :: data_host_dest}
+@item                   @tab @code{type(c_ptr), value :: data_dev_src}
+@item                   @tab @code{integer(c_size_t), value :: bytes}
+@item                   @tab @code{integer(acc_handle_kind), value :: async_arg}
 @end multitable
 
 @item @emph{Reference}:
 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
-3.2.32.
+3.2.32.  @uref{https://www.openacc.org, OpenACC specification v3.3}, section
+3.2.27.
 @end table
 
 
@@ -5252,13 +5325,23 @@  address to pointing to the corresponding device data.
 
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
-@item @emph{Prototype}: @tab @code{acc_attach(h_void **ptr);}
-@item @emph{Prototype}: @tab @code{acc_attach_async(h_void **ptr, int async);}
+@item @emph{Prototype}: @tab @code{void acc_attach(h_void **ptr_addr);}
+@item @emph{Prototype}: @tab @code{void acc_attach_async(h_void **ptr_addr, int async);}
 @end multitable
 
+@c @item @emph{Fortran}:
+@c @multitable @columnfractions .20 .80
+@c @item @emph{Interface}: @tab @code{subroutine acc_attach(ptr_addr)}
+@c @item @emph{Interface}: @tab @code{subroutine acc_attach_async(ptr_addr, async_arg)}
+@c @item                   @tab @code{type(*), dimension(..) :: ptr_addr}
+@c @item                   @tab @code{integer(acc_handle_kind), value :: async_arg}
+@c @end multitable
+
 @item @emph{Reference}:
 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
 3.2.34.
+@c  @uref{https://www.openacc.org, OpenACC specification v3.3}, section
+@c 3.2.29.
 @end table
 
 
@@ -5272,15 +5355,27 @@  address to pointing to the corresponding host data.
 
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
-@item @emph{Prototype}: @tab @code{acc_detach(h_void **ptr);}
-@item @emph{Prototype}: @tab @code{acc_detach_async(h_void **ptr, int async);}
-@item @emph{Prototype}: @tab @code{acc_detach_finalize(h_void **ptr);}
-@item @emph{Prototype}: @tab @code{acc_detach_finalize_async(h_void **ptr, int async);}
+@item @emph{Prototype}: @tab @code{void acc_detach(h_void **ptr_addr);}
+@item @emph{Prototype}: @tab @code{void acc_detach_async(h_void **ptr_addr, int async);}
+@item @emph{Prototype}: @tab @code{void acc_detach_finalize(h_void **ptr_addr);}
+@item @emph{Prototype}: @tab @code{void acc_detach_finalize_async(h_void **ptr_addr, int async);}
 @end multitable
 
+@c @item @emph{Fortran}:
+@c @multitable @columnfractions .20 .80
+@c @item @emph{Interface}: @tab @code{subroutine acc_detach(ptr_addr)}
+@c @item @emph{Interface}: @tab @code{subroutine acc_detach_async(ptr_addr, async_arg)}
+@c @item @emph{Interface}: @tab @code{subroutine acc_detach_finalize(ptr_addr)}
+@c @item @emph{Interface}: @tab @code{subroutine acc_detach_finalize_async(ptr_addr, async_arg)}
+@c @item                   @tab @code{type(*), dimension(..) :: ptr_addr}
+@c @item                   @tab @code{integer(acc_handle_kind), value :: async_arg}
+@c @end multitable
+
 @item @emph{Reference}:
 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
 3.2.35.
+@c  @uref{https://www.openacc.org, OpenACC specification v3.3}, section
+@c 3.2.29.
 @end table
 
 
diff --git a/libgomp/openacc.f90 b/libgomp/openacc.f90
index 7270653a98a..42db07a757d 100644
--- a/libgomp/openacc.f90
+++ b/libgomp/openacc.f90
@@ -758,6 +758,93 @@  module openacc_internal
       integer (c_int), value :: async
     end subroutine
   end interface
+
+  interface
+    type(c_ptr) function acc_malloc (bytes) bind(C)
+      use iso_c_binding, only: c_ptr, c_size_t
+      integer(c_size_t), value :: bytes
+    end function
+  end interface
+
+  interface
+    subroutine acc_free (data_dev) bind(C)
+      use iso_c_binding, only: c_ptr
+      type(c_ptr), value :: data_dev
+    end subroutine
+  end interface
+
+  interface
+    subroutine acc_map_data (data_arg, data_dev, bytes) bind(C)
+      use iso_c_binding, only: c_ptr, c_size_t
+      type(*), dimension(*) :: data_arg
+      type(c_ptr), value :: data_dev
+      integer(c_size_t), value :: bytes
+    end subroutine
+  end interface
+
+  interface
+    subroutine acc_unmap_data (data_arg) bind(C)
+      type(*), dimension(*) :: data_arg
+    end subroutine
+  end interface
+
+  interface
+    type(c_ptr) function acc_deviceptr (data_arg) bind(C)
+      use iso_c_binding, only: c_ptr
+      type(*), dimension(*) :: data_arg
+    end function
+  end interface
+
+  interface
+    type(c_ptr) function acc_hostptr (data_dev) bind(C)
+      use iso_c_binding, only: c_ptr
+      type(c_ptr), value :: data_dev
+    end function
+  end interface
+
+  interface
+    subroutine acc_memcpy_to_device (data_dev_dest, data_host_src,  &
+                                     bytes) bind(C)
+      use iso_c_binding, only: c_ptr, c_size_t
+      type(c_ptr), value :: data_dev_dest
+      type(*),dimension(*) :: data_host_src
+      integer(c_size_t), value :: bytes
+    end subroutine
+  end interface
+
+  interface
+    subroutine acc_memcpy_to_device_async (data_dev_dest, data_host_src,  &
+                                           bytes, async_arg) bind(C)
+      use iso_c_binding, only: c_ptr, c_size_t
+      import :: acc_handle_kind
+      type(c_ptr), value :: data_dev_dest
+      type(*),dimension(*) :: data_host_src
+      integer(c_size_t), value :: bytes
+      integer(acc_handle_kind), value :: async_arg
+    end subroutine
+  end interface
+
+  interface
+    subroutine acc_memcpy_from_device (data_host_dest, data_dev_src,  &
+                                       bytes) bind(C)
+      use iso_c_binding, only: c_ptr, c_size_t
+      type(*),dimension(*) :: data_host_dest
+      type(c_ptr), value :: data_dev_src
+      integer(c_size_t), value :: bytes
+    end subroutine
+  end interface
+
+  interface
+    subroutine acc_memcpy_from_device_async (data_host_dest, data_dev_src,  &
+                                             bytes, async_arg) bind(C)
+      use iso_c_binding, only: c_ptr, c_size_t
+      import :: acc_handle_kind
+      type(*),dimension(*) :: data_host_dest
+      type(c_ptr), value :: data_dev_src
+      integer(c_size_t), value :: bytes
+      integer(acc_handle_kind), value :: async_arg
+    end subroutine
+  end interface
 end module openacc_internal
 
 module openacc
@@ -794,6 +881,9 @@  module openacc
   public :: acc_copyin_async, acc_create_async, acc_copyout_async
   public :: acc_delete_async, acc_update_device_async, acc_update_self_async
   public :: acc_copyout_finalize, acc_delete_finalize
+  public :: acc_malloc, acc_free, acc_map_data, acc_unmap_data, acc_deviceptr
+  public :: acc_hostptr, acc_memcpy_to_device, acc_memcpy_to_device_async
+  public :: acc_memcpy_from_device, acc_memcpy_from_device_async
 
   integer, parameter :: openacc_version = 201711
 
@@ -871,9 +961,6 @@  module openacc
     procedure :: acc_on_device_h
   end interface
 
-  ! acc_malloc: Only available in C/C++
-  ! acc_free: Only available in C/C++
-
   ! As vendor extension, the following code supports both 32bit and 64bit
   ! arguments for "size"; the OpenACC standard only permits default-kind
   ! integers, which are of kind 4 (i.e. 32 bits).
@@ -953,20 +1040,12 @@  module openacc
     procedure :: acc_update_self_array_h
   end interface
 
-  ! acc_map_data: Only available in C/C++
-  ! acc_unmap_data: Only available in C/C++
-  ! acc_deviceptr: Only available in C/C++
-  ! acc_hostptr: Only available in C/C++
-
   interface acc_is_present
     procedure :: acc_is_present_32_h
     procedure :: acc_is_present_64_h
     procedure :: acc_is_present_array_h
   end interface
 
-  ! acc_memcpy_to_device: Only available in C/C++
-  ! acc_memcpy_from_device: Only available in C/C++
-
   interface acc_copyin_async
     procedure :: acc_copyin_async_32_h
     procedure :: acc_copyin_async_64_h
diff --git a/libgomp/openacc_lib.h b/libgomp/openacc_lib.h
index dfbf0a75a8f..913c3f1aa3d 100644
--- a/libgomp/openacc_lib.h
+++ b/libgomp/openacc_lib.h
@@ -204,8 +204,19 @@ 
         end function
       end interface
 
-      ! acc_malloc: Only available in C/C++
-      ! acc_free: Only available in C/C++
+      interface
+        type(c_ptr) function acc_malloc(bytes) bind(C)
+          use iso_c_binding, only: c_ptr, c_size_t
+          integer(c_size_t), value :: bytes
+        end function
+      end interface
+
+      interface
+        subroutine acc_free(data_dev) bind(C)
+          use iso_c_binding, only: c_ptr
+          type(c_ptr), value :: data_dev
+        end subroutine
+      end interface
 
       interface acc_copyin
         subroutine acc_copyin_32_h (a, len)
@@ -419,10 +430,34 @@ 
         end subroutine
       end interface
 
-      ! acc_map_data: Only available in C/C++
-      ! acc_unmap_data: Only available in C/C++
-      ! acc_deviceptr: Only available in C/C++
-      ! acc_hostptr: Only available in C/C++
+      interface
+        subroutine acc_map_data(data_arg, data_dev, bytes) bind(C)
+          use iso_c_binding, only: c_ptr, c_size_t
+          type(*), dimension(*) :: data_arg
+          type(c_ptr), value :: data_dev
+          integer(c_size_t), value :: bytes
+        end subroutine
+      end interface
+
+      interface
+        subroutine acc_unmap_data(data_arg) bind(C)
+          type(*), dimension(*) :: data_arg
+        end subroutine
+      end interface
+
+      interface
+        type(c_ptr) function acc_deviceptr(data_arg) bind(C)
+          use iso_c_binding, only: c_ptr
+          type(*), dimension(*) :: data_arg
+        end function
+      end interface
+
+      interface
+        type(c_ptr) function acc_hostptr(data_dev) bind(C)
+          use iso_c_binding, only: c_ptr
+          type(c_ptr), value :: data_dev
+        end function
+      end interface
 
       interface acc_is_present
         function acc_is_present_32_h (a, len)
@@ -447,8 +482,51 @@ 
         end function
       end interface
 
-      ! acc_memcpy_to_device: Only available in C/C++
-      ! acc_memcpy_from_device: Only available in C/C++
+      interface
+        subroutine acc_memcpy_to_device(data_dev_dest, data_host_src,   &
+     &                                  bytes) bind(C)
+          use iso_c_binding, only: c_ptr, c_size_t
+          type(c_ptr), value :: data_dev_dest
+          type(*),dimension(*) :: data_host_src
+          integer(c_size_t), value :: bytes
+        end subroutine
+      end interface
+
+      interface
+        subroutine acc_memcpy_to_device_async(data_dev_dest,            &
+     &                                        data_host_src, bytes,     &
+     &                                        async_arg) bind(C)
+          use iso_c_binding, only: c_ptr, c_size_t
+          import :: acc_handle_kind
+          type(c_ptr), value :: data_dev_dest
+          type(*),dimension(*) :: data_host_src
+          integer(c_size_t), value :: bytes
+          integer(acc_handle_kind), value :: async_arg
+        end subroutine
+      end interface
+
+      interface
+        subroutine acc_memcpy_from_device(data_host_dest,               &
+     &                                    data_dev_src, bytes) bind(C)
+          use iso_c_binding, only: c_ptr, c_size_t
+          type(*),dimension(*) :: data_host_dest
+          type(c_ptr), value :: data_dev_src
+          integer(c_size_t), value :: bytes
+        end subroutine
+      end interface
+
+      interface
+        subroutine acc_memcpy_from_device_async(data_host_dest,         &
+     &                                          data_dev_src, bytes,    &
+     &                                          async_arg) bind(C)
+          use iso_c_binding, only: c_ptr, c_size_t
+          import :: acc_handle_kind
+          type(*),dimension(*) :: data_host_dest
+          type(c_ptr), value :: data_dev_src
+          integer(c_size_t), value :: bytes
+          integer(acc_handle_kind), value :: async_arg
+        end subroutine
+      end interface
 
       interface acc_copyin_async
         subroutine acc_copyin_async_32_h (a, len, async)
diff --git a/libgomp/testsuite/libgomp.fortran/acc_host_device_ptr.f90 b/libgomp/testsuite/libgomp.fortran/acc_host_device_ptr.f90
new file mode 100644
index 00000000000..56b9597dadf
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/acc_host_device_ptr.f90
@@ -0,0 +1,43 @@ 
+! { dg-do run }
+! { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } }
+
+! Fortran version of libgomp.oacc-c-c++-common/lib-59.c
+
+program main
+  use iso_c_binding
+  use openacc
+  implicit none (type, external)
+
+  integer(c_size_t), parameter :: N = 256
+  character(c_char), allocatable, target :: h_data(:)
+  type(c_ptr) :: dptr, dptr_t
+  integer(c_intptr_t) :: iptr, i
+
+  allocate(h_data(0:N))
+  dptr = acc_malloc (N+1)
+
+  call acc_map_data (h_data, dptr, N+1)
+
+  ! The following assumes sizeof(void*) being the same on host and device:
+  do i = 0, N
+    dptr_t = transfer (transfer(dptr, iptr) + i, dptr_t)
+    if (.not. c_associated (acc_hostptr (dptr_t), c_loc (h_data(i)))) &
+      stop 1
+    if (.not. c_associated (dptr_t, acc_deviceptr (h_data(i)))) &
+      stop 2
+  end do
+
+  call acc_unmap_data (h_data)
+
+  do i = 0, N
+    dptr_t = transfer (transfer(dptr, iptr) + i, dptr_t)
+    if (c_associated (acc_hostptr (dptr_t))) &
+      stop 3
+    if (c_associated (acc_deviceptr (h_data(i)))) &
+      stop 4
+  end do
+
+  call acc_free (dptr)
+
+  deallocate (h_data)
+end
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/acc-memcpy.f90 b/libgomp/testsuite/libgomp.oacc-fortran/acc-memcpy.f90
new file mode 100644
index 00000000000..670dc50ff07
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/acc-memcpy.f90
@@ -0,0 +1,47 @@ 
+! { dg-do run }
+! { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } }
+
+! based on libgomp.oacc-c-c++-common/lib-60.c
+
+program main
+  use openacc
+  use iso_fortran_env
+  use iso_c_binding
+  implicit none (type, external)
+  integer(int8), allocatable :: char(:)
+  type(c_ptr) :: dptr
+  integer(c_intptr_t) :: i
+  integer(int8) :: j
+
+  allocate(char(-128:127))
+  do i = -128, 127
+    char(i) = int (i, int8)
+  end do
+
+  dptr = acc_malloc (256_c_size_t)
+  call acc_memcpy_to_device (dptr, char, 256_c_size_t)
+
+  do i = -128, 127
+    if (acc_is_present (char(i), 1)) &
+      stop 1
+  end do
+
+  char = 0_int8
+
+  call acc_memcpy_from_device (char, dptr, 256_c_size_t)
+
+  do i = -128, 127
+    ! Each element must match the value stored before the round trip.
+    if (char(i) /= int (i, int8)) &
+      stop 2
+  end do
+
+  do i = -128, 127
+    if (acc_is_present (char(i), 1)) &
+      stop 3
+  end do
+
+  call acc_free (dptr)
+
+  deallocate (char)
+end