diff mbox series

[Fortran,PR107635,Part,1] Rework handling of allocatable components in derived type coarrays.

Message ID 20241206191008.55f0ec63@vepi2
State New
Headers show
Series [Fortran,PR107635,Part,1] Rework handling of allocatable components in derived type coarrays. | expand

Commit Message

Andre Vehreschild Dec. 6, 2024, 6:10 p.m. UTC
Hi all,

I had to dive deeply into the issue with handling allocatable components in
derived types and to find a future proof solution. I hope do have found a
universal and flexible one now:

For each allocatable (or pointer) component in a derived type a coarray token
is required. While this is quite easy for the current compilation unit, it is
very difficult to do for arbitrary compilation units or when one gets libraries
that are not aware of coarray's needs. The approach this patch now implements
is to delegate the evaluation of a reference into a coarray into a separate
access routine. This routine is present on every image, because it is generated
by the compiler at the time it knows that a coarray access is done. With
external compilation units or libraries this solves the access issue, because
each image knows how to access its own objects, but does not need a coarray
token for allocatable (or pointer) components anymore. The access on the remote
image's object is done by the remote image itself (for the MPI implementation in
a separate thread). Therefore it knows about the bounds of arrays, allocation
and association state of components and can handle those.

Furthermore is this approach faster, because it has O(1) complexity regarding
the communication. The old approach was O(N) where N is the number of
allocatable/pointer components + array descriptors on the path of the access.
The new approach sends a set of parameters to the remote image and gets the
desired data in return.

At the moment the patch handles only getting of data from a remote image. It is
split into two patchsets. The first one does some preparatory clean up, like
stopping to add caf_get calls into the expression tree and removing them
afterwards again, where they are in the way.

The second patch is then doing the access routine creation. Unfortunately is
this the longer patch. I have also updated the documentation of the caf API. I
hope to not have overlooked something.

This is the first part of a series to rework all coarray access routines to use
the new approach and then remove the deprecated calls. This makes things
clearer and easier to maintain, although the tree-dump now presents some more
generated routines, which might look odd.

Bootstrapped and regtested ok on x86_64-pc-linux-gnu / Fedora 39 and 41. Ok for
mainline?

I will continue working on the coarray stuff and fix upcoming bugs in the
future.

Regards,
	Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de

Comments

Andre Vehreschild Dec. 16, 2024, 9:58 a.m. UTC | #1
PING!

On Fri, 6 Dec 2024 19:10:08 +0100
Andre Vehreschild <vehre@gmx.de> wrote:

> Hi all,
>
> I had to dive deeply into the issue with handling allocatable components in
> derived types and to find a future proof solution. I hope do have found a
> universal and flexible one now:
>
> For each allocatable (or pointer) component in a derived type a coarray token
> is required. While this is quite easy for the current compilation unit, it is
> very difficult to do for arbitrary compilation units or when one gets
> libraries that are not aware of coarray's needs. The approach this patch now
> implements is to delegate the evaluation of a reference into a coarray into a
> separate access routine. This routine is present on every image, because it
> is generated by the compiler at the time it knows that a coarray access is
> done. With external compilation units or libraries this solves the access
> issue, because each image knows how to access its own objects, but does not
> need a coarray token for allocatable (or pointer) components anymore. The
> access on the remote image's object is done by the remote image itself (for
> the MPI implementation in a separate thread). Therefore it knows about the
> bounds of arrays, allocation and association state of components and can
> handle those.
>
> Furthermore is this approach faster, because it has O(1) complexity regarding
> the communication. The old approach was O(N) where N is the number of
> allocatable/pointer components + array descriptors on the path of the access.
> The new approach sends a set of parameters to the remote image and gets the
> desired data in return.
>
> At the moment the patch handles only getting of data from a remote image. It
> is split into two patchsets. The first one does some preparatory clean up,
> like stopping to add caf_get calls into the expression tree and removing them
> afterwards again, where they are in the way.
>
> The second patch is then doing the access routine creation. Unfortunately is
> this the longer patch. I have also updated the documentation of the caf API. I
> hope to not have overlooked something.
>
> This is the first part of a series to rework all coarray access routines to
> use the new approach and then remove the deprecated calls. This makes things
> clearer and easier to maintain, although the tree-dump now presents some more
> generated routines, which might look odd.
>
> Bootstrapped and regtested ok on x86_64-pc-linux-gnu / Fedora 39 and 41. Ok
> for mainline?
>
> I will continue working on the coarray stuff and fix upcoming bugs in the
> future.
>
> Regards,
> 	Andre
> --
> Andre Vehreschild * Email: vehre ad gmx dot de


--
Andre Vehreschild * Email: vehre ad gmx dot de
Damian Rouson Dec. 16, 2024, 12:53 p.m. UTC | #2
This is a big deal, Andre!   Thanks for working on this patch.  I have some
test code that I can dig up if that’s helpful.  Have you tested with nested
derived type components, i.e., allocatable components that are themselves
derived types that have allocatable components?

The need for this feature in one of my projects is one of two things that
led me to prioritize other compilers at the end of 2023.  At the time, the
other missing feature was automatic parallelization of `do concurrent`,
including automatic GPU offloading.  Then a few months ago, the death blow
that I couldn’t work around was robust support for kind type parameters.

D


Damian

On Fri, Dec 6, 2024 at 10:13 Andre Vehreschild <vehre@gmx.de> wrote:

> Hi all,
>
> I had to dive deeply into the issue with handling allocatable components in
> derived types and to find a future proof solution. I hope do have found a
> universal and flexible one now:
>
> For each allocatable (or pointer) component in a derived type a coarray
> token
> is required. While this is quite easy for the current compilation unit, it
> is
> very difficult to do for arbitrary compilation units or when one gets
> libraries
> that are not aware of coarray's needs. The approach this patch now
> implements
> is to delegate the evaluation of a reference into a coarray into a separate
> access routine. This routine is present on every image, because it is
> generated
> by the compiler at the time it knows that a coarray access is done. With
> external compilation units or libraries this solves the access issue,
> because
> each image knows how to access its own objects, but does not need a coarray
> token for allocatable (or pointer) components anymore. The access on the
> remote
> image's object is done by the remote image itself (for the MPI
> implementation in
> a separate thread). Therefore it knows about the bounds of arrays,
> allocation
> and association state of components and can handle those.
>
> Furthermore is this approach faster, because it has O(1) complexity
> regarding
> the communication. The old approach was O(N) where N is the number of
> allocatable/pointer components + array descriptors on the path of the
> access.
> The new approach sends a set of parameters to the remote image and gets the
> desired data in return.
>
> At the moment the patch handles only getting of data from a remote image.
> It is
> split into two patchsets. The first one does some preparatory clean up,
> like
> stopping to add caf_get calls into the expression tree and removing them
> afterwards again, where they are in the way.
>
> The second patch is then doing the access routine creation. Unfortunately
> is
> this the longer patch. I have also updated the documentation of the caf
> API. I
> hope to not have overlooked something.
>
> This is the first part of a series to rework all coarray access routines
> to use
> the new approach and then remove the deprecated calls. This makes things
> clearer and easier to maintain, although the tree-dump now presents some
> more
> generated routines, which might look odd.
>
> Bootstrapped and regtested ok on x86_64-pc-linux-gnu / Fedora 39 and 41.
> Ok for
> mainline?
>
> I will continue working on the coarray stuff and fix upcoming bugs in the
> future.
>
> Regards,
>         Andre
> --
> Andre Vehreschild * Email: vehre ad gmx dot de
>
Steve Kargl Dec. 18, 2024, 4:05 a.m. UTC | #3
On Mon, Dec 16, 2024 at 04:53:42AM -0800, Damian Rouson wrote:
> including automatic GPU offloading.  Then a few months ago, the death blow
> that I couldn’t work around was robust support for kind type parameters.
> 

gfortran doesn't have robust kind type parameters?

% cat xx.f90
program foo
   use iso_fortran_env
   print '(A,*(I0,1X))', '         real kinds: ', real_kinds
   print '(A,*(I0,1X))', '      integer kinds: ', integer_kinds
   print '(A,*(I0,1X))', '      logical kinds: ', logical_kinds
   print '(A,*(I0,1X))', '    character kinds: ', character_kinds
end program foo
% gfcx -o z xx.f90 && ./z
         real kinds: 4 8 10 16
      integer kinds: 1 2 4 8 16
      logical kinds: 1 2 4 8 16
    character kinds: 1 4
% ../tmp/lfortran/work/bin/lfortran -o z xx.f90
Segmentation fault (core dumped)
% root/build/bin/flang -o z xx.f90 && ./z
         real kinds: 2 3 4 8 10
      integer kinds: 1 2 4 8 16
      logical kinds: 1 2 4 8
    character kinds: 1 2 4

So, flang supports IEEE binary16 and brain float16, which makes
sense as Nvidia has a vested interest in these types.  flang
does not support IEEE binary128; while gfortran does.  flang
also has an additional character kind type.  Are you doing
work with 'CHARACTER(KIND=2)' while not needing a 16-byte
logical kind type?

Or, perhaps, you care to qualify your generalization?
Jerry D Dec. 18, 2024, 8:40 p.m. UTC | #4
Andre, have you tested this with the tests in OpenCoarrays suite?

I ask since this touches coarray things.

Jerry

On 12/16/24 1:58 AM, Andre Vehreschild wrote:
> PING!
> 
> On Fri, 6 Dec 2024 19:10:08 +0100
> Andre Vehreschild <vehre@gmx.de> wrote:
> 
>> Hi all,
>>
>> I had to dive deeply into the issue with handling allocatable components in
>> derived types and to find a future proof solution. I hope do have found a
>> universal and flexible one now:
>>
>> For each allocatable (or pointer) component in a derived type a coarray token
>> is required. While this is quite easy for the current compilation unit, it is
>> very difficult to do for arbitrary compilation units or when one gets
>> libraries that are not aware of coarray's needs. The approach this patch now
>> implements is to delegate the evaluation of a reference into a coarray into a
>> separate access routine. This routine is present on every image, because it
>> is generated by the compiler at the time it knows that a coarray access is
>> done. With external compilation units or libraries this solves the access
>> issue, because each image knows how to access its own objects, but does not
>> need a coarray token for allocatable (or pointer) components anymore. The
>> access on the remote image's object is done by the remote image itself (for
>> the MPI implementation in a separate thread). Therefore it knows about the
>> bounds of arrays, allocation and association state of components and can
>> handle those.
>>
>> Furthermore is this approach faster, because it has O(1) complexity regarding
>> the communication. The old approach was O(N) where N is the number of
>> allocatable/pointer components + array descriptors on the path of the access.
>> The new approach sends a set of parameters to the remote image and gets the
>> desired data in return.
>>
>> At the moment the patch handles only getting of data from a remote image. It
>> is split into two patchsets. The first one does some preparatory clean up,
>> like stopping to add caf_get calls into the expression tree and removing them
>> afterwards again, where they are in the way.
>>
>> The second patch is then doing the access routine creation. Unfortunately is
>> this the longer patch. I have also updated the documentation of the caf API. I
>> hope to not have overlooked something.
>>
>> This is the first part of a series to rework all coarray access routines to
>> use the new approach and then remove the deprecated calls. This makes things
>> clearer and easier to maintain, although the tree-dump now presents some more
>> generated routines, which might look odd.
>>
>> Bootstrapped and regtested ok on x86_64-pc-linux-gnu / Fedora 39 and 41. Ok
>> for mainline?
>>
>> I will continue working on the coarray stuff and fix upcoming bugs in the
>> future.
>>
>> Regards,
>> 	Andre
>> --
>> Andre Vehreschild * Email: vehre ad gmx dot de
> 
> 
> --
> Andre Vehreschild * Email: vehre ad gmx dot de
Andre Vehreschild Dec. 18, 2024, 9:10 p.m. UTC | #5
Hi Jerry,

I am working on the OpenCoarrays library in parallel. At the moment some of
their tests are non functional. And today I even found a bug in the current
implementation of caf-single and in my new approach. I am still fixing that.

But that should not keep you from looking into this patch. I will post
several parts of it that extend the already published one and complete the
whole set of operations. So feel free to comment early, because the earlier
the easier changes can be made.

Regards,
Andre
Andre Vehreschild
Andre Vehreschild Dec. 20, 2024, 12:37 p.m. UTC | #6
Hi all,

pinging again.

Please note, that attached patch is a slightly updated version.

Regtests ok on x86_64-pc-linux-gnu. Ok for mainline?

Regards,
	Andre

On Mon, 16 Dec 2024 10:58:22 +0100
Andre Vehreschild <vehre@gmx.de> wrote:

> PING!
>
> On Fri, 6 Dec 2024 19:10:08 +0100
> Andre Vehreschild <vehre@gmx.de> wrote:
>
> > Hi all,
> >
> > I had to dive deeply into the issue with handling allocatable components in
> > derived types and to find a future proof solution. I hope do have found a
> > universal and flexible one now:
> >
> > For each allocatable (or pointer) component in a derived type a coarray
> > token is required. While this is quite easy for the current compilation
> > unit, it is very difficult to do for arbitrary compilation units or when
> > one gets libraries that are not aware of coarray's needs. The approach this
> > patch now implements is to delegate the evaluation of a reference into a
> > coarray into a separate access routine. This routine is present on every
> > image, because it is generated by the compiler at the time it knows that a
> > coarray access is done. With external compilation units or libraries this
> > solves the access issue, because each image knows how to access its own
> > objects, but does not need a coarray token for allocatable (or pointer)
> > components anymore. The access on the remote image's object is done by the
> > remote image itself (for the MPI implementation in a separate thread).
> > Therefore it knows about the bounds of arrays, allocation and association
> > state of components and can handle those.
> >
> > Furthermore is this approach faster, because it has O(1) complexity
> > regarding the communication. The old approach was O(N) where N is the
> > number of allocatable/pointer components + array descriptors on the path of
> > the access. The new approach sends a set of parameters to the remote image
> > and gets the desired data in return.
> >
> > At the moment the patch handles only getting of data from a remote image. It
> > is split into two patchsets. The first one does some preparatory clean up,
> > like stopping to add caf_get calls into the expression tree and removing
> > them afterwards again, where they are in the way.
> >
> > The second patch is then doing the access routine creation. Unfortunately is
> > this the longer patch. I have also updated the documentation of the caf
> > API. I hope to not have overlooked something.
> >
> > This is the first part of a series to rework all coarray access routines to
> > use the new approach and then remove the deprecated calls. This makes things
> > clearer and easier to maintain, although the tree-dump now presents some
> > more generated routines, which might look odd.
> >
> > Bootstrapped and regtested ok on x86_64-pc-linux-gnu / Fedora 39 and 41. Ok
> > for mainline?
> >
> > I will continue working on the coarray stuff and fix upcoming bugs in the
> > future.
> >
> > Regards,
> > 	Andre
> > --
> > Andre Vehreschild * Email: vehre ad gmx dot de
>
>
> --
> Andre Vehreschild * Email: vehre ad gmx dot de


--
Andre Vehreschild * Email: vehre ad gmx dot de
Jerry D Dec. 20, 2024, 4:53 p.m. UTC | #7
I got as far as applying and regression testing version 2. I have queried
around for others to look over the patch. I will scan this one.

Since you are the expert in this area, we may just have you push and let
the pieces fall out.

I will get back to you later today.

Jerry

On Fri, Dec 20, 2024, 4:37 AM Andre Vehreschild <vehre@gmx.de> wrote:

> Hi all,
>
> pinging again.
>
> Please note, that attached patch is a slightly updated version.
>
> Regtests ok on x86_64-pc-linux-gnu. Ok for mainline?
>
> Regards,
>         Andre
>
> On Mon, 16 Dec 2024 10:58:22 +0100
> Andre Vehreschild <vehre@gmx.de> wrote:
>
> > PING!
> >
> > On Fri, 6 Dec 2024 19:10:08 +0100
> > Andre Vehreschild <vehre@gmx.de> wrote:
> >
> > > Hi all,
> > >
> > > I had to dive deeply into the issue with handling allocatable
> components in
> > > derived types and to find a future proof solution. I hope do have
> found a
> > > universal and flexible one now:
> > >
> > > For each allocatable (or pointer) component in a derived type a coarray
> > > token is required. While this is quite easy for the current compilation
> > > unit, it is very difficult to do for arbitrary compilation units or
> when
> > > one gets libraries that are not aware of coarray's needs. The approach
> this
> > > patch now implements is to delegate the evaluation of a reference into
> a
> > > coarray into a separate access routine. This routine is present on
> every
> > > image, because it is generated by the compiler at the time it knows
> that a
> > > coarray access is done. With external compilation units or libraries
> this
> > > solves the access issue, because each image knows how to access its own
> > > objects, but does not need a coarray token for allocatable (or pointer)
> > > components anymore. The access on the remote image's object is done by
> the
> > > remote image itself (for the MPI implementation in a separate thread).
> > > Therefore it knows about the bounds of arrays, allocation and
> association
> > > state of components and can handle those.
> > >
> > > Furthermore is this approach faster, because it has O(1) complexity
> > > regarding the communication. The old approach was O(N) where N is the
> > > number of allocatable/pointer components + array descriptors on the
> path of
> > > the access. The new approach sends a set of parameters to the remote
> image
> > > and gets the desired data in return.
> > >
> > > At the moment the patch handles only getting of data from a remote
> image. It
> > > is split into two patchsets. The first one does some preparatory clean
> up,
> > > like stopping to add caf_get calls into the expression tree and
> removing
> > > them afterwards again, where they are in the way.
> > >
> > > The second patch is then doing the access routine creation.
> Unfortunately is
> > > this the longer patch. I have also updated the documentation of the caf
> > > API. I hope to not have overlooked something.
> > >
> > > This is the first part of a series to rework all coarray access
> routines to
> > > use the new approach and then remove the deprecated calls. This makes
> things
> > > clearer and easier to maintain, although the tree-dump now presents
> some
> > > more generated routines, which might look odd.
> > >
> > > Bootstrapped and regtested ok on x86_64-pc-linux-gnu / Fedora 39 and
> 41. Ok
> > > for mainline?
> > >
> > > I will continue working on the coarray stuff and fix upcoming bugs in
> the
> > > future.
> > >
> > > Regards,
> > >     Andre
> > > --
> > > Andre Vehreschild * Email: vehre ad gmx dot de
> >
> >
> > --
> > Andre Vehreschild * Email: vehre ad gmx dot de
>
>
> --
> Andre Vehreschild * Email: vehre ad gmx dot de
>
Andre Vehreschild Dec. 20, 2024, 5:09 p.m. UTC | #8
Thank you very much Jerry.

The delta between the two patches is really minor. In resolve.c I have
removed some attr.pointer setting and in caf/single.c the handling for
those as well as treating non-descriptor arrays differently.

Most changes in the diff of v2 to v3 are about lines having moved due to
other patches.

And I will be around for fixing occurring issues. Btw, in the OpenCoarrays
GitHub repo the branch for 774 implements this new way of accessing data
for mpi.

Thanks again for looking and regards,
Andre

Andre Vehreschild
Jerry D Dec. 21, 2024, 11:27 p.m. UTC | #9
On 12/20/24 9:09 AM, Andre Vehreschild wrote:
> Thank you very much Jerry.
> 
> The delta between the two patches is really minor. In resolve.c I have 
> removed some attr.pointer setting and in caf/single.c the handling for 
> those as well as treating non-descriptor arrays differently.
> 
> Most changes in the diff of v2 to v3 are about lines having moved due to 
> other patches.
> 
> And I will be around for fixing occurring issues. Btw, in the 
> OpenCoarrays GitHub repo the branch for 774 implements this new way of 
> accessing data for mpi.
> 
> Thanks again for looking and regards,
> Andre
> 
> Andre Vehreschild

Please go ahead and Push. We have had some off list discussions and this 
is the best path forward.

Best regards to all,

Jerry
Damian Rouson Dec. 22, 2024, 1:15 a.m. UTC | #10
On Tue, Dec 17, 2024 at 20:05 Steve Kargl <sgk@troutmask.apl.washington.edu>
wrote:

> On Mon, Dec 16, 2024 at 04:53:42AM -0800,
Or, perhaps, you care to qualify your generalization?

Thanks for asking.  When I tried using kind type parameters on derived
types with gfortran a few months ago, I gave up because the issues appeared
to be so fundamental that I couldn’t even find workarounds that would allow
me to use the feature at all while awaiting patches.  I corresponded with
Paul, who did the initial implementation of parameterized derives.  I hope
Paul is ok with me quoting his description of the overall situation about 2
months ago.  Paul’s assessment is below.  Also,  I believe Andre is
planning to work on PDT support sometime next year.

Damian


— snip —

The most significant change that is required comes about from PR82649.
While kludges exist, the best approach is to completely change the PDT
representation,
which is generated in decl.cc(gfc_get_pdt_instance). My plan was to
represent a PDT instance by a container:

type(pdt_instance)
   type(pdt_data), dimension(gfc_max_dimensions), pointer :: data
   integer, kind :: param1
...
  integer, len :: paramN
...
end type

Of course, this change will result in a huge amount of downstream fallout
in resolve.cc and trans*.cc. Grepping "attr.pdt" will reveal the affected
files.

The present representation for *each element* is:
type(pdt_instance)
   integer, kind :: param1
...
  integer, len :: paramN
...
  data_components
end type

A possible alternative approach would be to add a "hidden" field,
_allocated and to leave a single element with the parameter fields
retaining their values after deallocation.
Andre Vehreschild Dec. 22, 2024, 3:29 p.m. UTC | #11
Hi Jerry,

thanks for taking a look. Committed as gcc-15-6415-g586477d67bf.

Thanks again,
	Andre

On Sat, 21 Dec 2024 15:27:52 -0800
Jerry D <jvdelisle2@gmail.com> wrote:

> On 12/20/24 9:09 AM, Andre Vehreschild wrote:
> > Thank you very much Jerry.
> >
> > The delta between the two patches is really minor. In resolve.c I have
> > removed some attr.pointer setting and in caf/single.c the handling for
> > those as well as treating non-descriptor arrays differently.
> >
> > Most changes in the diff of v2 to v3 are about lines having moved due to
> > other patches.
> >
> > And I will be around for fixing occurring issues. Btw, in the
> > OpenCoarrays GitHub repo the branch for 774 implements this new way of
> > accessing data for mpi.
> >
> > Thanks again for looking and regards,
> > Andre
> >
> > Andre Vehreschild
>
> Please go ahead and Push. We have had some off list discussions and this
> is the best path forward.
>
> Best regards to all,
>
> Jerry


--
Andre Vehreschild * Email: vehre ad gmx dot de
diff mbox series

Patch

From b7e96f3419988e24bd54c7fb00e634b8c675f08d Mon Sep 17 00:00:00 2001
From: Andre Vehreschild <vehre@gcc.gnu.org>
Date: Fri, 6 Dec 2024 08:57:34 +0100
Subject: [PATCH 2/2] Fortran: Replace getting of coarray data with
 accessor-based version. [PR107635]

Getting coarray data from remote images was slow, inefficient and did
not work for object files that where not compiled with coarray support
for derived types with allocatable/pointer components.  The old approach
emulated accessing data through a whole structure ref, which was error
prone for corner cases.  Furthermore was did it have a runtime
complexity of O(N), where N is the number of allocatable/pointer
components and descriptors involved.  Each of those needed communication
twice.  The new approach creates a routine for each access into a
coarray object putting all required operations there.  Looking a
tree-dump one will see those small routines.  But this time it is just
compiled fortran with all the knowledge of the compiler of bounds and so
on.  New paradigms will be available out of the box.  Furthermore is the
complexity of the communication reduced to be O(1).  E.g. the mpi
implementation sends one message for the parameters of the access and
one message back with the results without caring about the number of
allocatable/pointer/descriptor components in the access.

Identification of access routines is done be adding them to a hash map,
where the hash is the same on all images.  Translating the hash to an
index, which is the same on all images again, allows for fast calls of
the access routines.  Resolving the hash to an index is cached at
runtime, preventing additional hash map lookups.  A hashmap was use
because not all processor OS combinations may use the same address for
the access routine.

gcc/fortran/ChangeLog:

	PR fortran/107635

	* gfortran.h (gfc_add_caf_accessor): New function.
	* gfortran.texi: Document new API routines.
	* resolve.cc (get_arrayspec_from_expr): Synthesize the arrayspec
	resulting from an expression, i.e. not only the rank, but also
	the bounds.
	(remove_coarray_from_derived_type): Remove coarray ref from a
	derived type to access it in access routine.
	(convert_coarray_class_to_derived_type): Same but for classes.
	The result is a derived type.
	(split_expr_at_caf_ref): Split an expression at the coarray
	reference to move the reference after the coarray ref into the
	access routine.
	(check_add_new_component): Helper to add variables as
	components to derived type transfered to the access routine.
	(create_get_parameter_type): Create the derived type to transfer
	addressing data to the access routine.
	(create_get_callback): Create the access routine.
	(add_caf_get_intrinsic): Use access routine instead of old
	caf_get.
	* trans-decl.cc (gfc_build_builtin_function_decls): Register new
	API routines.
	(gfc_create_module_variable): Use renamed flag.
	(gfc_emit_parameter_debug_info):
	(struct caf_accessor): Linked list of hash-access routine pairs.
	(gfc_add_caf_accessor): Add a hash-access routine pair to above
	linked list.
	(create_caf_accessor_register): Add all registered hash-access
	routine pairs to the current caf_init.
	(generate_coarray_init): Use routine above.
	(gfc_generate_module_vars): Use renamed flag.
	(generate_local_decl): Same.
	(gfc_generate_function_code): Same.
	(gfc_process_block_locals): Same.
	* trans-intrinsic.cc (conv_shape_to_cst): Build the product of a
	shape.
	(gfc_conv_intrinsic_caf_get): Create call to access routine.
	(conv_caf_send): Adapt to caf_get using less arguments.
	(gfc_conv_intrinsic_function): Same.
	* trans.cc (gfc_trans_force_lval): Helper to ensure that an
	expression can be used as an lvalue-ref.
	* trans.h (gfc_trans_force_lval): See above.

libgfortran/ChangeLog:

	* caf/libcaf.h (_gfortran_caf_register_accessor): New function
	to register access routines at runtime.
	(_gfortran_caf_register_accessors_finish): New function to
	finish registration of access routine and sort hash map.
	(_gfortran_caf_get_remote_function_index): New function to
	convert an hash to an index.
	(_gfortran_caf_get_by_ct): New function to get data from a
	remote image using the access routine given by an index.
	* caf/single.c (struct accessor_hash_t): Hashmap type.
	(_gfortran_caf_send): Fixed formatting.
	(_gfortran_caf_register_accessor): Register a hash accessor
	routine.
	(hash_compare): Compare two hashes for sort() and bsearch().
	(_gfortran_caf_register_accessors_finish): Sort the hashmap to
	allow bsearch()'s quick lookup.
	(_gfortran_caf_get_remote_function_index): Map a hash to an
	index.
	(_gfortran_caf_get_by_ct): Get data from a remote image using
	the index provided by get_remote_function_index().

gcc/testsuite/ChangeLog:

	* gfortran.dg/coarray_atomic_5.f90: Adapted to look for
	get_by_ct.
	* gfortran.dg/coarray_lib_comm_1.f90: Same.
	* gfortran.dg/coarray_stat_function.f90: Same.
---
 gcc/fortran/gfortran.h                        |   1 +
 gcc/fortran/gfortran.texi                     | 195 +++++-
 gcc/fortran/resolve.cc                        | 653 +++++++++++++++++-
 gcc/fortran/trans-decl.cc                     | 102 ++-
 gcc/fortran/trans-intrinsic.cc                | 405 +++++------
 gcc/fortran/trans.cc                          |  10 +
 gcc/fortran/trans.h                           |  11 +
 .../gfortran.dg/coarray_atomic_5.f90          |   6 +-
 .../gfortran.dg/coarray_lib_comm_1.f90        |   5 +-
 .../gfortran.dg/coarray_stat_function.f90     |   6 +-
 libgfortran/caf/libcaf.h                      |  18 +
 libgfortran/caf/single.c                      | 135 +++-
 12 files changed, 1301 insertions(+), 246 deletions(-)

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index d66c13b2661..87307c5531e 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -4172,5 +4172,6 @@  bool gfc_is_reallocatable_lhs (gfc_expr *);

 void finish_oacc_declare (gfc_namespace *, gfc_symbol *, bool);
 void gfc_adjust_builtins (void);
+void gfc_add_caf_accessor (gfc_expr *, gfc_expr *);

 #endif /* GCC_GFORTRAN_H  */
diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
index c91d548fc55..7706cd8612f 100644
--- a/gcc/fortran/gfortran.texi
+++ b/gcc/fortran/gfortran.texi
@@ -4157,10 +4157,14 @@  future implementation of teams.  It is about to change without further notice.
 * _gfortran_caf_stopped_images :: Get an array of the indexes of the stopped images
 * _gfortran_caf_register:: Registering coarrays
 * _gfortran_caf_deregister:: Deregistering coarrays
+* _gfortran_caf_register_accessor:: Register an accessor for remote access
+* _gfortran_caf_register_accessors_finish:: Finish registering accessor functions
+* _gfortran_caf_get_remote_function_index:: Get the index of an accessor
 * _gfortran_caf_is_present:: Query whether an allocatable or pointer component in a derived type coarray is allocated
 * _gfortran_caf_send:: Sending data from a local image to a remote image
 * _gfortran_caf_get:: Getting data from a remote image
 * _gfortran_caf_sendget:: Sending data between remote images
+* _gfortran_caf_get_by_ct:: Getting data from a remote image using a remote side accessor
 * _gfortran_caf_send_by_ref:: Sending data from a local image to a remote image using enhanced references
 * _gfortran_caf_get_by_ref:: Getting data from a remote image using enhanced references
 * _gfortran_caf_sendget_by_ref:: Sending data between remote images using enhanced references
@@ -4414,8 +4418,9 @@  in the @var{DESC}'s data-ptr is registered or allocate when the data-ptr is
 @code{NULL}.

 @item @emph{Syntax}:
-@code{void caf_register (size_t size, caf_register_t type, caf_token_t *token,
-gfc_descriptor_t *desc, int *stat, char *errmsg, size_t errmsg_len)}
+@code{void _gfortran_caf_register (size_t size, caf_register_t type,
+caf_token_t *token, gfc_descriptor_t *desc, int *stat, char *errmsg,
+size_t errmsg_len)}

 @item @emph{Arguments}:
 @multitable @columnfractions .15 .70
@@ -4466,7 +4471,7 @@  not null.  The library is only expected to free memory it allocated itself
 during a call to @code{_gfortran_caf_register}.

 @item @emph{Syntax}:
-@code{void caf_deregister (caf_token_t *token, caf_deregister_t type,
+@code{void _gfortran_caf_deregister (caf_token_t *token, caf_deregister_t type,
 int *stat, char *errmsg, size_t errmsg_len)}

 @item @emph{Arguments}:
@@ -4489,6 +4494,114 @@  and via destructors.
 @end table


+@node _gfortran_caf_register_accessor
+@subsection @code{_gfortran_caf_register_accessor} --- Register an accessor for remote access
+@cindex Coarray, _gfortran_caf_register_accessor
+
+@table @asis
+@item @emph{Description}:
+Identification of access funtions across images is done using a unique hash.
+For each given hash an accessor has to be registered.  This routine is expected
+to register an accessor function pointer for the given hash in nearly constant
+time.  I.e. it is expected to add the hash and accessor to a buffer and return.
+Sorting shall be done in @code{_gfortran_caf_register_accessors_finish}.
+
+@item @emph{Syntax}:
+@code{void _gfortran_caf_register_accessor (const int hash,
+void (*accessor)(void **, int32_t *, void *, void *, size_t *,
+size_t *))}
+
+@item @emph{Arguments}:
+@multitable @columnfractions .15 .70
+@item @var{hash} @tab intent(in)  The unique hash value this accessor is to be
+identified by.
+@item @var{accessor} @tab intent(in)  A pointer to the function on this image.
+The function has the signature @code{void accessor (void **dst_ptr,
+int32_t *free_dst, void *src_ptr, void *get_data, size_t *opt_src_charlen,
+size_t *opt_dst_charlen)}.  GFortran ensures that functions provided to
+@code{_gfortran_caf_register_accessor} adhere to this interface.
+@end multitable
+
+@item @emph{NOTES}
+This function is required to have a nearly constant runtime complexity, because
+it will be called to register multiple accessor in a sequence.  GFortran ensures
+that before the first remote accesses commences
+@code{_gfortran_caf_register_accessors_finish} is called at least once.  It is
+valid to register further accessors after a call to
+@code{_gfortran_caf_register_accessors_finish}.  It is invalid to call
+@code{_gfortran_caf_register_accessor} after the first remote access has been
+done.  See also @ref{_gfortran_caf_register_accessors_finish} and
+@ref{_gfortran_caf_get_remote_function_index}
+@end table
+
+
+@node _gfortran_caf_register_accessors_finish
+@subsection @code{_gfortran_caf_register_accessors_finish} --- Finish registering accessor functions
+@cindex Coarray, _gfortran_caf_register_accessors_finish
+
+@table @asis
+@item @emph{Description}:
+Called to finalize registering of accessor functions.  This function is expected
+to prepare a lookup table that has fast lookup time for the hash supplied to
+@code{_gfortran_caf_get_remote_function_index} and constant access time for
+indexing operations.
+
+@item @emph{Syntax}:
+@code{void _gfortran_caf_register_accessors_finish ()}
+
+@item @emph{Arguments}:
+No arguments.
+
+@item @emph{NOTES}
+This function may be called multiple times with and without new hash-accessors-
+pairs being added.  The post-condition after each call has to be, that hashes
+can be looked up quickly and indexing on the lookup table of hash-accessor-pairs
+is a constant time operation.
+@end table
+
+
+@node _gfortran_caf_get_remote_function_index
+@subsection @code{_gfortran_caf_get_remote_function_index} --- Get the index of an accessor
+@cindex Coarray, _gfortran_caf_get_remote_function_index
+
+@table @asis
+@item @emph{Description}:
+Return the index of the accessor in the lookup table build by
+@ref{_gfortran_caf_register_accessor} and
+@ref{_gfortran_caf_register_accessors_finish}.  This function is expected to be
+fast, because it may be called often.  A log(N) lookup time for a given hash is
+preferred.  The reference implementation uses @code{bsearch ()}, for example.
+The index returned shall be an array index to be used by
+@ref{_gfortran_caf_get_by_ct}, i.e. a constant time operation is mandatory for
+quick access.
+
+The GFortran compiler ensures, that
+@code{_gfortran_caf_get_remote_function_index} is called once only for each
+hash and the result be stored in a static variable to prevent future redundant
+lookups.
+
+@item @emph{Syntax}:
+@code{int _gfortran_caf_get_remote_function_index (const int hash)}
+
+@item @emph{Arguments}:
+@multitable @columnfractions .15 .70
+@item @var{hash} @tab intent(in)  The hash of the accessor desired.
+@end multitable
+
+@item @emph{Result}:
+The zero based index to access the accessor funtion in a lookup table.
+On error, @code{-1} can be returned.
+
+@item @emph{NOTES}
+The function's complexity is expected to be significantly smaller than N,
+where N is the number of all accessors registered.  Although returning @code{-1}
+is valid, will this most likely crash the Fortran program when accessing the
+-1-th accessor function.  It is therefore advised to terminate with an error
+message, when the hash could not be found.
+@end table
+
+
+
 @node _gfortran_caf_is_present
 @subsection @code{_gfortran_caf_is_present} --- Query whether an allocatable or pointer component in a derived type coarray is allocated
 @cindex Coarray, _gfortran_caf_is_present
@@ -4817,6 +4930,82 @@  error message why the operation is not permitted.
 @end table


+@node _gfortran_caf_get_by_ct
+@subsection @code{_gfortran_caf_get_by_ct} --- Getting data from a remote image using a remote side accessor
+@cindex Coarray, _gfortran_caf_get_by_ct
+
+@table @asis
+@item @emph{Description}:
+Called to get a scalar, an array section or a whole array from a remote image
+identified by the @var{image_index}.
+
+@item @emph{Syntax}:
+@code{void _gfortran_caf_get_by_ct (caf_token_t token,
+const gfc_descriptor_t *opt_src_desc, const size_t *opt_src_charlen,
+const int image_index, const size_t dst_size, void **dst_data,
+size_t *opt_dst_charlen, gfc_descriptor_t *opt_dst_desc,
+const bool may_realloc_dst, const int getter_index, void *get_data,
+const size_t get_data_size, int *stat, caf_team_t *team, int *team_number)}
+
+@item @emph{Arguments}:
+@multitable @columnfractions .15 .70
+@item @var{token} @tab intent(in)  An opaque pointer identifying the coarray.
+@item @var{opt_src_desc} @tab intent(in)  A pointer to the descriptor when the
+object identified by @var{token} is an array with a descriptor.  The parameter
+needs to be set to @code{-1}, when @var{token} identifies a descriptor-less
+array and to @code{NULL}, when @var{token} identifies a scalar.
+@item @var{opt_src_charlen} @tab intent(in) When the object to get is a char
+array with deferred length, then this parameter needs to be set to point to its
+length.  Else the parameter needs to be set to @code{NULL}.
+@item @var{image_index} @tab intent(in)  The ID of the remote image; must be a
+positive number.  @code{this_image ()} is valid.
+@item @var{dst_size} @tab intent(in) The size of data expected to be transferred
+from the remote image.  If the data type to get is a string or string array,
+then this needs to be set to the byte size of each character, i.e. @code{4} for
+a @code{CHARACTER (KIND=4)} string.  The length of the string is then returned
+in @code{opt_dst_charlen} (also for string arrays).
+@item @var{dst_data} @tab intent(inout) A pointer to the adress the data is
+stored.  To prevent copying of data into an output buffer the adress to the live
+data is returned here.  When a descriptor is provided also its data-member is
+set to that adress.  When @var{may_realloc_dst} is set, then the memory may be
+reallocated by the remote function, which needs to be replicated by this
+function.
+@item @var{opt_dst_charlen} @tab intent(inout)  When a char array is returned,
+this parameter is set to the length where applicable.  The value can also be
+read to prevent reallocation in the accessor.
+@item @var{opt_dst_desc} @tab intent(inout)  When a descriptor array is
+returned, it is stored in the memory pointed to by this optional parameter.
+When @var{may_realloc_dst} is set, then the descriptor may be changed, i.e.
+its bounds, but upto now not its rank.
+@item @var{may_realloc_dst} @tab intent(in)  Set when the returned data may
+require reallocation of the output buffer in @var{dst_data} or
+@var{opt_dst_desc}.
+@item @var{getter_index} @tab intent(in)  The index of the accessor to execute
+as returned by @code{_gfortran_caf_get_remote_function_index ()}.
+@item @var{get_data} @tab intent(inout)  Additional data needed in the accessor.
+I.e., when an array reference uses a local variable @var{v}, it is transported
+in this structure and all references in the accessor are rewritten to access the
+member.  The data in the structure of @var{get_data} may be changed by the
+accessor, but these changes are lost to the calling Fortran program.
+@item @var{get_data_size} @tab intent(in)  The size of the @var{get_data}
+structure.
+@item @var{stat} @tab intent(out) When non-@code{NULL} give the result of the
+operation, i.e., zero on success and non-zero on error.  When @code{NULL} and an
+error occurs, then an error message is printed and the program is terminated.
+@item @var{team} @tab intent(in)  The opaque team handle as returned by
+@code{FORM TEAM}.  Unused at the moment.
+@item @var{team_number} @tab intent(in)  The number of the team this access is
+to be part of.  Unused at the moment.
+@end multitable
+
+@item @emph{NOTES}
+It is permitted to have @code{image_index} equal the current image; the memory
+to get and the memory to store the data may (partially) overlap.  The
+implementation has to take care that it handles this case, e.g. using
+@code{memmove} which handles (partially) overlapping memory.
+@end table
+
+
 @node _gfortran_caf_sendget_by_ref
 @subsection @code{_gfortran_caf_sendget_by_ref} --- Sending data between remote images using enhanced references on both sides
 @cindex Coarray, _gfortran_caf_sendget_by_ref
diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index 06d870d80de..47d42f64b32 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -5904,11 +5904,634 @@  gfc_op_rank_conformable (gfc_expr *op1, gfc_expr *op2)
 	     || op1->corank == op2->corank);
 }

+static gfc_array_spec *
+get_arrayspec_from_expr (gfc_expr *expr)
+{
+  gfc_array_spec *src_as, *dst_as = NULL;
+  gfc_ref *ref;
+  gfc_array_ref mod_src_ar;
+  int dst_rank = 0;
+
+  if (expr->rank == 0)
+    return NULL;
+
+  /* Follow any component references.  */
+  if (expr->expr_type == EXPR_VARIABLE || expr->expr_type == EXPR_CONSTANT)
+    {
+      if (expr->symtree)
+	src_as = expr->symtree->n.sym->as;
+      else
+	src_as = NULL;
+
+      for (ref = expr->ref; ref; ref = ref->next)
+	{
+	  switch (ref->type)
+	    {
+	    case REF_COMPONENT:
+	      src_as = ref->u.c.component->as;
+	      continue;
+
+	    case REF_SUBSTRING:
+	    case REF_INQUIRY:
+	      continue;
+
+	    case REF_ARRAY:
+	      switch (ref->u.ar.type)
+		{
+		case AR_ELEMENT:
+		  src_as = NULL;
+		  break;
+		  case AR_SECTION: {
+		    if (!dst_as)
+		      dst_as = gfc_get_array_spec ();
+		    memset (&mod_src_ar, 0, sizeof (gfc_array_ref));
+		    mod_src_ar = ref->u.ar;
+		    for (int dim = 0; dim < src_as->rank; ++dim)
+		      {
+			switch (ref->u.ar.dimen_type[dim])
+			  {
+			  case DIMEN_ELEMENT:
+			    gfc_free_expr (mod_src_ar.start[dim]);
+			    mod_src_ar.start[dim] = NULL;
+			    break;
+			  case DIMEN_RANGE:
+			    dst_as->lower[dst_rank]
+			      = gfc_copy_expr (ref->u.ar.start[dim]);
+			    mod_src_ar.start[dst_rank]
+			      = gfc_copy_expr (ref->u.ar.start[dim]);
+			    if (ref->u.ar.end[dim])
+			      {
+				dst_as->upper[dst_rank]
+				  = gfc_copy_expr (ref->u.ar.end[dim]);
+				mod_src_ar.end[dst_rank] = ref->u.ar.end[dim];
+				mod_src_ar.stride[dst_rank]
+				  = ref->u.ar.stride[dim];
+			      }
+			    else
+			      dst_as->upper[dst_rank]
+				= gfc_copy_expr (ref->u.ar.as->upper[dim]);
+			    ++dst_rank;
+			    break;
+			  case DIMEN_STAR:
+			    dst_as->lower[dst_rank]
+			      = gfc_copy_expr (ref->u.ar.as->lower[dim]);
+			    mod_src_ar.start[dst_rank]
+			      = gfc_copy_expr (ref->u.ar.start[dim]);
+			    if (ref->u.ar.as->upper[dim])
+			      {
+				dst_as->upper[dst_rank]
+				  = gfc_copy_expr (ref->u.ar.as->upper[dim]);
+				mod_src_ar.end[dst_rank] = ref->u.ar.end[dim];
+				mod_src_ar.stride[dst_rank]
+				  = ref->u.ar.stride[dim];
+			      }
+			    ++dst_rank;
+			    break;
+			  case DIMEN_VECTOR:
+			    dst_as->lower[dst_rank]
+			      = gfc_get_constant_expr (BT_INTEGER,
+						       gfc_index_integer_kind,
+						       &expr->where);
+			    mpz_set_ui (dst_as->lower[dst_rank]->value.integer,
+					1);
+			    mod_src_ar.start[dst_rank]
+			      = gfc_copy_expr (ref->u.ar.start[dim]);
+			    dst_as->upper[dst_rank]
+			      = gfc_get_constant_expr (BT_INTEGER,
+						       gfc_index_integer_kind,
+						       &expr->where);
+			    mpz_set (dst_as->upper[dst_rank]->value.integer,
+				     ref->u.ar.start[dim]->shape[0]);
+			    ++dst_rank;
+			    break;
+			  case DIMEN_THIS_IMAGE:
+			  case DIMEN_UNKNOWN:
+			    gcc_unreachable ();
+			  }
+			if (ref->u.ar.dimen_type[dim] != DIMEN_ELEMENT)
+			  mod_src_ar.dimen_type[dst_rank]
+			    = ref->u.ar.dimen_type[dim];
+		      }
+		    dst_as->rank = dst_rank;
+		    dst_as->type = AS_EXPLICIT;
+		    ref->u.ar = mod_src_ar;
+		    ref->u.ar.dimen = dst_rank;
+		    break;
+
+		  case AR_UNKNOWN:
+		    src_as = NULL;
+		    break;
+
+		  case AR_FULL:
+		    dst_as = gfc_copy_array_spec (src_as);
+		    break;
+		  }
+		  break;
+		}
+	    }
+	}
+    }
+  else
+    src_as = NULL;
+
+  return dst_as;
+}
+
+static void
+remove_coarray_from_derived_type (gfc_symbol *base, gfc_namespace *ns,
+				  gfc_array_spec *src_as = NULL)
+{
+  gfc_symbol *derived;
+  gfc_symbol *src_derived = base->ts.u.derived;
+
+  if (!src_as)
+    src_as = src_derived->as;
+  gfc_get_symbol (src_derived->name, ns, &derived);
+  derived->attr.flavor = FL_DERIVED;
+  derived->attr.alloc_comp = src_derived->attr.alloc_comp;
+  if (src_as && src_as->rank != 0)
+    {
+      base->attr.dimension = 1;
+      base->as = gfc_copy_array_spec (src_as);
+      base->as->corank = 0;
+    }
+  for (gfc_component *p = NULL, *c = src_derived->components; c; c = c->next)
+    {
+      gfc_component *n = gfc_get_component ();
+      *n = *c;
+      if (n->as)
+	n->as = gfc_copy_array_spec (c->as);
+      n->backend_decl = NULL;
+      n->initializer = NULL;
+      n->param_list = NULL;
+      if (p)
+	p->next = n;
+      else
+	derived->components = n;
+
+      p = n;
+    }
+  gfc_set_sym_referenced (derived);
+  gfc_commit_symbol (derived);
+  base->ts.u.derived = derived;
+  gfc_commit_symbol (base);
+}
+
+static void
+convert_coarray_class_to_derived_type (gfc_symbol *base, gfc_namespace *ns)
+{
+  gfc_symbol *src_derived = CLASS_DATA (base)->ts.u.derived;
+  gfc_array_spec *src_as = CLASS_DATA (base)->as;
+  const bool attr_allocatable = CLASS_DATA (base)->attr.allocatable,
+	     attr_pointer = (CLASS_DATA (base)->attr.dimension
+			     && CLASS_DATA (base)->attr.pointer)
+			    || base->attr.associate_var;
+
+  base->ts.type = BT_DERIVED;
+  base->ts.u.derived = src_derived;
+
+  remove_coarray_from_derived_type (base, ns, src_as);
+
+  base->attr.allocatable = attr_allocatable;
+  base->attr.pointer = attr_pointer;
+}
+
+static void
+split_expr_at_caf_ref (gfc_expr *expr, gfc_namespace *ns,
+		       gfc_expr **post_caf_ref_expr)
+{
+  gfc_ref *caf_ref = NULL;
+  gfc_symtree *st;
+  gfc_symbol *base;
+
+  gcc_assert (expr->expr_type == EXPR_VARIABLE);
+  if (!expr->symtree->n.sym->attr.codimension)
+    {
+      /* The coarray is in some component.  Find it.  */
+      caf_ref = expr->ref;
+      while (caf_ref)
+	{
+	  if (caf_ref->type == REF_COMPONENT
+	      && caf_ref->u.c.component->attr.codimension)
+	    break;
+	  caf_ref = caf_ref->next;
+	}
+    }
+
+  gcc_assert (!gfc_get_sym_tree (!caf_ref ? expr->symtree->name : "base", ns,
+				 &st, false));
+  st->n.sym->attr.flavor = FL_PARAMETER;
+  st->n.sym->attr.dummy = 1;
+  st->n.sym->attr.intent = INTENT_IN;
+  st->n.sym->ts = caf_ref ? caf_ref->u.c.sym->ts : expr->symtree->n.sym->ts;
+
+  *post_caf_ref_expr = gfc_get_variable_expr (st);
+  (*post_caf_ref_expr)->where = expr->where;
+  base = (*post_caf_ref_expr)->symtree->n.sym;
+
+  if (!caf_ref)
+    {
+      (*post_caf_ref_expr)->ref = gfc_copy_ref (expr->ref);
+      if (expr->symtree->n.sym->attr.dimension)
+	{
+	  base->as = gfc_copy_array_spec (expr->symtree->n.sym->as);
+	  base->as->corank = 0;
+	  base->attr.dimension = 1;
+	  base->attr.allocatable = expr->symtree->n.sym->attr.allocatable;
+	  base->attr.pointer = expr->symtree->n.sym->attr.pointer
+			       || expr->symtree->n.sym->attr.associate_var;
+	}
+      else
+	base->attr.pointer = expr->symtree->n.sym->attr.save
+			     || expr->symtree->n.sym->attr.associate_var;
+    }
+  else
+    {
+      (*post_caf_ref_expr)->ref = gfc_copy_ref (caf_ref->next);
+      if (caf_ref->u.c.component->attr.dimension)
+	{
+	  base->as = gfc_copy_array_spec (caf_ref->u.c.component->as);
+	  base->as->corank = 0;
+	  base->attr.dimension = 1;
+	  base->attr.allocatable = caf_ref->u.c.component->attr.allocatable;
+	  base->attr.pointer = caf_ref->u.c.component->attr.pointer;
+	}
+      else
+	base->attr.pointer = 1;
+      base->ts = caf_ref->u.c.component->ts;
+    }
+  (*post_caf_ref_expr)->ts = expr->ts;
+  if (base->ts.type == BT_CHARACTER)
+    {
+      base->ts.u.cl = gfc_get_charlen ();
+      *base->ts.u.cl = *(caf_ref ? caf_ref->u.c.component->ts.u.cl
+				 : expr->symtree->n.sym->ts.u.cl);
+      base->ts.deferred = 1;
+      base->ts.u.cl->length = nullptr;
+    }
+
+  if (base->ts.type == BT_DERIVED)
+    remove_coarray_from_derived_type (base, ns);
+  else if (base->ts.type == BT_CLASS)
+    convert_coarray_class_to_derived_type (base, ns);
+
+  gfc_expression_rank (expr);
+  gfc_expression_rank (*post_caf_ref_expr);
+}
+
+static void
+check_add_new_component (gfc_symbol *type, gfc_expr *e, gfc_symbol *get_data)
+{
+  if (e)
+    {
+      switch (e->expr_type)
+	{
+	case EXPR_CONSTANT:
+	case EXPR_NULL:
+	  break;
+	case EXPR_OP:
+	  check_add_new_component (type, e->value.op.op1, get_data);
+	  if (e->value.op.op2)
+	    check_add_new_component (type, e->value.op.op2, get_data);
+	  break;
+	case EXPR_COMPCALL:
+	  for (gfc_actual_arglist *actual = e->value.compcall.actual; actual;
+	       actual = actual->next)
+	    check_add_new_component (type, actual->expr, get_data);
+	  break;
+	case EXPR_FUNCTION:
+	  if (!e->symtree->n.sym->attr.pure
+	      && !e->symtree->n.sym->attr.elemental)
+	    {
+	      // Treat non-pure functions.
+	      gfc_error ("Sorry, not yet able to call a non-pure/non-elemental"
+			 " function %s in a coarray reference;  use a temporary"
+			 " for the function's result instead",
+			 e->symtree->n.sym->name);
+	    }
+	  for (gfc_actual_arglist *actual = e->value.function.actual; actual;
+	       actual = actual->next)
+	    check_add_new_component (type, actual->expr, get_data);
+	  break;
+	  case EXPR_VARIABLE: {
+	    gfc_component *comp;
+	    gfc_ref *ref;
+	    int old_rank = e->rank;
+
+	    /* Can't use gfc_find_component here, because type is not yet
+	       complete.  */
+	    comp = type->components;
+	    while (comp)
+	      {
+		if (strcmp (comp->name, e->symtree->name) == 0)
+		  break;
+		comp = comp->next;
+	      }
+	    if (!comp)
+	      {
+		gcc_assert (gfc_add_component (type, e->symtree->name, &comp));
+		/* Take a copy of e, before modifying it.  */
+		gfc_expr *init = gfc_copy_expr (e);
+		if (e->ref)
+		  {
+		    switch (e->ref->type)
+		      {
+		      case REF_ARRAY:
+			comp->as = get_arrayspec_from_expr (e);
+			comp->attr.dimension = e->ref->u.ar.dimen != 0;
+			comp->ts = e->ts;
+			break;
+		      case REF_COMPONENT:
+			comp->ts = e->ref->u.c.sym->ts;
+			break;
+		      default:
+			gcc_unreachable ();
+			break;
+		      }
+		  }
+		else
+		  comp->ts = e->ts;
+		comp->attr.access = ACCESS_PRIVATE;
+		comp->initializer = init;
+	      }
+	    else
+	      gcc_assert (comp->ts.type == e->ts.type
+			  && comp->ts.u.derived == e->ts.u.derived);
+
+	    ref = e->ref;
+	    e->ref = NULL;
+	    gcc_assert (gfc_find_component (get_data->ts.u.derived,
+					    e->symtree->name, false, true,
+					    &e->ref));
+	    e->symtree
+	      = gfc_find_symtree (get_data->ns->sym_root, get_data->name);
+	    e->ref->next = ref;
+	    gfc_free_shape (&e->shape, old_rank);
+	    gfc_expression_rank (e);
+	    break;
+	  }
+	case EXPR_ARRAY:
+	case EXPR_PPC:
+	case EXPR_STRUCTURE:
+	case EXPR_SUBSTRING:
+	  gcc_unreachable ();
+	default:;
+	}
+    }
+}
+
+static gfc_symbol *
+create_get_parameter_type (gfc_expr *expr, gfc_namespace *ns,
+			   gfc_symbol *get_data)
+{
+  static int type_cnt = 0;
+  char tname[GFC_MAX_SYMBOL_LEN + 1];
+  char *name;
+  gfc_symbol *type;
+
+  gcc_assert (expr->expr_type == EXPR_VARIABLE);
+
+  strcpy (tname, expr->symtree->name);
+  name = xasprintf ("@_rget_data_t_%s_%d", tname, ++type_cnt);
+  gfc_get_symbol (name, ns, &type);
+
+  type->attr.flavor = FL_DERIVED;
+  get_data->ts.u.derived = type;
+
+  for (gfc_ref *ref = expr->ref; ref; ref = ref->next)
+    {
+      if (ref->type == REF_ARRAY)
+	{
+	  gfc_array_ref *ar = &ref->u.ar;
+	  for (int i = 0; i < ar->dimen; ++i)
+	    {
+	      check_add_new_component (type, ar->start[i], get_data);
+	      check_add_new_component (type, ar->end[i], get_data);
+	      check_add_new_component (type, ar->stride[i], get_data);
+	    }
+	}
+    }
+
+  gfc_set_sym_referenced (type);
+  gfc_commit_symbol (type);
+  return type;
+}
+
+
+static gfc_expr *
+create_get_callback (gfc_expr *expr)
+{
+  static int cnt = 0;
+  gfc_namespace *ns;
+  gfc_symbol *extproc, *proc, *buffer, *free_buffer, *base, *get_data,
+    *old_buffer_data;
+  char tname[GFC_MAX_SYMBOL_LEN + 1];
+  char *name;
+  const char *mname;
+  gfc_expr *cb, *post_caf_ref_expr;
+  gfc_code *code;
+  int expr_rank = expr->rank;
+
+  /* Find the top-level namespace.  */
+  for (ns = gfc_current_ns; ns->parent; ns = ns->parent)
+    ;
+
+  if (expr->expr_type == EXPR_VARIABLE)
+    strcpy (tname, expr->symtree->name);
+  else
+    strcpy (tname, "dummy");
+  if (expr->symtree->n.sym->module)
+    mname = expr->symtree->n.sym->module;
+  else
+    mname = "main";
+  name = xasprintf ("_caf_rget_%s_%s_%d", mname, tname, ++cnt);
+  gfc_get_symbol (name, ns, &extproc);
+  gfc_set_sym_referenced (extproc);
+  ++extproc->refs;
+  gfc_commit_symbol (extproc);
+
+  /* Set up namespace.  */
+  gfc_namespace *sub_ns = gfc_get_namespace (ns, 0);
+  sub_ns->sibling = ns->contained;
+  ns->contained = sub_ns;
+  sub_ns->resolved = 1;
+  /* Set up procedure symbol.  */
+  gfc_find_symbol (name, sub_ns, 1, &proc);
+  sub_ns->proc_name = proc;
+  proc->attr.if_source = IFSRC_DECL;
+  proc->attr.access = ACCESS_PUBLIC;
+  gfc_add_subroutine (&proc->attr, name, NULL);
+  proc->attr.host_assoc = 1;
+  proc->attr.always_explicit = 1;
+  ++proc->refs;
+  gfc_commit_symbol (proc);
+  free (name);
+
+  split_expr_at_caf_ref (expr, sub_ns, &post_caf_ref_expr);
+
+  if (ns->proc_name->attr.flavor == FL_MODULE)
+    proc->module = ns->proc_name->name;
+  gfc_set_sym_referenced (proc);
+  /* Set up formal arguments.  */
+  gfc_formal_arglist **argptr = &proc->formal;
+#define ADD_ARG(name, nsym, stype, sintent) \
+  gfc_get_symbol (name, sub_ns, &nsym); \
+  nsym->ts.type = stype; \
+  nsym->attr.flavor = FL_PARAMETER; \
+  nsym->attr.dummy = 1; \
+  nsym->attr.intent = sintent; \
+  gfc_set_sym_referenced (nsym); \
+  *argptr = gfc_get_formal_arglist (); \
+  (*argptr)->sym = nsym; \
+  argptr = &(*argptr)->next
+
+  ADD_ARG ("buffer", buffer, expr->ts.type, INTENT_INOUT);
+  buffer->ts = expr->ts;
+  if (expr_rank)
+    {
+      buffer->as = gfc_get_array_spec ();
+      buffer->as->rank = expr_rank;
+      if (expr->shape)
+	{
+	  buffer->as->type = AS_EXPLICIT;
+	  for (int d = 0; d < expr_rank; ++d)
+	    {
+	      buffer->as->lower[d]
+		= gfc_get_constant_expr (BT_INTEGER, gfc_index_integer_kind,
+					 &gfc_current_locus);
+	      gfc_mpz_set_hwi (buffer->as->lower[d]->value.integer, 1);
+	      buffer->as->upper[d]
+		= gfc_get_constant_expr (BT_INTEGER, gfc_index_integer_kind,
+					 &gfc_current_locus);
+	      gfc_mpz_set_hwi (buffer->as->upper[d]->value.integer,
+			       gfc_mpz_get_hwi (expr->shape[d]));
+	    }
+	  buffer->attr.allocatable = 1;
+	}
+      else
+	{
+	  buffer->as->type = AS_DEFERRED;
+	  buffer->attr.allocatable = 1;
+	}
+      buffer->attr.dimension = 1;
+    }
+  else
+    buffer->attr.pointer = 1;
+  if (buffer->ts.type == BT_CHARACTER)
+    {
+      buffer->ts.u.cl = gfc_get_charlen ();
+      *buffer->ts.u.cl = *expr->ts.u.cl;
+      buffer->ts.deferred = 1;
+      buffer->ts.u.cl->length = nullptr;
+    }
+  gfc_commit_symbol (buffer);
+  ADD_ARG ("free_buffer", free_buffer, BT_LOGICAL, INTENT_OUT);
+  free_buffer->ts.kind = gfc_default_logical_kind;
+  gfc_commit_symbol (free_buffer);
+
+  // ADD_ARG (expr->symtree->name, base, BT_VOID, INTENT_IN);
+  base = post_caf_ref_expr->symtree->n.sym;
+  gfc_set_sym_referenced (base);
+  gfc_commit_symbol (base);
+  *argptr = gfc_get_formal_arglist ();
+  (*argptr)->sym = base;
+  argptr = &(*argptr)->next;
+
+  gfc_commit_symbol (base);
+  ADD_ARG ("get_data", get_data, BT_DERIVED, INTENT_IN);
+  gfc_commit_symbol (get_data);
+#undef ADD_ARG
+
+  /* Set up code.  */
+  if (expr->rank != 0)
+    {
+      /* Code: old_buffer_ptr = C_LOC (buffer);  */
+      code = sub_ns->code = gfc_get_code (EXEC_ASSIGN);
+      gfc_get_symbol ("old_buffer_data", sub_ns, &old_buffer_data);
+      old_buffer_data->ts.type = BT_VOID;
+      old_buffer_data->attr.flavor = FL_VARIABLE;
+      gfc_set_sym_referenced (old_buffer_data);
+      gfc_commit_symbol (old_buffer_data);
+      code->expr1 = gfc_lval_expr_from_sym (old_buffer_data);
+      code->expr2 = gfc_build_intrinsic_call (ns, GFC_ISYM_C_LOC, "C_LOC",
+					      gfc_current_locus, 1,
+					      gfc_lval_expr_from_sym (buffer));
+      code->next = gfc_get_code (EXEC_ASSIGN);
+      code = code->next;
+    }
+  else
+    code = sub_ns->code = gfc_get_code (EXEC_POINTER_ASSIGN);
+
+  /* Code: buffer = expr;  */
+  code->expr1 = gfc_lval_expr_from_sym (buffer);
+  code->expr2 = post_caf_ref_expr;
+  gfc_ref *ref = code->expr2->ref, **pref = &code->expr2->ref;
+  if (ref && ref->type == REF_ARRAY && ref->u.ar.codimen != 0)
+    {
+      if (ref->u.ar.dimen != 0)
+	{
+	  ref->u.ar.codimen = 0;
+	  pref = &ref->next;
+	  ref = ref->next;
+	}
+      else
+	{
+	  code->expr2->ref = ref->next;
+	  ref->next = NULL;
+	  gfc_free_ref_list (ref);
+	  ref = code->expr2->ref;
+	  pref = &code->expr2->ref;
+	}
+    }
+  if (ref && ref->type == REF_COMPONENT)
+    {
+      gfc_find_component (code->expr2->symtree->n.sym->ts.u.derived,
+			  ref->u.c.component->name, false, false, pref);
+      if (*pref != ref)
+	{
+	  (*pref)->next = ref->next;
+	  ref->next = NULL;
+	  gfc_free_ref_list (ref);
+	}
+    }
+  get_data->ts.u.derived
+    = create_get_parameter_type (code->expr2, ns, get_data);
+  if (code->expr2->rank == 0)
+    code->expr2 = gfc_build_intrinsic_call (ns, GFC_ISYM_C_LOC, "C_LOC",
+					    gfc_current_locus, 1, code->expr2);
+
+  /* Code: *free_buffer = old_buffer_ptr /= C_LOC (buffer); for rank != 0 or
+   *       *free_buffer = 0; for rank == 0.  */
+  code->next = gfc_get_code (EXEC_ASSIGN);
+  code = code->next;
+  code->expr1 = gfc_lval_expr_from_sym (free_buffer);
+  if (expr->rank != 0)
+    {
+      code->expr2 = gfc_get_operator_expr (
+	&gfc_current_locus, INTRINSIC_NE_OS,
+	gfc_lval_expr_from_sym (old_buffer_data),
+	gfc_build_intrinsic_call (ns, GFC_ISYM_C_LOC, "C_LOC",
+				  gfc_current_locus, 1,
+				  gfc_lval_expr_from_sym (buffer)));
+      code->expr2->ts.type = BT_LOGICAL;
+      code->expr2->ts.kind = gfc_default_logical_kind;
+    }
+  else
+    {
+      code->expr2 = gfc_get_logical_expr (gfc_default_logical_kind,
+					  &gfc_current_locus, false);
+    }
+
+  cb = gfc_lval_expr_from_sym (extproc);
+  cb->ts.interface = extproc;
+
+  return cb;
+}

 static void
 add_caf_get_intrinsic (gfc_expr *e)
 {
-  gfc_expr *wrapper, *tmp_expr;
+  gfc_expr *wrapper, *tmp_expr, *rget_expr, *rget_hash_expr;
   gfc_ref *ref;
   int n;

@@ -5924,8 +6547,18 @@  add_caf_get_intrinsic (gfc_expr *e)

   tmp_expr = XCNEW (gfc_expr);
   *tmp_expr = *e;
+  rget_expr = create_get_callback (tmp_expr);
+  rget_hash_expr = gfc_get_expr ();
+  rget_hash_expr->expr_type = EXPR_CONSTANT;
+  rget_hash_expr->ts.type = BT_INTEGER;
+  rget_hash_expr->ts.kind = gfc_default_integer_kind;
+  rget_hash_expr->where = tmp_expr->where;
+  mpz_init_set_ui (rget_hash_expr->value.integer,
+		   gfc_hash_value (rget_expr->symtree->n.sym));
   wrapper = gfc_build_intrinsic_call (gfc_current_ns, GFC_ISYM_CAF_GET,
-				      "caf_get", tmp_expr->where, 1, tmp_expr);
+				      "caf_get", tmp_expr->where, 3, tmp_expr,
+				      rget_hash_expr, rget_expr);
+  gfc_add_caf_accessor (rget_hash_expr, rget_expr);
   wrapper->ts = e->ts;
   wrapper->rank = e->rank;
   wrapper->corank = e->corank;
@@ -13052,22 +13685,10 @@  start:

 	  if (flag_coarray == GFC_FCOARRAY_LIB
 	      && (gfc_is_coindexed (code->expr1)
-		  || caf_possible_reallocate (code->expr1)
-		  || (code->expr2->expr_type == EXPR_FUNCTION
-		      && code->expr2->value.function.isym
-		      && code->expr2->value.function.isym->id
-			   == GFC_ISYM_CAF_GET
-		      && (code->expr1->rank == 0 || code->expr2->rank != 0)
-		      && !gfc_expr_attr (code->expr2).allocatable
-		      && !gfc_has_vector_subscript (code->expr2))))
+		  || caf_possible_reallocate (code->expr1)))
 	    {
 	      /* Insert a GFC_ISYM_CAF_SEND intrinsic, when the LHS is a
-		 coindexed variable.  Additionally, insert this code when the
-		 RHS is a CAF as we then use the GFC_ISYM_CAF_SEND intrinsic
-		 just to avoid a temporary; but do not do so if the LHS is
-		 (re)allocatable or has a vector subscript.  If the LHS is a
-		 noncoindexed array and the RHS is a coindexed scalar, use the
-		 normal code path.  */
+		 coindexed variable.  */
 	      code->op = EXEC_CALL;
 	      gfc_get_sym_tree (GFC_PREFIX ("caf_send"), ns, &code->symtree,
 				true);
diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index d69c8430484..0b1474d7559 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -84,7 +84,7 @@  static struct module_htab_entry *cur_module;

 /* With -fcoarray=lib: For generating the registering call
    of static coarrays.  */
-static bool has_coarray_vars;
+static bool has_coarray_vars_or_accessors;
 static stmtblock_t caf_init_block;


@@ -135,12 +135,21 @@  tree gfor_fndecl_caf_this_image;
 tree gfor_fndecl_caf_num_images;
 tree gfor_fndecl_caf_register;
 tree gfor_fndecl_caf_deregister;
+
+// Deprecate start
 tree gfor_fndecl_caf_get;
 tree gfor_fndecl_caf_send;
 tree gfor_fndecl_caf_sendget;
 tree gfor_fndecl_caf_get_by_ref;
 tree gfor_fndecl_caf_send_by_ref;
 tree gfor_fndecl_caf_sendget_by_ref;
+// Deprecate end
+
+tree gfor_fndecl_caf_register_accessor;
+tree gfor_fndecl_caf_register_accessors_finish;
+tree gfor_fndecl_caf_get_remote_function_index;
+tree gfor_fndecl_caf_get_by_ct;
+
 tree gfor_fndecl_caf_sync_all;
 tree gfor_fndecl_caf_sync_memory;
 tree gfor_fndecl_caf_sync_images;
@@ -3982,11 +3991,12 @@  gfc_build_builtin_function_decls (void)
   /* Coarray library calls.  */
   if (flag_coarray == GFC_FCOARRAY_LIB)
     {
-      tree pint_type, pppchar_type;
+      tree pint_type, pppchar_type, psize_type;

       pint_type = build_pointer_type (integer_type_node);
       pppchar_type
 	= build_pointer_type (build_pointer_type (pchar_type_node));
+      psize_type = build_pointer_type (size_type_node);

       gfor_fndecl_caf_init = gfc_build_library_function_decl_with_spec (
 	get_identifier (PREFIX("caf_init")), ". W W ",
@@ -4015,6 +4025,7 @@  gfc_build_builtin_function_decls (void)
 	ppvoid_type_node, integer_type_node, pint_type, pchar_type_node,
 	size_type_node);

+      // Deprecate start
       gfor_fndecl_caf_get = gfc_build_library_function_decl_with_spec (
 	get_identifier (PREFIX("caf_get")), ". r . . r r w . . . w ",
 	void_type_node, 10,
@@ -4058,6 +4069,30 @@  gfc_build_builtin_function_decls (void)
 	    pvoid_type_node, integer_type_node, integer_type_node,
 	    boolean_type_node, pint_type, pint_type, integer_type_node,
 	    integer_type_node);
+      // Deprecate end
+
+      gfor_fndecl_caf_register_accessor
+	= gfc_build_library_function_decl_with_spec (
+	  get_identifier (PREFIX ("caf_register_accessor")), ". r r ",
+	  void_type_node, 2, integer_type_node, pvoid_type_node);
+
+      gfor_fndecl_caf_register_accessors_finish
+	= gfc_build_library_function_decl_with_spec (
+	  get_identifier (PREFIX ("caf_register_accessors_finish")), ". ",
+	  void_type_node, 0);
+
+      gfor_fndecl_caf_get_remote_function_index
+	= gfc_build_library_function_decl_with_spec (
+	  get_identifier (PREFIX ("caf_get_remote_function_index")), ". r ",
+	  integer_type_node, 1, integer_type_node);
+
+      gfor_fndecl_caf_get_by_ct = gfc_build_library_function_decl_with_spec (
+	get_identifier (PREFIX ("caf_get_by_ct")),
+	". r r r r r w w w r r w r w r r ", void_type_node, 15, pvoid_type_node,
+	pvoid_type_node, psize_type, integer_type_node, size_type_node,
+	ppvoid_type_node, psize_type, pvoid_type_node, boolean_type_node,
+	integer_type_node, pvoid_type_node, size_type_node, pint_type,
+	pvoid_type_node, pint_type);

       gfor_fndecl_caf_sync_all = gfc_build_library_function_decl_with_spec (
 	get_identifier (PREFIX("caf_sync_all")), ". w w . ", void_type_node,
@@ -5554,7 +5589,7 @@  gfc_create_module_variable (gfc_symbol * sym)

   if (sym->attr.codimension && !sym->attr.dummy && !sym->attr.allocatable
       && sym->attr.referenced && !sym->attr.use_assoc)
-    has_coarray_vars = true;
+    has_coarray_vars_or_accessors = true;
 }

 /* Emit debug information for USE statements.  */
@@ -5937,6 +5972,49 @@  generate_coarray_sym_init (gfc_symbol *sym)
     }
 }

+struct caf_accessor
+{
+  struct caf_accessor *next;
+  gfc_expr *hash, *fdecl;
+};
+
+static struct caf_accessor *caf_accessor_head = NULL;
+
+void
+gfc_add_caf_accessor (gfc_expr *h, gfc_expr *f)
+{
+  struct caf_accessor *n = XCNEW (struct caf_accessor);
+  n->next = caf_accessor_head;
+  n->hash = h;
+  n->fdecl = f;
+  caf_accessor_head = n;
+}
+
+void
+create_caf_accessor_register (stmtblock_t *block)
+{
+  gfc_se se;
+  tree hash, fdecl;
+  gfc_init_se (&se, NULL);
+  for (struct caf_accessor *curr = caf_accessor_head; curr;)
+    {
+      gfc_conv_expr (&se, curr->hash);
+      hash = se.expr;
+      gfc_conv_expr (&se, curr->fdecl);
+      fdecl = se.expr;
+      TREE_USED (fdecl) = 1;
+      TREE_STATIC (fdecl) = 1;
+      gcc_assert (FUNCTION_POINTER_TYPE_P (TREE_TYPE (fdecl)));
+      gfc_add_expr_to_block (
+	block, build_call_expr (gfor_fndecl_caf_register_accessor, 2, hash,
+				/*gfc_build_addr_expr (NULL_TREE,*/ fdecl));
+      curr = curr->next;
+      free (caf_accessor_head);
+      caf_accessor_head = curr;
+    }
+  gfc_add_expr_to_block (
+    block, build_call_expr (gfor_fndecl_caf_register_accessors_finish, 0));
+}

 /* Generate constructor function to initialize static, nonallocatable
    coarrays.  */
@@ -5973,6 +6051,8 @@  generate_coarray_init (gfc_namespace *ns)
   pushlevel ();
   gfc_init_block (&caf_init_block);

+  create_caf_accessor_register (&caf_init_block);
+
   gfc_traverse_ns (ns, generate_coarray_sym_init);

   DECL_SAVED_TREE (fndecl) = gfc_finish_block (&caf_init_block);
@@ -6028,13 +6108,13 @@  gfc_generate_module_vars (gfc_namespace * ns)
   /* Generate COMMON blocks.  */
   gfc_trans_common (ns);

-  has_coarray_vars = false;
+  has_coarray_vars_or_accessors = caf_accessor_head != NULL;

   /* Create decls for all the module variables.  */
   gfc_traverse_ns (ns, gfc_create_module_variable);
   gfc_traverse_ns (ns, create_module_nml_decl);

-  if (flag_coarray == GFC_FCOARRAY_LIB && has_coarray_vars)
+  if (flag_coarray == GFC_FCOARRAY_LIB && has_coarray_vars_or_accessors)
     generate_coarray_init (ns);

   cur_module = NULL;
@@ -6135,7 +6215,7 @@  generate_local_decl (gfc_symbol * sym)
     {
       if (sym->attr.codimension && !sym->attr.dummy && !sym->attr.allocatable
 	  && sym->attr.referenced && !sym->attr.use_assoc)
-	has_coarray_vars = true;
+	has_coarray_vars_or_accessors = true;

       if (!sym->attr.dummy && !sym->ns->proc_name->attr.entry_master)
 	generate_dependency_declarations (sym);
@@ -7889,10 +7969,10 @@  gfc_generate_function_code (gfc_namespace * ns)

   gfc_generate_contained_functions (ns);

-  has_coarray_vars = false;
+  has_coarray_vars_or_accessors = caf_accessor_head != NULL;
   generate_local_vars (ns);

-  if (flag_coarray == GFC_FCOARRAY_LIB && has_coarray_vars)
+  if (flag_coarray == GFC_FCOARRAY_LIB && has_coarray_vars_or_accessors)
     generate_coarray_init (ns);

   /* Keep the parent fake result declaration in module functions
@@ -8113,7 +8193,7 @@  gfc_generate_function_code (gfc_namespace * ns)
 	 If there are static coarrays in this function, the nested _caf_init
 	 function has already called cgraph_create_node, which also created
 	 the cgraph node for this function.  */
-      if (!has_coarray_vars || flag_coarray != GFC_FCOARRAY_LIB)
+      if (!has_coarray_vars_or_accessors || flag_coarray != GFC_FCOARRAY_LIB)
 	(void) cgraph_node::get_create (fndecl);
     }
   else
@@ -8240,11 +8320,11 @@  gfc_process_block_locals (gfc_namespace* ns)
   tree decl;

   saved_local_decls = NULL_TREE;
-  has_coarray_vars = false;
+  has_coarray_vars_or_accessors = caf_accessor_head != NULL;

   generate_local_vars (ns);

-  if (flag_coarray == GFC_FCOARRAY_LIB && has_coarray_vars)
+  if (flag_coarray == GFC_FCOARRAY_LIB && has_coarray_vars_or_accessors)
     generate_coarray_init (ns);

   decl = nreverse (saved_local_decls);
diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index 41a1739080e..5aec4a92f92 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -42,6 +42,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "dependency.h"	/* For CAF array alias analysis.  */
 #include "attribs.h"
 #include "realmpfr.h"
+#include "constructor.h"

 /* Only for gfc_trans_assign and gfc_trans_pointer_assign.  */

@@ -1667,31 +1668,59 @@  conv_expr_ref_to_caf_ref (stmtblock_t *block, gfc_expr *expr)
 			      : NULL_TREE;
 }

+static tree
+conv_shape_to_cst (gfc_expr *e)
+{
+  tree tmp = NULL;
+  for (int d = 0; d < e->rank; ++d)
+    {
+      if (!tmp)
+	tmp = gfc_conv_mpz_to_tree (e->shape[d], gfc_size_kind);
+      else
+	tmp = fold_build2 (MULT_EXPR, TREE_TYPE (tmp), tmp,
+			   gfc_conv_mpz_to_tree (e->shape[d], gfc_size_kind));
+    }
+  return fold_convert (size_type_node, tmp);
+}
+
 /* Get data from a remote coarray.  */

 static void
-gfc_conv_intrinsic_caf_get (gfc_se *se, gfc_expr *expr, tree lhs, tree lhs_kind,
-			    tree may_require_tmp, bool may_realloc,
-			    symbol_attribute *caf_attr)
+gfc_conv_intrinsic_caf_get (gfc_se *se, gfc_expr *expr, tree lhs,
+			    bool may_realloc, symbol_attribute *caf_attr)
 {
+  static int call_cnt = 0;
   gfc_expr *array_expr, *tmp_stat;
   gfc_se argse;
-  tree caf_decl, token, offset, image_index, tmp;
-  tree res_var, dst_var, type, kind, vec, stat;
-  tree caf_reference;
+  tree caf_decl, token, image_index, tmp, res_var, type, stat, dest_size,
+    dest_data, opt_dest_desc, rget_index_tree, rget_data_tree, rget_data_size,
+    opt_src_desc, opt_src_charlen, opt_dest_charlen;
   symbol_attribute caf_attr_store;
+  gfc_namespace *ns;
+  gfc_expr *rget_hash = expr->value.function.actual->next->expr,
+	   *rget_fn_expr = expr->value.function.actual->next->next->expr;
+  gfc_symbol *gdata_sym
+    = rget_fn_expr->symtree->n.sym->formal->next->next->next->sym;
+  gfc_expr rget_data, rget_data_init, rget_index;
+  char *name;
+  gfc_symtree *data_st, *index_st;
+  gfc_constructor *con;
+  stmtblock_t blk;

   gcc_assert (flag_coarray == GFC_FCOARRAY_LIB);

   if (se->ss && se->ss->info->useflags)
     {
-       /* Access the previously obtained result.  */
-       gfc_conv_tmp_array_ref (se);
-       return;
+      /* Access the previously obtained result.  */
+      gfc_conv_tmp_array_ref (se);
+      return;
     }

-  /* If lhs is set, the CAF_GET intrinsic has already been stripped.  */
-  array_expr = (lhs == NULL_TREE) ? expr->value.function.actual->expr : expr;
+  array_expr = expr->value.function.actual->expr;
+  ns = array_expr->expr_type == EXPR_VARIABLE
+	   && !array_expr->symtree->n.sym->attr.associate_var
+	 ? array_expr->symtree->n.sym->ns
+	 : gfc_current_ns;
   type = gfc_typenode_for_spec (&array_expr->ts);

   if (caf_attr == NULL)
@@ -1701,9 +1730,7 @@  gfc_conv_intrinsic_caf_get (gfc_se *se, gfc_expr *expr, tree lhs, tree lhs_kind,
     }

   res_var = lhs;
-  dst_var = lhs;

-  vec = null_pointer_node;
   tmp_stat = gfc_find_stat_co (expr);

   if (tmp_stat)
@@ -1718,198 +1745,173 @@  gfc_conv_intrinsic_caf_get (gfc_se *se, gfc_expr *expr, tree lhs, tree lhs_kind,
   else
     stat = null_pointer_node;

-  /* Only use the new get_by_ref () where it is necessary.  I.e., when the lhs
-     is reallocatable or the right-hand side has allocatable components.  */
-  if (caf_attr->alloc_comp || caf_attr->pointer_comp || may_realloc)
-    {
-      /* Get using caf_get_by_ref.  */
-      caf_reference = conv_expr_ref_to_caf_ref (&se->pre, array_expr);
-
-      if (caf_reference != NULL_TREE)
-	{
-	  if (lhs == NULL_TREE)
-	    {
-	      if (array_expr->ts.type == BT_CHARACTER)
-		gfc_init_se (&argse, NULL);
-	      if (array_expr->rank == 0)
-		{
-		  symbol_attribute attr;
-		  gfc_clear_attr (&attr);
-		  if (array_expr->ts.type == BT_CHARACTER)
-		    {
-		      res_var = gfc_conv_string_tmp (se,
-						     build_pointer_type (type),
-					     array_expr->ts.u.cl->backend_decl);
-		      argse.string_length = array_expr->ts.u.cl->backend_decl;
-		    }
-		  else
-		    res_var = gfc_create_var (type, "caf_res");
-		  dst_var = gfc_conv_scalar_to_descriptor (se, res_var, attr);
-		  dst_var = gfc_build_addr_expr (NULL_TREE, dst_var);
-		}
-	      else
-		{
-		  /* Create temporary.  */
-		  if (array_expr->ts.type == BT_CHARACTER)
-		    gfc_conv_expr_descriptor (&argse, array_expr);
-		  may_realloc = gfc_trans_create_temp_array (&se->pre,
-							     &se->post,
-							     se->ss, type,
-							     NULL_TREE, false,
-							     false, false,
-							     &array_expr->where)
-		      == NULL_TREE;
-		  res_var = se->ss->info->data.array.descriptor;
-		  dst_var = gfc_build_addr_expr (NULL_TREE, res_var);
-		  if (may_realloc)
-		    {
-		      tmp = gfc_conv_descriptor_data_get (res_var);
-		      tmp = gfc_deallocate_with_status (tmp, NULL_TREE,
-							NULL_TREE, NULL_TREE,
-							NULL_TREE, true,
-							NULL,
-						     GFC_CAF_COARRAY_NOCOARRAY);
-		      gfc_add_expr_to_block (&se->post, tmp);
-		    }
-		}
-	    }
-
-	  kind = build_int_cst (integer_type_node, expr->ts.kind);
-	  if (lhs_kind == NULL_TREE)
-	    lhs_kind = kind;
-
-	  caf_decl = gfc_get_tree_for_caf_expr (array_expr);
-	  if (TREE_CODE (TREE_TYPE (caf_decl)) == REFERENCE_TYPE)
-	    caf_decl = build_fold_indirect_ref_loc (input_location, caf_decl);
-	  image_index = gfc_caf_get_image_index (&se->pre, array_expr,
-						 caf_decl);
-	  gfc_get_caf_token_offset (se, &token, NULL, caf_decl, NULL,
-				    array_expr);
-
-	  /* No overlap possible as we have generated a temporary.  */
-	  if (lhs == NULL_TREE)
-	    may_require_tmp = boolean_false_node;
-
-	  /* It guarantees memory consistency within the same segment.  */
-	  tmp = gfc_build_string_const (strlen ("memory") + 1, "memory");
-	  tmp = build5_loc (input_location, ASM_EXPR, void_type_node,
-			    gfc_build_string_const (1, ""), NULL_TREE,
-			    NULL_TREE, tree_cons (NULL_TREE, tmp, NULL_TREE),
-			    NULL_TREE);
-	  ASM_VOLATILE_P (tmp) = 1;
-	  gfc_add_expr_to_block (&se->pre, tmp);
+  memset (&rget_data, 0, sizeof (gfc_expr));
+  gfc_clear_ts (&rget_data.ts);
+  rget_data.expr_type = EXPR_VARIABLE;
+  name = xasprintf ("__caf_rget_data_%d", call_cnt);
+  gcc_assert (!gfc_get_sym_tree (name, ns, &data_st, false));
+  name = xasprintf ("__caf_rget_index_%d", call_cnt);
+  ++call_cnt;
+  gcc_assert (!gfc_get_sym_tree (name, ns, &index_st, false));
+  free (name);
+  data_st->n.sym->attr.flavor = FL_VARIABLE;
+  data_st->n.sym->ts = gdata_sym->ts;
+  rget_data.symtree = data_st;
+  gfc_set_sym_referenced (rget_data.symtree->n.sym);
+  rget_data.ts = data_st->n.sym->ts;
+  gfc_commit_symbol (data_st->n.sym);
+
+  memset (&rget_data_init, 0, sizeof (gfc_expr));
+  gfc_clear_ts (&rget_data_init.ts);
+  rget_data_init.expr_type = EXPR_STRUCTURE;
+  rget_data_init.ts = rget_data.ts;
+  for (gfc_component *comp = rget_data.ts.u.derived->components; comp;
+       comp = comp->next)
+    {
+      con = gfc_constructor_get ();
+      con->expr = comp->initializer;
+      comp->initializer = NULL;
+      gfc_constructor_append (&rget_data_init.value.constructor, con);
+    }
+
+  index_st->n.sym->attr.flavor = FL_VARIABLE;
+  index_st->n.sym->attr.save = SAVE_EXPLICIT;
+  index_st->n.sym->value
+    = gfc_get_constant_expr (BT_INTEGER, gfc_default_integer_kind,
+			     &gfc_current_locus);
+  mpz_init_set_si (index_st->n.sym->value->value.integer, -1);
+  index_st->n.sym->ts.type = BT_INTEGER;
+  index_st->n.sym->ts.kind = gfc_default_integer_kind;
+  gfc_set_sym_referenced (index_st->n.sym);
+  memset (&rget_index, 0, sizeof (gfc_expr));
+  gfc_clear_ts (&rget_index.ts);
+  rget_index.expr_type = EXPR_VARIABLE;
+  rget_index.symtree = index_st;
+  rget_index.ts = index_st->n.sym->ts;
+  gfc_commit_symbol (index_st->n.sym);

-	  tmp = build_call_expr_loc (input_location, gfor_fndecl_caf_get_by_ref,
-				     10, token, image_index, dst_var,
-				     caf_reference, lhs_kind, kind,
-				     may_require_tmp,
-				     may_realloc ? boolean_true_node :
-						   boolean_false_node,
-				     stat, build_int_cst (integer_type_node,
-							  array_expr->ts.type));
+  gfc_init_se (&argse, NULL);
+  gfc_conv_expr (&argse, &rget_index);
+  gfc_add_block_to_block (&se->pre, &argse.pre);
+  rget_index_tree = argse.expr;

-	  gfc_add_expr_to_block (&se->pre, tmp);
+  gfc_init_se (&argse, NULL);
+  gfc_conv_expr (&argse, rget_hash);

-	  if (se->ss)
-	    gfc_advance_se_ss_chain (se);
+  gfc_init_block (&blk);
+  tmp = build_call_expr (gfor_fndecl_caf_get_remote_function_index, 1,
+			 argse.expr);

-	  se->expr = res_var;
-	  if (array_expr->ts.type == BT_CHARACTER)
-	    se->string_length = argse.string_length;
+  gfc_add_modify (&blk, rget_index_tree, tmp);
+  gfc_add_expr_to_block (
+    &se->pre,
+    build3 (COND_EXPR, void_type_node,
+	    gfc_likely (build2 (EQ_EXPR, logical_type_node, rget_index_tree,
+				build_int_cst (integer_type_node, -1)),
+			PRED_FIRST_MATCH),
+	    gfc_finish_block (&blk), NULL_TREE));

-	  return;
-	}
+  if (rget_data.ts.u.derived->components)
+    {
+      gfc_init_se (&argse, NULL);
+      gfc_conv_expr (&argse, &rget_data);
+      rget_data_tree = argse.expr;
+      gfc_add_expr_to_block (&se->pre,
+			     gfc_trans_structure_assign (rget_data_tree,
+							 &rget_data_init, true,
+							 false));
+      gfc_constructor_free (rget_data_init.value.constructor);
+      rget_data_size = TREE_TYPE (rget_data_tree)->type_common.size_unit;
+      rget_data_tree = gfc_build_addr_expr (pvoid_type_node, rget_data_tree);
+    }
+  else
+    {
+      rget_data_tree = build_zero_cst (pvoid_type_node);
+      rget_data_size = build_zero_cst (size_type_node);
     }

-  gfc_init_se (&argse, NULL);
   if (array_expr->rank == 0)
     {
-      symbol_attribute attr;
-
-      gfc_clear_attr (&attr);
-      gfc_conv_expr (&argse, array_expr);
-
-      if (lhs == NULL_TREE)
+      res_var = gfc_create_var (type, "caf_res");
+      if (array_expr->ts.type == BT_CHARACTER)
 	{
-	  gfc_clear_attr (&attr);
-	  if (array_expr->ts.type == BT_CHARACTER)
-	    res_var = gfc_conv_string_tmp (se, build_pointer_type (type),
-					   argse.string_length);
-	  else
-	    res_var = gfc_create_var (type, "caf_res");
-	  dst_var = gfc_conv_scalar_to_descriptor (&argse, res_var, attr);
-	  dst_var = gfc_build_addr_expr (NULL_TREE, dst_var);
+	  gfc_conv_string_length (array_expr->ts.u.cl, array_expr, &se->pre);
+	  argse.string_length = array_expr->ts.u.cl->backend_decl;
+	  opt_src_charlen = gfc_build_addr_expr (
+	    NULL_TREE, gfc_trans_force_lval (&se->pre, argse.string_length));
+	  dest_size = build_int_cstu (size_type_node, array_expr->ts.kind);
+	}
+      else
+	{
+	  dest_size = res_var->typed.type->type_common.size_unit;
+	  opt_src_charlen
+	    = build_zero_cst (build_pointer_type (size_type_node));
 	}
-      argse.expr = gfc_conv_scalar_to_descriptor (&argse, argse.expr, attr);
-      argse.expr = gfc_build_addr_expr (NULL_TREE, argse.expr);
+      dest_data
+	= gfc_evaluate_now (gfc_build_addr_expr (NULL_TREE, res_var), &se->pre);
+      res_var = build_fold_indirect_ref (dest_data);
+      dest_data = gfc_build_addr_expr (pvoid_type_node, dest_data);
+      opt_dest_desc = build_zero_cst (pvoid_type_node);
     }
   else
     {
-      /* If has_vector, pass descriptor for whole array and the
-         vector bounds separately.  */
-      gfc_array_ref *ar, ar2;
-      bool has_vector = false;
-
-      if (gfc_is_coindexed (expr) && gfc_has_vector_subscript (expr))
+      /* Create temporary.  */
+      may_realloc = gfc_trans_create_temp_array (&se->pre, &se->post, se->ss,
+						 type, NULL_TREE, false, false,
+						 false, &array_expr->where)
+		    == NULL_TREE;
+      res_var = se->ss->info->data.array.descriptor;
+      if (array_expr->ts.type == BT_CHARACTER)
 	{
-          has_vector = true;
-          ar = gfc_find_array_ref (expr);
-	  ar2 = *ar;
-	  memset (ar, '\0', sizeof (*ar));
-	  ar->as = ar2.as;
-	  ar->type = AR_FULL;
-	}
-      // TODO: Check whether argse.want_coarray = 1 can help with the below.
-      gfc_conv_expr_descriptor (&argse, array_expr);
-      /* Using gfc_conv_expr_descriptor, we only get the descriptor, but that
-	 has the wrong type if component references are done.  */
-      gfc_add_modify (&argse.pre, gfc_conv_descriptor_dtype (argse.expr),
-		      gfc_get_dtype_rank_type (has_vector ? ar2.dimen
-							  : array_expr->rank,
-					       type));
-      if (has_vector)
-	{
-	  vec = conv_caf_vector_subscript (&argse.pre, argse.expr, &ar2);
-	  *ar = ar2;
+	  argse.string_length = array_expr->ts.u.cl->backend_decl;
+	  opt_src_charlen = gfc_build_addr_expr (
+	    NULL_TREE, gfc_trans_force_lval (&se->pre, argse.string_length));
+	  dest_size = build_int_cstu (size_type_node, array_expr->ts.kind);
 	}
-
-      if (lhs == NULL_TREE)
+      else
 	{
-	  /* Create temporary.  */
-	  for (int n = 0; n < se->ss->loop->dimen; n++)
-	    if (se->loop->to[n] == NULL_TREE)
-	      {
-		se->loop->from[n] = gfc_conv_descriptor_lbound_get (argse.expr,
-							       gfc_rank_cst[n]);
-		se->loop->to[n] = gfc_conv_descriptor_ubound_get (argse.expr,
-							       gfc_rank_cst[n]);
-	      }
-	  gfc_trans_create_temp_array (&argse.pre, &argse.post, se->ss, type,
-				       NULL_TREE, false, true, false,
-				       &array_expr->where);
-	  res_var = se->ss->info->data.array.descriptor;
-	  dst_var = gfc_build_addr_expr (NULL_TREE, res_var);
-	}
-      argse.expr = gfc_build_addr_expr (NULL_TREE, argse.expr);
-    }
-
-  kind = build_int_cst (integer_type_node, expr->ts.kind);
-  if (lhs_kind == NULL_TREE)
-    lhs_kind = kind;
-
-  gfc_add_block_to_block (&se->pre, &argse.pre);
-  gfc_add_block_to_block (&se->post, &argse.post);
-
+	  opt_src_charlen
+	    = build_zero_cst (build_pointer_type (size_type_node));
+	  dest_size = fold_build2 (
+	    MULT_EXPR, size_type_node,
+	    fold_convert (size_type_node,
+			  array_expr->shape
+			    ? conv_shape_to_cst (array_expr)
+			    : gfc_conv_descriptor_size (res_var,
+							array_expr->rank)),
+	    fold_convert (size_type_node,
+			  gfc_conv_descriptor_span_get (res_var)));
+	}
+      opt_dest_desc = res_var;
+      dest_data = gfc_conv_descriptor_data_get (res_var);
+      opt_dest_desc = gfc_build_addr_expr (NULL_TREE, opt_dest_desc);
+      if (may_realloc)
+	{
+	  tmp = gfc_conv_descriptor_data_get (res_var);
+	  tmp = gfc_deallocate_with_status (tmp, NULL_TREE, NULL_TREE,
+					    NULL_TREE, NULL_TREE, true, NULL,
+					    GFC_CAF_COARRAY_NOCOARRAY);
+	  gfc_add_expr_to_block (&se->post, tmp);
+	}
+      dest_data
+	= gfc_build_addr_expr (NULL_TREE,
+			       gfc_trans_force_lval (&se->pre, dest_data));
+    }
+
+  opt_dest_charlen = opt_src_charlen;
   caf_decl = gfc_get_tree_for_caf_expr (array_expr);
-  if (POINTER_TYPE_P (TREE_TYPE (caf_decl)))
+  if (TREE_CODE (TREE_TYPE (caf_decl)) == REFERENCE_TYPE)
     caf_decl = build_fold_indirect_ref_loc (input_location, caf_decl);
-  image_index = gfc_caf_get_image_index (&se->pre, array_expr, caf_decl);
-  gfc_get_caf_token_offset (se, &token, &offset, caf_decl, argse.expr,
-			    array_expr);

-  /* No overlap possible as we have generated a temporary.  */
-  if (lhs == NULL_TREE)
-    may_require_tmp = boolean_false_node;
+  if (!TYPE_LANG_SPECIFIC (TREE_TYPE (caf_decl))->rank)
+    opt_src_desc = build_zero_cst (pvoid_type_node);
+  else if (GFC_ARRAY_TYPE_P (TREE_TYPE (caf_decl)))
+    opt_src_desc = build_minus_one_cst (pvoid_type_node);
+  else
+    opt_src_desc = gfc_build_addr_expr (pvoid_type_node, caf_decl);
+
+  image_index = gfc_caf_get_image_index (&se->pre, array_expr, caf_decl);
+  gfc_get_caf_token_offset (se, &token, NULL, caf_decl, NULL, array_expr);

   /* It guarantees memory consistency within the same segment.  */
   tmp = gfc_build_string_const (strlen ("memory") + 1, "memory");
@@ -1919,9 +1921,12 @@  gfc_conv_intrinsic_caf_get (gfc_se *se, gfc_expr *expr, tree lhs, tree lhs_kind,
   ASM_VOLATILE_P (tmp) = 1;
   gfc_add_expr_to_block (&se->pre, tmp);

-  tmp = build_call_expr_loc (input_location, gfor_fndecl_caf_get, 10,
-			     token, offset, image_index, argse.expr, vec,
-			     dst_var, kind, lhs_kind, may_require_tmp, stat);
+  tmp = build_call_expr_loc (
+    input_location, gfor_fndecl_caf_get_by_ct, 15, token, opt_src_desc,
+    opt_src_charlen, image_index, dest_size, dest_data, opt_dest_charlen,
+    opt_dest_desc, constant_boolean_node (may_realloc, boolean_type_node),
+    rget_index_tree, rget_data_tree, rget_data_size, stat, null_pointer_node,
+    null_pointer_node);

   gfc_add_expr_to_block (&se->pre, tmp);

@@ -1931,6 +1936,8 @@  gfc_conv_intrinsic_caf_get (gfc_se *se, gfc_expr *expr, tree lhs, tree lhs_kind,
   se->expr = res_var;
   if (array_expr->ts.type == BT_CHARACTER)
     se->string_length = argse.string_length;
+
+  return;
 }

 static bool
@@ -1995,8 +2002,9 @@  conv_caf_send (gfc_code *code) {
 	  gfc_clear_attr (&attr);
 	  gfc_conv_expr (&lhs_se, lhs_expr);
 	  lhs_type = TREE_TYPE (lhs_se.expr);
-	  lhs_se.expr = gfc_conv_scalar_to_descriptor (&lhs_se, lhs_se.expr,
-						       attr);
+	  if (lhs_is_coindexed)
+	    lhs_se.expr = gfc_conv_scalar_to_descriptor (&lhs_se, lhs_se.expr,
+							 attr);
 	  lhs_se.expr = gfc_build_addr_expr (NULL_TREE, lhs_se.expr);
 	}
     }
@@ -2174,17 +2182,13 @@  conv_caf_send (gfc_code *code) {
 	lhs_may_realloc = lhs_may_realloc
 	    && gfc_full_array_ref_p (lhs_expr->ref, NULL);
       gfc_add_block_to_block (&block, &lhs_se.pre);
-      gfc_conv_intrinsic_caf_get (&rhs_se, rhs_expr, lhs_se.expr, lhs_kind,
-				  may_require_tmp, lhs_may_realloc,
-				  &rhs_caf_attr);
+      gfc_conv_intrinsic_caf_get (&rhs_se, code->ext.actual->next->expr,
+				  lhs_se.expr, lhs_may_realloc, &rhs_caf_attr);
       gfc_add_block_to_block (&block, &rhs_se.pre);
       gfc_add_block_to_block (&block, &rhs_se.post);
       gfc_add_block_to_block (&block, &lhs_se.post);
       return gfc_finish_block (&block);
     }
-  else if (rhs_expr->expr_type == EXPR_FUNCTION
-	   && rhs_expr->value.function.isym->id == GFC_ISYM_CAF_GET)
-    rhs_expr = rhs_expr->value.function.actual->expr;

   gfc_add_block_to_block (&block, &lhs_se.pre);

@@ -2301,8 +2305,8 @@  conv_caf_send (gfc_code *code) {
 	{
 	  tree reference, dst_realloc;
 	  reference = conv_expr_ref_to_caf_ref (&block, lhs_expr);
-	  dst_realloc = lhs_caf_attr.allocatable ? boolean_true_node
-					     : boolean_false_node;
+	  dst_realloc
+	    = lhs_caf_attr.allocatable ? boolean_true_node : boolean_false_node;
 	  tmp = build_call_expr_loc (input_location,
 				     gfor_fndecl_caf_send_by_ref,
 				     10, token, image_index, rhs_se.expr,
@@ -2310,7 +2314,7 @@  conv_caf_send (gfc_code *code) {
 				     may_require_tmp, dst_realloc, src_stat,
 				     build_int_cst (integer_type_node,
 						    lhs_expr->ts.type));
-	  }
+	}
       else
 	tmp = build_call_expr_loc (input_location, gfor_fndecl_caf_send, 11,
 				   token, offset, image_index, lhs_se.expr, vec,
@@ -11290,8 +11294,7 @@  gfc_conv_intrinsic_function (gfc_se * se, gfc_expr * expr)
       break;

     case GFC_ISYM_CAF_GET:
-      gfc_conv_intrinsic_caf_get (se, expr, NULL_TREE, NULL_TREE, NULL_TREE,
-				  false, NULL);
+      gfc_conv_intrinsic_caf_get (se, expr, NULL_TREE, false, NULL);
       break;

     case GFC_ISYM_CMPLX:
diff --git a/gcc/fortran/trans.cc b/gcc/fortran/trans.cc
index 604cb53f417..caf95d65340 100644
--- a/gcc/fortran/trans.cc
+++ b/gcc/fortran/trans.cc
@@ -241,6 +241,16 @@  gfc_add_modify (stmtblock_t * pblock, tree lhs, tree rhs)
   gfc_add_modify_loc (input_location, pblock, lhs, rhs);
 }

+tree
+gfc_trans_force_lval (stmtblock_t *pblock, tree e)
+{
+  if (VAR_P (e))
+    return e;
+
+  tree v = gfc_create_var (TREE_TYPE (e), NULL);
+  gfc_add_modify (pblock, v, e);
+  return v;
+}

 /* Create a new scope/binding level and initialize a block.  Care must be
    taken when translating expressions as any temporaries will be placed in
diff --git a/gcc/fortran/trans.h b/gcc/fortran/trans.h
index 4679ea0d6e1..608e8e5132c 100644
--- a/gcc/fortran/trans.h
+++ b/gcc/fortran/trans.h
@@ -493,6 +493,8 @@  void gfc_init_se (gfc_se *, gfc_se *);
 tree gfc_create_var (tree, const char *);
 /* Like above but doesn't add it to the current scope.  */
 tree gfc_create_var_np (tree, const char *);
+/* Ensure that tree can be used as an lvalue.  */
+tree gfc_trans_force_lval (stmtblock_t *, tree);

 /* Store the result of an expression in a temp variable so it can be used
    repeatedly even if the original changes */
@@ -881,12 +883,21 @@  extern GTY(()) tree gfor_fndecl_caf_this_image;
 extern GTY(()) tree gfor_fndecl_caf_num_images;
 extern GTY(()) tree gfor_fndecl_caf_register;
 extern GTY(()) tree gfor_fndecl_caf_deregister;
+
+// Deprecate start
 extern GTY(()) tree gfor_fndecl_caf_get;
 extern GTY(()) tree gfor_fndecl_caf_send;
 extern GTY(()) tree gfor_fndecl_caf_sendget;
 extern GTY(()) tree gfor_fndecl_caf_get_by_ref;
 extern GTY(()) tree gfor_fndecl_caf_send_by_ref;
 extern GTY(()) tree gfor_fndecl_caf_sendget_by_ref;
+// Deprecate end
+
+extern GTY (()) tree gfor_fndecl_caf_register_accessor;
+extern GTY (()) tree gfor_fndecl_caf_register_accessors_finish;
+extern GTY (()) tree gfor_fndecl_caf_get_remote_function_index;
+extern GTY (()) tree gfor_fndecl_caf_get_by_ct;
+
 extern GTY(()) tree gfor_fndecl_caf_sync_all;
 extern GTY(()) tree gfor_fndecl_caf_sync_memory;
 extern GTY(()) tree gfor_fndecl_caf_sync_images;
diff --git a/gcc/testsuite/gfortran.dg/coarray_atomic_5.f90 b/gcc/testsuite/gfortran.dg/coarray_atomic_5.f90
index 005f3e5eae8..70c3d2ff4eb 100644
--- a/gcc/testsuite/gfortran.dg/coarray_atomic_5.f90
+++ b/gcc/testsuite/gfortran.dg/coarray_atomic_5.f90
@@ -20,6 +20,6 @@  program atomic
 end program

 ! { dg-final { scan-tree-dump-times "value.. = 0;" 1 "original" } }
-! { dg-final { scan-tree-dump-times "_gfortran_caf_atomic_define \\(caf_token.0, 0, 1, &value.., 0B, 1, 4\\);" 1 "original" } }
-! { dg-final { scan-tree-dump-times "_gfortran_caf_atomic_op \\(1, caf_token.0, 0, 1, &me, 0B, 0B, 1, 4\\);" 1 "original" } }
-! { dg-final { scan-tree-dump-times "_gfortran_caf_atomic_ref \\(caf_token.0, 0, 1, &me, 0B, 1, 4\\);" 1 "original" } }
+! { dg-final { scan-tree-dump-times "_gfortran_caf_atomic_define \\(caf_token.., 0, 1, &value.., 0B, 1, 4\\);" 1 "original" } }
+! { dg-final { scan-tree-dump-times "_gfortran_caf_atomic_op \\(1, caf_token.., 0, 1, &me, 0B, 0B, 1, 4\\);" 1 "original" } }
+! { dg-final { scan-tree-dump-times "_gfortran_caf_atomic_ref \\(caf_token.., 0, 1, &me, 0B, 1, 4\\);" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/coarray_lib_comm_1.f90 b/gcc/testsuite/gfortran.dg/coarray_lib_comm_1.f90
index a8954e7afa3..0650eedfde0 100644
--- a/gcc/testsuite/gfortran.dg/coarray_lib_comm_1.f90
+++ b/gcc/testsuite/gfortran.dg/coarray_lib_comm_1.f90
@@ -38,7 +38,6 @@  B(1:5) = B(3:7)
 if (any (A-B /= 0)) STOP 4
 end

-! { dg-final { scan-tree-dump-times "_gfortran_caf_get \\\(caf_token.0, \\\(integer\\\(kind=\[48\]\\\)\\\) parm.\[0-9\]+.data - \\\(integer\\\(kind=\[48\]\\\)\\\) a, 1, &parm.\[0-9\]+, 0B, &parm.\[0-9\]+, 4, 4, 1, 0B\\\);" 3 "original" } }
-! { dg-final { scan-tree-dump-times "_gfortran_caf_get \\\(caf_token.1, \\\(integer\\\(kind=\[48\]\\\)\\\) parm.\[0-9\]+.data - \\\(integer\\\(kind=\[48\]\\\)\\\) b, 1, &parm.\[0-9\]+, 0B, &parm.\[0-9\]+, 4, 4, 0, 0B\\\);" 1 "original" } }
-! { dg-final { scan-tree-dump-times "_gfortran_caf_sendget \\\(caf_token.0, \\\(integer\\\(kind=\[48\]\\\)\\\) parm.\[0-9\]+.data - \\\(integer\\\(kind=\[48\]\\\)\\\) a, 1, &parm.\[0-9\]+, 0B, caf_token.0, \\\(integer\\\(kind=\[48\]\\\)\\\) parm.\[0-9\]+.data - \\\(integer\\\(kind=\[48\]\\\)\\\) a, 1, &parm.\[0-9\]+, 0B, 4, 4, 1, 0B\\\);" 1 "original" } }
+! { dg-final { scan-tree-dump-times "_gfortran_caf_get_by_ct \\\(caf_token.., -1B, 0B, 1, \\\(unsigned long\\\) atmp.\[0-9\]+.span" 4 "original" } }
+! { dg-final { scan-tree-dump-times "_gfortran_caf_sendget \\\(caf_token.., \\\(integer\\\(kind=\[48\]\\\)\\\) parm.\[0-9\]+.data - \\\(integer\\\(kind=\[48\]\\\)\\\) a, 1, &parm.\[0-9\]+, 0B, caf_token.., \\\(integer\\\(kind=\[48\]\\\)\\\) parm.\[0-9\]+.data - \\\(integer\\\(kind=\[48\]\\\)\\\) a, 1, &parm.\[0-9\]+, 0B, 4, 4, 1, 0B\\\);" 1 "original" } }

diff --git a/gcc/testsuite/gfortran.dg/coarray_stat_function.f90 b/gcc/testsuite/gfortran.dg/coarray_stat_function.f90
index c29687efbe2..4d85b6ca852 100644
--- a/gcc/testsuite/gfortran.dg/coarray_stat_function.f90
+++ b/gcc/testsuite/gfortran.dg/coarray_stat_function.f90
@@ -40,6 +40,6 @@  contains

 end program function_stat

-! { dg-final { scan-tree-dump-times "_gfortran_caf_get \\\(caf_token.0, \\\(integer\\\(kind=\[48\]\\\)\\\) desc.\[0-9\]+.data - \\\(integer\\\(kind=\[48\]\\\)\\\) me, 4, &desc.\[0-9\]+, 0B, &desc.\[0-9\]+, 4, 4, 0, &stat\\\);" 1 "original" } }
-! { dg-final { scan-tree-dump-times "_gfortran_caf_get \\\(caf_token.0, \\\(integer\\\(kind=\[48\]\\\)\\\) desc.\[0-9\]+.data - \\\(integer\\\(kind=\[48\]\\\)\\\) me, 1, &desc.\[0-9\]+, 0B, &desc.\[0-9\]+, 4, 4, 0, &stat2\\\);" 1 "original" } }
-! { dg-final { scan-tree-dump-times "_gfortran_caf_get \\\(caf_token.0, \\\(integer\\\(kind=\[48\]\\\)\\\) desc.\[0-9\]+.data - \\\(integer\\\(kind=\[48\]\\\)\\\) me, 3, &desc.\[0-9\]+, 0B, &desc.\[0-9\]+, 4, 4, 0, &stat\\\);" 1 "original" } }
+! { dg-final { scan-tree-dump-times "_gfortran_caf_get_by_ct \\\(caf_token.., 0B, 0B, 4, 4, \\\(void \\\*\\\) &D....., 0B, 0B, 0, __caf_rget_index_., 0B, 0, &stat, 0B, 0B\\\);" 1 "original" } }
+! { dg-final { scan-tree-dump-times "_gfortran_caf_get_by_ct \\\(caf_token.., 0B, 0B, 1, 4, \\\(void \\\*\\\) &D....., 0B, 0B, 0, __caf_rget_index_., 0B, 0, &stat2, 0B, 0B\\\);" 1 "original" } }
+! { dg-final { scan-tree-dump-times "_gfortran_caf_get_by_ct \\\(caf_token.., 0B, 0B, 3, 4, \\\(void \\\*\\\) &D....., 0B, 0B, 0, __caf_rget_index_., 0B, 0, &stat, 0B, 0B\\\);" 1 "original" } }
diff --git a/libgfortran/caf/libcaf.h b/libgfortran/caf/libcaf.h
index dde0469c89a..552d1afde5f 100644
--- a/libgfortran/caf/libcaf.h
+++ b/libgfortran/caf/libcaf.h
@@ -237,6 +237,24 @@  void _gfortran_caf_sendget_by_ref (
 	int dst_kind, int src_kind, bool may_require_tmp, int *dst_stat,
 	int *src_stat, int dst_type, int src_type);

+void _gfortran_caf_register_accessor (const int hash,
+				      void (*accessor) (void **, int32_t *,
+							void *, void *,
+							const size_t *,
+							size_t *));
+
+void _gfortran_caf_register_accessors_finish (void);
+
+int _gfortran_caf_get_remote_function_index (const int hash);
+
+void _gfortran_caf_get_by_ct (
+	caf_token_t token, const gfc_descriptor_t *opt_src_desc,
+	const size_t *opt_src_charlen, const int image_index,
+	const size_t dst_size, void **dst_data, size_t *opt_dst_charlen,
+	gfc_descriptor_t *opt_dst_desc, const bool may_realloc_dst,
+	const int getter_index, void *get_data, const size_t get_data_size,
+	int *stat, caf_team_t *team, int *team_number);
+
 void _gfortran_caf_atomic_define (caf_token_t, size_t, int, void *, int *,
 				  int, int);
 void _gfortran_caf_atomic_ref (caf_token_t, size_t, int, void *, int *,
diff --git a/libgfortran/caf/single.c b/libgfortran/caf/single.c
index 41da970e830..e7de840a05a 100644
--- a/libgfortran/caf/single.c
+++ b/libgfortran/caf/single.c
@@ -57,6 +57,25 @@  typedef struct caf_single_token *caf_single_token_t;
 /* Global variables.  */
 caf_static_t *caf_static_list = NULL;

+typedef void (*accessor_t) (void **, int32_t *, void *, void *, const size_t *,
+			    size_t *);
+struct accessor_hash_t
+{
+  int hash;
+  int pad;
+  accessor_t accessor;
+};
+
+static struct accessor_hash_t *accessor_hash_table = NULL;
+static int aht_cap = 0;
+static int aht_size = 0;
+static enum {
+  AHT_UNINITIALIZED,
+  AHT_OPEN,
+  AHT_PREPARED
+} accessor_hash_table_state
+  = AHT_UNINITIALIZED;
+
 /* Keep in sync with mpi.c.  */
 static void
 caf_runtime_error (const char *message, ...)
@@ -1073,11 +1092,11 @@  _gfortran_caf_send (caf_token_t token, size_t offset,
 				  - dest->dim[j].lower_bound + 1))
 			      * dest->dim[j]._stride;
 	  extent = (dest->dim[j]._ubound - dest->dim[j].lower_bound + 1);
-          stride = dest->dim[j]._stride;
+	  stride = dest->dim[j]._stride;
 	}
-      array_offset_dst += (i / extent) * dest->dim[rank-1]._stride;
-      void *dst = (void *)((char *) MEMTOK (token) + offset
-			   + array_offset_dst*GFC_DESCRIPTOR_SIZE (dest));
+      array_offset_dst += (i / extent) * dest->dim[rank - 1]._stride;
+      void *dst = (void *) ((char *) MEMTOK (token) + offset
+			    + array_offset_dst * dest->span);
       void *sr;
       if (GFC_DESCRIPTOR_RANK (src) != 0)
 	{
@@ -1094,8 +1113,7 @@  _gfortran_caf_send (caf_token_t token, size_t offset,
 	      stride = src->dim[j]._stride;
 	    }
 	  array_offset_sr += (i / extent) * src->dim[rank-1]._stride;
-	  sr = (void *)((char *) src->base_addr
-			+ array_offset_sr*GFC_DESCRIPTOR_SIZE (src));
+	  sr = (void *) ((char *) src->base_addr + array_offset_sr * src->span);
 	}
       else
 	sr = src->base_addr;
@@ -2823,6 +2841,111 @@  _gfortran_caf_sendget_by_ref (caf_token_t dst_token, int dst_image_index,
     free (GFC_DESCRIPTOR_DATA (&temp));
 }

+void
+_gfortran_caf_register_accessor (const int hash, accessor_t accessor)
+{
+  if (accessor_hash_table_state == AHT_UNINITIALIZED)
+    {
+      aht_cap = 16;
+      accessor_hash_table = calloc (aht_cap, sizeof (struct accessor_hash_t));
+      accessor_hash_table_state = AHT_OPEN;
+    }
+  if (aht_size == aht_cap)
+    {
+      aht_cap += 16;
+      accessor_hash_table = realloc (accessor_hash_table,
+				     aht_cap * sizeof (struct accessor_hash_t));
+    }
+  if (accessor_hash_table_state == AHT_PREPARED)
+    {
+      accessor_hash_table_state = AHT_OPEN;
+    }
+  accessor_hash_table[aht_size].hash = hash;
+  accessor_hash_table[aht_size].accessor = accessor;
+  ++aht_size;
+}
+
+static int
+hash_compare (const struct accessor_hash_t *lhs,
+	      const struct accessor_hash_t *rhs)
+{
+  return lhs->hash < rhs->hash ? -1 : (lhs->hash > rhs->hash ? 1 : 0);
+}
+
+void
+_gfortran_caf_register_accessors_finish (void)
+{
+  if (accessor_hash_table_state == AHT_PREPARED
+      || accessor_hash_table_state == AHT_UNINITIALIZED)
+    return;
+
+  qsort (accessor_hash_table, aht_size, sizeof (struct accessor_hash_t),
+	 (int (*) (const void *, const void *)) hash_compare);
+  accessor_hash_table_state = AHT_PREPARED;
+}
+
+int
+_gfortran_caf_get_remote_function_index (const int hash)
+{
+  if (accessor_hash_table_state != AHT_PREPARED)
+    {
+      caf_runtime_error ("the accessor hash table is not prepared.");
+    }
+
+  struct accessor_hash_t cand;
+  cand.hash = hash;
+  struct accessor_hash_t *f
+    = bsearch (&cand, accessor_hash_table, aht_size,
+	       sizeof (struct accessor_hash_t),
+	       (int (*) (const void *, const void *)) hash_compare);
+
+  int index = f ? f - accessor_hash_table : -1;
+  return index;
+}
+
+void
+_gfortran_caf_get_by_ct (
+  caf_token_t token, const gfc_descriptor_t *opt_src_desc,
+  const size_t *opt_src_charlen, const int image_index __attribute__ ((unused)),
+  const size_t dst_size __attribute__ ((unused)), void **dst_data,
+  size_t *opt_dst_charlen, gfc_descriptor_t *opt_dst_desc,
+  const bool may_realloc_dst, const int getter_index, void *get_data,
+  const size_t get_data_size __attribute__ ((unused)), int *stat,
+  caf_team_t *team __attribute__ ((unused)),
+  int *team_number __attribute__ ((unused)))
+{
+  caf_single_token_t single_token = TOKEN (token);
+  void *src_ptr = opt_src_desc
+		    ? (opt_src_desc != (void *) -1 ? (void *) opt_src_desc
+						   : single_token->memptr)
+		    : &single_token->memptr;
+  int free_buffer;
+  void *dst_ptr = opt_dst_desc ? (void *)opt_dst_desc : dst_data;
+  void *old_dst_data_ptr = NULL;
+
+  if (stat)
+    *stat = 0;
+
+  if (opt_dst_desc && !may_realloc_dst)
+    {
+      old_dst_data_ptr = opt_dst_desc->base_addr;
+      opt_dst_desc->base_addr = NULL;
+    }
+
+  accessor_hash_table[getter_index].accessor (dst_ptr, &free_buffer, src_ptr,
+					      get_data, opt_src_charlen,
+					      opt_dst_charlen);
+  if (opt_dst_desc && old_dst_data_ptr && !may_realloc_dst
+      && opt_dst_desc->base_addr != old_dst_data_ptr)
+    {
+      size_t dsize = opt_dst_desc->span;
+      for (int i = 0; i < GFC_DESCRIPTOR_RANK (opt_dst_desc); ++i)
+	dsize *= GFC_DESCRIPTOR_EXTENT (opt_dst_desc, i);
+      memcpy (old_dst_data_ptr, opt_dst_desc->base_addr, dsize);
+      free (opt_dst_desc->base_addr);
+      opt_dst_desc->base_addr = old_dst_data_ptr;
+    }
+}

 void
 _gfortran_caf_atomic_define (caf_token_t token, size_t offset,
--
2.47.0