
[RFC] Fortran/OpenMP: Middle-end support for mapping of DT with allocatable components

Message ID d9f56b13-6dd0-4ef5-9052-8cedef51e529@baylibre.com
State New
Series [RFC] Fortran/OpenMP: Middle-end support for mapping of DT with allocatable components

Commit Message

Tobias Burnus Sept. 10, 2024, 10:19 a.m. UTC
Background: OpenMP states that for 'map(var)', all allocatable components
of 'var' will automatically also be mapped ('deep mapping').

Thus, for

type(t), allocatable :: var(:)

this leads to pseudocode like:

   map(var, storage_size(var))
   do i = lbound(var, 1), ubound(var, 1)
     if (allocated(var(i)%comp1)) &
       map(var(i)%comp1, storage_size(var(i)%comp1))
   end do
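A minimal C sketch of the entries that loop produces. It assumes a hypothetical C stand-in for the derived type, in which a NULL pointer plays the role of an unallocated 'comp1', and a made-up map_entry record. MAP_TOFROM and MAP_ATTACH are illustrative names only, not libgomp's GOMP_MAP_* values:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative map kinds; NOT libgomp's GOMP_MAP_* values.  */
enum map_kind { MAP_TOFROM, MAP_ATTACH };

/* One (addr, size, kind) triple as handed to the runtime.  */
struct map_entry {
  void *addr;
  size_t size;
  enum map_kind kind;
};

/* Hypothetical stand-in for a derived type with one allocatable array
   component; comp1 == NULL means "not allocated".  */
struct t {
  double *comp1;
  size_t comp1_len;
};

/* Fill OUT with the deep mappings for 'map(var)', VAR having N elements:
   first VAR itself, then for each allocated var(i)%comp1 its data plus an
   attach of the pointer stored in var(i).  Returns the number of entries.  */
size_t deep_map_array (struct t *var, size_t n, struct map_entry *out,
                       size_t cap)
{
  assert (cap >= 1 + 2 * n);   /* worst case: every comp1 allocated */
  size_t k = 0;
  out[k++] = (struct map_entry) { var, n * sizeof *var, MAP_TOFROM };
  for (size_t i = 0; i < n; i++)
    {
      if (var[i].comp1 == NULL)   /* allocated (var(i)%comp1) is false */
        continue;
      out[k++] = (struct map_entry) { var[i].comp1,
                                      var[i].comp1_len * sizeof (double),
                                      MAP_TOFROM };
      out[k++] = (struct map_entry) { &var[i].comp1, sizeof var[i].comp1,
                                      MAP_ATTACH };
    }
  return k;
}
```

Because it is only known at run time how many components are allocated, the number of entries is also a run-time value.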

It can get more complicated; e.g., var(1204)%comp1(395)%str might be
an allocatable scalar. Or var is of a recursive type, e.g. it has
'type(t), allocatable :: self' as a component such that
   var%self%self%self%self ...
might exist (and 'self' could also be an array …).
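For the recursive case, counting the extra entries becomes a recursive walk. The following sketch assumes a hypothetical C stand-in for a type with 'type(t), allocatable :: self(:)', where NULL means unallocated. It counts one map plus one attach per allocated 'self':

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for:
     type t; type(t), allocatable :: self(:); end type t
   self == NULL means "not allocated"; n_self is its size.  */
struct node {
  struct node *self;
  size_t n_self;
};

/* Number of additional map entries needed for 'map(var)': for an allocated
   'self', one entry maps the array data and one attaches the pointer; then
   each element is visited in turn.  Allocatable components cannot alias,
   so the structure is a tree and the recursion terminates.  */
size_t deep_map_cnt (const struct node *var)
{
  assert (var != NULL);
  if (var->self == NULL)
    return 0;
  size_t cnt = 2;
  for (size_t i = 0; i < var->n_self; i++)
    cnt += deep_map_cnt (&var->self[i]);
  return cnt;
}
```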

* * *

Approach:

The idea is to handle it in lower_omp_target as follows (semi-pseudocode):

   /* Obtain the number of additional mappings; in the example above, it
      would be size(var) * 2 for map + attach of 'comp1', assuming all
      'var(:)%comp1' are allocated and no other allocatable components
      exist.  */
   tree cnt = lang_hooks.decls.omp_deep_mapping_cnt (...);
   if (cnt)
     deep_map_cnt += cnt;

   if (deep_map_cnt)
     → switch to pointer type + dynamically allocate addrs, kinds, sizes
     → add 'uintptr_t s[]' as trailing member to the addr struct.
(Thus, all automatically mapped items are added to the end.)
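A minimal C sketch of the count-then-allocate step. It is modeled loosely on what the patch emits: an .omp_data_arr record with a trailing flexible array, plus malloc'ed sizes/kinds arrays seeded from the compile-time part. MAP_CNT, the struct layout, and all function names are hypothetical. The two offsets play the role of the patch's map_offset_data and map_offset temporaries:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAP_CNT 2   /* mappings known at compile time (hypothetical) */

/* Record with one slot per compile-time mapping plus a trailing flexible
   array member that receives the deep-mapping addresses.  */
struct omp_data_arr {
  void *fixed[MAP_CNT];
  uintptr_t s[];
};

struct target_args {
  struct omp_data_arr *data;
  size_t *sizes;
  unsigned short *kinds;
  size_t total;   /* MAP_CNT + number of deep mappings */
};

/* Allocate everything for MAP_CNT + DEEP_CNT mappings and copy in the
   compile-time sizes/kinds.  Returns 0 on allocation failure.  */
int alloc_target_args (struct target_args *a, size_t deep_cnt,
                       const size_t *sizes0, const unsigned short *kinds0)
{
  a->total = MAP_CNT + deep_cnt;
  a->data = malloc (sizeof *a->data + deep_cnt * sizeof a->data->s[0]);
  a->sizes = malloc (a->total * sizeof *a->sizes);
  a->kinds = malloc (a->total * sizeof *a->kinds);
  if (!a->data || !a->sizes || !a->kinds)
    {
      free (a->data);
      free (a->sizes);
      free (a->kinds);
      return 0;
    }
  memcpy (a->sizes, sizes0, MAP_CNT * sizeof *a->sizes);
  memcpy (a->kinds, kinds0, MAP_CNT * sizeof *a->kinds);
  return 1;
}

/* Append one deep mapping: the address goes to data->s[*offset_data]; size
   and kind go to index *offset, which starts at MAP_CNT.  */
void add_deep_mapping (struct target_args *a, size_t *offset_data,
                       size_t *offset, void *addr, size_t size,
                       unsigned short kind)
{
  assert (*offset < a->total);
  a->data->s[(*offset_data)++] = (uintptr_t) addr;
  a->sizes[*offset] = size;
  a->kinds[(*offset)++] = kind;
}

void free_target_args (struct target_args *a)
{
  free (a->data);
  free (a->sizes);
  free (a->kinds);
}
```

In the patch itself, the compile-time sizes/kinds are built as .omp_data_sizes0/.omp_data_kinds0, memcpy'd into the heap arrays, and all heap arrays are freed after the region.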

   In the big map loop, additionally call:
     lang_hooks.decls.omp_deep_mapping (...)

Additionally, in some cases, the only question that needs to be answered
is whether the decl has an allocatable component or not. In that case,
lang_hooks.decls.omp_deep_mapping_p is sufficient.

* * *

RFC: Does this approach sound sensible? Does the attached patch
(middle-end part) look reasonable?

One downside of the current approach is that for map(var), when 'var' is
already present, we still attempt to map all allocatable components
instead of stopping directly after finding 'var' in the splay table. This
can be fixed by passing more attributes to libgomp, but as the items come
last in the list, it might not be straightforward. (Maybe starts-here +
ends-here flags, where the attach next to the starts-here flag could be
used to do the lookup?) This might also lead to cases where an
allocatable variable is mapped that otherwise would not be mapped.
However, as 'map(var%comp)' of a later allocated 'comp' is only
guaranteed to work with the 'always' modifier, having it automapped for
'map(var)' should at least not affect the values that were mapped.

* * *

The full patch has been applied to the OG14 (= devel/omp/gcc-14) branch.
The interesting bits are the hook entry points gfc_omp_deep_mapping_p,
gfc_omp_deep_mapping_cnt, and gfc_omp_deep_mapping →
https://github.com/gcc-mirror/gcc/blob/devel/omp/gcc-14/gcc/fortran/trans-openmp.cc#L3068-L3209

* * *

I have attached only the middle-end part of the patch:

https://gcc.gnu.org/g:92c3af3d4f8 Fortran/OpenMP: Support mapping of DT with allocatable components

to focus on that part.
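The starts-here/ends-here idea above is floated only as a question. Purely as an illustration, a runtime walk could skip a whole group of deep mappings once the group's first entry is found to be present. GROUP_START/GROUP_END and all other names are invented (libgomp has no such flags), and a linear search stands in for the splay-tree lookup:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags; libgomp has no such flags.  */
enum { GROUP_START = 1, GROUP_END = 2 };

struct entry {
  const void *addr;
  unsigned flags;
};

/* Linear search standing in for the splay-tree presence lookup.  */
static bool is_present (const void *addr, const void *const *table, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (table[i] == addr)
      return true;
  return false;
}

/* Walk the N entries; if an entry flagged GROUP_START is already present,
   skip it and everything up to and including the next GROUP_END entry.
   Returns how many entries would actually be mapped.  */
size_t map_entries (const struct entry *e, size_t n,
                    const void *const *present, size_t n_present)
{
  assert (e != NULL || n == 0);
  size_t mapped = 0;
  for (size_t i = 0; i < n; i++)
    {
      if ((e[i].flags & GROUP_START)
          && is_present (e[i].addr, present, n_present))
        {
          while (i < n && !(e[i].flags & GROUP_END))
            i++;
          continue;   /* the loop increment steps past GROUP_END */
        }
      mapped++;   /* stand-in for the real map operation */
    }
  return mapped;
}
```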

Tobias

PS: In TR13 and also after TR13, a couple of mapping features were added that permit
shallow mapping, unmapping of allocatable components etc. I have not tried to analyze
whether this affects this patch, but I think it remains largely as is.

Comments

Jakub Jelinek Sept. 10, 2024, 10:25 a.m. UTC | #1
On Tue, Sep 10, 2024 at 12:19:33PM +0200, Tobias Burnus wrote:
> Background: OpenMP states that for 'map(var)', all allocatable components
> of 'var' will automatically also be mapped ('deep mapping').

Not a review, just a comment.  This kind of recursive mapping is also
what needs to happen for declare mapper, so wonder if that shouldn't be
solved together; and some way to merge mappings of one field after another
with the same way if consecutive fields (with possibly some padding bits
in between) are mapped the same way.

	Jakub
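Jakub's second point, merging the mappings of consecutive fields that are mapped the same way, can be illustrated by a small coalescing pass over (offset, size, kind) triples sorted by offset. This is only an illustrative sketch. The field_map record is hypothetical, and max_gap stands in for the amount of padding that may be swallowed:

```c
#include <assert.h>
#include <stddef.h>

struct field_map {
  size_t offset;   /* byte offset of the field within the struct */
  size_t size;     /* byte size of the field */
  int kind;        /* hypothetical map kind */
};

/* Coalesce in place.  M must be sorted by offset and non-overlapping.  Two
   neighbors are merged when they have the same kind and the gap (padding)
   between them is at most MAX_GAP bytes.  Returns the new entry count.  */
size_t merge_field_maps (struct field_map *m, size_t n, size_t max_gap)
{
  if (n == 0)
    return 0;
  size_t out = 0;
  for (size_t i = 1; i < n; i++)
    {
      size_t end = m[out].offset + m[out].size;
      assert (m[i].offset >= end);
      if (m[i].kind == m[out].kind && m[i].offset - end <= max_gap)
        m[out].size = m[i].offset + m[i].size - m[out].offset;
      else
        m[++out] = m[i];
    }
  return out + 1;
}
```

Merging means the padding bytes between fields are transferred too, which is harmless but costs a little bandwidth.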
Tobias Burnus Sept. 10, 2024, 12:02 p.m. UTC | #2
Hi Jakub,

Jakub Jelinek wrote:
> On Tue, Sep 10, 2024 at 12:19:33PM +0200, Tobias Burnus wrote:
>> Background: OpenMP states that for 'map(var)', all allocatable components
>> of 'var' will automatically also be mapped ('deep mapping').
> Not a review, just a comment.  This kind of recursive mapping is also
> what needs to happen for declare mapper, so wonder if that shouldn't be
> solved together; and some way to merge mappings of one field after another
> with the same way if consecutive fields (with possibly some padding bits
> in between) are mapped the same way.

In the case of mapping Fortran allocatable components, I do not see the padding 
part. For 'map(var)' all of var is mapped, including all array 
descriptors. We then need to map the allocated memory (fully, if an 
array: all array elements) + do a pointer attach. And we need to handle 
unallocated components.

That's different from 'mapper', which is more flexible on the one hand but 
also really explicit. There is no hidden 'only if allocated do', 
possibly except for zero-sized array sections or iterator steps.

The Fortran part also handles polymorphic variables, where it is only 
known at runtime which components exist – which means that the whole 
tree of mappings to do is unknown at compile time. For 'mapper' that 
part is known.
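To illustrate why the count is a run-time quantity for polymorphic variables, consider a sketch in which each dynamic type supplies its own counting function through a vtable-like table. The vtab layout and all names are hypothetical and do not reflect gfortran's actual vtable:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-type table: how many extra mappings an object of this
   dynamic type needs.  */
struct vtab {
  size_t (*deep_map_cnt) (const void *obj);
};

/* A 'class(base)' variable: data pointer plus pointer to its type's table.  */
struct class_var {
  const void *data;
  const struct vtab *vptr;
};

struct base { int i; };
struct ext { struct base parent; double *alloc_comp; };   /* extends base */

static size_t base_cnt (const void *obj)
{
  (void) obj;
  return 0;   /* base has no allocatable components */
}

static size_t ext_cnt (const void *obj)
{
  const struct ext *e = obj;
  return e->alloc_comp != NULL ? 2 : 0;   /* map + attach if allocated */
}

static const struct vtab base_vtab = { base_cnt };
static const struct vtab ext_vtab = { ext_cnt };

/* The caller only sees class(base); the dynamic type decides the count.  */
size_t class_deep_map_cnt (struct class_var v)
{
  assert (v.vptr != NULL);
  return v.vptr->deep_map_cnt (v.data);
}
```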

[Granted, TR13 now explicitly does not permit mapping of polymorphic 
variables as there are too many corner cases. But for 6.x it is planned 
to re-add it.]

In any case, the Fortran allocatable-component mapping also needs to be 
applied to the mapper (+ iterator) generated code — and it needs to come 
last after all implicit mappings and remove-mapping optimizations. It 
could also be done as part of the mapper expansion.

* * *

Having said this, there might well be a useful common approach that 
covers Fortran deep mapping, 'mapper' and 'iterator'.

But the current patches do not use such a common approach. Namely, we have:

* The current Fortran deep mapper (as just posted) was ready in March 
2022, https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591075.html

* The mapper patch (latest version) is at 
https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629363.html – 
albeit first bits date back to 
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591983.html

* There is also an 'iterator' patch at 
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662138.html – 
albeit it lacks the 'mapper' part, which is WIP and, for mainline, requires 
the 'mapper' patch of the previous bullet.

* * *

If we have a clear plan for how to implement things, I am somewhat willing to 
revise patches, if it makes sense.

But for that, a clear design is needed.

And, in any case, it would be good, if we could get all of the features 
above into GCC 15: Fortran deep mapping, 'mapper' (+ target_update with 
strides), 'iterator'  [and some other backlog].

Tobias
Tobias Burnus Sept. 26, 2024, 12:29 p.m. UTC | #3
Now committed as r15-3895-ge4a58b6f28383c.

* * *

The next step is to send the Fortran part. While it exists, I want to 
proofread what I wrote a couple of years back, and I want to split off the 
polymorphism/class part as the current implementation has some issues 
and OpenMP 6 decided to disallow polymorphic Fortran variables for now. 
(Until some corner-case behavior has been defined.)

[The existing polymorphism support works, but it effectively only permits 
access to the declared types (as the vtable pointers will be the ones of 
the host); it also has some issues, and as the vtable gained two functions, 
ABI compatibility with old code is gone (hence the .mod version 
number was bumped).]

The entry code for the committed patch as mentioned before:

Am 10.09.24 um 12:19 schrieb Tobias Burnus:
> The interesting bit are the hook entry points gfc_omp_deep_mapping_p, 
> gfc_omp_deep_mapping_cnt, and gfc_omp_deep_mapping → 
> https://github.com/gcc-mirror/gcc/blob/devel/omp/gcc-14/gcc/fortran/trans-openmp.cc#L3068-L3209

And I think all code is in this file, once the polymorphism code is 
removed and replaced by a diagnostic message.

Tobias

PS: otherwise missing on the polymorphism side is 'private(class_var)'; 
'firstprivate(class_var)' works [all as data-sharing clauses not as 
data-mapping clauses].

PPS: The host-pointer vtable issue could be solved as for C++ in OpenMP 
5.2 by using the 'indirect' feature to lookup the device version of the 
table. (To be implemented for C++ and potentially for OpenMP 6.1+ (?) 
for Fortran.)

Patch

Fortran/OpenMP: Middle-end support for mapping of DT with allocatable components
    
gcc/ChangeLog:
    
	* langhooks-def.h (lhd_omp_deep_mapping_p,
	lhd_omp_deep_mapping_cnt, lhd_omp_deep_mapping): New.
	(LANG_HOOKS_OMP_DEEP_MAPPING_P, LANG_HOOKS_OMP_DEEP_MAPPING_CNT,
	LANG_HOOKS_OMP_DEEP_MAPPING): Define.
	(LANG_HOOKS_DECLS): Use it.
	* langhooks.cc (lhd_omp_deep_mapping_p, lhd_omp_deep_mapping_cnt,
	lhd_omp_deep_mapping): New stubs.
	* langhooks.h (struct lang_hooks_for_decls): Add new hooks.
	* omp-expand.cc (expand_omp_target): Handle dynamic-size
	addr/sizes/kinds arrays.
	* omp-low.cc (build_sender_ref, fixup_child_record_type,
	scan_sharing_clauses, lower_omp_target): Update to handle
	new hooks and dynamic-size addr/sizes/kinds arrays.
---
 gcc/langhooks-def.h |  10 +++
 gcc/langhooks.cc    |  24 ++++++
 gcc/langhooks.h     |  15 ++++
 gcc/omp-expand.cc   |  18 ++++-
 gcc/omp-low.cc      | 224 ++++++++++++++++++++++++++++++++++++++++++++++------
 5 files changed, 265 insertions(+), 26 deletions(-)

diff --git a/gcc/langhooks-def.h b/gcc/langhooks-def.h
index f5c67b6823c..756714558e5 100644
--- a/gcc/langhooks-def.h
+++ b/gcc/langhooks-def.h
@@ -86,6 +86,10 @@  extern enum omp_clause_defaultmap_kind lhd_omp_predetermined_mapping (tree);
 extern tree lhd_omp_assignment (tree, tree, tree);
 extern void lhd_omp_finish_clause (tree, gimple_seq *, bool);
 extern tree lhd_omp_array_size (tree, gimple_seq *);
+extern bool lhd_omp_deep_mapping_p (const gimple *, tree);
+extern tree lhd_omp_deep_mapping_cnt (const gimple *, tree, gimple_seq *);
+extern void lhd_omp_deep_mapping (const gimple *, tree, unsigned HOST_WIDE_INT,
+				  tree, tree, tree, tree, tree, gimple_seq *);
 struct gimplify_omp_ctx;
 extern void lhd_omp_firstprivatize_type_sizes (struct gimplify_omp_ctx *,
 					       tree);
@@ -272,6 +276,9 @@  extern tree lhd_unit_size_without_reusable_padding (tree);
 #define LANG_HOOKS_OMP_CLAUSE_LINEAR_CTOR NULL
 #define LANG_HOOKS_OMP_CLAUSE_DTOR hook_tree_tree_tree_null
 #define LANG_HOOKS_OMP_FINISH_CLAUSE lhd_omp_finish_clause
+#define LANG_HOOKS_OMP_DEEP_MAPPING_P lhd_omp_deep_mapping_p
+#define LANG_HOOKS_OMP_DEEP_MAPPING_CNT lhd_omp_deep_mapping_cnt
+#define LANG_HOOKS_OMP_DEEP_MAPPING lhd_omp_deep_mapping
 #define LANG_HOOKS_OMP_ALLOCATABLE_P hook_bool_tree_false
 #define LANG_HOOKS_OMP_SCALAR_P lhd_omp_scalar_p
 #define LANG_HOOKS_OMP_SCALAR_TARGET_P hook_bool_tree_false
@@ -306,6 +313,9 @@  extern tree lhd_unit_size_without_reusable_padding (tree);
   LANG_HOOKS_OMP_CLAUSE_LINEAR_CTOR, \
   LANG_HOOKS_OMP_CLAUSE_DTOR, \
   LANG_HOOKS_OMP_FINISH_CLAUSE, \
+  LANG_HOOKS_OMP_DEEP_MAPPING_P, \
+  LANG_HOOKS_OMP_DEEP_MAPPING_CNT, \
+  LANG_HOOKS_OMP_DEEP_MAPPING, \
   LANG_HOOKS_OMP_ALLOCATABLE_P, \
   LANG_HOOKS_OMP_SCALAR_P, \
   LANG_HOOKS_OMP_SCALAR_TARGET_P, \
diff --git a/gcc/langhooks.cc b/gcc/langhooks.cc
index 8614f44f187..ac844204288 100644
--- a/gcc/langhooks.cc
+++ b/gcc/langhooks.cc
@@ -642,6 +642,30 @@  lhd_omp_array_size (tree, gimple_seq *)
   return NULL_TREE;
 }
 
+/* Returns true when additional mappings for a decl are needed.  */
+
+bool
+lhd_omp_deep_mapping_p (const gimple *, tree)
+{
+  return false;
+}
+
+/* Returns number of additional mappings for a decl.  */
+
+tree
+lhd_omp_deep_mapping_cnt (const gimple *, tree, gimple_seq *)
+{
+  return NULL_TREE;
+}
+
+/* Do the additional mappings.  */
+
+void
+lhd_omp_deep_mapping (const gimple *, tree, unsigned HOST_WIDE_INT, tree, tree,
+		      tree, tree, tree, gimple_seq *)
+{
+}
+
 /* Return true if DECL is a scalar variable (for the purpose of
    implicit firstprivatization & mapping). Only if alloc_ptr_ok
    are allocatables and pointers accepted. */
diff --git a/gcc/langhooks.h b/gcc/langhooks.h
index 5a4dfb6ef62..b4bd0771976 100644
--- a/gcc/langhooks.h
+++ b/gcc/langhooks.h
@@ -313,6 +313,21 @@  struct lang_hooks_for_decls
   /* Do language specific checking on an implicitly determined clause.  */
   void (*omp_finish_clause) (tree clause, gimple_seq *pre_p, bool);
 
+  /* Additional language-specific mappings for a decl; returns true
+     if those may occur.  */
+  bool (*omp_deep_mapping_p) (const gimple *ctx_stmt, tree clause);
+
+  /* Additional language-specific mappings for a decl; returns the
+     number of additional mappings needed.  */
+  tree (*omp_deep_mapping_cnt) (const gimple *ctx_stmt, tree clause,
+				gimple_seq *seq);
+
+  /* Do the actual additional language-specific mappings for a decl. */
+  void (*omp_deep_mapping) (const gimple *stmt, tree clause,
+			    unsigned HOST_WIDE_INT tkind,
+			    tree data, tree sizes, tree kinds,
+			    tree offset_data, tree offset, gimple_seq *seq);
+
   /* Return true if DECL is an allocatable variable (for the purpose of
      implicit mapping).  */
   bool (*omp_allocatable_p) (tree decl);
diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc
index 24287826444..9ff9553c3ea 100644
--- a/gcc/omp-expand.cc
+++ b/gcc/omp-expand.cc
@@ -9928,8 +9928,9 @@  expand_omp_target (struct omp_region *region)
 		  /* We're ignoring the subcode because we're
 		     effectively doing a STRIP_NOPS.  */
 
-		  if (TREE_CODE (arg) == ADDR_EXPR
-		      && TREE_OPERAND (arg, 0) == sender)
+		  if ((TREE_CODE (arg) == ADDR_EXPR
+		       && TREE_OPERAND (arg, 0) == sender)
+		      || arg == sender)
 		    {
 		      tgtcopy_stmt = stmt;
 		      break;
@@ -10428,7 +10429,7 @@  expand_omp_target (struct omp_region *region)
       t3 = t2;
       t4 = t2;
     }
-  else
+  else if (TREE_VEC_LENGTH (t) == 3 || is_gimple_omp_oacc (entry_stmt))
     {
       t1 = TYPE_MAX_VALUE (TYPE_DOMAIN (TREE_TYPE (TREE_VEC_ELT (t, 1))));
       t1 = size_binop (PLUS_EXPR, t1, size_int (1));
@@ -10436,6 +10437,17 @@  expand_omp_target (struct omp_region *region)
       t3 = build_fold_addr_expr (TREE_VEC_ELT (t, 1));
       t4 = build_fold_addr_expr (TREE_VEC_ELT (t, 2));
     }
+  else
+    {
+      t1 = force_gimple_operand_gsi (&gsi, TREE_VEC_ELT (t, 3), true, NULL_TREE,
+				     true, GSI_SAME_STMT);
+      t2 = force_gimple_operand_gsi (&gsi, TREE_VEC_ELT (t, 0), true, NULL_TREE,
+				     true, GSI_SAME_STMT);
+      t3 = force_gimple_operand_gsi (&gsi, TREE_VEC_ELT (t, 1), true, NULL_TREE,
+				     true, GSI_SAME_STMT);
+      t4 = force_gimple_operand_gsi (&gsi, TREE_VEC_ELT (t, 2), true, NULL_TREE,
+				     true, GSI_SAME_STMT);
+    }
 
   gimple *g;
   bool tagging = false;
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index 241f79e34a9..da2051b0279 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -767,7 +767,10 @@  static tree
 build_sender_ref (splay_tree_key key, omp_context *ctx)
 {
   tree field = lookup_sfield (key, ctx);
-  return omp_build_component_ref (ctx->sender_decl, field);
+  tree tmp = ctx->sender_decl;
+  if (POINTER_TYPE_P (TREE_TYPE (tmp)))
+    tmp = build_fold_indirect_ref (tmp);
+  return omp_build_component_ref (tmp, field);
 }
 
 static tree
@@ -1138,7 +1141,9 @@  fixup_child_record_type (omp_context *ctx)
     type = build_qualified_type (type, TYPE_QUAL_CONST);
 
   TREE_TYPE (ctx->receiver_decl)
-    = build_qualified_type (build_reference_type (type), TYPE_QUAL_RESTRICT);
+    = build_qualified_type (flexible_array_type_p (type)
+			    ? build_pointer_type (type)
+			    : build_reference_type (type), TYPE_QUAL_RESTRICT);
 }
 
 /* Instantiate decls as necessary in CTX to satisfy the data sharing
@@ -1149,6 +1154,7 @@  scan_sharing_clauses (tree clauses, omp_context *ctx)
 {
   tree c, decl;
   bool scan_array_reductions = false;
+  bool flex_array_ptr = false;
 
   for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
     if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_ALLOCATE
@@ -1596,6 +1602,8 @@  scan_sharing_clauses (tree clauses, omp_context *ctx)
 		  && !OMP_CLAUSE_MAP_ZERO_BIAS_ARRAY_SECTION (c))
 		break;
 	    }
+	  if (!flex_array_ptr)
+	    flex_array_ptr = lang_hooks.decls.omp_deep_mapping_p (ctx->stmt, c);
 	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
 	      && DECL_P (decl)
 	      && (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH
@@ -2009,6 +2017,18 @@  scan_sharing_clauses (tree clauses, omp_context *ctx)
 		 && OMP_CLAUSE_LINEAR_GIMPLE_SEQ (c))
 	  scan_omp (&OMP_CLAUSE_LINEAR_GIMPLE_SEQ (c), ctx);
     }
+  if (flex_array_ptr)
+    {
+      tree field = build_range_type (size_type_node,
+				     build_int_cstu (size_type_node, 0),
+				     NULL_TREE);
+      field = build_array_type (ptr_type_node, field);
+      field = build_decl (UNKNOWN_LOCATION, FIELD_DECL, NULL_TREE, field);
+      SET_DECL_ALIGN (field, TYPE_ALIGN (ptr_type_node));
+      DECL_CONTEXT (field) = ctx->record_type;
+      DECL_CHAIN (field) = TYPE_FIELDS (ctx->record_type);
+      TYPE_FIELDS (ctx->record_type) = field;
+    }
 }
 
 /* Create a new name for omp child function.  Returns an identifier. */
@@ -12603,6 +12623,11 @@  lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   unsigned int map_cnt = 0;
   tree in_reduction_clauses = NULL_TREE;
 
+  tree deep_map_cnt = NULL_TREE;
+  tree deep_map_data = NULL_TREE;
+  tree deep_map_offset_data = NULL_TREE;
+  tree deep_map_offset = NULL_TREE;
+
   offloaded = is_gimple_omp_offloaded (stmt);
   switch (gimple_omp_target_kind (stmt))
     {
@@ -12681,6 +12706,8 @@  lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   push_gimplify_context ();
   fplist = NULL;
 
+  ilist = NULL;
+  olist = NULL;
   for (c = clauses; c ; c = OMP_CLAUSE_CHAIN (c))
     switch (OMP_CLAUSE_CODE (c))
       {
@@ -12739,6 +12766,16 @@  lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
       case OMP_CLAUSE_FROM:
       oacc_firstprivate:
 	var = OMP_CLAUSE_DECL (c);
+	{
+	  tree extra = lang_hooks.decls.omp_deep_mapping_cnt (stmt, c, &ilist);
+	  if (extra != NULL_TREE && deep_map_cnt != NULL_TREE)
+	    deep_map_cnt = fold_build2_loc (OMP_CLAUSE_LOCATION (c), PLUS_EXPR,
+					    size_type_node, deep_map_cnt,
+					    extra);
+	  else if (extra != NULL_TREE)
+	    deep_map_cnt = extra;
+	}
+
 	if (!DECL_P (var))
 	  {
 	    if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP
@@ -12973,18 +13010,31 @@  lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
       record_vars_into (gimple_bind_vars (tgt_bind), child_fn);
     }
 
-  olist = NULL;
-  ilist = NULL;
   if (ctx->record_type)
     {
+      if (deep_map_cnt && TREE_CODE (deep_map_cnt) == INTEGER_CST)
+	/* map_cnt = map_cnt + tree_to_hwi (deep_map_cnt); */
+	/* deep_map_cnt = NULL_TREE; */
+	gcc_unreachable ();
+      else if (deep_map_cnt)
+	{
+	  gcc_assert (flexible_array_type_p (ctx->record_type));
+	  tree n = create_tmp_var_raw (size_type_node, "nn_map");
+	  gimple_add_tmp_var (n);
+	  gimplify_assign (n, deep_map_cnt, &ilist);
+	  deep_map_cnt = n;
+	}
       ctx->sender_decl
-	= create_tmp_var (ctx->record_type, ".omp_data_arr");
+	= create_tmp_var (deep_map_cnt ? build_pointer_type (ctx->record_type)
+				       : ctx->record_type, ".omp_data_arr");
       DECL_NAMELESS (ctx->sender_decl) = 1;
       TREE_ADDRESSABLE (ctx->sender_decl) = 1;
-      t = make_tree_vec (3);
+      t = make_tree_vec (deep_map_cnt ? 4 : 3);
       TREE_VEC_ELT (t, 0) = ctx->sender_decl;
       TREE_VEC_ELT (t, 1)
-	= create_tmp_var (build_array_type_nelts (size_type_node, map_cnt),
+	= create_tmp_var (deep_map_cnt
+			  ? build_pointer_type (size_type_node)
+			  : build_array_type_nelts (size_type_node, map_cnt),
 			  ".omp_data_sizes");
       DECL_NAMELESS (TREE_VEC_ELT (t, 1)) = 1;
       TREE_ADDRESSABLE (TREE_VEC_ELT (t, 1)) = 1;
@@ -12992,13 +13042,65 @@  lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
       tree tkind_type = short_unsigned_type_node;
       int talign_shift = 8;
       TREE_VEC_ELT (t, 2)
-	= create_tmp_var (build_array_type_nelts (tkind_type, map_cnt),
+	= create_tmp_var (deep_map_cnt
+			  ? build_pointer_type (tkind_type)
+			  : build_array_type_nelts (tkind_type, map_cnt),
 			  ".omp_data_kinds");
       DECL_NAMELESS (TREE_VEC_ELT (t, 2)) = 1;
       TREE_ADDRESSABLE (TREE_VEC_ELT (t, 2)) = 1;
       TREE_STATIC (TREE_VEC_ELT (t, 2)) = 1;
       gimple_omp_target_set_data_arg (stmt, t);
 
+      if (deep_map_cnt)
+	{
+	  tree tmp, size;
+	  size = create_tmp_var (size_type_node, NULL);
+	  DECL_NAMELESS (size) = 1;
+	  gimplify_assign (size,
+			   fold_build2_loc (UNKNOWN_LOCATION, PLUS_EXPR,
+					    size_type_node, deep_map_cnt,
+					    build_int_cst (size_type_node,
+							   map_cnt)), &ilist);
+	  TREE_VEC_ELT (t, 3) = size;
+
+	  tree call = builtin_decl_explicit (BUILT_IN_MALLOC);
+	  size = fold_build2_loc (UNKNOWN_LOCATION, MULT_EXPR,
+				  size_type_node, deep_map_cnt,
+				  TYPE_SIZE_UNIT (ptr_type_node));
+	  size = fold_build2_loc (UNKNOWN_LOCATION, PLUS_EXPR,
+				  size_type_node, size,
+				  TYPE_SIZE_UNIT (ctx->record_type));
+	  tmp = build_call_expr_loc (input_location, call, 1, size);
+	  gimplify_assign (ctx->sender_decl, tmp, &ilist);
+
+	  size = fold_build2_loc (UNKNOWN_LOCATION, MULT_EXPR,
+				  size_type_node, TREE_VEC_ELT (t, 3),
+				  TYPE_SIZE_UNIT (size_type_node));
+	  tmp = build_call_expr_loc (input_location, call, 1, size);
+	  gimplify_assign (TREE_VEC_ELT (t, 1), tmp, &ilist);
+
+	  size = fold_build2_loc (UNKNOWN_LOCATION, MULT_EXPR,
+				  size_type_node, TREE_VEC_ELT (t, 3),
+				  TYPE_SIZE_UNIT (tkind_type));
+	  tmp = build_call_expr_loc (input_location, call, 1, size);
+	  gimplify_assign (TREE_VEC_ELT (t, 2), tmp, &ilist);
+	  tree field = TYPE_FIELDS (TREE_TYPE (TREE_TYPE (ctx->sender_decl)));
+	  for ( ; DECL_CHAIN (field) != NULL_TREE; field = DECL_CHAIN (field))
+	    ;
+	  gcc_assert (TREE_CODE (TREE_TYPE (field)));
+	  tmp = build_fold_indirect_ref (ctx->sender_decl);
+	  deep_map_data = omp_build_component_ref (tmp, field);
+	  deep_map_offset_data = create_tmp_var_raw (size_type_node,
+						     "map_offset_data");
+	  deep_map_offset = create_tmp_var_raw (size_type_node, "map_offset");
+	  gimple_add_tmp_var (deep_map_offset_data);
+	  gimple_add_tmp_var (deep_map_offset);
+	  gimplify_assign (deep_map_offset_data, build_int_cst (size_type_node,
+								0), &ilist);
+	  gimplify_assign (deep_map_offset, build_int_cst (size_type_node,
+							   map_cnt), &ilist);
+	}
+
       vec<constructor_elt, va_gc> *vsize;
       vec<constructor_elt, va_gc> *vkind;
       vec_alloc (vsize, map_cnt);
@@ -13025,6 +13127,24 @@  lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 		    || (OMP_CLAUSE_MAP_KIND (c)
 			== GOMP_MAP_FIRSTPRIVATE_REFERENCE)))
 	      break;
+	    if (deep_map_cnt)
+	      {
+		unsigned HOST_WIDE_INT tkind2;
+		switch (OMP_CLAUSE_CODE (c))
+		  {
+		  case OMP_CLAUSE_MAP: tkind2 = OMP_CLAUSE_MAP_KIND (c); break;
+		  case OMP_CLAUSE_FIRSTPRIVATE: tkind2 = GOMP_MAP_TO; break;
+		  case OMP_CLAUSE_TO: tkind2 = GOMP_MAP_TO; break;
+		  case OMP_CLAUSE_FROM: tkind2 = GOMP_MAP_FROM; break;
+		  default: gcc_unreachable ();
+		  }
+		lang_hooks.decls.omp_deep_mapping (stmt, c, tkind2,
+						   deep_map_data,
+						   TREE_VEC_ELT (t, 1),
+						   TREE_VEC_ELT (t, 2),
+						   deep_map_offset_data,
+						   deep_map_offset, &ilist);
+	      }
 	    if (!DECL_P (ovar))
 	      {
 		if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
@@ -13586,23 +13706,65 @@  lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
       gcc_assert (map_idx == map_cnt);
 
-      DECL_INITIAL (TREE_VEC_ELT (t, 1))
-	= build_constructor (TREE_TYPE (TREE_VEC_ELT (t, 1)), vsize);
-      DECL_INITIAL (TREE_VEC_ELT (t, 2))
-	= build_constructor (TREE_TYPE (TREE_VEC_ELT (t, 2)), vkind);
+      if (!deep_map_cnt)
+	{
+	  DECL_INITIAL (TREE_VEC_ELT (t, 1))
+	    = build_constructor (TREE_TYPE (TREE_VEC_ELT (t, 1)), vsize);
+	  DECL_INITIAL (TREE_VEC_ELT (t, 2))
+	    = build_constructor (TREE_TYPE (TREE_VEC_ELT (t, 2)), vkind);
+	}
       for (int i = 1; i <= 2; i++)
-	if (!TREE_STATIC (TREE_VEC_ELT (t, i)))
+	if (deep_map_cnt || !TREE_STATIC (TREE_VEC_ELT (t, i)))
 	  {
+	    tree tmp = TREE_VEC_ELT (t, i);
+	    if (deep_map_cnt)
+	      {
+		const char *prefix = (i == 1 ? ".omp_data_sizes0"
+					     : ".omp_data_kinds0");
+		tree type = (i == 1) ? size_type_node : tkind_type;
+		type = build_array_type_nelts (type, map_cnt);
+		tree var = create_tmp_var (type, prefix);
+		DECL_NAMELESS (var) = 1;
+		TREE_ADDRESSABLE (var) = 1;
+		TREE_STATIC (var) = TREE_STATIC (tmp);
+		DECL_INITIAL (var) = build_constructor (type, i == 1
+							      ? vsize : vkind);
+		tmp = var;
+		TREE_STATIC (TREE_VEC_ELT (t, i)) = 0;
+	      }
+
 	    gimple_seq initlist = NULL;
-	    force_gimple_operand (build1 (DECL_EXPR, void_type_node,
-					  TREE_VEC_ELT (t, i)),
+	    force_gimple_operand (build1 (DECL_EXPR, void_type_node, tmp),
 				  &initlist, true, NULL_TREE);
 	    gimple_seq_add_seq (&ilist, initlist);
 
-	    tree clobber = build_clobber (TREE_TYPE (TREE_VEC_ELT (t, i)));
-	    gimple_seq_add_stmt (&olist,
-				 gimple_build_assign (TREE_VEC_ELT (t, i),
-						      clobber));
+	    if (deep_map_cnt)
+	      {
+		tree tmp2;
+		tree call = builtin_decl_explicit (BUILT_IN_MEMCPY);
+		tmp2 = TYPE_SIZE_UNIT (TREE_TYPE (tmp));
+		call = build_call_expr_loc (input_location, call, 3,
+					    TREE_VEC_ELT (t, i),
+					    build_fold_addr_expr (tmp), tmp2);
+		gimplify_and_add (call, &ilist);
+	      }
+
+	    if (!TREE_STATIC (tmp))
+	      {
+		tree clobber = build_clobber (TREE_TYPE (tmp));
+		gimple_seq_add_stmt (&olist,
+				     gimple_build_assign (tmp, clobber));
+	      }
+	    if (deep_map_cnt)
+	      {
+		tmp = TREE_VEC_ELT (t, i);
+		tree call = builtin_decl_explicit (BUILT_IN_FREE);
+		call = build_call_expr_loc (input_location, call, 1, tmp);
+		gimplify_and_add (call, &olist);
+		tree clobber = build_clobber (TREE_TYPE (tmp));
+		gimple_seq_add_stmt (&olist,
+				     gimple_build_assign (tmp, clobber));
+	      }
 	  }
 	else if (omp_maybe_offloaded_ctx (ctx->outer))
 	  {
@@ -13622,7 +13784,18 @@  lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	      }
 	  }
 
-      tree clobber = build_clobber (ctx->record_type);
+      if (deep_map_cnt)
+	{
+	  tree call = builtin_decl_explicit (BUILT_IN_FREE);
+	  call = build_call_expr_loc (input_location, call, 1,
+				      TREE_VEC_ELT (t, 0));
+	  gimplify_and_add (call, &olist);
+
+	  gimplify_expr (&TREE_VEC_ELT (t, 1), &ilist, NULL, is_gimple_val,
+			 fb_rvalue);
+	}
+
+      tree clobber = build_clobber (TREE_TYPE (ctx->sender_decl));
       gimple_seq_add_stmt (&olist, gimple_build_assign (ctx->sender_decl,
 							clobber));
     }
@@ -13635,11 +13808,16 @@  lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   if (offloaded
       && ctx->record_type)
     {
-      t = build_fold_addr_expr_loc (loc, ctx->sender_decl);
+      t = ctx->sender_decl;
+      if (!deep_map_cnt)
+	t = build_fold_addr_expr_loc (loc, t);
       /* fixup_child_record_type might have changed receiver_decl's type.  */
       t = fold_convert_loc (loc, TREE_TYPE (ctx->receiver_decl), t);
-      gimple_seq_add_stmt (&new_body,
-	  		   gimple_build_assign (ctx->receiver_decl, t));
+      if (!AGGREGATE_TYPE_P (TREE_TYPE (ctx->sender_decl)))
+	gimplify_assign (ctx->receiver_decl, t, &new_body);
+      else
+	gimple_seq_add_stmt (&new_body,
+			     gimple_build_assign (ctx->receiver_decl, t));
     }
   gimple_seq_add_seq (&new_body, fplist);