diff mbox series

[v2] OpenMP: Constructors and destructors for "declare target" static aggregates

Message ID fa3028f4-271a-4401-9feb-d11f57e506ae@baylibre.com
State New
Headers show
Series [v2] OpenMP: Constructors and destructors for "declare target" static aggregates | expand

Commit Message

Tobias Burnus July 30, 2024, 8:51 p.m. UTC
Hello world, hi Jakub,

I would like to PING the following patch.
It's essentially Julian's patch, except:

* It is rediffed (albeit it mostly applied cleanly).
* I replaced the omp_is_initial_device call by an
   internal function (IFN_) such that it can be evaluated
   at compile time. With -O1, this also optimizes the host
   function away as it should :-)
* Regarding nvptx: constructors are supported since GCC 15.
   Thus, the three testcases now work under nvptx as well.
   (Two fail on nvptx when compiled with neither optimization nor
    -foffload-options=nvptx-none=-malias as the constructor
    uses aliases, which aren't supported, yet.)

Comments, remarks, suggestions?
OK for mainline?

Tobias

On May 12, 2023, Julian Brown wrote:> This patch adds support for 
running constructors and destructors for
> static (file-scope) aggregates for C++ objects which are marked with
> "declare target" directives on OpenMP offload targets.
> 
> At present, space is allocated on the target for such aggregates, but
> nothing ever constructs them properly, so they end up zero-initialised.
> 
> The approach taken is to generate a set of constructors to run on the
> target: this currently works for AMD GCN, but fails on NVPTX due
> to lack of constructor/destructor support there so far on mainline.
> (See the new test static-aggr-constructor-destructor-3.C for a reason
> why running constructors on the target is preferable to e.g. constructing
> on the host and then copying the resulting object to the target.)
> 
> This patch was previously posted for the og12 branch here:
> 
>    https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614710.html
>    https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615013.html
>    https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615144.html
> 
> though needed a fair amount of rework for mainline due to Nathan's
> (earlier!) patch:
> 
>    https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596402.html
> 
> Tested with offloading to AMD GCN and bootstrapped. OK for mainline?
> 
> Thanks,
> 
> Julian

Comments

Jakub Jelinek Aug. 1, 2024, 11:38 a.m. UTC | #1
On Tue, Jul 30, 2024 at 10:51:56PM +0200, Tobias Burnus wrote:
>  static tree
> -start_partial_init_fini_fn (bool initp, unsigned priority, unsigned count)
> +start_partial_init_fini_fn (bool initp, unsigned priority, unsigned count,
> +			    bool omp_target)
>  {
> -  char id[sizeof (SSDF_IDENTIFIER) + 1 /* '\0' */ + 32];
> +  tree name;
>  
> -  /* Create the identifier for this function.  It will be of the form
> -     SSDF_IDENTIFIER_<number>.  */
> -  sprintf (id, "%s_%u", SSDF_IDENTIFIER, count);
> +  if (omp_target)
> +    {
> +      char id[sizeof (OMP_SSDF_IDENTIFIER) + 1 /* \0 */ + 32];
> +
> +      /* Create the identifier for this function.  It will be of the form
> +	 OMP_SSDF_IDENTIFIER_<number>.  */
> +      sprintf (id, "%s_%u", OMP_SSDF_IDENTIFIER, count);
> +      name = get_identifier (id);
> +    }
> +  else
> +    {
> +      char id[sizeof (SSDF_IDENTIFIER) + 1 /* '\0' */ + 32];
> +      /* Create the identifier for this function.  It will be of the form
> +	 SSDF_IDENTIFIER_<number>.  */
> +      sprintf (id, "%s_%u", SSDF_IDENTIFIER, count);
> +      name = get_identifier (id);

I'd just use a single buffer here,
  char id[MAX (sizeof (SSDF_IDENTIFIER), sizeof (OMP_SSDF_IDENTIFIER))
	  + 1 /* \0 */ + 32];
and then
  sprintf (id, "%s_%u", omp_target ? OMP_SSDF_IDENTIFIER : SSDF_IDENTIFIER,
	   count);
and do get_identifier (id) as before, just tweak the comment.

> +  tree nonhost_if_stmt = NULL_TREE;
> +  if (omp_target)
> +    {
> +      nonhost_if_stmt = begin_if_stmt ();
> +      /* We add an "omp declare target nohost" attribute, but (for
> +	 now) we still get a copy of the constructor/destructor on
> +	 the host.  Make sure it does nothing unless we're on the
> +	 target device.  */
> +      tree fn = build_call_expr_internal_loc (input_location,
> +					      IFN_GOMP_IS_INITIAL_DEVICE,
> +					      integer_type_node, 0);

Given that the Xeon PHI offloading is gone and fork offloading doesn't seem
to be worked on, my preference would be
__builtin_omp_is_initial_device () and fold that to 0/1 after IPA, because
that will actually help user code too.  This
wasn't done because of Xeon PHI, because the same compiler could be used
both for host and offloading compilation (say libraries linked in weren't
compiled twice) but now that isn't the case.
The fuzzy thing is whether we can fold the builtin at compile time in the
NVPTX/GCN targets if it isn't ACCEL_COMPILER, guess it would be safer not
to.

And of course, it would be much better to figure out real nohost fix,
because if we need to register a constructor which will just do nothing, it
still wastes runtime.

> +      enum internal_fn ifn = CALL_EXPR_IFN (*expr_p);
> +      if (ifn == IFN_GOMP_IS_INITIAL_DEVICE)
> +	{
> +	  /* Required to expand it in the pass_omp_device_lower pass.  */
> +	  cgraph_node::get (cfun->decl)->calls_declare_variant_alt = 1;
> +	  return GS_ALL_DONE;
> +	}

See above.  For the builtin, I'd actually just fold it in builtins.cc,
guaded with symtab->global_info_ready or something like that.
But if you want to fold it in omp_device_lower pass, that is fine too.

	Jakub
diff mbox series

Patch

OpenMP: Constructors and destructors for "declare target" static aggregates

This patch adds support for running constructors and destructors for
static (file-scope) aggregates for C++ objects which are marked with
"declare target" directives on OpenMP offload targets.

At present, space is allocated on the target for such aggregates, but
nothing ever constructs them properly, so they end up zero-initialised.

(See the new test static-aggr-constructor-destructor-3.C for a reason
why running constructors on the target is preferable to e.g. constructing
on the host and then copying the resulting object to the target.)

2024-07-30  Julian Brown  <julian@codesourcery.com>
	    Tobias Burnus  <tobias@baylibre.com>

gcc/cp/
	* decl2.cc (tree-inline.h): Include.
	(static_init_fini_fns): Bump to four entries. Update comment.
	(start_objects, start_partial_init_fini_fn): Add 'omp_target'
	parameter. Support "declare target" decls. Update forward declaration.
	(emit_partial_init_fini_fn): Add 'host_fn' parameter. Return tree for
	the created function. Support "declare target".
	(OMP_SSDF_IDENTIFIER): New macro.
	(partition_vars_for_init_fini): Support partitioning "declare target"
	variables also.
	(generate_ctor_or_dtor_function): Add 'omp_target' parameter. Support
	"declare target" decls.
	(c_parse_final_cleanups): Support constructors/destructors on OpenMP
	offload targets.

gcc/
	* gimplify.cc (gimplify_call_expr): Set calls_declare_variant_alt
	for IFN_GOMP_IS_INITIAL_DEVICE.
	* internal-fn.cc (expand_GOMP_IS_INITIAL_DEVICE): New.
	* internal-fn.def (IFN_GOMP_IS_INITIAL_DEVICE): Add.
	* omp-offload.cc (execute_omp_device_lower): Expand it.
	* tree.cc (get_file_function_name): Support names for on-target
	constructor/destructor functions.

libgomp/
	* testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C: New
	test.
	* testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C: New
	test.
	* testsuite/libgomp.c++/static-aggr-constructor-destructor-3.C: New
	test.

Co-authored-by: Tobias Burnus <tobias@baylibre.com>

 gcc/cp/decl2.cc                                    | 240 +++++++++++++++++----
 gcc/gimplify.cc                                    |   8 +-
 gcc/internal-fn.cc                                 |   8 +
 gcc/internal-fn.def                                |   1 +
 gcc/omp-offload.cc                                 |   7 +
 gcc/tree.cc                                        |   6 +-
 .../static-aggr-constructor-destructor-1.C         |  28 +++
 .../static-aggr-constructor-destructor-2.C         |  31 +++
 .../static-aggr-constructor-destructor-3.C         |  36 ++++
 9 files changed, 324 insertions(+), 41 deletions(-)

diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index 6d674684931..21ac65452e6 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -50,6 +50,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "asan.h"
 #include "optabs-query.h"
 #include "omp-general.h"
+#include "tree-inline.h"
 #include "escaped_string.h"
 
 /* Id for dumping the raw trees.  */
@@ -57,14 +58,15 @@  int raw_dump_id;
  
 extern cpp_reader *parse_in;
 
-static tree start_objects (bool, unsigned, bool);
+static tree start_objects (bool, unsigned, bool, bool);
 static tree finish_objects (bool, unsigned, tree, bool = true);
-static tree start_partial_init_fini_fn (bool, unsigned, unsigned);
+static tree start_partial_init_fini_fn (bool, unsigned, unsigned, bool);
 static void finish_partial_init_fini_fn (tree);
-static void emit_partial_init_fini_fn (bool, unsigned, tree,
-				       unsigned, location_t);
+static tree emit_partial_init_fini_fn (bool, unsigned, tree,
+				       unsigned, location_t, tree);
 static void one_static_initialization_or_destruction (bool, tree, tree);
-static void generate_ctor_or_dtor_function (bool, unsigned, tree, location_t);
+static void generate_ctor_or_dtor_function (bool, unsigned, tree, location_t,
+					    bool);
 static tree prune_vars_needing_no_initialization (tree *);
 static void write_out_vars (tree);
 static void import_export_class (tree);
@@ -166,9 +168,10 @@  struct priority_map_traits
 typedef hash_map<unsigned/*Priority*/, tree/*List*/,
 		 priority_map_traits> priority_map_t;
 
-/* A pair of such hash tables, indexed by initp -- one for fini and
-   one for init.  The fini table is only ever used when !cxa_atexit.  */
-static GTY(()) priority_map_t *static_init_fini_fns[2];
+/* Two pairs of such hash tables, for the host and an OpenMP offload device.
+   Each pair has one priority map for fini and one for init.  The fini tables
+   are only ever used when !cxa_atexit.  */
+static GTY(()) priority_map_t *static_init_fini_fns[4];
 
 /* Nonzero if we're done parsing and into end-of-file activities.
    2 if all templates have been instantiated.
@@ -4048,7 +4051,8 @@  generate_tls_wrapper (tree fn)
 /* Start a global constructor or destructor function.  */
 
 static tree
-start_objects (bool initp, unsigned priority, bool has_body)
+start_objects (bool initp, unsigned priority, bool has_body,
+	       bool omp_target = false)
 {
   bool default_init = initp && priority == DEFAULT_INIT_PRIORITY;
   bool is_module_init = default_init && module_global_init_needed ();
@@ -4062,7 +4066,15 @@  start_objects (bool initp, unsigned priority, bool has_body)
 
       /* We use `I' to indicate initialization and `D' to indicate
 	 destruction.  */
-      unsigned len = sprintf (type, "sub_%c", initp ? 'I' : 'D');
+      unsigned len;
+      if (omp_target)
+	/* Use "off_" signifying "offload" here.  The name must be distinct
+	   from the non-offload case.  The format of the name is scanned in
+	   tree.cc/get_file_function_name, so stick to the same length for
+	   both name variants.  */
+	len = sprintf (type, "off_%c", initp ? 'I' : 'D');
+      else
+	len = sprintf (type, "sub_%c", initp ? 'I' : 'D');
       if (priority != DEFAULT_INIT_PRIORITY)
 	{
 	  char joiner = '_';
@@ -4077,6 +4089,17 @@  start_objects (bool initp, unsigned priority, bool has_body)
 
   tree fntype =	build_function_type (void_type_node, void_list_node);
   tree fndecl = build_lang_decl (FUNCTION_DECL, name, fntype);
+
+  if (omp_target)
+    {
+      DECL_ATTRIBUTES (fndecl)
+	= tree_cons (get_identifier ("omp declare target"), NULL_TREE,
+		     DECL_ATTRIBUTES (fndecl));
+      DECL_ATTRIBUTES (fndecl)
+	= tree_cons (get_identifier ("omp declare target nohost"), NULL_TREE,
+		     DECL_ATTRIBUTES (fndecl));
+    }
+
   DECL_CONTEXT (fndecl) = FROB_CONTEXT (global_namespace);
   if (is_module_init)
     {
@@ -4161,34 +4184,63 @@  finish_objects (bool initp, unsigned priority, tree body, bool startp)
 /* The name of the function we create to handle initializations and
    destructions for objects with static storage duration.  */
 #define SSDF_IDENTIFIER "__static_initialization_and_destruction"
+#define OMP_SSDF_IDENTIFIER "__omp_target_static_init_and_destruction"
 
 /* Begins the generation of the function that will handle all
    initialization or destruction of objects with static storage
    duration at PRIORITY.
 
-   It is assumed that this function will only be called once.  */
+   It is assumed that this function will be called once for the host, and once
+   for an OpenMP offload target.  */
 
 static tree
-start_partial_init_fini_fn (bool initp, unsigned priority, unsigned count)
+start_partial_init_fini_fn (bool initp, unsigned priority, unsigned count,
+			    bool omp_target)
 {
-  char id[sizeof (SSDF_IDENTIFIER) + 1 /* '\0' */ + 32];
+  tree name;
 
-  /* Create the identifier for this function.  It will be of the form
-     SSDF_IDENTIFIER_<number>.  */
-  sprintf (id, "%s_%u", SSDF_IDENTIFIER, count);
+  if (omp_target)
+    {
+      char id[sizeof (OMP_SSDF_IDENTIFIER) + 1 /* \0 */ + 32];
+
+      /* Create the identifier for this function.  It will be of the form
+	 OMP_SSDF_IDENTIFIER_<number>.  */
+      sprintf (id, "%s_%u", OMP_SSDF_IDENTIFIER, count);
+      name = get_identifier (id);
+    }
+  else
+    {
+      char id[sizeof (SSDF_IDENTIFIER) + 1 /* '\0' */ + 32];
+      /* Create the identifier for this function.  It will be of the form
+	 SSDF_IDENTIFIER_<number>.  */
+      sprintf (id, "%s_%u", SSDF_IDENTIFIER, count);
+      name = get_identifier (id);
+    }
 
   tree type = build_function_type (void_type_node, void_list_node);
 
   /* Create the FUNCTION_DECL itself.  */
-  tree fn = build_lang_decl (FUNCTION_DECL, get_identifier (id), type);
+  tree fn = build_lang_decl (FUNCTION_DECL, name, type);
   TREE_PUBLIC (fn) = 0;
   DECL_ARTIFICIAL (fn) = 1;
 
+  if (omp_target)
+    {
+      DECL_ATTRIBUTES (fn)
+	= tree_cons (get_identifier ("omp declare target"), NULL_TREE,
+		     DECL_ATTRIBUTES (fn));
+      DECL_ATTRIBUTES (fn)
+	= tree_cons (get_identifier ("omp declare target nohost"), NULL_TREE,
+		     DECL_ATTRIBUTES (fn));
+    }
+
+  int idx = initp + 2 * omp_target;
+
   /* Put this function in the list of functions to be called from the
      static constructors and destructors.  */
-  if (!static_init_fini_fns[initp])
-    static_init_fini_fns[initp] = priority_map_t::create_ggc ();
-  auto &slot = static_init_fini_fns[initp]->get_or_insert (priority);
+  if (!static_init_fini_fns[idx])
+    static_init_fini_fns[idx] = priority_map_t::create_ggc ();
+  auto &slot = static_init_fini_fns[idx]->get_or_insert (priority);
   slot = tree_cons (fn, NULL_TREE, slot);
 
   /* Put the function in the global scope.  */
@@ -4384,22 +4436,75 @@  one_static_initialization_or_destruction (bool initp, tree decl, tree init)
    a TREE_LIST of VAR_DECL with static storage duration.
    Whether initialization or destruction is performed is specified by INITP.  */
 
-static void
+static tree
 emit_partial_init_fini_fn (bool initp, unsigned priority, tree vars,
-			   unsigned counter, location_t locus)
+			   unsigned counter, location_t locus, tree host_fn)
 {
   input_location = locus;
-  tree body = start_partial_init_fini_fn (initp, priority, counter);
+  bool omp_target = (host_fn != NULL_TREE);
+  tree body = start_partial_init_fini_fn (initp, priority, counter, omp_target);
+  tree fndecl = current_function_decl;
+
+  tree nonhost_if_stmt = NULL_TREE;
+  if (omp_target)
+    {
+      nonhost_if_stmt = begin_if_stmt ();
+      /* We add an "omp declare target nohost" attribute, but (for
+	 now) we still get a copy of the constructor/destructor on
+	 the host.  Make sure it does nothing unless we're on the
+	 target device.  */
+      tree fn = build_call_expr_internal_loc (input_location,
+					      IFN_GOMP_IS_INITIAL_DEVICE,
+					      integer_type_node, 0);
+      tree target_dev_p
+	= cp_build_binary_op (input_location, NE_EXPR, fn, integer_one_node,
+			      tf_warning_or_error);
+      finish_if_stmt_cond (target_dev_p, nonhost_if_stmt);
+    }
 
   for (tree node = vars; node; node = TREE_CHAIN (node))
-    /* Do one initialization or destruction.  */
-    one_static_initialization_or_destruction (initp, TREE_VALUE (node),
-					      TREE_PURPOSE (node));
+    {
+      tree decl = TREE_VALUE (node);
+      tree init = TREE_PURPOSE (node);
+	/* We will emit 'init' twice, and it is modified in-place during
+	   gimplification.  Make a copy here.  */
+      if (omp_target)
+	{
+	  /* We've already emitted INIT in the host version of the ctor/dtor
+	     function.  We need to deep-copy it (including new versions of
+	     local variables introduced, etc.) for use in the target
+	     ctor/dtor function.  */
+	  copy_body_data id;
+	  hash_map<tree, tree> decl_map;
+	  memset (&id, 0, sizeof (id));
+	  id.src_fn = host_fn;
+	  id.dst_fn = current_function_decl;
+	  id.src_cfun = DECL_STRUCT_FUNCTION (id.src_fn);
+	  id.decl_map = &decl_map;
+	  id.copy_decl = copy_decl_no_change;
+	  id.transform_call_graph_edges = CB_CGE_DUPLICATE;
+	  id.transform_new_cfg = true;
+	  id.transform_return_to_modify = false;
+	  id.eh_lp_nr = 0;
+	  walk_tree (&init, copy_tree_body_r, &id, NULL);
+	}
+      /* Do one initialization or destruction.  */
+      one_static_initialization_or_destruction (initp, decl, init);
+    }
+
+  if (omp_target)
+    {
+      /* Finish up nonhost if-stmt body.  */
+      finish_then_clause (nonhost_if_stmt);
+      finish_if_stmt (nonhost_if_stmt);
+    }
 
   /* Finish up the static storage duration function for this
      round.  */
   input_location = locus;
   finish_partial_init_fini_fn (body);
+
+  return fndecl;
 }
 
 /* VARS is a list of variables with static storage duration which may
@@ -4462,7 +4567,7 @@  prune_vars_needing_no_initialization (tree *vars)
    This reverses the variable ordering.  */
 
 void
-partition_vars_for_init_fini (tree var_list, priority_map_t *(&parts)[2])
+partition_vars_for_init_fini (tree var_list, priority_map_t *(&parts)[4])
 {
   for (auto node = var_list; node; node = TREE_CHAIN (node))
     {
@@ -4488,6 +4593,30 @@  partition_vars_for_init_fini (tree var_list, priority_map_t *(&parts)[2])
 	  auto &slot = parts[false]->get_or_insert (priority);
 	  slot = tree_cons (NULL_TREE, decl, slot);
 	}
+
+      if (flag_openmp
+	   && lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
+	{
+	  priority_map_t **omp_parts = parts + 2;
+
+	  if (init || (flag_use_cxa_atexit && has_cleanup))
+	    {
+	      // Add to initialization list.
+	      if (!omp_parts[true])
+		omp_parts[true] = priority_map_t::create_ggc ();
+	      auto &slot = omp_parts[true]->get_or_insert (priority);
+	      slot = tree_cons (init, decl, slot);
+	    }
+
+	  if (!flag_use_cxa_atexit && has_cleanup)
+	    {
+	      // Add to finalization list.
+	      if (!omp_parts[false])
+		omp_parts[false] = priority_map_t::create_ggc ();
+	      auto &slot = omp_parts[false]->get_or_insert (priority);
+	      slot = tree_cons (NULL_TREE, decl, slot);
+	    }
+	}
     }
 }
 
@@ -4515,10 +4644,10 @@  write_out_vars (tree vars)
 
 static void
 generate_ctor_or_dtor_function (bool initp, unsigned priority,
-				tree fns, location_t locus)
+				tree fns, location_t locus, bool omp_target)
 {
   input_location = locus;
-  tree body = start_objects (initp, priority, bool (fns));
+  tree body = start_objects (initp, priority, bool (fns), omp_target);
 
   if (fns)
     {
@@ -5190,7 +5319,7 @@  c_parse_final_cleanups (void)
   auto_vec<tree> consteval_vtables;
 
   int retries = 0;
-  unsigned ssdf_count = 0;
+  unsigned ssdf_count = 0, omp_ssdf_count = 0;
   for (bool reconsider = true; reconsider; retries++)
     {
       reconsider = false;
@@ -5253,8 +5382,9 @@  c_parse_final_cleanups (void)
 	  write_out_vars (vars);
 
 	  function_depth++; // Disable GC
-	  priority_map_t *parts[2] = {nullptr, nullptr};
+	  priority_map_t *parts[4] = {nullptr, nullptr, nullptr, nullptr};
 	  partition_vars_for_init_fini (vars, parts);
+	  tree host_init_fini[2] = { NULL_TREE, NULL_TREE };
 
 	  for (unsigned initp = 2; initp--;)
 	    if (parts[initp])
@@ -5265,10 +5395,32 @@  c_parse_final_cleanups (void)
 		    // Partitioning kept the vars in reverse order.
 		    // We only want that for dtors.
 		    list = nreverse (list);
-		  emit_partial_init_fini_fn (initp, iter.first, list,
-					     ssdf_count++,
-					     locus_at_end_of_parsing);
+		  host_init_fini[initp]
+		    = emit_partial_init_fini_fn (initp, iter.first, list,
+						 ssdf_count++,
+						 locus_at_end_of_parsing,
+						 NULL_TREE);
 		}
+
+	  if (flag_openmp)
+	    {
+	      priority_map_t **omp_parts = parts + 2;
+	      for (unsigned initp = 2; initp--;)
+		if (omp_parts[initp])
+		  for (auto iter : *omp_parts[initp])
+		    {
+		      auto list = iter.second;
+		      if (initp)
+			// Partitioning kept the vars in reverse order.
+			// We only want that for dtors.
+			list = nreverse (list);
+		      emit_partial_init_fini_fn (initp, iter.first, list,
+						 omp_ssdf_count++,
+						 locus_at_end_of_parsing,
+						 host_init_fini[initp]);
+		  }
+	    }
+
 	  function_depth--; // Re-enable GC
 
 	  /* All those initializations and finalizations might cause
@@ -5439,6 +5591,10 @@  c_parse_final_cleanups (void)
     for (auto iter : *static_init_fini_fns[true])
       iter.second = nreverse (iter.second);
 
+  if (flag_openmp && static_init_fini_fns[2 + true])
+    for (auto iter : *static_init_fini_fns[2 + true])
+      iter.second = nreverse (iter.second);
+
   /* Now we've instantiated all templates.  Now we can escalate the functions
      we squirreled away earlier.  */
   process_and_check_pending_immediate_escalating_fns ();
@@ -5457,7 +5613,7 @@  c_parse_final_cleanups (void)
     {
       input_location = locus_at_end_of_parsing;
       tree body = start_partial_init_fini_fn (true, DEFAULT_INIT_PRIORITY,
-					      ssdf_count++);
+					      ssdf_count++, false);
       /* For Objective-C++, we may need to initialize metadata found
 	 in this module.  This must be done _before_ any other static
 	 initializations.  */
@@ -5476,18 +5632,26 @@  c_parse_final_cleanups (void)
 	static_init_fini_fns[true] = priority_map_t::create_ggc ();
       if (static_init_fini_fns[true]->get_or_insert (DEFAULT_INIT_PRIORITY))
 	has_module_inits = true;
+
+      if (flag_openmp)
+	{
+	  if (!static_init_fini_fns[2 + true])
+	    static_init_fini_fns[2 + true] = priority_map_t::create_ggc ();
+	  static_init_fini_fns[2 + true]->get_or_insert (DEFAULT_INIT_PRIORITY);
+	}
     }
 
   /* Generate initialization and destruction functions for all
      priorities for which they are required.  They have C-language
      linkage.  */
   push_lang_context (lang_name_c);
-  for (unsigned initp = 2; initp--;)
+  for (unsigned initp = 4; initp--;)
     if (static_init_fini_fns[initp])
       {
 	for (auto iter : *static_init_fini_fns[initp])
-	  generate_ctor_or_dtor_function (initp, iter.first, iter.second,
-					  locus_at_end_of_parsing);
+	  generate_ctor_or_dtor_function (initp & 1, iter.first, iter.second,
+					  locus_at_end_of_parsing,
+					  (initp & 2) != 0);
 	static_init_fini_fns[initp] = nullptr;
       }
   pop_lang_context ();
diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index ab323d764e8..f761a7b613e 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -3823,11 +3823,17 @@  gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, bool want_value)
   /* Gimplify internal functions created in the FEs.  */
   if (CALL_EXPR_FN (*expr_p) == NULL_TREE)
     {
+      enum internal_fn ifn = CALL_EXPR_IFN (*expr_p);
+      if (ifn == IFN_GOMP_IS_INITIAL_DEVICE)
+	{
+	  /* Required to expand it in the pass_omp_device_lower pass.  */
+	  cgraph_node::get (cfun->decl)->calls_declare_variant_alt = 1;
+	  return GS_ALL_DONE;
+	}
       if (want_value)
 	return GS_ALL_DONE;
 
       nargs = call_expr_nargs (*expr_p);
-      enum internal_fn ifn = CALL_EXPR_IFN (*expr_p);
       auto_vec<tree> vargs (nargs);
 
       if (ifn == IFN_ASSUME)
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 8a2e07f2f96..16356910063 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -514,6 +514,14 @@  expand_GOMP_TARGET_REV (internal_fn, gcall *)
   gcc_unreachable ();
 }
 
+/* This should get expanded in omp_device_lower pass.  */
+
+static void
+expand_GOMP_IS_INITIAL_DEVICE (internal_fn, gcall *)
+{
+  gcc_unreachable ();
+}
+
 /* Lane index of the first SIMT lane that supplies a non-zero argument.
    This is a SIMT counterpart to GOMP_SIMD_LAST_LANE, used to represent the
    lane that executed the last iteration for handling OpenMP lastprivate.  */
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 75b527b1ab0..44a7d4c4883 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -452,6 +452,7 @@  DEF_INTERNAL_INT_EXT_FN (FFS, ECF_CONST | ECF_NOTHROW, ffs, unary)
 DEF_INTERNAL_INT_EXT_FN (PARITY, ECF_CONST | ECF_NOTHROW, parity, unary)
 DEF_INTERNAL_INT_EXT_FN (POPCOUNT, ECF_CONST | ECF_NOTHROW, popcount, unary)
 
+DEF_INTERNAL_FN (GOMP_IS_INITIAL_DEVICE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_TARGET_REV, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_USE_SIMT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_ENTER, ECF_LEAF | ECF_NOTHROW, NULL)
diff --git a/gcc/omp-offload.cc b/gcc/omp-offload.cc
index 35313c2ecf3..a80bef4bf9f 100644
--- a/gcc/omp-offload.cc
+++ b/gcc/omp-offload.cc
@@ -2729,6 +2729,13 @@  execute_omp_device_lower ()
 	      unlink_stmt_vdef (stmt);
 	    }
 	    break;
+	  case IFN_GOMP_IS_INITIAL_DEVICE:
+#ifdef ACCEL_COMPILER
+	    rhs = integer_zero_node;
+#else
+	    rhs = integer_one_node;
+#endif
+	    break;
 	  case IFN_GOMP_USE_SIMT:
 	    rhs = vf == 1 ? integer_zero_node : integer_one_node;
 	    break;
diff --git a/gcc/tree.cc b/gcc/tree.cc
index a2d431662bd..17a5cea7c25 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -8908,9 +8908,11 @@  get_file_function_name (const char *type)
      will be local to this file and the name is only necessary for
      debugging purposes. 
      We also assign sub_I and sub_D sufixes to constructors called from
-     the global static constructors.  These are always local.  */
+     the global static constructors.  These are always local.
+     OpenMP "declare target" offloaded constructors/destructors use "off_I" and
+     "off_D" for the same purpose.  */
   else if (((type[0] == 'I' || type[0] == 'D') && targetm.have_ctors_dtors)
-	   || (startswith (type, "sub_")
+	   || ((startswith (type, "sub_") || startswith (type, "off_"))
 	       && (type[4] == 'I' || type[4] == 'D')))
     {
       const char *file = main_input_filename;
diff --git a/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C b/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C
new file mode 100644
index 00000000000..91d8469a150
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C
@@ -0,0 +1,28 @@ 
+// { dg-do run }
+
+#include <cassert>
+
+#pragma omp declare target
+
+struct str {
+  str(int x) : _x(x) { }
+  int add(str o) { return _x + o._x; }
+  int _x;
+} v1(5);
+
+#pragma omp end declare target
+
+int main()
+{
+  int res = -1;
+  str v2(2);
+
+#pragma omp target map(from:res)
+  {
+    res = v1.add(v2);
+  }
+
+  assert (res == 7);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C b/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C
new file mode 100644
index 00000000000..1bf3ee8e31c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C
@@ -0,0 +1,31 @@ 
+// { dg-do run }
+
+#include <cassert>
+
+#pragma omp declare target
+
+template<typename T>
+struct str {
+  str(T x) : _x(x) { }
+  T add(str o) { return _x + o._x; }
+  T _x;
+};
+
+str<long> v1(5);
+
+#pragma omp end declare target
+
+int main()
+{
+  long res = -1;
+  str<long> v2(2);
+
+#pragma omp target map(from:res)
+  {
+    res = v1.add(v2);
+  }
+
+  assert (res == 7);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-3.C b/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-3.C
new file mode 100644
index 00000000000..8d4aff21cd7
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-3.C
@@ -0,0 +1,36 @@ 
+// { dg-do run }
+
+#include <cassert>
+
+#pragma omp declare target
+
+struct item {
+  item(item *p, int v) : prev(p), val(v) { }
+  int get() { return prev ? prev->get() * val : val; }
+  item *prev;
+  int val;
+};
+
+/* This case demonstrates why constructing on the host and then copying to
+   the target would be less desirable.  With on-target construction, "prev"
+   for each 'item' will be a device pointer, not a host pointer.  */
+item hubert1(nullptr, 3);
+item hubert2(&hubert1, 5);
+item hubert3(&hubert2, 7);
+item hubert4(&hubert3, 11);
+
+#pragma omp end declare target
+
+int main()
+{
+  int res = -1;
+
+#pragma omp target map(from:res)
+  {
+    res = hubert4.get ();
+  }
+
+  assert (res == 1155);
+
+  return 0;
+}