diff mbox

Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation

Message ID 52FE2FDA.5010106@codesourcery.com
State New
Headers show

Commit Message

Bernd Schmidt Feb. 14, 2014, 3:01 p.m. UTC
On 02/14/2014 03:49 PM, Ilya Verbin wrote:
> Hi Bernd and Thomas,
>
> Are you planning to support offloading from DSO in PTX/CUDA
> environment?  If yes, how are you going to solve the problem of the
> collision of function names from different DSOs?

What I'm currently trying to do is to use get_file_function_name, which 
should provide a unique string that can be used to look up an offloaded 
function. That was suggested by Nathan Sidwell, I'd forgotten that such 
a function existed. I haven't actually given thought to whether that'll 
be unique across multiple DSOs, but as long as we also pass a value to 
the libgomp registration function that is unique per DSO, that shouldn't 
really matter - we should be able to reliably look up a function given 
these two keys.

I'm attaching a work-in-progress patch, which is based on patch #2 of 
Michael Zolotukhin's series. Does this look like something you could 
also work with? I should have something a little more complete next week.


Bernd

Comments

Jakub Jelinek Feb. 14, 2014, 3:12 p.m. UTC | #1
On Fri, Feb 14, 2014 at 04:01:46PM +0100, Bernd Schmidt wrote:
> What I'm currently trying to do is to use get_file_function_name,
> which should provide a unique string that can be used to look up an
> offloaded function. That was suggested by Nathan Sidwell, I'd
> forgotten that such a function existed. I haven't actually given
> thought to whether that'll be unique across multiple DSOs, but as
> long as we also pass a value to the libgomp registration function
> that is unique per DSO, that shouldn't really matter - we should be
> able to reliably look up a function given these two keys.

get_file_function_name is very problematic, it is not unique across DSOs,
and relies on -frandom-seed which user can tweak and even without tweaking
can have collisions in, if we can avoid it at all, we should.
Not to mention that strings, especially get_file_function_name based ones,
aren't really short and will occupy much more space than the tables.

I still don't see what you find wrong on the approach with host/target
address arrays, if you are afraid something will reorder the arrays
(but, what would do that), one can insert indexes into both arrays as well,
which the linker can fill in and you can then verify.

	Jakub
Bernd Schmidt Feb. 14, 2014, 3:50 p.m. UTC | #2
On 02/14/2014 04:12 PM, Jakub Jelinek wrote:
> On Fri, Feb 14, 2014 at 04:01:46PM +0100, Bernd Schmidt wrote:
>> What I'm currently trying to do is to use get_file_function_name,
>> which should provide a unique string that can be used to look up an
>> offloaded function. That was suggested by Nathan Sidwell, I'd
>> forgotten that such a function existed. I haven't actually given
>> thought to whether that'll be unique across multiple DSOs, but as
>> long as we also pass a value to the libgomp registration function
>> that is unique per DSO, that shouldn't really matter - we should be
>> able to reliably look up a function given these two keys.
>
> get_file_function_name is very problematic, it is not unique across DSOs,
> and relies on -frandom-seed which user can tweak and even without tweaking
> can have collisions in, if we can avoid it at all, we should.
> Not to mention that strings, especially get_file_function_name based ones,
> aren't really short and will occupy much more space than the tables.

How many offloaded functions do we really expect to have in an 
executable? I don't think that's likely to be a bottleneck.

The use of a random-seed is really just a fallback, preferrably it uses 
the name of first symbol defined in the current translation unit which I 
think ought to be reliable enough.

> I still don't see what you find wrong on the approach with host/target
> address arrays, if you are afraid something will reorder the arrays
> (but, what would do that), one can insert indexes into both arrays as well,
> which the linker can fill in and you can then verify.

It strikes me as really unnecessarily brittle. On the host side we'd 
have multiple objects linked together in some order to produce such a 
table, on the ptx side we'd have to produce the table all in one go. 
Factor in possibilities like function cloning and I just think there are 
too many ways in which this can utterly fail. I'd rather have something 
that is more robust from the start even if it's slightly less efficient.


Bernd
Jakub Jelinek Feb. 14, 2014, 4:06 p.m. UTC | #3
On Fri, Feb 14, 2014 at 04:50:34PM +0100, Bernd Schmidt wrote:
> How many offloaded functions do we really expect to have in an
> executable? I don't think that's likely to be a bottleneck.

First of all, this isn't just about offloaded functions, but also any
global variables that need to be mapped (for OpenMP #pragma omp declare
target surrounded vars).  Like functions, also the vars can be global
(but, with multiple DSOs, even global can be interposed and thus not unique
name), or static, at which point you can have the same name in between
different TUs.

> The use of a random-seed is really just a fallback, preferrably it
> uses the name of first symbol defined in the current translation
> unit which I think ought to be reliable enough.

Many TUs don't have any non-weak global symbols at all, if the symbols
they provide are all comdat etc., then you hit the random seed all the time.
Encoding the random seed into data sections of the binary is a problem for
build reproduceability, unless you always supply -frandom-seed=XXXX, but
then it isn't really that much random (e.g. the often used
-frandom-seed=$(@) or similar).  If the weirdo names are only used to name
symbols in .symtab, that randomness at least can be stripped off, but not
if it is in data sections.  So, to me this is far less reliable and against
the spirit of static symbols.

> >I still don't see what you find wrong on the approach with host/target
> >address arrays, if you are afraid something will reorder the arrays
> >(but, what would do that), one can insert indexes into both arrays as well,
> >which the linker can fill in and you can then verify.
> 
> It strikes me as really unnecessarily brittle. On the host side we'd
> have multiple objects linked together in some order to produce such
> a table, on the ptx side we'd have to produce the table all in one

Sure.  So, the linker/linker plugin orders the objects in some order
and thus by concatenation of the smaller per-TU tables creates the
host table, then the same linker/linker plugin just creates the to be target
table with the same order, and feeds that to the offloading target
compiler.

> go. Factor in possibilities like function cloning and I just think
> there are too many ways in which this can utterly fail. I'd rather
> have something that is more robust from the start even if it's
> slightly less efficient.

I don't see how function cloning or anything similar can make a difference
here, you have a function which is address taken and it's address is tracked
in some array, such function can't be cloned (well, can be cloned for
unrelated callers, but the table still keeps the original function, with the
same public ABI etc.).

	Jakub
diff mbox

Patch

Index: gomp-4_0-branch/gcc/cgraphunit.c
===================================================================
--- gomp-4_0-branch.orig/gcc/cgraphunit.c
+++ gomp-4_0-branch/gcc/cgraphunit.c
@@ -206,6 +206,7 @@  along with GCC; see the file COPYING3.
 #include "pass_manager.h"
 #include "tree-nested.h"
 #include "gimplify.h"
+#include "omp-low.h"
 #include "lto-section-names.h"
 
 /* Queue of cgraph nodes scheduled to be added into cgraph.  This is a
@@ -2019,6 +2020,8 @@  ipa_passes (void)
 
       execute_ipa_summary_passes
 	((struct ipa_opt_pass_d *) passes->all_regular_ipa_passes);
+
+      omp_finish_file ();
     }
 
   /* Some targets need to handle LTO assembler output specially.  */
Index: gomp-4_0-branch/gcc/lto-streamer-out.c
===================================================================
--- gomp-4_0-branch.orig/gcc/lto-streamer-out.c
+++ gomp-4_0-branch/gcc/lto-streamer-out.c
@@ -498,6 +498,7 @@  DFS_write_tree_body (struct output_block
 	 special handling in LTO, it must be handled by streamer hooks.  */
 
       DFS_follow_tree_edge (DECL_ATTRIBUTES (expr));
+      DFS_follow_tree_edge (DECL_UNIQUE_ID (expr));
 
       /* Do not follow DECL_ABSTRACT_ORIGIN.  We cannot handle debug information
 	 for early inlining so drop it on the floor instead of ICEing in
Index: gomp-4_0-branch/gcc/omp-low.c
===================================================================
--- gomp-4_0-branch.orig/gcc/omp-low.c
+++ gomp-4_0-branch/gcc/omp-low.c
@@ -58,6 +58,7 @@  along with GCC; see the file COPYING3.
 #include "optabs.h"
 #include "cfgloop.h"
 #include "target.h"
+#include "common/common-target.h"
 #include "omp-low.h"
 #include "gimple-low.h"
 #include "tree-cfgcleanup.h"
@@ -191,7 +192,6 @@  struct omp_for_data
   struct omp_for_data_loop *loops;
 };
 
-
 static splay_tree all_contexts;
 static int taskreg_nesting_level;
 static int target_nesting_level;
@@ -1889,6 +1889,7 @@  create_omp_child_function (omp_context *
   DECL_EXTERNAL (decl) = 0;
   DECL_CONTEXT (decl) = NULL_TREE;
   DECL_INITIAL (decl) = make_node (BLOCK);
+
   bool target_p = false;
   if (lookup_attribute ("omp declare target",
 			DECL_ATTRIBUTES (current_function_decl)))
@@ -12379,4 +12380,158 @@  make_pass_omp_simd_clone (gcc::context *
   return new pass_omp_simd_clone (ctxt);
 }
 
+struct ctor_elt_data
+{
+  vec<constructor_elt, va_gc> *v;
+  char *buffer;
+  HOST_WIDE_INT offset;
+  tree string_decl;
+};
+
+static void
+make_constructor_elts (tree decl, struct ctor_elt_data *d)
+{
+  const char *str = IDENTIFIER_POINTER (DECL_UNIQUE_ID (decl));
+  size_t len = strlen (str) + 1;
+  memcpy (d->buffer + d->offset, str, len);
+  tree str_addr = build_fold_addr_expr (d->string_decl);
+  tree off = build_int_cst (size_type_node, d->offset);
+  d->offset += len;
+
+  CONSTRUCTOR_APPEND_ELT (d->v, NULL_TREE, build_fold_addr_expr (decl));
+  CONSTRUCTOR_APPEND_ELT (d->v, NULL_TREE,
+			  fold_build_pointer_plus (str_addr, off));
+}
+
+static void
+make_unique_name (tree decl)
+{
+  tree name = DECL_NAME (decl);
+  char *p = (char *)alloca (strlen (IDENTIFIER_POINTER (name)) + 3);
+  p[0] = 'O';
+  p[1] = '_';
+  strcpy (p + 2, IDENTIFIER_POINTER (name));
+  tree id = get_file_function_name (p);
+  DECL_UNIQUE_ID (decl) = id;
+}
+
+static size_t
+build_unique_names (size_t *plen)
+{
+  int n = 0;
+  size_t len = 0;
+  /* Collect all omp-target functions.  */
+  struct cgraph_node *node;
+  FOR_EACH_DEFINED_FUNCTION (node)
+    {
+      if (!lookup_attribute ("omp declare target",
+			     DECL_ATTRIBUTES (node->decl))
+	  || !DECL_ARTIFICIAL (node->decl))
+	continue;
+      n++;
+      if (!in_lto_p)
+	make_unique_name (node->decl);
+      else
+	len += strlen (IDENTIFIER_POINTER (DECL_UNIQUE_ID (node->decl))) + 1;
+    }
+  /* Collect all omp-target global variables.  */
+  struct varpool_node *vnode;
+  FOR_EACH_DEFINED_VARIABLE (vnode)
+    {
+      if (!lookup_attribute ("omp declare target",
+			     DECL_ATTRIBUTES (vnode->decl))
+	  || TREE_CODE (vnode->decl) != VAR_DECL
+	  || DECL_SIZE (vnode->decl) == 0)
+	continue;
+
+      n++;
+      if (!in_lto_p)
+	make_unique_name (vnode->decl);
+      else
+	len += strlen (IDENTIFIER_POINTER (DECL_UNIQUE_ID (vnode->decl))) + 1;
+    }
+  *plen = len;
+  return n;
+}
+
+/* Create new symbol containing (address, size) pairs for omp-marked
+   functions and global variables.  */
+void
+omp_finish_file ()
+{
+  const char *section_name = ".offload_func_table_section";
+  tree new_decl, new_decl_type;
+  vec<constructor_elt, va_gc> *v;
+  tree ctor;
+
+  size_t len;
+  int num = build_unique_names (&len);
+
+  if (num == 0 || !in_lto_p || !targetm_common.have_named_sections)
+    return;
+
+  vec_alloc (v, num);
+
+  struct ctor_elt_data data;
+  tree array_type = build_array_type_nelts (char_type_node, len);
+  data.offset = 0;
+  data.buffer = (char *) xmalloc (len);
+  data.string_decl = build_decl (UNKNOWN_LOCATION, VAR_DECL,
+				 get_file_function_name ("O"),
+				 array_type);
+  DECL_ARTIFICIAL (data.string_decl) = 1;
+  TREE_STATIC (data.string_decl) = 1;
+  data.v = v;
+
+  /* Collect all omp-target functions.  */
+  struct cgraph_node *node;
+  FOR_EACH_DEFINED_FUNCTION (node)
+    {
+      if (!lookup_attribute ("omp declare target",
+			     DECL_ATTRIBUTES (node->decl))
+	  || !DECL_ARTIFICIAL (node->decl))
+	continue;
+      make_constructor_elts (node->decl, &data);
+    }
+  /* Collect all omp-target global variables.  */
+  struct varpool_node *vnode;
+  FOR_EACH_DEFINED_VARIABLE (vnode)
+    {
+      if (!lookup_attribute ("omp declare target",
+			     DECL_ATTRIBUTES (vnode->decl))
+	  || TREE_CODE (vnode->decl) != VAR_DECL
+	  || DECL_SIZE (vnode->decl) == 0)
+	continue;
+
+      make_constructor_elts (vnode->decl, &data);
+    }
+
+  if (data.offset == 0)
+    {
+      free (data.buffer);
+      return;
+    }
+
+  tree cst = build_string (data.offset, data.buffer);
+  TREE_TYPE (cst) = build_array_type (char_type_node,
+				      build_index_type (size_int (data.offset)));
+  DECL_INITIAL (data.string_decl) = cst;
+  free (data.buffer);
+
+  varpool_assemble_decl (varpool_node_for_decl (data.string_decl));
+
+  new_decl_type = build_array_type_nelts (pointer_sized_int_node, num * 2);
+  ctor = build_constructor (new_decl_type, v);
+  TREE_CONSTANT (ctor) = 1;
+  TREE_STATIC (ctor) = 1;
+  new_decl = build_decl (UNKNOWN_LOCATION, VAR_DECL,
+			 get_identifier (".omp_table"), new_decl_type);
+  TREE_STATIC (new_decl) = 1;
+  DECL_INITIAL (new_decl) = ctor;
+  DECL_SECTION_NAME (new_decl) = build_string (strlen (section_name),
+					       section_name);
+
+  varpool_assemble_decl (varpool_node_for_decl (new_decl));
+}
+
 #include "gt-omp-low.h"
Index: gomp-4_0-branch/gcc/omp-low.h
===================================================================
--- gomp-4_0-branch.orig/gcc/omp-low.h
+++ gomp-4_0-branch/gcc/omp-low.h
@@ -27,5 +27,6 @@  extern void omp_expand_local (basic_bloc
 extern void free_omp_regions (void);
 extern tree omp_reduction_init (tree, tree);
 extern bool make_gimple_omp_edges (basic_block, struct omp_region **);
+extern void omp_finish_file (void);
 
 #endif /* GCC_OMP_LOW_H */
Index: gomp-4_0-branch/gcc/toplev.c
===================================================================
--- gomp-4_0-branch.orig/gcc/toplev.c
+++ gomp-4_0-branch/gcc/toplev.c
@@ -78,6 +78,7 @@  along with GCC; see the file COPYING3.
 #include "diagnostic-color.h"
 #include "context.h"
 #include "pass_manager.h"
+#include "omp-low.h"
 
 #if defined(DBX_DEBUGGING_INFO) || defined(XCOFF_DEBUGGING_INFO)
 #include "dbxout.h"
@@ -576,6 +577,8 @@  compile_file (void)
       if (flag_sanitize & SANITIZE_THREAD)
 	tsan_finish_file ();
 
+      omp_finish_file ();
+
       output_shared_constant_pool ();
       output_object_blocks ();
       finish_tm_clone_pairs ();
Index: gomp-4_0-branch/gcc/tree.c
===================================================================
--- gomp-4_0-branch.orig/gcc/tree.c
+++ gomp-4_0-branch/gcc/tree.c
@@ -9078,7 +9078,8 @@  clean_symbol_name (char *p)
    I - for constructors
    D - for destructors
    N - for C++ anonymous namespaces
-   F - for DWARF unwind frame information.  */
+   F - for DWARF unwind frame information
+   O - for OpenMP/OpenACC target functions.  */
 
 tree
 get_file_function_name (const char *type)
Index: gomp-4_0-branch/gcc/tree-core.h
===================================================================
--- gomp-4_0-branch.orig/gcc/tree-core.h
+++ gomp-4_0-branch/gcc/tree-core.h
@@ -1366,6 +1366,10 @@  struct GTY(()) tree_decl_common {
   tree attributes;
   tree abstract_origin;
 
+  /* An IDENTIFIER_NODE used in omp-lower when offloading to
+     accelerator targets.  */
+  tree unique_id;
+
   /* Points to a structure whose details depend on the language in use.  */
   struct lang_decl *lang_specific;
 };
Index: gomp-4_0-branch/gcc/tree.h
===================================================================
--- gomp-4_0-branch.orig/gcc/tree.h
+++ gomp-4_0-branch/gcc/tree.h
@@ -2003,6 +2003,11 @@  extern void protected_set_expr_location
 #define DECL_ATTRIBUTES(NODE) \
   (DECL_COMMON_CHECK (NODE)->decl_common.attributes)
 
+/* In a DECL, an optional IDENTIFIER_NODE that holds a unique name for this
+   decl that is used when offloading to accelerator targets.  */
+#define DECL_UNIQUE_ID(NODE) \
+  (DECL_COMMON_CHECK (NODE)->decl_common.unique_id)
+
 /* For a FUNCTION_DECL, holds the tree of BINDINGs.
    For a TRANSLATION_UNIT_DECL, holds the namespace's BLOCK.
    For a VAR_DECL, holds the initial value.
Index: gomp-4_0-branch/gcc/tree-streamer-in.c
===================================================================
--- gomp-4_0-branch.orig/gcc/tree-streamer-in.c
+++ gomp-4_0-branch/gcc/tree-streamer-in.c
@@ -658,6 +658,7 @@  lto_input_ts_decl_common_tree_pointers (
   DECL_SIZE (expr) = stream_read_tree (ib, data_in);
   DECL_SIZE_UNIT (expr) = stream_read_tree (ib, data_in);
   DECL_ATTRIBUTES (expr) = stream_read_tree (ib, data_in);
+  DECL_UNIQUE_ID (expr) = stream_read_tree (ib, data_in);
 
   /* Do not stream DECL_ABSTRACT_ORIGIN.  We cannot handle debug information
      for early inlining so drop it on the floor instead of ICEing in
Index: gomp-4_0-branch/gcc/tree-streamer-out.c
===================================================================
--- gomp-4_0-branch.orig/gcc/tree-streamer-out.c
+++ gomp-4_0-branch/gcc/tree-streamer-out.c
@@ -593,6 +593,7 @@  write_ts_decl_common_tree_pointers (stru
      special handling in LTO, it must be handled by streamer hooks.  */
 
   stream_write_tree (ob, DECL_ATTRIBUTES (expr), ref_p);
+  stream_write_tree (ob, DECL_UNIQUE_ID (expr), ref_p);
 
   /* Do not stream DECL_ABSTRACT_ORIGIN.  We cannot handle debug information
      for early inlining so drop it on the floor instead of ICEing in