diff mbox

New post-LTO OpenACC pass

Message ID 560170CB.3040603@acm.org
State New
Headers show

Commit Message

Nathan Sidwell Sept. 22, 2015, 3:16 p.m. UTC
On 09/21/15 16:39, Nathan Sidwell wrote:
> On 09/21/15 16:30, Cesar Philippidis wrote:
>> On 09/21/2015 09:30 AM, Nathan Sidwell wrote:
>>
>>> +const pass_data pass_data_oacc_transform =
>>> +{
>>> +  GIMPLE_PASS, /* type */
>>> +  "fold_oacc_transform", /* name */
>>
>> Want to rename the tree dump file to oacc_xforms like I'm did in the
>> attached patch? Regardless, I think we need to document this flag in
>> invoke.texi.
>
> Thanks for noticing the missing doc.  I'm not attached to any particular name.
> 'fold_oacc_transform' is rather generic, and a bit of  a mouthful.  Perhaps
> 'oacclower', 'oaccdevlower' or something (I  see there's 'lateomplower' for
> guidance)

this updated patch includes Cesar's doc patch.  Also change the name of the pass 
to 'oaccdevlow'.

nathan

Comments

Bernd Schmidt Sept. 23, 2015, 10:59 a.m. UTC | #1
On 09/22/2015 05:16 PM, Nathan Sidwell wrote:
> +	if (gimple_call_builtin_p (call, BUILT_IN_ACC_ON_DEVICE))
> +	  /* acc_on_device must be evaluated at compile time for
> +	     constant arguments.  */
> +	  {
> +	    oacc_xform_on_device (call);
> +	    rescan = true;
> +	  }

Is there a reason this is not done as part of pass_fold_builtins? (It 
looks like maybe adding this to fold_call_stmt in builtins.c would be 
sufficient too).


Bernd
Nathan Sidwell Sept. 23, 2015, 12:14 p.m. UTC | #2
On 09/23/15 06:59, Bernd Schmidt wrote:
> On 09/22/2015 05:16 PM, Nathan Sidwell wrote:
>> +    if (gimple_call_builtin_p (call, BUILT_IN_ACC_ON_DEVICE))
>> +      /* acc_on_device must be evaluated at compile time for
>> +         constant arguments.  */
>> +      {
>> +        oacc_xform_on_device (call);
>> +        rescan = true;
>> +      }
>
> Is there a reason this is not done as part of pass_fold_builtins? (It looks like
> maybe adding this to fold_call_stmt in builtins.c would be sufficient too).

Perhaps it could be.  I'll need to check where  that pass happens.  Anyway, the 
main thrust of this patch is the new pass, which I thought might be easier to 
review with minimal additional  clutter.

nathan
Bernd Schmidt Sept. 23, 2015, 12:58 p.m. UTC | #3
On 09/23/2015 02:14 PM, Nathan Sidwell wrote:
> On 09/23/15 06:59, Bernd Schmidt wrote:
>> On 09/22/2015 05:16 PM, Nathan Sidwell wrote:
>>> +    if (gimple_call_builtin_p (call, BUILT_IN_ACC_ON_DEVICE))
>>> +      /* acc_on_device must be evaluated at compile time for
>>> +         constant arguments.  */
>>> +      {
>>> +        oacc_xform_on_device (call);
>>> +        rescan = true;
>>> +      }
>>
>> Is there a reason this is not done as part of pass_fold_builtins? (It
>> looks like
>> maybe adding this to fold_call_stmt in builtins.c would be sufficient
>> too).
>
> Perhaps it could be.  I'll need to check where  that pass happens.
> Anyway, the main thrust of this patch is the new pass, which I thought
> might be easier to review with minimal additional  clutter.

There's no issue adding a new pass if there's a demonstrated need for 
it, but I think builtin folding doesn't quite meet that criterion given 
that we already have a pass that does that. Unless you really need it to 
happen very early in the pipeline - fold_builtins runs pretty late, but 
I checked and fold_call_stmt gets called from pass_forwprop and possibly 
from elsewhere too.


Bernd
Nathan Sidwell Sept. 23, 2015, 6:42 p.m. UTC | #4
On 09/23/15 08:58, Bernd Schmidt wrote:
> On 09/23/2015 02:14 PM, Nathan Sidwell wrote:
>> On 09/23/15 06:59, Bernd Schmidt wrote:
>>> On 09/22/2015 05:16 PM, Nathan Sidwell wrote:
>>>> +    if (gimple_call_builtin_p (call, BUILT_IN_ACC_ON_DEVICE))
>>>> +      /* acc_on_device must be evaluated at compile time for
>>>> +         constant arguments.  */
>>>> +      {
>>>> +        oacc_xform_on_device (call);
>>>> +        rescan = true;
>>>> +      }
>>>
>>> Is there a reason this is not done as part of pass_fold_builtins? (It
>>> looks like
>>> maybe adding this to fold_call_stmt in builtins.c would be sufficient
>>> too).


As I feared, builtin folding occurs in several places.  In particular its first 
call is very early on in the host compiler, which is far too soon.

We have to defer folding until we know whether we're doing host or device 
compilation.

nathan
Bernd Schmidt Sept. 23, 2015, 6:51 p.m. UTC | #5
On 09/23/2015 08:42 PM, Nathan Sidwell wrote:

> As I feared, builtin folding occurs in several places.  In particular
> its first call is very early on in the host compiler, which is far too
> soon.
>
> We have to defer folding until we know whether we're doing host or
> device compilation.

Doesn't something like "symtab->state >= EXPANSION" give you that?


Bernd
Nathan Sidwell Sept. 23, 2015, 6:58 p.m. UTC | #6
On 09/23/15 14:51, Bernd Schmidt wrote:
> On 09/23/2015 08:42 PM, Nathan Sidwell wrote:
>
>> As I feared, builtin folding occurs in several places.  In particular
>> its first call is very early on in the host compiler, which is far too
>> soon.
>>
>> We have to defer folding until we know whether we're doing host or
>> device compilation.
>
> Doesn't something like "symtab->state >= EXPANSION" give you that?

I don't know.   It doesn't seem to me to be a good idea for the builtin 
expanders to be context-sensitive.

nathan
Nathan Sidwell Sept. 24, 2015, 10:38 p.m. UTC | #7
On 09/23/15 14:58, Nathan Sidwell wrote:
> On 09/23/15 14:51, Bernd Schmidt wrote:
>> On 09/23/2015 08:42 PM, Nathan Sidwell wrote:
>>
>>> As I feared, builtin folding occurs in several places.  In particular
>>> its first call is very early on in the host compiler, which is far too
>>> soon.
>>>
>>> We have to defer folding until we know whether we're doing host or
>>> device compilation.
>>
>> Doesn't something like "symtab->state >= EXPANSION" give you that?

I've tried limiting expansion by checking symtab->state.  I have been unable to 
succeed.

It either expands too early in the host compiler, or it doesn't get expanded at 
  all and one ends up with an RTL call to the library function.   For instance 
there doesn't appear to be call to fold builtins when state == EXPANSION. 
lesser values are present in the host compiler before LTO write out, AFAICT.

nathan
Bernd Schmidt Sept. 25, 2015, 10:28 a.m. UTC | #8
On 09/25/2015 12:38 AM, Nathan Sidwell wrote:
> On 09/23/15 14:58, Nathan Sidwell wrote:
>> On 09/23/15 14:51, Bernd Schmidt wrote:
>>> On 09/23/2015 08:42 PM, Nathan Sidwell wrote:
>>>> We have to defer folding until we know whether we're doing host or
>>>> device compilation.
>>>
>>> Doesn't something like "symtab->state >= EXPANSION" give you that?
>
> I've tried limiting expansion by checking symtab->state.  I have been
> unable to succeed.
>
> It either expands too early in the host compiler, or it doesn't get
> expanded at  all and one ends up with an RTL call to the library
> function.   For instance there doesn't appear to be call to fold
> builtins when state == EXPANSION. lesser values are present in the host
> compiler before LTO write out, AFAICT.

That's a bit odd:

Breakpoint 5, (anonymous namespace)::pass_fold_builtins::execute 
(this=0x1ce89a0, fun=0x7ffff0858348) at ../../git/gcc/tree-ssa-ccp.c:2722
[...]
(gdb) p stmt
$3 = (gimple *) 0x7ffff0736d80
(gdb) pgg
warning: Expression is not an assignment (and might have no effect)
# .MEM_2 = VDEF <.MEM_1(D)>
_3 = acc_on_device (123);
(gdb) p symtab->state
$4 = EXPANSION

On the other hand, it's not considered a builtin:

(gdb) p gimple_call_builtin_p(stmt, BUILT_IN_ACC_ON_DEVICE)
$6 = false

This is the c-c++-common/goacc/acc_on_device-2.c testcase. Is that 
expected to be handled? If I change it to use __builtin_acc_on_device, I 
can step right into

Breakpoint 8, fold_call_stmt (stmt=0x7ffff0736e10, ignore=false) at 
../../git/gcc/builtins.c:12277
12277	  tree ret = NULL_TREE;

Maybe you were compiling without optimization? In that case 
expand_builtin_acc_on_device (which already exists) should still end up 
doing the right thing. In no case should you see a RTL call to a 
function, that indicates that something else went wrong.

Can you send me the patch you tried (and possibly a testcase you expect 
to be handled), I'll see if I can find out what's going on.


Bernd
diff mbox

Patch

2015-09-22  Nathan Sidwell  <nathan@codesourcery.com>
	    Cesar Philippidis  <cesar@codesourcery.com>

	* omp-low.h (get_oacc_fn_attrib): Declare.
	* omp-low.c (get_oacc_fn_attrib): New.
	(oacc_xform_on_device): New.
	(execute_oacc_device_lower): New pass.
	(pass_data_oacc_device_lower): New.
	(pass_oacc_device_lower): New.
	(make_pass_oacc_device_lower): New.
	* tree-pass.h (make_pass_oacc_device_lower): Declare.
	* passes.def: Add pass_oacc_transform.
	* doc/invoke.texi: Document -fdump-tree-oaccdevlow.

Index: tree-pass.h
===================================================================
--- tree-pass.h	(revision 227968)
+++ tree-pass.h	(working copy)
@@ -406,6 +406,7 @@  extern gimple_opt_pass *make_pass_lower_
 extern gimple_opt_pass *make_pass_diagnose_omp_blocks (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp_ssa (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_oacc_device_lower (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_object_sizes (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_strlen (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_fold_builtins (gcc::context *ctxt);
Index: passes.def
===================================================================
--- passes.def	(revision 227968)
+++ passes.def	(working copy)
@@ -148,6 +148,7 @@  along with GCC; see the file COPYING3.
   INSERT_PASSES_AFTER (all_passes)
   NEXT_PASS (pass_fixup_cfg);
   NEXT_PASS (pass_lower_eh_dispatch);
+  NEXT_PASS (pass_oacc_device_lower);
   NEXT_PASS (pass_all_optimizations);
   PUSH_INSERT_PASSES_WITHIN (pass_all_optimizations)
       NEXT_PASS (pass_remove_cgraph_callee_edges);
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 227968)
+++ doc/invoke.texi	(working copy)
@@ -7179,6 +7179,11 @@  is made by appending @file{.slp} to the
 Dump each function after Value Range Propagation (VRP).  The file name
 is made by appending @file{.vrp} to the source file name.
 
+@item oaccdevlow
+@opindex fdump-tree-oaccdevlow
+Dump each function after applying device-specific OpenACC transformations.
+The file name is made by appending @file{.oaccdevlow} to the source file name.
+
 @item all
 @opindex fdump-tree-all
 Enable all the available tree dumps with the flags provided in this option.
Index: omp-low.c
===================================================================
--- omp-low.c	(revision 227968)
+++ omp-low.c	(working copy)
@@ -8860,6 +8860,16 @@  expand_omp_atomic (struct omp_region *re
   expand_omp_atomic_mutex (load_bb, store_bb, addr, loaded_val, stored_val);
 }
 
+#define OACC_FN_ATTRIB "oacc function"
+
+/* Retrieve the oacc function attrib and return it.  Non-oacc
+   functions will return NULL.  */
+
+tree
+get_oacc_fn_attrib (tree fn)
+{
+  return lookup_attribute (OACC_FN_ATTRIB, DECL_ATTRIBUTES (fn));
+}
 
 /* Expand the GIMPLE_OMP_TARGET starting at REGION.  */
 
@@ -13909,4 +13919,131 @@  omp_finish_file (void)
     }
 }
 
+/* Transform an acc_on_device call.  OpenACC 2.0a requires this folded at
+   compile time for constant operands.  We always fold it.  In an
+   offloaded function we're never 'none'.  */
+
+static void
+oacc_xform_on_device (gimple *call)
+{
+  tree arg = gimple_call_arg (call, 0);
+  unsigned val = GOMP_DEVICE_HOST;
+	      
+#ifdef ACCEL_COMPILER
+  val = GOMP_DEVICE_NOT_HOST;
+#endif
+  tree result = build2 (EQ_EXPR, boolean_type_node, arg,
+			build_int_cst (integer_type_node, val));
+#ifdef ACCEL_COMPILER
+  {
+    tree dev  = build2 (EQ_EXPR, boolean_type_node, arg,
+			build_int_cst (integer_type_node,
+				       ACCEL_COMPILER_acc_device));
+    result = build2 (TRUTH_OR_EXPR, boolean_type_node, result, dev);
+  }
+#endif
+  result = fold_convert (integer_type_node, result);
+  tree lhs = gimple_call_lhs (call);
+  gimple_seq seq = NULL;
+
+  push_gimplify_context (true);
+  gimplify_assign (lhs, result, &seq);
+  pop_gimplify_context (NULL);
+
+  gimple_stmt_iterator gsi = gsi_for_stmt (call);
+  gsi_replace_with_seq (&gsi, seq, false);
+}
+
+/* Main entry point for oacc transformations which run on the device
+   compiler after LTO, so we know what the target device is at this
+   point (including the host fallback).  */
+
+static unsigned int
+execute_oacc_device_lower ()
+{
+  tree attrs = get_oacc_fn_attrib (current_function_decl);
+  
+  if (!attrs)
+    /* Not an offloaded function.  */
+    return 0;
+
+  basic_block bb;
+  FOR_ALL_BB_FN (bb, cfun)
+    for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+	 !gsi_end_p (gsi); gsi_next (&gsi))
+      {
+	gimple *stmt = gsi_stmt (gsi);
+	bool rescan = false;
+
+	if (!is_gimple_call (stmt))
+	  continue;
+
+	/* Rewind to allow rescan.  */
+	gsi_prev (&gsi);
+
+	gcall *call = as_a <gcall *> (stmt);
+	
+	if (gimple_call_builtin_p (call, BUILT_IN_ACC_ON_DEVICE))
+	  /* acc_on_device must be evaluated at compile time for
+	     constant arguments.  */
+	  {
+	    oacc_xform_on_device (call);
+	    rescan = true;
+	  }
+
+	if (gsi_end_p (gsi))
+	  /* We rewound past the beginning of the BB.  */
+	  gsi = gsi_start_bb (bb);
+
+	if (!rescan)
+	  /* Undo the rewind, so we don't get stuck infinitely.  */
+	  gsi_next (&gsi);
+      }
+
+  return 0;
+}
+
+namespace {
+
+const pass_data pass_data_oacc_device_lower =
+{
+  GIMPLE_PASS, /* type */
+  "oaccdevlow", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  PROP_cfg, /* properties_required */
+  0 /* Possibly PROP_gimple_eomp.  */, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_update_ssa | TODO_cleanup_cfg, /* todo_flags_finish */
+};
+
+class pass_oacc_device_lower : public gimple_opt_pass
+{
+public:
+  pass_oacc_device_lower (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_oacc_device_lower, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual unsigned int execute (function *)
+    {
+      bool gate = (flag_openacc != 0 && !seen_error ());
+
+      if (!gate)
+	return 0;
+
+      return execute_oacc_device_lower ();
+    }
+
+}; // class pass_oacc_transform
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_oacc_device_lower (gcc::context *ctxt)
+{
+  return new pass_oacc_device_lower (ctxt);
+}
+
 #include "gt-omp-low.h"
Index: omp-low.h
===================================================================
--- omp-low.h	(revision 227968)
+++ omp-low.h	(working copy)
@@ -28,6 +28,7 @@  extern void free_omp_regions (void);
 extern tree omp_reduction_init (tree, tree);
 extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *);
 extern void omp_finish_file (void);
+extern tree get_oacc_fn_attrib (tree);
 
 extern GTY(()) vec<tree, va_gc> *offload_funcs;
 extern GTY(()) vec<tree, va_gc> *offload_vars;