Message ID | 560170CB.3040603@acm.org |
---|---|
State | New |
Headers | show |
On 09/22/2015 05:16 PM, Nathan Sidwell wrote: > + if (gimple_call_builtin_p (call, BUILT_IN_ACC_ON_DEVICE)) > + /* acc_on_device must be evaluated at compile time for > + constant arguments. */ > + { > + oacc_xform_on_device (call); > + rescan = true; > + } Is there a reason this is not done as part of pass_fold_builtins? (It looks like maybe adding this to fold_call_stmt in builtins.c would be sufficient too). Bernd
On 09/23/15 06:59, Bernd Schmidt wrote: > On 09/22/2015 05:16 PM, Nathan Sidwell wrote: >> + if (gimple_call_builtin_p (call, BUILT_IN_ACC_ON_DEVICE)) >> + /* acc_on_device must be evaluated at compile time for >> + constant arguments. */ >> + { >> + oacc_xform_on_device (call); >> + rescan = true; >> + } > > Is there a reason this is not done as part of pass_fold_builtins? (It looks like > maybe adding this to fold_call_stmt in builtins.c would be sufficient too). Perhaps it could be. I'll need to check where that pass happens. Anyway, the main thrust of this patch is the new pass, which I thought might be easier to review with minimal additional clutter. nathan
On 09/23/2015 02:14 PM, Nathan Sidwell wrote: > On 09/23/15 06:59, Bernd Schmidt wrote: >> On 09/22/2015 05:16 PM, Nathan Sidwell wrote: >>> + if (gimple_call_builtin_p (call, BUILT_IN_ACC_ON_DEVICE)) >>> + /* acc_on_device must be evaluated at compile time for >>> + constant arguments. */ >>> + { >>> + oacc_xform_on_device (call); >>> + rescan = true; >>> + } >> >> Is there a reason this is not done as part of pass_fold_builtins? (It >> looks like >> maybe adding this to fold_call_stmt in builtins.c would be sufficient >> too). > > Perhaps it could be. I'll need to check where that pass happens. > Anyway, the main thrust of this patch is the new pass, which I thought > might be easier to review with minimal additional clutter. There's no issue adding a new pass if there's a demonstrated need for it, but I think builtin folding doesn't quite meet that criterion given that we already have a pass that does that. Unless you really need it to happen very early in the pipeline - fold_builtins runs pretty late, but I checked and fold_call_stmt gets called from pass_forwprop and possibly from elsewhere too. Bernd
On 09/23/15 08:58, Bernd Schmidt wrote: > On 09/23/2015 02:14 PM, Nathan Sidwell wrote: >> On 09/23/15 06:59, Bernd Schmidt wrote: >>> On 09/22/2015 05:16 PM, Nathan Sidwell wrote: >>>> + if (gimple_call_builtin_p (call, BUILT_IN_ACC_ON_DEVICE)) >>>> + /* acc_on_device must be evaluated at compile time for >>>> + constant arguments. */ >>>> + { >>>> + oacc_xform_on_device (call); >>>> + rescan = true; >>>> + } >>> >>> Is there a reason this is not done as part of pass_fold_builtins? (It >>> looks like >>> maybe adding this to fold_call_stmt in builtins.c would be sufficient >>> too). As I feared, builtin folding occurs in several places. In particular its first call is very early on in the host compiler, which is far too soon. We have to defer folding until we know whether we're doing host or device compilation. nathan
On 09/23/2015 08:42 PM, Nathan Sidwell wrote: > As I feared, builtin folding occurs in several places. In particular > its first call is very early on in the host compiler, which is far too > soon. > > We have to defer folding until we know whether we're doing host or > device compilation. Doesn't something like "symtab->state >= EXPANSION" give you that? Bernd
On 09/23/15 14:51, Bernd Schmidt wrote: > On 09/23/2015 08:42 PM, Nathan Sidwell wrote: > >> As I feared, builtin folding occurs in several places. In particular >> its first call is very early on in the host compiler, which is far too >> soon. >> >> We have to defer folding until we know whether we're doing host or >> device compilation. > > Doesn't something like "symtab->state >= EXPANSION" give you that? I don't know. It doesn't seem to me to be a good idea for the builtin expanders to be context-sensitive. nathan
On 09/23/15 14:58, Nathan Sidwell wrote: > On 09/23/15 14:51, Bernd Schmidt wrote: >> On 09/23/2015 08:42 PM, Nathan Sidwell wrote: >> >>> As I feared, builtin folding occurs in several places. In particular >>> its first call is very early on in the host compiler, which is far too >>> soon. >>> >>> We have to defer folding until we know whether we're doing host or >>> device compilation. >> >> Doesn't something like "symtab->state >= EXPANSION" give you that? I've tried limiting expansion by checking symtab->state. I have been unable to succeed. It either expands too early in the host compiler, or it doesn't get expanded at all and one ends up with an RTL call to the library function. For instance there doesn't appear to be call to fold builtins when state == EXPANSION. lesser values are present in the host compiler before LTO write out, AFAICT. nathan
On 09/25/2015 12:38 AM, Nathan Sidwell wrote: > On 09/23/15 14:58, Nathan Sidwell wrote: >> On 09/23/15 14:51, Bernd Schmidt wrote: >>> On 09/23/2015 08:42 PM, Nathan Sidwell wrote: >>>> We have to defer folding until we know whether we're doing host or >>>> device compilation. >>> >>> Doesn't something like "symtab->state >= EXPANSION" give you that? > > I've tried limiting expansion by checking symtab->state. I have been > unable to succeed. > > It either expands too early in the host compiler, or it doesn't get > expanded at all and one ends up with an RTL call to the library > function. For instance there doesn't appear to be call to fold > builtins when state == EXPANSION. lesser values are present in the host > compiler before LTO write out, AFAICT. That's a bit odd: Breakpoint 5, (anonymous namespace)::pass_fold_builtins::execute (this=0x1ce89a0, fun=0x7ffff0858348) at ../../git/gcc/tree-ssa-ccp.c:2722 [...] (gdb) p stmt $3 = (gimple *) 0x7ffff0736d80 (gdb) pgg warning: Expression is not an assignment (and might have no effect) # .MEM_2 = VDEF <.MEM_1(D)> _3 = acc_on_device (123); (gdb) p symtab->state $4 = EXPANSION On the other hand, it's not considered a builtin: (gdb) p gimple_call_builtin_p(stmt, BUILT_IN_ACC_ON_DEVICE) $6 = false This is the c-c++-common/goacc/acc_on_device-2.c testcase. Is that expected to be handled? If I change it to use __builtin_acc_on_device, I can step right into Breakpoint 8, fold_call_stmt (stmt=0x7ffff0736e10, ignore=false) at ../../git/gcc/builtins.c:12277 12277 tree ret = NULL_TREE; Maybe you were compiling without optimization? In that case expand_builtin_acc_on_device (which already exists) should still end up doing the right thing. In no case should you see a RTL call to a function, that indicates that something else went wrong. Can you send me the patch you tried (and possibly a testcase you expect to be handled), I'll see if I can find out what's going on. Bernd
2015-09-22 Nathan Sidwell <nathan@codesourcery.com> Cesar Philippidis <cesar@codesourcery.com> * omp-low.h (get_oacc_fn_attrib): Declare. * omp-low.c (get_oacc_fn_attrib): New. (oacc_xform_on_device): New. (execute_oacc_device_lower): New pass. (pass_data_oacc_device_lower): New. (pass_oacc_device_lower): New. (make_pass_oacc_device_lower): New. * tree-pass.h (make_pass_oacc_device_lower): Declare. * passes.def: Add pass_oacc_transform. * doc/invoke.texi: Document -fdump-tree-oaccdevlow. Index: tree-pass.h =================================================================== --- tree-pass.h (revision 227968) +++ tree-pass.h (working copy) @@ -406,6 +406,7 @@ extern gimple_opt_pass *make_pass_lower_ extern gimple_opt_pass *make_pass_diagnose_omp_blocks (gcc::context *ctxt); extern gimple_opt_pass *make_pass_expand_omp (gcc::context *ctxt); extern gimple_opt_pass *make_pass_expand_omp_ssa (gcc::context *ctxt); +extern gimple_opt_pass *make_pass_oacc_device_lower (gcc::context *ctxt); extern gimple_opt_pass *make_pass_object_sizes (gcc::context *ctxt); extern gimple_opt_pass *make_pass_strlen (gcc::context *ctxt); extern gimple_opt_pass *make_pass_fold_builtins (gcc::context *ctxt); Index: passes.def =================================================================== --- passes.def (revision 227968) +++ passes.def (working copy) @@ -148,6 +148,7 @@ along with GCC; see the file COPYING3. INSERT_PASSES_AFTER (all_passes) NEXT_PASS (pass_fixup_cfg); NEXT_PASS (pass_lower_eh_dispatch); + NEXT_PASS (pass_oacc_device_lower); NEXT_PASS (pass_all_optimizations); PUSH_INSERT_PASSES_WITHIN (pass_all_optimizations) NEXT_PASS (pass_remove_cgraph_callee_edges); Index: doc/invoke.texi =================================================================== --- doc/invoke.texi (revision 227968) +++ doc/invoke.texi (working copy) @@ -7179,6 +7179,11 @@ is made by appending @file{.slp} to the Dump each function after Value Range Propagation (VRP). The file name is made by appending @file{.vrp} to the source file name. +@item oaccdevlow +@opindex fdump-tree-oaccdevlow +Dump each function after applying device-specific OpenACC transformations. +The file name is made by appending @file{.oaccdevlow} to the source file name. + @item all @opindex fdump-tree-all Enable all the available tree dumps with the flags provided in this option. Index: omp-low.c =================================================================== --- omp-low.c (revision 227968) +++ omp-low.c (working copy) @@ -8860,6 +8860,16 @@ expand_omp_atomic (struct omp_region *re expand_omp_atomic_mutex (load_bb, store_bb, addr, loaded_val, stored_val); } +#define OACC_FN_ATTRIB "oacc function" + +/* Retrieve the oacc function attrib and return it. Non-oacc + functions will return NULL. */ + +tree +get_oacc_fn_attrib (tree fn) +{ + return lookup_attribute (OACC_FN_ATTRIB, DECL_ATTRIBUTES (fn)); +} /* Expand the GIMPLE_OMP_TARGET starting at REGION. */ @@ -13909,4 +13919,131 @@ omp_finish_file (void) } } +/* Transform an acc_on_device call. OpenACC 2.0a requires this folded at + compile time for constant operands. We always fold it. In an + offloaded function we're never 'none'. */ + +static void +oacc_xform_on_device (gimple *call) +{ + tree arg = gimple_call_arg (call, 0); + unsigned val = GOMP_DEVICE_HOST; + +#ifdef ACCEL_COMPILER + val = GOMP_DEVICE_NOT_HOST; +#endif + tree result = build2 (EQ_EXPR, boolean_type_node, arg, + build_int_cst (integer_type_node, val)); +#ifdef ACCEL_COMPILER + { + tree dev = build2 (EQ_EXPR, boolean_type_node, arg, + build_int_cst (integer_type_node, + ACCEL_COMPILER_acc_device)); + result = build2 (TRUTH_OR_EXPR, boolean_type_node, result, dev); + } +#endif + result = fold_convert (integer_type_node, result); + tree lhs = gimple_call_lhs (call); + gimple_seq seq = NULL; + + push_gimplify_context (true); + gimplify_assign (lhs, result, &seq); + pop_gimplify_context (NULL); + + gimple_stmt_iterator gsi = gsi_for_stmt (call); + gsi_replace_with_seq (&gsi, seq, false); +} + +/* Main entry point for oacc transformations which run on the device + compiler after LTO, so we know what the target device is at this + point (including the host fallback). */ + +static unsigned int +execute_oacc_device_lower () +{ + tree attrs = get_oacc_fn_attrib (current_function_decl); + + if (!attrs) + /* Not an offloaded function. */ + return 0; + + basic_block bb; + FOR_ALL_BB_FN (bb, cfun) + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); + !gsi_end_p (gsi); gsi_next (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + bool rescan = false; + + if (!is_gimple_call (stmt)) + continue; + + /* Rewind to allow rescan. */ + gsi_prev (&gsi); + + gcall *call = as_a <gcall *> (stmt); + + if (gimple_call_builtin_p (call, BUILT_IN_ACC_ON_DEVICE)) + /* acc_on_device must be evaluated at compile time for + constant arguments. */ + { + oacc_xform_on_device (call); + rescan = true; + } + + if (gsi_end_p (gsi)) + /* We rewound past the beginning of the BB. */ + gsi = gsi_start_bb (bb); + + if (!rescan) + /* Undo the rewind, so we don't get stuck infinitely. */ + gsi_next (&gsi); + } + + return 0; +} + +namespace { + +const pass_data pass_data_oacc_device_lower = +{ + GIMPLE_PASS, /* type */ + "oaccdevlow", /* name */ + OPTGROUP_NONE, /* optinfo_flags */ + TV_NONE, /* tv_id */ + PROP_cfg, /* properties_required */ + 0 /* Possibly PROP_gimple_eomp. */, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + TODO_update_ssa | TODO_cleanup_cfg, /* todo_flags_finish */ +}; + +class pass_oacc_device_lower : public gimple_opt_pass +{ +public: + pass_oacc_device_lower (gcc::context *ctxt) + : gimple_opt_pass (pass_data_oacc_device_lower, ctxt) + {} + + /* opt_pass methods: */ + virtual unsigned int execute (function *) + { + bool gate = (flag_openacc != 0 && !seen_error ()); + + if (!gate) + return 0; + + return execute_oacc_device_lower (); + } + +}; // class pass_oacc_transform + +} // anon namespace + +gimple_opt_pass * +make_pass_oacc_device_lower (gcc::context *ctxt) +{ + return new pass_oacc_device_lower (ctxt); +} + #include "gt-omp-low.h" Index: omp-low.h =================================================================== --- omp-low.h (revision 227968) +++ omp-low.h (working copy) @@ -28,6 +28,7 @@ extern void free_omp_regions (void); extern tree omp_reduction_init (tree, tree); extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *); extern void omp_finish_file (void); +extern tree get_oacc_fn_attrib (tree); extern GTY(()) vec<tree, va_gc> *offload_funcs; extern GTY(()) vec<tree, va_gc> *offload_vars;