diff mbox

Expand oacc kernels after pass_fre (was: [PATCH, 1/8] Expand oacc kernels after pass_build_ealias)

Message ID 87iocp1d8o.fsf@kepler.schwinge.homeip.net
State New
Headers show

Commit Message

Thomas Schwinge April 21, 2015, 7:40 p.m. UTC
Hi!

On Tue, 25 Nov 2014 12:22:02 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 24-11-14 11:56, Tom de Vries wrote:
> > On 15-11-14 18:19, Tom de Vries wrote:
> >> On 15-11-14 13:14, Tom de Vries wrote:
> >>> I'm submitting a patch series with initial support for the oacc kernels
> >>> directive.
> >>>
> >>> The patch series uses pass_parallelize_loops to implement parallelization of
> >>> loops in the oacc kernels region.
> >>>
> >>> The patch series consists of these 8 patches:
> >>> ...
> >>>      1  Expand oacc kernels after pass_build_ealias
> >>>      2  Add pass_oacc_kernels
> >>>      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
> >>>      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
> >>>      5  Add pass_loop_im to pass_oacc_kernels
> >>>      6  Add pass_ccp to pass_oacc_kernels
> >>>      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
> >>>      8  Do simple omp lowering for no address taken var
> >>> ...
> >>
> >> This patch moves omp expansion of the oacc kernels directive to after
> >> pass_build_ealias.
> >>
> >> The rationale is that in order to use pass_parallelize_loops for analysis and
> >> transformation of an oacc kernels region, we postpone omp expansion of that
> >> region until the earliest point in the pass list where enough information is
> >> availabe to run pass_parallelize_loops, in other words, after pass_build_ealias.
> >>
> >> The patch postpones expansion in expand_omp, and ensures expansion by adding
> >> pass_expand_omp_ssa:
> >> - after pass_build_ealias, and
> >> - after pass_all_early_optimizations for the case we're not optimizing.
> >>
> >> In order to make sure the oacc kernels region arrives at pass_expand_omp_ssa,
> >> the way it left expand_omp, the patch makes pass_ccp and pass_forwprop aware of
> >> lowered omp code, to handle it conservatively.
> >>
> >> The patch contains changes in expand_omp_target to deal with ssa-code, similar
> >> to what is already present in expand_omp_taskreg.
> >>
> >> Furthermore, the patch forces the .omp_data_sizes and .omp_data_kinds to not be
> >> static for oacc kernels. It does this to get some references to .omp_data_sizes
> >> and .omp_data_kinds in the ssa code.  Without these references, the definitions
> >> will be removed. The reference of the variables in GIMPLE_OACC_KERNELS is not
> >> enough to have them not removed. [ In vries/oacc-kernels, I used a BUILT_IN_USE
> >> kludge for this purpose ].
> >>
> >> Finally, at the end of pass_expand_omp_ssa we're left with SSA_NAMEs in the
> >> original function of which the definition has been removed (as in moved to the
> >> split off function). TODO_remove_unused_locals takes care of some of them, but
> >> not the anonymous ones. So the patch iterates over all SSA_NAMEs to find these
> >> dangling SSA_NAMEs and releases them.
> >>
> >
> > Reposting with small update: I've replaced the use of the rather generic
> > gimple_stmt_omp_lowering_p with the more specific gimple_stmt_omp_data_i_init_p.
> >
> > Bootstrapped and reg-tested in the same way as before.
> >
> 
> I've moved pass_expand_omp_ssa one down in the pass list, past pass_fre.
> 
> This allows fre to unify references to the same omp variable before entering 
> pass_oacc_kernels, which helps pass_lim in pass_oacc_kernels.
> 
> F.i. this reduction fragment:
> ...
>    # VUSE <.MEM_8>
>    # PT = { D.2282 }
>    _67 = .omp_data_i_59->sumD.2270;
>    # VUSE <.MEM_8>
>    _68 = *_67;
> 
>    _70 = _66 + _68;
> 
>    # VUSE <.MEM_8>
>    # PT = { D.2282 }
>    _69 = .omp_data_i_59->sumD.2270;
>    # .MEM_71 = VDEF <.MEM_8>
>    *_69 = _70;
> ...
> 
> is transformed by fre into:
> ...
>    # VUSE <.MEM_8>
>    # PT = { D.2282 }
>    _67 = .omp_data_i_59->sumD.2270;
>    # VUSE <.MEM_8>
>    _68 = *_67;
> 
>    _70 = _66 + _68;
> 
>    # .MEM_71 = VDEF <.MEM_8>
>    *_67 = _70;
> ...
> 
> In order for pass_fre to respect the kernels region boundaries, I've added a 
> change in tree-ssa-sccvn.c:visit_use to handle the .omp_data_i init conservatively.
> 
> Bootstrapped and reg-tested as before.
> 
> OK for trunk?

Committed to gomp-4_0-branch in r222279:

commit 93557ac5e30c26ee1a3d1255e31265b287171a0d
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Apr 21 19:37:19 2015 +0000

    Expand oacc kernels after pass_fre
    
    	gcc/
    	* omp-low.c: Include gimple-pretty-print.h.
    	(release_first_vuse_in_edge_dest): New function.
    	(expand_omp_target): When not in ssa, don't split off oacc kernels
    	region, clear PROP_gimple_eomp in cfun->curr_properties to force later
    	expanssion, and add GOACC_kernels_internal call.
    	When in ssa, split off oacc kernels and convert GOACC_kernels_internal
    	into GOACC_kernels call.  Handle ssa-code.
    	(pass_data_expand_omp): Don't set PROP_gimple_eomp unconditionally in
    	properties_provided field.
    	(pass_expand_omp::execute): Set PROP_gimple_eomp in
    	cfun->curr_properties tentatively.
    	(pass_data_expand_omp_ssa): Add TODO_remove_unused_locals to
    	todo_flags_finish field.
    	(pass_expand_omp_ssa::execute): Release dangling SSA_NAMEs after calling
    	execute_expand_omp.
    	(gimple_stmt_ssa_operand_references_var_p)
    	(gimple_stmt_omp_data_i_init_p): New function.
    	* omp-low.h (gimple_stmt_omp_data_i_init_p): Declare.
    	* passes.def: Add pass_expand_omp_ssa after pass_fre.  Add
    	pass_expand_omp_ssa after pass_all_early_optimizations.
    	* tree-ssa-ccp.c: Include omp-low.h.
    	(surely_varying_stmt_p, ccp_visit_stmt): Handle .omp_data_i init
    	conservatively.
    	* tree-ssa-forwprop.c: Include omp-low.h.
    	(pass_forwprop::execute): Handle .omp_data_i init conservatively.
    	* tree-ssa-sccvn.c: Include omp-low.h.
    	(visit_use): Handle .omp_data_i init conservatively.
    	* cgraph.c (cgraph_node::release_body): Don't release offloadable
    	functions.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@222279 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp      |   30 +++++++
 gcc/cgraph.c            |    9 ++
 gcc/omp-low.c           |  214 ++++++++++++++++++++++++++++++++++++++++++++---
 gcc/omp-low.h           |    1 +
 gcc/passes.def          |    2 +
 gcc/tree-ssa-ccp.c      |    6 ++
 gcc/tree-ssa-forwprop.c |    4 +-
 gcc/tree-ssa-sccvn.c    |    4 +-
 8 files changed, 257 insertions(+), 13 deletions(-)



Grüße,
 Thomas

Comments

Richard Biener April 22, 2015, 7:36 a.m. UTC | #1
On Tue, 21 Apr 2015, Thomas Schwinge wrote:

> Hi!
> 
> On Tue, 25 Nov 2014 12:22:02 +0100, Tom de Vries <Tom_deVries@mentor.com> wrote:
> > On 24-11-14 11:56, Tom de Vries wrote:
> > > On 15-11-14 18:19, Tom de Vries wrote:
> > >> On 15-11-14 13:14, Tom de Vries wrote:
> > >>> I'm submitting a patch series with initial support for the oacc kernels
> > >>> directive.
> > >>>
> > >>> The patch series uses pass_parallelize_loops to implement parallelization of
> > >>> loops in the oacc kernels region.
> > >>>
> > >>> The patch series consists of these 8 patches:
> > >>> ...
> > >>>      1  Expand oacc kernels after pass_build_ealias
> > >>>      2  Add pass_oacc_kernels
> > >>>      3  Add pass_ch_oacc_kernels to pass_oacc_kernels
> > >>>      4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
> > >>>      5  Add pass_loop_im to pass_oacc_kernels
> > >>>      6  Add pass_ccp to pass_oacc_kernels
> > >>>      7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
> > >>>      8  Do simple omp lowering for no address taken var
> > >>> ...
> > >>
> > >> This patch moves omp expansion of the oacc kernels directive to after
> > >> pass_build_ealias.
> > >>
> > >> The rationale is that in order to use pass_parallelize_loops for analysis and
> > >> transformation of an oacc kernels region, we postpone omp expansion of that
> > >> region until the earliest point in the pass list where enough information is
> > >> availabe to run pass_parallelize_loops, in other words, after pass_build_ealias.
> > >>
> > >> The patch postpones expansion in expand_omp, and ensures expansion by adding
> > >> pass_expand_omp_ssa:
> > >> - after pass_build_ealias, and
> > >> - after pass_all_early_optimizations for the case we're not optimizing.
> > >>
> > >> In order to make sure the oacc kernels region arrives at pass_expand_omp_ssa,
> > >> the way it left expand_omp, the patch makes pass_ccp and pass_forwprop aware of
> > >> lowered omp code, to handle it conservatively.
> > >>
> > >> The patch contains changes in expand_omp_target to deal with ssa-code, similar
> > >> to what is already present in expand_omp_taskreg.
> > >>
> > >> Furthermore, the patch forces the .omp_data_sizes and .omp_data_kinds to not be
> > >> static for oacc kernels. It does this to get some references to .omp_data_sizes
> > >> and .omp_data_kinds in the ssa code.  Without these references, the definitions
> > >> will be removed. The reference of the variables in GIMPLE_OACC_KERNELS is not
> > >> enough to have them not removed. [ In vries/oacc-kernels, I used a BUILT_IN_USE
> > >> kludge for this purpose ].
> > >>
> > >> Finally, at the end of pass_expand_omp_ssa we're left with SSA_NAMEs in the
> > >> original function of which the definition has been removed (as in moved to the
> > >> split off function). TODO_remove_unused_locals takes care of some of them, but
> > >> not the anonymous ones. So the patch iterates over all SSA_NAMEs to find these
> > >> dangling SSA_NAMEs and releases them.
> > >>
> > >
> > > Reposting with small update: I've replaced the use of the rather generic
> > > gimple_stmt_omp_lowering_p with the more specific gimple_stmt_omp_data_i_init_p.
> > >
> > > Bootstrapped and reg-tested in the same way as before.
> > >
> > 
> > I've moved pass_expand_omp_ssa one down in the pass list, past pass_fre.
> > 
> > This allows fre to unify references to the same omp variable before entering 
> > pass_oacc_kernels, which helps pass_lim in pass_oacc_kernels.
> > 
> > F.i. this reduction fragment:
> > ...
> >    # VUSE <.MEM_8>
> >    # PT = { D.2282 }
> >    _67 = .omp_data_i_59->sumD.2270;
> >    # VUSE <.MEM_8>
> >    _68 = *_67;
> > 
> >    _70 = _66 + _68;
> > 
> >    # VUSE <.MEM_8>
> >    # PT = { D.2282 }
> >    _69 = .omp_data_i_59->sumD.2270;
> >    # .MEM_71 = VDEF <.MEM_8>
> >    *_69 = _70;
> > ...
> > 
> > is transformed by fre into:
> > ...
> >    # VUSE <.MEM_8>
> >    # PT = { D.2282 }
> >    _67 = .omp_data_i_59->sumD.2270;
> >    # VUSE <.MEM_8>
> >    _68 = *_67;
> > 
> >    _70 = _66 + _68;
> > 
> >    # .MEM_71 = VDEF <.MEM_8>
> >    *_67 = _70;
> > ...
> > 
> > In order for pass_fre to respect the kernels region boundaries, I've added a 
> > change in tree-ssa-sccvn.c:visit_use to handle the .omp_data_i init conservatively.
> > 
> > Bootstrapped and reg-tested as before.
> > 
> > OK for trunk?
> 
> Committed to gomp-4_0-branch in r222279:
> 
> commit 93557ac5e30c26ee1a3d1255e31265b287171a0d
> Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
> Date:   Tue Apr 21 19:37:19 2015 +0000
> 
>     Expand oacc kernels after pass_fre
>     
>     	gcc/
>     	* omp-low.c: Include gimple-pretty-print.h.
>     	(release_first_vuse_in_edge_dest): New function.
>     	(expand_omp_target): When not in ssa, don't split off oacc kernels
>     	region, clear PROP_gimple_eomp in cfun->curr_properties to force later
>     	expanssion, and add GOACC_kernels_internal call.
>     	When in ssa, split off oacc kernels and convert GOACC_kernels_internal
>     	into GOACC_kernels call.  Handle ssa-code.
>     	(pass_data_expand_omp): Don't set PROP_gimple_eomp unconditionally in
>     	properties_provided field.
>     	(pass_expand_omp::execute): Set PROP_gimple_eomp in
>     	cfun->curr_properties tentatively.
>     	(pass_data_expand_omp_ssa): Add TODO_remove_unused_locals to
>     	todo_flags_finish field.
>     	(pass_expand_omp_ssa::execute): Release dangling SSA_NAMEs after calling
>     	execute_expand_omp.
>     	(gimple_stmt_ssa_operand_references_var_p)
>     	(gimple_stmt_omp_data_i_init_p): New function.
>     	* omp-low.h (gimple_stmt_omp_data_i_init_p): Declare.
>     	* passes.def: Add pass_expand_omp_ssa after pass_fre.  Add
>     	pass_expand_omp_ssa after pass_all_early_optimizations.
>     	* tree-ssa-ccp.c: Include omp-low.h.
>     	(surely_varying_stmt_p, ccp_visit_stmt): Handle .omp_data_i init
>     	conservatively.
>     	* tree-ssa-forwprop.c: Include omp-low.h.
>     	(pass_forwprop::execute): Handle .omp_data_i init conservatively.
>     	* tree-ssa-sccvn.c: Include omp-low.h.
>     	(visit_use): Handle .omp_data_i init conservatively.
>     	* cgraph.c (cgraph_node::release_body): Don't release offloadable
>     	functions.
>     
>     git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@222279 138bc75d-0d04-0410-961f-82ee72b054a4
> ---
>  gcc/ChangeLog.gomp      |   30 +++++++
>  gcc/cgraph.c            |    9 ++
>  gcc/omp-low.c           |  214 ++++++++++++++++++++++++++++++++++++++++++++---
>  gcc/omp-low.h           |    1 +
>  gcc/passes.def          |    2 +
>  gcc/tree-ssa-ccp.c      |    6 ++
>  gcc/tree-ssa-forwprop.c |    4 +-
>  gcc/tree-ssa-sccvn.c    |    4 +-
>  8 files changed, 257 insertions(+), 13 deletions(-)
> 
> diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
> index 7885189..1f86160 100644
> --- gcc/ChangeLog.gomp
> +++ gcc/ChangeLog.gomp
> @@ -1,5 +1,35 @@
>  2015-04-21  Tom de Vries  <tom@codesourcery.com>
>  
> +	* omp-low.c: Include gimple-pretty-print.h.
> +	(release_first_vuse_in_edge_dest): New function.
> +	(expand_omp_target): When not in ssa, don't split off oacc kernels
> +	region, clear PROP_gimple_eomp in cfun->curr_properties to force later
> +	expanssion, and add GOACC_kernels_internal call.
> +	When in ssa, split off oacc kernels and convert GOACC_kernels_internal
> +	into GOACC_kernels call.  Handle ssa-code.
> +	(pass_data_expand_omp): Don't set PROP_gimple_eomp unconditionally in
> +	properties_provided field.
> +	(pass_expand_omp::execute): Set PROP_gimple_eomp in
> +	cfun->curr_properties tentatively.
> +	(pass_data_expand_omp_ssa): Add TODO_remove_unused_locals to
> +	todo_flags_finish field.
> +	(pass_expand_omp_ssa::execute): Release dangling SSA_NAMEs after calling
> +	execute_expand_omp.
> +	(gimple_stmt_ssa_operand_references_var_p)
> +	(gimple_stmt_omp_data_i_init_p): New function.
> +	* omp-low.h (gimple_stmt_omp_data_i_init_p): Declare.
> +	* passes.def: Add pass_expand_omp_ssa after pass_fre.  Add
> +	pass_expand_omp_ssa after pass_all_early_optimizations.
> +	* tree-ssa-ccp.c: Include omp-low.h.
> +	(surely_varying_stmt_p, ccp_visit_stmt): Handle .omp_data_i init
> +	conservatively.
> +	* tree-ssa-forwprop.c: Include omp-low.h.
> +	(pass_forwprop::execute): Handle .omp_data_i init conservatively.
> +	* tree-ssa-sccvn.c: Include omp-low.h.
> +	(visit_use): Handle .omp_data_i init conservatively.
> +	* cgraph.c (cgraph_node::release_body): Don't release offloadable
> +	functions.
> +
>  	* builtin-attrs.def (DOT_DOT_DOT_r_r_r): Add DEF_ATTR_FOR_STRING.
>  	(ATTR_FNSPEC_DOT_DOT_DOT_r_r_r_NOTHROW_LIST): Add
>  	DEF_ATTR_TREE_LIST.
> diff --git gcc/cgraph.c gcc/cgraph.c
> index e099856..c608d7e 100644
> --- gcc/cgraph.c
> +++ gcc/cgraph.c
> @@ -1706,6 +1706,15 @@ release_function_body (tree decl)
>  void
>  cgraph_node::release_body (bool keep_arguments)
>  {
> +  /* The omp-expansion of the oacc kernels directive is post-poned till after
> +     all_small_ipa_passes.  That means pass_ipa_free_lang_data, which tries to
> +     release the body of the offload function, is run before omp_expand_target 
> +     can process the oacc kernels directive,  and omp_expand_target would crash
> +     trying to access the body.  This snippet works around this problem.
> +     FIXME: This should probably be fixed in a different way.  */
> +  if (offloadable)
> +    return;
> +
>    ipa_transforms_to_apply.release ();
>    if (!used_as_abstract_origin && symtab->state != PARSING)
>      {
> diff --git gcc/omp-low.c gcc/omp-low.c
> index 4134f3d..16d9a5e 100644
> --- gcc/omp-low.c
> +++ gcc/omp-low.c
> @@ -108,6 +108,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "context.h"
>  #include "lto-section-names.h"
>  #include "gomp-constants.h"
> +#include "gimple-pretty-print.h"
>  
>  
>  /* Lowering of OMP parallel and workshare constructs proceeds in two
> @@ -5353,6 +5354,35 @@ expand_omp_build_assign (gimple_stmt_iterator *gsi_p, tree to, tree from)
>      }
>  }
>  
> +static void
> +release_first_vuse_in_edge_dest (edge e)

All functions need a comment with documentation.

> +{
> +  gimple_stmt_iterator i;
> +  basic_block bb = e->dest;
> +
> +  for (i = gsi_start_phis (bb); !gsi_end_p (i); gsi_next (&i))
> +    {
> +      gimple phi = gsi_stmt (i);
> +      tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
> +
> +      if (!virtual_operand_p (arg))
> +	continue;
> +
> +      mark_virtual_operand_for_renaming (arg);
> +      return;
> +    }
> +
> +  for (i = gsi_start_bb (bb); !gsi_end_p (i); gsi_next_nondebug (&i))
> +    {
> +      gimple stmt = gsi_stmt (i);
> +      if (gimple_vuse (stmt) == NULL_TREE)
> +	continue;
> +
> +      mark_virtual_operand_for_renaming (gimple_vuse (stmt));
> +      return;
> +    }
> +}
> +
>  /* Expand the OpenMP parallel or task directive starting at REGION.  */
>  
>  static void
> @@ -8770,8 +8800,11 @@ expand_omp_target (struct omp_region *region)
>    gimple stmt;
>    edge e;
>    bool offloaded, data_region;
> +  bool do_emit_library_call = true;
> +  bool do_splitoff = true;
>  
>    entry_stmt = as_a <gomp_target *> (last_stmt (region->entry));
> +
>    new_bb = region->entry;
>  
>    offloaded = is_gimple_omp_offloaded (entry_stmt);
> @@ -8804,12 +8837,48 @@ expand_omp_target (struct omp_region *region)
>    /* Supported by expand_omp_taskreg, but not here.  */
>    if (child_cfun != NULL)
>      gcc_checking_assert (!child_cfun->cfg);
> -  gcc_checking_assert (!gimple_in_ssa_p (cfun));
>  
>    entry_bb = region->entry;
>    exit_bb = region->exit;
>  
> -  if (offloaded)
> +  if (gimple_omp_target_kind (entry_stmt) == GF_OMP_TARGET_KIND_OACC_KERNELS)
> +    {
> +      if (!gimple_in_ssa_p (cfun))
> +	{
> +	  /* We need to do analysis and optimizations on the kernels region
> +	     before splitoff.  Since that's hard to do on low gimple, we
> +	     postpone the splitoff until we're in SSA.
> +	     However, we do the emit of the corresponding function call already,
> +	     in order to keep the arguments of the call alive until the
> +	     splitoff.
> +	     Since at this point the function that is called is empty, we can
> +	     model the function as BUILT_IN_GOACC_KERNELS_INTERNAL, which marks
> +	     some of it's function arguments as non-escaping, so it acts less
> +	     as an optimization barrier.  */
> +	  do_splitoff = false;
> +	  cfun->curr_properties &= ~PROP_gimple_eomp;
> +	}
> +      else
> +	{
> +	  /* Don't emit the library call.  We've already done that.  */
> +	  do_emit_library_call = false;
> +	  /* Transform BUILT_IN_GOACC_KERNELS_INTERNAL into
> +	     BUILT_IN_GOACC_KERNELS_INTERNAL.  Now that the function body will be
> +	     split off, we can no longer regard the omp_data_array reference as
> +	     non-escaping.  */
> +	  gsi = gsi_last_bb (entry_bb);
> +	  gsi_prev (&gsi);
> +	  gcall *call = as_a <gcall *> (gsi_stmt (gsi));
> +	  gcc_assert (gimple_call_builtin_p (call, BUILT_IN_GOACC_KERNELS_INTERNAL));
> +	  tree fndecl = builtin_decl_explicit (BUILT_IN_GOACC_KERNELS);
> +	  gimple_call_set_fndecl (call, fndecl);
> +	  gimple_call_set_fntype (call, TREE_TYPE (fndecl));
> +	  gimple_call_reset_alias_info (call);
> +	}
> +    }
> +
> +  if (offloaded
> +      && do_splitoff)
>      {
>        unsigned srcidx, dstidx, num;
>  
> @@ -8831,7 +8900,7 @@ expand_omp_target (struct omp_region *region)
>  	{
>  	  basic_block entry_succ_bb = single_succ (entry_bb);
>  	  gimple_stmt_iterator gsi;
> -	  tree arg;
> +	  tree arg, narg;
>  	  gimple tgtcopy_stmt = NULL;
>  	  tree sender = TREE_VEC_ELT (data_arg, 0);
>  
> @@ -8861,8 +8930,27 @@ expand_omp_target (struct omp_region *region)
>  	  gcc_assert (tgtcopy_stmt != NULL);
>  	  arg = DECL_ARGUMENTS (child_fn);
>  
> -	  gcc_assert (gimple_assign_lhs (tgtcopy_stmt) == arg);
> -	  gsi_remove (&gsi, true);
> +	  if (!gimple_in_ssa_p (cfun))
> +	    {
> +	      gcc_assert (gimple_assign_lhs (tgtcopy_stmt) == arg);
> +	      gsi_remove (&gsi, true);
> +	    }
> +	  else
> +	    {
> +	      gcc_assert (SSA_NAME_VAR (gimple_assign_lhs (tgtcopy_stmt))
> +			  == arg);
> +
> +	      /* If we are in ssa form, we must load the value from the default
> +		 definition of the argument.  That should not be defined now,
> +		 since the argument is not used uninitialized.  */
> +	      gcc_assert (ssa_default_def (cfun, arg) == NULL);
> +	      narg = make_ssa_name (arg, gimple_build_nop ());
> +	      set_ssa_default_def (cfun, arg, narg);
> +	      /* ?? Is setting the subcode really necessary ??  */
> +	      gimple_omp_set_subcode (tgtcopy_stmt, TREE_CODE (narg));
> +	      gimple_assign_set_rhs1 (tgtcopy_stmt, narg);
> +	      update_stmt (tgtcopy_stmt);
> +	    }
>  	}
>  
>        /* Declare local variables needed in CHILD_CFUN.  */
> @@ -8905,11 +8993,23 @@ expand_omp_target (struct omp_region *region)
>  	  stmt = gimple_build_return (NULL);
>  	  gsi_insert_after (&gsi, stmt, GSI_SAME_STMT);
>  	  gsi_remove (&gsi, true);
> +
> +	  /* A vuse in single_succ (exit_bb) may use a vdef from the region
> +	     which is about to be split off.  Mark the vdef for renaming.  */
> +	  release_first_vuse_in_edge_dest (single_succ_edge (exit_bb));
>  	}
>  
>        /* Move the offloading region into CHILD_CFUN.  */
>  
> -      block = gimple_block (entry_stmt);
> +      if (gimple_in_ssa_p (cfun))
> +	{
> +	  init_tree_ssa (child_cfun);
> +	  init_ssa_operands (child_cfun);
> +	  child_cfun->gimple_df->in_ssa_p = true;
> +	  block = NULL_TREE;
> +	}
> +      else
> +	block = gimple_block (entry_stmt);
>  
>        new_bb = move_sese_region_to_fn (child_cfun, entry_bb, exit_bb, block);
>        if (exit_bb)
> @@ -8969,9 +9069,18 @@ expand_omp_target (struct omp_region *region)
>  	  if (changed)
>  	    cleanup_tree_cfg ();
>  	}
> +      if (gimple_in_ssa_p (cfun))
> +	update_ssa (TODO_update_ssa);
>        pop_cfun ();
>      }
>  
> +  if (!do_emit_library_call)
> +    {
> +      if (gimple_in_ssa_p (cfun))
> +	update_ssa (TODO_update_ssa_only_virtuals);
> +      return;
> +    }
> +
>    /* Emit a library call to launch the offloading region, or do data
>       transfers.  */
>    tree t1, t2, t3, t4, device, cond, c, clauses;
> @@ -8993,7 +9102,7 @@ expand_omp_target (struct omp_region *region)
>        start_ix = BUILT_IN_GOACC_PARALLEL;
>        break;
>      case GF_OMP_TARGET_KIND_OACC_KERNELS:
> -      start_ix = BUILT_IN_GOACC_KERNELS;
> +      start_ix = BUILT_IN_GOACC_KERNELS_INTERNAL;
>        break;
>      case GF_OMP_TARGET_KIND_OACC_DATA:
>        start_ix = BUILT_IN_GOACC_DATA_START;
> @@ -9128,6 +9237,7 @@ expand_omp_target (struct omp_region *region)
>      case BUILT_IN_GOACC_DATA_START:
>      case BUILT_IN_GOACC_ENTER_EXIT_DATA:
>      case BUILT_IN_GOACC_KERNELS:
> +    case BUILT_IN_GOACC_KERNELS_INTERNAL:
>      case BUILT_IN_GOACC_PARALLEL:
>      case BUILT_IN_GOACC_UPDATE:
>        break;
> @@ -9146,6 +9256,7 @@ expand_omp_target (struct omp_region *region)
>      case BUILT_IN_GOMP_TARGET_UPDATE:
>        break;
>      case BUILT_IN_GOACC_KERNELS:
> +    case BUILT_IN_GOACC_KERNELS_INTERNAL:
>      case BUILT_IN_GOACC_PARALLEL:
>        {
>  	tree t_num_gangs, t_num_workers, t_vector_length;
> @@ -9249,6 +9360,8 @@ expand_omp_target (struct omp_region *region)
>        gcc_assert (g && gimple_code (g) == GIMPLE_OMP_RETURN);
>        gsi_remove (&gsi, true);
>      }
> +  if (gimple_in_ssa_p (cfun))
> +    update_ssa (TODO_update_ssa_only_virtuals);
>  }
>  
>  
> @@ -9503,7 +9616,7 @@ const pass_data pass_data_expand_omp =
>    OPTGROUP_NONE, /* optinfo_flags */
>    TV_NONE, /* tv_id */
>    PROP_gimple_any, /* properties_required */
> -  PROP_gimple_eomp, /* properties_provided */
> +  0 /* Possibly PROP_gimple_eomp.  */, /* properties_provided */
>    0, /* properties_destroyed */
>    0, /* todo_flags_start */
>    0, /* todo_flags_finish */
> @@ -9517,12 +9630,14 @@ public:
>    {}
>  
>    /* opt_pass methods: */
> -  virtual unsigned int execute (function *)
> +  virtual unsigned int execute (function *fun)
>      {
>        bool gate = ((flag_cilkplus != 0 || flag_openacc != 0 || flag_openmp != 0
>  		    || flag_openmp_simd != 0)
>  		   && !seen_error ());
>  
> +      fun->curr_properties |= PROP_gimple_eomp;
> +
>        /* This pass always runs, to provide PROP_gimple_eomp.
>  	 But often, there is nothing to do.  */
>        if (!gate)
> @@ -9553,7 +9668,8 @@ const pass_data pass_data_expand_omp_ssa =
>    PROP_gimple_eomp, /* properties_provided */
>    0, /* properties_destroyed */
>    0, /* todo_flags_start */
> -  TODO_cleanup_cfg | TODO_rebuild_alias, /* todo_flags_finish */
> +  TODO_cleanup_cfg | TODO_rebuild_alias
> +  | TODO_remove_unused_locals, /* todo_flags_finish */
>  };
>  
>  class pass_expand_omp_ssa : public gimple_opt_pass
> @@ -9568,7 +9684,48 @@ public:
>      {
>        return !(fun->curr_properties & PROP_gimple_eomp);
>      }
> -  virtual unsigned int execute (function *) { return execute_expand_omp (); }
> +  virtual unsigned int execute (function *)

Please move this out of the class body.

> +    {
> +      unsigned res = execute_expand_omp ();
> +
> +      /* After running pass_expand_omp_ssa to expand the oacc kernels
> +	 directive, we are left in the original function with anonymous
> +	 SSA_NAMEs, with a defining statement that has been deleted.  This
> +	 pass finds those SSA_NAMEs and releases them.
> +	 TODO: Either fix this elsewhere, or make the fix unnecessary.  */
> +      unsigned int i;
> +      for (i = 1; i < num_ssa_names; ++i)
> +	{
> +	  tree name = ssa_name (i);
> +	  if (name == NULL_TREE)
> +	    continue;
> +
> +	  gimple stmt = SSA_NAME_DEF_STMT (name);
> +	  bool found = false;
> +
> +	  ssa_op_iter op_iter;
> +	  def_operand_p def_p;
> +	  FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt, op_iter, SSA_OP_ALL_DEFS)
> +	    {
> +	      tree def = DEF_FROM_PTR (def_p);
> +	      if (def == name)
> +		{
> +		  found = true;
> +		  break;
> +		}
> +	    }
> +
> +	  if (!found)
> +	    {
> +	      if (dump_file)
> +		fprintf (dump_file, "Released dangling ssa name %u\n", i);
> +	      release_ssa_name (name);
> +	    }
> +	}
> +
> +      return res;
> +    }
> +  opt_pass * clone () { return new pass_expand_omp_ssa (m_ctxt); }
>  
>  }; // class pass_expand_omp_ssa
>  
> @@ -13728,4 +13885,39 @@ omp_finish_file (void)
>      }
>  }
>  
> +static bool
> +gimple_stmt_ssa_operand_references_var_p (gimple stmt, const char **varnames,
> +					  unsigned int nr_varnames,
> +					  unsigned int flags)

Missing comment.

> +{
> +  tree use;
> +  ssa_op_iter iter;
> +  const char *s;
> +
> +  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, flags)
> +    {
> +      if (SSA_NAME_IDENTIFIER (use) == NULL_TREE)
> +	continue;
> +      s = IDENTIFIER_POINTER (SSA_NAME_IDENTIFIER (use));
> +
> +      unsigned int i;
> +      for (i = 0; i < nr_varnames; ++i)
> +	if (strcmp (varnames[i], s) == 0)
> +	  return true;

Eh?  This surely is crap - you can't ever have semantics depend on
identifiers.

> +    }
> +
> +  return false;
> +}
> +
> +/* Return true if STMT is .omp_data_i init.  */
> +
> +bool
> +gimple_stmt_omp_data_i_init_p (gimple stmt)
> +{
> +  const char *varnames[] = { ".omp_data_i" };
> +  unsigned int nr_varnames = sizeof (varnames) / sizeof (varnames[0]);
> +  return gimple_stmt_ssa_operand_references_var_p (stmt, varnames, nr_varnames,
> +						   SSA_OP_DEF);

So no - this isn't possible this way and I suspect it's not reliable 
anyway.

> +}
> +
>  #include "gt-omp-low.h"
> diff --git gcc/omp-low.h gcc/omp-low.h
> index 8a4052e..3d30c3b 100644
> --- gcc/omp-low.h
> +++ gcc/omp-low.h
> @@ -28,6 +28,7 @@ extern void free_omp_regions (void);
>  extern tree omp_reduction_init (tree, tree);
>  extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *);
>  extern void omp_finish_file (void);
> +extern bool gimple_stmt_omp_data_i_init_p (gimple);
>  
>  extern GTY(()) vec<tree, va_gc> *offload_funcs;
>  extern GTY(()) vec<tree, va_gc> *offload_vars;
> diff --git gcc/passes.def gcc/passes.def
> index 2bc5dcd..db0dd18 100644
> --- gcc/passes.def
> +++ gcc/passes.def
> @@ -86,6 +86,7 @@ along with GCC; see the file COPYING3.  If not see
>  	     execute TODO_rebuild_alias at this point.  */
>  	  NEXT_PASS (pass_build_ealias);
>  	  NEXT_PASS (pass_fre);
> +	  NEXT_PASS (pass_expand_omp_ssa);
>  	  NEXT_PASS (pass_merge_phi);
>  	  NEXT_PASS (pass_cd_dce);
>  	  NEXT_PASS (pass_early_ipa_sra);
> @@ -99,6 +100,7 @@ along with GCC; see the file COPYING3.  If not see
>  	      late.  */
>  	  NEXT_PASS (pass_split_functions);
>        POP_INSERT_PASSES ()
> +      NEXT_PASS (pass_expand_omp_ssa);
>        NEXT_PASS (pass_release_ssa_names);
>        NEXT_PASS (pass_rebuild_cgraph_edges);
>        NEXT_PASS (pass_inline_parameters);
> diff --git gcc/tree-ssa-ccp.c gcc/tree-ssa-ccp.c
> index d45a3ff..46fe1c7 100644
> --- gcc/tree-ssa-ccp.c
> +++ gcc/tree-ssa-ccp.c
> @@ -172,6 +172,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "wide-int-print.h"
>  #include "builtins.h"
>  #include "tree-chkp.h"
> +#include "omp-low.h"
>  
>  
>  /* Possible lattice values.  */
> @@ -796,6 +797,9 @@ surely_varying_stmt_p (gimple stmt)
>        && gimple_code (stmt) != GIMPLE_CALL)
>      return true;
>  
> +  if (gimple_stmt_omp_data_i_init_p (stmt))
> +    return true;
> +

No.

>    return false;
>  }
>  
> @@ -2329,6 +2333,8 @@ ccp_visit_stmt (gimple stmt, edge *taken_edge_p, tree *output_p)
>    switch (gimple_code (stmt))
>      {
>        case GIMPLE_ASSIGN:
> +	if (gimple_stmt_omp_data_i_init_p (stmt))
> +	  break;
>          /* If the statement is an assignment that produces a single
>             output value, evaluate its RHS to see if the lattice value of
>             its output has changed.  */
> diff --git gcc/tree-ssa-forwprop.c gcc/tree-ssa-forwprop.c
> index d8db20a..554a5a5 100644
> --- gcc/tree-ssa-forwprop.c
> +++ gcc/tree-ssa-forwprop.c
> @@ -85,6 +85,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-cfgcleanup.h"
>  #include "tree-into-ssa.h"
>  #include "cfganal.h"
> +#include "omp-low.h"
>  
>  /* This pass propagates the RHS of assignment statements into use
>     sites of the LHS of the assignment.  It's basically a specialized
> @@ -2155,7 +2156,8 @@ pass_forwprop::execute (function *fun)
>  	  tree lhs, rhs;
>  	  enum tree_code code;
>  
> -	  if (!is_gimple_assign (stmt))
> +	  if (!is_gimple_assign (stmt)
> +	      || gimple_stmt_omp_data_i_init_p (stmt))

No.

>  	    {
>  	      gsi_next (&gsi);
>  	      continue;
> diff --git gcc/tree-ssa-sccvn.c gcc/tree-ssa-sccvn.c
> index e417a15..449a615 100644
> --- gcc/tree-ssa-sccvn.c
> +++ gcc/tree-ssa-sccvn.c
> @@ -85,6 +85,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "ipa-ref.h"
>  #include "plugin-api.h"
>  #include "cgraph.h"
> +#include "omp-low.h"
>  
>  /* This algorithm is based on the SCC algorithm presented by Keith
>     Cooper and L. Taylor Simpson in "SCC-Based Value numbering"
> @@ -3542,7 +3543,8 @@ visit_use (tree use)
>      {
>        if (gimple_code (stmt) == GIMPLE_PHI)
>  	changed = visit_phi (stmt);
> -      else if (gimple_has_volatile_ops (stmt))
> +      else if (gimple_has_volatile_ops (stmt)
> +	       || gimple_stmt_omp_data_i_init_p (stmt))

No.

What is the intent of these changes?

Richard.


>  	changed = defs_to_varying (stmt);
>        else if (is_gimple_assign (stmt))
>  	{
> 
> 
> Grüße,
>  Thomas
>
diff mbox

Patch

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 7885189..1f86160 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,35 @@ 
 2015-04-21  Tom de Vries  <tom@codesourcery.com>
 
+	* omp-low.c: Include gimple-pretty-print.h.
+	(release_first_vuse_in_edge_dest): New function.
+	(expand_omp_target): When not in ssa, don't split off oacc kernels
+	region, clear PROP_gimple_eomp in cfun->curr_properties to force later
+	expanssion, and add GOACC_kernels_internal call.
+	When in ssa, split off oacc kernels and convert GOACC_kernels_internal
+	into GOACC_kernels call.  Handle ssa-code.
+	(pass_data_expand_omp): Don't set PROP_gimple_eomp unconditionally in
+	properties_provided field.
+	(pass_expand_omp::execute): Set PROP_gimple_eomp in
+	cfun->curr_properties tentatively.
+	(pass_data_expand_omp_ssa): Add TODO_remove_unused_locals to
+	todo_flags_finish field.
+	(pass_expand_omp_ssa::execute): Release dangling SSA_NAMEs after calling
+	execute_expand_omp.
+	(gimple_stmt_ssa_operand_references_var_p)
+	(gimple_stmt_omp_data_i_init_p): New function.
+	* omp-low.h (gimple_stmt_omp_data_i_init_p): Declare.
+	* passes.def: Add pass_expand_omp_ssa after pass_fre.  Add
+	pass_expand_omp_ssa after pass_all_early_optimizations.
+	* tree-ssa-ccp.c: Include omp-low.h.
+	(surely_varying_stmt_p, ccp_visit_stmt): Handle .omp_data_i init
+	conservatively.
+	* tree-ssa-forwprop.c: Include omp-low.h.
+	(pass_forwprop::execute): Handle .omp_data_i init conservatively.
+	* tree-ssa-sccvn.c: Include omp-low.h.
+	(visit_use): Handle .omp_data_i init conservatively.
+	* cgraph.c (cgraph_node::release_body): Don't release offloadable
+	functions.
+
 	* builtin-attrs.def (DOT_DOT_DOT_r_r_r): Add DEF_ATTR_FOR_STRING.
 	(ATTR_FNSPEC_DOT_DOT_DOT_r_r_r_NOTHROW_LIST): Add
 	DEF_ATTR_TREE_LIST.
diff --git gcc/cgraph.c gcc/cgraph.c
index e099856..c608d7e 100644
--- gcc/cgraph.c
+++ gcc/cgraph.c
@@ -1706,6 +1706,15 @@  release_function_body (tree decl)
 void
 cgraph_node::release_body (bool keep_arguments)
 {
+  /* The omp-expansion of the oacc kernels directive is post-poned till after
+     all_small_ipa_passes.  That means pass_ipa_free_lang_data, which tries to
+     release the body of the offload function, is run before omp_expand_target 
+     can process the oacc kernels directive,  and omp_expand_target would crash
+     trying to access the body.  This snippet works around this problem.
+     FIXME: This should probably be fixed in a different way.  */
+  if (offloadable)
+    return;
+
   ipa_transforms_to_apply.release ();
   if (!used_as_abstract_origin && symtab->state != PARSING)
     {
diff --git gcc/omp-low.c gcc/omp-low.c
index 4134f3d..16d9a5e 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -108,6 +108,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "context.h"
 #include "lto-section-names.h"
 #include "gomp-constants.h"
+#include "gimple-pretty-print.h"
 
 
 /* Lowering of OMP parallel and workshare constructs proceeds in two
@@ -5353,6 +5354,35 @@  expand_omp_build_assign (gimple_stmt_iterator *gsi_p, tree to, tree from)
     }
 }
 
+static void
+release_first_vuse_in_edge_dest (edge e)
+{
+  gimple_stmt_iterator i;
+  basic_block bb = e->dest;
+
+  for (i = gsi_start_phis (bb); !gsi_end_p (i); gsi_next (&i))
+    {
+      gimple phi = gsi_stmt (i);
+      tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
+
+      if (!virtual_operand_p (arg))
+	continue;
+
+      mark_virtual_operand_for_renaming (arg);
+      return;
+    }
+
+  for (i = gsi_start_bb (bb); !gsi_end_p (i); gsi_next_nondebug (&i))
+    {
+      gimple stmt = gsi_stmt (i);
+      if (gimple_vuse (stmt) == NULL_TREE)
+	continue;
+
+      mark_virtual_operand_for_renaming (gimple_vuse (stmt));
+      return;
+    }
+}
+
 /* Expand the OpenMP parallel or task directive starting at REGION.  */
 
 static void
@@ -8770,8 +8800,11 @@  expand_omp_target (struct omp_region *region)
   gimple stmt;
   edge e;
   bool offloaded, data_region;
+  bool do_emit_library_call = true;
+  bool do_splitoff = true;
 
   entry_stmt = as_a <gomp_target *> (last_stmt (region->entry));
+
   new_bb = region->entry;
 
   offloaded = is_gimple_omp_offloaded (entry_stmt);
@@ -8804,12 +8837,48 @@  expand_omp_target (struct omp_region *region)
   /* Supported by expand_omp_taskreg, but not here.  */
   if (child_cfun != NULL)
     gcc_checking_assert (!child_cfun->cfg);
-  gcc_checking_assert (!gimple_in_ssa_p (cfun));
 
   entry_bb = region->entry;
   exit_bb = region->exit;
 
-  if (offloaded)
+  if (gimple_omp_target_kind (entry_stmt) == GF_OMP_TARGET_KIND_OACC_KERNELS)
+    {
+      if (!gimple_in_ssa_p (cfun))
+	{
+	  /* We need to do analysis and optimizations on the kernels region
+	     before splitoff.  Since that's hard to do on low gimple, we
+	     postpone the splitoff until we're in SSA.
+	     However, we do the emit of the corresponding function call already,
+	     in order to keep the arguments of the call alive until the
+	     splitoff.
+	     Since at this point the function that is called is empty, we can
+	     model the function as BUILT_IN_GOACC_KERNELS_INTERNAL, which marks
+	     some of it's function arguments as non-escaping, so it acts less
+	     as an optimization barrier.  */
+	  do_splitoff = false;
+	  cfun->curr_properties &= ~PROP_gimple_eomp;
+	}
+      else
+	{
+	  /* Don't emit the library call.  We've already done that.  */
+	  do_emit_library_call = false;
+	  /* Transform BUILT_IN_GOACC_KERNELS_INTERNAL into
+	     BUILT_IN_GOACC_KERNELS_INTERNAL.  Now that the function body will be
+	     split off, we can no longer regard the omp_data_array reference as
+	     non-escaping.  */
+	  gsi = gsi_last_bb (entry_bb);
+	  gsi_prev (&gsi);
+	  gcall *call = as_a <gcall *> (gsi_stmt (gsi));
+	  gcc_assert (gimple_call_builtin_p (call, BUILT_IN_GOACC_KERNELS_INTERNAL));
+	  tree fndecl = builtin_decl_explicit (BUILT_IN_GOACC_KERNELS);
+	  gimple_call_set_fndecl (call, fndecl);
+	  gimple_call_set_fntype (call, TREE_TYPE (fndecl));
+	  gimple_call_reset_alias_info (call);
+	}
+    }
+
+  if (offloaded
+      && do_splitoff)
     {
       unsigned srcidx, dstidx, num;
 
@@ -8831,7 +8900,7 @@  expand_omp_target (struct omp_region *region)
 	{
 	  basic_block entry_succ_bb = single_succ (entry_bb);
 	  gimple_stmt_iterator gsi;
-	  tree arg;
+	  tree arg, narg;
 	  gimple tgtcopy_stmt = NULL;
 	  tree sender = TREE_VEC_ELT (data_arg, 0);
 
@@ -8861,8 +8930,27 @@  expand_omp_target (struct omp_region *region)
 	  gcc_assert (tgtcopy_stmt != NULL);
 	  arg = DECL_ARGUMENTS (child_fn);
 
-	  gcc_assert (gimple_assign_lhs (tgtcopy_stmt) == arg);
-	  gsi_remove (&gsi, true);
+	  if (!gimple_in_ssa_p (cfun))
+	    {
+	      gcc_assert (gimple_assign_lhs (tgtcopy_stmt) == arg);
+	      gsi_remove (&gsi, true);
+	    }
+	  else
+	    {
+	      gcc_assert (SSA_NAME_VAR (gimple_assign_lhs (tgtcopy_stmt))
+			  == arg);
+
+	      /* If we are in ssa form, we must load the value from the default
+		 definition of the argument.  That should not be defined now,
+		 since the argument is not used uninitialized.  */
+	      gcc_assert (ssa_default_def (cfun, arg) == NULL);
+	      narg = make_ssa_name (arg, gimple_build_nop ());
+	      set_ssa_default_def (cfun, arg, narg);
+	      /* ?? Is setting the subcode really necessary ??  */
+	      gimple_omp_set_subcode (tgtcopy_stmt, TREE_CODE (narg));
+	      gimple_assign_set_rhs1 (tgtcopy_stmt, narg);
+	      update_stmt (tgtcopy_stmt);
+	    }
 	}
 
       /* Declare local variables needed in CHILD_CFUN.  */
@@ -8905,11 +8993,23 @@  expand_omp_target (struct omp_region *region)
 	  stmt = gimple_build_return (NULL);
 	  gsi_insert_after (&gsi, stmt, GSI_SAME_STMT);
 	  gsi_remove (&gsi, true);
+
+	  /* A vuse in single_succ (exit_bb) may use a vdef from the region
+	     which is about to be split off.  Mark the vdef for renaming.  */
+	  release_first_vuse_in_edge_dest (single_succ_edge (exit_bb));
 	}
 
       /* Move the offloading region into CHILD_CFUN.  */
 
-      block = gimple_block (entry_stmt);
+      if (gimple_in_ssa_p (cfun))
+	{
+	  init_tree_ssa (child_cfun);
+	  init_ssa_operands (child_cfun);
+	  child_cfun->gimple_df->in_ssa_p = true;
+	  block = NULL_TREE;
+	}
+      else
+	block = gimple_block (entry_stmt);
 
       new_bb = move_sese_region_to_fn (child_cfun, entry_bb, exit_bb, block);
       if (exit_bb)
@@ -8969,9 +9069,18 @@  expand_omp_target (struct omp_region *region)
 	  if (changed)
 	    cleanup_tree_cfg ();
 	}
+      if (gimple_in_ssa_p (cfun))
+	update_ssa (TODO_update_ssa);
       pop_cfun ();
     }
 
+  if (!do_emit_library_call)
+    {
+      if (gimple_in_ssa_p (cfun))
+	update_ssa (TODO_update_ssa_only_virtuals);
+      return;
+    }
+
   /* Emit a library call to launch the offloading region, or do data
      transfers.  */
   tree t1, t2, t3, t4, device, cond, c, clauses;
@@ -8993,7 +9102,7 @@  expand_omp_target (struct omp_region *region)
       start_ix = BUILT_IN_GOACC_PARALLEL;
       break;
     case GF_OMP_TARGET_KIND_OACC_KERNELS:
-      start_ix = BUILT_IN_GOACC_KERNELS;
+      start_ix = BUILT_IN_GOACC_KERNELS_INTERNAL;
       break;
     case GF_OMP_TARGET_KIND_OACC_DATA:
       start_ix = BUILT_IN_GOACC_DATA_START;
@@ -9128,6 +9237,7 @@  expand_omp_target (struct omp_region *region)
     case BUILT_IN_GOACC_DATA_START:
     case BUILT_IN_GOACC_ENTER_EXIT_DATA:
     case BUILT_IN_GOACC_KERNELS:
+    case BUILT_IN_GOACC_KERNELS_INTERNAL:
     case BUILT_IN_GOACC_PARALLEL:
     case BUILT_IN_GOACC_UPDATE:
       break;
@@ -9146,6 +9256,7 @@  expand_omp_target (struct omp_region *region)
     case BUILT_IN_GOMP_TARGET_UPDATE:
       break;
     case BUILT_IN_GOACC_KERNELS:
+    case BUILT_IN_GOACC_KERNELS_INTERNAL:
     case BUILT_IN_GOACC_PARALLEL:
       {
 	tree t_num_gangs, t_num_workers, t_vector_length;
@@ -9249,6 +9360,8 @@  expand_omp_target (struct omp_region *region)
       gcc_assert (g && gimple_code (g) == GIMPLE_OMP_RETURN);
       gsi_remove (&gsi, true);
     }
+  if (gimple_in_ssa_p (cfun))
+    update_ssa (TODO_update_ssa_only_virtuals);
 }
 
 
@@ -9503,7 +9616,7 @@  const pass_data pass_data_expand_omp =
   OPTGROUP_NONE, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_gimple_any, /* properties_required */
-  PROP_gimple_eomp, /* properties_provided */
+  0 /* Possibly PROP_gimple_eomp.  */, /* properties_provided */
   0, /* properties_destroyed */
   0, /* todo_flags_start */
   0, /* todo_flags_finish */
@@ -9517,12 +9630,14 @@  public:
   {}
 
   /* opt_pass methods: */
-  virtual unsigned int execute (function *)
+  virtual unsigned int execute (function *fun)
     {
       bool gate = ((flag_cilkplus != 0 || flag_openacc != 0 || flag_openmp != 0
 		    || flag_openmp_simd != 0)
 		   && !seen_error ());
 
+      fun->curr_properties |= PROP_gimple_eomp;
+
       /* This pass always runs, to provide PROP_gimple_eomp.
 	 But often, there is nothing to do.  */
       if (!gate)
@@ -9553,7 +9668,8 @@  const pass_data pass_data_expand_omp_ssa =
   PROP_gimple_eomp, /* properties_provided */
   0, /* properties_destroyed */
   0, /* todo_flags_start */
-  TODO_cleanup_cfg | TODO_rebuild_alias, /* todo_flags_finish */
+  TODO_cleanup_cfg | TODO_rebuild_alias
+  | TODO_remove_unused_locals, /* todo_flags_finish */
 };
 
 class pass_expand_omp_ssa : public gimple_opt_pass
@@ -9568,7 +9684,48 @@  public:
     {
       return !(fun->curr_properties & PROP_gimple_eomp);
     }
-  virtual unsigned int execute (function *) { return execute_expand_omp (); }
+  virtual unsigned int execute (function *)
+    {
+      unsigned res = execute_expand_omp ();
+
+      /* After running pass_expand_omp_ssa to expand the oacc kernels
+	 directive, we are left in the original function with anonymous
+	 SSA_NAMEs, with a defining statement that has been deleted.  This
+	 pass finds those SSA_NAMEs and releases them.
+	 TODO: Either fix this elsewhere, or make the fix unnecessary.  */
+      unsigned int i;
+      for (i = 1; i < num_ssa_names; ++i)
+	{
+	  tree name = ssa_name (i);
+	  if (name == NULL_TREE)
+	    continue;
+
+	  gimple stmt = SSA_NAME_DEF_STMT (name);
+	  bool found = false;
+
+	  ssa_op_iter op_iter;
+	  def_operand_p def_p;
+	  FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt, op_iter, SSA_OP_ALL_DEFS)
+	    {
+	      tree def = DEF_FROM_PTR (def_p);
+	      if (def == name)
+		{
+		  found = true;
+		  break;
+		}
+	    }
+
+	  if (!found)
+	    {
+	      if (dump_file)
+		fprintf (dump_file, "Released dangling ssa name %u\n", i);
+	      release_ssa_name (name);
+	    }
+	}
+
+      return res;
+    }
+  opt_pass * clone () { return new pass_expand_omp_ssa (m_ctxt); }
 
 }; // class pass_expand_omp_ssa
 
@@ -13728,4 +13885,39 @@  omp_finish_file (void)
     }
 }
 
+static bool
+gimple_stmt_ssa_operand_references_var_p (gimple stmt, const char **varnames,
+					  unsigned int nr_varnames,
+					  unsigned int flags)
+{
+  tree use;
+  ssa_op_iter iter;
+  const char *s;
+
+  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, flags)
+    {
+      if (SSA_NAME_IDENTIFIER (use) == NULL_TREE)
+	continue;
+      s = IDENTIFIER_POINTER (SSA_NAME_IDENTIFIER (use));
+
+      unsigned int i;
+      for (i = 0; i < nr_varnames; ++i)
+	if (strcmp (varnames[i], s) == 0)
+	  return true;
+    }
+
+  return false;
+}
+
+/* Return true if STMT is .omp_data_i init.  */
+
+bool
+gimple_stmt_omp_data_i_init_p (gimple stmt)
+{
+  const char *varnames[] = { ".omp_data_i" };
+  unsigned int nr_varnames = sizeof (varnames) / sizeof (varnames[0]);
+  return gimple_stmt_ssa_operand_references_var_p (stmt, varnames, nr_varnames,
+						   SSA_OP_DEF);
+}
+
 #include "gt-omp-low.h"
diff --git gcc/omp-low.h gcc/omp-low.h
index 8a4052e..3d30c3b 100644
--- gcc/omp-low.h
+++ gcc/omp-low.h
@@ -28,6 +28,7 @@  extern void free_omp_regions (void);
 extern tree omp_reduction_init (tree, tree);
 extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *);
 extern void omp_finish_file (void);
+extern bool gimple_stmt_omp_data_i_init_p (gimple);
 
 extern GTY(()) vec<tree, va_gc> *offload_funcs;
 extern GTY(()) vec<tree, va_gc> *offload_vars;
diff --git gcc/passes.def gcc/passes.def
index 2bc5dcd..db0dd18 100644
--- gcc/passes.def
+++ gcc/passes.def
@@ -86,6 +86,7 @@  along with GCC; see the file COPYING3.  If not see
 	     execute TODO_rebuild_alias at this point.  */
 	  NEXT_PASS (pass_build_ealias);
 	  NEXT_PASS (pass_fre);
+	  NEXT_PASS (pass_expand_omp_ssa);
 	  NEXT_PASS (pass_merge_phi);
 	  NEXT_PASS (pass_cd_dce);
 	  NEXT_PASS (pass_early_ipa_sra);
@@ -99,6 +100,7 @@  along with GCC; see the file COPYING3.  If not see
 	      late.  */
 	  NEXT_PASS (pass_split_functions);
       POP_INSERT_PASSES ()
+      NEXT_PASS (pass_expand_omp_ssa);
       NEXT_PASS (pass_release_ssa_names);
       NEXT_PASS (pass_rebuild_cgraph_edges);
       NEXT_PASS (pass_inline_parameters);
diff --git gcc/tree-ssa-ccp.c gcc/tree-ssa-ccp.c
index d45a3ff..46fe1c7 100644
--- gcc/tree-ssa-ccp.c
+++ gcc/tree-ssa-ccp.c
@@ -172,6 +172,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "wide-int-print.h"
 #include "builtins.h"
 #include "tree-chkp.h"
+#include "omp-low.h"
 
 
 /* Possible lattice values.  */
@@ -796,6 +797,9 @@  surely_varying_stmt_p (gimple stmt)
       && gimple_code (stmt) != GIMPLE_CALL)
     return true;
 
+  if (gimple_stmt_omp_data_i_init_p (stmt))
+    return true;
+
   return false;
 }
 
@@ -2329,6 +2333,8 @@  ccp_visit_stmt (gimple stmt, edge *taken_edge_p, tree *output_p)
   switch (gimple_code (stmt))
     {
       case GIMPLE_ASSIGN:
+	if (gimple_stmt_omp_data_i_init_p (stmt))
+	  break;
         /* If the statement is an assignment that produces a single
            output value, evaluate its RHS to see if the lattice value of
            its output has changed.  */
diff --git gcc/tree-ssa-forwprop.c gcc/tree-ssa-forwprop.c
index d8db20a..554a5a5 100644
--- gcc/tree-ssa-forwprop.c
+++ gcc/tree-ssa-forwprop.c
@@ -85,6 +85,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "tree-cfgcleanup.h"
 #include "tree-into-ssa.h"
 #include "cfganal.h"
+#include "omp-low.h"
 
 /* This pass propagates the RHS of assignment statements into use
    sites of the LHS of the assignment.  It's basically a specialized
@@ -2155,7 +2156,8 @@  pass_forwprop::execute (function *fun)
 	  tree lhs, rhs;
 	  enum tree_code code;
 
-	  if (!is_gimple_assign (stmt))
+	  if (!is_gimple_assign (stmt)
+	      || gimple_stmt_omp_data_i_init_p (stmt))
 	    {
 	      gsi_next (&gsi);
 	      continue;
diff --git gcc/tree-ssa-sccvn.c gcc/tree-ssa-sccvn.c
index e417a15..449a615 100644
--- gcc/tree-ssa-sccvn.c
+++ gcc/tree-ssa-sccvn.c
@@ -85,6 +85,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "ipa-ref.h"
 #include "plugin-api.h"
 #include "cgraph.h"
+#include "omp-low.h"
 
 /* This algorithm is based on the SCC algorithm presented by Keith
    Cooper and L. Taylor Simpson in "SCC-Based Value numbering"
@@ -3542,7 +3543,8 @@  visit_use (tree use)
     {
       if (gimple_code (stmt) == GIMPLE_PHI)
 	changed = visit_phi (stmt);
-      else if (gimple_has_volatile_ops (stmt))
+      else if (gimple_has_volatile_ops (stmt)
+	       || gimple_stmt_omp_data_i_init_p (stmt))
 	changed = defs_to_varying (stmt);
       else if (is_gimple_assign (stmt))
 	{