diff mbox

[PING^3,12/16] Handle acc loop directive

Message ID 56D3BA65.4000605@mentor.com
State New
Headers show

Commit Message

Tom de Vries Feb. 29, 2016, 3:26 a.m. UTC
On 22-02-16 11:57, Jakub Jelinek wrote:
> On Mon, Feb 22, 2016 at 11:54:46AM +0100, Tom de Vries wrote:
>> Following up on your suggestion to implement this during gimplification, I
>> wrote attached patch.
>>
>> I'll put it through some openacc testing and add testcases. Is this approach
>> acceptable for stage4?
>
> LGTM.

Hi,

I ran into trouble during testing of this patch, with ignoring the private 
clause on the loop directive.

This openacc testcase compiles atm without a problem:
...
int
main (void)
{
   int j;
#pragma acc kernels default(none)
   {
#pragma acc loop private (j)
     for (unsigned i = 0; i < 1000; ++i)
       {
	j;
       }
   }
}
...

But when compiling with the patch, and ignoring the private clause, we run into 
this error:
...
test.c: In function ‘main’:
test.c:10:2: error: ‘j’ not specified in enclosing OpenACC ‘kernels’ construct
   j;
   ^
test.c:5:9: note: enclosing OpenACC ‘kernels’ construct
  #pragma acc kernels default(none)
...

So I updated the patch to ignore all but the private clause on the loop 
directive during gimplification, and moved the sequential expansion of the 
omp-for construct from gimplify to omp-lower.

Bootstrapped and reg-tested on x86_64.

Build for nvidia accelerator and reg-tested goacc.exp and libgomp testsuite.

Updated patch still ok for stage4?

Thanks,
- Tom

Comments

Tom de Vries March 7, 2016, 8:21 a.m. UTC | #1
On 29/02/16 04:26, Tom de Vries wrote:
> On 22-02-16 11:57, Jakub Jelinek wrote:
>> On Mon, Feb 22, 2016 at 11:54:46AM +0100, Tom de Vries wrote:
>>> Following up on your suggestion to implement this during
>>> gimplification, I
>>> wrote attached patch.
>>>
>>> I'll put it through some openacc testing and add testcases. Is this
>>> approach
>>> acceptable for stage4?
>>
>> LGTM.
>
> Hi,
>
> I ran into trouble during testing of this patch, with ignoring the
> private clause on the loop directive.
>
> This openacc testcase compiles atm without a problem:
> ...
> int
> main (void)
> {
>    int j;
> #pragma acc kernels default(none)
>    {
> #pragma acc loop private (j)
>      for (unsigned i = 0; i < 1000; ++i)
>        {
>      j;
>        }
>    }
> }
> ...
>
> But when compiling with the patch, and ignoring the private clause, we
> run into this error:
> ...
> test.c: In function ‘main’:
> test.c:10:2: error: ‘j’ not specified in enclosing OpenACC ‘kernels’
> construct
>    j;
>    ^
> test.c:5:9: note: enclosing OpenACC ‘kernels’ construct
>   #pragma acc kernels default(none)
> ...
>
> So I updated the patch to ignore all but the private clause on the loop
> directive during gimplification, and moved the sequential expansion of
> the omp-for construct from gimplify to omp-lower.
>
> Bootstrapped and reg-tested on x86_64.
>
> Build for nvidia accelerator and reg-tested goacc.exp and libgomp
> testsuite.
>
> Updated patch still ok for stage4?
>

Ping. ( Submitted here: 
https://gcc.gnu.org/ml/gcc-patches/2016-02/msg01903.html )

Thanks,
- Tom

> 0001-Ignore-acc-loop-directive-in-kernels-region.patch
>
>
> Ignore acc loop directive in kernels region
>
> 2016-02-29  Tom de Vries  <tom@codesourcery.com>
>
> 	* gimplify.c (gimplify_ctx_in_oacc_kernels_region): New function.
> 	(gimplify_omp_for): Ignore all but private clause on loop directive in
> 	kernels region.
> 	* omp-low.c (lower_omp_for_seq): New function.
> 	(lower_omp_for): Use lower_omp_for_seq in kernels region.  Don't
> 	generate omp continue/return.
>
> 	* c-c++-common/goacc/kernels-acc-loop-reduction.c: New test.
> 	* c-c++-common/goacc/kernels-acc-loop-smaller-equal.c: Same.
> 	* c-c++-common/goacc/kernels-loop-2-acc-loop.c: Same.
> 	* c-c++-common/goacc/kernels-loop-3-acc-loop.c: Same.
> 	* c-c++-common/goacc/kernels-loop-acc-loop.c: Same.
> 	* c-c++-common/goacc/kernels-loop-n-acc-loop.c: Same.
> 	* c-c++-common/goacc/combined-directives.c: Update test.
> 	* c-c++-common/goacc/loop-private-1.c: Same.
> 	* gfortran.dg/goacc/combined-directives.f90: Same.
> 	* gfortran.dg/goacc/gang-static.f95: Same.
> 	* gfortran.dg/goacc/reduction-2.f95: Same.
>
> ---
>   gcc/gimplify.c                                     | 41 ++++++++++
>   gcc/omp-low.c                                      | 93 ++++++++++++++++++++--
>   .../c-c++-common/goacc/combined-directives.c       | 16 ++--
>   .../goacc/kernels-acc-loop-reduction.c             | 24 ++++++
>   .../goacc/kernels-acc-loop-smaller-equal.c         | 22 +++++
>   .../c-c++-common/goacc/kernels-loop-2-acc-loop.c   | 17 ++++
>   .../c-c++-common/goacc/kernels-loop-3-acc-loop.c   | 14 ++++
>   .../c-c++-common/goacc/kernels-loop-acc-loop.c     | 14 ++++
>   .../c-c++-common/goacc/kernels-loop-n-acc-loop.c   | 14 ++++
>   gcc/testsuite/c-c++-common/goacc/loop-private-1.c  |  2 +-
>   .../gfortran.dg/goacc/combined-directives.f90      | 16 ++--
>   gcc/testsuite/gfortran.dg/goacc/gang-static.f95    |  4 +-
>   gcc/testsuite/gfortran.dg/goacc/reduction-2.f95    |  3 +-
>   13 files changed, 252 insertions(+), 28 deletions(-)
>
> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> index 7be6bd7..4b82305 100644
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -8364,6 +8364,20 @@ find_combined_omp_for (tree *tp, int *walk_subtrees, void *)
>     return NULL_TREE;
>   }
>
> +/* Return true if CTX is (part of) an oacc kernels region.  */
> +
> +static bool
> +gimplify_ctx_in_oacc_kernels_region (gimplify_omp_ctx *ctx)
> +{
> +  for (;ctx != NULL; ctx = ctx->outer_context)
> +    {
> +      if (ctx->region_type == ORT_ACC_KERNELS)
> +	return true;
> +    }
> +
> +  return false;
> +}
> +
>   /* Gimplify the gross structure of an OMP_FOR statement.  */
>
>   static enum gimplify_status
> @@ -8403,6 +8417,33 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
>         gcc_unreachable ();
>       }
>
> +  /* Skip loop clauses not handled in kernels region.  */
> +  if (gimplify_ctx_in_oacc_kernels_region (gimplify_omp_ctxp))
> +    {
> +      tree *prev_ptr = &OMP_FOR_CLAUSES (for_stmt);
> +
> +      while (tree probe = *prev_ptr)
> +	{
> +	  tree *next_ptr = &OMP_CLAUSE_CHAIN (probe);
> +
> +	  bool keep_clause;
> +	  switch (OMP_CLAUSE_CODE (probe))
> +	    {
> +	    case OMP_CLAUSE_PRIVATE:
> +	      keep_clause = true;
> +	      break;
> +	    default:
> +	      keep_clause = false;
> +	      break;
> +	    }
> +
> +	  if (keep_clause)
> +	    prev_ptr = next_ptr;
> +	  else
> +	    *prev_ptr = *next_ptr;
> +	}
> +    }
> +
>     /* Set OMP_CLAUSE_LINEAR_NO_COPYIN flag on explicit linear
>        clause for the IV.  */
>     if (ort == ORT_SIMD && TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)) == 1)
> diff --git a/gcc/omp-low.c b/gcc/omp-low.c
> index fcbb3e0..bb70ac2 100644
> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -14944,6 +14944,75 @@ lower_omp_for_lastprivate (struct omp_for_data *fd, gimple_seq *body_p,
>       }
>   }
>
> +/* Lower the loops with index I and higher in omp_for FOR_STMT as a sequential
> +   loop, and append the resulting gimple statements to PRE_P.  */
> +
> +static void
> +lower_omp_for_seq (gimple_seq *pre_p, gimple *for_stmt, unsigned int i)
> +{
> +  unsigned int len = gimple_omp_for_collapse (for_stmt);
> +  gcc_assert (i < len);
> +
> +  /* Gimplify OMP_FOR[i] as:
> +
> +     OMP_FOR_INIT[i];
> +     goto <loop_entry_label>;
> +     <fall_thru_label>:
> +     if (i == len - 1)
> +       OMP_FOR_BODY;
> +     else
> +       OMP_FOR[i+1];
> +    OMP_FOR_INCR[i];
> +    <loop_entry_label>:
> +    if (OMP_FOR_COND[i])
> +      goto <fall_thru_label>;
> +    else
> +      goto <loop_exit_label>;
> +    <loop_exit_label>:
> +  */
> +
> +  tree loop_entry_label = create_artificial_label (UNKNOWN_LOCATION);
> +  tree fall_thru_label = create_artificial_label (UNKNOWN_LOCATION);
> +  tree loop_exit_label = create_artificial_label (UNKNOWN_LOCATION);
> +
> +  /* OMP_FOR_INIT[i].  */
> +  tree init = gimple_omp_for_initial (for_stmt, i);
> +  tree var = gimple_omp_for_index (for_stmt, i);
> +  gimple *g = gimple_build_assign (var, init);
> +  gimple_seq_add_stmt (pre_p, g);
> +
> +  /* goto <loop_entry_label>.  */
> +  gimple_seq_add_stmt (pre_p, gimple_build_goto (loop_entry_label));
> +
> +  /* <fall_thru_label>.  */
> +  gimple_seq_add_stmt (pre_p, gimple_build_label (fall_thru_label));
> +
> +  /* if (i == len - 1) OMP_FOR_BODY
> +     else OMP_FOR[i+1].  */
> +  if (i == len - 1)
> +    gimple_seq_add_seq (pre_p, gimple_omp_body (for_stmt));
> +  else
> +    lower_omp_for_seq (pre_p, for_stmt, i + 1);
> +
> +  /* OMP_FOR_INCR[i].  */
> +  tree incr = gimple_omp_for_incr (for_stmt, i);
> +  g = gimple_build_assign (var, incr);
> +  gimple_seq_add_stmt (pre_p, g);
> +
> +  /* <loop_entry_label>.  */
> +  gimple_seq_add_stmt (pre_p, gimple_build_label (loop_entry_label));
> +
> +  /* if (OMP_FOR_COND[i]) goto <fall_thru_label>
> +     else goto <loop_exit_label>.  */
> +  enum tree_code cond = gimple_omp_for_cond (for_stmt, i);
> +  tree final_val = gimple_omp_for_final (for_stmt, i);
> +  gimple *gimple_cond = gimple_build_cond (cond, var, final_val,
> +					   fall_thru_label, loop_exit_label);
> +  gimple_seq_add_stmt (pre_p, gimple_cond);
> +
> +  /* <loop_exit_label>.  */
> +  gimple_seq_add_stmt (pre_p, gimple_build_label (loop_exit_label));
> +}
>
>   /* Lower code for an OMP loop directive.  */
>
> @@ -14957,6 +15026,8 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
>     gimple_seq omp_for_body, body, dlist;
>     gimple_seq oacc_head = NULL, oacc_tail = NULL;
>     size_t i;
> +  bool oacc_kernels_p = (is_gimple_omp_oacc (ctx->stmt)
> +			 && ctx_in_oacc_kernels_region (ctx));
>
>     push_gimplify_context ();
>
> @@ -15065,7 +15136,7 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
>     extract_omp_for_data (stmt, &fd, NULL);
>
>     if (is_gimple_omp_oacc (ctx->stmt)
> -      && !ctx_in_oacc_kernels_region (ctx))
> +      && !oacc_kernels_p)
>       lower_oacc_head_tail (gimple_location (stmt),
>   			  gimple_omp_for_clauses (stmt),
>   			  &oacc_head, &oacc_tail, ctx);
> @@ -15088,13 +15159,18 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
>   						ctx);
>   	}
>
> -  if (!gimple_omp_for_grid_phony (stmt))
> -    gimple_seq_add_stmt (&body, stmt);
> -  gimple_seq_add_seq (&body, gimple_omp_body (stmt));
> +  if (oacc_kernels_p)
> +    lower_omp_for_seq (&body, stmt, 0);
> +  else if (gimple_omp_for_grid_phony (stmt))
> +    gimple_seq_add_seq (&body, gimple_omp_body (stmt));
> +  else
> +    {
> +      gimple_seq_add_stmt (&body, stmt);
> +      gimple_seq_add_seq (&body, gimple_omp_body (stmt));
>
> -  if (!gimple_omp_for_grid_phony (stmt))
> -    gimple_seq_add_stmt (&body, gimple_build_omp_continue (fd.loop.v,
> -							   fd.loop.v));
> +      gimple_seq_add_stmt (&body, gimple_build_omp_continue (fd.loop.v,
> +							     fd.loop.v));
> +    }
>
>     /* After the loop, add exit clauses.  */
>     lower_reduction_clauses (gimple_omp_for_clauses (stmt), &body, ctx);
> @@ -15106,7 +15182,8 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
>
>     body = maybe_catch_exception (body);
>
> -  if (!gimple_omp_for_grid_phony (stmt))
> +  if (!gimple_omp_for_grid_phony (stmt)
> +      && !oacc_kernels_p)
>       {
>         /* Region exit marker goes at the end of the loop body.  */
>         gimple_seq_add_stmt (&body, gimple_build_omp_return (fd.have_nowait));
> diff --git a/gcc/testsuite/c-c++-common/goacc/combined-directives.c b/gcc/testsuite/c-c++-common/goacc/combined-directives.c
> index c387285..66b8b65 100644
> --- a/gcc/testsuite/c-c++-common/goacc/combined-directives.c
> +++ b/gcc/testsuite/c-c++-common/goacc/combined-directives.c
> @@ -108,12 +108,12 @@ test ()
>   //    ;
>   }
>
> -// { dg-final { scan-tree-dump-times "acc loop collapse.2. private.j. private.i" 2 "gimple" } }
> -// { dg-final { scan-tree-dump-times "acc loop gang" 2 "gimple" } }
> -// { dg-final { scan-tree-dump-times "acc loop worker" 2 "gimple" } }
> -// { dg-final { scan-tree-dump-times "acc loop vector" 2 "gimple" } }
> -// { dg-final { scan-tree-dump-times "acc loop seq" 2 "gimple" } }
> -// { dg-final { scan-tree-dump-times "acc loop auto" 2 "gimple" } }
> -// { dg-final { scan-tree-dump-times "acc loop tile.2, 3" 2 "gimple" } }
> -// { dg-final { scan-tree-dump-times "acc loop independent private.i" 2 "gimple" } }
> +// { dg-final { scan-tree-dump-times "acc loop collapse.2. private.j. private.i" 1 "gimple" } }
> +// { dg-final { scan-tree-dump-times "acc loop gang" 1 "gimple" } }
> +// { dg-final { scan-tree-dump-times "acc loop worker" 1 "gimple" } }
> +// { dg-final { scan-tree-dump-times "acc loop vector" 1 "gimple" } }
> +// { dg-final { scan-tree-dump-times "acc loop seq" 1 "gimple" } }
> +// { dg-final { scan-tree-dump-times "acc loop auto" 1 "gimple" } }
> +// { dg-final { scan-tree-dump-times "acc loop tile.2, 3" 1 "gimple" } }
> +// { dg-final { scan-tree-dump-times "acc loop independent private.i" 1 "gimple" } }
>   // { dg-final { scan-tree-dump-times "private.z" 2 "gimple" } }
> diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-reduction.c b/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-reduction.c
> new file mode 100644
> index 0000000..6a9f52b
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-reduction.c
> @@ -0,0 +1,24 @@
> +/* { dg-additional-options "-O2" } */
> +/* { dg-additional-options "-fdump-tree-parloops1-all" } */
> +/* { dg-additional-options "-fdump-tree-optimized" } */
> +
> +unsigned int a[1000];
> +
> +unsigned int
> +foo (int n)
> +{
> +  unsigned int sum = 0;
> +
> +#pragma acc kernels loop gang reduction(+:sum)
> +  for (int i = 0; i < n; i++)
> +    sum += a[i];
> +
> +  return sum;
> +}
> +
> +/* Check that only one loop is analyzed, and that it can be parallelized.  */
> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
> +/* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
> +
> +/* Check that the loop has been split off into a function.  */
> +/* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*\\._omp_fn\\.0" 1 "optimized" } } */
> diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-smaller-equal.c b/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-smaller-equal.c
> new file mode 100644
> index 0000000..d18c779
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-smaller-equal.c
> @@ -0,0 +1,22 @@
> +/* { dg-additional-options "-O2" } */
> +/* { dg-additional-options "-fdump-tree-parloops1-all" } */
> +/* { dg-additional-options "-fdump-tree-optimized" } */
> +
> +unsigned int
> +foo (int n)
> +{
> +  unsigned int sum = 1;
> +
> +  #pragma acc kernels loop
> +  for (int i = 1; i <= n; i++)
> +    sum += i;
> +
> +  return sum;
> +}
> +
> +/* Check that only one loop is analyzed, and that it can be parallelized.  */
> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
> +/* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
> +
> +/* Check that the loop has been split off into a function.  */
> +/* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*\\._omp_fn\\.0" 1 "optimized" } } */
> diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-2-acc-loop.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-2-acc-loop.c
> new file mode 100644
> index 0000000..95354e1
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-2-acc-loop.c
> @@ -0,0 +1,17 @@
> +/* { dg-additional-options "-O2" } */
> +/* { dg-additional-options "-fdump-tree-parloops1-all" } */
> +/* { dg-additional-options "-fdump-tree-optimized" } */
> +
> +/* Check that loops with '#pragma acc loop' tagged gets properly parallelized.  */
> +#define ACC_LOOP
> +#include "kernels-loop-2.c"
> +
> +/* Check that only three loops are analyzed, and that all can be
> +   parallelized.  */
> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops1" } } */
> +/* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
> +
> +/* Check that the loop has been split off into a function.  */
> +/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.1" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.2" 1 "optimized" } } */
> diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-3-acc-loop.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-3-acc-loop.c
> new file mode 100644
> index 0000000..1ad3067
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-3-acc-loop.c
> @@ -0,0 +1,14 @@
> +/* { dg-additional-options "-O2" } */
> +/* { dg-additional-options "-fdump-tree-parloops1-all" } */
> +/* { dg-additional-options "-fdump-tree-optimized" } */
> +
> +/* Check that loops with '#pragma acc loop' tagged gets properly parallelized.  */
> +#define ACC_LOOP
> +#include "kernels-loop-3.c"
> +
> +/* Check that only one loop is analyzed, and that it can be parallelized.  */
> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
> +/* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
> +
> +/* Check that the loop has been split off into a function.  */
> +/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
> diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-acc-loop.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-acc-loop.c
> new file mode 100644
> index 0000000..47b8459
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-acc-loop.c
> @@ -0,0 +1,14 @@
> +/* { dg-additional-options "-O2" } */
> +/* { dg-additional-options "-fdump-tree-parloops1-all" } */
> +/* { dg-additional-options "-fdump-tree-optimized" } */
> +
> +/* Check that loops with '#pragma acc loop' tagged gets properly parallelized.  */
> +#define ACC_LOOP
> +#include "kernels-loop.c"
> +
> +/* Check that only one loop is analyzed, and that it can be parallelized.  */
> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
> +/* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
> +
> +/* Check that the loop has been split off into a function.  */
> +/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
> diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-n-acc-loop.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-n-acc-loop.c
> new file mode 100644
> index 0000000..25b56d7
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-n-acc-loop.c
> @@ -0,0 +1,14 @@
> +/* { dg-additional-options "-O2" } */
> +/* { dg-additional-options "-fdump-tree-parloops1-all" } */
> +/* { dg-additional-options "-fdump-tree-optimized" } */
> +
> +/* Check that loops with '#pragma acc loop' tagged gets properly parallelized.  */
> +#define ACC_LOOP
> +#include "kernels-loop-n.c"
> +
> +/* Check that only one loop is analyzed, and that it can be parallelized.  */
> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
> +/* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
> +
> +/* Check that the loop has been split off into a function.  */
> +/* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*._omp_fn.0" 1 "optimized" } } */
> diff --git a/gcc/testsuite/c-c++-common/goacc/loop-private-1.c b/gcc/testsuite/c-c++-common/goacc/loop-private-1.c
> index 38a4a7d..9b2f7fa 100644
> --- a/gcc/testsuite/c-c++-common/goacc/loop-private-1.c
> +++ b/gcc/testsuite/c-c++-common/goacc/loop-private-1.c
> @@ -10,4 +10,4 @@ f (int i, int j)
>         ;
>   }
>
> -/* { dg-final { scan-tree-dump-times "#pragma acc loop collapse\\(2\\) private\\(j\\) private\\(i\\)" 1 "gimple" } } */
> +/* { dg-final { scan-tree-dump-times "#pragma acc loop private\\(j\\) private\\(i\\)" 1 "gimple" } } */
> diff --git a/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90 b/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
> index 6977525..e89ddc9 100644
> --- a/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
> +++ b/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
> @@ -144,12 +144,12 @@ subroutine test
>   !  !$acc end kernels loop
>   end subroutine test
>
> -! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. collapse.2." 2 "gimple" } }
> -! { dg-final { scan-tree-dump-times "acc loop private.i. gang" 2 "gimple" } }
> -! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. worker" 2 "gimple" } }
> -! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. vector" 2 "gimple" } }
> -! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. seq" 2 "gimple" } }
> -! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. auto" 2 "gimple" } }
> -! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. tile.2, 3" 2 "gimple" } }
> -! { dg-final { scan-tree-dump-times "acc loop private.i. independent" 2 "gimple" } }
> +! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. collapse.2." 1 "gimple" } }
> +! { dg-final { scan-tree-dump-times "acc loop private.i. gang" 1 "gimple" } }
> +! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. worker" 1 "gimple" } }
> +! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. vector" 1 "gimple" } }
> +! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. seq" 1 "gimple" } }
> +! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. auto" 1 "gimple" } }
> +! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. tile.2, 3" 1 "gimple" } }
> +! { dg-final { scan-tree-dump-times "acc loop private.i. independent" 1 "gimple" } }
>   ! { dg-final { scan-tree-dump-times "private.z" 2 "gimple" } }
> diff --git a/gcc/testsuite/gfortran.dg/goacc/gang-static.f95 b/gcc/testsuite/gfortran.dg/goacc/gang-static.f95
> index 3481085..c14b7b2 100644
> --- a/gcc/testsuite/gfortran.dg/goacc/gang-static.f95
> +++ b/gcc/testsuite/gfortran.dg/goacc/gang-static.f95
> @@ -78,5 +78,5 @@ end subroutine test
>   ! { dg-final { scan-tree-dump-times "gang\\(static:2\\)" 1 "omplower" } }
>   ! { dg-final { scan-tree-dump-times "gang\\(static:5\\)" 1 "omplower" } }
>   ! { dg-final { scan-tree-dump-times "gang\\(static:20\\)" 1 "omplower" } }
> -! { dg-final { scan-tree-dump-times "gang\\(num: 5 static:\\\*\\)" 1 "omplower" } }
> -! { dg-final { scan-tree-dump-times "gang\\(num: 30 static:20\\)" 1 "omplower" } }
> +! { dg-final { scan-tree-dump-times "gang\\(num: 5 static:\\\*\\)" 0 "omplower" } }
> +! { dg-final { scan-tree-dump-times "gang\\(num: 30 static:20\\)" 0 "omplower" } }
> diff --git a/gcc/testsuite/gfortran.dg/goacc/reduction-2.f95 b/gcc/testsuite/gfortran.dg/goacc/reduction-2.f95
> index 929fb0e..4c431c8 100644
> --- a/gcc/testsuite/gfortran.dg/goacc/reduction-2.f95
> +++ b/gcc/testsuite/gfortran.dg/goacc/reduction-2.f95
> @@ -11,6 +11,7 @@ subroutine foo ()
>     !$acc end parallel loop
>     !$acc kernels loop reduction(+:a)
>     do k = 2,6
> +     a = a + 1
>     enddo
>     !$acc end kernels loop
>   end subroutine
> @@ -18,5 +19,5 @@ end subroutine
>   ! { dg-final { scan-tree-dump-times "target oacc_parallel firstprivate.a." 1 "gimple" } }
>   ! { dg-final { scan-tree-dump-times "acc loop private.p. reduction..:a." 1 "gimple" } }
>   ! { dg-final { scan-tree-dump-times "target oacc_kernels map.force_tofrom:a .len: 4.." 1 "gimple" } }
> -! { dg-final { scan-tree-dump-times "acc loop private.k. reduction..:a." 1 "gimple" } }
> +! { dg-final { scan-tree-dump-times "acc loop private.k." 1 "gimple" } }
>
>
Tom de Vries March 14, 2016, 6:20 a.m. UTC | #2
On 07/03/16 09:21, Tom de Vries wrote:
> On 29/02/16 04:26, Tom de Vries wrote:
>> On 22-02-16 11:57, Jakub Jelinek wrote:
>>> On Mon, Feb 22, 2016 at 11:54:46AM +0100, Tom de Vries wrote:
>>>> Following up on your suggestion to implement this during
>>>> gimplification, I
>>>> wrote attached patch.
>>>>
>>>> I'll put it through some openacc testing and add testcases. Is this
>>>> approach
>>>> acceptable for stage4?
>>>
>>> LGTM.
>>
>> Hi,
>>
>> I ran into trouble during testing of this patch, with ignoring the
>> private clause on the loop directive.
>>
>> This openacc testcase compiles atm without a problem:
>> ...
>> int
>> main (void)
>> {
>>    int j;
>> #pragma acc kernels default(none)
>>    {
>> #pragma acc loop private (j)
>>      for (unsigned i = 0; i < 1000; ++i)
>>        {
>>      j;
>>        }
>>    }
>> }
>> ...
>>
>> But when compiling with the patch, and ignoring the private clause, we
>> run into this error:
>> ...
>> test.c: In function ‘main’:
>> test.c:10:2: error: ‘j’ not specified in enclosing OpenACC ‘kernels’
>> construct
>>    j;
>>    ^
>> test.c:5:9: note: enclosing OpenACC ‘kernels’ construct
>>   #pragma acc kernels default(none)
>> ...
>>
>> So I updated the patch to ignore all but the private clause on the loop
>> directive during gimplification, and moved the sequential expansion of
>> the omp-for construct from gimplify to omp-lower.
>>
>> Bootstrapped and reg-tested on x86_64.
>>
>> Build for nvidia accelerator and reg-tested goacc.exp and libgomp
>> testsuite.
>>
>> Updated patch still ok for stage4?
>>

Ping. ( Submitted here:
https://gcc.gnu.org/ml/gcc-patches/2016-02/msg01903.html )

Thanks,
- Tom

>> 0001-Ignore-acc-loop-directive-in-kernels-region.patch
>>
>>
>> Ignore acc loop directive in kernels region
>>
>> 2016-02-29  Tom de Vries  <tom@codesourcery.com>
>>
>>     * gimplify.c (gimplify_ctx_in_oacc_kernels_region): New function.
>>     (gimplify_omp_for): Ignore all but private clause on loop
>> directive in
>>     kernels region.
>>     * omp-low.c (lower_omp_for_seq): New function.
>>     (lower_omp_for): Use lower_omp_for_seq in kernels region.  Don't
>>     generate omp continue/return.
>>
>>     * c-c++-common/goacc/kernels-acc-loop-reduction.c: New test.
>>     * c-c++-common/goacc/kernels-acc-loop-smaller-equal.c: Same.
>>     * c-c++-common/goacc/kernels-loop-2-acc-loop.c: Same.
>>     * c-c++-common/goacc/kernels-loop-3-acc-loop.c: Same.
>>     * c-c++-common/goacc/kernels-loop-acc-loop.c: Same.
>>     * c-c++-common/goacc/kernels-loop-n-acc-loop.c: Same.
>>     * c-c++-common/goacc/combined-directives.c: Update test.
>>     * c-c++-common/goacc/loop-private-1.c: Same.
>>     * gfortran.dg/goacc/combined-directives.f90: Same.
>>     * gfortran.dg/goacc/gang-static.f95: Same.
>>     * gfortran.dg/goacc/reduction-2.f95: Same.
>>
>> ---
>>   gcc/gimplify.c                                     | 41 ++++++++++
>>   gcc/omp-low.c                                      | 93
>> ++++++++++++++++++++--
>>   .../c-c++-common/goacc/combined-directives.c       | 16 ++--
>>   .../goacc/kernels-acc-loop-reduction.c             | 24 ++++++
>>   .../goacc/kernels-acc-loop-smaller-equal.c         | 22 +++++
>>   .../c-c++-common/goacc/kernels-loop-2-acc-loop.c   | 17 ++++
>>   .../c-c++-common/goacc/kernels-loop-3-acc-loop.c   | 14 ++++
>>   .../c-c++-common/goacc/kernels-loop-acc-loop.c     | 14 ++++
>>   .../c-c++-common/goacc/kernels-loop-n-acc-loop.c   | 14 ++++
>>   gcc/testsuite/c-c++-common/goacc/loop-private-1.c  |  2 +-
>>   .../gfortran.dg/goacc/combined-directives.f90      | 16 ++--
>>   gcc/testsuite/gfortran.dg/goacc/gang-static.f95    |  4 +-
>>   gcc/testsuite/gfortran.dg/goacc/reduction-2.f95    |  3 +-
>>   13 files changed, 252 insertions(+), 28 deletions(-)
>>
>> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
>> index 7be6bd7..4b82305 100644
>> --- a/gcc/gimplify.c
>> +++ b/gcc/gimplify.c
>> @@ -8364,6 +8364,20 @@ find_combined_omp_for (tree *tp, int
>> *walk_subtrees, void *)
>>     return NULL_TREE;
>>   }
>>
>> +/* Return true if CTX is (part of) an oacc kernels region.  */
>> +
>> +static bool
>> +gimplify_ctx_in_oacc_kernels_region (gimplify_omp_ctx *ctx)
>> +{
>> +  for (;ctx != NULL; ctx = ctx->outer_context)
>> +    {
>> +      if (ctx->region_type == ORT_ACC_KERNELS)
>> +    return true;
>> +    }
>> +
>> +  return false;
>> +}
>> +
>>   /* Gimplify the gross structure of an OMP_FOR statement.  */
>>
>>   static enum gimplify_status
>> @@ -8403,6 +8417,33 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
>>         gcc_unreachable ();
>>       }
>>
>> +  /* Skip loop clauses not handled in kernels region.  */
>> +  if (gimplify_ctx_in_oacc_kernels_region (gimplify_omp_ctxp))
>> +    {
>> +      tree *prev_ptr = &OMP_FOR_CLAUSES (for_stmt);
>> +
>> +      while (tree probe = *prev_ptr)
>> +    {
>> +      tree *next_ptr = &OMP_CLAUSE_CHAIN (probe);
>> +
>> +      bool keep_clause;
>> +      switch (OMP_CLAUSE_CODE (probe))
>> +        {
>> +        case OMP_CLAUSE_PRIVATE:
>> +          keep_clause = true;
>> +          break;
>> +        default:
>> +          keep_clause = false;
>> +          break;
>> +        }
>> +
>> +      if (keep_clause)
>> +        prev_ptr = next_ptr;
>> +      else
>> +        *prev_ptr = *next_ptr;
>> +    }
>> +    }
>> +
>>     /* Set OMP_CLAUSE_LINEAR_NO_COPYIN flag on explicit linear
>>        clause for the IV.  */
>>     if (ort == ORT_SIMD && TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt))
>> == 1)
>> diff --git a/gcc/omp-low.c b/gcc/omp-low.c
>> index fcbb3e0..bb70ac2 100644
>> --- a/gcc/omp-low.c
>> +++ b/gcc/omp-low.c
>> @@ -14944,6 +14944,75 @@ lower_omp_for_lastprivate (struct
>> omp_for_data *fd, gimple_seq *body_p,
>>       }
>>   }
>>
>> +/* Lower the loops with index I and higher in omp_for FOR_STMT as a
>> sequential
>> +   loop, and append the resulting gimple statements to PRE_P.  */
>> +
>> +static void
>> +lower_omp_for_seq (gimple_seq *pre_p, gimple *for_stmt, unsigned int i)
>> +{
>> +  unsigned int len = gimple_omp_for_collapse (for_stmt);
>> +  gcc_assert (i < len);
>> +
>> +  /* Gimplify OMP_FOR[i] as:
>> +
>> +     OMP_FOR_INIT[i];
>> +     goto <loop_entry_label>;
>> +     <fall_thru_label>:
>> +     if (i == len - 1)
>> +       OMP_FOR_BODY;
>> +     else
>> +       OMP_FOR[i+1];
>> +    OMP_FOR_INCR[i];
>> +    <loop_entry_label>:
>> +    if (OMP_FOR_COND[i])
>> +      goto <fall_thru_label>;
>> +    else
>> +      goto <loop_exit_label>;
>> +    <loop_exit_label>:
>> +  */
>> +
>> +  tree loop_entry_label = create_artificial_label (UNKNOWN_LOCATION);
>> +  tree fall_thru_label = create_artificial_label (UNKNOWN_LOCATION);
>> +  tree loop_exit_label = create_artificial_label (UNKNOWN_LOCATION);
>> +
>> +  /* OMP_FOR_INIT[i].  */
>> +  tree init = gimple_omp_for_initial (for_stmt, i);
>> +  tree var = gimple_omp_for_index (for_stmt, i);
>> +  gimple *g = gimple_build_assign (var, init);
>> +  gimple_seq_add_stmt (pre_p, g);
>> +
>> +  /* goto <loop_entry_label>.  */
>> +  gimple_seq_add_stmt (pre_p, gimple_build_goto (loop_entry_label));
>> +
>> +  /* <fall_thru_label>.  */
>> +  gimple_seq_add_stmt (pre_p, gimple_build_label (fall_thru_label));
>> +
>> +  /* if (i == len - 1) OMP_FOR_BODY
>> +     else OMP_FOR[i+1].  */
>> +  if (i == len - 1)
>> +    gimple_seq_add_seq (pre_p, gimple_omp_body (for_stmt));
>> +  else
>> +    lower_omp_for_seq (pre_p, for_stmt, i + 1);
>> +
>> +  /* OMP_FOR_INCR[i].  */
>> +  tree incr = gimple_omp_for_incr (for_stmt, i);
>> +  g = gimple_build_assign (var, incr);
>> +  gimple_seq_add_stmt (pre_p, g);
>> +
>> +  /* <loop_entry_label>.  */
>> +  gimple_seq_add_stmt (pre_p, gimple_build_label (loop_entry_label));
>> +
>> +  /* if (OMP_FOR_COND[i]) goto <fall_thru_label>
>> +     else goto <loop_exit_label>.  */
>> +  enum tree_code cond = gimple_omp_for_cond (for_stmt, i);
>> +  tree final_val = gimple_omp_for_final (for_stmt, i);
>> +  gimple *gimple_cond = gimple_build_cond (cond, var, final_val,
>> +                       fall_thru_label, loop_exit_label);
>> +  gimple_seq_add_stmt (pre_p, gimple_cond);
>> +
>> +  /* <loop_exit_label>.  */
>> +  gimple_seq_add_stmt (pre_p, gimple_build_label (loop_exit_label));
>> +}
>>
>>   /* Lower code for an OMP loop directive.  */
>>
>> @@ -14957,6 +15026,8 @@ lower_omp_for (gimple_stmt_iterator *gsi_p,
>> omp_context *ctx)
>>     gimple_seq omp_for_body, body, dlist;
>>     gimple_seq oacc_head = NULL, oacc_tail = NULL;
>>     size_t i;
>> +  bool oacc_kernels_p = (is_gimple_omp_oacc (ctx->stmt)
>> +             && ctx_in_oacc_kernels_region (ctx));
>>
>>     push_gimplify_context ();
>>
>> @@ -15065,7 +15136,7 @@ lower_omp_for (gimple_stmt_iterator *gsi_p,
>> omp_context *ctx)
>>     extract_omp_for_data (stmt, &fd, NULL);
>>
>>     if (is_gimple_omp_oacc (ctx->stmt)
>> -      && !ctx_in_oacc_kernels_region (ctx))
>> +      && !oacc_kernels_p)
>>       lower_oacc_head_tail (gimple_location (stmt),
>>                 gimple_omp_for_clauses (stmt),
>>                 &oacc_head, &oacc_tail, ctx);
>> @@ -15088,13 +15159,18 @@ lower_omp_for (gimple_stmt_iterator *gsi_p,
>> omp_context *ctx)
>>                           ctx);
>>       }
>>
>> -  if (!gimple_omp_for_grid_phony (stmt))
>> -    gimple_seq_add_stmt (&body, stmt);
>> -  gimple_seq_add_seq (&body, gimple_omp_body (stmt));
>> +  if (oacc_kernels_p)
>> +    lower_omp_for_seq (&body, stmt, 0);
>> +  else if (gimple_omp_for_grid_phony (stmt))
>> +    gimple_seq_add_seq (&body, gimple_omp_body (stmt));
>> +  else
>> +    {
>> +      gimple_seq_add_stmt (&body, stmt);
>> +      gimple_seq_add_seq (&body, gimple_omp_body (stmt));
>>
>> -  if (!gimple_omp_for_grid_phony (stmt))
>> -    gimple_seq_add_stmt (&body, gimple_build_omp_continue (fd.loop.v,
>> -                               fd.loop.v));
>> +      gimple_seq_add_stmt (&body, gimple_build_omp_continue (fd.loop.v,
>> +                                 fd.loop.v));
>> +    }
>>
>>     /* After the loop, add exit clauses.  */
>>     lower_reduction_clauses (gimple_omp_for_clauses (stmt), &body, ctx);
>> @@ -15106,7 +15182,8 @@ lower_omp_for (gimple_stmt_iterator *gsi_p,
>> omp_context *ctx)
>>
>>     body = maybe_catch_exception (body);
>>
>> -  if (!gimple_omp_for_grid_phony (stmt))
>> +  if (!gimple_omp_for_grid_phony (stmt)
>> +      && !oacc_kernels_p)
>>       {
>>         /* Region exit marker goes at the end of the loop body.  */
>>         gimple_seq_add_stmt (&body, gimple_build_omp_return
>> (fd.have_nowait));
>> diff --git a/gcc/testsuite/c-c++-common/goacc/combined-directives.c
>> b/gcc/testsuite/c-c++-common/goacc/combined-directives.c
>> index c387285..66b8b65 100644
>> --- a/gcc/testsuite/c-c++-common/goacc/combined-directives.c
>> +++ b/gcc/testsuite/c-c++-common/goacc/combined-directives.c
>> @@ -108,12 +108,12 @@ test ()
>>   //    ;
>>   }
>>
>> -// { dg-final { scan-tree-dump-times "acc loop collapse.2. private.j.
>> private.i" 2 "gimple" } }
>> -// { dg-final { scan-tree-dump-times "acc loop gang" 2 "gimple" } }
>> -// { dg-final { scan-tree-dump-times "acc loop worker" 2 "gimple" } }
>> -// { dg-final { scan-tree-dump-times "acc loop vector" 2 "gimple" } }
>> -// { dg-final { scan-tree-dump-times "acc loop seq" 2 "gimple" } }
>> -// { dg-final { scan-tree-dump-times "acc loop auto" 2 "gimple" } }
>> -// { dg-final { scan-tree-dump-times "acc loop tile.2, 3" 2 "gimple" } }
>> -// { dg-final { scan-tree-dump-times "acc loop independent private.i"
>> 2 "gimple" } }
>> +// { dg-final { scan-tree-dump-times "acc loop collapse.2. private.j.
>> private.i" 1 "gimple" } }
>> +// { dg-final { scan-tree-dump-times "acc loop gang" 1 "gimple" } }
>> +// { dg-final { scan-tree-dump-times "acc loop worker" 1 "gimple" } }
>> +// { dg-final { scan-tree-dump-times "acc loop vector" 1 "gimple" } }
>> +// { dg-final { scan-tree-dump-times "acc loop seq" 1 "gimple" } }
>> +// { dg-final { scan-tree-dump-times "acc loop auto" 1 "gimple" } }
>> +// { dg-final { scan-tree-dump-times "acc loop tile.2, 3" 1 "gimple" } }
>> +// { dg-final { scan-tree-dump-times "acc loop independent private.i"
>> 1 "gimple" } }
>>   // { dg-final { scan-tree-dump-times "private.z" 2 "gimple" } }
>> diff --git
>> a/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-reduction.c
>> b/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-reduction.c
>> new file mode 100644
>> index 0000000..6a9f52b
>> --- /dev/null
>> +++ b/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-reduction.c
>> @@ -0,0 +1,24 @@
>> +/* { dg-additional-options "-O2" } */
>> +/* { dg-additional-options "-fdump-tree-parloops1-all" } */
>> +/* { dg-additional-options "-fdump-tree-optimized" } */
>> +
>> +unsigned int a[1000];
>> +
>> +unsigned int
>> +foo (int n)
>> +{
>> +  unsigned int sum = 0;
>> +
>> +#pragma acc kernels loop gang reduction(+:sum)
>> +  for (int i = 0; i < n; i++)
>> +    sum += a[i];
>> +
>> +  return sum;
>> +}
>> +
>> +/* Check that only one loop is analyzed, and that it can be
>> parallelized.  */
>> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1
>> "parloops1" } } */
>> +/* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
>> +
>> +/* Check that the loop has been split off into a function.  */
>> +/* { dg-final { scan-tree-dump-times "(?n);; Function
>> .*foo.*\\._omp_fn\\.0" 1 "optimized" } } */
>> diff --git
>> a/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-smaller-equal.c
>> b/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-smaller-equal.c
>> new file mode 100644
>> index 0000000..d18c779
>> --- /dev/null
>> +++ b/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-smaller-equal.c
>> @@ -0,0 +1,22 @@
>> +/* { dg-additional-options "-O2" } */
>> +/* { dg-additional-options "-fdump-tree-parloops1-all" } */
>> +/* { dg-additional-options "-fdump-tree-optimized" } */
>> +
>> +unsigned int
>> +foo (int n)
>> +{
>> +  unsigned int sum = 1;
>> +
>> +  #pragma acc kernels loop
>> +  for (int i = 1; i <= n; i++)
>> +    sum += i;
>> +
>> +  return sum;
>> +}
>> +
>> +/* Check that only one loop is analyzed, and that it can be
>> parallelized.  */
>> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1
>> "parloops1" } } */
>> +/* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
>> +
>> +/* Check that the loop has been split off into a function.  */
>> +/* { dg-final { scan-tree-dump-times "(?n);; Function
>> .*foo.*\\._omp_fn\\.0" 1 "optimized" } } */
>> diff --git
>> a/gcc/testsuite/c-c++-common/goacc/kernels-loop-2-acc-loop.c
>> b/gcc/testsuite/c-c++-common/goacc/kernels-loop-2-acc-loop.c
>> new file mode 100644
>> index 0000000..95354e1
>> --- /dev/null
>> +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-2-acc-loop.c
>> @@ -0,0 +1,17 @@
>> +/* { dg-additional-options "-O2" } */
>> +/* { dg-additional-options "-fdump-tree-parloops1-all" } */
>> +/* { dg-additional-options "-fdump-tree-optimized" } */
>> +
>> +/* Check that loops with '#pragma acc loop' tagged gets properly
>> parallelized.  */
>> +#define ACC_LOOP
>> +#include "kernels-loop-2.c"
>> +
>> +/* Check that only three loops are analyzed, and that all can be
>> +   parallelized.  */
>> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3
>> "parloops1" } } */
>> +/* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
>> +
>> +/* Check that the loop has been split off into a function.  */
>> +/* { dg-final { scan-tree-dump-times "(?n);; Function
>> .*main._omp_fn.0" 1 "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times "(?n);; Function
>> .*main._omp_fn.1" 1 "optimized" } } */
>> +/* { dg-final { scan-tree-dump-times "(?n);; Function
>> .*main._omp_fn.2" 1 "optimized" } } */
>> diff --git
>> a/gcc/testsuite/c-c++-common/goacc/kernels-loop-3-acc-loop.c
>> b/gcc/testsuite/c-c++-common/goacc/kernels-loop-3-acc-loop.c
>> new file mode 100644
>> index 0000000..1ad3067
>> --- /dev/null
>> +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-3-acc-loop.c
>> @@ -0,0 +1,14 @@
>> +/* { dg-additional-options "-O2" } */
>> +/* { dg-additional-options "-fdump-tree-parloops1-all" } */
>> +/* { dg-additional-options "-fdump-tree-optimized" } */
>> +
>> +/* Check that loops with '#pragma acc loop' tagged gets properly
>> parallelized.  */
>> +#define ACC_LOOP
>> +#include "kernels-loop-3.c"
>> +
>> +/* Check that only one loop is analyzed, and that it can be
>> parallelized.  */
>> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1
>> "parloops1" } } */
>> +/* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
>> +
>> +/* Check that the loop has been split off into a function.  */
>> +/* { dg-final { scan-tree-dump-times "(?n);; Function
>> .*main._omp_fn.0" 1 "optimized" } } */
>> diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-acc-loop.c
>> b/gcc/testsuite/c-c++-common/goacc/kernels-loop-acc-loop.c
>> new file mode 100644
>> index 0000000..47b8459
>> --- /dev/null
>> +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-acc-loop.c
>> @@ -0,0 +1,14 @@
>> +/* { dg-additional-options "-O2" } */
>> +/* { dg-additional-options "-fdump-tree-parloops1-all" } */
>> +/* { dg-additional-options "-fdump-tree-optimized" } */
>> +
>> +/* Check that loops with '#pragma acc loop' tagged gets properly
>> parallelized.  */
>> +#define ACC_LOOP
>> +#include "kernels-loop.c"
>> +
>> +/* Check that only one loop is analyzed, and that it can be
>> parallelized.  */
>> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1
>> "parloops1" } } */
>> +/* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
>> +
>> +/* Check that the loop has been split off into a function.  */
>> +/* { dg-final { scan-tree-dump-times "(?n);; Function
>> .*main._omp_fn.0" 1 "optimized" } } */
>> diff --git
>> a/gcc/testsuite/c-c++-common/goacc/kernels-loop-n-acc-loop.c
>> b/gcc/testsuite/c-c++-common/goacc/kernels-loop-n-acc-loop.c
>> new file mode 100644
>> index 0000000..25b56d7
>> --- /dev/null
>> +++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-n-acc-loop.c
>> @@ -0,0 +1,14 @@
>> +/* { dg-additional-options "-O2" } */
>> +/* { dg-additional-options "-fdump-tree-parloops1-all" } */
>> +/* { dg-additional-options "-fdump-tree-optimized" } */
>> +
>> +/* Check that loops with '#pragma acc loop' tagged gets properly
>> parallelized.  */
>> +#define ACC_LOOP
>> +#include "kernels-loop-n.c"
>> +
>> +/* Check that only one loop is analyzed, and that it can be
>> parallelized.  */
>> +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1
>> "parloops1" } } */
>> +/* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
>> +
>> +/* Check that the loop has been split off into a function.  */
>> +/* { dg-final { scan-tree-dump-times "(?n);; Function
>> .*foo.*._omp_fn.0" 1 "optimized" } } */
>> diff --git a/gcc/testsuite/c-c++-common/goacc/loop-private-1.c
>> b/gcc/testsuite/c-c++-common/goacc/loop-private-1.c
>> index 38a4a7d..9b2f7fa 100644
>> --- a/gcc/testsuite/c-c++-common/goacc/loop-private-1.c
>> +++ b/gcc/testsuite/c-c++-common/goacc/loop-private-1.c
>> @@ -10,4 +10,4 @@ f (int i, int j)
>>         ;
>>   }
>>
>> -/* { dg-final { scan-tree-dump-times "#pragma acc loop
>> collapse\\(2\\) private\\(j\\) private\\(i\\)" 1 "gimple" } } */
>> +/* { dg-final { scan-tree-dump-times "#pragma acc loop private\\(j\\)
>> private\\(i\\)" 1 "gimple" } } */
>> diff --git a/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
>> b/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
>> index 6977525..e89ddc9 100644
>> --- a/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
>> +++ b/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
>> @@ -144,12 +144,12 @@ subroutine test
>>   !  !$acc end kernels loop
>>   end subroutine test
>>
>> -! { dg-final { scan-tree-dump-times "acc loop private.i. private.j.
>> collapse.2." 2 "gimple" } }
>> -! { dg-final { scan-tree-dump-times "acc loop private.i. gang" 2
>> "gimple" } }
>> -! { dg-final { scan-tree-dump-times "acc loop private.i. private.j.
>> worker" 2 "gimple" } }
>> -! { dg-final { scan-tree-dump-times "acc loop private.i. private.j.
>> vector" 2 "gimple" } }
>> -! { dg-final { scan-tree-dump-times "acc loop private.i. private.j.
>> seq" 2 "gimple" } }
>> -! { dg-final { scan-tree-dump-times "acc loop private.i. private.j.
>> auto" 2 "gimple" } }
>> -! { dg-final { scan-tree-dump-times "acc loop private.i. private.j.
>> tile.2, 3" 2 "gimple" } }
>> -! { dg-final { scan-tree-dump-times "acc loop private.i. independent"
>> 2 "gimple" } }
>> +! { dg-final { scan-tree-dump-times "acc loop private.i. private.j.
>> collapse.2." 1 "gimple" } }
>> +! { dg-final { scan-tree-dump-times "acc loop private.i. gang" 1
>> "gimple" } }
>> +! { dg-final { scan-tree-dump-times "acc loop private.i. private.j.
>> worker" 1 "gimple" } }
>> +! { dg-final { scan-tree-dump-times "acc loop private.i. private.j.
>> vector" 1 "gimple" } }
>> +! { dg-final { scan-tree-dump-times "acc loop private.i. private.j.
>> seq" 1 "gimple" } }
>> +! { dg-final { scan-tree-dump-times "acc loop private.i. private.j.
>> auto" 1 "gimple" } }
>> +! { dg-final { scan-tree-dump-times "acc loop private.i. private.j.
>> tile.2, 3" 1 "gimple" } }
>> +! { dg-final { scan-tree-dump-times "acc loop private.i. independent"
>> 1 "gimple" } }
>>   ! { dg-final { scan-tree-dump-times "private.z" 2 "gimple" } }
>> diff --git a/gcc/testsuite/gfortran.dg/goacc/gang-static.f95
>> b/gcc/testsuite/gfortran.dg/goacc/gang-static.f95
>> index 3481085..c14b7b2 100644
>> --- a/gcc/testsuite/gfortran.dg/goacc/gang-static.f95
>> +++ b/gcc/testsuite/gfortran.dg/goacc/gang-static.f95
>> @@ -78,5 +78,5 @@ end subroutine test
>>   ! { dg-final { scan-tree-dump-times "gang\\(static:2\\)" 1
>> "omplower" } }
>>   ! { dg-final { scan-tree-dump-times "gang\\(static:5\\)" 1
>> "omplower" } }
>>   ! { dg-final { scan-tree-dump-times "gang\\(static:20\\)" 1
>> "omplower" } }
>> -! { dg-final { scan-tree-dump-times "gang\\(num: 5 static:\\\*\\)" 1
>> "omplower" } }
>> -! { dg-final { scan-tree-dump-times "gang\\(num: 30 static:20\\)" 1
>> "omplower" } }
>> +! { dg-final { scan-tree-dump-times "gang\\(num: 5 static:\\\*\\)" 0
>> "omplower" } }
>> +! { dg-final { scan-tree-dump-times "gang\\(num: 30 static:20\\)" 0
>> "omplower" } }
>> diff --git a/gcc/testsuite/gfortran.dg/goacc/reduction-2.f95
>> b/gcc/testsuite/gfortran.dg/goacc/reduction-2.f95
>> index 929fb0e..4c431c8 100644
>> --- a/gcc/testsuite/gfortran.dg/goacc/reduction-2.f95
>> +++ b/gcc/testsuite/gfortran.dg/goacc/reduction-2.f95
>> @@ -11,6 +11,7 @@ subroutine foo ()
>>     !$acc end parallel loop
>>     !$acc kernels loop reduction(+:a)
>>     do k = 2,6
>> +     a = a + 1
>>     enddo
>>     !$acc end kernels loop
>>   end subroutine
>> @@ -18,5 +19,5 @@ end subroutine
>>   ! { dg-final { scan-tree-dump-times "target oacc_parallel
>> firstprivate.a." 1 "gimple" } }
>>   ! { dg-final { scan-tree-dump-times "acc loop private.p.
>> reduction..:a." 1 "gimple" } }
>>   ! { dg-final { scan-tree-dump-times "target oacc_kernels
>> map.force_tofrom:a .len: 4.." 1 "gimple" } }
>> -! { dg-final { scan-tree-dump-times "acc loop private.k.
>> reduction..:a." 1 "gimple" } }
>> +! { dg-final { scan-tree-dump-times "acc loop private.k." 1 "gimple" } }
>>
>>
>
diff mbox

Patch

Ignore acc loop directive in kernels region

2016-02-29  Tom de Vries  <tom@codesourcery.com>

	* gimplify.c (gimplify_ctx_in_oacc_kernels_region): New function.
	(gimplify_omp_for): Ignore all but private clause on loop directive in
	kernels region.
	* omp-low.c (lower_omp_for_seq): New function.
	(lower_omp_for): Use lower_omp_for_seq in kernels region.  Don't
	generate omp continue/return.

	* c-c++-common/goacc/kernels-acc-loop-reduction.c: New test.
	* c-c++-common/goacc/kernels-acc-loop-smaller-equal.c: Same.
	* c-c++-common/goacc/kernels-loop-2-acc-loop.c: Same.
	* c-c++-common/goacc/kernels-loop-3-acc-loop.c: Same.
	* c-c++-common/goacc/kernels-loop-acc-loop.c: Same.
	* c-c++-common/goacc/kernels-loop-n-acc-loop.c: Same.
	* c-c++-common/goacc/combined-directives.c: Update test.
	* c-c++-common/goacc/loop-private-1.c: Same.
	* gfortran.dg/goacc/combined-directives.f90: Same.
	* gfortran.dg/goacc/gang-static.f95: Same.
	* gfortran.dg/goacc/reduction-2.f95: Same.

---
 gcc/gimplify.c                                     | 41 ++++++++++
 gcc/omp-low.c                                      | 93 ++++++++++++++++++++--
 .../c-c++-common/goacc/combined-directives.c       | 16 ++--
 .../goacc/kernels-acc-loop-reduction.c             | 24 ++++++
 .../goacc/kernels-acc-loop-smaller-equal.c         | 22 +++++
 .../c-c++-common/goacc/kernels-loop-2-acc-loop.c   | 17 ++++
 .../c-c++-common/goacc/kernels-loop-3-acc-loop.c   | 14 ++++
 .../c-c++-common/goacc/kernels-loop-acc-loop.c     | 14 ++++
 .../c-c++-common/goacc/kernels-loop-n-acc-loop.c   | 14 ++++
 gcc/testsuite/c-c++-common/goacc/loop-private-1.c  |  2 +-
 .../gfortran.dg/goacc/combined-directives.f90      | 16 ++--
 gcc/testsuite/gfortran.dg/goacc/gang-static.f95    |  4 +-
 gcc/testsuite/gfortran.dg/goacc/reduction-2.f95    |  3 +-
 13 files changed, 252 insertions(+), 28 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 7be6bd7..4b82305 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -8364,6 +8364,20 @@  find_combined_omp_for (tree *tp, int *walk_subtrees, void *)
   return NULL_TREE;
 }
 
+/* Return true if CTX is (part of) an oacc kernels region.  */
+
+static bool
+gimplify_ctx_in_oacc_kernels_region (gimplify_omp_ctx *ctx)
+{
+  for (;ctx != NULL; ctx = ctx->outer_context)
+    {
+      if (ctx->region_type == ORT_ACC_KERNELS)
+	return true;
+    }
+
+  return false;
+}
+
 /* Gimplify the gross structure of an OMP_FOR statement.  */
 
 static enum gimplify_status
@@ -8403,6 +8417,33 @@  gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
       gcc_unreachable ();
     }
 
+  /* Skip loop clauses not handled in kernels region.  */
+  if (gimplify_ctx_in_oacc_kernels_region (gimplify_omp_ctxp))
+    {
+      tree *prev_ptr = &OMP_FOR_CLAUSES (for_stmt);
+
+      while (tree probe = *prev_ptr)
+	{
+	  tree *next_ptr = &OMP_CLAUSE_CHAIN (probe);
+
+	  bool keep_clause;
+	  switch (OMP_CLAUSE_CODE (probe))
+	    {
+	    case OMP_CLAUSE_PRIVATE:
+	      keep_clause = true;
+	      break;
+	    default:
+	      keep_clause = false;
+	      break;
+	    }
+
+	  if (keep_clause)
+	    prev_ptr = next_ptr;
+	  else
+	    *prev_ptr = *next_ptr;
+	}
+    }
+
   /* Set OMP_CLAUSE_LINEAR_NO_COPYIN flag on explicit linear
      clause for the IV.  */
   if (ort == ORT_SIMD && TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)) == 1)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index fcbb3e0..bb70ac2 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -14944,6 +14944,75 @@  lower_omp_for_lastprivate (struct omp_for_data *fd, gimple_seq *body_p,
     }
 }
 
+/* Lower the loops with index I and higher in omp_for FOR_STMT as a sequential
+   loop, and append the resulting gimple statements to PRE_P.  */
+
+static void
+lower_omp_for_seq (gimple_seq *pre_p, gimple *for_stmt, unsigned int i)
+{
+  unsigned int len = gimple_omp_for_collapse (for_stmt);
+  gcc_assert (i < len);
+
+  /* Gimplify OMP_FOR[i] as:
+
+     OMP_FOR_INIT[i];
+     goto <loop_entry_label>;
+     <fall_thru_label>:
+     if (i == len - 1)
+       OMP_FOR_BODY;
+     else
+       OMP_FOR[i+1];
+    OMP_FOR_INCR[i];
+    <loop_entry_label>:
+    if (OMP_FOR_COND[i])
+      goto <fall_thru_label>;
+    else
+      goto <loop_exit_label>;
+    <loop_exit_label>:
+  */
+
+  tree loop_entry_label = create_artificial_label (UNKNOWN_LOCATION);
+  tree fall_thru_label = create_artificial_label (UNKNOWN_LOCATION);
+  tree loop_exit_label = create_artificial_label (UNKNOWN_LOCATION);
+
+  /* OMP_FOR_INIT[i].  */
+  tree init = gimple_omp_for_initial (for_stmt, i);
+  tree var = gimple_omp_for_index (for_stmt, i);
+  gimple *g = gimple_build_assign (var, init);
+  gimple_seq_add_stmt (pre_p, g);
+
+  /* goto <loop_entry_label>.  */
+  gimple_seq_add_stmt (pre_p, gimple_build_goto (loop_entry_label));
+
+  /* <fall_thru_label>.  */
+  gimple_seq_add_stmt (pre_p, gimple_build_label (fall_thru_label));
+
+  /* if (i == len - 1) OMP_FOR_BODY
+     else OMP_FOR[i+1].  */
+  if (i == len - 1)
+    gimple_seq_add_seq (pre_p, gimple_omp_body (for_stmt));
+  else
+    lower_omp_for_seq (pre_p, for_stmt, i + 1);
+
+  /* OMP_FOR_INCR[i].  */
+  tree incr = gimple_omp_for_incr (for_stmt, i);
+  g = gimple_build_assign (var, incr);
+  gimple_seq_add_stmt (pre_p, g);
+
+  /* <loop_entry_label>.  */
+  gimple_seq_add_stmt (pre_p, gimple_build_label (loop_entry_label));
+
+  /* if (OMP_FOR_COND[i]) goto <fall_thru_label>
+     else goto <loop_exit_label>.  */
+  enum tree_code cond = gimple_omp_for_cond (for_stmt, i);
+  tree final_val = gimple_omp_for_final (for_stmt, i);
+  gimple *gimple_cond = gimple_build_cond (cond, var, final_val,
+					   fall_thru_label, loop_exit_label);
+  gimple_seq_add_stmt (pre_p, gimple_cond);
+
+  /* <loop_exit_label>.  */
+  gimple_seq_add_stmt (pre_p, gimple_build_label (loop_exit_label));
+}
 
 /* Lower code for an OMP loop directive.  */
 
@@ -14957,6 +15026,8 @@  lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   gimple_seq omp_for_body, body, dlist;
   gimple_seq oacc_head = NULL, oacc_tail = NULL;
   size_t i;
+  bool oacc_kernels_p = (is_gimple_omp_oacc (ctx->stmt)
+			 && ctx_in_oacc_kernels_region (ctx));
 
   push_gimplify_context ();
 
@@ -15065,7 +15136,7 @@  lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   extract_omp_for_data (stmt, &fd, NULL);
 
   if (is_gimple_omp_oacc (ctx->stmt)
-      && !ctx_in_oacc_kernels_region (ctx))
+      && !oacc_kernels_p)
     lower_oacc_head_tail (gimple_location (stmt),
 			  gimple_omp_for_clauses (stmt),
 			  &oacc_head, &oacc_tail, ctx);
@@ -15088,13 +15159,18 @@  lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 						ctx);
 	}
 
-  if (!gimple_omp_for_grid_phony (stmt))
-    gimple_seq_add_stmt (&body, stmt);
-  gimple_seq_add_seq (&body, gimple_omp_body (stmt));
+  if (oacc_kernels_p)
+    lower_omp_for_seq (&body, stmt, 0);
+  else if (gimple_omp_for_grid_phony (stmt))
+    gimple_seq_add_seq (&body, gimple_omp_body (stmt));
+  else
+    {
+      gimple_seq_add_stmt (&body, stmt);
+      gimple_seq_add_seq (&body, gimple_omp_body (stmt));
 
-  if (!gimple_omp_for_grid_phony (stmt))
-    gimple_seq_add_stmt (&body, gimple_build_omp_continue (fd.loop.v,
-							   fd.loop.v));
+      gimple_seq_add_stmt (&body, gimple_build_omp_continue (fd.loop.v,
+							     fd.loop.v));
+    }
 
   /* After the loop, add exit clauses.  */
   lower_reduction_clauses (gimple_omp_for_clauses (stmt), &body, ctx);
@@ -15106,7 +15182,8 @@  lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 
   body = maybe_catch_exception (body);
 
-  if (!gimple_omp_for_grid_phony (stmt))
+  if (!gimple_omp_for_grid_phony (stmt)
+      && !oacc_kernels_p)
     {
       /* Region exit marker goes at the end of the loop body.  */
       gimple_seq_add_stmt (&body, gimple_build_omp_return (fd.have_nowait));
diff --git a/gcc/testsuite/c-c++-common/goacc/combined-directives.c b/gcc/testsuite/c-c++-common/goacc/combined-directives.c
index c387285..66b8b65 100644
--- a/gcc/testsuite/c-c++-common/goacc/combined-directives.c
+++ b/gcc/testsuite/c-c++-common/goacc/combined-directives.c
@@ -108,12 +108,12 @@  test ()
 //    ;
 }
 
-// { dg-final { scan-tree-dump-times "acc loop collapse.2. private.j. private.i" 2 "gimple" } }
-// { dg-final { scan-tree-dump-times "acc loop gang" 2 "gimple" } }
-// { dg-final { scan-tree-dump-times "acc loop worker" 2 "gimple" } }
-// { dg-final { scan-tree-dump-times "acc loop vector" 2 "gimple" } }
-// { dg-final { scan-tree-dump-times "acc loop seq" 2 "gimple" } }
-// { dg-final { scan-tree-dump-times "acc loop auto" 2 "gimple" } }
-// { dg-final { scan-tree-dump-times "acc loop tile.2, 3" 2 "gimple" } }
-// { dg-final { scan-tree-dump-times "acc loop independent private.i" 2 "gimple" } }
+// { dg-final { scan-tree-dump-times "acc loop collapse.2. private.j. private.i" 1 "gimple" } }
+// { dg-final { scan-tree-dump-times "acc loop gang" 1 "gimple" } }
+// { dg-final { scan-tree-dump-times "acc loop worker" 1 "gimple" } }
+// { dg-final { scan-tree-dump-times "acc loop vector" 1 "gimple" } }
+// { dg-final { scan-tree-dump-times "acc loop seq" 1 "gimple" } }
+// { dg-final { scan-tree-dump-times "acc loop auto" 1 "gimple" } }
+// { dg-final { scan-tree-dump-times "acc loop tile.2, 3" 1 "gimple" } }
+// { dg-final { scan-tree-dump-times "acc loop independent private.i" 1 "gimple" } }
 // { dg-final { scan-tree-dump-times "private.z" 2 "gimple" } }
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-reduction.c b/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-reduction.c
new file mode 100644
index 0000000..6a9f52b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-reduction.c
@@ -0,0 +1,24 @@ 
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fdump-tree-parloops1-all" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+unsigned int a[1000];
+
+unsigned int
+foo (int n)
+{
+  unsigned int sum = 0;
+
+#pragma acc kernels loop gang reduction(+:sum)
+  for (int i = 0; i < n; i++)
+    sum += a[i];
+
+  return sum;
+}
+
+/* Check that only one loop is analyzed, and that it can be parallelized.  */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
+
+/* Check that the loop has been split off into a function.  */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*\\._omp_fn\\.0" 1 "optimized" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-smaller-equal.c b/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-smaller-equal.c
new file mode 100644
index 0000000..d18c779
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-smaller-equal.c
@@ -0,0 +1,22 @@ 
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fdump-tree-parloops1-all" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+unsigned int
+foo (int n)
+{
+  unsigned int sum = 1;
+
+  #pragma acc kernels loop
+  for (int i = 1; i <= n; i++)
+    sum += i;
+
+  return sum;
+}
+
+/* Check that only one loop is analyzed, and that it can be parallelized.  */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
+
+/* Check that the loop has been split off into a function.  */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*\\._omp_fn\\.0" 1 "optimized" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-2-acc-loop.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-2-acc-loop.c
new file mode 100644
index 0000000..95354e1
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-2-acc-loop.c
@@ -0,0 +1,17 @@ 
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fdump-tree-parloops1-all" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+/* Check that loops with '#pragma acc loop' tagged gets properly parallelized.  */
+#define ACC_LOOP
+#include "kernels-loop-2.c"
+
+/* Check that only three loops are analyzed, and that all can be
+   parallelized.  */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops1" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
+
+/* Check that the loop has been split off into a function.  */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.1" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.2" 1 "optimized" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-3-acc-loop.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-3-acc-loop.c
new file mode 100644
index 0000000..1ad3067
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-3-acc-loop.c
@@ -0,0 +1,14 @@ 
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fdump-tree-parloops1-all" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+/* Check that loops with '#pragma acc loop' tagged gets properly parallelized.  */
+#define ACC_LOOP
+#include "kernels-loop-3.c"
+
+/* Check that only one loop is analyzed, and that it can be parallelized.  */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
+
+/* Check that the loop has been split off into a function.  */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-acc-loop.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-acc-loop.c
new file mode 100644
index 0000000..47b8459
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-acc-loop.c
@@ -0,0 +1,14 @@ 
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fdump-tree-parloops1-all" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+/* Check that loops with '#pragma acc loop' tagged gets properly parallelized.  */
+#define ACC_LOOP
+#include "kernels-loop.c"
+
+/* Check that only one loop is analyzed, and that it can be parallelized.  */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
+
+/* Check that the loop has been split off into a function.  */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*main._omp_fn.0" 1 "optimized" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-n-acc-loop.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-n-acc-loop.c
new file mode 100644
index 0000000..25b56d7
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-n-acc-loop.c
@@ -0,0 +1,14 @@ 
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-fdump-tree-parloops1-all" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+/* Check that loops with '#pragma acc loop' tagged gets properly parallelized.  */
+#define ACC_LOOP
+#include "kernels-loop-n.c"
+
+/* Check that only one loop is analyzed, and that it can be parallelized.  */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
+
+/* Check that the loop has been split off into a function.  */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*._omp_fn.0" 1 "optimized" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/loop-private-1.c b/gcc/testsuite/c-c++-common/goacc/loop-private-1.c
index 38a4a7d..9b2f7fa 100644
--- a/gcc/testsuite/c-c++-common/goacc/loop-private-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/loop-private-1.c
@@ -10,4 +10,4 @@  f (int i, int j)
       ;
 }
 
-/* { dg-final { scan-tree-dump-times "#pragma acc loop collapse\\(2\\) private\\(j\\) private\\(i\\)" 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "#pragma acc loop private\\(j\\) private\\(i\\)" 1 "gimple" } } */
diff --git a/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90 b/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
index 6977525..e89ddc9 100644
--- a/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
@@ -144,12 +144,12 @@  subroutine test
 !  !$acc end kernels loop
 end subroutine test
 
-! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. collapse.2." 2 "gimple" } }
-! { dg-final { scan-tree-dump-times "acc loop private.i. gang" 2 "gimple" } }
-! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. worker" 2 "gimple" } }
-! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. vector" 2 "gimple" } }
-! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. seq" 2 "gimple" } }
-! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. auto" 2 "gimple" } }
-! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. tile.2, 3" 2 "gimple" } }
-! { dg-final { scan-tree-dump-times "acc loop private.i. independent" 2 "gimple" } }
+! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. collapse.2." 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "acc loop private.i. gang" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. worker" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. vector" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. seq" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. auto" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. tile.2, 3" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "acc loop private.i. independent" 1 "gimple" } }
 ! { dg-final { scan-tree-dump-times "private.z" 2 "gimple" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/gang-static.f95 b/gcc/testsuite/gfortran.dg/goacc/gang-static.f95
index 3481085..c14b7b2 100644
--- a/gcc/testsuite/gfortran.dg/goacc/gang-static.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/gang-static.f95
@@ -78,5 +78,5 @@  end subroutine test
 ! { dg-final { scan-tree-dump-times "gang\\(static:2\\)" 1 "omplower" } }
 ! { dg-final { scan-tree-dump-times "gang\\(static:5\\)" 1 "omplower" } }
 ! { dg-final { scan-tree-dump-times "gang\\(static:20\\)" 1 "omplower" } }
-! { dg-final { scan-tree-dump-times "gang\\(num: 5 static:\\\*\\)" 1 "omplower" } }
-! { dg-final { scan-tree-dump-times "gang\\(num: 30 static:20\\)" 1 "omplower" } }
+! { dg-final { scan-tree-dump-times "gang\\(num: 5 static:\\\*\\)" 0 "omplower" } }
+! { dg-final { scan-tree-dump-times "gang\\(num: 30 static:20\\)" 0 "omplower" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/reduction-2.f95 b/gcc/testsuite/gfortran.dg/goacc/reduction-2.f95
index 929fb0e..4c431c8 100644
--- a/gcc/testsuite/gfortran.dg/goacc/reduction-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/reduction-2.f95
@@ -11,6 +11,7 @@  subroutine foo ()
   !$acc end parallel loop
   !$acc kernels loop reduction(+:a)
   do k = 2,6
+     a = a + 1
   enddo
   !$acc end kernels loop
 end subroutine
@@ -18,5 +19,5 @@  end subroutine
 ! { dg-final { scan-tree-dump-times "target oacc_parallel firstprivate.a." 1 "gimple" } }
 ! { dg-final { scan-tree-dump-times "acc loop private.p. reduction..:a." 1 "gimple" } }
 ! { dg-final { scan-tree-dump-times "target oacc_kernels map.force_tofrom:a .len: 4.." 1 "gimple" } }
-! { dg-final { scan-tree-dump-times "acc loop private.k. reduction..:a." 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "acc loop private.k." 1 "gimple" } }