Message ID | 5649C02B.9030604@mentor.com |
---|---|
State | New |
On Mon, 16 Nov 2015, Tom de Vries wrote:

> On 11/11/15 11:55, Richard Biener wrote:
> > On Mon, 9 Nov 2015, Tom de Vries wrote:
> >
> > > On 09/11/15 16:35, Tom de Vries wrote:
> > > > Hi,
> > > >
> > > > this patch series for stage1 trunk adds support to:
> > > > - parallelize oacc kernels regions using parloops, and
> > > > - map the loops onto the oacc gang dimension.
> > > >
> > > > The patch series contains these patches:
> > > >
> > > >      1  Insert new exit block only when needed in
> > > >         transform_to_exit_first_loop_alt
> > > >      2  Make create_parallel_loop return void
> > > >      3  Ignore reduction clause on kernels directive
> > > >      4  Implement -foffload-alias
> > > >      5  Add in_oacc_kernels_region in struct loop
> > > >      6  Add pass_oacc_kernels
> > > >      7  Add pass_dominator_oacc_kernels
> > > >      8  Add pass_ch_oacc_kernels
> > > >      9  Add pass_parallelize_loops_oacc_kernels
> > > >     10  Add pass_oacc_kernels pass group in passes.def
> > > >     11  Update testcases after adding kernels pass group
> > > >     12  Handle acc loop directive
> > > >     13  Add c-c++-common/goacc/kernels-*.c
> > > >     14  Add gfortran.dg/goacc/kernels-*.f95
> > > >     15  Add libgomp.oacc-c-c++-common/kernels-*.c
> > > >     16  Add libgomp.oacc-fortran/kernels-*.f95
> > > >
> > > > The first 9 patches are more or less independent, but patches 10-16 are
> > > > intended to be committed at the same time.
> > > >
> > > > Bootstrapped and reg-tested on x86_64.
> > > >
> > > > Build and reg-tested with nvidia accelerator, in combination with a
> > > > patch that enables accelerator testing (which is submitted at
> > > > https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01771.html ).
> > > >
> > > > I'll post the individual patches in reply to this message.
> > >
> > > this patch adds and initializes the in_oacc_kernels_region field in
> > > struct loop.
> > >
> > > The field is used to signal to subsequent passes that we're dealing with a
> > > loop in a kernels region that we're trying to parallelize.
> > >
> > > Note that we do not parallelize kernels regions with more than one loop
> > > nest.
> > > [ In general, kernels regions with more than one loop nest should be split
> > > up into separate kernels regions, but that's not supported atm. ]
> >
> > I think mark_loops_in_oacc_kernels_region can be greatly simplified.
> >
> > Both region entry and exit should have the same ->loop_father (a SESE
> > region).  Then you can just walk that loop's inner (and their sibling)
> > loops checking their header domination relation with the region entry
> > exit (only necessary for direct inner loops).
>
> Updated patch to use the loops structure. Atm I'm also skipping loops
> containing sibling loops, since I have no test-cases for that yet.

Looks ok to me now.  You want to update copy_loop_info btw.

Richard.

> Thanks,
> - Tom
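For readers following the review, the loop-tree walk discussed above can be modelled outside of GCC. The sketch below is only an illustration under stated assumptions: struct toy_loop, toy_mark_loops and the header_in_region flag are hypothetical names, and header_in_region stands in for the dominated_by_p checks against the region entry and exit. It is not the GCC implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for GCC's struct loop; only the fields the walk needs.
   All names here are hypothetical.  */
struct toy_loop
{
  struct toy_loop *inner;      /* First child loop.  */
  struct toy_loop *next;       /* Next sibling loop.  */
  bool header_in_region;       /* Assumption: replaces the dominance checks.  */
  bool in_oacc_kernels_region; /* The flag being set.  */
};

/* Mark the single loop nest found among the children of OUTER, provided
   there is exactly one candidate and no loop nested in it has a sibling.
   Return true if the nest was marked.  */
static bool
toy_mark_loops (struct toy_loop *outer)
{
  assert (outer != NULL);

  unsigned int nr_outer_loops = 0;
  struct toy_loop *single_outer = NULL;

  /* Count the direct children whose header lies inside the region.  */
  for (struct toy_loop *loop = outer->inner; loop != NULL; loop = loop->next)
    if (loop->header_in_region)
      {
        nr_outer_loops++;
        single_outer = loop;
      }
  if (nr_outer_loops != 1)
    return false;

  /* Give up if any nested loop has a sibling.  */
  for (struct toy_loop *loop = single_outer->inner; loop != NULL;
       loop = loop->inner)
    if (loop->next != NULL)
      return false;

  /* Mark the whole chain, outermost to innermost.  */
  for (struct toy_loop *loop = single_outer; loop != NULL; loop = loop->inner)
    loop->in_oacc_kernels_region = true;
  return true;
}
```

In this toy model a region with one two-deep nest gets both loops marked, while a region with two top-level nests, or a nest containing sibling loops, is left untouched.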
Add in_oacc_kernels_region in struct loop

2015-11-09  Tom de Vries  <tom@codesourcery.com>

        * cfgloop.h (struct loop): Add in_oacc_kernels_region field.
        * omp-low.c (mark_loops_in_oacc_kernels_region): New function.
        (expand_omp_target): Call mark_loops_in_oacc_kernels_region.
---
 gcc/cfgloop.h |  3 +++
 gcc/omp-low.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+)

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 6af6893..ee73bf9 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -191,6 +191,9 @@ struct GTY ((chain_next ("%h.next"))) loop {
   /* True if we should try harder to vectorize this loop.  */
   bool force_vectorize;
 
+  /* True if the loop is part of an oacc kernels region.  */
+  bool in_oacc_kernels_region;
+
   /* For SIMD loops, this is a unique identifier of the loop, referenced
      by IFN_GOMP_SIMD_VF, IFN_GOMP_SIMD_LANE and IFN_GOMP_SIMD_LAST_LANE
      builtins.  */
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 5f76434..fba7bbd 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -12450,6 +12450,46 @@ get_oacc_ifn_dim_arg (const gimple *stmt)
   return (int) axis;
 }
 
+/* Mark the loops inside the kernels region starting at REGION_ENTRY and ending
+   at REGION_EXIT.  */
+
+static void
+mark_loops_in_oacc_kernels_region (basic_block region_entry,
+                                   basic_block region_exit)
+{
+  struct loop *outer = region_entry->loop_father;
+  gcc_assert (region_exit == NULL || outer == region_exit->loop_father);
+
+  /* Don't parallelize the kernels region if it contains more than one outer
+     loop.  */
+  unsigned int nr_outer_loops = 0;
+  struct loop *single_outer;
+  for (struct loop *loop = outer->inner; loop != NULL; loop = loop->next)
+    {
+      gcc_assert (loop_outer (loop) == outer);
+
+      if (!dominated_by_p (CDI_DOMINATORS, loop->header, region_entry))
+        continue;
+
+      if (region_exit != NULL
+          && dominated_by_p (CDI_DOMINATORS, loop->header, region_exit))
+        continue;
+
+      nr_outer_loops++;
+      single_outer = loop;
+    }
+  if (nr_outer_loops != 1)
+    return;
+
+  for (struct loop *loop = single_outer->inner; loop != NULL; loop = loop->inner)
+    if (loop->next)
+      return;
+
+  /* Mark the loops in the region.  */
+  for (struct loop *loop = single_outer; loop != NULL; loop = loop->inner)
+    loop->in_oacc_kernels_region = true;
+}
+
 /* Expand the GIMPLE_OMP_TARGET starting at REGION.  */
 
 static void
@@ -12505,6 +12545,9 @@ expand_omp_target (struct omp_region *region)
   entry_bb = region->entry;
   exit_bb = region->exit;
 
+  if (gimple_omp_target_kind (entry_stmt) == GF_OMP_TARGET_KIND_OACC_KERNELS)
+    mark_loops_in_oacc_kernels_region (region->entry, region->exit);
+
   if (offloaded)
     {
       unsigned srcidx, dstidx, num;
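
The review asks for copy_loop_info to be updated as well, which this version of the patch does not do. As a rough illustration of what such a follow-up might amount to, here is a toy model: struct toy_loop_info and toy_copy_loop_info are hypothetical names, and the set of fields the real copy_loop_info handles is not shown in this thread, so the two fields below are an assumption chosen only to make the sketch self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy subset of per-loop flags; hypothetical, not GCC's struct loop.  */
struct toy_loop_info
{
  bool force_vectorize;
  bool in_oacc_kernels_region;
};

/* Copy the per-loop flags from LOOP to TARGET, so that a duplicated loop
   keeps its kernels-region marking.  */
static void
toy_copy_loop_info (const struct toy_loop_info *loop,
                    struct toy_loop_info *target)
{
  assert (loop != NULL && target != NULL);
  target->force_vectorize = loop->force_vectorize;
  /* The new flag has to travel with the loop, otherwise a copy created by
     a later transformation would no longer be recognised as part of the
     kernels region.  */
  target->in_oacc_kernels_region = loop->in_oacc_kernels_region;
}
```

The point of the sketch is only that any flag added to the loop structure needs a matching line wherever loop information is duplicated.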