Message ID | 4EA1AE61.6000003@netcologne.de |
---|---|
State | New |
Headers | show |
Ping ** 0.571428 > Jakub Jelinek wrote: >> Though, what could be done is just special case OpenMP workshare regions, >> insert everything into BLOCK local vars unless in OpenMP workshare, in >> that >> case put the BLOCK with the temporary around the workshare rather than >> inside of it. In the case of omp parallel workshare it would need >> to go in between omp parallel and omp workshare. > > Well, here's a patch which implements this concept. I chose to insert > the BLOCK in a separate pass because it was the cleanest way to avoid > infinite recursion when inserting a block. > > Regression-tested. OK for trunk? > > Thomas > > 2011-10-21 Thomas Koenig <tkoenig@gcc.gnu.org> > > PR fortran/50690 > * frontend-passes.c (workshare_level): New variable. > (create_var): Put the newly created variable into the block > around the WORKSHARE. > (enclose_workshare): New callback function to enclose > WORKSHAREs in blocks. > (optimize_namespace): Use it. > (gfc_code_walker): Save/restore current namespace when > following a BLOCK. Keep track of workshare level. > > 2011-10-21 Thomas Koenig <tkoenig@gcc.gnu.org> > > PR fortran/50690 > * gfortran.dg/gomp/workshare2.f90: New test. > >
On Tuesday 25 October 2011 07:39:44 Thomas Koenig wrote: > Ping ** 0.571428 > Let's keep Jakub CC-ed for mixes of OpenMP and frontend optimizations. ;-) There are two commented lines in the testcase. Is it expected? Otherwise doesn't look too bad... Mikael > > Jakub Jelinek wrote: > >> Though, what could be done is just special case OpenMP workshare > >> regions, insert everything into BLOCK local vars unless in OpenMP > >> workshare, in that > >> case put the BLOCK with the temporary around the workshare rather than > >> inside of it. In the case of omp parallel workshare it would need > >> to go in between omp parallel and omp workshare. > > > > Well, here's a patch which implements this concept. I chose to insert > > the BLOCK in a separate pass because it was the cleanest way to avoid > > infinite recursion when inserting a block. > > > > Regression-tested. OK for trunk? > > > > Thomas > > > > 2011-10-21 Thomas Koenig <tkoenig@gcc.gnu.org> > > > > PR fortran/50690 > > * frontend-passes.c (workshare_level): New variable. > > (create_var): Put the newly created variable into the block > > around the WORKSHARE. > > (enclose_workshare): New callback function to enclose > > WORKSHAREs in blocks. > > (optimize_namespace): Use it. > > (gfc_code_walker): Save/restore current namespace when > > following a BLOCK. Keep track of workshare level. > > > > 2011-10-21 Thomas Koenig <tkoenig@gcc.gnu.org> > > > > PR fortran/50690 > > * gfortran.dg/gomp/workshare2.f90: New test.
Mikael Morin wrote: > Let's keep Jakub CC-ed for mixes of OpenMP and frontend optimizations. ;-) > > There are two commented lines in the testcase. Is it expected? > Otherwise doesn't look too bad... I had also a glance at the patch - and it looks reasonable; in particular, I failed to generate a failing test case. Besides the !$omp parallel !$omp workshare one should also add a test for - it can also be in the same file -: !$omp parallel workshare Regarding the test case: You cannot place a "dg-do run" OpenMP test case into gcc/testsuite/gfortran.dg/gomp/workshare2.f90; you have to use libgomp/testsuite/libgomp.fortran. The former directory only allows "dg-do compile" test cases. (By the way, running "make check" for libgomp does never harm.) Jakub, what do you think? Tobias >>> Jakub Jelinek wrote: >>>> Though, what could be done is just special case OpenMP workshare >>>> regions, insert everything into BLOCK local vars unless in OpenMP >>>> workshare, in that >>>> case put the BLOCK with the temporary around the workshare rather than >>>> inside of it. In the case of omp parallel workshare it would need >>>> to go in between omp parallel and omp workshare. >>> Well, here's a patch which implements this concept. I chose to insert >>> the BLOCK in a separate pass because it was the cleanest way to avoid >>> infinite recursion when inserting a block. >>> >>> Regression-tested. OK for trunk? >>> >>> Thomas >>> >>> 2011-10-21 Thomas Koenig<tkoenig@gcc.gnu.org> >>> >>> PR fortran/50690 >>> * frontend-passes.c (workshare_level): New variable. >>> (create_var): Put the newly created variable into the block >>> around the WORKSHARE. >>> (enclose_workshare): New callback function to enclose >>> WORKSHAREs in blocks. >>> (optimize_namespace): Use it. >>> (gfc_code_walker): Save/restore current namespace when >>> following a BLOCK. Keep track of workshare level. >>> >>> 2011-10-21 Thomas Koenig<tkoenig@gcc.gnu.org> >>> >>> PR fortran/50690 >>> * gfortran.dg/gomp/workshare2.f90: New test.
Tobias Burnus wrote: > I had also a glance at the patch - and it looks reasonable; in > particular, I failed to generate a failing test case. Actually, the test case is *not* OK. If one compiles the original test case of the PR (or your workshare2.f90) with "-O" and looks at "-fdump-tree-original", one finds: #pragma omp parallel default(shared) { { real(kind=4) __var_1; { #pragma omp single { __var_1 = __builtin_cosf (b[0]) } ... #pragma omp for schedule(static) nowait for (S.1 = 1; S.1 <= 5; S.1 = S.1 + 1) { a[S.1 + -1] = a[S.1 + -1] * D.1730 + a[S.1 + -1] * D.1731; Thus, __var_1 is a thread-local variable; however, COS() is not executed in all threads but only in one due to the omp single: "The single construct specifies that the associated structured block is executed by only one of the threads in the team" (2.5.3 single Construct, OpenMP 3.1). Jakub remarks that omp single is what we expand to omp workshare if it is not simple enough for us. * * * With the test case below, the dump looks OK, but the FE optimization does not combine the two cos() calls - I have no idea why. The dump looks as: #pragma omp parallel default(shared) { D.1743 = __builtin_cosf (b[0]); D.1745 = __builtin_cosf (b[0]); ... #pragma omp for schedule(static) nowait for (S.2 = 1; S.2 <= 10; S.2 = S.2 + 1) a[S.2 + D.1750] = a[S.2 + D.1748] * D.1743 + a[S.2 + D.1749] * D.1745; Tobias PS: The test case is: program workshare implicit none real, parameter :: eps = 3e-7 integer :: j real :: A(10,5), B(5) B(1) = 3.344 call random_number(a) !$omp parallel default(shared) !$omp workshare forall (j=1:5) A(:,j) = A(:,j)*cos(B(1))+A(:,j)*cos(B(1)) end forall !$omp end workshare !$omp end parallel print *, A end program workshare subroutine parallel_workshare implicit none real, parameter :: eps = 3e-7 integer :: j real :: A(10,5), B(5) B(1) = 3.344 call random_number(a) !$omp parallel workshare default(shared) forall (j=1:5) A(:,j) = A(:,j)*cos(B(1))+A(:,j)*cos(B(1)) end forall !$omp end parallel workshare print *, A end subroutine parallel_workshare
Index: frontend-passes.c =================================================================== --- frontend-passes.c (Revision 180063) +++ frontend-passes.c (Arbeitskopie) @@ -66,6 +66,10 @@ static gfc_namespace *current_ns; static int forall_level; +/* If we are within an OMP WORKSHARE or OMP PARALLEL WORKSHARE. */ + +static int workshare_level; + /* Entry point - run all passes for a namespace. So far, only an optimization pass is run. */ @@ -245,8 +249,16 @@ create_var (gfc_expr * e) gfc_namespace *ns; int i; + /* Special treatment for WORKSHARE: The variable goes into the block + created by the earlier pass around it. */ + + if (workshare_level > 0) + { + ns = current_ns; + changed_statement = current_code; + } /* If the block hasn't already been created, do so. */ - if (inserted_block == NULL) + else if (inserted_block == NULL) { inserted_block = XCNEW (gfc_code); inserted_block->op = EXEC_BLOCK; @@ -497,6 +509,38 @@ convert_do_while (gfc_code **c, int *walk_subtrees return 0; } +/* Callback function to enclose OMP workshares into BLOCKs. This is done + so that later front end optimization can insert temporary variables into + the outer block scope. */ + +static int +enclose_workshare (gfc_code **c, int *walk_subtrees, + void *data ATTRIBUTE_UNUSED) +{ + gfc_code *co; + gfc_code *new_block; + gfc_namespace *ns; + + co = *c; + + if (co->op != EXEC_OMP_WORKSHARE && co->op != EXEC_OMP_PARALLEL_WORKSHARE) + return 0; + + /* Create the block. */ + new_block = XCNEW (gfc_code); + new_block->op = EXEC_BLOCK; + new_block->loc = co->loc; + ns = gfc_build_block_ns (current_ns); + new_block->ext.block.ns = ns; + new_block->ext.block.assoc = NULL; + ns->code = co; + + /* Insert the BLOCK at the right position. */ + *c = new_block; + *walk_subtrees = false; + return 0; +} + /* Optimize a namespace, including all contained namespaces. */ static void @@ -507,6 +551,12 @@ optimize_namespace (gfc_namespace *ns) forall_level = 0; gfc_code_walker (&ns->code, convert_do_while, dummy_expr_callback, NULL); + if (gfc_option.gfc_flag_openmp) + { + workshare_level = 0; + gfc_code_walker (&ns->code, enclose_workshare, dummy_expr_callback, NULL); + } + gfc_code_walker (&ns->code, cfe_code, cfe_expr_0, NULL); gfc_code_walker (&ns->code, optimize_code, optimize_expr, NULL); @@ -1148,6 +1198,7 @@ gfc_code_walker (gfc_code **c, walk_code_fn_t code gfc_code *b; gfc_actual_arglist *a; gfc_code *co; + gfc_namespace *save_ns; gfc_association_list *alist; /* There might be statement insertions before the current code, @@ -1159,7 +1210,11 @@ gfc_code_walker (gfc_code **c, walk_code_fn_t code { case EXEC_BLOCK: + save_ns = current_ns; + current_ns = co->ext.block.ns; WALK_SUBCODE (co->ext.block.ns->code); + current_ns = save_ns; + for (alist = co->ext.block.assoc; alist; alist = alist->next) WALK_SUBEXPR (alist->target); break; @@ -1329,14 +1384,18 @@ gfc_code_walker (gfc_code **c, walk_code_fn_t code WALK_SUBEXPR (co->ext.dt->extra_comma); break; + case EXEC_OMP_PARALLEL_WORKSHARE: + case EXEC_OMP_WORKSHARE: + workshare_level ++; + + /* Fall through. */ + case EXEC_OMP_DO: case EXEC_OMP_PARALLEL: case EXEC_OMP_PARALLEL_DO: case EXEC_OMP_PARALLEL_SECTIONS: - case EXEC_OMP_PARALLEL_WORKSHARE: case EXEC_OMP_SECTIONS: case EXEC_OMP_SINGLE: - case EXEC_OMP_WORKSHARE: case EXEC_OMP_END_SINGLE: case EXEC_OMP_TASK: if (co->ext.omp_clauses) @@ -1365,6 +1424,9 @@ gfc_code_walker (gfc_code **c, walk_code_fn_t code if (co->op == EXEC_FORALL) forall_level --; + if (co->op == EXEC_OMP_WORKSHARE + || co->op == EXEC_OMP_PARALLEL_WORKSHARE) + workshare_level --; } } return 0;