Message ID | e491d566-c4f6-156d-18fa-83e6304aaab4@netcologne.de |
---|---|
State | New |
Headers | show |
Series | [fortran] Remove parallell annotation from DO CONCURRENT | expand |
On Mon, Apr 09, 2018 at 10:10:13PM +0200, Thomas Koenig wrote: > Hello world, > > the attached patch removes the parallel annotation from DO CONCURRENT. > As discussed in the PR, the autoparallellizer currently generates > wrong code. The only feasible way is to disable the annotation for > gcc-8 and work on the wrong-code issues for gcc-9. This is an 8 > regression. Can't it be turned into annot_expr_ivdep_kind instead of annot_expr_parallel_kind for GCC8? I mean, the potential local addressable variables shouldn't be a problem for vectorization, just autopar, right? Jakub
On Mon, Apr 09, 2018 at 10:10:13PM +0200, Thomas Koenig wrote: > > the attached patch removes the parallel annotation from DO CONCURRENT. > As discussed in the PR, the autoparallellizer currently generates > wrong code. The only feasible way is to disable the annotation for > gcc-8 and work on the wrong-code issues for gcc-9. This is an 8 > regression. > > Regression-tested. OK for trunk? > Yes.
Hi Steve, > On Mon, Apr 09, 2018 at 10:10:13PM +0200, Thomas Koenig wrote: >> >> the attached patch removes the parallel annotation from DO CONCURRENT. >> As discussed in the PR, the autoparallellizer currently generates >> wrong code. The only feasible way is to disable the annotation for >> gcc-8 and work on the wrong-code issues for gcc-9. This is an 8 >> regression. >> >> Regression-tested. OK for trunk? >> > > Yes. Thanks for the review! However, just as I was about to commit, I thought of something else and, hopefully, better. The underlying problem is that the annotation does not go together with -ftree-parallelize-loops - so let's simply not set it if that flag is set. This way, we can avoid speed regressions for people who do not use -ftree-parallelize-loops, and it will be possible to downgrade the PR to a missed-optimization bug; it is also not necessary to xfail vect-do-concurrent-1.f90 Regression-tested. OK for trunk, for this version? 2018-04-09 Thomas Koenig <tkoenig@gcc.gnu.org> PR fortran/83064 * trans-stmt.c (gfc_trans_forall_loop): Remove annotation for parallell processing of DO CONCURRENT -ftree-parallelize-loops is set. 2018-04-09 Thomas Koenig <tkoenig@gcc.gnu.org> PR fortran/83064 * gfortran.dg/do_concurrent_5.f90: New test. * gfortran.dg/vect/vect-do-concurrent-1.f90: Adjust dg-bogus message. Index: fortran/trans-stmt.c =================================================================== --- fortran/trans-stmt.c (Revision 259222) +++ fortran/trans-stmt.c (Arbeitskopie) @@ -3642,7 +3642,10 @@ gfc_trans_forall_loop (forall_info *forall_tmp, tr /* The exit condition. */ cond = fold_build2_loc (input_location, LE_EXPR, logical_type_node, count, build_int_cst (TREE_TYPE (count), 0)); - if (forall_tmp->do_concurrent) + + /* PR 83064 means that we cannot use the annotation if the + autoparallelizer is active. */ + if (forall_tmp->do_concurrent && ! flag_tree_parallelize_loops) cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond, build_int_cst (integer_type_node, annot_expr_parallel_kind), Index: testsuite/gfortran.dg/vect/vect-do-concurrent-1.f90 =================================================================== --- testsuite/gfortran.dg/vect/vect-do-concurrent-1.f90 (Revision 259222) +++ testsuite/gfortran.dg/vect/vect-do-concurrent-1.f90 (Arbeitskopie) @@ -12,4 +12,3 @@ subroutine test(n, a, b, c) end subroutine test ! { dg-message "loop vectorized" "" { target *-*-* } 0 } -! { dg-bogus " version\[^\n\r]* alias" "" { target *-*-* } 0 } ! { dg-do run } ! PR 83064 - this used to give wrong results. ! { dg-options "-O3 -ftree-parallelize-loops=2" } ! Original test case by Christian Felter program main use, intrinsic :: iso_fortran_env implicit none integer, parameter :: nsplit = 4 integer(int64), parameter :: ne = 20000000 integer(int64) :: stride, low(nsplit), high(nsplit), edof(ne), i real(real64), dimension(nsplit) :: pi edof(1::4) = 1 edof(2::4) = 2 edof(3::4) = 3 edof(4::4) = 4 stride = ceiling(real(ne)/nsplit) do i = 1, nsplit high(i) = stride*i end do do i = 2, nsplit low(i) = high(i-1) + 1 end do low(1) = 1 high(nsplit) = ne pi = 0 do concurrent (i = 1:nsplit) pi(i) = sum(compute( low(i), high(i) )) end do if (abs (sum(pi) - atan(1.0d0)) > 1e-5) call abort contains pure function compute( low, high ) result( ttt ) integer(int64), intent(in) :: low, high real(real64), dimension(nsplit) :: ttt integer(int64) :: j, k ttt = 0 ! Unrolled loop ! do j = low, high, 4 ! k = 1 ! ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 ) ! k = 2 ! ttt(k) = ttt(k) + (-1)**(j+2) / real( 2*j+1 ) ! k = 3 ! ttt(k) = ttt(k) + (-1)**(j+3) / real( 2*j+3 ) ! k = 4 ! ttt(k) = ttt(k) + (-1)**(j+4) / real( 2*j+5 ) ! end do ! Loop with modulo operation ! do j = low, high ! k = mod( j, nsplit ) + 1 ! ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 ) ! end do ! Loop with subscripting via host association do j = low, high k = edof(j) ttt(k) = ttt(k) + (-1.0_real64)**(j+1) / real( 2*j-1 ) end do end function end program main
On Mon, Apr 09, 2018 at 10:58:13PM +0200, Thomas Koenig wrote: > > On Mon, Apr 09, 2018 at 10:10:13PM +0200, Thomas Koenig wrote: > >> > >> the attached patch removes the parallel annotation from DO CONCURRENT. > >> As discussed in the PR, the autoparallellizer currently generates > >> wrong code. The only feasible way is to disable the annotation for > >> gcc-8 and work on the wrong-code issues for gcc-9. This is an 8 > >> regression. > >> > >> Regression-tested. OK for trunk? > >> > > > > Yes. > > Thanks for the review! > > However, just as I was about to commit, I thought of something else > and, hopefully, better. > > The underlying problem is that the annotation does not go together > with -ftree-parallelize-loops - so let's simply not set it if that > flag is set. > > This way, we can avoid speed regressions for people who do not use > -ftree-parallelize-loops, and it will be possible to downgrade the > PR to a missed-optimization bug; it is also not necessary to xfail > vect-do-concurrent-1.f90 > > Regression-tested. OK for trunk, for this version? In reviewing the PR audit trailing, I thought Richard indicated the problem is in the autopar stage in the middle. Removing the annotation simply prevents gfortran's DO CURRENT from a middle in bug. I'm fine with the new patch. I'll leave it up to you to decide which is preferred.
On Mon, Apr 9, 2018 at 10:58 PM, Thomas Koenig <tkoenig@netcologne.de> wrote: > Hi Steve, > >> On Mon, Apr 09, 2018 at 10:10:13PM +0200, Thomas Koenig wrote: >>> >>> >>> the attached patch removes the parallel annotation from DO CONCURRENT. >>> As discussed in the PR, the autoparallellizer currently generates >>> wrong code. The only feasible way is to disable the annotation for >>> gcc-8 and work on the wrong-code issues for gcc-9. This is an 8 >>> regression. >>> >>> Regression-tested. OK for trunk? >>> >> >> Yes. > > > Thanks for the review! > > However, just as I was about to commit, I thought of something else > and, hopefully, better. > > The underlying problem is that the annotation does not go together > with -ftree-parallelize-loops - so let's simply not set it if that > flag is set. > > This way, we can avoid speed regressions for people who do not use > -ftree-parallelize-loops, and it will be possible to downgrade the > PR to a missed-optimization bug; it is also not necessary to xfail > vect-do-concurrent-1.f90 > > Regression-tested. OK for trunk, for this version? flag_tree_parallelize_loops is 1 by default (and doesn't parallelize then, the flag specifies the number of threads). The pass is also run if flag_openacc. Note that setting the attribute is only ever useful for autopar, nothing else looks at it. I think before I made the change there was the ivdep flag set which should be still valid (in fact lowering of parallel_kind sets both can_be_parallel and safelen). So I suggest to instead unconditonally replace parallel_kind with ivdep_kind. Richard. > > 2018-04-09 Thomas Koenig <tkoenig@gcc.gnu.org> > > PR fortran/83064 > * trans-stmt.c (gfc_trans_forall_loop): Remove annotation for > parallell processing of DO CONCURRENT -ftree-parallelize-loops > is set. > > 2018-04-09 Thomas Koenig <tkoenig@gcc.gnu.org> > > PR fortran/83064 > * gfortran.dg/do_concurrent_5.f90: New test. > * gfortran.dg/vect/vect-do-concurrent-1.f90: Adjust dg-bogus > message.
On Mon, Apr 09, 2018 at 10:58:13PM +0200, Thomas Koenig wrote: > 2018-04-09 Thomas Koenig <tkoenig@gcc.gnu.org> > > PR fortran/83064 > * trans-stmt.c (gfc_trans_forall_loop): Remove annotation for > parallell processing of DO CONCURRENT -ftree-parallelize-loops > is set. > > 2018-04-09 Thomas Koenig <tkoenig@gcc.gnu.org> > > PR fortran/83064 > * gfortran.dg/do_concurrent_5.f90: New test. The new test FAILs everywhere, gfortran.dg doesn't have infrastructure to run -fopenmp, -fopenacc nor -ftree-parallelize-loops= tests. You need to put such tests into libgomp/testsuite/libgomp.fortran/ Jakub
Hi Jakub, > The new test FAILs everywhere, gfortran.dg doesn't have infrastructure to > run -fopenmp, -fopenacc nor -ftree-parallelize-loops= tests. > You need to put such tests into libgomp/testsuite/libgomp.fortran/ I put the test case in the attached form into the libgomp.fortran directory, but it failed execution, without error message. Anything I could have done differently? ! { dg-do run } ! PR 83064 - this used to give wrong results. ! { dg-additional-options "-O3 -ftree-parallelize-loops=2" } ! Original test case by Christian Felter program main use, intrinsic :: iso_fortran_env implicit none integer, parameter :: nsplit = 4 integer(int64), parameter :: ne = 20000000 integer(int64) :: stride, low(nsplit), high(nsplit), edof(ne), i real(real64), dimension(nsplit) :: pi edof(1::4) = 1 edof(2::4) = 2 edof(3::4) = 3 edof(4::4) = 4 stride = ceiling(real(ne)/nsplit) do i = 1, nsplit high(i) = stride*i end do do i = 2, nsplit low(i) = high(i-1) + 1 end do low(1) = 1 high(nsplit) = ne pi = 0 do concurrent (i = 1:nsplit) pi(i) = sum(compute( low(i), high(i) )) end do if (abs (sum(pi) - atan(1.0d0)) > 1e-5) STOP 1 contains pure function compute( low, high ) result( ttt ) integer(int64), intent(in) :: low, high real(real64), dimension(nsplit) :: ttt integer(int64) :: j, k ttt = 0 ! Unrolled loop ! do j = low, high, 4 ! k = 1 ! ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 ) ! k = 2 ! ttt(k) = ttt(k) + (-1)**(j+2) / real( 2*j+1 ) ! k = 3 ! ttt(k) = ttt(k) + (-1)**(j+3) / real( 2*j+3 ) ! k = 4 ! ttt(k) = ttt(k) + (-1)**(j+4) / real( 2*j+5 ) ! end do ! Loop with modulo operation ! do j = low, high ! k = mod( j, nsplit ) + 1 ! ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 ) ! end do ! Loop with subscripting via host association do j = low, high k = edof(j) ttt(k) = ttt(k) + (-1.0_real64)**(j+1) / real( 2*j-1 ) end do end function end program main
On Tue, Apr 10, 2018 at 11:50:44PM +0200, Thomas Koenig wrote: > Hi Jakub, > > > > The new test FAILs everywhere, gfortran.dg doesn't have infrastructure to > > run -fopenmp, -fopenacc nor -ftree-parallelize-loops= tests. > > You need to put such tests into libgomp/testsuite/libgomp.fortran/ > > I put the test case in the attached form into the libgomp.fortran > directory, but it failed execution, without error message. > > Anything I could have done differently? Avoid using that much stack? ulimit -s is usually around 8MB on Linux, on other OSes it can be as low as 2MB or even less, so using 160MB edof array is way too much. Also, even if you e.g. allocated it from heap rather than stack (still for some targets it would be too much), isn't that just too expensive for the testsuite? Can you reproduce say even with ne = 200000 (with the fix reverted)? Also, are you going to do the suggested change (because can_be_parallel is something only autopar cares about, but annot_expr_parallel_kind sets like annot_expr_ivdep_kind also safelen to INTMAX): --- gcc/fortran/trans-stmt.c.jj 2018-04-10 08:52:25.467790554 +0200 +++ gcc/fortran/trans-stmt.c 2018-04-11 17:42:40.670493050 +0200 @@ -3643,12 +3643,13 @@ gfc_trans_forall_loop (forall_info *fora cond = fold_build2_loc (input_location, LE_EXPR, logical_type_node, count, build_int_cst (TREE_TYPE (count), 0)); - /* PR 83064 means that we cannot use the annotation if the - autoparallelizer is active. */ - if (forall_tmp->do_concurrent && ! flag_tree_parallelize_loops) + /* PR 83064 means that we cannot use the annot_expr_parallel_kind + annotation until autopar is tought to handle local variables + in loops annotated that way. */ + if (forall_tmp->do_concurrent) cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond, build_int_cst (integer_type_node, - annot_expr_parallel_kind), + annot_expr_ivdep_kind), integer_zero_node); tmp = build1_v (GOTO_EXPR, exit_label); > ! { dg-do run } > ! PR 83064 - this used to give wrong results. > ! { dg-additional-options "-O3 -ftree-parallelize-loops=2" } > ! Original test case by Christian Felter > > program main > use, intrinsic :: iso_fortran_env > implicit none > > integer, parameter :: nsplit = 4 > integer(int64), parameter :: ne = 20000000 > integer(int64) :: stride, low(nsplit), high(nsplit), edof(ne), i > real(real64), dimension(nsplit) :: pi Jakub
Am 11.04.2018 um 17:44 schrieb Jakub Jelinek: > On Tue, Apr 10, 2018 at 11:50:44PM +0200, Thomas Koenig wrote: >> Hi Jakub, >> >> >>> The new test FAILs everywhere, gfortran.dg doesn't have infrastructure to >>> run -fopenmp, -fopenacc nor -ftree-parallelize-loops= tests. >>> You need to put such tests into libgomp/testsuite/libgomp.fortran/ >> >> I put the test case in the attached form into the libgomp.fortran >> directory, but it failed execution, without error message. >> >> Anything I could have done differently? > > Avoid using that much stack? Well, I don't think stack use is excessive :-) $ gfortran -S -Ofast do_concurrent_5.f90 $ fgrep ', %rsp' do_concurrent_5.s subq $136, %rsp addq $136, %rsp I do see your point about total memory consumption, though. Computation time of the test case I committed is around 1 s, which was also not too bad. I have attached updated patch which moves the test case to gfortran.dg/gomp (where it actually passes). Also, the patch below implements the suggestion of using annot_expr_ivdep_kind. OK for trunk? Regards Thomas 2018-04-11 Thomas Koenig <tkoenig@gcc.gnu.org> PR fortran/83064 PR testsuite/85346 * trans-stmt.c (gfc_trans_forall_loop): Use annot_expr_ivdep_kind for annotation and remove dependence on -ftree-parallelize-loops. 2018-04-11 Thomas Koenig <tkoenig@gcc.gnu.org> PR fortran/83064 PR testsuite/85346 * gfortran.dg/do_concurrent_5.f90: Reduce memory consumption and move test to * gfortran.dg/gomp/do_concurrent_5.f90: New location. * gfortran.dg/do_concurrent_6.f90: New test. Index: fortran/trans-stmt.c =================================================================== --- fortran/trans-stmt.c (Revision 259326) +++ fortran/trans-stmt.c (Arbeitskopie) @@ -3643,12 +3643,12 @@ cond = fold_build2_loc (input_location, LE_EXPR, logical_type_node, count, build_int_cst (TREE_TYPE (count), 0)); - /* PR 83064 means that we cannot use the annotation if the - autoparallelizer is active. */ - if (forall_tmp->do_concurrent && ! flag_tree_parallelize_loops) + /* PR 83064 means that we cannot use annot_expr_parallel_kind until + the autoparallelizer can hande this. */ + if (forall_tmp->do_concurrent) cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond, build_int_cst (integer_type_node, - annot_expr_parallel_kind), + annot_expr_ivdep_kind), integer_zero_node); tmp = build1_v (GOTO_EXPR, exit_label); Index: testsuite/gfortran.dg/do_concurrent_5.f90 =================================================================== --- testsuite/gfortran.dg/do_concurrent_5.f90 (Revision 259258) +++ testsuite/gfortran.dg/do_concurrent_5.f90 (nicht existent) @@ -1,70 +0,0 @@ -! { dg-do run } -! PR 83064 - this used to give wrong results. -! { dg-options "-O3 -ftree-parallelize-loops=2" } -! Original test case by Christian Felter - -program main - use, intrinsic :: iso_fortran_env - implicit none - - integer, parameter :: nsplit = 4 - integer(int64), parameter :: ne = 20000000 - integer(int64) :: stride, low(nsplit), high(nsplit), edof(ne), i - real(real64), dimension(nsplit) :: pi - - edof(1::4) = 1 - edof(2::4) = 2 - edof(3::4) = 3 - edof(4::4) = 4 - - stride = ceiling(real(ne)/nsplit) - do i = 1, nsplit - high(i) = stride*i - end do - do i = 2, nsplit - low(i) = high(i-1) + 1 - end do - low(1) = 1 - high(nsplit) = ne - - pi = 0 - do concurrent (i = 1:nsplit) - pi(i) = sum(compute( low(i), high(i) )) - end do - if (abs (sum(pi) - atan(1.0d0)) > 1e-5) call abort - -contains - - pure function compute( low, high ) result( ttt ) - integer(int64), intent(in) :: low, high - real(real64), dimension(nsplit) :: ttt - integer(int64) :: j, k - - ttt = 0 - - ! Unrolled loop -! do j = low, high, 4 -! k = 1 -! ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 ) -! k = 2 -! ttt(k) = ttt(k) + (-1)**(j+2) / real( 2*j+1 ) -! k = 3 -! ttt(k) = ttt(k) + (-1)**(j+3) / real( 2*j+3 ) -! k = 4 -! ttt(k) = ttt(k) + (-1)**(j+4) / real( 2*j+5 ) -! end do - - ! Loop with modulo operation -! do j = low, high -! k = mod( j, nsplit ) + 1 -! ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 ) -! end do - - ! Loop with subscripting via host association - do j = low, high - k = edof(j) - ttt(k) = ttt(k) + (-1.0_real64)**(j+1) / real( 2*j-1 ) - end do - end function - -end program main Index: testsuite/gfortran.dg/do_concurrent_6.f90 =================================================================== --- testsuite/gfortran.dg/do_concurrent_6.f90 (nicht existent) +++ testsuite/gfortran.dg/do_concurrent_6.f90 (Arbeitskopie) @@ -0,0 +1,13 @@ +! { dg-do compile } +! { dg-additional-options "-fdump-tree-original" } + +program main + real, dimension(100) :: a,b + call random_number(a) + do concurrent (i=1:100) + b(i) = a(i)*a(i) + end do + print *,sum(a) +end program main + +! { dg-final { scan-tree-dump-times "ivdep" 1 "original" } } Index: testsuite/gfortran.dg/gomp/do_concurrent_5.f90 =================================================================== --- testsuite/gfortran.dg/gomp/do_concurrent_5.f90 (nicht existent) +++ testsuite/gfortran.dg/gomp/do_concurrent_5.f90 (Arbeitskopie) @@ -0,0 +1,71 @@ +! { dg-do run } +! PR 83064 - this used to give wrong results. +! { dg-additional-options "-O1 -ftree-parallelize-loops=2" } +! Original test case by Christian Felter + +program main + use, intrinsic :: iso_fortran_env + implicit none + + integer, parameter :: nsplit = 4 + integer(int64), parameter :: ne = 2**20 + integer(int64) :: stride, low(nsplit), high(nsplit), edof(ne), i + real(real64), dimension(nsplit) :: pi + + edof(1::4) = 1 + edof(2::4) = 2 + edof(3::4) = 3 + edof(4::4) = 4 + + stride = ceiling(real(ne)/nsplit) + do i = 1, nsplit + high(i) = stride*i + end do + do i = 2, nsplit + low(i) = high(i-1) + 1 + end do + low(1) = 1 + high(nsplit) = ne + + pi = 0 + do concurrent (i = 1:nsplit) + pi(i) = sum(compute( low(i), high(i) )) + end do + print *,sum(pi) + if (abs (sum(pi) - atan(1.0d0)) > 1e-5) STOP 1 + +contains + + pure function compute( low, high ) result( ttt ) + integer(int64), intent(in) :: low, high + real(real64), dimension(nsplit) :: ttt + integer(int64) :: j, k + + ttt = 0 + + ! Unrolled loop +! do j = low, high, 4 +! k = 1 +! ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 ) +! k = 2 +! ttt(k) = ttt(k) + (-1)**(j+2) / real( 2*j+1 ) +! k = 3 +! ttt(k) = ttt(k) + (-1)**(j+3) / real( 2*j+3 ) +! k = 4 +! ttt(k) = ttt(k) + (-1)**(j+4) / real( 2*j+5 ) +! end do + + ! Loop with modulo operation +! do j = low, high +! k = mod( j, nsplit ) + 1 +! ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 ) +! end do + + ! Loop with subscripting via host association + do j = low, high + k = edof(j) + ttt(k) = ttt(k) + (-1.0_real64)**(j+1) / real( 2*j-1 ) + end do + end function + +end program main
On Wed, Apr 11, 2018 at 08:18:35PM +0200, Thomas Koenig wrote: > Am 11.04.2018 um 17:44 schrieb Jakub Jelinek: > > On Tue, Apr 10, 2018 at 11:50:44PM +0200, Thomas Koenig wrote: > > > Hi Jakub, > > > > > > > > > > The new test FAILs everywhere, gfortran.dg doesn't have infrastructure to > > > > run -fopenmp, -fopenacc nor -ftree-parallelize-loops= tests. > > > > You need to put such tests into libgomp/testsuite/libgomp.fortran/ > > > > > > I put the test case in the attached form into the libgomp.fortran > > > directory, but it failed execution, without error message. > > > > > > Anything I could have done differently? > > > > Avoid using that much stack? > > Well, I don't think stack use is excessive :-) > > $ gfortran -S -Ofast do_concurrent_5.f90 > $ fgrep ', %rsp' do_concurrent_5.s > subq $136, %rsp > addq $136, %rsp The test is not compiled with those options in the testsuite though, but with -fopenmp -O0 -O3 -ftree-parallelize-loops=2 to select the important ones. And with these options grep ', %rsp' do_concurrent_5.s | sort -u addq $160000176, %rsp addq $8, %rsp subq $160000176, %rsp subq $8, %rsp -fopenmp is added in the default flags and implies -frecursive and thus -fautomatic. You could add -fno-openmp to dg-additional-options if it is ok for the large vars to be static. Another thing which can be seen from the above "-O0 -O3" is that libgomp.fortran/ tests cycle through optimization options, if you only want -O3 only, then better just dg-skip-if if it isn't -O3, instead of running the test effectively with -O3 6 or how many times. Or if you want to test all optimization levels, take -O3 out of the dg-additional-options. > I have attached updated patch which moves the test case to > gfortran.dg/gomp (where it actually passes). How could it pass there? dg-do run tests don't belong into g*.dg/gomp/, nothing adds the -B etc. options needed to find libgomp.spec or libgomp as a library, or adds it to LD_LIBRARY_PATH etc. There are zero dg-do run tests in gfortran.dg/gomp/, there are 4 dg-do run tests in c-c++-common/gomp/, but those work fine because they use -fopenmp-simd option rather than -fopenmp/-fopenacc/-ftree-parallelize-loops= etc. Jakub
Am 11.04.2018 um 20:33 schrieb Jakub Jelinek: >> I have attached updated patch which moves the test case to >> gfortran.dg/gomp (where it actually passes). > > How could it pass there? dg-do run tests don't belong into g*.dg/gomp/, > nothing adds the -B etc. options needed to find libgomp.spec or libgomp > as a library, or adds it to LD_LIBRARY_PATH etc. > There are zero dg-do run tests in gfortran.dg/gomp/, there are 4 > dg-do run tests in c-c++-common/gomp/, but those work fine because they > use -fopenmp-simd option rather than > -fopenmp/-fopenacc/-ftree-parallelize-loops= etc. So, where should the test go? The suggestion in PR 85346, to put it into libgomp/testsuite/libgomp.fortran/, does not work: Running ../../../../trunk/libgomp/testsuite/libgomp.fortran/fortran.exp ... FAIL: libgomp.fortran/do_concurrent_5.f90 -O execution test even when ne (the array size) has been reduced to 2**20, far below reasonable memory limits. The test passes when given the -O1 -ftree-parallelize-loops=2 options by hand. So, what's the idea? Is there actually a directory which works, or are we left with a wrong-code bug for which no test case is possible? That would be quite bad, I think. Thomas
On Wed, Apr 11, 2018 at 09:47:22PM +0200, Thomas Koenig wrote: > Am 11.04.2018 um 20:33 schrieb Jakub Jelinek: > > > > I have attached updated patch which moves the test case to > > > gfortran.dg/gomp (where it actually passes). > > > > How could it pass there? dg-do run tests don't belong into g*.dg/gomp/, > > nothing adds the -B etc. options needed to find libgomp.spec or libgomp > > as a library, or adds it to LD_LIBRARY_PATH etc. > > There are zero dg-do run tests in gfortran.dg/gomp/, there are 4 > > dg-do run tests in c-c++-common/gomp/, but those work fine because they > > use -fopenmp-simd option rather than > > -fopenmp/-fopenacc/-ftree-parallelize-loops= etc. > > So, where should the test go? > > The suggestion in PR 85346, to put it into > libgomp/testsuite/libgomp.fortran/, does not work: Yes, and I said what can be done to make it work; in patch form below. > Running ../../../../trunk/libgomp/testsuite/libgomp.fortran/fortran.exp ... > FAIL: libgomp.fortran/do_concurrent_5.f90 -O execution test > > even when ne (the array size) has been reduced to 2**20, far below ne is not the array size, the array size is 8 * ne, and 8 * 1MB is 8MB and you eat all of the usual stack limit just by that. > reasonable memory limits. The test passes when given the > -O1 -ftree-parallelize-loops=2 options by hand. > > So, what's the idea? Is there actually a directory which works, > or are we left with a wrong-code bug for which no test case is > possible? That would be quite bad, I think. Here is incremental diff. With the dg-skip-if and removal of explicit -O3, you make the test run only once with -O3 -g, and skip the other variants: UNSUPPORTED: libgomp.fortran/do_concurrent_5.f90 -O0 UNSUPPORTED: libgomp.fortran/do_concurrent_5.f90 -O1 UNSUPPORTED: libgomp.fortran/do_concurrent_5.f90 -O2 UNSUPPORTED: libgomp.fortran/do_concurrent_5.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions UNSUPPORTED: libgomp.fortran/do_concurrent_5.f90 -Os and with -fno-openmp you disable the default -fopenmp which you really don't need for the testcase, there are no OpenMP directives in there. --- libgomp/testsuite/libgomp.fortran/do_concurrent_5.f90 2018-04-11 17:27:59.035100057 +0200 +++ libgomp/testsuite/libgomp.fortran/do_concurrent_5.f90 2018-04-12 09:12:40.611789503 +0200 @@ -1,6 +1,7 @@ ! { dg-do run } ! PR 83064 - this used to give wrong results. -! { dg-additional-options "-O3 -ftree-parallelize-loops=2" } +! { dg-skip-if "" { ! run_expensive_tests } { "*" } { "-O3 -g" } } +! { dg-additional-options "-fno-openmp -ftree-parallelize-loops=2" } ! Original test case by Christian Felter program main @@ -8,7 +9,7 @@ program main implicit none integer, parameter :: nsplit = 4 - integer(int64), parameter :: ne = 20000000 + integer(int64), parameter :: ne = 2000000 integer(int64) :: stride, low(nsplit), high(nsplit), edof(ne), i real(real64), dimension(nsplit) :: pi Jakub
Well, here's a variation which actually passes regression-test. Seems I implicitly believed that the implicit save on main program variables actually works... well, it turns out that it doesn't, which is now PR85364. OK for trunk? Thomas 2018-04-12 Thomas Koenig <tkoenig@gcc.gnu.org> PR fortran/83064 PR testsuite/85346 * trans-stmt.c (gfc_trans_forall_loop): Use annot_expr_ivdep_kind for annotation and remove dependence on -ftree-parallelize-loops. 2018-04-12 Thomas Koenig <tkoenig@gcc.gnu.org> PR fortran/83064 PR testsuite/85346 * gfortran.dg/do_concurrent_5.f90: Dynamically allocate main work array and move test to libgomp/testsuite/libgomp.fortran. * gfortran.dg/do_concurrent_6.f90: New test. 2018-04-12 Thomas Koenig <tkoenig@gcc.gnu.org> PR fortran/83064 PR testsuite/85346 * testsuite/libgomp.fortran: Move modified test from gfortran.dg to here. Index: trans-stmt.c =================================================================== --- trans-stmt.c (Revision 259326) +++ trans-stmt.c (Arbeitskopie) @@ -3643,12 +3643,12 @@ gfc_trans_forall_loop (forall_info *forall_tmp, tr cond = fold_build2_loc (input_location, LE_EXPR, logical_type_node, count, build_int_cst (TREE_TYPE (count), 0)); - /* PR 83064 means that we cannot use the annotation if the - autoparallelizer is active. */ - if (forall_tmp->do_concurrent && ! flag_tree_parallelize_loops) + /* PR 83064 means that we cannot use annot_expr_parallel_kind until + the autoparallelizer can hande this. */ + if (forall_tmp->do_concurrent) cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond, build_int_cst (integer_type_node, - annot_expr_parallel_kind), + annot_expr_ivdep_kind), integer_zero_node); tmp = build1_v (GOTO_EXPR, exit_label); ! { dg-do run } ! PR 83064 - this used to give wrong results. ! { dg-additional-options "-O1 -ftree-parallelize-loops=2" } ! Original test case by Christian Felter program main use, intrinsic :: iso_fortran_env implicit none integer, parameter :: nsplit = 4 integer(int64), parameter :: ne = 2**20 integer(int64) :: stride, low(nsplit), high(nsplit), i integer(int64), dimension(:), allocatable :: edof real(real64), dimension(nsplit) :: pi allocate (edof(ne)) edof(1::4) = 1 edof(2::4) = 2 edof(3::4) = 3 edof(4::4) = 4 stride = ceiling(real(ne)/nsplit) do i = 1, nsplit high(i) = stride*i end do do i = 2, nsplit low(i) = high(i-1) + 1 end do low(1) = 1 high(nsplit) = ne pi = 0 do concurrent (i = 1:nsplit) pi(i) = sum(compute( low(i), high(i) )) end do if (abs (sum(pi) - atan(1.0d0)) > 1e-5) STOP 1 contains pure function compute( low, high ) result( ttt ) integer(int64), intent(in) :: low, high real(real64), dimension(nsplit) :: ttt integer(int64) :: j, k ttt = 0 ! Unrolled loop ! do j = low, high, 4 ! k = 1 ! ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 ) ! k = 2 ! ttt(k) = ttt(k) + (-1)**(j+2) / real( 2*j+1 ) ! k = 3 ! ttt(k) = ttt(k) + (-1)**(j+3) / real( 2*j+3 ) ! k = 4 ! ttt(k) = ttt(k) + (-1)**(j+4) / real( 2*j+5 ) ! end do ! Loop with modulo operation ! do j = low, high ! k = mod( j, nsplit ) + 1 ! ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 ) ! end do ! Loop with subscripting via host association do j = low, high k = edof(j) ttt(k) = ttt(k) + (-1.0_real64)**(j+1) / real( 2*j-1 ) end do end function end program main ! { dg-do compile } ! { dg-additional-options "-fdump-tree-original" } program main real, dimension(100) :: a,b call random_number(a) do concurrent (i=1:100) b(i) = a(i)*a(i) end do print *,sum(a) end program main ! { dg-final { scan-tree-dump-times "ivdep" 1 "original" } }
On Thu, Apr 12, 2018 at 11:14:45PM +0200, Thomas Koenig wrote: > 2018-04-12 Thomas Koenig <tkoenig@gcc.gnu.org> > > PR fortran/83064 > PR testsuite/85346 > * trans-stmt.c (gfc_trans_forall_loop): Use annot_expr_ivdep_kind > for annotation and remove dependence on -ftree-parallelize-loops. > > 2018-04-12 Thomas Koenig <tkoenig@gcc.gnu.org> > > PR fortran/83064 > PR testsuite/85346 > * gfortran.dg/do_concurrent_5.f90: Dynamically allocate main work > array and move test to libgomp/testsuite/libgomp.fortran. > * gfortran.dg/do_concurrent_6.f90: New test. > > 2018-04-12 Thomas Koenig <tkoenig@gcc.gnu.org> > > PR fortran/83064 > PR testsuite/85346 > * testsuite/libgomp.fortran: Move modified test from gfortran.dg > to here. Please use full filename here, like: * testsuite/libgomp.fortran/do_concurrent_5.f90: New test, moved from gfortran.dg. Make edof array allocatable. Ok with that change. > Index: trans-stmt.c > =================================================================== > --- trans-stmt.c (Revision 259326) > +++ trans-stmt.c (Arbeitskopie) > @@ -3643,12 +3643,12 @@ gfc_trans_forall_loop (forall_info *forall_tmp, tr > cond = fold_build2_loc (input_location, LE_EXPR, logical_type_node, > count, build_int_cst (TREE_TYPE (count), 0)); > > - /* PR 83064 means that we cannot use the annotation if the > - autoparallelizer is active. */ > - if (forall_tmp->do_concurrent && ! flag_tree_parallelize_loops) > + /* PR 83064 means that we cannot use annot_expr_parallel_kind until > + the autoparallelizer can hande this. */ > + if (forall_tmp->do_concurrent) > cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond, > build_int_cst (integer_type_node, > - annot_expr_parallel_kind), > + annot_expr_ivdep_kind), > integer_zero_node); > > tmp = build1_v (GOTO_EXPR, exit_label); > ! { dg-do run } > ! PR 83064 - this used to give wrong results. > ! { dg-additional-options "-O1 -ftree-parallelize-loops=2" } > ! Original test case by Christian Felter > > program main > use, intrinsic :: iso_fortran_env > implicit none > > integer, parameter :: nsplit = 4 > integer(int64), parameter :: ne = 2**20 > integer(int64) :: stride, low(nsplit), high(nsplit), i > integer(int64), dimension(:), allocatable :: edof > real(real64), dimension(nsplit) :: pi > > allocate (edof(ne)) > edof(1::4) = 1 > edof(2::4) = 2 > edof(3::4) = 3 > edof(4::4) = 4 > > stride = ceiling(real(ne)/nsplit) > do i = 1, nsplit > high(i) = stride*i > end do > do i = 2, nsplit > low(i) = high(i-1) + 1 > end do > low(1) = 1 > high(nsplit) = ne > > pi = 0 > do concurrent (i = 1:nsplit) > pi(i) = sum(compute( low(i), high(i) )) > end do > if (abs (sum(pi) - atan(1.0d0)) > 1e-5) STOP 1 > > contains > > pure function compute( low, high ) result( ttt ) > integer(int64), intent(in) :: low, high > real(real64), dimension(nsplit) :: ttt > integer(int64) :: j, k > > ttt = 0 > > ! Unrolled loop > ! do j = low, high, 4 > ! k = 1 > ! ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 ) > ! k = 2 > ! ttt(k) = ttt(k) + (-1)**(j+2) / real( 2*j+1 ) > ! k = 3 > ! ttt(k) = ttt(k) + (-1)**(j+3) / real( 2*j+3 ) > ! k = 4 > ! ttt(k) = ttt(k) + (-1)**(j+4) / real( 2*j+5 ) > ! end do > > ! Loop with modulo operation > ! do j = low, high > ! k = mod( j, nsplit ) + 1 > ! ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 ) > ! end do > > ! Loop with subscripting via host association > do j = low, high > k = edof(j) > ttt(k) = ttt(k) + (-1.0_real64)**(j+1) / real( 2*j-1 ) > end do > end function > > end program main > ! { dg-do compile } > ! { dg-additional-options "-fdump-tree-original" } > > program main > real, dimension(100) :: a,b > call random_number(a) > do concurrent (i=1:100) > b(i) = a(i)*a(i) > end do > print *,sum(a) > end program main > > ! { dg-final { scan-tree-dump-times "ivdep" 1 "original" } } Jakub
Index: fortran/trans-stmt.c =================================================================== --- fortran/trans-stmt.c (Revision 259222) +++ fortran/trans-stmt.c (Arbeitskopie) @@ -3642,12 +3642,6 @@ gfc_trans_forall_loop (forall_info *forall_tmp, tr /* The exit condition. */ cond = fold_build2_loc (input_location, LE_EXPR, logical_type_node, count, build_int_cst (TREE_TYPE (count), 0)); - if (forall_tmp->do_concurrent) - cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond, - build_int_cst (integer_type_node, - annot_expr_parallel_kind), - integer_zero_node); - tmp = build1_v (GOTO_EXPR, exit_label); tmp = fold_build3_loc (input_location, COND_EXPR, void_type_node, cond, tmp, build_empty_stmt (input_location)); Index: testsuite/gfortran.dg/vect/vect-do-concurrent-1.f90 =================================================================== --- testsuite/gfortran.dg/vect/vect-do-concurrent-1.f90 (Revision 259222) +++ testsuite/gfortran.dg/vect/vect-do-concurrent-1.f90 (Arbeitskopie) @@ -1,6 +1,8 @@ ! { dg-do compile } ! { dg-require-effective-target vect_float } ! { dg-additional-options "-O3 -fopt-info-vec-optimized" } +! { xfail *-*-* } +! PR 83064 - DO CONCURRENT is no longer vectorized for this case. subroutine test(n, a, b, c) integer, value :: n @@ -12,4 +14,4 @@ subroutine test(n, a, b, c) end subroutine test ! { dg-message "loop vectorized" "" { target *-*-* } 0 } -! { dg-bogus " version\[^\n\r]* alias" "" { target *-*-* } 0 } +! Restoration of the test case will need dg-bogus " version\[^\n\r]* alias" "" { target *-*-* } 0