[fortran] Remove parallell annotation from DO CONCURRENT

Message ID	e491d566-c4f6-156d-18fa-83e6304aaab4@netcologne.de
State	New
Headers	show Return-Path: <gcc-patches-return-476082-incoming=patchwork.ozlabs.org@gcc.gnu.org> DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to :from:subject:message-id:date:mime-version:content-type; q=dns; s=default; b=surt/N8IyV3Rj0iXrviK+l2XGLsed74DmQUhI/95NjC9ppg6qH kDmyG6uTRHkDsqHgiywLiVaw31pJLvzEmYZthn/UWNF2EOwu3KfnUtGjk1MBgfWz mvomYhi7DZNLf1ipYyssLF3yBv3c2dGQRYIDFITLN7xrqLT40eNZXeXiM= Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org To: "fortran@gcc.gnu.org" <fortran@gcc.gnu.org>, gcc-patches <gcc-patches@gcc.gnu.org> From: Thomas Koenig <tkoenig@netcologne.de> Subject: [patch, fortran] Remove parallell annotation from DO CONCURRENT Message-ID: <e491d566-c4f6-156d-18fa-83e6304aaab4@netcologne.de> Date: Mon, 9 Apr 2018 22:10:13 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0 MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------------7C7CAFBD66EB1F9D7B53C58E"
Series	[fortran] Remove parallell annotation from DO CONCURRENT \| expand [fortran] Remove parallell annotation from DO CONCURRENT

Thomas Koenig April 9, 2018, 8:10 p.m. UTC

Hello world,

the attached patch removes the parallel annotation from DO CONCURRENT.
As discussed in the PR, the autoparallellizer currently generates
wrong code. The only feasible way is to disable the annotation for
gcc-8 and work on the wrong-code issues for gcc-9. This is an 8
regression.

Regression-tested. OK for trunk?

Regards

	Thomas

2018-04-09  Thomas Koenig  <tkoenig@gcc.gnu.org>

         PR fortran/83064
         * trans-stmt.c (gfc_trans_forall_loop): Remove annotation for
         parallell processing of DO CONCURRENT.

2018-04-09  Thomas Koenig  <tkoenig@gcc.gnu.org>

         PR fortran/83064
         * gfortran.dg/do_concurrent_5.f90: New test.
         * gfortran.dg/vect/vect-do-concurrent-1.f90: Xfail. Adjust
         dg-bogus message.

Jakub Jelinek April 9, 2018, 8:19 p.m. UTC | #1

On Mon, Apr 09, 2018 at 10:10:13PM +0200, Thomas Koenig wrote:
> Hello world,
> 
> the attached patch removes the parallel annotation from DO CONCURRENT.
> As discussed in the PR, the autoparallellizer currently generates
> wrong code. The only feasible way is to disable the annotation for
> gcc-8 and work on the wrong-code issues for gcc-9. This is an 8
> regression.

Can't it be turned into annot_expr_ivdep_kind instead of
annot_expr_parallel_kind for GCC8?  I mean, the potential local addressable
variables shouldn't be a problem for vectorization, just autopar, right?

	Jakub

Steve Kargl April 9, 2018, 8:28 p.m. UTC | #2

On Mon, Apr 09, 2018 at 10:10:13PM +0200, Thomas Koenig wrote:
> 
> the attached patch removes the parallel annotation from DO CONCURRENT.
> As discussed in the PR, the autoparallellizer currently generates
> wrong code. The only feasible way is to disable the annotation for
> gcc-8 and work on the wrong-code issues for gcc-9. This is an 8
> regression.
> 
> Regression-tested. OK for trunk?
> 

Yes.

Thomas Koenig April 9, 2018, 8:58 p.m. UTC | #3

Hi Steve,

> On Mon, Apr 09, 2018 at 10:10:13PM +0200, Thomas Koenig wrote:
>>
>> the attached patch removes the parallel annotation from DO CONCURRENT.
>> As discussed in the PR, the autoparallellizer currently generates
>> wrong code. The only feasible way is to disable the annotation for
>> gcc-8 and work on the wrong-code issues for gcc-9. This is an 8
>> regression.
>>
>> Regression-tested. OK for trunk?
>>
> 
> Yes.

Thanks for the review!

However, just as I was about to commit, I thought of something else
and, hopefully, better.

The underlying problem is that the annotation does not go together
with -ftree-parallelize-loops - so let's simply not set it if that
flag is set.

This way, we can avoid speed regressions for people who do not use
-ftree-parallelize-loops, and it will be possible to downgrade the
PR to a missed-optimization bug; it is also not necessary to xfail
vect-do-concurrent-1.f90

Regression-tested. OK for trunk, for this version?


2018-04-09  Thomas Koenig  <tkoenig@gcc.gnu.org>

         PR fortran/83064
         * trans-stmt.c (gfc_trans_forall_loop): Remove annotation for
         parallell processing of DO CONCURRENT -ftree-parallelize-loops
         is set.

2018-04-09  Thomas Koenig  <tkoenig@gcc.gnu.org>

         PR fortran/83064
         * gfortran.dg/do_concurrent_5.f90: New test.
         * gfortran.dg/vect/vect-do-concurrent-1.f90: Adjust dg-bogus
         message.
Index: fortran/trans-stmt.c
===================================================================
--- fortran/trans-stmt.c	(Revision 259222)
+++ fortran/trans-stmt.c	(Arbeitskopie)
@@ -3642,7 +3642,10 @@ gfc_trans_forall_loop (forall_info *forall_tmp, tr
       /* The exit condition.  */
       cond = fold_build2_loc (input_location, LE_EXPR, logical_type_node,
 			      count, build_int_cst (TREE_TYPE (count), 0));
-      if (forall_tmp->do_concurrent)
+
+      /* PR 83064 means that we cannot use the annotation if the
+	 autoparallelizer is active.  */
+      if (forall_tmp->do_concurrent && ! flag_tree_parallelize_loops)
 	cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
 		       build_int_cst (integer_type_node,
 				      annot_expr_parallel_kind),
Index: testsuite/gfortran.dg/vect/vect-do-concurrent-1.f90
===================================================================
--- testsuite/gfortran.dg/vect/vect-do-concurrent-1.f90	(Revision 259222)
+++ testsuite/gfortran.dg/vect/vect-do-concurrent-1.f90	(Arbeitskopie)
@@ -12,4 +12,3 @@ subroutine test(n, a, b, c)
 end subroutine test
 
 ! { dg-message "loop vectorized" "" { target *-*-* } 0 }
-! { dg-bogus " version\[^\n\r]* alias" "" { target *-*-* } 0 }
! { dg-do  run }
! PR 83064 - this used to give wrong results.
! { dg-options "-O3 -ftree-parallelize-loops=2" }
! Original test case by Christian Felter

program main
    use, intrinsic :: iso_fortran_env
    implicit none

    integer, parameter :: nsplit = 4
    integer(int64), parameter :: ne = 20000000
    integer(int64) :: stride, low(nsplit), high(nsplit), edof(ne), i
    real(real64), dimension(nsplit) :: pi
    
    edof(1::4) = 1
    edof(2::4) = 2
    edof(3::4) = 3
    edof(4::4) = 4
    
    stride = ceiling(real(ne)/nsplit)
    do i = 1, nsplit
        high(i) = stride*i
    end do
    do i = 2, nsplit
        low(i) = high(i-1) + 1
    end do
    low(1) = 1
    high(nsplit) = ne

    pi = 0
    do concurrent (i = 1:nsplit)
        pi(i) = sum(compute( low(i), high(i) ))
    end do
    if (abs (sum(pi) - atan(1.0d0)) > 1e-5) call abort
    
contains
    
    pure function compute( low, high ) result( ttt )        
        integer(int64), intent(in) :: low, high
        real(real64), dimension(nsplit) :: ttt
        integer(int64) :: j, k
        
        ttt = 0

        ! Unrolled loop
!         do j = low, high, 4
!             k = 1
!             ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 )                            
!             k = 2
!             ttt(k) = ttt(k) + (-1)**(j+2) / real( 2*j+1 )                            
!             k = 3
!             ttt(k) = ttt(k) + (-1)**(j+3) / real( 2*j+3 )                            
!             k = 4
!             ttt(k) = ttt(k) + (-1)**(j+4) / real( 2*j+5 )                            
!         end do
        
        ! Loop with modulo operation
!         do j = low, high
!             k = mod( j, nsplit ) + 1
!             ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 )                                        
!         end do
        
        ! Loop with subscripting via host association
        do j = low, high
            k = edof(j)
            ttt(k) = ttt(k) + (-1.0_real64)**(j+1) / real( 2*j-1 )                                        
        end do
    end function
    
end program main

Steve Kargl April 9, 2018, 9:19 p.m. UTC | #4

On Mon, Apr 09, 2018 at 10:58:13PM +0200, Thomas Koenig wrote:
> > On Mon, Apr 09, 2018 at 10:10:13PM +0200, Thomas Koenig wrote:
> >>
> >> the attached patch removes the parallel annotation from DO CONCURRENT.
> >> As discussed in the PR, the autoparallellizer currently generates
> >> wrong code. The only feasible way is to disable the annotation for
> >> gcc-8 and work on the wrong-code issues for gcc-9. This is an 8
> >> regression.
> >>
> >> Regression-tested. OK for trunk?
> >>
> > 
> > Yes.
> 
> Thanks for the review!
> 
> However, just as I was about to commit, I thought of something else
> and, hopefully, better.
> 
> The underlying problem is that the annotation does not go together
> with -ftree-parallelize-loops - so let's simply not set it if that
> flag is set.
> 
> This way, we can avoid speed regressions for people who do not use
> -ftree-parallelize-loops, and it will be possible to downgrade the
> PR to a missed-optimization bug; it is also not necessary to xfail
> vect-do-concurrent-1.f90
> 
> Regression-tested. OK for trunk, for this version?

In reviewing the PR audit trailing, I thought Richard indicated
the problem is in the autopar stage in the middle.  Removing the
annotation simply prevents gfortran's DO CURRENT from a middle 
in bug.

I'm fine with the new patch.  I'll leave it up to you to decide
which is preferred.

Richard Biener April 10, 2018, 6:44 a.m. UTC | #5

On Mon, Apr 9, 2018 at 10:58 PM, Thomas Koenig <tkoenig@netcologne.de> wrote:
> Hi Steve,
>
>> On Mon, Apr 09, 2018 at 10:10:13PM +0200, Thomas Koenig wrote:
>>>
>>>
>>> the attached patch removes the parallel annotation from DO CONCURRENT.
>>> As discussed in the PR, the autoparallellizer currently generates
>>> wrong code. The only feasible way is to disable the annotation for
>>> gcc-8 and work on the wrong-code issues for gcc-9. This is an 8
>>> regression.
>>>
>>> Regression-tested. OK for trunk?
>>>
>>
>> Yes.
>
>
> Thanks for the review!
>
> However, just as I was about to commit, I thought of something else
> and, hopefully, better.
>
> The underlying problem is that the annotation does not go together
> with -ftree-parallelize-loops - so let's simply not set it if that
> flag is set.
>
> This way, we can avoid speed regressions for people who do not use
> -ftree-parallelize-loops, and it will be possible to downgrade the
> PR to a missed-optimization bug; it is also not necessary to xfail
> vect-do-concurrent-1.f90
>
> Regression-tested. OK for trunk, for this version?

flag_tree_parallelize_loops is 1 by default (and doesn't parallelize then,
the flag specifies the number of threads).  The pass is also run if
flag_openacc.

Note that setting the attribute is only ever useful for autopar, nothing
else looks at it.  I think before I made the change there was the
ivdep flag set which should be still valid (in fact lowering of parallel_kind
sets both can_be_parallel and safelen).

So I suggest to instead unconditonally replace parallel_kind with
ivdep_kind.

Richard.

>
> 2018-04-09  Thomas Koenig  <tkoenig@gcc.gnu.org>
>
>         PR fortran/83064
>         * trans-stmt.c (gfc_trans_forall_loop): Remove annotation for
>         parallell processing of DO CONCURRENT -ftree-parallelize-loops
>         is set.
>
> 2018-04-09  Thomas Koenig  <tkoenig@gcc.gnu.org>
>
>         PR fortran/83064
>         * gfortran.dg/do_concurrent_5.f90: New test.
>         * gfortran.dg/vect/vect-do-concurrent-1.f90: Adjust dg-bogus
>         message.

Jakub Jelinek April 10, 2018, 1:36 p.m. UTC | #6

On Mon, Apr 09, 2018 at 10:58:13PM +0200, Thomas Koenig wrote:
> 2018-04-09  Thomas Koenig  <tkoenig@gcc.gnu.org>
> 
>         PR fortran/83064
>         * trans-stmt.c (gfc_trans_forall_loop): Remove annotation for
>         parallell processing of DO CONCURRENT -ftree-parallelize-loops
>         is set.
> 
> 2018-04-09  Thomas Koenig  <tkoenig@gcc.gnu.org>
> 
>         PR fortran/83064
>         * gfortran.dg/do_concurrent_5.f90: New test.

The new test FAILs everywhere, gfortran.dg doesn't have infrastructure to
run -fopenmp, -fopenacc nor -ftree-parallelize-loops= tests.
You need to put such tests into libgomp/testsuite/libgomp.fortran/

	Jakub

Thomas Koenig April 10, 2018, 9:50 p.m. UTC | #7

Hi Jakub,


> The new test FAILs everywhere, gfortran.dg doesn't have infrastructure to
> run -fopenmp, -fopenacc nor -ftree-parallelize-loops= tests.
> You need to put such tests into libgomp/testsuite/libgomp.fortran/

I put the test case in the attached form into the libgomp.fortran
directory, but it failed execution, without error message.

Anything I could have done differently?
! { dg-do run }
! PR 83064 - this used to give wrong results.
! { dg-additional-options "-O3 -ftree-parallelize-loops=2" }
! Original test case by Christian Felter

program main
    use, intrinsic :: iso_fortran_env
    implicit none

    integer, parameter :: nsplit = 4
    integer(int64), parameter :: ne = 20000000
    integer(int64) :: stride, low(nsplit), high(nsplit), edof(ne), i
    real(real64), dimension(nsplit) :: pi
    
    edof(1::4) = 1
    edof(2::4) = 2
    edof(3::4) = 3
    edof(4::4) = 4
    
    stride = ceiling(real(ne)/nsplit)
    do i = 1, nsplit
        high(i) = stride*i
    end do
    do i = 2, nsplit
        low(i) = high(i-1) + 1
    end do
    low(1) = 1
    high(nsplit) = ne

    pi = 0
    do concurrent (i = 1:nsplit)
        pi(i) = sum(compute( low(i), high(i) ))
    end do
    if (abs (sum(pi) - atan(1.0d0)) > 1e-5) STOP 1
    
contains
    
    pure function compute( low, high ) result( ttt )        
        integer(int64), intent(in) :: low, high
        real(real64), dimension(nsplit) :: ttt
        integer(int64) :: j, k
        
        ttt = 0

        ! Unrolled loop
!         do j = low, high, 4
!             k = 1
!             ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 )                            
!             k = 2
!             ttt(k) = ttt(k) + (-1)**(j+2) / real( 2*j+1 )                            
!             k = 3
!             ttt(k) = ttt(k) + (-1)**(j+3) / real( 2*j+3 )                            
!             k = 4
!             ttt(k) = ttt(k) + (-1)**(j+4) / real( 2*j+5 )                            
!         end do
        
        ! Loop with modulo operation
!         do j = low, high
!             k = mod( j, nsplit ) + 1
!             ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 )                                        
!         end do
        
        ! Loop with subscripting via host association
        do j = low, high
            k = edof(j)
            ttt(k) = ttt(k) + (-1.0_real64)**(j+1) / real( 2*j-1 )                                        
        end do
    end function
    
end program main

Jakub Jelinek April 11, 2018, 3:44 p.m. UTC | #8

On Tue, Apr 10, 2018 at 11:50:44PM +0200, Thomas Koenig wrote:
> Hi Jakub,
> 
> 
> > The new test FAILs everywhere, gfortran.dg doesn't have infrastructure to
> > run -fopenmp, -fopenacc nor -ftree-parallelize-loops= tests.
> > You need to put such tests into libgomp/testsuite/libgomp.fortran/
> 
> I put the test case in the attached form into the libgomp.fortran
> directory, but it failed execution, without error message.
> 
> Anything I could have done differently?

Avoid using that much stack?  ulimit -s is usually around 8MB on Linux, on
other OSes it can be as low as 2MB or even less, so using 160MB edof array
is way too much.  Also, even if you e.g. allocated it from heap rather than
stack (still for some targets it would be too much), isn't that just too
expensive for the testsuite?
Can you reproduce say even with ne = 200000 (with the fix reverted)?

Also, are you going to do the suggested change (because can_be_parallel
is something only autopar cares about, but annot_expr_parallel_kind
sets like annot_expr_ivdep_kind also safelen to INTMAX):
--- gcc/fortran/trans-stmt.c.jj	2018-04-10 08:52:25.467790554 +0200
+++ gcc/fortran/trans-stmt.c	2018-04-11 17:42:40.670493050 +0200
@@ -3643,12 +3643,13 @@ gfc_trans_forall_loop (forall_info *fora
       cond = fold_build2_loc (input_location, LE_EXPR, logical_type_node,
 			      count, build_int_cst (TREE_TYPE (count), 0));
 
-      /* PR 83064 means that we cannot use the annotation if the
-	 autoparallelizer is active.  */
-      if (forall_tmp->do_concurrent && ! flag_tree_parallelize_loops)
+      /* PR 83064 means that we cannot use the annot_expr_parallel_kind
+	 annotation until autopar is tought to handle local variables
+	 in loops annotated that way.  */
+      if (forall_tmp->do_concurrent)
 	cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
 		       build_int_cst (integer_type_node,
-				      annot_expr_parallel_kind),
+				      annot_expr_ivdep_kind),
 		       integer_zero_node);
 
       tmp = build1_v (GOTO_EXPR, exit_label);


> ! { dg-do run }
> ! PR 83064 - this used to give wrong results.
> ! { dg-additional-options "-O3 -ftree-parallelize-loops=2" }
> ! Original test case by Christian Felter
> 
> program main
>     use, intrinsic :: iso_fortran_env
>     implicit none
> 
>     integer, parameter :: nsplit = 4
>     integer(int64), parameter :: ne = 20000000
>     integer(int64) :: stride, low(nsplit), high(nsplit), edof(ne), i
>     real(real64), dimension(nsplit) :: pi

	Jakub

Thomas Koenig April 11, 2018, 6:18 p.m. UTC | #9

Am 11.04.2018 um 17:44 schrieb Jakub Jelinek:
> On Tue, Apr 10, 2018 at 11:50:44PM +0200, Thomas Koenig wrote:
>> Hi Jakub,
>>
>>
>>> The new test FAILs everywhere, gfortran.dg doesn't have infrastructure to
>>> run -fopenmp, -fopenacc nor -ftree-parallelize-loops= tests.
>>> You need to put such tests into libgomp/testsuite/libgomp.fortran/
>>
>> I put the test case in the attached form into the libgomp.fortran
>> directory, but it failed execution, without error message.
>>
>> Anything I could have done differently?
> 
> Avoid using that much stack?

Well, I don't think stack use is excessive :-)

$ gfortran -S -Ofast do_concurrent_5.f90
$ fgrep ', %rsp' do_concurrent_5.s
         subq    $136, %rsp
         addq    $136, %rsp

I do see your point about total memory consumption, though.

Computation time of the test case I committed is around 1 s, which was
also not too bad.

I have attached updated patch which moves the test case to
gfortran.dg/gomp (where it actually passes).

Also, the patch below implements the suggestion of using
annot_expr_ivdep_kind.

OK for trunk?

Regards

	Thomas

2018-04-11  Thomas Koenig  <tkoenig@gcc.gnu.org>

	PR fortran/83064
	PR testsuite/85346
	* trans-stmt.c (gfc_trans_forall_loop): Use annot_expr_ivdep_kind
	for annotation and remove dependence on -ftree-parallelize-loops.

2018-04-11  Thomas Koenig  <tkoenig@gcc.gnu.org>

	PR fortran/83064
	PR testsuite/85346
	* gfortran.dg/do_concurrent_5.f90: Reduce memory consumption and
	move test to
	* gfortran.dg/gomp/do_concurrent_5.f90: New location.
	* gfortran.dg/do_concurrent_6.f90: New test.
Index: fortran/trans-stmt.c
===================================================================
--- fortran/trans-stmt.c	(Revision 259326)
+++ fortran/trans-stmt.c	(Arbeitskopie)
@@ -3643,12 +3643,12 @@
       cond = fold_build2_loc (input_location, LE_EXPR, logical_type_node,
 			      count, build_int_cst (TREE_TYPE (count), 0));
 
-      /* PR 83064 means that we cannot use the annotation if the
-	 autoparallelizer is active.  */
-      if (forall_tmp->do_concurrent && ! flag_tree_parallelize_loops)
+      /* PR 83064 means that we cannot use annot_expr_parallel_kind until
+       the autoparallelizer can hande this.  */
+      if (forall_tmp->do_concurrent)
 	cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
 		       build_int_cst (integer_type_node,
-				      annot_expr_parallel_kind),
+				      annot_expr_ivdep_kind),
 		       integer_zero_node);
 
       tmp = build1_v (GOTO_EXPR, exit_label);
Index: testsuite/gfortran.dg/do_concurrent_5.f90
===================================================================
--- testsuite/gfortran.dg/do_concurrent_5.f90	(Revision 259258)
+++ testsuite/gfortran.dg/do_concurrent_5.f90	(nicht existent)
@@ -1,70 +0,0 @@
-! { dg-do  run }
-! PR 83064 - this used to give wrong results.
-! { dg-options "-O3 -ftree-parallelize-loops=2" }
-! Original test case by Christian Felter
-
-program main
-    use, intrinsic :: iso_fortran_env
-    implicit none
-
-    integer, parameter :: nsplit = 4
-    integer(int64), parameter :: ne = 20000000
-    integer(int64) :: stride, low(nsplit), high(nsplit), edof(ne), i
-    real(real64), dimension(nsplit) :: pi
-    
-    edof(1::4) = 1
-    edof(2::4) = 2
-    edof(3::4) = 3
-    edof(4::4) = 4
-    
-    stride = ceiling(real(ne)/nsplit)
-    do i = 1, nsplit
-        high(i) = stride*i
-    end do
-    do i = 2, nsplit
-        low(i) = high(i-1) + 1
-    end do
-    low(1) = 1
-    high(nsplit) = ne
-
-    pi = 0
-    do concurrent (i = 1:nsplit)
-        pi(i) = sum(compute( low(i), high(i) ))
-    end do
-    if (abs (sum(pi) - atan(1.0d0)) > 1e-5) call abort
-    
-contains
-    
-    pure function compute( low, high ) result( ttt )        
-        integer(int64), intent(in) :: low, high
-        real(real64), dimension(nsplit) :: ttt
-        integer(int64) :: j, k
-        
-        ttt = 0
-
-        ! Unrolled loop
-!         do j = low, high, 4
-!             k = 1
-!             ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 )                            
-!             k = 2
-!             ttt(k) = ttt(k) + (-1)**(j+2) / real( 2*j+1 )                            
-!             k = 3
-!             ttt(k) = ttt(k) + (-1)**(j+3) / real( 2*j+3 )                            
-!             k = 4
-!             ttt(k) = ttt(k) + (-1)**(j+4) / real( 2*j+5 )                            
-!         end do
-        
-        ! Loop with modulo operation
-!         do j = low, high
-!             k = mod( j, nsplit ) + 1
-!             ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 )                                        
-!         end do
-        
-        ! Loop with subscripting via host association
-        do j = low, high
-            k = edof(j)
-            ttt(k) = ttt(k) + (-1.0_real64)**(j+1) / real( 2*j-1 )                                        
-        end do
-    end function
-    
-end program main
Index: testsuite/gfortran.dg/do_concurrent_6.f90
===================================================================
--- testsuite/gfortran.dg/do_concurrent_6.f90	(nicht existent)
+++ testsuite/gfortran.dg/do_concurrent_6.f90	(Arbeitskopie)
@@ -0,0 +1,13 @@
+! { dg-do compile }
+! { dg-additional-options "-fdump-tree-original" }
+
+program main
+  real, dimension(100) :: a,b
+  call random_number(a)
+  do concurrent (i=1:100)
+     b(i) = a(i)*a(i)
+  end do
+  print *,sum(a)
+end program main
+
+! { dg-final { scan-tree-dump-times "ivdep" 1 "original" } }
Index: testsuite/gfortran.dg/gomp/do_concurrent_5.f90
===================================================================
--- testsuite/gfortran.dg/gomp/do_concurrent_5.f90	(nicht existent)
+++ testsuite/gfortran.dg/gomp/do_concurrent_5.f90	(Arbeitskopie)
@@ -0,0 +1,71 @@
+! { dg-do  run }
+! PR 83064 - this used to give wrong results.
+! { dg-additional-options "-O1 -ftree-parallelize-loops=2" }
+! Original test case by Christian Felter
+
+program main
+    use, intrinsic :: iso_fortran_env
+    implicit none
+
+    integer, parameter :: nsplit = 4
+    integer(int64), parameter :: ne = 2**20
+    integer(int64) :: stride, low(nsplit), high(nsplit), edof(ne), i
+    real(real64), dimension(nsplit) :: pi
+    
+    edof(1::4) = 1
+    edof(2::4) = 2
+    edof(3::4) = 3
+    edof(4::4) = 4
+    
+    stride = ceiling(real(ne)/nsplit)
+    do i = 1, nsplit
+        high(i) = stride*i
+    end do
+    do i = 2, nsplit
+        low(i) = high(i-1) + 1
+    end do
+    low(1) = 1
+    high(nsplit) = ne
+
+    pi = 0
+    do concurrent (i = 1:nsplit)
+        pi(i) = sum(compute( low(i), high(i) ))
+    end do
+    print *,sum(pi)
+    if (abs (sum(pi) - atan(1.0d0)) > 1e-5) STOP 1
+    
+contains
+    
+    pure function compute( low, high ) result( ttt )        
+        integer(int64), intent(in) :: low, high
+        real(real64), dimension(nsplit) :: ttt
+        integer(int64) :: j, k
+        
+        ttt = 0
+
+        ! Unrolled loop
+!         do j = low, high, 4
+!             k = 1
+!             ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 )                            
+!             k = 2
+!             ttt(k) = ttt(k) + (-1)**(j+2) / real( 2*j+1 )                            
+!             k = 3
+!             ttt(k) = ttt(k) + (-1)**(j+3) / real( 2*j+3 )                            
+!             k = 4
+!             ttt(k) = ttt(k) + (-1)**(j+4) / real( 2*j+5 )                            
+!         end do
+        
+        ! Loop with modulo operation
+!         do j = low, high
+!             k = mod( j, nsplit ) + 1
+!             ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 )                                        
+!         end do
+        
+        ! Loop with subscripting via host association
+        do j = low, high
+            k = edof(j)
+            ttt(k) = ttt(k) + (-1.0_real64)**(j+1) / real( 2*j-1 )                                        
+        end do
+    end function
+    
+end program main

Jakub Jelinek April 11, 2018, 6:33 p.m. UTC | #10

On Wed, Apr 11, 2018 at 08:18:35PM +0200, Thomas Koenig wrote:
> Am 11.04.2018 um 17:44 schrieb Jakub Jelinek:
> > On Tue, Apr 10, 2018 at 11:50:44PM +0200, Thomas Koenig wrote:
> > > Hi Jakub,
> > > 
> > > 
> > > > The new test FAILs everywhere, gfortran.dg doesn't have infrastructure to
> > > > run -fopenmp, -fopenacc nor -ftree-parallelize-loops= tests.
> > > > You need to put such tests into libgomp/testsuite/libgomp.fortran/
> > > 
> > > I put the test case in the attached form into the libgomp.fortran
> > > directory, but it failed execution, without error message.
> > > 
> > > Anything I could have done differently?
> > 
> > Avoid using that much stack?
> 
> Well, I don't think stack use is excessive :-)
> 
> $ gfortran -S -Ofast do_concurrent_5.f90
> $ fgrep ', %rsp' do_concurrent_5.s
>         subq    $136, %rsp
>         addq    $136, %rsp

The test is not compiled with those options in the testsuite though, but
with -fopenmp -O0 -O3 -ftree-parallelize-loops=2 to select the important
ones.  And with these options
grep ', %rsp' do_concurrent_5.s  | sort -u
	addq	$160000176, %rsp
	addq	$8, %rsp
	subq	$160000176, %rsp
	subq	$8, %rsp
-fopenmp is added in the default flags and implies -frecursive and thus
-fautomatic.  You could add -fno-openmp to dg-additional-options if it is
ok for the large vars to be static.  Another thing which can be seen from
the above "-O0 -O3" is that libgomp.fortran/ tests cycle through optimization
options, if you only want -O3 only, then better just dg-skip-if if it isn't -O3,
instead of running the test effectively with -O3 6 or how many times.
Or if you want to test all optimization levels, take -O3 out of the
dg-additional-options.

> I have attached updated patch which moves the test case to
> gfortran.dg/gomp (where it actually passes).

How could it pass there?  dg-do run tests don't belong into g*.dg/gomp/,
nothing adds the -B etc. options needed to find libgomp.spec or libgomp
as a library, or adds it to LD_LIBRARY_PATH etc.
There are zero dg-do run tests in gfortran.dg/gomp/, there are 4
dg-do run tests in c-c++-common/gomp/, but those work fine because they
use -fopenmp-simd option rather than
-fopenmp/-fopenacc/-ftree-parallelize-loops= etc.

	Jakub

Thomas Koenig April 11, 2018, 7:47 p.m. UTC | #11

Am 11.04.2018 um 20:33 schrieb Jakub Jelinek:

>> I have attached updated patch which moves the test case to
>> gfortran.dg/gomp (where it actually passes).
> 
> How could it pass there?  dg-do run tests don't belong into g*.dg/gomp/,
> nothing adds the -B etc. options needed to find libgomp.spec or libgomp
> as a library, or adds it to LD_LIBRARY_PATH etc.
> There are zero dg-do run tests in gfortran.dg/gomp/, there are 4
> dg-do run tests in c-c++-common/gomp/, but those work fine because they
> use -fopenmp-simd option rather than
> -fopenmp/-fopenacc/-ftree-parallelize-loops= etc.

So, where should the test go?

The suggestion in PR 85346, to put it into
libgomp/testsuite/libgomp.fortran/, does not work:

Running ../../../../trunk/libgomp/testsuite/libgomp.fortran/fortran.exp ...
FAIL: libgomp.fortran/do_concurrent_5.f90   -O  execution test

even when ne (the array size) has been reduced to 2**20, far below
reasonable memory limits.  The test passes when given the
-O1 -ftree-parallelize-loops=2 options by hand.

So, what's the idea? Is there actually a directory which works,
or are we left with a wrong-code bug for which no test case is
possible? That would be quite bad, I think.

	Thomas

Jakub Jelinek April 12, 2018, 7:17 a.m. UTC | #12

On Wed, Apr 11, 2018 at 09:47:22PM +0200, Thomas Koenig wrote:
> Am 11.04.2018 um 20:33 schrieb Jakub Jelinek:
> 
> > > I have attached updated patch which moves the test case to
> > > gfortran.dg/gomp (where it actually passes).
> > 
> > How could it pass there?  dg-do run tests don't belong into g*.dg/gomp/,
> > nothing adds the -B etc. options needed to find libgomp.spec or libgomp
> > as a library, or adds it to LD_LIBRARY_PATH etc.
> > There are zero dg-do run tests in gfortran.dg/gomp/, there are 4
> > dg-do run tests in c-c++-common/gomp/, but those work fine because they
> > use -fopenmp-simd option rather than
> > -fopenmp/-fopenacc/-ftree-parallelize-loops= etc.
> 
> So, where should the test go?
> 
> The suggestion in PR 85346, to put it into
> libgomp/testsuite/libgomp.fortran/, does not work:

Yes, and I said what can be done to make it work; in patch form below.

> Running ../../../../trunk/libgomp/testsuite/libgomp.fortran/fortran.exp ...
> FAIL: libgomp.fortran/do_concurrent_5.f90   -O  execution test
> 
> even when ne (the array size) has been reduced to 2**20, far below

ne is not the array size, the array size is 8 * ne, and 8 * 1MB is 8MB and
you eat all of the usual stack limit just by that.

> reasonable memory limits.  The test passes when given the
> -O1 -ftree-parallelize-loops=2 options by hand.
> 
> So, what's the idea? Is there actually a directory which works,
> or are we left with a wrong-code bug for which no test case is
> possible? That would be quite bad, I think.

Here is incremental diff.  With the dg-skip-if and removal of explicit -O3,
you make the test run only once with -O3 -g, and skip the other variants:
UNSUPPORTED: libgomp.fortran/do_concurrent_5.f90   -O0 
UNSUPPORTED: libgomp.fortran/do_concurrent_5.f90   -O1 
UNSUPPORTED: libgomp.fortran/do_concurrent_5.f90   -O2 
UNSUPPORTED: libgomp.fortran/do_concurrent_5.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions 
UNSUPPORTED: libgomp.fortran/do_concurrent_5.f90   -Os 
and with -fno-openmp you disable the default -fopenmp which you really don't
need for the testcase, there are no OpenMP directives in there.

--- libgomp/testsuite/libgomp.fortran/do_concurrent_5.f90	2018-04-11 17:27:59.035100057 +0200
+++ libgomp/testsuite/libgomp.fortran/do_concurrent_5.f90	2018-04-12 09:12:40.611789503 +0200
@@ -1,6 +1,7 @@
 ! { dg-do run }
 ! PR 83064 - this used to give wrong results.
-! { dg-additional-options "-O3 -ftree-parallelize-loops=2" }
+! { dg-skip-if "" { ! run_expensive_tests } { "*" } { "-O3 -g" } }
+! { dg-additional-options "-fno-openmp -ftree-parallelize-loops=2" }
 ! Original test case by Christian Felter
 
 program main
@@ -8,7 +9,7 @@ program main
     implicit none
 
     integer, parameter :: nsplit = 4
-    integer(int64), parameter :: ne = 20000000
+    integer(int64), parameter :: ne = 2000000
     integer(int64) :: stride, low(nsplit), high(nsplit), edof(ne), i
     real(real64), dimension(nsplit) :: pi
     


	Jakub

Thomas Koenig April 12, 2018, 9:14 p.m. UTC | #13

Well, here's a variation which actually passes regression-test.

Seems I implicitly believed that the implicit save on main program
variables actually works... well, it turns out that it doesn't,
which is now PR85364.

OK for trunk?

	Thomas

2018-04-12  Thomas Koenig  <tkoenig@gcc.gnu.org>

         PR fortran/83064
         PR testsuite/85346
         * trans-stmt.c (gfc_trans_forall_loop): Use annot_expr_ivdep_kind
         for annotation and remove dependence on -ftree-parallelize-loops.

2018-04-12  Thomas Koenig  <tkoenig@gcc.gnu.org>

         PR fortran/83064
         PR testsuite/85346
         * gfortran.dg/do_concurrent_5.f90: Dynamically allocate main work
         array and move test to libgomp/testsuite/libgomp.fortran.
         * gfortran.dg/do_concurrent_6.f90: New test.

2018-04-12  Thomas Koenig  <tkoenig@gcc.gnu.org>

         PR fortran/83064
         PR testsuite/85346
         * testsuite/libgomp.fortran: Move modified test from gfortran.dg
         to here.
Index: trans-stmt.c
===================================================================
--- trans-stmt.c	(Revision 259326)
+++ trans-stmt.c	(Arbeitskopie)
@@ -3643,12 +3643,12 @@ gfc_trans_forall_loop (forall_info *forall_tmp, tr
       cond = fold_build2_loc (input_location, LE_EXPR, logical_type_node,
 			      count, build_int_cst (TREE_TYPE (count), 0));
 
-      /* PR 83064 means that we cannot use the annotation if the
-	 autoparallelizer is active.  */
-      if (forall_tmp->do_concurrent && ! flag_tree_parallelize_loops)
+      /* PR 83064 means that we cannot use annot_expr_parallel_kind until
+       the autoparallelizer can hande this.  */
+      if (forall_tmp->do_concurrent)
 	cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
 		       build_int_cst (integer_type_node,
-				      annot_expr_parallel_kind),
+				      annot_expr_ivdep_kind),
 		       integer_zero_node);
 
       tmp = build1_v (GOTO_EXPR, exit_label);
! { dg-do  run }
! PR 83064 - this used to give wrong results.
! { dg-additional-options "-O1 -ftree-parallelize-loops=2" }
! Original test case by Christian Felter

program main
    use, intrinsic :: iso_fortran_env
    implicit none

    integer, parameter :: nsplit = 4
    integer(int64), parameter :: ne = 2**20
    integer(int64) :: stride, low(nsplit), high(nsplit), i
    integer(int64), dimension(:), allocatable :: edof
    real(real64), dimension(nsplit) :: pi
    
    allocate (edof(ne))
    edof(1::4) = 1
    edof(2::4) = 2
    edof(3::4) = 3
    edof(4::4) = 4
    
    stride = ceiling(real(ne)/nsplit)
    do i = 1, nsplit
        high(i) = stride*i
    end do
    do i = 2, nsplit
        low(i) = high(i-1) + 1
    end do
    low(1) = 1
    high(nsplit) = ne

    pi = 0
    do concurrent (i = 1:nsplit)
        pi(i) = sum(compute( low(i), high(i) ))
    end do
    if (abs (sum(pi) - atan(1.0d0)) > 1e-5) STOP 1
    
contains
    
    pure function compute( low, high ) result( ttt )        
        integer(int64), intent(in) :: low, high
        real(real64), dimension(nsplit) :: ttt
        integer(int64) :: j, k
        
        ttt = 0

        ! Unrolled loop
!         do j = low, high, 4
!             k = 1
!             ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 )                            
!             k = 2
!             ttt(k) = ttt(k) + (-1)**(j+2) / real( 2*j+1 )                            
!             k = 3
!             ttt(k) = ttt(k) + (-1)**(j+3) / real( 2*j+3 )                            
!             k = 4
!             ttt(k) = ttt(k) + (-1)**(j+4) / real( 2*j+5 )                            
!         end do
        
        ! Loop with modulo operation
!         do j = low, high
!             k = mod( j, nsplit ) + 1
!             ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 )                                        
!         end do
        
        ! Loop with subscripting via host association
        do j = low, high
            k = edof(j)
            ttt(k) = ttt(k) + (-1.0_real64)**(j+1) / real( 2*j-1 )                                        
        end do
    end function
    
end program main
! { dg-do compile }
! { dg-additional-options "-fdump-tree-original" }

program main
  real, dimension(100) :: a,b
  call random_number(a)
  do concurrent (i=1:100)
     b(i) = a(i)*a(i)
  end do
  print *,sum(a)
end program main

! { dg-final { scan-tree-dump-times "ivdep" 1 "original" } }

Jakub Jelinek April 12, 2018, 9:21 p.m. UTC | #14

On Thu, Apr 12, 2018 at 11:14:45PM +0200, Thomas Koenig wrote:
> 2018-04-12  Thomas Koenig  <tkoenig@gcc.gnu.org>
> 
>         PR fortran/83064
>         PR testsuite/85346
>         * trans-stmt.c (gfc_trans_forall_loop): Use annot_expr_ivdep_kind
>         for annotation and remove dependence on -ftree-parallelize-loops.
> 
> 2018-04-12  Thomas Koenig  <tkoenig@gcc.gnu.org>
> 
>         PR fortran/83064
>         PR testsuite/85346
>         * gfortran.dg/do_concurrent_5.f90: Dynamically allocate main work
>         array and move test to libgomp/testsuite/libgomp.fortran.
>         * gfortran.dg/do_concurrent_6.f90: New test.
> 
> 2018-04-12  Thomas Koenig  <tkoenig@gcc.gnu.org>
> 
>         PR fortran/83064
>         PR testsuite/85346
>         * testsuite/libgomp.fortran: Move modified test from gfortran.dg
>         to here.

Please use full filename here, like:
	* testsuite/libgomp.fortran/do_concurrent_5.f90: New test, moved
	from gfortran.dg.  Make edof array allocatable.

Ok with that change.

> Index: trans-stmt.c
> ===================================================================
> --- trans-stmt.c	(Revision 259326)
> +++ trans-stmt.c	(Arbeitskopie)
> @@ -3643,12 +3643,12 @@ gfc_trans_forall_loop (forall_info *forall_tmp, tr
>        cond = fold_build2_loc (input_location, LE_EXPR, logical_type_node,
>  			      count, build_int_cst (TREE_TYPE (count), 0));
>  
> -      /* PR 83064 means that we cannot use the annotation if the
> -	 autoparallelizer is active.  */
> -      if (forall_tmp->do_concurrent && ! flag_tree_parallelize_loops)
> +      /* PR 83064 means that we cannot use annot_expr_parallel_kind until
> +       the autoparallelizer can hande this.  */
> +      if (forall_tmp->do_concurrent)
>  	cond = build3 (ANNOTATE_EXPR, TREE_TYPE (cond), cond,
>  		       build_int_cst (integer_type_node,
> -				      annot_expr_parallel_kind),
> +				      annot_expr_ivdep_kind),
>  		       integer_zero_node);
>  
>        tmp = build1_v (GOTO_EXPR, exit_label);

> ! { dg-do  run }
> ! PR 83064 - this used to give wrong results.
> ! { dg-additional-options "-O1 -ftree-parallelize-loops=2" }
> ! Original test case by Christian Felter
> 
> program main
>     use, intrinsic :: iso_fortran_env
>     implicit none
> 
>     integer, parameter :: nsplit = 4
>     integer(int64), parameter :: ne = 2**20
>     integer(int64) :: stride, low(nsplit), high(nsplit), i
>     integer(int64), dimension(:), allocatable :: edof
>     real(real64), dimension(nsplit) :: pi
>     
>     allocate (edof(ne))
>     edof(1::4) = 1
>     edof(2::4) = 2
>     edof(3::4) = 3
>     edof(4::4) = 4
>     
>     stride = ceiling(real(ne)/nsplit)
>     do i = 1, nsplit
>         high(i) = stride*i
>     end do
>     do i = 2, nsplit
>         low(i) = high(i-1) + 1
>     end do
>     low(1) = 1
>     high(nsplit) = ne
> 
>     pi = 0
>     do concurrent (i = 1:nsplit)
>         pi(i) = sum(compute( low(i), high(i) ))
>     end do
>     if (abs (sum(pi) - atan(1.0d0)) > 1e-5) STOP 1
>     
> contains
>     
>     pure function compute( low, high ) result( ttt )        
>         integer(int64), intent(in) :: low, high
>         real(real64), dimension(nsplit) :: ttt
>         integer(int64) :: j, k
>         
>         ttt = 0
> 
>         ! Unrolled loop
> !         do j = low, high, 4
> !             k = 1
> !             ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 )                            
> !             k = 2
> !             ttt(k) = ttt(k) + (-1)**(j+2) / real( 2*j+1 )                            
> !             k = 3
> !             ttt(k) = ttt(k) + (-1)**(j+3) / real( 2*j+3 )                            
> !             k = 4
> !             ttt(k) = ttt(k) + (-1)**(j+4) / real( 2*j+5 )                            
> !         end do
>         
>         ! Loop with modulo operation
> !         do j = low, high
> !             k = mod( j, nsplit ) + 1
> !             ttt(k) = ttt(k) + (-1)**(j+1) / real( 2*j-1 )                                        
> !         end do
>         
>         ! Loop with subscripting via host association
>         do j = low, high
>             k = edof(j)
>             ttt(k) = ttt(k) + (-1.0_real64)**(j+1) / real( 2*j-1 )                                        
>         end do
>     end function
>     
> end program main

> ! { dg-do compile }
> ! { dg-additional-options "-fdump-tree-original" }
> 
> program main
>   real, dimension(100) :: a,b
>   call random_number(a)
>   do concurrent (i=1:100)
>      b(i) = a(i)*a(i)
>   end do
>   print *,sum(a)
> end program main
> 
> ! { dg-final { scan-tree-dump-times "ivdep" 1 "original" } }


	Jakub

[fortran] Remove parallell annotation from DO CONCURRENT

Commit Message

Comments

Patch