diff mbox series

[vect] Disable vectorization of epilogues for loops with SIMDUID set

Message ID 63635e04-3faa-9b89-ca42-0a3b190f4106@arm.com
State New
Headers show
Series [vect] Disable vectorization of epilogues for loops with SIMDUID set | expand

Commit Message

Andre Vieira (lists) Nov. 4, 2019, 5:11 p.m. UTC
Hi,

I was using loop->simdlen to detect whether it was a SIMD loop and I 
don't believe that was correct, as can be witnessed by the mass failures 
in libgomp. My apologies for not running this, didn't think of it!

I found that these were failing because we do not handle vectorization 
of epilogues correctly when SIMDUID is set. For now Jakub and I agreed 
to disable epilogue vectorization for loops where SIMDUID is set until 
we have fixed this. See further comments inline.

I bootstrapped it on aarch64 and x86_64, ran libgomp on both.

This OK for trunk?

Cheers,
Andre

gcc/ChangeLog:
2019-11-04  Andre Vieira  <andre.simoesdiasvieira@arm.com>

         * tree-vect-loop.c (vect_analyze_loop): Disable epilogue
         vectorization if loop->simduid is non null.

On 31/10/2019 16:58, Jakub Jelinek wrote:

> FAIL: libgomp.c/../libgomp.c-c++-common/loop-1.c execution test
> FAIL: libgomp.c/examples-4/simd-3.c execution test
> FAIL: libgomp.c/pr58392.c execution test
> FAIL: libgomp.c/scan-13.c execution test
> FAIL: libgomp.c/scan-17.c execution test
> FAIL: libgomp.c/scan-19.c execution test
> FAIL: libgomp.c/scan-20.c execution test
> FAIL: libgomp.c/simd-10.c execution test
> FAIL: libgomp.c/simd-12.c execution test
> FAIL: libgomp.c/simd-13.c execution test
> FAIL: libgomp.c/simd-6.c execution test
> FAIL: libgomp.c++/../libgomp.c-c++-common/loop-1.c execution test
> FAIL: libgomp.c++/simd-8.C execution test
> FAIL: libgomp.fortran/examples-4/simd-3.f90   -O1  execution test
> FAIL: libgomp.fortran/examples-4/simd-3.f90   -O2  execution test
> FAIL: libgomp.fortran/examples-4/simd-3.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> FAIL: libgomp.fortran/examples-4/simd-3.f90   -O3 -g  execution test
> FAIL: libgomp.fortran/examples-4/simd-3.f90   -Os  execution test
> FAIL: libgomp.fortran/nestedfn5.f90   -O1  execution test
> FAIL: libgomp.fortran/nestedfn5.f90   -O2  execution test
> FAIL: libgomp.fortran/nestedfn5.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> FAIL: libgomp.fortran/nestedfn5.f90   -O3 -g  execution test
> FAIL: libgomp.fortran/nestedfn5.f90   -Os  execution test

These should go away now, but we should revisit SIMDUID and epilogue 
vectorization later.  I tried to look into it, but I am afraid I know 
very little about SIMD loops to figure out how to make this work.
> 
> On i686-linux, I also see newly
> FAIL: gcc.dg/vect/vect-epilogues.c -flto -ffat-lto-objects  scan-tree-dump vect "LOOP EPILOGUE VECTORIZED"
> FAIL: gcc.dg/vect/vect-epilogues.c scan-tree-dump vect "LOOP EPILOGUE VECTORIZED"
> and in libgomp just

These, just like for arm should be skipped for i686, I suspect it 
doesn't know how to vectorize for lower VF's.  Could someone add the 
appropriate skip target triple for i686?

> FAIL: libgomp.c/examples-4/simd-3.c execution test
> FAIL: libgomp.c/scan-13.c execution test
> FAIL: libgomp.c/scan-17.c execution test
> FAIL: libgomp.fortran/examples-4/simd-3.f90   -O1  execution test
> FAIL: libgomp.fortran/examples-4/simd-3.f90   -O2  execution test
> FAIL: libgomp.fortran/examples-4/simd-3.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> FAIL: libgomp.fortran/examples-4/simd-3.f90   -O3 -g  execution test
> FAIL: libgomp.fortran/examples-4/simd-3.f90   -Os  execution test

Same as the other libgomp tests. Should go away now.

Comments

Richard Biener Nov. 5, 2019, 7:07 a.m. UTC | #1
On Mon, 4 Nov 2019, Andre Vieira (lists) wrote:

> Hi,
> 
> I was using loop->simdlen to detect whether it was a SIMD loop and I don't
> believe that was correct, as can be witnessed by the mass failures in libgomp.
> My apologies for not running this, didn't think of it!
> 
> I found that these were failing because we do not handle vectorization of
> epilogues correctly when SIMDUID is set. For now Jakub and I agreed to disable
> epilogue vectorization for loops where SIMDUID is set until we have fixed
> this. See further comments inline.
> 
> I bootstrapped it on aarch64 and x86_64, ran libgomp on both.
> 
> This OK for trunk?

OK.  Can you remove the simdlen == 0 check as a followup?

Thanks,
Richard.

> Cheers,
> Andre
> 
> gcc/ChangeLog:
> 2019-11-04  Andre Vieira  <andre.simoesdiasvieira@arm.com>
> 
>         * tree-vect-loop.c (vect_analyze_loop): Disable epilogue
>         vectorization if loop->simduid is non null.
> 
> On 31/10/2019 16:58, Jakub Jelinek wrote:
> 
> > FAIL: libgomp.c/../libgomp.c-c++-common/loop-1.c execution test
> > FAIL: libgomp.c/examples-4/simd-3.c execution test
> > FAIL: libgomp.c/pr58392.c execution test
> > FAIL: libgomp.c/scan-13.c execution test
> > FAIL: libgomp.c/scan-17.c execution test
> > FAIL: libgomp.c/scan-19.c execution test
> > FAIL: libgomp.c/scan-20.c execution test
> > FAIL: libgomp.c/simd-10.c execution test
> > FAIL: libgomp.c/simd-12.c execution test
> > FAIL: libgomp.c/simd-13.c execution test
> > FAIL: libgomp.c/simd-6.c execution test
> > FAIL: libgomp.c++/../libgomp.c-c++-common/loop-1.c execution test
> > FAIL: libgomp.c++/simd-8.C execution test
> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -O1  execution test
> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -O2  execution test
> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -O3 -fomit-frame-pointer
> > FAIL: -funroll-loops -fpeel-loops -ftracer -finline-functions  execution
> > FAIL: test
> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -O3 -g  execution test
> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -Os  execution test
> > FAIL: libgomp.fortran/nestedfn5.f90   -O1  execution test
> > FAIL: libgomp.fortran/nestedfn5.f90   -O2  execution test
> > FAIL: libgomp.fortran/nestedfn5.f90   -O3 -fomit-frame-pointer
> > FAIL: -funroll-loops -fpeel-loops -ftracer -finline-functions  execution
> > FAIL: test
> > FAIL: libgomp.fortran/nestedfn5.f90   -O3 -g  execution test
> > FAIL: libgomp.fortran/nestedfn5.f90   -Os  execution test
> 
> These should go away now, but we should revisit SIMDUID and epilogue
> vectorization later.  I tried to look into it, but I am afraid I know very
> little about SIMD loops to figure out how to make this work.
> > 
> > On i686-linux, I also see newly
> > FAIL: gcc.dg/vect/vect-epilogues.c -flto -ffat-lto-objects  scan-tree-dump
> > FAIL: vect "LOOP EPILOGUE VECTORIZED"
> > FAIL: gcc.dg/vect/vect-epilogues.c scan-tree-dump vect "LOOP EPILOGUE
> > FAIL: VECTORIZED"
> > and in libgomp just
> 
> These, just like for arm should be skipped for i686, I suspect it doesn't know
> how to vectorize for lower VF's.  Could someone add the appropriate skip
> target triple for i686?
> 
> > FAIL: libgomp.c/examples-4/simd-3.c execution test
> > FAIL: libgomp.c/scan-13.c execution test
> > FAIL: libgomp.c/scan-17.c execution test
> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -O1  execution test
> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -O2  execution test
> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -O3 -fomit-frame-pointer
> > FAIL: -funroll-loops -fpeel-loops -ftracer -finline-functions  execution
> > FAIL: test
> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -O3 -g  execution test
> > FAIL: libgomp.fortran/examples-4/simd-3.f90   -Os  execution test
> 
> Same as the other libgomp tests. Should go away now.
> 
> 
>
Jakub Jelinek Nov. 5, 2019, 7:16 a.m. UTC | #2
On Tue, Nov 05, 2019 at 08:07:53AM +0100, Richard Biener wrote:
> > I was using loop->simdlen to detect whether it was a SIMD loop and I don't
> > believe that was correct, as can be witnessed by the mass failures in libgomp.
> > My apologies for not running this, didn't think of it!
> > 
> > I found that these were failing because we do not handle vectorization of
> > epilogues correctly when SIMDUID is set. For now Jakub and I agreed to disable
> > epilogue vectorization for loops where SIMDUID is set until we have fixed
> > this. See further comments inline.
> > 
> > I bootstrapped it on aarch64 and x86_64, ran libgomp on both.
> > 
> > This OK for trunk?
> 
> OK.  Can you remove the simdlen == 0 check as a followup?

Yeah, exactly, I wanted to ask what the point of the simdlen == 0 check is.
All a non-zero simdlen says is a user assertion that certain inter-loop
depencencies don't exist.

	Jakub
Andre Vieira (lists) Nov. 7, 2019, 2:26 p.m. UTC | #3
Hi,

Rebased the patch on top of Richard Sandiford's patches, with his fixes 
I can now allow for vectorization of epilogues after we match simdlen. 
This will however not turn on epilogue vectorization in cases where we 
specify a desired simdlen that is never matched.  This would require 
more work as before simdlen is matched we would need to analyze each 
vector_size after creating a "first_loop_vinfo" twice: once as an 
epilogue (for in the case we never match simdlen) and once as a main 
loop (in case simdlen would match its VF).  Maybe there is a different 
way of doing it but I don't see it right now.

Bootstrapped and regression tested (also ran libgomp tests) for x86_64 
and aarch64. Currently libgomp has 5 failures for aarch64, these are all 
openacc tests. The first one I looked at is due to a reduction seemingly 
performing too many iterations when defining '$acc parallel 
vector_length(vl)' I am looking into it.

Is this OK for trunk?

Cheers,
Andre

gcc/ChangeLog:
2019-11-07  Andre Vieira  <andre.simoesdiasvieira@arm.com>

         * tree-vect-loop.c (vect_analyze_loop): Disable epilogue
         vectorization for loops with SIMDUID set.  Enable epilogue
         vectorization for loops with SIMDLEN set after finding a main
         loop with a VF that matches it.

On 05/11/2019 07:16, Jakub Jelinek wrote:
> On Tue, Nov 05, 2019 at 08:07:53AM +0100, Richard Biener wrote:
>>> I was using loop->simdlen to detect whether it was a SIMD loop and I don't
>>> believe that was correct, as can be witnessed by the mass failures in libgomp.
>>> My apologies for not running this, didn't think of it!
>>>
>>> I found that these were failing because we do not handle vectorization of
>>> epilogues correctly when SIMDUID is set. For now Jakub and I agreed to disable
>>> epilogue vectorization for loops where SIMDUID is set until we have fixed
>>> this. See further comments inline.
>>>
>>> I bootstrapped it on aarch64 and x86_64, ran libgomp on both.
>>>
>>> This OK for trunk?
>>
>> OK.  Can you remove the simdlen == 0 check as a followup?
> 
> Yeah, exactly, I wanted to ask what the point of the simdlen == 0 check is.
> All a non-zero simdlen says is a user assertion that certain inter-loop
> depencencies don't exist.
> 
> 	Jakub
>
Jakub Jelinek Nov. 7, 2019, 2:32 p.m. UTC | #4
On Thu, Nov 07, 2019 at 02:26:29PM +0000, Andre Vieira (lists) wrote:
> 2019-11-07  Andre Vieira  <andre.simoesdiasvieira@arm.com>
> 
>         * tree-vect-loop.c (vect_analyze_loop): Disable epilogue
>         vectorization for loops with SIMDUID set.  Enable epilogue
>         vectorization for loops with SIMDLEN set after finding a main
>         loop with a VF that matches it.
> 
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index dfa087ebb2cf01a5d21da0a921f8b6fc3d691ce9..22550ca2d6c56cce201ea422bfae5472a0d85f3a 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -2455,11 +2455,15 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
>  	    delete loop_vinfo;
>  
>  	  /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is
> -	     enabled, this is not a simd loop and it is the innermost loop.  */
> -	  vect_epilogues = (!loop->simdlen
> +	     enabled, SIMDUID is not set, it is the innermost loop and we have
> +	     either already found the loop's SIMDLEN or there was no SIMDLEN to
> +	     begin with.
> +	     TODO: Enable epilogue vectorization for loops with SIMDUID set.  */
> +	  vect_epilogues = (!simdlen
>  			    && loop->inner == NULL
>  			    && PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK)
>  			    && LOOP_VINFO_PEELING_FOR_NITER (first_loop_vinfo)
> +			    && !loop->simduid
>  			    /* For now only allow one epilogue loop.  */
>  			    && first_loop_vinfo->epilogue_vinfos.is_empty ());

LGTM.

	Jakub
diff mbox series

Patch

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index fa873e9b435037e5a81dda6615cab809d2d4de48..d3a0fa015332dbcccf84bc68531dc1e49550cc19 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2383,10 +2383,11 @@  vect_analyze_loop (class loop *loop, loop_vec_info orig_loop_vinfo,
   poly_uint64 lowest_th = 0;
   unsigned vectorized_loops = 0;
 
-  /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is enabled, this
-     is not a simd loop and it is the most inner loop.  */
+  /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is enabled,
+     SIMDLEN is not set, SIMDUID is not set and this is the most inner loop.
+     TODO: Enable epilogue vectorization for loops with SIMDUID set.  */
   bool vect_epilogues
-    = !loop->simdlen && loop->inner == NULL
+    = loop->simdlen == 0 && loop->simduid == NULL_TREE && loop->inner == NULL
       && PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK);
   while (1)
     {