Message ID | 63635e04-3faa-9b89-ca42-0a3b190f4106@arm.com |
---|---|
State | New |
Headers | show |
Series | [vect] Disable vectorization of epilogues for loops with SIMDUID set | expand |
On Mon, 4 Nov 2019, Andre Vieira (lists) wrote: > Hi, > > I was using loop->simdlen to detect whether it was a SIMD loop and I don't > believe that was correct, as can be witnessed by the mass failures in libgomp. > My apologies for not running this, didn't think of it! > > I found that these were failing because we do not handle vectorization of > epilogues correctly when SIMDUID is set. For now Jakub and I agreed to disable > epilogue vectorization for loops where SIMDUID is set until we have fixed > this. See further comments inline. > > I bootstrapped it on aarch64 and x86_64, ran libgomp on both. > > This OK for trunk? OK. Can you remove the simdlen == 0 check as a followup? Thanks, Richard. > Cheers, > Andre > > gcc/ChangeLog: > 2019-11-04 Andre Vieira <andre.simoesdiasvieira@arm.com> > > * tree-vect-loop.c (vect_analyze_loop): Disable epilogue > vectorization if loop->simduid is non null. > > On 31/10/2019 16:58, Jakub Jelinek wrote: > > > FAIL: libgomp.c/../libgomp.c-c++-common/loop-1.c execution test > > FAIL: libgomp.c/examples-4/simd-3.c execution test > > FAIL: libgomp.c/pr58392.c execution test > > FAIL: libgomp.c/scan-13.c execution test > > FAIL: libgomp.c/scan-17.c execution test > > FAIL: libgomp.c/scan-19.c execution test > > FAIL: libgomp.c/scan-20.c execution test > > FAIL: libgomp.c/simd-10.c execution test > > FAIL: libgomp.c/simd-12.c execution test > > FAIL: libgomp.c/simd-13.c execution test > > FAIL: libgomp.c/simd-6.c execution test > > FAIL: libgomp.c++/../libgomp.c-c++-common/loop-1.c execution test > > FAIL: libgomp.c++/simd-8.C execution test > > FAIL: libgomp.fortran/examples-4/simd-3.f90 -O1 execution test > > FAIL: libgomp.fortran/examples-4/simd-3.f90 -O2 execution test > > FAIL: libgomp.fortran/examples-4/simd-3.f90 -O3 -fomit-frame-pointer > > FAIL: -funroll-loops -fpeel-loops -ftracer -finline-functions execution > > FAIL: test > > FAIL: libgomp.fortran/examples-4/simd-3.f90 -O3 -g execution test > > FAIL: libgomp.fortran/examples-4/simd-3.f90 -Os execution test > > FAIL: libgomp.fortran/nestedfn5.f90 -O1 execution test > > FAIL: libgomp.fortran/nestedfn5.f90 -O2 execution test > > FAIL: libgomp.fortran/nestedfn5.f90 -O3 -fomit-frame-pointer > > FAIL: -funroll-loops -fpeel-loops -ftracer -finline-functions execution > > FAIL: test > > FAIL: libgomp.fortran/nestedfn5.f90 -O3 -g execution test > > FAIL: libgomp.fortran/nestedfn5.f90 -Os execution test > > These should go away now, but we should revisit SIMDUID and epilogue > vectorization later. I tried to look into it, but I am afraid I know very > little about SIMD loops to figure out how to make this work. > > > > On i686-linux, I also see newly > > FAIL: gcc.dg/vect/vect-epilogues.c -flto -ffat-lto-objects scan-tree-dump > > FAIL: vect "LOOP EPILOGUE VECTORIZED" > > FAIL: gcc.dg/vect/vect-epilogues.c scan-tree-dump vect "LOOP EPILOGUE > > FAIL: VECTORIZED" > > and in libgomp just > > These, just like for arm should be skipped for i686, I suspect it doesn't know > how to vectorize for lower VF's. Could someone add the appropriate skip > target triple for i686? > > > FAIL: libgomp.c/examples-4/simd-3.c execution test > > FAIL: libgomp.c/scan-13.c execution test > > FAIL: libgomp.c/scan-17.c execution test > > FAIL: libgomp.fortran/examples-4/simd-3.f90 -O1 execution test > > FAIL: libgomp.fortran/examples-4/simd-3.f90 -O2 execution test > > FAIL: libgomp.fortran/examples-4/simd-3.f90 -O3 -fomit-frame-pointer > > FAIL: -funroll-loops -fpeel-loops -ftracer -finline-functions execution > > FAIL: test > > FAIL: libgomp.fortran/examples-4/simd-3.f90 -O3 -g execution test > > FAIL: libgomp.fortran/examples-4/simd-3.f90 -Os execution test > > Same as the other libgomp tests. Should go away now. > > >
On Tue, Nov 05, 2019 at 08:07:53AM +0100, Richard Biener wrote: > > I was using loop->simdlen to detect whether it was a SIMD loop and I don't > > believe that was correct, as can be witnessed by the mass failures in libgomp. > > My apologies for not running this, didn't think of it! > > > > I found that these were failing because we do not handle vectorization of > > epilogues correctly when SIMDUID is set. For now Jakub and I agreed to disable > > epilogue vectorization for loops where SIMDUID is set until we have fixed > > this. See further comments inline. > > > > I bootstrapped it on aarch64 and x86_64, ran libgomp on both. > > > > This OK for trunk? > > OK. Can you remove the simdlen == 0 check as a followup? Yeah, exactly, I wanted to ask what the point of the simdlen == 0 check is. All a non-zero simdlen says is a user assertion that certain inter-loop depencencies don't exist. Jakub
Hi, Rebased the patch on top of Richard Sandiford's patches, with his fixes I can now allow for vectorization of epilogues after we match simdlen. This will however not turn on epilogue vectorization in cases where we specify a desired simdlen that is never matched. This would require more work as before simdlen is matched we would need to analyze each vector_size after creating a "first_loop_vinfo" twice: once as an epilogue (for in the case we never match simdlen) and once as a main loop (in case simdlen would match its VF). Maybe there is a different way of doing it but I don't see it right now. Bootstrapped and regression tested (also ran libgomp tests) for x86_64 and aarch64. Currently libgomp has 5 failures for aarch64, these are all openacc tests. The first one I looked at is due to a reduction seemingly performing too many iterations when defining '$acc parallel vector_length(vl)' I am looking into it. Is this OK for trunk? Cheers, Andre gcc/ChangeLog: 2019-11-07 Andre Vieira <andre.simoesdiasvieira@arm.com> * tree-vect-loop.c (vect_analyze_loop): Disable epilogue vectorization for loops with SIMDUID set. Enable epilogue vectorization for loops with SIMDLEN set after finding a main loop with a VF that matches it. On 05/11/2019 07:16, Jakub Jelinek wrote: > On Tue, Nov 05, 2019 at 08:07:53AM +0100, Richard Biener wrote: >>> I was using loop->simdlen to detect whether it was a SIMD loop and I don't >>> believe that was correct, as can be witnessed by the mass failures in libgomp. >>> My apologies for not running this, didn't think of it! >>> >>> I found that these were failing because we do not handle vectorization of >>> epilogues correctly when SIMDUID is set. For now Jakub and I agreed to disable >>> epilogue vectorization for loops where SIMDUID is set until we have fixed >>> this. See further comments inline. >>> >>> I bootstrapped it on aarch64 and x86_64, ran libgomp on both. >>> >>> This OK for trunk? >> >> OK. Can you remove the simdlen == 0 check as a followup? > > Yeah, exactly, I wanted to ask what the point of the simdlen == 0 check is. > All a non-zero simdlen says is a user assertion that certain inter-loop > depencencies don't exist. > > Jakub >
On Thu, Nov 07, 2019 at 02:26:29PM +0000, Andre Vieira (lists) wrote: > 2019-11-07 Andre Vieira <andre.simoesdiasvieira@arm.com> > > * tree-vect-loop.c (vect_analyze_loop): Disable epilogue > vectorization for loops with SIMDUID set. Enable epilogue > vectorization for loops with SIMDLEN set after finding a main > loop with a VF that matches it. > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > index dfa087ebb2cf01a5d21da0a921f8b6fc3d691ce9..22550ca2d6c56cce201ea422bfae5472a0d85f3a 100644 > --- a/gcc/tree-vect-loop.c > +++ b/gcc/tree-vect-loop.c > @@ -2455,11 +2455,15 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared) > delete loop_vinfo; > > /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is > - enabled, this is not a simd loop and it is the innermost loop. */ > - vect_epilogues = (!loop->simdlen > + enabled, SIMDUID is not set, it is the innermost loop and we have > + either already found the loop's SIMDLEN or there was no SIMDLEN to > + begin with. > + TODO: Enable epilogue vectorization for loops with SIMDUID set. */ > + vect_epilogues = (!simdlen > && loop->inner == NULL > && PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK) > && LOOP_VINFO_PEELING_FOR_NITER (first_loop_vinfo) > + && !loop->simduid > /* For now only allow one epilogue loop. */ > && first_loop_vinfo->epilogue_vinfos.is_empty ()); LGTM. Jakub
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index fa873e9b435037e5a81dda6615cab809d2d4de48..d3a0fa015332dbcccf84bc68531dc1e49550cc19 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -2383,10 +2383,11 @@ vect_analyze_loop (class loop *loop, loop_vec_info orig_loop_vinfo, poly_uint64 lowest_th = 0; unsigned vectorized_loops = 0; - /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is enabled, this - is not a simd loop and it is the most inner loop. */ + /* Only vectorize epilogues if PARAM_VECT_EPILOGUES_NOMASK is enabled, + SIMDLEN is not set, SIMDUID is not set and this is the most inner loop. + TODO: Enable epilogue vectorization for loops with SIMDUID set. */ bool vect_epilogues - = !loop->simdlen && loop->inner == NULL + = loop->simdlen == 0 && loop->simduid == NULL_TREE && loop->inner == NULL && PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK); while (1) {