Message ID | 6b8319c5-9bb4-78f4-7abc-92ad91bdd781@arm.com |
---|---|
State | New |
Headers | show |
Series | [vect] Account for epilogue's peeling for gaps when checking if we have enough niters for epilogue | expand |
On Fri, 8 Nov 2019, Andre Vieira (lists) wrote: > Hi, > > As I mentioned in the patch to disable epilogue vectorization for loops with > SIMDUID set, there were still some aarch64 libgomp failures. This patch fixes > those. > > The problem was that we were vectorizing a reduction that was only using one > of the parts from a complex number, creating data accesses with gaps. For this > we set PEELING_FOR_GAPS which forces us to peel an extra scalar iteration. > > What was happening in the testcase I looked at was that we had a known niters > of 10. The first VF was 4, leaving 10 % 4 = 2 scalar iterations. The epilogue > had VF 2, which meant the current code thought we could do it. However, given > the PEELING_FOR_GAPS it would create a scalar epilogue and we would end up > doing too many iterations, surprisingly 12 as I think the code assumed we > hadn't created said epilogue. > > I ran a local check where I upped the iterations of the fortran test to 11 and > I see GCC vectorizing the epilogue with VF = 2 and a scalar epilogue for one > iteration, so that looks good too. I have transformed it into a test that > would reproduce the issue in C and without openacc so I can run it in gcc's > normal testsuite more easily. > > Bootstrap on aarch64 and x86_64. > > Is this OK for trunk? OK. Richard. > Cheers, > Andre > > gcc/ChangeLog: > 2019-11-08 Andre Vieira <andre.simoesdiasvieira@arm.com> > > * tree-vect-loop-manip.c (vect_do_peeling): Take epilogue gaps > into account when checking if there are enough iterations to > vectorize epilogue. > > gcc/testsuite/ChangeLog: > 2019-11-08 Andre Vieira <andre.simoesdiasvieira@arm.com> > > * gcc.dg/vect/vect-reduc-epilogue-gaps.c: New test. > >
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-epilogue-gaps.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-epilogue-gaps.c new file mode 100644 index 0000000000000000000000000000000000000000..dc5704f5607fc94fd02036e5db4a9bd37ce5169b --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-epilogue-gaps.c @@ -0,0 +1,45 @@ +/* { dg-options "-O3 -fno-vect-cost-model" } */ +struct { + float real; + float img; +} g[11]; + +float __attribute__ ((noclone)) +foo_11 (void) +{ + float sum = 0.0; + for (int i = 0; i < 11; ++i) + sum += g[i].real; + return sum; +} + +float __attribute__ ((noclone)) +foo_10 (void) +{ + float sum = 0.0; + for (int i = 0; i < 10; ++i) + sum += g[i].real; + return sum; +} + +int main (void) +{ + float check_10 = 0.0; + float check_11 = 0.0; + for (int i = 0; i < 11; ++i) + { + asm volatile ("" : : : "memory"); + g[i].real = (float) i; + g[i].img = (float) -i; + if (i < 10) + check_10 += (float) i; + check_11 += (float) i; + } + + if (foo_10 () != check_10) + __builtin_abort (); + if (foo_11 () != check_11) + __builtin_abort (); + + return 0; +} diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c index 54f3ccf3ec373b5621e7778e6e80bab853a57687..559d59bbe78738e53e7e6c1d64e7f87eed255d76 100644 --- a/gcc/tree-vect-loop-manip.c +++ b/gcc/tree-vect-loop-manip.c @@ -2530,9 +2530,11 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, = eiters % lowest_vf + LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo); unsigned int ratio; + unsigned int epilogue_gaps + = LOOP_VINFO_PEELING_FOR_GAPS (epilogue_vinfo); while (!(constant_multiple_p (loop_vinfo->vector_size, epilogue_vinfo->vector_size, &ratio) - && eiters >= lowest_vf / ratio)) + && eiters >= lowest_vf / ratio + epilogue_gaps)) { delete epilogue_vinfo; epilogue_vinfo = NULL; @@ -2543,6 +2545,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1, } epilogue_vinfo = loop_vinfo->epilogue_vinfos[0]; loop_vinfo->epilogue_vinfos.ordered_remove (0); + epilogue_gaps = LOOP_VINFO_PEELING_FOR_GAPS (epilogue_vinfo); } } /* Prolog loop may be skipped. */