Message ID | yddedc4exp0.fsf@CeBiTec.Uni-Bielefeld.DE |
---|---|
State | New |
Series | testsuite: vect: Don't xfail scan-tree-dump in gcc.dg/vect/bb-slp-32.c [PR96147] |

On Thu, 21 Mar 2024, Rainer Orth wrote:

> gcc.dg/vect/bb-slp-32.c currently XPASSes on 32 and 64-bit Solaris/SPARC:
>
> XPASS: gcc.dg/vect/bb-slp-32.c -flto -ffat-lto-objects scan-tree-dump slp2 "vectorization is not profitable"
> XPASS: gcc.dg/vect/bb-slp-32.c scan-tree-dump slp2 "vectorization is not profitable"
>
> At least on SPARC, the current xfail can simply go, but I'm highly
> uncertain if this is right in general.
>
> Tested on sparc-sun-solaris2.11 and i386-pc-solaris2.11.
>
> Ok for trunk?

The condition was made for the case where vectorization fails even when
not considering costing. But given we now do

    p = __builtin_assume_aligned (p, __BIGGEST_ALIGNMENT__);

that condition doesn't make sense anymore (forgot to update it in my
r11-6715-gb36c9cd09472c8 change).

In principle the testcase should be profitable to vectorize with
the SLP reduction support now (and we'd vectorize it that way).
But we fail to apply SLP node CSE when merging the SLP instance
into a common subgraph, so we over-estimate cost (and perform
double code generation that's later CSEd).

That it's still not profitable on x86_64 for me is a quite narrow loss:

    Vector cost: 144
    Scalar cost: 140

So ideally we'd key the FAIL on .REDUC_PLUS not being available for
V4SImode but then we also try V2SImode where the reduction isn't
recognized. So the testcase wouldn't work well for targets comparing
cost.

I'd say we remove the dg-final completely for now. I filed PR114413
about the costing/CSE issue above.

Richard.
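
For readers unfamiliar with the alignment promise mentioned above, here is a minimal sketch of the pattern. It is a hypothetical stand-in, not the actual body of gcc.dg/vect/bb-slp-32.c: the names `sum4_aligned` and `run_example`, the buffer contents, and the arithmetic are all assumptions made for illustration. It only shows how `__builtin_assume_aligned` tells GCC that `p` is aligned to `__BIGGEST_ALIGNMENT__`, after which a straight-line sum over four elements becomes a candidate for basic-block SLP vectorization; whether the vectorizer actually transforms it depends on the target's cost model, as the 144 versus 140 figures above illustrate for x86_64.

```c
/* Hypothetical kernel: a four-element straight-line reduction.
   This is NOT the real testcase, only an illustration of the
   alignment promise discussed in the thread.  */
int
sum4_aligned (int *p, int a, int b)
{
  int sum = 0;

  /* Promise the compiler that p is aligned to the target's biggest
     alignment.  The caller must really satisfy this, otherwise the
     behavior is undefined.  */
  p = __builtin_assume_aligned (p, __BIGGEST_ALIGNMENT__);

  sum += p[0] + a;
  sum += p[1] + b;
  sum += p[2] + a;
  sum += p[3] + b;
  return sum;
}

/* Hypothetical driver: supplies a buffer that actually has the
   promised alignment, using the GCC aligned attribute.  */
int
run_example (void)
{
  static int buf[4] __attribute__ ((aligned (__BIGGEST_ALIGNMENT__)))
    = { 1, 2, 3, 4 };

  return sum4_aligned (buf, 10, 20);
}
```

Compiling something like this with `-O2 -ftree-slp-vectorize -fdump-tree-slp2-details` should produce an slp2 dump in which the vectorizer's decision can be inspected; the exact dump text and costs are target dependent.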

Hi Richard,

> On Thu, 21 Mar 2024, Rainer Orth wrote:
>
>> gcc.dg/vect/bb-slp-32.c currently XPASSes on 32 and 64-bit Solaris/SPARC:
>>
>> XPASS: gcc.dg/vect/bb-slp-32.c -flto -ffat-lto-objects scan-tree-dump
>> slp2 "vectorization is not profitable"
>> XPASS: gcc.dg/vect/bb-slp-32.c scan-tree-dump slp2 "vectorization is not
>> profitable"
>>
>> At least on SPARC, the current xfail can simply go, but I'm highly
>> uncertain if this is right in general.
>>
>> Tested on sparc-sun-solaris2.11 and i386-pc-solaris2.11.
>>
>> Ok for trunk?
>
> The condition was made for the case where vectorization fails even when
> not considering costing. But given we now do
>
>     p = __builtin_assume_aligned (p, __BIGGEST_ALIGNMENT__);
>
> that condition doesn't make sense anymore (forgot to update it in my
> r11-6715-gb36c9cd09472c8 change).
>
> In principle the testcase should be profitable to vectorize with
> the SLP reduction support now (and we'd vectorize it that way).
> But we fail to apply SLP node CSE when merging the SLP instance
> into a common subgraph, so we over-estimate cost (and perform
> double code generation that's later CSEd).
>
> That it's still not profitable on x86_64 for me is a quite narrow loss:
>
>     Vector cost: 144
>     Scalar cost: 140
>
> So ideally we'd key the FAIL on .REDUC_PLUS not being available for
> V4SImode but then we also try V2SImode where the reduction isn't
> recognized. So the testcase wouldn't work well for targets comparing
> cost.
>
> I'd say we remove the dg-final completely for now. I filed PR114413
> about the costing/CSE issue above.

Thanks. This is what I committed after re-testing.

Rainer

# HG changeset patch
# Parent b3b6fa4472bc1f2b170e2b736852ec93bae94480
testsuite: vect: Don't xfail scan-tree-dump in gcc.dg/vect/bb-slp-32.c [PR96147]

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-32.c b/gcc/testsuite/gcc.dg/vect/bb-slp-32.c
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-32.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-32.c
@@ -25,4 +25,4 @@ int foo (int *p, int a, int b)
   return sum;
 }
 
-/* { dg-final { scan-tree-dump "vectorization is not profitable" "slp2" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
+/* { dg-final { scan-tree-dump "vectorization is not profitable" "slp2" } } */