Message ID | 6569d067-8ebc-8360-61a8-499a3700d216@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | rs6000: Fix PR91275 | expand |
Hi Bill, On Mon, Sep 30, 2019 at 08:24:09PM -0500, Bill Schmidt wrote: > PR91275 observes that __builtin_crypto_vpmsumd fails to work properly > with -O1 or higher with -mcpu=power8. That combination spells swap > optimization. Indeed, all vpmsum* instructions were being accepted > as swappable operations. This is correct for all of them but vpmsumd, > which creates a 128-bit result. Yeah, that is obvious once you see it spelled out :-) > The -std=gnu11 in the testcase is there to avoid errors about long long > not being accepted with pure ANSI. The "11" part is arbitrary. The > testcase is modified from the original bug report. That is fine. > This patch disallows swap optimization in the presence of vpmsumd. > Bootstrapped and tested on powerpc64le-unknown-linux-gnu. Is this okay > for trunk, and for backport to all active branches after an appropriate > waiting period? Yes please. Thanks, Segher > 2019-09-30 Bill Schmidt <wschmidt@linux.ibm.com> > > * config/rs6000/rs6000-p8swap.c (rtx_is_swappable_p): Don't swap > vpmsumd. > > [gcc/testsuite] > > 2019-09-30 Bill Schmidt <wschmdit@linux.ibm.com> > > * gcc.target/powerpc/pr91275.c: New.
Index: gcc/config/rs6000/rs6000-p8swap.c =================================================================== --- gcc/config/rs6000/rs6000-p8swap.c (revision 276360) +++ gcc/config/rs6000/rs6000-p8swap.c (working copy) @@ -791,6 +791,11 @@ rtx_is_swappable_p (rtx op, unsigned int *special) case UNSPEC_REDUC_PLUS: case UNSPEC_REDUC: return 1; + case UNSPEC_VPMSUM: + /* vpmsumd is not swappable, but vpmsum[bhw] are. */ + if (GET_MODE (op) == V2DImode) + return 0; + break; } } Index: gcc/testsuite/gcc.target/powerpc/pr91275.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/pr91275.c (nonexistent) +++ gcc/testsuite/gcc.target/powerpc/pr91275.c (working copy) @@ -0,0 +1,21 @@ +/* Test that we generate vpmsumd correctly without a swap error. */ + +/* { dg-do run { target { p8vector_hw } } } */ +/* { dg-options "-O2 -std=gnu11" } */ + +#include <altivec.h> + +int main() { + + const unsigned long long r0l = 0x8e7dfceac070e3a0; + vector unsigned long long r0 = (vector unsigned long long) {r0l, 0}, v; + const vector unsigned long long pd + = (vector unsigned long) {0xc2LLU << 56, 0}; + + v = __builtin_crypto_vpmsumd ((vector unsigned long long) {r0[0], 0}, pd); + + if (v[0] != 0x4000000000000000 || v[1] != 0x65bd7ab605a4a8ff) + __builtin_abort (); + + return 0; +}