Message ID | 24684412-042a-0842-3a5e-1e62c0ae3691@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | rs6000: MMA test case emits wrong code when building a vector pair | expand |
I'd like to ping the following patch. Peter On 10/27/21 8:37 PM, Peter Bergner via Gcc-patches wrote: > PR102976 shows a test case where we generate wrong code when building > a vector pair from 2 vector registers. The bug here is that with unlucky > register assignments, we can clobber one of the input operands before > we write both registers of the output operand. The solution is to use > early-clobbers in the assemble pair and accumulator patterns. > > This passed bootstrap and regtesting with no regressions and our > OpenBLAS team has confirmed it fixes the issues they reported. > Ok for mainline? > > Ok for GCC 11 too after a few days on trunk? > > Peter > > > gcc/ > PR target/102976 > * config/rs6000/mma.md (*vsx_assemble_pair): Add early-clobber for > output operand. > (*mma_assemble_acc): Likewise. > > gcc/testsuite/ > PR target/102976 > * gcc.target/powerpc/pr102976.c: New test. > > diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md > index 1990a2183f6..f0ea99963f7 100644 > --- a/gcc/config/rs6000/mma.md > +++ b/gcc/config/rs6000/mma.md > @@ -339,7 +339,7 @@ (define_expand "vsx_assemble_pair" > }) > > (define_insn_and_split "*vsx_assemble_pair" > - [(set (match_operand:OO 0 "vsx_register_operand" "=wa") > + [(set (match_operand:OO 0 "vsx_register_operand" "=&wa") > (unspec:OO [(match_operand:V16QI 1 "mma_assemble_input_operand" "mwa") > (match_operand:V16QI 2 "mma_assemble_input_operand" "mwa")] > UNSPEC_MMA_ASSEMBLE))] > @@ -405,7 +405,7 @@ (define_expand "mma_assemble_acc" > }) > > (define_insn_and_split "*mma_assemble_acc" > - [(set (match_operand:XO 0 "fpr_reg_operand" "=d") > + [(set (match_operand:XO 0 "fpr_reg_operand" "=&d") > (unspec:XO [(match_operand:V16QI 1 "mma_assemble_input_operand" "mwa") > (match_operand:V16QI 2 "mma_assemble_input_operand" "mwa") > (match_operand:V16QI 3 "mma_assemble_input_operand" "mwa") > diff --git a/gcc/testsuite/gcc.target/powerpc/pr102976.c b/gcc/testsuite/gcc.target/powerpc/pr102976.c > new file mode 100644 > index 00000000000..a8de8f056f1 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr102976.c > @@ -0,0 +1,14 @@ > +/* { dg-require-effective-target power10_ok } */ > +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */ > + > +#include <altivec.h> > +void > +bug (__vector_pair *dst) > +{ > + register vector unsigned char vec0 asm ("vs44"); > + register vector unsigned char vec1 asm ("vs32"); > + __builtin_vsx_build_pair (dst, vec0, vec1); > +} > + > +/* { dg-final { scan-assembler-times {xxlor[^,]*,44,44} 1 } } */ > +/* { dg-final { scan-assembler-times {xxlor[^,]*,32,32} 1 } } */ >
On Wed, Oct 27, 2021 at 08:37:57PM -0500, Peter Bergner wrote: > PR102976 shows a test case where we generate wrong code when building > a vector pair from 2 vector registers. The bug here is that with unlucky > register assignments, we can clobber one of the input operands before > we write both registers of the output operand. The solution is to use > early-clobbers in the assemble pair and accumulator patterns. Because of what insns there are after the split. Aha. Please add a comment explaining this, near the earlyclobber itself. A usually nicer way of doing it is by special casing the split code for this situation. But with the comment in place the way you do it might even be preferable here :-) > +/* { dg-final { scan-assembler-times {xxlor[^,]*,44,44} 1 } } */ > +/* { dg-final { scan-assembler-times {xxlor[^,]*,32,32} 1 } } */ Bracket expressions using ^ match newlines as well, unless you use (partial) newline-sensitive matching. Partial is almost always what you want, so start the regex with (?p) ? You also want to add some \m and \M btw. For example, as written his will match xxlorc insns as well. Not a super big deal, but :-) You can just write this as {\mxxlor \d+,44,44\M} etc., that will be simplest I think. Okay for trunk with comments added near the earlyclobber, and the RE improved. Also fine for 11 after some burn-in. Thanks! Segher
On 11/13/21 7:25 AM, Segher Boessenkool wrote: > On Wed, Oct 27, 2021 at 08:37:57PM -0500, Peter Bergner wrote: >> PR102976 shows a test case where we generate wrong code when building >> a vector pair from 2 vector registers. The bug here is that with unlucky >> register assignments, we can clobber one of the input operands before >> we write both registers of the output operand. The solution is to use >> early-clobbers in the assemble pair and accumulator patterns. > > Because of what insns there are after the split. Aha. > > Please add a comment explaining this, near the earlyclobber itself. Done for both patterns. > You can just write this as {\mxxlor \d+,44,44\M} etc., that will be > simplest I think. Done and tested that it still works. > Okay for trunk with comments added near the earlyclobber, and the RE > improved. Also fine for 11 after some burn-in. Thanks! Ok, I pushed with both changes. I'll push a change to GCC11 in a few days. Thanks! Peter
diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md index 1990a2183f6..f0ea99963f7 100644 --- a/gcc/config/rs6000/mma.md +++ b/gcc/config/rs6000/mma.md @@ -339,7 +339,7 @@ (define_expand "vsx_assemble_pair" }) (define_insn_and_split "*vsx_assemble_pair" - [(set (match_operand:OO 0 "vsx_register_operand" "=wa") + [(set (match_operand:OO 0 "vsx_register_operand" "=&wa") (unspec:OO [(match_operand:V16QI 1 "mma_assemble_input_operand" "mwa") (match_operand:V16QI 2 "mma_assemble_input_operand" "mwa")] UNSPEC_MMA_ASSEMBLE))] @@ -405,7 +405,7 @@ (define_expand "mma_assemble_acc" }) (define_insn_and_split "*mma_assemble_acc" - [(set (match_operand:XO 0 "fpr_reg_operand" "=d") + [(set (match_operand:XO 0 "fpr_reg_operand" "=&d") (unspec:XO [(match_operand:V16QI 1 "mma_assemble_input_operand" "mwa") (match_operand:V16QI 2 "mma_assemble_input_operand" "mwa") (match_operand:V16QI 3 "mma_assemble_input_operand" "mwa") diff --git a/gcc/testsuite/gcc.target/powerpc/pr102976.c b/gcc/testsuite/gcc.target/powerpc/pr102976.c new file mode 100644 index 00000000000..a8de8f056f1 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr102976.c @@ -0,0 +1,14 @@ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */ + +#include <altivec.h> +void +bug (__vector_pair *dst) +{ + register vector unsigned char vec0 asm ("vs44"); + register vector unsigned char vec1 asm ("vs32"); + __builtin_vsx_build_pair (dst, vec0, vec1); +} + +/* { dg-final { scan-assembler-times {xxlor[^,]*,44,44} 1 } } */ +/* { dg-final { scan-assembler-times {xxlor[^,]*,32,32} 1 } } */