Message ID | 011f01d8984e$b3a27870$1ae76950$@nextmovesoftware.com |
---|---|
State | New |
Headers | show |
Series | [x86] PR target/106273: Add earlyclobber to *andn<dwi>3_doubleword_bmi | expand |
On Fri, Jul 15, 2022 at 3:28 PM Roger Sayle <roger@nextmovesoftware.com> wrote: > > > > This patch resolves PR target/106273 which is a wrong code regression > > caused by the recent reorganization to split doubleword operations after > > reload on x86. For the failing test case, the constraints on the > > andnti3_doubleword_bmi pattern allow reload to allocate the output and > > operand in overlapping but non-identical registers, i.e. > > > > (insn 45 44 66 2 (parallel [ > > (set (reg/v:TI 5 di [orig:96 i ] [96]) > > (and:TI (not:TI (reg:TI 39 r11 [orig:83 _2 ] [83])) > > (reg/v:TI 4 si [orig:100 i ] [100]))) > > (clobber (reg:CC 17 flags)) > > ]) "pr106273.c":13:5 562 {*andnti3_doubleword_bmi} > > > > where the output is in registers 5 and 6, and the second operand is > > registers 4 and 5, which then leads to the incorrect split: > > > > (insn 113 44 114 2 (parallel [ > > (set (reg:DI 5 di [orig:96 i ] [96]) > > (and:DI (not:DI (reg:DI 39 r11 [orig:83 _2 ] [83])) > > (reg:DI 4 si [orig:100 i ] [100]))) > > (clobber (reg:CC 17 flags)) > > ]) "pr106273.c":13:5 566 {*andndi_1} > > > > (insn 114 113 66 2 (parallel [ > > (set (reg:DI 6 bp [ i+8 ]) > > (and:DI (not:DI (reg:DI 40 r12 [ _2+8 ])) > > (reg:DI 5 di [ i+8 ]))) > > (clobber (reg:CC 17 flags)) > > ]) "pr106273.c":13:5 566 {*andndi_1} > > > > [Notice that reg:DI 5 is set in the first instruction, but assumed > > to have its original value in the second]. My first thought was > > that this could be fixed by swapping the order of the split instructions > > (which works in this case), but in the general case, it's impossible > > to handle (set (reg:TI x) (op (reg:TI x+1) (reg:TI x-1)). Hence for > > correctness this pattern needs an earlyclobber "=&r", but we can also > > allow cases where the output is the same as one of the operands (using > > constraint "0"). The other binary logic operations (AND, IOR, XOR) > > are unaffected as they constrain the output to match the first > > operand, but BMI's andn is a three-operand instruction which can > > lead to the overlapping cases described above. > > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > > and make -k check, both with and without --target_board=unix{-m32} with > > no new failures. Ok for mainline? > > > > 2022-07-15 Roger Sayle <roger@nextmovesoftware.com> > > > > gcc/ChangeLog > > PR target/106273 > > * config/i386/i386.md (*andn<dwi>3_doubleword_bmi): Update the > > constraints to reflect the output is earlyclobber, unless it is > > the same register (pair) as one of the operands. > > > > gcc/testsuite/ChangeLog > > PR target/106273 > > * gcc.target/i386/pr106273.c: New test case. OK with a small testcase adjustment. Thanks, Uros. +int +main (void) If possible, please name this function "bar", the "main" function assumes a runtime test. +{ + u8 x; + foo (5, &x); + if (x != 5) + __builtin_abort (); + return 0; +} > > > > > > Thanks again, and sorry for the inconvenience. > > Roger > > -- > >
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 3b02d0c..585b2d5 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -10423,10 +10423,10 @@ }) (define_insn_and_split "*andn<dwi>3_doubleword_bmi" - [(set (match_operand:<DWI> 0 "register_operand" "=r") + [(set (match_operand:<DWI> 0 "register_operand" "=&r,r,r") (and:<DWI> - (not:<DWI> (match_operand:<DWI> 1 "register_operand" "r")) - (match_operand:<DWI> 2 "nonimmediate_operand" "ro"))) + (not:<DWI> (match_operand:<DWI> 1 "register_operand" "r,0,r")) + (match_operand:<DWI> 2 "nonimmediate_operand" "ro,ro,0"))) (clobber (reg:CC FLAGS_REG))] "TARGET_BMI" "#" diff --git a/gcc/testsuite/gcc.target/i386/pr106273.c b/gcc/testsuite/gcc.target/i386/pr106273.c new file mode 100644 index 0000000..8c2fbbb --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr106273.c @@ -0,0 +1,27 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-Og -march=cascadelake" } */ +typedef unsigned char u8; +typedef unsigned short u16; +typedef unsigned long long u64; + +u8 g; + +void +foo (__int128 i, u8 *r) +{ + u16 a = __builtin_sub_overflow_p (0, i * g, 0); + i ^= g & i; + u64 s = (i >> 64) + i; + *r = ((union { u16 a; u8 b[2]; }) a).b[1] + s; +} + +int +main (void) +{ + u8 x; + foo (5, &x); + if (x != 5) + __builtin_abort (); + return 0; +} +/* { dg-final { scan-assembler-not "andn\[ \\t\]+%rdi, %r11, %rdi" } } */