Message ID | 02a401d9b433$47f0de80$d7d29b80$@nextmovesoftware.com |
---|---|
State | New |
Series | [x86] Fix FAIL of gcc.target/i386/pr91681-1.c |
On Tue, Jul 11, 2023 at 10:07 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> The recent change in TImode parameter passing on x86_64 results in the
> FAIL of pr91681-1.c. The issue is that with the extra flexibility,
> the combine pass is now spoilt for choice between using either the
> *add<dwi>3_doubleword_concat or the *add<dwi>3_doubleword_zext
> patterns, when one operand is a *concat and the other is a zero_extend.
> The solution proposed below is provide an *add<dwi>3_doubleword_concat_zext
> define_insn_and_split, that can benefit both from the register allocation
> of *concat, and still avoid the xor normally required by zero extension.
>
> I'm investigating a follow-up refinement to improve register allocation
> further by avoiding the early clobber in the =&r, and handling (custom)
> reloads explicitly, but this piece resolves the testcase failure.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures. Ok for mainline?
>
>
> 2023-07-11 Roger Sayle <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR target/91681
>         * config/i386/i386.md (*add<dwi>3_doubleword_concat_zext): New
>         define_insn_and_split derived from *add<dwi>3_doubleword_concat
>         and *add<dwi>3_doubleword_zext.

OK.

Thanks,
Uros.

>
>
> Thanks,
> Roger
> --
>
> The recent change in TImode parameter passing on x86_64 results in the FAIL
> of pr91681-1.c. The issue is that with the extra flexibility, the combine pass is
> now spoilt for choice between using either the
> *add<dwi>3_doubleword_concat or the *add<dwi>3_doubleword_zext
> patterns, when one operand is a *concat and the other is a zero_extend.
> The solution proposed below is provide an
> *add<dwi>3_doubleword_concat_zext define_insn_and_split, that can
> benefit both from the register allocation of *concat, and still avoid the xor
> normally required by zero extension.
>
> I'm investigating a follow-up refinement to improve register allocation
> further by avoiding the early clobber in the =&r, and handling (custom)
> reloads explicitly, but this piece resolves the testcase failure.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and
> make -k check, both with and without --target_board=unix{-m32} with no
> new failures. Ok for mainline?
>
>
> 2023-07-11 Roger Sayle <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR target/91681
>         * config/i386/i386.md (*add<dwi>3_doubleword_concat_zext): New
>         define_insn_and_split derived from *add<dwi>3_doubleword_concat
>         and *add<dwi>3_doubleword_zext.

Hi Roger,

This commit currently changed the codegen of testcase p443644-2.c from:

        movq %rdx, %rax
        xorl %edx, %edx
        addq %rdi, %rax
        adcq %rsi, %rdx

to:

        movq %rdx, %rcx
        movq %rdi, %rax
        movq %rsi, %rdx
        addq %rcx, %rax
        adcq $0, %rdx

which causes the testcase fail under -m64.

Is this within your expectation?

BRs,
Haochen

>
>
> Thanks,
> Roger
> --
> -----Original Message-----
> From: Jiang, Haochen
> Sent: Friday, July 14, 2023 10:50 AM
> To: Roger Sayle <roger@nextmovesoftware.com>; gcc-patches@gcc.gnu.org
> Cc: 'Uros Bizjak' <ubizjak@gmail.com>
> Subject: RE: [x86 PATCH] Fix FAIL of gcc.target/i386/pr91681-1.c
>
> > The recent change in TImode parameter passing on x86_64 results in the
> > FAIL of pr91681-1.c. The issue is that with the extra flexibility,
> > the combine pass is now spoilt for choice between using either the
> > *add<dwi>3_doubleword_concat or the *add<dwi>3_doubleword_zext
> > patterns, when one operand is a *concat and the other is a zero_extend.
> > The solution proposed below is provide an
> > *add<dwi>3_doubleword_concat_zext define_insn_and_split, that can
> > benefit both from the register allocation of *concat, and still avoid
> > the xor normally required by zero extension.
> >
> > I'm investigating a follow-up refinement to improve register
> > allocation further by avoiding the early clobber in the =&r, and
> > handling (custom) reloads explicitly, but this piece resolves the
> > testcase failure.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures. Ok for mainline?
> >
> >
> > 2023-07-11 Roger Sayle <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         PR target/91681
> >         * config/i386/i386.md (*add<dwi>3_doubleword_concat_zext): New
> >         define_insn_and_split derived from *add<dwi>3_doubleword_concat
> >         and *add<dwi>3_doubleword_zext.
>
> Hi Roger,
>
> This commit currently changed the codegen of testcase p443644-2.c from:

Oops, a typo, I mean pr43644-2.c.

Haochen

>
>         movq %rdx, %rax
>         xorl %edx, %edx
>         addq %rdi, %rax
>         adcq %rsi, %rdx
> to:
>
>         movq %rdx, %rcx
>         movq %rdi, %rax
>         movq %rsi, %rdx
>         addq %rcx, %rax
>         adcq $0, %rdx
>
> which causes the testcase fail under -m64.
>
> Is this within your expectation?
>
> BRs,
> Haochen
>
> >
> >
> > Thanks,
> > Roger
> > --
> From: Jiang, Haochen <haochen.jiang@intel.com>
> Sent: 17 July 2023 02:50
>
> > From: Jiang, Haochen
> > Sent: Friday, July 14, 2023 10:50 AM
> >
> > > The recent change in TImode parameter passing on x86_64 results in
> > > the FAIL of pr91681-1.c. The issue is that with the extra
> > > flexibility, the combine pass is now spoilt for choice between using
> > > either the *add<dwi>3_doubleword_concat or the
> > > *add<dwi>3_doubleword_zext patterns, when one operand is a *concat and
> > > the other is a zero_extend.
> > > The solution proposed below is provide an
> > > *add<dwi>3_doubleword_concat_zext define_insn_and_split, that can
> > > benefit both from the register allocation of *concat, and still
> > > avoid the xor normally required by zero extension.
> > >
> > > I'm investigating a follow-up refinement to improve register
> > > allocation further by avoiding the early clobber in the =&r, and
> > > handling (custom) reloads explicitly, but this piece resolves the
> > > testcase failure.
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make
> > > bootstrap and make -k check, both with and without
> > > --target_board=unix{-m32} with no new failures. Ok for mainline?
> > >
> > >
> > > 2023-07-11 Roger Sayle <roger@nextmovesoftware.com>
> > >
> > > gcc/ChangeLog
> > >         PR target/91681
> > >         * config/i386/i386.md (*add<dwi>3_doubleword_concat_zext): New
> > >         define_insn_and_split derived from *add<dwi>3_doubleword_concat
> > >         and *add<dwi>3_doubleword_zext.
> >
> > Hi Roger,
> >
> > This commit currently changed the codegen of testcase p443644-2.c from:
>
> Oops, a typo, I mean pr43644-2.c.
>
> Haochen

I'm working on a fix and hope to have this resolved soon (unfortunately
fixing things in a post-reload splitter isn't working out due to reload's
choices, so the solution will likely be a peephole2).

The problem is that pr91681-1.c and pr43644-2.c can't both PASS (as written)!
The operation x = y + 0, can be generated as either "mov y,x; add $0,x"
or as "xor x,x; add y,x". pr91681-1.c checks there isn't an xor,
pr43644-2.c checks there isn't a mov. Doh! As the author of both
these test cases, I've painted myself into a corner. The solution is that
add $0,x should be generated (optimal) when y is already in x, and
"xor x,x; add y,x" used otherwise (as this is shorter than
"mov y,x; add $0,x", both sequences being approximately equal
performance-wise).

> >         movq %rdx, %rax
> >         xorl %edx, %edx
> >         addq %rdi, %rax
> >         adcq %rsi, %rdx
> > to:
> >         movq %rdx, %rcx
> >         movq %rdi, %rax
> >         movq %rsi, %rdx
> >         addq %rcx, %rax
> >         adcq $0, %rdx
> >
> > which causes the testcase fail under -m64.
> > Is this within your expectation?

You're right that the original (using xor) is better for pr43644-2.c's
test case.

unsigned __int128 foo(unsigned __int128 x, unsigned long long y) {
  return x+y;
}

but the closely related (swapping the argument order):

unsigned __int128 bar(unsigned long long y, unsigned __int128 x) {
  return x+y;
}

is better using "adcq $0", than having a superfluous xor.

Executive summary: This FAIL isn't serious. I'll silence it soon.

> > BRs,
> > Haochen
> >
> > >
> > >
> > > Thanks,
> > > Roger
> > > --
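For readers following the thread, the equivalence of the two high-word sequences discussed above can be sketched in C. This is only an illustrative model, not part of the patch: the function names are hypothetical, the carry produced by the low-word add is passed in as a plain parameter, and the "add" in each sequence is modelled as an add-with-carry, as in the quoted assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "mov y,x; adc $0,x":
   copy the high word into the destination, then add zero plus the carry. */
static uint64_t high_via_mov(uint64_t y, unsigned carry)
{
    uint64_t x = y;          /* mov y,x   */
    x = x + 0 + carry;       /* adc $0,x  */
    return x;
}

/* Hypothetical model of "xor x,x; adc y,x":
   clear the destination, then add the high word plus the carry. */
static uint64_t high_via_xor(uint64_t y, unsigned carry)
{
    uint64_t x = 0;          /* xor x,x   */
    x = x + y + carry;       /* adc y,x   */
    return x;
}
```

Both helpers return the same value for any input, which is why the choice between the sequences comes down to code size and to whether y already lives in the destination register.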
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e47ced1..ca6977f 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6222,6 +6222,39 @@
               (clobber (reg:CC FLAGS_REG))])]
   "split_double_mode (<DWI>mode, &operands[0], 2, &operands[0], &operands[5]);")
 
+(define_insn_and_split "*add<dwi>3_doubleword_concat_zext"
+  [(set (match_operand:<DWI> 0 "register_operand" "=&r")
+        (plus:<DWI>
+          (any_or_plus:<DWI>
+            (ashift:<DWI>
+              (zero_extend:<DWI>
+                (match_operand:DWIH 2 "nonimmediate_operand" "rm"))
+              (match_operand:QI 3 "const_int_operand"))
+            (zero_extend:<DWI>
+              (match_operand:DWIH 4 "nonimmediate_operand" "rm")))
+          (zero_extend:<DWI>
+            (match_operand:DWIH 1 "nonimmediate_operand" "rm")))
+   (clobber (reg:CC FLAGS_REG))]
+  "INTVAL (operands[3]) == <MODE_SIZE> * BITS_PER_UNIT"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (match_dup 4))
+   (set (match_dup 5) (match_dup 2))
+   (parallel [(set (reg:CCC FLAGS_REG)
+                   (compare:CCC
+                     (plus:DWIH (match_dup 0) (match_dup 1))
+                     (match_dup 0)))
+              (set (match_dup 0)
+                   (plus:DWIH (match_dup 0) (match_dup 1)))])
+   (parallel [(set (match_dup 5)
+                   (plus:DWIH
+                     (plus:DWIH
+                       (ltu:DWIH (reg:CC FLAGS_REG) (const_int 0))
+                       (match_dup 5))
+                     (const_int 0)))
+              (clobber (reg:CC FLAGS_REG))])]
+  "split_double_mode (<DWI>mode, &operands[0], 1, &operands[0], &operands[5]);")
+
 (define_insn "*add<mode>_1"
   [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,r")
         (plus:SWI48
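As a rough reading aid, the four instructions emitted by the post-reload split in this pattern can be mirrored in C for the 64-bit case (DWIH = DImode). This is a sketch under stated assumptions rather than anything taken from the patch: `dword`, `add_concat_zext` and `add_reference` are hypothetical names, and the reference version relies on the `unsigned __int128` extension available in GCC and Clang.

```c
#include <assert.h>
#include <stdint.h>

/* A double-word value held as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } dword;

/* Hypothetical helper mirroring the split's four steps:
   1. low half of operand 0  = operand 4
   2. high half of operand 0 = operand 2
   3. low half += operand 1, recording the carry (the CCC compare)
   4. high half = carry + high half + 0 */
static dword add_concat_zext(uint64_t op2_hi, uint64_t op4_lo, uint64_t op1)
{
    dword r;
    r.lo = op4_lo;
    r.hi = op2_hi;
    uint64_t sum = r.lo + op1;
    unsigned carry = sum < r.lo;   /* carry out: (lo + op1) <u lo */
    r.lo = sum;
    r.hi = carry + r.hi + 0;
    return r;
}

/* Reference result: build the concatenated value, then add the
   zero-extended operand using the compiler's 128-bit type. */
static dword add_reference(uint64_t hi, uint64_t lo, uint64_t y)
{
    unsigned __int128 x = ((unsigned __int128)hi << 64) | lo;
    x += y;
    dword r = { (uint64_t)x, (uint64_t)(x >> 64) };
    return r;
}
```

Because the shifted high word and the zero-extended low word occupy disjoint bits, combining them with either ior or plus (the `any_or_plus` iterator) gives the same concatenated value, which is what the reference helper assumes.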