[x86] Fix FAIL of gcc.target/i386/pr91681-1.c

Message ID 02a401d9b433$47f0de80$d7d29b80$@nextmovesoftware.com
State New

Commit Message

Roger Sayle July 11, 2023, 8:07 p.m. UTC
The recent change in TImode parameter passing on x86_64 results in the
FAIL of pr91681-1.c.  The issue is that with the extra flexibility,
the combine pass is now spoilt for choice between using either the
*add<dwi>3_doubleword_concat or the *add<dwi>3_doubleword_zext
patterns, when one operand is a *concat and the other is a zero_extend.
The solution proposed below is to provide an
*add<dwi>3_doubleword_concat_zext define_insn_and_split that can benefit
from the register allocation of *concat while still avoiding the xor
normally required by zero extension.
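
For illustration only, a source expression with this shape (one operand
built by concatenating two words, the other a zero-extended word) might
look like the C sketch below.  The function name concat_plus_zext and its
parameters are hypothetical and are not taken from the patch or the
testsuite; unsigned __int128 is a GCC extension, and whether combine
forms exactly this pattern depends on the surrounding code.

```c
#include <stdint.h>

typedef unsigned __int128 u128;

/* Hypothetical example, not taken from the GCC testsuite.  */
u128 concat_plus_zext(uint64_t hi, uint64_t lo, uint64_t y)
{
    /* (hi << 64) | lo: the "concat" operand built from two words.  */
    u128 concat = ((u128)hi << 64) | (u128)lo;

    /* Adding a zero-extended 64-bit value: the "zext" operand.  */
    return concat + (u128)y;
}
```

Here the high word of the result is just hi plus the carry out of lo + y,
which is why no explicit zero for the upper half of y should be needed.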

I'm investigating a follow-up refinement to improve register allocation
further by avoiding the early clobber in the =&r, and handling (custom)
reloads explicitly, but this piece resolves the testcase failure.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-07-11  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        PR target/91681
        * config/i386/i386.md (*add<dwi>3_doubleword_concat_zext): New
        define_insn_and_split derived from *add<dwi>3_doubleword_concat
        and *add<dwi>3_doubleword_zext.


Thanks,
Roger
--

Comments

Uros Bizjak July 12, 2023, 9:37 a.m. UTC | #1
On Tue, Jul 11, 2023 at 10:07 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> The recent change in TImode parameter passing on x86_64 results in the
> FAIL of pr91681-1.c.  The issue is that with the extra flexibility,
> the combine pass is now spoilt for choice between using either the
> *add<dwi>3_doubleword_concat or the *add<dwi>3_doubleword_zext
> patterns, when one operand is a *concat and the other is a zero_extend.
> The solution proposed below is provide an *add<dwi>3_doubleword_concat_zext
> define_insn_and_split, that can benefit both from the register allocation
> of *concat, and still avoid the xor normally required by zero extension.
>
> I'm investigating a follow-up refinement to improve register allocation
> further by avoiding the early clobber in the =&r, and handling (custom)
> reloads explicitly, but this piece resolves the testcase failure.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-07-11  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR target/91681
>         * config/i386/i386.md (*add<dwi>3_doubleword_concat_zext): New
>         define_insn_and_split derived from *add<dwi>3_doubleword_concat
>         and *add<dwi>3_doubleword_zext.

OK.

Thanks,
Uros.

>
>
> Thanks,
> Roger
> --
>
Li, Pan2 via Gcc-patches July 14, 2023, 2:50 a.m. UTC | #2
> The recent change in TImode parameter passing on x86_64 results in the FAIL
> of pr91681-1.c.  The issue is that with the extra flexibility, the combine pass is
> now spoilt for choice between using either the
> *add<dwi>3_doubleword_concat or the *add<dwi>3_doubleword_zext
> patterns, when one operand is a *concat and the other is a zero_extend.
> The solution proposed below is provide an
> *add<dwi>3_doubleword_concat_zext define_insn_and_split, that can
> benefit both from the register allocation of *concat, and still avoid the xor
> normally required by zero extension.
> 
> I'm investigating a follow-up refinement to improve register allocation
> further by avoiding the early clobber in the =&r, and handling (custom)
> reloads explicitly, but this piece resolves the testcase failure.
> 
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and
> make -k check, both with and without --target_board=unix{-m32} with no
> new failures.  Ok for mainline?
> 
> 
> 2023-07-11  Roger Sayle  <roger@nextmovesoftware.com>
> 
> gcc/ChangeLog
>         PR target/91681
>         * config/i386/i386.md (*add<dwi>3_doubleword_concat_zext): New
>         define_insn_and_split derived from *add<dwi>3_doubleword_concat
>         and *add<dwi>3_doubleword_zext.

Hi Roger,

This commit currently changed the codegen of testcase p443644-2.c from:

        movq    %rdx, %rax
        xorl    %edx, %edx
        addq    %rdi, %rax
        adcq    %rsi, %rdx
to:

        movq    %rdx, %rcx
        movq    %rdi, %rax
        movq    %rsi, %rdx
        addq    %rcx, %rax
        adcq    $0, %rdx

which causes the testcase to fail under -m64.

Is this expected?

BRs,
Haochen

> 
> 
> Thanks,
> Roger
> --
Li, Pan2 via Gcc-patches July 17, 2023, 1:49 a.m. UTC | #3
> -----Original Message-----
> From: Jiang, Haochen
> Sent: Friday, July 14, 2023 10:50 AM
> To: Roger Sayle <roger@nextmovesoftware.com>; gcc-patches@gcc.gnu.org
> Cc: 'Uros Bizjak' <ubizjak@gmail.com>
> Subject: RE: [x86 PATCH] Fix FAIL of gcc.target/i386/pr91681-1.c
> 
> > The recent change in TImode parameter passing on x86_64 results in the
> > FAIL of pr91681-1.c.  The issue is that with the extra flexibility,
> > the combine pass is now spoilt for choice between using either the
> > *add<dwi>3_doubleword_concat or the *add<dwi>3_doubleword_zext
> > patterns, when one operand is a *concat and the other is a zero_extend.
> > The solution proposed below is provide an
> > *add<dwi>3_doubleword_concat_zext define_insn_and_split, that can
> > benefit both from the register allocation of *concat, and still avoid
> > the xor normally required by zero extension.
> >
> > I'm investigating a follow-up refinement to improve register
> > allocation further by avoiding the early clobber in the =&r, and
> > handling (custom) reloads explicitly, but this piece resolves the
> > testcase failure.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> >
> > 2023-07-11  Roger Sayle  <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         PR target/91681
> >         * config/i386/i386.md (*add<dwi>3_doubleword_concat_zext): New
> >         define_insn_and_split derived from *add<dwi>3_doubleword_concat
> >         and *add<dwi>3_doubleword_zext.
> 
> Hi Roger,
> 
> This commit currently changed the codegen of testcase p443644-2.c from:

Oops, a typo, I mean pr43644-2.c.

Haochen

> 
>         movq    %rdx, %rax
>         xorl    %edx, %edx
>         addq    %rdi, %rax
>         adcq    %rsi, %rdx
> to:
> 
>         movq    %rdx, %rcx
>         movq    %rdi, %rax
>         movq    %rsi, %rdx
>         addq    %rcx, %rax
>         adcq    $0, %rdx
> 
> which causes the testcase fail under -m64.
> 
> Is this within your expectation?
> 
> BRs,
> Haochen
> 
> >
> >
> > Thanks,
> > Roger
> > --
Roger Sayle July 17, 2023, 7:54 a.m. UTC | #4
> From: Jiang, Haochen <haochen.jiang@intel.com>
> Sent: 17 July 2023 02:50
> 
> > From: Jiang, Haochen
> > Sent: Friday, July 14, 2023 10:50 AM
> >
> > > The recent change in TImode parameter passing on x86_64 results in
> > > the FAIL of pr91681-1.c.  The issue is that with the extra
> > > flexibility, the combine pass is now spoilt for choice between using
> > > either the *add<dwi>3_doubleword_concat or the
> > > *add<dwi>3_doubleword_zext patterns, when one operand is a *concat
> > > and the other is a zero_extend.
> > > The solution proposed below is provide an
> > > *add<dwi>3_doubleword_concat_zext define_insn_and_split, that can
> > > benefit both from the register allocation of *concat, and still
> > > avoid the xor normally required by zero extension.
> > >
> > > I'm investigating a follow-up refinement to improve register
> > > allocation further by avoiding the early clobber in the =&r, and
> > > handling (custom) reloads explicitly, but this piece resolves the
> > > testcase failure.
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make
> > > bootstrap and make -k check, both with and without
> > > --target_board=unix{-m32} with no new failures.  Ok for mainline?
> > >
> > >
> > > 2023-07-11  Roger Sayle  <roger@nextmovesoftware.com>
> > >
> > > gcc/ChangeLog
> > >         PR target/91681
> > >         * config/i386/i386.md (*add<dwi>3_doubleword_concat_zext): New
> > >         define_insn_and_split derived from *add<dwi>3_doubleword_concat
> > >         and *add<dwi>3_doubleword_zext.
> >
> > Hi Roger,
> >
> > This commit currently changed the codegen of testcase p443644-2.c from:
> 
> Oops, a typo, I mean pr43644-2.c.
> 
> Haochen

I'm working on a fix and hope to have this resolved soon (unfortunately
fixing things in a post-reload splitter isn't working out due to reload's
choices, so the solution will likely be a peephole2).

The problem is that pr91681-1.c and pr43644-2.c can't both PASS (as
written)!  The operation x = y + 0 can be generated as either
"mov y,x; add $0,x" or as "xor x,x; add y,x".  pr91681-1.c checks there
isn't an xor, pr43644-2.c checks there isn't a mov.  Doh!  As the author
of both these test cases, I've painted myself into a corner.

The solution is that add $0,x should be generated (optimal) when y is
already in x, and "xor x,x; add y,x" used otherwise (as this is shorter
than "mov y,x; add $0,x", both sequences being approximately equal
performance-wise).
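
As a minimal sketch of why the two sequences are interchangeable, the
high word of the sum can be modelled in C as below.  The helper names
high_via_mov and high_via_xor are hypothetical, and the carry from the
low-word addition is passed in explicitly, since the high word is
computed with adc in the assembly quoted in this thread.

```c
#include <stdint.h>

/* Models "mov y,x; adc $0,x": copy the high word, then add only the
   carry from the low-word addition.  */
static uint64_t high_via_mov(uint64_t y, unsigned carry)
{
    uint64_t x = y;        /* mov y, x   */
    x = x + 0 + carry;     /* adc $0, x  */
    return x;
}

/* Models "xor x,x; adc y,x": zero the destination, then add the high
   word together with the carry.  */
static uint64_t high_via_xor(uint64_t y, unsigned carry)
{
    uint64_t x = 0;        /* xor x, x   */
    x = x + y + carry;     /* adc y, x   */
    return x;
}
```

Both helpers return the same value for every input; the difference is
only in which instructions are emitted, which is what the two test
cases scan for.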

> >         movq    %rdx, %rax
> >         xorl    %edx, %edx
> >         addq    %rdi, %rax
> >         adcq    %rsi, %rdx
> > to:
> >         movq    %rdx, %rcx
> >         movq    %rdi, %rax
> >         movq    %rsi, %rdx
> >         addq    %rcx, %rax
> >         adcq    $0, %rdx
> >
> > which causes the testcase fail under -m64.
> > Is this within your expectation?

You're right that the original (using xor) is better for pr43644-2.c's
test case:
unsigned __int128 foo(unsigned __int128 x, unsigned long long y) { return x+y; }
but the closely related (swapping the argument order):
unsigned __int128 bar(unsigned long long y, unsigned __int128 x) { return x+y; }
is better using "adcq $0", than having a superfluous xor.
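
The two functions compute the same value; only the incoming registers
differ.  A sketch that puts them side by side with a word-by-word
reference follows.  The helper add_by_halves and the u128 typedef are
hypothetical additions for illustration.  Assuming the System V x86-64
calling convention, bar receives the high half of x in %rdx, which is
also where the high half of the result is returned, so adding only the
carry in place would suffice there.

```c
#include <stdint.h>

typedef unsigned __int128 u128;

/* The two argument orders discussed above.  */
u128 foo(u128 x, unsigned long long y) { return x + y; }
u128 bar(unsigned long long y, u128 x) { return x + y; }

/* Hypothetical reference: the same addition done one 64-bit word at a
   time, mirroring an add followed by an add-with-carry of zero.  */
u128 add_by_halves(u128 x, unsigned long long y)
{
    uint64_t lo = (uint64_t)x;
    uint64_t hi = (uint64_t)(x >> 64);

    uint64_t sum_lo = lo + y;             /* addq: low words           */
    unsigned carry = sum_lo < lo;         /* carry out of the low add  */
    uint64_t sum_hi = hi + 0 + carry;     /* adcq $0: high word        */

    return ((u128)sum_hi << 64) | sum_lo;
}
```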

Executive summary: This FAIL isn't serious.  I'll silence it soon.

> > BRs,
> > Haochen
> >
> > >
> > >
> > > Thanks,
> > > Roger
> > > --
Patch

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e47ced1..ca6977f 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6222,6 +6222,39 @@ 
 	      (clobber (reg:CC FLAGS_REG))])]
  "split_double_mode (<DWI>mode, &operands[0], 2, &operands[0], &operands[5]);")
 
+(define_insn_and_split "*add<dwi>3_doubleword_concat_zext"
+  [(set (match_operand:<DWI> 0 "register_operand" "=&r")
+	(plus:<DWI>
+	  (any_or_plus:<DWI>
+	    (ashift:<DWI>
+	      (zero_extend:<DWI>
+		(match_operand:DWIH 2 "nonimmediate_operand" "rm"))
+	      (match_operand:QI 3 "const_int_operand"))
+	    (zero_extend:<DWI>
+	      (match_operand:DWIH 4 "nonimmediate_operand" "rm")))
+	  (zero_extend:<DWI>
+	    (match_operand:DWIH 1 "nonimmediate_operand" "rm")))
+   (clobber (reg:CC FLAGS_REG))]
+  "INTVAL (operands[3]) == <MODE_SIZE> * BITS_PER_UNIT"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (match_dup 4))
+   (set (match_dup 5) (match_dup 2))
+   (parallel [(set (reg:CCC FLAGS_REG)
+		   (compare:CCC
+		     (plus:DWIH (match_dup 0) (match_dup 1))
+		     (match_dup 0)))
+	      (set (match_dup 0)
+		   (plus:DWIH (match_dup 0) (match_dup 1)))])
+   (parallel [(set (match_dup 5)
+		   (plus:DWIH
+		     (plus:DWIH
+		       (ltu:DWIH (reg:CC FLAGS_REG) (const_int 0))
+		       (match_dup 5))
+		     (const_int 0)))
+	      (clobber (reg:CC FLAGS_REG))])]
+ "split_double_mode (<DWI>mode, &operands[0], 1, &operands[0], &operands[5]);")
+
 (define_insn "*add<mode>_1"
   [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,r")
 	(plus:SWI48