Message ID | 009901d9801a$57573ba0$0605b2e0$@nextmovesoftware.com |
---|---|
State | New |
Series | Don't call emit_clobber in lower-subreg.cc's resolve_simple_move. |
On 5/6/23 06:57, Roger Sayle wrote:
>
> Following up on posts/reviews by Segher and Uros, there's some question
> over why the middle-end's lower subreg pass emits a clobber (of a
> multi-word register) into the instruction stream before emitting the
> sequence of moves of the word-sized parts. This clobber interferes
> with (LRA) register allocation, preventing the multi-word pseudo from
> remaining in the same hard registers. This patch eliminates this
> (presumably superfluous) clobber and thereby improves register allocation.

Those clobbers used to help dataflow analysis know that a multi-word
register was fully assigned by a subsequent sequence. I suspect they
haven't been terribly useful in quite a while.

>
> A concrete example of the observed improvement is PR target/43644.
> For the test case:
> __int128 foo(__int128 x, __int128 y) { return x+y; }
>
> on x86_64-pc-linux-gnu, gcc -O2 currently generates:
>
> foo:    movq    %rsi, %rax
>         movq    %rdi, %r8
>         movq    %rax, %rdi
>         movq    %rdx, %rax
>         movq    %rcx, %rdx
>         addq    %r8, %rax
>         adcq    %rdi, %rdx
>         ret
>
> with this patch, we now generate the much improved:
>
> foo:    movq    %rdx, %rax
>         movq    %rcx, %rdx
>         addq    %rdi, %rax
>         adcq    %rsi, %rdx
>         ret
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32} with
> no new failures. OK for mainline?
>
>
> 2023-05-06  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR target/43644
>         * lower-subreg.cc (resolve_simple_move): Don't emit a clobber
>         immediately before moving a multi-word register by parts.
>
> gcc/testsuite/ChangeLog
>         PR target/43644
>         * gcc.target/i386/pr43644.c: New test case.

OK for the trunk. I won't be at all surprised to see fallout in the
various target tests. We can fault in fixes as needed. More
importantly I think we want as much soak time for this change as we can
in case there are unexpected consequences.

jeff
On Sat, May 6, 2023 at 8:46 PM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 5/6/23 06:57, Roger Sayle wrote:
> >
> > Following up on posts/reviews by Segher and Uros, there's some question
> > over why the middle-end's lower subreg pass emits a clobber (of a
> > multi-word register) into the instruction stream before emitting the
> > sequence of moves of the word-sized parts. This clobber interferes
> > with (LRA) register allocation, preventing the multi-word pseudo from
> > remaining in the same hard registers. This patch eliminates this
> > (presumably superfluous) clobber and thereby improves register allocation.
> Those clobbers used to help dataflow analysis know that a multi-word
> register was fully assigned by a subsequent sequence. I suspect they
> haven't been terribly useful in quite a while.

Likely - maybe they still make a difference for some targets though.
It might be interesting to see whether combining the clobber with the
first set or making the set a multi-set with a parallel would be any
better?

> >
> > A concrete example of the observed improvement is PR target/43644.
> > For the test case:
> > __int128 foo(__int128 x, __int128 y) { return x+y; }
> >
> > on x86_64-pc-linux-gnu, gcc -O2 currently generates:
> >
> > foo:    movq    %rsi, %rax
> >         movq    %rdi, %r8
> >         movq    %rax, %rdi
> >         movq    %rdx, %rax
> >         movq    %rcx, %rdx
> >         addq    %r8, %rax
> >         adcq    %rdi, %rdx
> >         ret
> >
> > with this patch, we now generate the much improved:
> >
> > foo:    movq    %rdx, %rax
> >         movq    %rcx, %rdx
> >         addq    %rdi, %rax
> >         adcq    %rsi, %rdx
> >         ret
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32} with
> > no new failures. OK for mainline?
> >
> >
> > 2023-05-06  Roger Sayle  <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         PR target/43644
> >         * lower-subreg.cc (resolve_simple_move): Don't emit a clobber
> >         immediately before moving a multi-word register by parts.
> >
> > gcc/testsuite/ChangeLog
> >         PR target/43644
> >         * gcc.target/i386/pr43644.c: New test case.
> OK for the trunk. I won't be at all surprised to see fallout in the
> various target tests. We can fault in fixes as needed. More
> importantly I think we want as much soak time for this change as we can
> in case there are unexpected consequences.
>
> jeff
On 5/8/23 00:43, Richard Biener wrote:
> On Sat, May 6, 2023 at 8:46 PM Jeff Law via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>>
>>
>> On 5/6/23 06:57, Roger Sayle wrote:
>>>
>>> Following up on posts/reviews by Segher and Uros, there's some question
>>> over why the middle-end's lower subreg pass emits a clobber (of a
>>> multi-word register) into the instruction stream before emitting the
>>> sequence of moves of the word-sized parts. This clobber interferes
>>> with (LRA) register allocation, preventing the multi-word pseudo from
>>> remaining in the same hard registers. This patch eliminates this
>>> (presumably superfluous) clobber and thereby improves register allocation.
>> Those clobbers used to help dataflow analysis know that a multi-word
>> register was fully assigned by a subsequent sequence. I suspect they
>> haven't been terribly useful in quite a while.
>
> Likely - maybe they still make a difference for some targets though.
> It might be interesting to see whether combining the clobber with the
> first set or making the set a multi-set with a parallel would be any
> better?

Wrapping them inside a PARALLEL might be better, but probably isn't
worth the effort. I think all this stuff dates back to the era where we
had flow.c to provide the register lifetimes used by local-alloc. We
also had things like REG_NO_CONFLICT to indicate that the sub-object
assignments didn't conflict. In all it was rather hackish.

Jeff
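The historical role Jeff describes for the clobber can be pictured with a toy backward liveness scan. This is only a sketch under stated assumptions, not GCC's dataflow code: the `toy_*` names are hypothetical, and the model assumes that a word-sized write leaves the liveness of the whole multi-word register unchanged (the other words might still be needed), while a clobber counts as a full definition.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction kinds for one multi-word pseudo register.
   These are illustrative only and are not GCC's RTL codes. */
enum toy_kind {
    TOY_CLOBBER,   /* whole register declared overwritten */
    TOY_SET_WORD,  /* one word-sized part is written */
    TOY_USE        /* whole register is read */
};

struct toy_insn {
    enum toy_kind kind;
};

/* Scan the sequence backwards and report whether the register's old
   value is considered live on entry, given its liveness on exit. */
bool toy_live_on_entry(const struct toy_insn *insns, size_t n, bool live_out)
{
    bool live = live_out;
    for (size_t i = n; i-- > 0; ) {
        switch (insns[i].kind) {
        case TOY_CLOBBER:
            live = false;   /* full definition: old value is dead above */
            break;
        case TOY_SET_WORD:
            /* Partial definition: assumed not to kill the register,
               so liveness is left unchanged. */
            break;
        case TOY_USE:
            live = true;    /* a read makes the register live above */
            break;
        }
    }
    return live;
}
```

In this model, two word-sized sets on their own leave the register looking live above the sequence, whereas the same sets preceded by a clobber do not. Whether any current target still depends on that distinction is the open question raised in the thread.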
diff --git a/gcc/lower-subreg.cc b/gcc/lower-subreg.cc
index 81fc5380..7c9cc3c 100644
--- a/gcc/lower-subreg.cc
+++ b/gcc/lower-subreg.cc
@@ -1086,9 +1086,6 @@ resolve_simple_move (rtx set, rtx_insn *insn)
     {
       unsigned int i;
 
-      if (REG_P (dest) && !HARD_REGISTER_NUM_P (REGNO (dest)))
-        emit_clobber (dest);
-
       for (i = 0; i < words; ++i)
         {
           rtx t = simplify_gen_subreg_concatn (word_mode, dest,
diff --git a/gcc/testsuite/gcc.target/i386/pr43644.c b/gcc/testsuite/gcc.target/i386/pr43644.c
new file mode 100644
index 0000000..ffdf31c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr43644.c
@@ -0,0 +1,11 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2" } */
+
+__int128 foo(__int128 x, __int128 y)
+{
+  return x+y;
+}
+
+/* { dg-final { scan-assembler-times "movq" 2 } } */
+/* { dg-final { scan-assembler-not "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */
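For readers less familiar with why the expected output ends in an addq/adcq pair, the test case's `x+y` on `__int128` amounts to adding the low words and then adding the high words plus the carry. The following is a minimal sketch of that by-parts arithmetic in portable C; `struct u128_parts` and `add_by_parts` are hypothetical names introduced here for illustration and do not appear in the patch.

```c
#include <stdint.h>

/* Hypothetical two-word view of a 128-bit value (illustration only). */
struct u128_parts {
    uint64_t lo;  /* least significant 64 bits */
    uint64_t hi;  /* most significant 64 bits */
};

/* Add two 128-bit values one 64-bit word at a time. */
struct u128_parts add_by_parts(struct u128_parts x, struct u128_parts y)
{
    struct u128_parts r;
    r.lo = x.lo + y.lo;            /* low word, wraps modulo 2^64 (cf. addq) */
    uint64_t carry = r.lo < x.lo;  /* 1 if the low-word add wrapped */
    r.hi = x.hi + y.hi + carry;    /* high word plus carry (cf. adcq) */
    return r;
}
```

With both operands arriving in register pairs and the result leaving in another pair, only the two register copies seen in the improved assembly are needed around this computation, which is what the `scan-assembler-times "movq" 2` check in the new test expects.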