| Message ID | mpt8re2soib.fsf@arm.com |
| --- | --- |
| State | New |
| Series | ira: Don't create copies for earlyclobbered pairs |
On 5/5/23 12:59, Richard Sandiford wrote:
> This patch follows on from g:9f635bd13fe9e85872e441b6f3618947f989909a
> ("the previous patch").  To start by quoting that:
>
> If an insn requires two operands to be tied, and the input operand dies
> in the insn, IRA acts as though there were a copy from the input to the
> output with the same execution frequency as the insn.  Allocating the
> same register to the input and the output then saves the cost of a move.
>
> If there is no such tie, but an input operand nevertheless dies
> in the insn, IRA creates a similar move, but with an eighth of the
> frequency.  This helps to ensure that chains of instructions reuse
> registers in a natural way, rather than using arbitrarily different
> registers for no reason.
>
> This heuristic seems to work well in the vast majority of cases.
> However, the problem fixed in the previous patch was that we
> could create a copy for an operand pair even if, for all relevant
> alternatives, the output and input register classes did not have
> any registers in common.  It is then impossible for the output
> operand to reuse the dying input register.
>
> This left unfixed a further case where copies don't make sense:
> there is no point trying to reuse the dying input register if,
> for all relevant alternatives, the output is earlyclobbered and
> the input doesn't match the output.  (Matched earlyclobbers are fine.)
>
> Handling that case fixes several existing XFAILs and helps with
> a follow-on aarch64 patch.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  A SPEC2017 run
> on aarch64 showed no differences outside the noise.  Also, I tried
> compiling gcc.c-torture, gcc.dg, and g++.dg for at least one target
> per cpu directory, using the options -Os -fno-schedule-insns{,2}.
> The results below summarise the tests that showed a difference in LOC:
>
> Target               Tests  Good  Bad  Delta   Best  Worst  Median
> ======               =====  ====  ===  =====   ====  =====  ======
> amdgcn-amdhsa           14     7    7      3    -18     10      -1
> arm-linux-gnueabihf     16    15    1    -22     -4      2      -1
> csky-elf                 6     6    0    -21     -6     -2      -4
> hppa64-hp-hpux11.23      5     5    0     -7     -2     -1      -1
> ia64-linux-gnu          16    16    0    -70    -15     -1      -3
> m32r-elf                53     1   52     64     -2      8       1
> mcore-elf                2     2    0     -8     -6     -2      -6
> microblaze-elf         285   283    2   -909    -68      4      -1
> mmix                     7     7    0  -2101  -2091     -1      -1
> msp430-elf               1     1    0     -4     -4     -4      -4
> pru-elf                  8     6    2    -12     -6      2      -2
> rx-elf                  22    18    4    -40     -5      6      -2
> sparc-linux-gnu         15    14    1    -40     -8      1      -2
> sparc-wrs-vxworks       15    14    1    -40     -8      1      -2
> visium-elf               2     1    1      0     -2      2      -2
> xstormy16-elf            1     1    0     -2     -2     -2      -2
>
> with other targets showing no sensitivity to the patch.  The only
> target that seems to be negatively affected is m32r-elf; otherwise
> the patch seems like an extremely minor but still clear improvement.
>
> OK to install?

Yes, Richard.  Thank you for measuring the patch effect.  I wish other
people would do the same for patches affecting generated code
performance.
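To make the quoted heuristic concrete, here is a minimal, compilable C sketch of the decision the patch changes. Everything in it is invented for illustration (struct alt_info, classes_intersect, the integer class encoding); GCC's real logic is can_use_same_reg_p in gcc/ira-conflicts.cc, which works on operand_alternative records and the ira_reg_class_intersect table.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-alternative view of one operand; a toy stand-in
   for GCC's operand_alternative.  */
struct alt_info
{
  int matches;        /* operand this one must match, or -1 */
  bool earlyclobber;  /* output is written before inputs are read */
  int reg_class;      /* toy stand-in for an enum reg_class */
};

/* Toy stand-in for ira_reg_class_intersect[cl1][cl2] != NO_REGS:
   in this sketch, classes intersect only if they are equal.  */
static bool
classes_intersect (int cl1, int cl2)
{
  return cl1 == cl2;
}

/* Sketch of the patched test: the output operand can only reuse the
   dying input's register if some alternative ties the input to the
   output (matched earlyclobbers are fine), or has intersecting
   classes without an unmatched earlyclobber on the output.  */
static bool
can_use_same_reg_p (const struct alt_info *outputs,
		    const struct alt_info *inputs,
		    size_t n_alts, int output_opno)
{
  for (size_t alt = 0; alt < n_alts; ++alt)
    {
      if (inputs[alt].matches == output_opno)
	return true;

      /* The early exit that the patch adds.  */
      if (outputs[alt].earlyclobber)
	continue;

      if (classes_intersect (inputs[alt].reg_class,
			     outputs[alt].reg_class))
	return true;
    }
  return false;
}

int
main (void)
{
  /* One alternative: an "=&r"-style earlyclobbered output and an
     unmatched input in the same class.  Reuse is impossible, so the
     patched test correctly answers no (exit status 0).  */
  struct alt_info outs[] = { { -1, true, 0 } };
  struct alt_info ins[] = { { -1, false, 0 } };
  return can_use_same_reg_p (outs, ins, 1, 0) ? 1 : 0;
}
```

Note that the ordering matters: the matches check runs before the earlyclobber exit, which is why matched earlyclobbers still allow reuse.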
diff --git a/gcc/ira-conflicts.cc b/gcc/ira-conflicts.cc
index 5aa080af421..a4d93c8d734 100644
--- a/gcc/ira-conflicts.cc
+++ b/gcc/ira-conflicts.cc
@@ -398,6 +398,9 @@ can_use_same_reg_p (rtx_insn *insn, int output, int input)
       if (op_alt[input].matches == output)
 	return true;
 
+      if (op_alt[output].earlyclobber)
+	continue;
+
       if (ira_reg_class_intersect[op_alt[input].cl][op_alt[output].cl]
 	  != NO_REGS)
 	return true;
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s16.c
index b74ae33e100..e40865fcbc4 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s16.c
@@ -153,7 +153,7 @@ TEST_UNIFORM_ZX (asr_wide_x0_s16_z_tied1, svint16_t, uint64_t,
 		 z0 = svasr_wide_z (p0, z0, x0))
 
 /*
-** asr_wide_x0_s16_z_untied: { xfail *-*-* }
+** asr_wide_x0_s16_z_untied:
 ** mov (z[0-9]+\.d), x0
 ** movprfx z0\.h, p0/z, z1\.h
 ** asr z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s32.c
index 8698aef26c6..06e4ca2a030 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s32.c
@@ -153,7 +153,7 @@ TEST_UNIFORM_ZX (asr_wide_x0_s32_z_tied1, svint32_t, uint64_t,
 		 z0 = svasr_wide_z (p0, z0, x0))
 
 /*
-** asr_wide_x0_s32_z_untied: { xfail *-*-* }
+** asr_wide_x0_s32_z_untied:
 ** mov (z[0-9]+\.d), x0
 ** movprfx z0\.s, p0/z, z1\.s
 ** asr z0\.s, p0/m, z0\.s, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s8.c
index 77b1669392d..1f840ca8e57 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s8.c
@@ -153,7 +153,7 @@ TEST_UNIFORM_ZX (asr_wide_x0_s8_z_tied1, svint8_t, uint64_t,
 		 z0 = svasr_wide_z (p0, z0, x0))
 
 /*
-** asr_wide_x0_s8_z_untied: { xfail *-*-* }
+** asr_wide_x0_s8_z_untied:
 ** mov (z[0-9]+\.d), x0
 ** movprfx z0\.b, p0/z, z1\.b
 ** asr z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s32.c
index 9e388e499b8..e02c66947d6 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s32.c
@@ -127,7 +127,7 @@ TEST_UNIFORM_ZX (bic_w0_s32_z_tied1, svint32_t, int32_t,
 		 z0 = svbic_z (p0, z0, x0))
 
 /*
-** bic_w0_s32_z_untied: { xfail *-*-* }
+** bic_w0_s32_z_untied:
 ** mov (z[0-9]+\.s), w0
 ** movprfx z0\.s, p0/z, z1\.s
 ** bic z0\.s, p0/m, z0\.s, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s64.c
index bf953681547..57c1e535fea 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s64.c
@@ -127,7 +127,7 @@ TEST_UNIFORM_ZX (bic_x0_s64_z_tied1, svint64_t, int64_t,
 		 z0 = svbic_z (p0, z0, x0))
 
 /*
-** bic_x0_s64_z_untied: { xfail *-*-* }
+** bic_x0_s64_z_untied:
 ** mov (z[0-9]+\.d), x0
 ** movprfx z0\.d, p0/z, z1\.d
 ** bic z0\.d, p0/m, z0\.d, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u32.c
index b308b599b43..9f08ab40a8c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u32.c
@@ -127,7 +127,7 @@ TEST_UNIFORM_ZX (bic_w0_u32_z_tied1, svuint32_t, uint32_t,
 		 z0 = svbic_z (p0, z0, x0))
 
 /*
-** bic_w0_u32_z_untied: { xfail *-*-* }
+** bic_w0_u32_z_untied:
 ** mov (z[0-9]+\.s), w0
 ** movprfx z0\.s, p0/z, z1\.s
 ** bic z0\.s, p0/m, z0\.s, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u64.c
index e82db1e94fd..de84f3af6ff 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u64.c
@@ -127,7 +127,7 @@ TEST_UNIFORM_ZX (bic_x0_u64_z_tied1, svuint64_t, uint64_t,
 		 z0 = svbic_z (p0, z0, x0))
 
 /*
-** bic_x0_u64_z_untied: { xfail *-*-* }
+** bic_x0_u64_z_untied:
 ** mov (z[0-9]+\.d), x0
 ** movprfx z0\.d, p0/z, z1\.d
 ** bic z0\.d, p0/m, z0\.d, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c
index 8d63d390984..a0207726144 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c
@@ -155,7 +155,7 @@ TEST_UNIFORM_ZX (lsl_wide_x0_s16_z_tied1, svint16_t, uint64_t,
 		 z0 = svlsl_wide_z (p0, z0, x0))
 
 /*
-** lsl_wide_x0_s16_z_untied: { xfail *-*-* }
+** lsl_wide_x0_s16_z_untied:
 ** mov (z[0-9]+\.d), x0
 ** movprfx z0\.h, p0/z, z1\.h
 ** lsl z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c
index acd813df34f..bd67b7006b5 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c
@@ -155,7 +155,7 @@ TEST_UNIFORM_ZX (lsl_wide_x0_s32_z_tied1, svint32_t, uint64_t,
 		 z0 = svlsl_wide_z (p0, z0, x0))
 
 /*
-** lsl_wide_x0_s32_z_untied: { xfail *-*-* }
+** lsl_wide_x0_s32_z_untied:
 ** mov (z[0-9]+\.d), x0
 ** movprfx z0\.s, p0/z, z1\.s
 ** lsl z0\.s, p0/m, z0\.s, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c
index 17e8e8685e3..7eb8627041d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c
@@ -155,7 +155,7 @@ TEST_UNIFORM_ZX (lsl_wide_x0_s8_z_tied1, svint8_t, uint64_t,
 		 z0 = svlsl_wide_z (p0, z0, x0))
 
 /*
-** lsl_wide_x0_s8_z_untied: { xfail *-*-* }
+** lsl_wide_x0_s8_z_untied:
 ** mov (z[0-9]+\.d), x0
 ** movprfx z0\.b, p0/z, z1\.b
 ** lsl z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c
index cff24a85090..482f8d0557b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c
@@ -155,7 +155,7 @@ TEST_UNIFORM_ZX (lsl_wide_x0_u16_z_tied1, svuint16_t, uint64_t,
 		 z0 = svlsl_wide_z (p0, z0, x0))
 
 /*
-** lsl_wide_x0_u16_z_untied: { xfail *-*-* }
+** lsl_wide_x0_u16_z_untied:
 ** mov (z[0-9]+\.d), x0
 ** movprfx z0\.h, p0/z, z1\.h
 ** lsl z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c
index 7b1afab4918..612897d24df 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c
@@ -155,7 +155,7 @@ TEST_UNIFORM_ZX (lsl_wide_x0_u32_z_tied1, svuint32_t, uint64_t,
 		 z0 = svlsl_wide_z (p0, z0, x0))
 
 /*
-** lsl_wide_x0_u32_z_untied: { xfail *-*-* }
+** lsl_wide_x0_u32_z_untied:
 ** mov (z[0-9]+\.d), x0
 ** movprfx z0\.s, p0/z, z1\.s
 ** lsl z0\.s, p0/m, z0\.s, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c
index df8b1ec86b4..6ca2f9e7da2 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c
@@ -155,7 +155,7 @@ TEST_UNIFORM_ZX (lsl_wide_x0_u8_z_tied1, svuint8_t, uint64_t,
 		 z0 = svlsl_wide_z (p0, z0, x0))
 
 /*
-** lsl_wide_x0_u8_z_untied: { xfail *-*-* }
+** lsl_wide_x0_u8_z_untied:
 ** mov (z[0-9]+\.d), x0
 ** movprfx z0\.b, p0/z, z1\.b
 ** lsl z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u16.c
index 863b51a2fc5..9110c5aad44 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u16.c
@@ -153,7 +153,7 @@ TEST_UNIFORM_ZX (lsr_wide_x0_u16_z_tied1, svuint16_t, uint64_t,
 		 z0 = svlsr_wide_z (p0, z0, x0))
 
 /*
-** lsr_wide_x0_u16_z_untied: { xfail *-*-* }
+** lsr_wide_x0_u16_z_untied:
 ** mov (z[0-9]+\.d), x0
 ** movprfx z0\.h, p0/z, z1\.h
 ** lsr z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u32.c
index 73c2cf86e33..93af4fa4925 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u32.c
@@ -153,7 +153,7 @@ TEST_UNIFORM_ZX (lsr_wide_x0_u32_z_tied1, svuint32_t, uint64_t,
 		 z0 = svlsr_wide_z (p0, z0, x0))
 
 /*
-** lsr_wide_x0_u32_z_untied: { xfail *-*-* }
+** lsr_wide_x0_u32_z_untied:
 ** mov (z[0-9]+\.d), x0
 ** movprfx z0\.s, p0/z, z1\.s
 ** lsr z0\.s, p0/m, z0\.s, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u8.c
index fe44eabda11..2f38139d40b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u8.c
@@ -153,7 +153,7 @@ TEST_UNIFORM_ZX (lsr_wide_x0_u8_z_tied1, svuint8_t, uint64_t,
 		 z0 = svlsr_wide_z (p0, z0, x0))
 
 /*
-** lsr_wide_x0_u8_z_untied: { xfail *-*-* }
+** lsr_wide_x0_u8_z_untied:
 ** mov (z[0-9]+\.d), x0
 ** movprfx z0\.b, p0/z, z1\.b
 ** lsr z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f32.c
index 747f8a6397b..12a1b1d8686 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f32.c
@@ -127,7 +127,7 @@ TEST_UNIFORM_ZX (scale_w0_f32_z_tied1, svfloat32_t, int32_t,
 		 z0 = svscale_z (p0, z0, x0))
 
 /*
-** scale_w0_f32_z_untied: { xfail *-*-* }
+** scale_w0_f32_z_untied:
 ** mov (z[0-9]+\.s), w0
 ** movprfx z0\.s, p0/z, z1\.s
 ** fscale z0\.s, p0/m, z0\.s, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f64.c
index 004cbfa3eff..f6b11718584 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f64.c
@@ -127,7 +127,7 @@ TEST_UNIFORM_ZX (scale_x0_f64_z_tied1, svfloat64_t, int64_t,
 		 z0 = svscale_z (p0, z0, x0))
 
 /*
-** scale_x0_f64_z_untied: { xfail *-*-* }
+** scale_x0_f64_z_untied:
 ** mov (z[0-9]+\.d), x0
 ** movprfx z0\.d, p0/z, z1\.d
 ** fscale z0\.d, p0/m, z0\.d, \1
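For context, the tests whose XFAILs are removed all follow the same shape: a predicated SVE intrinsic whose scalar operand is broadcast into a temporary vector register, while the destination is earlyclobbered with respect to that temporary, so the dying input's register could never be reused. A hypothetical, self-contained example in the style of those tests (the function and parameter names are illustrative; svbic_z is the overloaded ACLE intrinsic the tests call):

```c
#include <arm_sve.h>

/* Zero-predicated BIC with an untied first input: compute z1 & ~w0
   under predicate p0, zeroing inactive lanes.  The scalar w0 is
   broadcast into a temporary vector register that the destination
   cannot share, which is exactly the case the patch stops modelling
   as a profitable copy.  */
svuint32_t
bic_untied (svbool_t p0, svuint32_t z1, uint32_t w0)
{
  return svbic_z (p0, z1, w0);
}
```

With the patch, the expected mov/movprfx/bic-style sequences in the tests above match reliably, so the { xfail *-*-* } markers can be dropped.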