Message ID | 20240124070103.3800874-1-quic_apinski@quicinc.com |
---|---|
State | New |
Headers | show |
Series | aarch64: Fix movv8di for overlapping register and memory load [PR113550] | expand |
Andrew Pinski <quic_apinski@quicinc.com> writes: > The split for movv8di is not ready to handle the case where the setting > register overlaps with the address of the memory that is being load. > Fixing the split than just making the output constraint as an early clobber > for this alternative. The split would first need to figure out which register > is overlapping with the address and then only emit that move last. I was curious how strained that detection would be in practice, and in the end it didn't seem too bad. I pushed the following variant after testing on aarch64-linux-gnu. Thanks, Richard The LS64 movv8di pattern didn't handle loads that overlapped with the address register (unless the overlap happened to be in the last subload). gcc/ PR target/113550 * config/aarch64/aarch64-simd.md: In the movv8di splitter, check whether each split instruction is a load that clobbers the source address. Emit that instruction last if so. gcc/testsuite/ PR target/113550 * gcc.target/aarch64/pr113550.c: New test. --- gcc/config/aarch64/aarch64-simd.md | 18 ++++++-- gcc/testsuite/gcc.target/aarch64/pr113550.c | 48 +++++++++++++++++++++ 2 files changed, 62 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/pr113550.c diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 48f0741e7d0..f036f6ce997 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -8221,11 +8221,21 @@ (define_split || (memory_operand (operands[0], V8DImode) && register_operand (operands[1], V8DImode))) { + std::pair<rtx, rtx> last_pair = {}; for (int offset = 0; offset < 64; offset += 16) - emit_move_insn (simplify_gen_subreg (TImode, operands[0], - V8DImode, offset), - simplify_gen_subreg (TImode, operands[1], - V8DImode, offset)); + { + std::pair<rtx, rtx> pair = { + simplify_gen_subreg (TImode, operands[0], V8DImode, offset), + simplify_gen_subreg (TImode, operands[1], V8DImode, offset) + }; + if (register_operand (pair.first, TImode) + && reg_overlap_mentioned_p (pair.first, pair.second)) + last_pair = pair; + else + emit_move_insn (pair.first, pair.second); + } + if (last_pair.first) + emit_move_insn (last_pair.first, last_pair.second); DONE; } else diff --git a/gcc/testsuite/gcc.target/aarch64/pr113550.c b/gcc/testsuite/gcc.target/aarch64/pr113550.c new file mode 100644 index 00000000000..0ff3c7b5c00 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/pr113550.c @@ -0,0 +1,48 @@ +/* { dg-options "-O" } */ +/* { dg-do run } */ + +#pragma GCC push_options +#pragma GCC target "+ls64" +#pragma GCC aarch64 "arm_acle.h" +#pragma GCC pop_options + +#define DEF_FUNCTION(NAME, ARGS) \ + __attribute__((noipa)) \ + __arm_data512_t \ + NAME ARGS \ + { \ + return *ptr; \ + } + +DEF_FUNCTION (f0, (__arm_data512_t *ptr)) +DEF_FUNCTION (f1, (int x0, __arm_data512_t *ptr)) +DEF_FUNCTION (f2, (int x0, int x1, __arm_data512_t *ptr)) +DEF_FUNCTION (f3, (int x0, int x1, int x2, __arm_data512_t *ptr)) +DEF_FUNCTION (f4, (int x0, int x1, int x2, int x3, __arm_data512_t *ptr)) +DEF_FUNCTION (f5, (int x0, int x1, int x2, int x3, int x4, + __arm_data512_t *ptr)) +DEF_FUNCTION (f6, (int x0, int x1, int x2, int x3, int x4, int x5, + __arm_data512_t *ptr)) +DEF_FUNCTION (f7, (int x0, int x1, int x2, int x3, int x4, int x5, int x6, + __arm_data512_t *ptr)) + +int +main (void) +{ + __arm_data512_t x = { 0, 10, 20, 30, 40, 50, 60, 70 }; + __arm_data512_t res[8] = + { + f0 (&x), + f1 (0, &x), + f2 (0, 1, &x), + f3 (0, 1, 2, &x), + f4 (0, 1, 2, 3, &x), + f5 (0, 1, 2, 3, 4, &x), + f6 (0, 1, 2, 3, 4, 5, &x), + f7 (0, 1, 2, 3, 4, 5, 6, &x) + }; + for (int i = 0; i < 8; ++i) + if (__builtin_memcmp (&x, &res[i], sizeof (x)) != 0) + __builtin_abort (); + return 0; +}
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 662ef696630..ba079298b84 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -7985,7 +7985,7 @@ (define_insn "*aarch64_mov<mode>" ) (define_insn "*aarch64_movv8di" - [(set (match_operand:V8DI 0 "nonimmediate_operand" "=r,m,r") + [(set (match_operand:V8DI 0 "nonimmediate_operand" "=r,m,&r") (match_operand:V8DI 1 "general_operand" " r,r,m"))] "(register_operand (operands[0], V8DImode) || register_operand (operands[1], V8DImode))"
The split for movv8di is not ready to handle the case where the setting register overlaps with the address of the memory that is being load. Fixing the split than just making the output constraint as an early clobber for this alternative. The split would first need to figure out which register is overlapping with the address and then only emit that move last. Build and tested for aarch64-linux-gnu with no regressions gcc/ChangeLog: * config/aarch64/aarch64-simd.md (*aarch64_movv8di): Mark the last alternative's output constraint as an early clobber. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com> --- gcc/config/aarch64/aarch64-simd.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)