Message ID | 578668FD.1020706@foss.arm.com |
---|---|
State | New |
Headers | show |
On 07/13/16 11:14, Kyrill Tkachov wrote: > Hi all, > > The most common way to load and store TImode value in aarch64 is to > perform an LDP/STP of two X-registers. > This is the *movti_aarch64 pattern in aarch64.md. > There is a bug in the logic in aarch64_classify_address where it > validates the offset in the address used > to load a TImode value. It passes down TImode to the > aarch64_offset_7bit_signed_scaled_p check which rejects > offsets that are not a multiple of the mode size of TImode (16). > However, this is too conservative as X-reg LDP/STP > instructions accept immediate offsets that are a multiple of 8. > > Also, considering that the definition of > aarch64_offset_7bit_signed_scaled_p is: > return (offset >= -64 * GET_MODE_SIZE (mode) > && offset < 64 * GET_MODE_SIZE (mode) > && offset % GET_MODE_SIZE (mode) == 0); > > I think the range check may even be wrong for TImode as this will > accept offsets in the range [-1024, 1024) > (as long as they are a multiple of 16) > whereas X-reg LDP/STP instructions only accept offsets in the range > [-512, 512). > So since the check is for an X-reg LDP/STP address we should be > passing down DImode. > > This patch does that and enables more aggressive generation of REG+IMM > addressing modes for 64-bit aligned > TImode values, eliminating many address calculation instructions. > For the testcase in the patch we currently generate: > bar: > add x1, x1, 8 > add x0, x0, 8 > ldp x2, x3, [x1] > stp x2, x3, [x0] > ret > > whereas with this patch we generate: > bar: > ldp x2, x3, [x1, 8] > stp x2, x3, [x0, 8] > ret > > Bootstrapped and tested on aarch64-none-linux-gnu. > > Ok for trunk? LGTM
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index bea67f88b900be39b6f1ae002353b44c5a4a9f7d..8fd93a54c54ab86c6e600afba48fa441101b57c7 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -4033,9 +4033,11 @@ aarch64_classify_address (struct aarch64_address_info *info, X,X: 7-bit signed scaled offset Q: 9-bit signed offset We conservatively require an offset representable in either mode. - */ + When performing the check for pairs of X registers i.e. LDP/STP + pass down DImode since that is the natural size of the LDP/STP + instruction memory accesses. */ if (mode == TImode || mode == TFmode) - return (aarch64_offset_7bit_signed_scaled_p (mode, offset) + return (aarch64_offset_7bit_signed_scaled_p (DImode, offset) && offset_9bit_signed_unscaled_p (mode, offset)); /* A 7bit offset check because OImode will emit a ldp/stp diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_unaligned_1.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_unaligned_1.c new file mode 100644 index 0000000000000000000000000000000000000000..a70f92100fb91bcfdcfd4af1cab6f58915038568 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_unaligned_1.c @@ -0,0 +1,20 @@ +/* { dg-options "-O2" } */ + +/* Check that we can use a REG + IMM addressing mode when moving an unaligned + TImode value to and from memory. */ + +struct foo +{ + long long b; + __int128 a; +} __attribute__ ((packed)); + +void +bar (struct foo *p, struct foo *q) +{ + p->a = q->a; +} + +/* { dg-final { scan-assembler-not "add\tx\[0-9\]+, x\[0-9\]+" } } */ +/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\], .*8" 1 } } */ +/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\], .*8" 1 } } */