diff mbox

[AArch64] Allow multiple-of-8 immediate offsets for TImode LDP/STP

Message ID 578668FD.1020706@foss.arm.com
State New
Headers show

Commit Message

Kyrill Tkachov July 13, 2016, 4:14 p.m. UTC
Hi all,

The most common way to load and store TImode value in aarch64 is to perform an LDP/STP of two X-registers.
This is the *movti_aarch64 pattern in aarch64.md.
There is a bug in the logic in aarch64_classify_address where it validates the offset in the address used
to load a TImode value. It passes down TImode to the aarch64_offset_7bit_signed_scaled_p check which rejects
offsets that are not a multiple of the mode size of TImode (16). However, this is too conservative as X-reg LDP/STP
instructions accept immediate offsets that are a multiple of 8.

Also, considering that the definition of aarch64_offset_7bit_signed_scaled_p is:
   return (offset >= -64 * GET_MODE_SIZE (mode)
       && offset < 64 * GET_MODE_SIZE (mode)
       && offset % GET_MODE_SIZE (mode) == 0);

I think the range check may even be wrong for TImode as this will accept offsets in the range [-1024, 1024)
(as long as they are a multiple of 16)
whereas X-reg LDP/STP instructions only accept offsets in the range [-512, 512).
So since the check is for an X-reg LDP/STP address we should be passing down DImode.

This patch does that and enables more aggressive generation of REG+IMM addressing modes for 64-bit aligned
TImode values, eliminating many address calculation instructions.
For the testcase in the patch we currently generate:
bar:
         add     x1, x1, 8
         add     x0, x0, 8
         ldp     x2, x3, [x1]
         stp     x2, x3, [x0]
         ret

whereas with this patch we generate:
bar:
         ldp     x2, x3, [x1, 8]
         stp     x2, x3, [x0, 8]
         ret

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2016-07-13  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * config/aarch64/aarch64.c (aarch64_classify_address): Use DImode when
     performing aarch64_offset_7bit_signed_scaled_p check for TImode LDP/STP
     addresses.

2016-07-13  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.target/aarch64/ldp_stp_unaligned_1.c: New test.

Comments

Evandro Menezes July 13, 2016, 8:10 p.m. UTC | #1
On 07/13/16 11:14, Kyrill Tkachov wrote:
> Hi all,
>
> The most common way to load and store TImode value in aarch64 is to 
> perform an LDP/STP of two X-registers.
> This is the *movti_aarch64 pattern in aarch64.md.
> There is a bug in the logic in aarch64_classify_address where it 
> validates the offset in the address used
> to load a TImode value. It passes down TImode to the 
> aarch64_offset_7bit_signed_scaled_p check which rejects
> offsets that are not a multiple of the mode size of TImode (16). 
> However, this is too conservative as X-reg LDP/STP
> instructions accept immediate offsets that are a multiple of 8.
>
> Also, considering that the definition of 
> aarch64_offset_7bit_signed_scaled_p is:
>   return (offset >= -64 * GET_MODE_SIZE (mode)
>       && offset < 64 * GET_MODE_SIZE (mode)
>       && offset % GET_MODE_SIZE (mode) == 0);
>
> I think the range check may even be wrong for TImode as this will 
> accept offsets in the range [-1024, 1024)
> (as long as they are a multiple of 16)
> whereas X-reg LDP/STP instructions only accept offsets in the range 
> [-512, 512).
> So since the check is for an X-reg LDP/STP address we should be 
> passing down DImode.
>
> This patch does that and enables more aggressive generation of REG+IMM 
> addressing modes for 64-bit aligned
> TImode values, eliminating many address calculation instructions.
> For the testcase in the patch we currently generate:
> bar:
>         add     x1, x1, 8
>         add     x0, x0, 8
>         ldp     x2, x3, [x1]
>         stp     x2, x3, [x0]
>         ret
>
> whereas with this patch we generate:
> bar:
>         ldp     x2, x3, [x1, 8]
>         stp     x2, x3, [x0, 8]
>         ret
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Ok for trunk?

LGTM
diff mbox

Patch

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index bea67f88b900be39b6f1ae002353b44c5a4a9f7d..8fd93a54c54ab86c6e600afba48fa441101b57c7 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4033,9 +4033,11 @@  aarch64_classify_address (struct aarch64_address_info *info,
 	     X,X: 7-bit signed scaled offset
 	     Q:   9-bit signed offset
 	     We conservatively require an offset representable in either mode.
-	   */
+	     When performing the check for pairs of X registers i.e.  LDP/STP
+	     pass down DImode since that is the natural size of the LDP/STP
+	     instruction memory accesses.  */
 	  if (mode == TImode || mode == TFmode)
-	    return (aarch64_offset_7bit_signed_scaled_p (mode, offset)
+	    return (aarch64_offset_7bit_signed_scaled_p (DImode, offset)
 		    && offset_9bit_signed_unscaled_p (mode, offset));
 
 	  /* A 7bit offset check because OImode will emit a ldp/stp
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_stp_unaligned_1.c b/gcc/testsuite/gcc.target/aarch64/ldp_stp_unaligned_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..a70f92100fb91bcfdcfd4af1cab6f58915038568
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_stp_unaligned_1.c
@@ -0,0 +1,20 @@ 
+/* { dg-options "-O2" } */
+
+/* Check that we can use a REG + IMM addressing mode when moving an unaligned
+   TImode value to and from memory.  */
+
+struct foo
+{
+  long long b;
+  __int128 a;
+} __attribute__ ((packed));
+
+void
+bar (struct foo *p, struct foo *q)
+{
+  p->a = q->a;
+}
+
+/* { dg-final { scan-assembler-not "add\tx\[0-9\]+, x\[0-9\]+" } } */
+/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\], .*8" 1 } } */
+/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\], .*8" 1 } } */